The Return of the Scala Rule Tutorial: The Execution

This builds on the first part of the tutorial. In this post, we will make the rule actually produce an executable.

Capturing the output from scalac

At the end of the tutorial last time, we were calling scalac, but ignoring the result:

(cd /private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg && \
  exec env - \
  /bin/bash -c 'external/scala/bin/scalac HelloWorld.scala; echo '\''blah'\'' > bazel-out/local_darwin-fastbuild/bin/hello-world.sh')

If you look at the directory where the action runs (/private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg in my case), you can see that HelloWorld.class and HelloWorld$.class have been created. This directory is called the execution root; it is where Bazel executes build actions. Bazel uses separate directory trees for source code, for executing build actions, and for output files (bazel-out/). Files won’t get moved from the execution root to the output tree unless we tell Bazel we want them.

We want our compiled scala program to end up in bazel-out/, but there’s a small complication. With languages like Java (and Scala), a single source file might contain inner classes that cause multiple .class files to be generated by a single compile action. Bazel cannot know until it runs the action how many class files are going to be generated. However, Bazel requires that each action declare, in advance, what its outputs will be. The way to get around this is to package up the .class files and make the resulting archive the build output.
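A jar is a zip archive, so the packaging idea can be sketched in Python with the standard zipfile module: however many .class files the compiler emits, they all end up in one archive whose path is known before the action runs. This is a minimal illustration, not what the rule does (the rule shells out to find and jar); bundle_class_files is a hypothetical helper, and unlike the jar tool it does not add a META-INF/MANIFEST.MF entry.

```python
import os
import zipfile


def bundle_class_files(class_root, jar_path):
    """Pack every .class file under class_root into a single zip/jar archive.

    The number of .class files is not known in advance (inner classes and
    Scala companion objects each get their own file), but jar_path is one
    predictable output. Returns the sorted list of archive entry names.
    """
    names = []
    with zipfile.ZipFile(jar_path, "w", zipfile.ZIP_DEFLATED) as jar:
        for dirpath, _dirnames, filenames in os.walk(class_root):
            for filename in filenames:
                if not filename.endswith(".class"):
                    continue  # skip the archive itself and any other files
                full_path = os.path.join(dirpath, filename)
                # Store entries relative to class_root, using "/" separators.
                arcname = os.path.relpath(full_path, class_root).replace(os.sep, "/")
                jar.write(full_path, arcname)
                names.append(arcname)
    return sorted(names)
```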

In this example, we’ll add the .class files into a .jar. Let’s add that to the outputs, which should now look like this:

  outputs = {
    'jar': "%{name}.jar",
    'sh': "%{name}.sh",
  },

In the impl function, our command is getting a bit complicated, so I’m going to change it to a list of commands and then join them on “\n” in the action:

def impl(ctx):
    cmd = [
        "%s %s" % (ctx.file._scalac.path, ctx.file.src.path),
        "find . -name '*.class' -print > classes.list",
        "jar cf %s @classes.list" % (ctx.outputs.jar.path),
    ]

    ctx.action(
        inputs = [ctx.file.src],
        command = "\n".join(cmd),
        outputs = [ctx.outputs.jar]
    )

This will compile the src, find all of the .class files, and add them to the output jar. If we run this, we get:

$ bazel build -s :hello-world
INFO: Found 1 target...
>>>>> # //:hello-world [action 'Unknown hello-world.jar']
(cd /private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg && \
  exec env - \
  /bin/bash -c 'external/scala/bin/scalac HelloWorld.scala
find . -name '\''*.class'\'' -print > classes.list
jar cf bazel-out/local_darwin-fastbuild/bin/hello-world.jar @classes.list')
Target //:hello-world up-to-date:
  bazel-bin/hello-world.jar
INFO: Elapsed time: 4.774s, Critical Path: 4.06s
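Starlark’s % formatting and str.join behave like Python’s, so the command-building part of impl can be checked outside Bazel. The sketch below is a hypothetical Python mirror of that code (build_compile_command and its fail_fast flag are illustrative, not part of the rule); joining with "\n" yields the same three lines that bazel build -s printed above. One caveat: bash keeps executing after a failing line in a newline-separated script, so if scalac fails, find and jar still run. Prepending set -e makes the script stop at the first error instead.

```python
def build_compile_command(scalac, src, jar_out, fail_fast=False):
    """Mirror of the rule's cmd list: one shell command per entry, joined by newlines."""
    cmd = [
        "%s %s" % (scalac, src),
        "find . -name '*.class' -print > classes.list",
        "jar cf %s @classes.list" % jar_out,
    ]
    if fail_fast:
        # Without this, bash runs every line even if an earlier one fails.
        cmd.insert(0, "set -e")
    return "\n".join(cmd)
```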

Let’s take a look at what hello-world.jar contains:

$ jar tf bazel-bin/hello-world.jar
META-INF/
META-INF/MANIFEST.MF
HelloWorld$.class
HelloWorld.class

Looks good! However, we cannot actually run this jar, because java doesn’t know what the main class should be:

$ java -jar bazel-bin/hello-world.jar 
no main manifest attribute, in bazel-bin/hello-world.jar

Similar to the java_binary rule, let’s add a main_class attribute to scala_binary and put it in the jar’s manifest. Add 'main_class': attr.string(), to scala_binary’s attrs and change cmd to the following:

    cmd = [
        "%s %s" % (ctx.file._scalac.path, ctx.file.src.path),
        "echo Manifest-Version: 1.0 > MANIFEST.MF",
        "echo Main-Class: %s >> MANIFEST.MF" % ctx.attr.main_class,
        "find . -name '*.class' -print > classes.list",
        "jar cfm %s MANIFEST.MF @classes.list" % (ctx.outputs.jar.path),
    ]
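What jar cfm adds is a META-INF/MANIFEST.MF entry whose Main-Class header java -jar reads to find the entry point. Because a jar is a zip archive, the effect can be sketched with Python’s zipfile module. The function names here are hypothetical, and read_main_class is a deliberate simplification (it ignores manifest continuation lines), not how the JVM actually parses manifests.

```python
import zipfile


def manifest_text(main_class):
    # The same two headers the echo commands write. The trailing newline
    # matters: a manifest's last line is not parsed properly without one.
    return "Manifest-Version: 1.0\nMain-Class: %s\n" % main_class


def write_executable_jar(jar_path, main_class, class_files):
    """class_files maps archive names (e.g. "HelloWorld.class") to bytes."""
    with zipfile.ZipFile(jar_path, "w") as jar:
        jar.writestr("META-INF/MANIFEST.MF", manifest_text(main_class))
        for name, data in class_files.items():
            jar.writestr(name, data)


def read_main_class(jar_path):
    """Return the Main-Class header value, or None if the manifest lacks one."""
    with zipfile.ZipFile(jar_path) as jar:
        manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    for line in manifest.splitlines():
        if line.startswith("Main-Class:"):
            return line.split(":", 1)[1].strip()
    return None
```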

Remember to update your actual BUILD file to add a main_class attribute:

# BUILD
load("/scala", "scala_binary")

scala_binary(
    name = "hello-world",
    src = "HelloWorld.scala",
    main_class = "HelloWorld",
)

Now building and running gives you:

$ bazel build :hello-world
INFO: Found 1 target...
Target //:hello-world up-to-date:
  bazel-bin/hello-world.jar
INFO: Elapsed time: 4.663s, Critical Path: 4.05s
$ java -jar bazel-bin/hello-world.jar 
Exception in thread "main" java.lang.NoClassDefFoundError: scala/Predef$
	at HelloWorld$.main(HelloWorld.scala:4)
	at HelloWorld.main(HelloWorld.scala)
Caused by: java.lang.ClassNotFoundException: scala.Predef$
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 2 more

Closer! Now it cannot find the Scala library classes it needs. You can add the library to the classpath manually on the command line to confirm that our jar does work when scala-library.jar is available too:

$ java -cp $(bazel info output_base)/external/scala/lib/scala-library.jar:bazel-bin/hello-world.jar HelloWorld
Hello, world!

So we need our rule to generate an executable that basically runs this command, which can be accomplished by adding another action to our build. First we’ll add a dependency on scala-library.jar by adding it as a hidden attribute:

        '_scala_lib': attr.label(
            default=Label("@scala//:lib/scala-library.jar"),
            allow_files=True,
            single_file=True),
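The classpath in the java command above is just the two jars joined with ':'. A small hypothetical Python helper makes the separator explicit: it uses os.pathsep, which is ':' on macOS and Linux and ';' on Windows, whereas the tutorial’s rule hard-codes ':'.

```python
import os


def java_command(jars, main_class):
    """Build the argv for `java -cp <jars> <main_class>`.

    Classpath entries are joined with the platform's path-list separator.
    """
    return ["java", "-cp", os.pathsep.join(jars), main_class]
```

Assuming java is on the PATH, subprocess.run(java_command(["hello-world.jar", "external/scala/lib/scala-library.jar"], "HelloWorld"), check=True) would launch the program from a directory containing those paths.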

Making scala_binary targets executable

Let’s pause here for a moment and switch gears: we’re going to tell Bazel that scala_binary targets are binaries. To do this, we pass executable = True to the rule() call (it is a parameter of rule itself, not an entry in attrs) and get rid of the reference to hello-world.sh in the outputs:

...
    outputs = {
        'jar': "%{name}.jar",
    },
    implementation = impl,
    executable = True,
)

This says that scala_binary(name = "foo", ...) should have an action that creates a binary called foo, which can be referenced via ctx.outputs.executable in the implementation function. We can now use bazel run :hello-world (instead of bazel build :hello-world; ./bazel-bin/hello-world.sh).

The executable we want to create is the java command from above, so we add the second action to impl, this one a file action (since we’re just generating a file with certain content, not executing a series of commands to generate a .jar):

    cp = "%s:%s" % (ctx.outputs.jar.basename, ctx.file._scala_lib.path)
    content = [
        "#!/bin/bash",
        "echo Running from $PWD",
        "java -cp %s %s" % (cp, ctx.attr.main_class),
    ]
    ctx.file_action(
        content = "\n".join(content),
        output = ctx.outputs.executable,
    )

Note that I also added a line to the file to echo where it is being run from. If we now use bazel run, you’ll see:

$ bazel run :hello-world
INFO: Found 1 target...
Target //:hello-world up-to-date:
  bazel-bin/hello-world.jar
  bazel-bin/hello-world
INFO: Elapsed time: 2.694s, Critical Path: 0.08s

INFO: Running command line: bazel-bin/hello-world
Running from /private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg/bazel-out/local_darwin-fastbuild/bin/hello-world.runfiles
Error: Could not find or load main class HelloWorld
ERROR: Non-zero return code '1' from command: Process exited with status 1.

Whoops, it can’t find the jars! And what is hello-world.runfiles, the directory it’s running the binary from?

The runfiles directory

bazel run runs the binary from the runfiles directory, which is different from the source root, execution root, and output tree mentioned above. The runfiles directory should contain all of the resources the executable needs at run time. The execution root is only used during the bazel build step; when you actually execute something Bazel created, its resources need to be in the runfiles directory.

In this case, our executable needs to access hello-world.jar and scala-library.jar. The API for adding these files is somewhat strange: you must return a struct containing a runfiles object from the rule implementation. So add the following as the last line of your impl function:

return struct(runfiles = ctx.runfiles(files = [ctx.outputs.jar, ctx.file._scala_lib]))
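On Unix, Bazel typically builds the runfiles directory as a tree of symlinks pointing back at the real files; listing files in ctx.runfiles is what tells Bazel to link them there. The following Python sketch is a rough model of that idea, not Bazel’s implementation. make_runfiles_tree is hypothetical, and the relative paths simply match the classpath used in the launcher script (the exact layout differs between Bazel versions).

```python
import os


def make_runfiles_tree(runfiles_dir, files):
    """Rough model of a runfiles directory: a tree of symlinks.

    files maps a path relative to the runfiles root (e.g. "hello-world.jar")
    to the real file the link should point at.
    """
    for rel_path, target in files.items():
        link = os.path.join(runfiles_dir, rel_path)
        # Create intermediate directories such as external/scala/lib/.
        os.makedirs(os.path.dirname(link), exist_ok=True)
        os.symlink(os.path.abspath(target), link)
```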

Now if you run it again, it’ll print:

$ bazel run :hello-world
INFO: Found 1 target...
Target //:hello-world up-to-date:
  bazel-bin/hello-world.jar
  bazel-bin/hello-world
INFO: Elapsed time: 0.416s, Critical Path: 0.00s

INFO: Running command line: bazel-bin/hello-world
Running from /private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg/bazel-out/local_darwin-fastbuild/bin/hello-world.runfiles
Hello, world!

Hooray!

However! If we run it as bazel-bin/hello-world, it won’t be able to find the jars (because we’re not in the runfiles directory). To find the runfiles directory regardless of where the binary is run from, change your content variable to the following:

    content = [
        "#!/bin/bash",
        "case \"$0\" in",
        "/*) self=\"$0\" ;;",
        "*)  self=\"$PWD/$0\" ;;",
        "esac",
        "(cd \"$self.runfiles\" && java -cp %s %s)" % (cp, ctx.attr.main_class),
    ]

This way, if it’s run via bazel run, $0 will be the absolute path to the binary (in my case, /private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg/bazel-out/local_darwin-fastbuild/bin/hello-world), so the script uses it as is. If it’s run as bazel-bin/hello-world, $0 will be just that relative path, so the script prefixes it with $PWD. Either way, we’ll end up in the runfiles directory before executing the command.
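The case statement’s logic is small enough to mirror in Python, which makes both paths easy to check. runfiles_dir_for is a hypothetical helper; like the shell version, it does not normalize paths containing ./ or ../.

```python
def runfiles_dir_for(argv0, cwd):
    """Python mirror of the launcher's `case "$0" in ... esac` logic."""
    if argv0.startswith("/"):
        self_path = argv0              # /*) self="$0"
    else:
        self_path = cwd + "/" + argv0  # *)  self="$PWD/$0"
    return self_path + ".runfiles"
```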

Now our rule is successfully generating a binary. You can see the full code for this example on GitHub.

In the final part of this tutorial, we’ll fix the remaining issues:

  • No support for multiple source files, never mind dependencies.
  • [action 'Unknown hello-world.jar'] is pretty ugly.

Until next time!

2 thoughts on “The Return of the Scala Rule Tutorial: The Execution”

  1. This has been great – the need for writing or understanding other developers’ custom rules while still trying to grok the Bazel Way is a big drag. I have been following along using Bazel 0.6.1 on OSX. I figured out how to get part 1 almost working using cfg="host". My current issue is the build cannot find the scalac compiler. I know this by looking at the output from the build:

    cat /private/var/tmp/_bazel_johnferguson/0516c5fa4e8865dd38d08261954254c9/execroot/__main__/bazel-out/_tmp/action_outs/stderr-1

    Which shows me:

    /bin/bash: external/scala/bin/scalac: No such file or directory

    Which makes sense because when I ls external/scala I do not see bin:

    BUILD.bazel WORKSPACE scala-2.11.7 scala-2.11.7.tgz

    So in the WORKSPACE I tried strip_prefix="scala-2.11.7" but the stderr output for the build still says:

    /bin/bash: external/scala/bin/scalac: No such file or directory

    But if I do this from my project directory:

    /private/var/tmp/_bazel_johnferguson/0516c5fa4e8865dd38d08261954254c9/execroot/__main__/external/scala/bin/scalac

    I see scalac help listing. So it is there…. any suggestions on how to debug this part and get it compiling?


    1. Glad this is helpful, but I’ve left Google now and I am no longer working on Bazel. I’m not sure what’s going wrong, but if you’d like some help, I recommend asking on StackOverflow or the Bazel mailing list.
