The Return of the Scala Rule Tutorial: The Execution

This builds on the first part of the tutorial. In this post, we will make the the rule actually produce an executable.

Capturing the output from scalac

At the end of the tutorial last time, we were calling scalac, but ignoring the result:

(cd /private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg && 
  exec env - 
  /bin/bash -c 'external/scala/bin/scalac HelloWorld.scala; echo '''blah''' > bazel-out/local_darwin-fastbuild/bin/hello-world.sh')

If you look at the directory where the action is running (/private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg in my case) you can see that HelloWorld.class and HelloWorld$.class is created. This directory is called the execution root, it is where bazel executes build actions. Bazel uses separate directory trees for source code, executing build actions, and output files (bazel-out/). Files won’t get moved from the execution root to the output tree unless we tell Bazel we want them.

We want our compiled scala program to end up in bazel-out/, but there’s a small complication. With languages like Java (and Scala), a single source file might contain inner classes that cause multiple .class files to be generated by a single compile action. Bazel cannot know until it runs the action how many class files are going to be generated. However, Bazel requires that each action declare, in advance, what its outputs will be. The way to get around this is to package up the .class files and make the resulting archive the build output.

In this example, we’ll add the .class files into a .jar. Let’s add that to the outputs, which should now look like this:

  outputs = {
    'jar': "%{name}.jar",
    'sh': "%{name}.sh",
  },

In the impl function, our command is getting a bit complicated so I’m going to change it to an array of commands and then join them on “n” in the action:

def impl(ctx):
    cmd = [
        "%s %s" % (ctx.file._scalac.path, ctx.file.src.path),
        "find . -name '*.class' -print > classes.list",
        "jar cf %s @classes.list" % (ctx.outputs.jar.path),
    ]

    ctx.action(
        inputs = [ctx.file.src],
	command = "n".join(cmd),
        outputs = [ctx.outputs.jar]
    )

This will compile the src, find all of the .class files, and add them to the output jar. If we run this, we get:

$ bazel build -s :hello-world
INFO: Found 1 target...
>>>>> # //:hello-world [action 'Unknown hello-world.jar']
(cd /private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg && 
  exec env - 
  /bin/bash -c 'external/scala/bin/scalac HelloWorld.scala
find . -name '''*.class''' -print > classes.list
jar cf bazel-out/local_darwin-fastbuild/bin/hello-world.jar @classes.list')
Target //:hello-world up-to-date:
  bazel-bin/hello-world.jar
INFO: Elapsed time: 4.774s, Critical Path: 4.06s

Let’s take a look at what hello-world.jar contains:

$ jar tf bazel-bin/hello-world.jar
META-INF/
META-INF/MANIFEST.MF
HelloWorld$.class
HelloWorld.class

Looks good! However, we cannot actually run this jar, because java doesn’t know what the main class should be:

$ java -jar bazel-bin/hello-world.jar 
no main manifest attribute, in bazel-bin/hello-world.jar

Similar to the java_binary rule, let’s add a main_class attribute to scala_binary and put it in the jar’s manifest. Add 'main_class' : attr.string(), to scala_binary‘s attrs and change cmd to the following:

    cmd = [
        "%s %s" % (ctx.file._scalac.path, ctx.file.src.path),
        "echo Manifest-Version: 1.0 > MANIFEST.MF",
        "echo Main-Class: %s >> MANIFEST.MF" % ctx.attr.main_class,
        "find . -name '*.class' -print > classes.list",
	"jar cfm %s MANIFEST.MF @classes.list" % (ctx.outputs.jar.path),
    ]

Remember to update your actual BUILD file to add a main_class attribute:

# BUILD
load("/scala", "scala_binary")

scala_binary(
    name = "hello-world",
    src = "HelloWorld.scala",
    main_class = "HelloWorld",
)

Now building and running gives you:

$ bazel build :hello-world
INFO: Found 1 target...
Target //:hello-world up-to-date:
  bazel-bin/hello-world.jar
INFO: Elapsed time: 4.663s, Critical Path: 4.05s
$ java -jar bazel-bin/hello-world.jar 
Exception in thread "main" java.lang.NoClassDefFoundError: scala/Predef$
	at HelloWorld$.main(HelloWorld.scala:4)
	at HelloWorld.main(HelloWorld.scala)
Caused by: java.lang.ClassNotFoundException: scala.Predef$
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 2 more

Closer! Now it cannot find some scala libraries it needs. You can add it manually on the command line to see that our jar does actually does work if we specify the scala library jar, too:

$ java -cp $(bazel info output_base)/external/scala/lib/scala-library.jar:bazel-bin/hello-world.jar HelloWorld
Hello, world!

So we need our rule to generate an executable that basically runs this command, which can be accomplished by adding another action to our build. First we’ll add a dependency on scala-library.jar by adding it as a hidden attribute:

        '_scala_lib': attr.label(
            default=Label("@scala//:lib/scala-library.jar"),
            allow_files=True,
            single_file=True),

Making scala_binarys executable

Let’s pause here for a moment and switch gears: we’re going to tell bazel that scala_binarys are binaries. To do this, we add executable = True to the attrs and get rid of the reference to hello-world.sh in the outputs:

...
    outputs = {
        'jar': "%{name}.jar",
    },
    implementation = impl,
    executable = True,
)

This says that scala_binary(name = "foo", ...) should have an action that creates a binary called foo, which can be referenced via ctx.outputs.executable in the implementation function. We can now use bazel run :hello-world (instead of bazel build :hello-world; ./bazel-bin/hello-world.sh).

The executable we want to create is the java command from above, so we add the second action to impl, this one a file action (since we’re just generating a file with certain content, not executing a series of commands to generate a .jar):

    cp = "%s:%s" % (ctx.outputs.jar.basename, ctx.file._scala_lib.path)
    content = [
	"#!/bin/bash",
        "echo Running from $PWD",
	"java -cp %s %s" % (cp, ctx.attr.main_class),
    ]
    ctx.file_action(
	content = "n".join(content),
	output = ctx.outputs.executable,
    )

Note that I also added a line to the file to echo where it is being run from. If we now use bazel run, you’ll see:

$ bazel run :hello-world
INFO: Found 1 target...
Target //:hello-world up-to-date:
  bazel-bin/hello-world.jar
  bazel-bin/hello-world
INFO: Elapsed time: 2.694s, Critical Path: 0.08s

INFO: Running command line: bazel-bin/hello-world
Running from /private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg/bazel-out/local_darwin-fastbuild/bin/hello-world.runfiles
Error: Could not find or load main class HelloWorld
ERROR: Non-zero return code '1' from command: Process exited with status 1.

Whoops, it’s not able to find the jars! And what is that path, hello-world.runfiles, it’s running the binary from?

The runfiles directory

bazel run runs the binary from the runfiles directory, a directory that is different than the source root, execution root, and output tree mentioned above. The runfiles directory should contain all of the resources needed by the executable during execution. Note that this is not the execution root, which is used during the bazel build step. When you actually execute something created by bazel, its resources need to be in the runfiles directory.

In this case, our executable needs to access hello-world.jar and scala-library.jar. To add these files, the API is somewhat strange. You must return a struct containing a runfiles object from the rule implementation. Thus, add the following as the last line of your impl function:

return struct(runfiles = ctx.runfiles(files = [ctx.outputs.jar, ctx.file._scala_lib]))

Now if you run it again, it’ll print:

$ bazel run :hello-world
INFO: Found 1 target...
Target //:hello-world up-to-date:
  bazel-bin/hello-world.jar
  bazel-bin/hello-world
INFO: Elapsed time: 0.416s, Critical Path: 0.00s

INFO: Running command line: bazel-bin/hello-world
Running from /private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg/bazel-out/local_darwin-fastbuild/bin/hello-world.runfiles
Hello, world!

Hooray!

However! If we run it as bazel-bin/hello-world, it won’t be able to find the jars (because we’re not in the runfiles directory). To find the runfiles directory regardless of where the binary is run from, change your content variable to the following:

    content = [
        "#!/bin/bash",
        "case "$0" in",
        "/*) self="$0" ;;",
        "*)  self="$PWD/$0";;",
        "esac",
        "(cd $self.runfiles; java -cp %s %s)" % (cp, ctx.attr.main_class),
    ]

This way, if it’s run from bazel run, $0 will be the absolute path to the binary (in my case, /private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg/bazel-out/local_darwin-fastbuild/bin/hello-world). If it’s run via bazel-bin/hello-world, $0 will be just that: bazel-bin/hello-world. Either way, we’ll end up in the runfiles directory before executing the command.

Now our rule is successfully generating a binary. You can see the full code for this example on GitHub.

In the final part of this tutorial, we’ll fix the remaining issues:

  • No support for multiple source files, never mind dependencies.
  • [action 'Unknown hello-world.jar'] is pretty ugly.

Until next time!

Tutorial: how to write Scala rules for Bazel

Bazel comes with built-in support for several languages and allows you to write your own support for any other languages in Python.

Although you could probably get more abstract, let’s define a rule as something that takes some files, does something to them, and then gives you some output files. Specifically, for this example, we want a scala_binary rule where we can give it a Scala source file and it turns it into an executable binary.

Part 1: Creating a Scala source file

Let’s create a simple example of a Scala source file (mercilessly ripped from the Scala hello world example):

// HelloWorld.scala
object HelloWorld {
  def main(args: Array[String]) {
    println("Hello, world!")
  }
}

Note that I’ve never used Scala before today, so please let me know in the comments if I’ve made an mistakes.

Before proceeding, I think it’s a good idea to try building this without bazel (especially if you’re not too familiar with the language’s build tool… ahem) as a sanity check:

$ scalac HelloWorld.scala
$ scala HelloWorld
Hello, world!

Looking good! Now, let’s try to get bazel building that.

Adding a BUILD file and dummy scala_binary rule

We’ll create a BUILD file that references our (currently non-existent) scala_binary rule. This lets us plan out what we’ll need our rule to look like:

# BUILD
load('/scala', 'scala_binary')

scala_binary(
    name = "hello-world",
    src = "HelloWorld.scala",
)

The load() statement means that we’ll declare the scala_binary rule in a file called scala.bzl in the root of the workspace (due to the ‘/’ prefix on ‘/scala’). Let’s create that file now:

# scala.bzl
def impl(ctx):
    pass

scala_binary = rule(
    attrs = {
        'src': attr.label(
            allow_files=True,
            single_file=True),
    },
    outputs = {'sh': "%{name}.sh"},
    implementation = impl,
)

scala_binary‘s definition says that rules can have one attribute (other than name), src, which is a single file. The rule is supposed to output a file called name.sh, so for our example we should end up with hello-world.sh. The implementation of our rule should be happening in the function impl. The rule implementation doesn’t do anything yet, but we can at least try building now:

$ touch WORKSPACE # if you haven't already...
$ bazel build :hello-world
ERROR: /Users/kchodorow/blerg/BUILD:4:1: in scala_binary rule //:hello-world: 
: The following files have no generating action:
hello-world.sh
.
ERROR: Analysis of target '//:hello-world' failed; build aborted.
INFO: Elapsed time: 0.286s

The error is expected: our rule definition says that hello-world.sh should be an output, but there’s no code creating it yet. Let’s add some functionality to the implementation function by replacing existing function with the following:

def impl(ctx):
    ctx.action(
        inputs = [ctx.file.src],
        command = "echo %s > %s" % (ctx.file.src.path, ctx.outputs.sh.path),
        outputs = [ctx.outputs.sh]
    )

This adds an action to the build. It says that, if the inputs have changed (the src file), run the command (which right now is just echoing src‘s path) to the output file. Note that ctx.action(...) doesn’t actually run the action, it just adds that action to “things that need to be run in the future” for the rule.

Now if we build :hello-world again, we get:

$ bazel build -s :hello-world
INFO: Found 1 target...
>>>>> # //:hello-world [action 'Unknown hello-world.sh']
(cd /private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg && 
  exec env - 
  /bin/bash -c 'echo HelloWorld.scala > bazel-out/local_darwin-fastbuild/bin/hello-world.sh')
Target //:hello-world up-to-date:
  bazel-bin/hello-world.sh
INFO: Elapsed time: 0.605s, Critical Path: 0.02s

I used bazel’s -s option here, which is very helpful for debugging what your rule is doing. It prints all of the subcommands a build is running. As you can see, now our rule has an action (>>>>> # //:hello-world [action 'Unknown hello-world.sh']) that creates bazel-bin/hello-world.sh by echoing the source file name. You can verify this by cating bazel-bin/hello-world.sh.

Adding a dependency on the scala compiler

We want our rule to actually call scalac. Even if you have the scala compiler installed on your system, you cannot simply create an action with a command = 'scalac MySourceFile.scala' line, as actions are run in a “clean room” environment: nothing* is there that you don’t specify.

As you probably don’t want to add the scala compiler to your workspace, open up your WORKSPACE file and add it as an external dependency:

# WORKSPACE
new_http_archive(
    name = "scala",
    url = "http://downloads.typesafe.com/scala/2.11.7/scala-2.11.7.tgz",
    sha256 = "ffe4196f13ee98a66cf54baffb0940d29432b2bd820bd0781a8316eec22926d0",
    build_file = "scala.BUILD",
)

Also create the scala.BUILD file in the root of your workspace:

# scala.BUILD
exports_files([
    "bin/scala",
    "bin/scalac",
    "lib/scala-library.jar"
])

Now add a dependency on scalac to your scala_binary rule by adding a “hidden attribute.” Add calling scalac in your impl, so your scala.bzl file looks something like this:

def impl(ctx):
    ctx.action(
        inputs = [ctx.file.src],
        command = "%s %s; echo 'blah' > %s" % (
            ctx.file._scalac.path, ctx.file.src.path, ctx.outputs.sh.path),
        outputs = [ctx.outputs.sh]
    )

scala_binary = rule(
    attrs = {
        'src': attr.label(
            allow_files=True,
            single_file=True),
        '_scalac': attr.label(
            default=Label("@scala//:bin/scalac"),
            executable=True,
            allow_files=True,
            single_file=True),
    },
    outputs = {'sh': "%{name}.sh"},
    implementation = impl,
)

Building now shows that scalac is successfully being run on our source file!

$ bazel build -s :hello-world
INFO: Found 1 target...
>>>>> # //:hello-world [action 'Unknown hello-world.sh']
(cd /private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg && 
  exec env - 
  /bin/bash -c 'external/scala/bin/scalac HelloWorld.scala; echo '''blah''' > bazel-out/local_darwin-fastbuild/bin/hello-world.sh')
Target //:hello-world up-to-date:
  bazel-bin/hello-world.sh
INFO: Elapsed time: 4.634s, Critical Path: 4.11s

There are still many issues with this implementation:

  • The output from calling scalac doesn’t actually go anywhere, hello-world.sh is still a dummy file.
  • No support for multiple source files, never mind dependencies.
  • [action 'Unknown hello-world.sh'] is pretty ugly.
  • You can’t call bazel run //hello-world, even though the output should be executable.

However, this post is already running long, so let’s wrap it up here and get to some of these issue in the next post. Until next time!

References

* Obviously there are some commands there (our original rule uses echo, for instance) and you can see what’s in the empty environment by writing env to an output file in an action. This can actually cause issues: sometimes commands in the default PATH have different behavior on different systems. To get a completely hermetic build, you should really provide every command your rule uses. However, we’re just using echo for debugging here anyway, so we’ll let it slide.