Build, y u go slow?


When a build is taking too long, it can be very helpful to know what it’s doing. Bazel has built-in tooling that lets you visualize what each thread is doing at any given moment of a build and which build steps are slowing down your overall build.

To try out Bazel’s profiling tools, build your favorite (or, rather, least-favorite) target with the --profile option:

$ bazel build --profile=myprofile.out //snail:slow-lib

This will write the profile to a file called myprofile.out in the current directory. Once your build finishes, you can take a look at this file, but it’s not really designed to be read by humans. Instead, plug it into Bazel’s analyze-profile command:

$ bazel analyze-profile --html myprofile.out

Now you wait for Bazel to analyze the profile info and enjoy this picture of a snail I saw walking Domino the other day (penny included for scale):

Make your project build faster than this.

(After taking this photo, I moved the snail onto the grass, since I’m pretty sure it was not delighted to be in the middle of a NYC sidewalk.)

Ding, analysis is probably done. Now you can open up myprofile.out.html and see your build, broken down into hundreds or thousands of individual steps. A screenshot of the output:

Screen Shot 2015-09-18 at 2.51.55 PM

I uploaded the HTML page so you can see the whole thing here and play with it (it will open in a new tab).

I used the //android target from the example app for the profile above, since it’s a little more meaty than a toy example.

The chart shows what all 200 build threads were doing during the build at any given time (one thread per row). The build is divided into several “phases” which are shown as different colored columns on the chart:

  1. The first 1.5 seconds (darkish grey) were spent initializing the build command, which means it was just parsing options and setting up the cache.
  2. The next ~1 second (green) was the loading phase, where Bazel figures out which packages it will need, downloads external dependencies, and finds and parses BUILD files.
  3. The next ~100ms (sliver of light grey) was the analyze dependencies phase, where Bazel figured out which dependencies were cached and clean and so did not need to be rebuilt.
  4. Finally, Bazel moved into the build phase (pink background), actually building all of the things that needed to be built.

There are several other phases that you can see on the legend, but they are, for the most part, too short to even be visible on the chart.

Below the chart, there’s a section for “Execution phase.” “Execution” can be a little confusing in this context: it’s referring to executing the build, not running your program. The execution phase maps to the pink phase in the chart above. In this section is a sub-section called “Critical path,” which breaks down what your build was waiting on:

Critical path (13.339 s):
    Id        Time Percentage   Description
  6722     48.1 ms    0.36%   Zipaligning apk
  6721      344 ms    2.58%   Generating signed apk
  6720      540 ms    4.05%   Converting bazel-out/local_darwin-fastbuild/bin/android/android_deploy.jar to dex format
  6719      230 ms    1.73%   Building deploy jar android/android_deploy.jar
  6718     1.051 s    7.88%   Building android/libandroid.jar (0 files)
  6717     15.7 ms    0.12%   Extracting interface //android activities
  6716     1.744 s   13.07%   Building android/libactivities.jar (1 files)
  6715      785 ms    5.89%   Processing resources
  6712     3.737 s   28.01%   Building external/default_android_tools/src/tools/android/java/com/google/devtools/build/android/libandroid_builder_lib.jar (17 files) [for host]
  6711     4.843 s   36.30%   Writing file external/default_android_tools/src/tools/android/java/com/google/devtools/build/android/libandroid_builder_lib.jar-2.params [for host]
           1.73 ms    0.01%   [2 middleman actions]

As you can see, the build was “blocked” for more than 8 seconds building libandroid_builder_lib.jar and its params file. Luckily, these files shouldn’t change between builds (unless you update your Android SDK, which shouldn’t happen between every build). If I make a change to libactivities.jar (the actual meat of the program) and rebuild, I get:

$ bazel build --profile profile2 //android
INFO: Writing profile data to '/Users/kchodorow/gitroot/examples/tutorial/profile2'
INFO: Found 1 target...
INFO: From Generating unsigned apk:

THIS TOOL IS DEPRECATED. See --help for more information.

INFO: From Generating signed apk:

THIS TOOL IS DEPRECATED. See --help for more information.

Target //android:android up-to-date:
  bazel-bin/android/android_deploy.jar
  bazel-bin/android/android_unsigned.apk
  bazel-bin/android/android.apk
INFO: Elapsed time: 4.545s, Critical Path: 2.72s

Here is the new profile page for this incremental build.

Note that the critical path is only 2.718 seconds now, not 13.339! If we look at the profile, we can see that the new critical path is much more svelte:

Critical path (2.718 s):
    Id        Time Percentage   Description
   543     52.3 ms    1.92%   Zipaligning apk
   542      342 ms   12.59%   Generating signed apk
   541      577 ms   21.22%   Converting bazel-out/local_darwin-fastbuild/bin/android/android_deploy.jar to dex format
   540      245 ms    9.00%   Building deploy jar android/android_deploy.jar
   539     1.502 s   55.26%   Building android/libactivities.jar (1 files)

Now building libactivities.jar is the most heavyweight operation on the critical path, so we could tackle that by perhaps breaking it into separate libraries that don’t all have to be recompiled every time something changes.
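If you'd rather not eyeball these tables, here is a minimal Python sketch that ranks critical path entries by time. It assumes the text layout shown above (optional Id, a time in ms or s, a percentage, then a description); `parse_critical_path`, `LINE_RE`, and `SAMPLE` are names I made up for illustration, and nothing here is part of Bazel.

```python
import re

# Assumed layout: optional Id, time with unit, percentage, description.
LINE_RE = re.compile(r"^\s*(?:(\d+)\s+)?([\d.]+)\s*(ms|s)\s+([\d.]+)%\s+(.*)$")

SAMPLE = """Critical path (2.718 s):
    Id        Time Percentage   Description
   543     52.3 ms    1.92%   Zipaligning apk
   542      342 ms   12.59%   Generating signed apk
   541      577 ms   21.22%   Converting bazel-out/local_darwin-fastbuild/bin/android/android_deploy.jar to dex format
   540      245 ms    9.00%   Building deploy jar android/android_deploy.jar
   539     1.502 s   55.26%   Building android/libactivities.jar (1 files)
"""


def parse_critical_path(text):
    """Return (seconds, description) pairs, slowest first."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip the header lines
        value, unit = float(match.group(2)), match.group(3)
        seconds = value / 1000.0 if unit == "ms" else value
        entries.append((seconds, match.group(5).strip()))
    return sorted(entries, reverse=True)


if __name__ == "__main__":
    for seconds, description in parse_critical_path(SAMPLE):
        print("%8.3f s  %s" % (seconds, description))
```

The slowest step ends up at the top of the list, which is usually the first place to look when deciding what to split up.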

The profiles Bazel generates can be… dense… so feel free to ask on the mailing list if you need any help interpreting them. Also, if you’re interested in more on the subject, check out the documentation on profiling.

Debugging flaky tests with Bazel

Suppose you have a test that is passing… most of the time. When you start debugging it, you might try running the test and, unhelpfully, it passes:

$ bazel test :flaker
INFO: Found 1 test target...
Target //:flaker up-to-date:
  bazel-bin/flaker
INFO: Elapsed time: 0.223s, Critical Path: 0.04s
//:flaker                                                                PASSED

Executed 1 out of 1 tests: 1 test passes.

At this point, if you simply run bazel test :flaker again, Bazel knows that no files that affect the test have changed, so it won’t bother re-running it:

$ bazel test :flaker
INFO: Found 1 test target...
Target //:flaker up-to-date:
  bazel-bin/flaker
INFO: Elapsed time: 0.207s, Critical Path: 0.00s
//:flaker                                                   (1/0 cached) PASSED

Executed 0 out of 1 tests: 1 test passes.

Note the “cached” message: your test wasn’t even run! This is usually a good thing: you don’t want to waste processing power on retesting something that you already know passes. However, it isn’t very convenient if you know that your test is flaky. So what do you do now?

You can always run the test manually (./bazel-bin/flaker) over and over, or write a script that will run it, but neither is very convenient.

Enter Bazel’s --runs_per_test option.

Specifying this option runs your test multiple times, prints a summary of what happened, and (by default) only keeps the logs for the failing tests:

$ bazel test --runs_per_test=10 :flaker
INFO: Found 1 test target...
FAIL: //:flaker (run 10 of 10) (see /private/var/tmp/_bazel_kchodorow/16a1114002542b106523c47d490a1041/test/bazel-out/local_darwin-fastbuild/testlogs/flaker/test_run_10_of_10.log).
FAIL: //:flaker (run 4 of 10) (see /private/var/tmp/_bazel_kchodorow/16a1114002542b106523c47d490a1041/test/bazel-out/local_darwin-fastbuild/testlogs/flaker/test_run_4_of_10.log).
FAIL: //:flaker (run 5 of 10) (see /private/var/tmp/_bazel_kchodorow/16a1114002542b106523c47d490a1041/test/bazel-out/local_darwin-fastbuild/testlogs/flaker/test_run_5_of_10.log).
FAIL: //:flaker (run 9 of 10) (see /private/var/tmp/_bazel_kchodorow/16a1114002542b106523c47d490a1041/test/bazel-out/local_darwin-fastbuild/testlogs/flaker/test_run_9_of_10.log).
FAIL: //:flaker (run 3 of 10) (see /private/var/tmp/_bazel_kchodorow/16a1114002542b106523c47d490a1041/test/bazel-out/local_darwin-fastbuild/testlogs/flaker/test_run_3_of_10.log).
Target //:flaker up-to-date:
  bazel-bin/flaker
INFO: Elapsed time: 0.828s, Critical Path: 0.42s
//:flaker                                                                FAILED

Executed 1 out of 1 tests: 1 fails locally.

This ran the test 10 times, 5 runs of which failed. --runs_per_test lets you easily run a flaky test hundreds of times in a row (if necessary) to track down what’s going on. Bazel creates unique log files for each failing run automatically and discards the passing tests’ logs.
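If you end up running a test hundreds of times, counting FAIL lines by hand gets old. Here is a small Python sketch that tallies them, assuming the "FAIL: target (run N of M)" line format shown in the output above. The names `failed_runs` and `RUN_RE` are mine, not anything Bazel provides.

```python
import re

# Matches lines like: FAIL: //:flaker (run 4 of 10) (see ...).
RUN_RE = re.compile(r"^FAIL: (\S+) \(run (\d+) of (\d+)\)")


def failed_runs(output):
    """Return (sorted failing run numbers, total number of runs)."""
    failed = set()
    total = 0
    for line in output.splitlines():
        match = RUN_RE.match(line)
        if match:
            failed.add(int(match.group(2)))
            total = int(match.group(3))
    return sorted(failed), total


if __name__ == "__main__":
    sample = "\n".join([
        "INFO: Found 1 test target...",
        "FAIL: //:flaker (run 10 of 10) (see test_run_10_of_10.log).",
        "FAIL: //:flaker (run 4 of 10) (see test_run_4_of_10.log).",
    ])
    runs, total = failed_runs(sample)
    print("failed runs %s out of %d" % (runs, total))
```

You could feed it the captured output of a `bazel test --runs_per_test=...` invocation to get a rough failure rate.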

Getting more logs

By default, Bazel doesn’t keep logs around for passing tests (since you don’t usually care what happened when a test passed). However, during development, sometimes even passing logs can be handy. You can get logs for your passing tests using the --test_output=all flag:

$ bazel test --test_output=all :flaker
INFO: Found 1 test target...
INFO: From Testing //:flaker:
==================== Test output for //:flaker:
Should I pass or should I fail?
Thinking about it...



Okay, I'll pass.
================================================================================
Target //:flaker up-to-date:
  bazel-bin/flaker
INFO: Elapsed time: 0.873s, Critical Path: 0.71s
//:flaker                                                                PASSED

Executed 1 out of 1 tests: 1 test passes.

This way you can see stdout/stderr from tests that you’re still working on (or ones that are unexpectedly passing).

Turning off caching

Finally, as mentioned above, Bazel tries to cache test runs whenever possible. If you don’t want to use --runs_per_test but still want to rerun a test, you can specify --cache_test_results=no:

$ bazel test :flaker
INFO: Found 1 test target...
Target //:flaker up-to-date:
  bazel-bin/flaker
INFO: Elapsed time: 0.282s, Critical Path: 0.04s
//:flaker                                                                PASSED

Executed 1 out of 1 tests: 1 test passes.
$ bazel test --cache_test_results=no :flaker
INFO: Found 1 test target...
FAIL: //:flaker (see /private/var/tmp/_bazel_kchodorow/16a1114002542b106523c47d490a1041/test/bazel-out/local_darwin-fastbuild/testlogs/flaker/test.log).
Target //:flaker up-to-date:
  bazel-bin/flaker
INFO: Elapsed time: 0.178s, Critical Path: 0.03s
//:flaker                                                                FAILED

Executed 1 out of 1 tests: 1 fails locally.

It’s a bit redundant (you can get the same behavior from --runs_per_test), but sometimes I’m just in more of a --cache_test_results mood.
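To picture what the caching is doing, here is a toy Python sketch: a test result is remembered under a hash of the test's inputs, and a `use_cache=False` switch forces a rerun, loosely like --cache_test_results=no. This is a simplification I made up for illustration (including the choice to remember only passing results), not Bazel's actual implementation; `inputs_digest` and `run_test` are hypothetical names.

```python
import hashlib


def inputs_digest(inputs):
    """Hash file names and contents in a stable order."""
    digest = hashlib.sha256()
    for name in sorted(inputs):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(inputs[name])
        digest.update(b"\0")
    return digest.hexdigest()


def run_test(test_fn, inputs, cache, use_cache=True):
    """Return (passed, was_cached) for a test over the given inputs."""
    key = inputs_digest(inputs)
    if use_cache and key in cache:
        return cache[key], True
    passed = test_fn(inputs)
    if passed:
        # Assumption for this sketch: only passing results are remembered.
        cache[key] = passed
    return passed, False
```

Calling `run_test` twice with identical inputs runs the test function once; changing any input's bytes, or passing `use_cache=False`, runs it again.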

There are several other features Bazel has to simplify testing and debugging, but these are a few of the flags that I find most helpful.

Check out the Bazel user manual for more detailed documentation on these flags.

Happy testing!

From http://memegenerator.net/instance/52587305.

The Return of the Scala Rule Tutorial: The Execution

This builds on the first part of the tutorial. In this post, we will make the rule actually produce an executable.

Capturing the output from scalac

At the end of the tutorial last time, we were calling scalac, but ignoring the result:

(cd /private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg && 
  exec env - 
  /bin/bash -c 'external/scala/bin/scalac HelloWorld.scala; echo '''blah''' > bazel-out/local_darwin-fastbuild/bin/hello-world.sh')

If you look at the directory where the action is running (/private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg in my case) you can see that HelloWorld.class and HelloWorld$.class are created. This directory is called the execution root; it is where bazel executes build actions. Bazel uses separate directory trees for source code, executing build actions, and output files (bazel-out/). Files won’t get moved from the execution root to the output tree unless we tell Bazel we want them.

We want our compiled scala program to end up in bazel-out/, but there’s a small complication. With languages like Java (and Scala), a single source file might contain inner classes that cause multiple .class files to be generated by a single compile action. Bazel cannot know until it runs the action how many class files are going to be generated. However, Bazel requires that each action declare, in advance, what its outputs will be. The way to get around this is to package up the .class files and make the resulting archive the build output.
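The same trick can be shown outside of Bazel. Below is a minimal Python sketch that bundles however many .class files happen to exist into a single archive whose name is known ahead of time. It uses the standard zipfile module (a jar is a zip file underneath, but this sketch does not write a manifest, so it is not a full replacement for the jar tool); `package_classes` is a name I made up.

```python
import os
import zipfile


def package_classes(class_dir, archive_path):
    """Bundle every .class file under class_dir into one zip archive.

    Returns the sorted list of archive member names.
    """
    names = []
    with zipfile.ZipFile(archive_path, "w") as archive:
        for root, _dirs, files in os.walk(class_dir):
            for name in sorted(files):
                if name.endswith(".class"):
                    full_path = os.path.join(root, name)
                    member = os.path.relpath(full_path, class_dir)
                    archive.write(full_path, member)
                    names.append(member)
    return sorted(names)
```

The caller only has to declare one output (the archive path) no matter how many class files the compiler produced.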

In this example, we’ll add the .class files into a .jar. Let’s add that to the outputs, which should now look like this:

  outputs = {
    'jar': "%{name}.jar",
    'sh': "%{name}.sh",
  },

In the impl function, our command is getting a bit complicated so I’m going to change it to an array of commands and then join them on “\n” in the action:

def impl(ctx):
    cmd = [
        "%s %s" % (ctx.file._scalac.path, ctx.file.src.path),
        "find . -name '*.class' -print > classes.list",
        "jar cf %s @classes.list" % (ctx.outputs.jar.path),
    ]

    ctx.action(
        inputs = [ctx.file.src],
        command = "\n".join(cmd),
        outputs = [ctx.outputs.jar]
    )

This will compile the src, find all of the .class files, and add them to the output jar. If we run this, we get:

$ bazel build -s :hello-world
INFO: Found 1 target...
>>>>> # //:hello-world [action 'Unknown hello-world.jar']
(cd /private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg && 
  exec env - 
  /bin/bash -c 'external/scala/bin/scalac HelloWorld.scala
find . -name '''*.class''' -print > classes.list
jar cf bazel-out/local_darwin-fastbuild/bin/hello-world.jar @classes.list')
Target //:hello-world up-to-date:
  bazel-bin/hello-world.jar
INFO: Elapsed time: 4.774s, Critical Path: 4.06s

Let’s take a look at what hello-world.jar contains:

$ jar tf bazel-bin/hello-world.jar
META-INF/
META-INF/MANIFEST.MF
HelloWorld$.class
HelloWorld.class

Looks good! However, we cannot actually run this jar, because java doesn’t know what the main class should be:

$ java -jar bazel-bin/hello-world.jar 
no main manifest attribute, in bazel-bin/hello-world.jar

Similar to the java_binary rule, let’s add a main_class attribute to scala_binary and put it in the jar’s manifest. Add 'main_class' : attr.string(), to scala_binary’s attrs and change cmd to the following:

    cmd = [
        "%s %s" % (ctx.file._scalac.path, ctx.file.src.path),
        "echo Manifest-Version: 1.0 > MANIFEST.MF",
        "echo Main-Class: %s >> MANIFEST.MF" % ctx.attr.main_class,
        "find . -name '*.class' -print > classes.list",
        "jar cfm %s MANIFEST.MF @classes.list" % (ctx.outputs.jar.path),
    ]

Remember to update your actual BUILD file to add a main_class attribute:

# BUILD
load("/scala", "scala_binary")

scala_binary(
    name = "hello-world",
    src = "HelloWorld.scala",
    main_class = "HelloWorld",
)

Now building and running gives you:

$ bazel build :hello-world
INFO: Found 1 target...
Target //:hello-world up-to-date:
  bazel-bin/hello-world.jar
INFO: Elapsed time: 4.663s, Critical Path: 4.05s
$ java -jar bazel-bin/hello-world.jar 
Exception in thread "main" java.lang.NoClassDefFoundError: scala/Predef$
	at HelloWorld$.main(HelloWorld.scala:4)
	at HelloWorld.main(HelloWorld.scala)
Caused by: java.lang.ClassNotFoundException: scala.Predef$
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 2 more

Closer! Now it cannot find some scala libraries it needs. You can add the scala library jar manually on the command line to see that our jar actually does work once the library is on the classpath, too:

$ java -cp $(bazel info output_base)/external/scala/lib/scala-library.jar:bazel-bin/hello-world.jar HelloWorld
Hello, world!

So we need our rule to generate an executable that basically runs this command, which can be accomplished by adding another action to our build. First we’ll add a dependency on scala-library.jar by adding it as a hidden attribute:

        '_scala_lib': attr.label(
            default=Label("@scala//:lib/scala-library.jar"),
            allow_files=True,
            single_file=True),

Making scala_binarys executable

Let’s pause here for a moment and switch gears: we’re going to tell bazel that scala_binarys are binaries. To do this, we add executable = True to the rule definition (as an argument to rule(), alongside attrs) and get rid of the reference to hello-world.sh in the outputs:

...
    outputs = {
        'jar': "%{name}.jar",
    },
    implementation = impl,
    executable = True,
)

This says that scala_binary(name = "foo", ...) should have an action that creates a binary called foo, which can be referenced via ctx.outputs.executable in the implementation function. We can now use bazel run :hello-world (instead of bazel build :hello-world; ./bazel-bin/hello-world.sh).

The executable we want to create is the java command from above, so we add the second action to impl, this one a file action (since we’re just generating a file with certain content, not executing a series of commands to generate a .jar):

    cp = "%s:%s" % (ctx.outputs.jar.basename, ctx.file._scala_lib.path)
    content = [
        "#!/bin/bash",
        "echo Running from $PWD",
        "java -cp %s %s" % (cp, ctx.attr.main_class),
    ]
    ctx.file_action(
        content = "\n".join(content),
        output = ctx.outputs.executable,
    )

Note that I also added a line to the file to echo where it is being run from. If we now use bazel run, you’ll see:

$ bazel run :hello-world
INFO: Found 1 target...
Target //:hello-world up-to-date:
  bazel-bin/hello-world.jar
  bazel-bin/hello-world
INFO: Elapsed time: 2.694s, Critical Path: 0.08s

INFO: Running command line: bazel-bin/hello-world
Running from /private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg/bazel-out/local_darwin-fastbuild/bin/hello-world.runfiles
Error: Could not find or load main class HelloWorld
ERROR: Non-zero return code '1' from command: Process exited with status 1.

Whoops, it’s not able to find the jars! And what is that path, hello-world.runfiles, it’s running the binary from?

The runfiles directory

bazel run runs the binary from the runfiles directory, a directory that is different than the source root, execution root, and output tree mentioned above. The runfiles directory should contain all of the resources needed by the executable during execution. Note that this is not the execution root, which is used during the bazel build step. When you actually execute something created by bazel, its resources need to be in the runfiles directory.

In this case, our executable needs to access hello-world.jar and scala-library.jar. To add these files, the API is somewhat strange. You must return a struct containing a runfiles object from the rule implementation. Thus, add the following as the last line of your impl function:

return struct(runfiles = ctx.runfiles(files = [ctx.outputs.jar, ctx.file._scala_lib]))

Now if you run it again, it’ll print:

$ bazel run :hello-world
INFO: Found 1 target...
Target //:hello-world up-to-date:
  bazel-bin/hello-world.jar
  bazel-bin/hello-world
INFO: Elapsed time: 0.416s, Critical Path: 0.00s

INFO: Running command line: bazel-bin/hello-world
Running from /private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg/bazel-out/local_darwin-fastbuild/bin/hello-world.runfiles
Hello, world!

Hooray!

However! If we run it as bazel-bin/hello-world, it won’t be able to find the jars (because we’re not in the runfiles directory). To find the runfiles directory regardless of where the binary is run from, change your content variable to the following:

    content = [
        "#!/bin/bash",
        "case \"$0\" in",
        "/*) self=\"$0\" ;;",
        "*)  self=\"$PWD/$0\";;",
        "esac",
        "(cd $self.runfiles; java -cp %s %s)" % (cp, ctx.attr.main_class),
    ]

This way, if it’s run from bazel run, $0 will be the absolute path to the binary (in my case, /private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg/bazel-out/local_darwin-fastbuild/bin/hello-world). If it’s run via bazel-bin/hello-world, $0 will be just that: bazel-bin/hello-world. Either way, we’ll end up in the runfiles directory before executing the command.
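If the shell case statement is hard to read, the same decision can be written as a few lines of Python. This is only a sketch to illustrate the logic; `runfiles_dir` is a hypothetical helper, and it takes the script path and working directory as arguments (standing in for $0 and $PWD) so it is easy to try out.

```python
import posixpath


def runfiles_dir(argv0, cwd):
    """Mirror the launcher's case statement.

    An absolute path is used as-is; a relative one is resolved against
    the current directory. The runfiles directory sits next to the
    binary, with ".runfiles" appended to its name.
    """
    if posixpath.isabs(argv0):
        self_path = argv0
    else:
        self_path = posixpath.join(cwd, argv0)
    return self_path + ".runfiles"


if __name__ == "__main__":
    print(runfiles_dir("bazel-bin/hello-world", "/home/me/blerg"))
    print(runfiles_dir("/abs/bin/hello-world", "/anywhere/else"))
```

Both invocation styles land on a path ending in hello-world.runfiles, which is the directory the launcher changes into before running java.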

Now our rule is successfully generating a binary. You can see the full code for this example on GitHub.

In the final part of this tutorial, we’ll fix the remaining issues:

  • No support for multiple source files, never mind dependencies.
  • [action 'Unknown hello-world.jar'] is pretty ugly.

Until next time!

Tutorial: how to write Scala rules for Bazel

Bazel comes with built-in support for several languages and allows you to write your own support for any other languages in Skylark, its Python-like extension language.

Although you could probably get more abstract, let’s define a rule as something that takes some files, does something to them, and then gives you some output files. Specifically, for this example, we want a scala_binary rule where we can give it a Scala source file and it turns it into an executable binary.

Part 1: Creating a Scala source file

Let’s create a simple example of a Scala source file (mercilessly ripped from the Scala hello world example):

// HelloWorld.scala
object HelloWorld {
  def main(args: Array[String]) {
    println("Hello, world!")
  }
}

Note that I’ve never used Scala before today, so please let me know in the comments if I’ve made any mistakes.

Before proceeding, I think it’s a good idea to try building this without bazel (especially if you’re not too familiar with the language’s build tool… ahem) as a sanity check:

$ scalac HelloWorld.scala
$ scala HelloWorld
Hello, world!

Looking good! Now, let’s try to get bazel building that.

Adding a BUILD file and dummy scala_binary rule

We’ll create a BUILD file that references our (currently non-existent) scala_binary rule. This lets us plan out what we’ll need our rule to look like:

# BUILD
load('/scala', 'scala_binary')

scala_binary(
    name = "hello-world",
    src = "HelloWorld.scala",
)

The load() statement means that we’ll declare the scala_binary rule in a file called scala.bzl in the root of the workspace (due to the ‘/’ prefix on ‘/scala’). Let’s create that file now:

# scala.bzl
def impl(ctx):
    pass

scala_binary = rule(
    attrs = {
        'src': attr.label(
            allow_files=True,
            single_file=True),
    },
    outputs = {'sh': "%{name}.sh"},
    implementation = impl,
)

scala_binary’s definition says that the rule has one attribute (other than name), src, which is a single file. The rule is supposed to output a file called name.sh, so for our example we should end up with hello-world.sh. The rule’s implementation lives in the function impl. It doesn’t do anything yet, but we can at least try building now:

$ touch WORKSPACE # if you haven't already...
$ bazel build :hello-world
ERROR: /Users/kchodorow/blerg/BUILD:4:1: in scala_binary rule //:hello-world: 
: The following files have no generating action:
hello-world.sh
.
ERROR: Analysis of target '//:hello-world' failed; build aborted.
INFO: Elapsed time: 0.286s

The error is expected: our rule definition says that hello-world.sh should be an output, but there’s no code creating it yet. Let’s add some functionality to the implementation function by replacing the existing function with the following:

def impl(ctx):
    ctx.action(
        inputs = [ctx.file.src],
        command = "echo %s > %s" % (ctx.file.src.path, ctx.outputs.sh.path),
        outputs = [ctx.outputs.sh]
    )

This adds an action to the build. It says that, if the inputs (the src file) have changed, Bazel should run the command (which right now just echoes src’s path) to produce the output file. Note that ctx.action(...) doesn’t actually run the action; it just adds that action to the “things that need to be run in the future” for the rule.
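Here is a toy Python model of that "register now, run later" idea, along with the check behind the earlier "no generating action" error. The `Ctx` class below is a hypothetical stand-in I wrote for illustration; it is not Bazel's real rule context or API.

```python
class Ctx:
    """Hypothetical stand-in for a rule context; not Bazel's real API."""

    def __init__(self, declared_outputs):
        self.declared_outputs = list(declared_outputs)
        self.actions = []

    def action(self, inputs, outputs, command):
        # Only record the action; nothing is executed here.
        self.actions.append(
            {"inputs": inputs, "outputs": outputs, "command": command})

    def outputs_without_action(self):
        """Declared outputs that no registered action claims to produce."""
        generated = set()
        for registered in self.actions:
            generated.update(registered["outputs"])
        return [out for out in self.declared_outputs if out not in generated]


if __name__ == "__main__":
    ctx = Ctx(["hello-world.sh"])
    print(ctx.outputs_without_action())
    ctx.action(
        inputs=["HelloWorld.scala"],
        outputs=["hello-world.sh"],
        command="echo HelloWorld.scala > hello-world.sh",
    )
    print(ctx.outputs_without_action())
```

Before the action is registered, hello-world.sh shows up as an output nobody generates; afterwards the list is empty, even though no command has actually been run.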

Now if we build :hello-world again, we get:

$ bazel build -s :hello-world
INFO: Found 1 target...
>>>>> # //:hello-world [action 'Unknown hello-world.sh']
(cd /private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg && 
  exec env - 
  /bin/bash -c 'echo HelloWorld.scala > bazel-out/local_darwin-fastbuild/bin/hello-world.sh')
Target //:hello-world up-to-date:
  bazel-bin/hello-world.sh
INFO: Elapsed time: 0.605s, Critical Path: 0.02s

I used bazel’s -s option here, which is very helpful for debugging what your rule is doing. It prints all of the subcommands a build is running. As you can see, now our rule has an action (>>>>> # //:hello-world [action 'Unknown hello-world.sh']) that creates bazel-bin/hello-world.sh by echoing the source file name. You can verify this by cating bazel-bin/hello-world.sh.

Adding a dependency on the scala compiler

We want our rule to actually call scalac. Even if you have the scala compiler installed on your system, you cannot simply create an action with a command = 'scalac MySourceFile.scala' line, as actions are run in a “clean room” environment: nothing* is there that you don’t specify.

As you probably don’t want to add the scala compiler to your workspace, open up your WORKSPACE file and add it as an external dependency:

# WORKSPACE
new_http_archive(
    name = "scala",
    url = "http://downloads.typesafe.com/scala/2.11.7/scala-2.11.7.tgz",
    sha256 = "ffe4196f13ee98a66cf54baffb0940d29432b2bd820bd0781a8316eec22926d0",
    build_file = "scala.BUILD",
)

Also create the scala.BUILD file in the root of your workspace:

# scala.BUILD
exports_files([
    "bin/scala",
    "bin/scalac",
    "lib/scala-library.jar"
])

Now add a dependency on scalac to your scala_binary rule by adding a “hidden attribute.” Then call scalac in your impl, so your scala.bzl file looks something like this:

def impl(ctx):
    ctx.action(
        inputs = [ctx.file.src],
        command = "%s %s; echo 'blah' > %s" % (
            ctx.file._scalac.path, ctx.file.src.path, ctx.outputs.sh.path),
        outputs = [ctx.outputs.sh]
    )

scala_binary = rule(
    attrs = {
        'src': attr.label(
            allow_files=True,
            single_file=True),
        '_scalac': attr.label(
            default=Label("@scala//:bin/scalac"),
            executable=True,
            allow_files=True,
            single_file=True),
    },
    outputs = {'sh': "%{name}.sh"},
    implementation = impl,
)

Building now shows that scalac is successfully being run on our source file!

$ bazel build -s :hello-world
INFO: Found 1 target...
>>>>> # //:hello-world [action 'Unknown hello-world.sh']
(cd /private/var/tmp/_bazel_kchodorow/92df5f72e3c78c053575a1a42537d8c3/blerg && 
  exec env - 
  /bin/bash -c 'external/scala/bin/scalac HelloWorld.scala; echo '''blah''' > bazel-out/local_darwin-fastbuild/bin/hello-world.sh')
Target //:hello-world up-to-date:
  bazel-bin/hello-world.sh
INFO: Elapsed time: 4.634s, Critical Path: 4.11s

There are still many issues with this implementation:

  • The output from calling scalac doesn’t actually go anywhere, hello-world.sh is still a dummy file.
  • No support for multiple source files, never mind dependencies.
  • [action 'Unknown hello-world.sh'] is pretty ugly.
  • You can’t call bazel run //:hello-world, even though the output should be executable.

However, this post is already running long, so let’s wrap it up here and get to some of these issues in the next post. Until next time!

References

* Obviously there are some commands there (our original rule uses echo, for instance) and you can see what’s in the empty environment by writing env to an output file in an action. This can actually cause issues: sometimes commands in the default PATH have different behavior on different systems. To get a completely hermetic build, you should really provide every command your rule uses. However, we’re just using echo for debugging here anyway, so we’ll let it slide.

Trimming the (build) tree with Bazel

Jonathan Lange wrote a great blog post about how Bazel caches tests. Basically: if you run a test, change your code, then run a test again, the test will only be rerun if you changed something that could actually change the outcome of the test. Bazel takes this concept pretty far to minimize the work your build needs to do, in some ways that aren’t immediately obvious.

Let’s take an example. Say you’re using Bazel to “build” rigatoni arrabiata, which could be represented as having the following dependencies:

recipe

Each food is a library which depends on the libraries below it. Suppose you change a dependency, like the garlic:

change-garlic

Bazel will stat the files of the “garlic” library and notice this change, and then make a note that the things that depend on “garlic” may have also changed:

dirty

The fancy term for this is “invalidating the upward transitive closure” of the build graph, aka “everything that depends on a thing might be dirty.” Note that Bazel already knows that this change doesn’t affect several of the libraries (rigatoni, tomato-puree, and red-pepper), so they definitely don’t have to be rebuilt.

Bazel will then evaluate the “sauce” node and figure out if its output has changed. This is where the secret sauce (ha!) happens: if the output of the “sauce” node hasn’t changed, Bazel knows that it doesn’t have to recompile rigatoni-arrabiata (the top node), because none of its direct dependencies changed!

The sauce node is no longer "maybe dirty" and so its reverse dependencies (rigatoni-arrabiata) can also be marked as clean.

In general, of course, changing the code for a library will change its compiled form, so the “maybe dirty” node will end up being marked as “yes, dirty” and re-evaluated (and so on up the tree). However, Bazel’s build graph lets you compile the bare minimum for a well-structured library, and in some cases avoid compilations altogether.
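The two steps (mark everything upstream as "maybe dirty," then stop as soon as a re-evaluated node's output turns out to be unchanged) can be sketched in a few lines of Python. The graph below is my guess at the recipe from the text, and `maybe_dirty`, `rebuilt_nodes`, and the `output_changed` callback are hypothetical names; Bazel's real implementation is considerably more involved.

```python
from collections import defaultdict

# Assumed dependency graph: each key depends on the listed libraries.
DEPS = {
    "rigatoni-arrabiata": ["rigatoni", "sauce"],
    "sauce": ["tomato-puree", "red-pepper", "garlic"],
    "rigatoni": [],
    "tomato-puree": [],
    "red-pepper": [],
    "garlic": [],
}


def reverse_deps(deps):
    """Map each library to the set of libraries that depend on it."""
    rdeps = defaultdict(set)
    for node, children in deps.items():
        for child in children:
            rdeps[child].add(node)
    return rdeps


def maybe_dirty(deps, changed):
    """Upward transitive closure: everything that might be affected."""
    rdeps = reverse_deps(deps)
    dirty, stack = set(), [changed]
    while stack:
        node = stack.pop()
        for parent in rdeps[node]:
            if parent not in dirty:
                dirty.add(parent)
                stack.append(parent)
    return dirty


def rebuilt_nodes(deps, changed, output_changed):
    """Re-evaluate upward, stopping where a node's output is unchanged."""
    rdeps = reverse_deps(deps)
    rebuilt, frontier = [], [changed]
    while frontier:
        node = frontier.pop()
        for parent in sorted(rdeps[node]):
            if parent in rebuilt:
                continue
            rebuilt.append(parent)
            if output_changed(parent):
                frontier.append(parent)  # keep climbing only if needed
    return rebuilt
```

With a change to garlic, `maybe_dirty` reports both sauce and rigatoni-arrabiata, but `rebuilt_nodes` only touches sauce when `output_changed` says the sauce came out the same.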

Positive reinforcement learning through barbacoa

Domino hanging out at the beach over July 4th weekend.

Yesterday I had some extra barbacoa that Domino was super excited about and Andrew suggested I use it to teach him (Domino, not Andrew) how to lie down on command. I waited until he lay down on his own, said “yes!” and gave him a piece of barbacoa. He leapt up and ate the barbacoa and then stood there, waiting for more. After a minute or so, he gave up and lay down again. I said “yes!” and held out another piece. This repeated ~10 times, at which point I was starting to think that he would never figure it out, when something changed. Domino seemed to realize that there was something he was doing that was turning me into a barbacoa vending machine; he just had to figure out what. He tried grabbing his bone (which he had happened to be playing with one time while lying down), sitting, moving away, and finally got it after another few repetitions. Then he lay down like a champ for the last ~5 pieces of barbacoa.

About an hour later, Andrew made a late-night snack and Domino came over, made meaningful eye contact, and then flopped down like a rug. It was adorable, and amazing to see the actual moment learning happened.

Now I just need to attach a cue to it!

Pain-free OAuth with AppEngine


I do a lot of side projects (or at least start them) and implementing authentication is always like chewing on razor blades. I started another project recently using AppEngine and, bracing myself with a lot of caffeine and a “suck it up, princess” attitude, I started doing “oauth appengine” searches.

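After digging through some documentation, I realized that AppEngine actually does OAuth right. All you have to do is add the following to your src/main/webapp/WEB-INF/web.xml file:

    <security-constraint>
        <web-resource-collection>
           <web-resource-name>my-thing</web-resource-name>
           <url-pattern>/members-only/*</url-pattern>
        </web-resource-collection>
        <auth-constraint>
           <role-name>*</role-name>
        </auth-constraint>
    </security-constraint>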

The <url-pattern>/members-only/*</url-pattern> means that when someone goes to any page under members-only/, they’ll have to go through the “login with Google” flow.

On the server side, there are no annoying URL encodings to get right or tokens to keep track of. You can just access the logged-in person’s username, e.g., in Java:

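import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class DemoServlet extends HttpServlet {
    @Override
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // For URLs matched by the security constraint, the login flow has
        // already run, so the principal is the signed-in user.
        String account = req.getUserPrincipal().getName();

        ...
    }
}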

That’s it! This is such a killer feature to me; I can’t believe I never knew about it before.
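One thing to watch for: getUserPrincipal() returns null when the request isn’t authenticated, which can happen on any URL outside the protected pattern. Here is a minimal sketch of a null-safe lookup. The Accounts class and the accountName helper are made-up names for illustration, and the helper takes a plain java.security.Principal so that it runs without a servlet container.

```java
import java.security.Principal;
import java.util.Optional;

// Hypothetical helper, not part of AppEngine or the servlet API.
final class Accounts {
    private Accounts() {}

    // Returns the account name, or an empty Optional if nobody is signed in
    // (that is, if the principal passed in is null).
    static Optional<String> accountName(Principal principal) {
        return Optional.ofNullable(principal).map(Principal::getName);
    }

    public static void main(String[] args) {
        // Principal has a single abstract method, so a lambda can stand in
        // for a signed-in user here.
        Principal signedIn = () -> "alice@example.com";
        System.out.println(accountName(signedIn).orElse("(anonymous)"));
        System.out.println(accountName(null).orElse("(anonymous)"));
    }
}
```

In a servlet, you would pass req.getUserPrincipal() straight in: Accounts.accountName(req.getUserPrincipal()).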

(One other thing that I can’t believe I never knew about is SimpleHTTPServer. python -m SimpleHTTPServer will serve static files from the current directory; on Python 3, the equivalent is python3 -m http.server. I’m pretty sure everyone already knows about this, but just in case there’s someone else out there…)

API changes with extra cheese, hold the fear

Rihanna's dress

When you make a change, how do you know what tests to run? If you’re lucky, no one else depends on your code so you can just run your own tests and you’re done. However, if you’re writing useful code, other people are probably going to start depending on it. Once that happens, it becomes difficult to make changes without breaking their code. Bazel can make this easier by letting you figure out all of the targets that are depending on your code.

Suppose we are working on the pizza library and we need some cheese, so we create a cheese library and depend on it from pizza. If we look at our build graph, it will look something like this:

graph

//italian:pizza is depending on //ingredients:cheese, as expected.

A few weeks later, the macaroni team discovers that it could also use cheese, so it starts depending on our library. Now our build graph looks like this:

graph

Both our team’s pizza target and the macaroni team’s mac_lib target are depending on //ingredients:cheese. However, Team Macaroni never told us that they’re depending on cheese, so as far as we know, we’re still its only users. Suppose we decide to make a backwards-incompatible change (e.g., make Cheese::setMilkfat() private). We make our change, run all of the pizza- and cheese-related tests, submit it… and break //american:mac_and_cheese as well as a dozen other projects that were calling setMilkfat() (and that we didn’t know about).

If we had known that other people were depending on our code, we could have let them know that they needed to update their API usage. But how could we find out? With Bazel, we can query for everyone depending on our library:

$ bazel query 'rdeps(//..., //ingredients:cheese)'

This means: “query for every target in our workspace that depends on //ingredients:cheese.”

Now we can check that everything in our code base still builds with our cheese changes by running:

$ bazel build $(bazel query 'rdeps(//..., //ingredients:cheese)')

Just because they built doesn’t mean they work correctly! We can then find all of the tests that depend on cheese and run them:

$ bazel test $(bazel query 'kind(test, rdeps(//..., //ingredients:cheese))')

Unpacking that from the innermost parentheses, that means: “find the targets depending on //ingredients:cheese (rdeps(...)), search those for targets that are tests (kind(test, ...)), and run all of those targets (bazel test ...).”
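If it helps to see what those two query functions are doing, here is a rough Java sketch of the idea: walk the dependency edges backwards from a target, then filter the result by rule kind. This is not Bazel’s implementation. The QuerySketch class, the pizza_test target, the package I put mac_lib in, and the rule kinds are all assumptions made up for the example, and the kind filter assumes an unanchored regex match against a string like “cc_test rule”.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

final class QuerySketch {
    private QuerySketch() {}

    // Hypothetical build graph: each target maps to its direct dependencies.
    static final Map<String, List<String>> DEPS = Map.of(
        "//italian:pizza", List.of("//ingredients:cheese"),
        "//italian:pizza_test", List.of("//italian:pizza"),
        "//american:mac_lib", List.of("//ingredients:cheese"),
        "//american:mac_and_cheese", List.of("//american:mac_lib"),
        "//ingredients:cheese", List.<String>of());

    // Hypothetical rule kinds for the same targets.
    static final Map<String, String> KINDS = Map.of(
        "//italian:pizza", "cc_library rule",
        "//italian:pizza_test", "cc_test rule",
        "//american:mac_lib", "cc_library rule",
        "//american:mac_and_cheese", "cc_binary rule",
        "//ingredients:cheese", "cc_library rule");

    // Returns the target itself plus everything that depends on it,
    // directly or transitively, using a breadth-first walk.
    static Set<String> rdeps(Map<String, List<String>> deps, String target) {
        Set<String> found = new TreeSet<>();
        Deque<String> queue = new ArrayDeque<>();
        found.add(target);
        queue.add(target);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (Map.Entry<String, List<String>> entry : deps.entrySet()) {
                // found.add() returns false for targets we have already seen.
                if (entry.getValue().contains(current) && found.add(entry.getKey())) {
                    queue.add(entry.getKey());
                }
            }
        }
        return found;
    }

    // Keeps only the targets whose kind string contains a match for the pattern.
    static Set<String> kind(String pattern, Map<String, String> kinds, Set<String> targets) {
        Pattern regex = Pattern.compile(pattern);
        Set<String> matching = new TreeSet<>();
        for (String target : targets) {
            String kindOfTarget = kinds.get(target);
            if (kindOfTarget != null && regex.matcher(kindOfTarget).find()) {
                matching.add(target);
            }
        }
        return matching;
    }

    public static void main(String[] args) {
        Set<String> dependents = rdeps(DEPS, "//ingredients:cheese");
        System.out.println(dependents);
        System.out.println(kind("test", KINDS, dependents));
    }
}
```

The sketch rescans every edge for each target it visits, which is fine for five targets and would be a poor choice for a real workspace.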

Running that set of builds and tests is a pretty good check that everything that depends on cheese still works. I mean, if they didn’t write a test for it, it can’t matter too much, right?

macandcheese1

Right.

Have you ever looked at your build? I mean, really looked at your build?

Bazel has a feature that lets you see a graph of your build dependencies. It could help you debug things, but honestly it’s just really cool to see what your build is doing.

To try it out, you’ll need a project that uses Bazel to build. If you don’t have one handy, here’s a tiny workspace you can use:

$ git clone https://github.com/kchodorow/tiny-workspace.git
$ cd tiny-workspace

Now run bazel query in your tiny-workspace/ directory, asking it to search for all dependencies of //:main and format the output as a graph:

$ bazel query 'deps(//:main)' --output graph > graph.in

This creates a file called graph.in, which is a text representation of the build graph. You can use dot (install with sudo apt-get install graphviz) to create a png from this:

$ dot -Tpng < graph.in > graph.png

If you open up graph.png, you should see something like this:

graph

You can see //:main depends on one file (//:main.cc) and four targets (//:x, //tools/cpp:stl, //tools/default:crosstool, and //tools/cpp:malloc). All of the //tools targets are implicit dependencies of any C++ target: every C++ build you do needs the right compiler, flags, and libraries available, but they crowd your result graph. You can exclude these implicit dependencies by removing them from your query results:

$ bazel query --noimplicit_deps 'deps(//:main)' --output graph > simplified_graph.in

Now the resulting graph is just:

graph

Much neater!
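The text format that dot reads is simple enough to write by hand. As a rough illustration, here is a small Java sketch that turns the direct dependencies of //:main listed above into a DOT digraph, with an optional set of targets to leave out. It is not what Bazel emits (the real output has more in it), the DotSketch class and its names are invented for this example, and hiding a hand-picked set of labels is only a stand-in for what --noimplicit_deps does.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DotSketch {
    private DotSketch() {}

    // The direct dependencies of //:main described in the text.
    static final Map<String, List<String>> MAIN_DEPS = Map.of(
        "//:main", List.of("//:main.cc", "//:x", "//tools/cpp:stl",
                           "//tools/default:crosstool", "//tools/cpp:malloc"));

    // The targets the text calls implicit dependencies.
    static final Set<String> IMPLICIT = Set.of(
        "//tools/cpp:stl", "//tools/default:crosstool", "//tools/cpp:malloc");

    // Emits one DOT edge per dependency, skipping any edge that touches
    // a label in the hidden set.
    static String toDot(Map<String, List<String>> deps, Set<String> hidden) {
        StringBuilder out = new StringBuilder("digraph build {\n");
        for (Map.Entry<String, List<String>> entry : deps.entrySet()) {
            if (hidden.contains(entry.getKey())) {
                continue;
            }
            for (String dep : entry.getValue()) {
                if (hidden.contains(dep)) {
                    continue;
                }
                // Labels contain ':' and '/', so they need to be quoted.
                out.append("  \"").append(entry.getKey())
                   .append("\" -> \"").append(dep).append("\";\n");
            }
        }
        return out.append("}\n").toString();
    }

    public static void main(String[] args) {
        System.out.println(toDot(MAIN_DEPS, Set.<String>of()));
        System.out.println(toDot(MAIN_DEPS, IMPLICIT));
    }
}
```

Either digraph, saved to a file, can be rendered with dot the same way as graph.in.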

If you’re interested in further refining your query, check out the docs on querying.