Build, y u go slow?

yuno

When a build is taking too long, it can be very helpful to know what it’s doing. Bazel has built-in tooling that lets you visualize what each thread is doing at any given moment of a build and which build steps are slowing down your overall build.

To try out Bazel’s profiling tools, build your favorite (or, rather, least-favorite) target with the --profile option:

$ bazel build --profile=myprofile.out //snail:slow-lib

This will write the profile to a file called myprofile.out in the current directory. Once your build finishes, you can take a look at this file, but it’s not really designed to be read by humans. Instead, plug it into Bazel’s analyze-profile command:

$ bazel analyze-profile --html myprofile.out

Now you wait for Bazel to analyze the profile info and enjoy this picture of a snail I saw walking Domino the other day (penny included for scale):

Make your project build faster than this.
Make your project build faster than this.

(After taking this photo, I moved the snail onto the grass, since I’m pretty sure it was not delighted to be in the middle of a NYC sidewalk.)

Ding, analysis is probably done. Now you can open up myprofile.out.html and see your build, broken down into hundreds or thousands of individual steps. A screenshot of the output:

Screen Shot 2015-09-18 at 2.51.55 PM

I uploaded the HTML page so you can see the whole thing here and play with it (it will open in a new tab).

I used the //android target from the example app for the profile above, since it’s a little more meaty than a toy example.

The chart shows what all 200 build threads were doing during the build at any given time (one thread per row). The build is divided into several “phases” which are shown as different colored columns on the chart:

  1. The first 1.5 seconds (darkish grey) were spent initializing the build command, which means it was just parsing options and setting up the cache.
  2. The next ~1 second (green) was the loading phase, where Bazel figures out which packages it will need, downloads external dependencies, and finds and parses BUILD files.
  3. The next ~100ms (sliver of light grey) was the analyze dependencies phase, where Bazel figured out which dependencies were cached and clean and so did not need to be rebuilt.
  4. Finally, Bazel moved into the build phase (pink background), actually building all of the things that needed to be built.

There are several other phases that you can see on the legend, but they are, for the most part, too short to even be visible on the chart.

Below the chart, there’s a section for “Execution phase.” “Execution” can be a little confusing in this context: it’s referring to executing the build, not running your program. The execution phase maps to the pink phase in the chart above. In this section is a sub-section called “Critical path, ” which breaks down what your build was waiting on:

Critical path (13.339 s):
    Id        Time Percentage   Description
  6722     48.1 ms    0.36%   Zipaligning apk
  6721      344 ms    2.58%   Generating signed apk
  6720      540 ms    4.05%   Converting bazel-out/local_darwin-fastbuild/bin/android/android_deploy.jar to dex format
  6719      230 ms    1.73%   Building deploy jar android/android_deploy.jar
  6718     1.051 s    7.88%   Building android/libandroid.jar (0 files)
  6717     15.7 ms    0.12%   Extracting interface //android activities
  6716     1.744 s   13.07%   Building android/libactivities.jar (1 files)
  6715      785 ms    5.89%   Processing resources
  6712     3.737 s   28.01%   Building external/default_android_tools/src/tools/android/java/com/google/devtools/build/android/libandroid_builder_lib.jar (17 files) [for host]
  6711     4.843 s   36.30%   Writing file external/default_android_tools/src/tools/android/java/com/google/devtools/build/android/libandroid_builder_lib.jar-2.params [for host]
           1.73 ms    0.01%   [2 middleman actions]

As you can see, the build was “blocked” for more than 8 seconds building libandroid_builder_lib.jar and its params file. Luckily, these files shouldn’t change between builds (unless you update your Android SDK, which shouldn’t happen between every build). If I make a change to libactivities.jar (the actual meat of the program) and rebuild, I get:

$ bazel build --profile profile2 //android
INFO: Writing profile data to '/Users/kchodorow/gitroot/examples/tutorial/profile2'
INFO: Found 1 target...
INFO: From Generating unsigned apk:

THIS TOOL IS DEPRECATED. See --help for more information.

INFO: From Generating signed apk:

THIS TOOL IS DEPRECATED. See --help for more information.

Target //android:android up-to-date:
  bazel-bin/android/android_deploy.jar
  bazel-bin/android/android_unsigned.apk
  bazel-bin/android/android.apk
INFO: Elapsed time: 4.545s, Critical Path: 2.72s

Here is the new profile page for this incremental build.

Note that the critical path is only 2.718 seconds now, not 13.339! If we look at the profile, we can see that the new critical path is much more svelte:

Critical path (2.718 s):
    Id        Time Percentage   Description
   543     52.3 ms    1.92%   Zipaligning apk
   542      342 ms   12.59%   Generating signed apk
   541      577 ms   21.22%   Converting bazel-out/local_darwin-fastbuild/bin/android/android_deploy.jar to dex format
   540      245 ms    9.00%   Building deploy jar android/android_deploy.jar
   539     1.502 s   55.26%   Building android/libactivities.jar (1 files)

Now building libactivities.jar is the most heavyweight operation on the critical path, so we could tackle that by perhaps breaking it into separate libraries that don’t all have to be recompiled every time something changes.

The profiles Bazel generates can be… dense… so feel free to ask on the mailing list if you need any help interpreting them. Also, if you’re interested in more on the subject, check out the documentation on profiling.

A Quick Intro to mongosniff

Writing an application on top of a framework on top of a driver on top of the database is a bit like playing telephone: you say “insert foo” and the database says “purple monkey dishwasher.” mongosniff lets you see exactly what the database is hearing and saying.

It comes with the binary distribution, so if you have mongod you should have mongosniff.

To try it out, first start up an instance of mongod normally:

$ ./mongod

When you start up mongosniff you have to tell it to listen on the loopback (localhost) interface. This interface is usually called “lo”, but my Mac calls it “lo0”, so run ifconfig to make sure you have the name right. Now run:

$ sudo ./mongosniff --source NET lo
sniffing... 27017 

Note the “sudo”: this has never worked for me from my user account, probably because of some stupid network permissions thing.

Now start up the mongo shell and try inserting something:

> db.foo.insert({x:1})

If you look at mongosniff‘s output, you’ll see:

127.0.0.1:57856  -->> 127.0.0.1:27017 test.foo  62 bytes  id:430131ca   1124151754
        insert: { _id: ObjectId('4c7fb007b5d697849addc650'), x: 1.0 }
127.0.0.1:57856  -->> 127.0.0.1:27017 test.$cmd  76 bytes  id:430131cb  1124151755
        query: { getlasterror: 1.0, w: 1.0 }  ntoreturn: -1 ntoskip: 0
127.0.0.1:27017  <<--  127.0.0.1:57856   76 bytes  id:474447bf  1195657151 - 1124151755
        reply n:1 cursorId: 0
        { err: null, n: 0, wtime: 0, ok: 1.0 }

There are three requests here, all for one measly insert. Dissecting the first request, we can learn:

source -->> destination
Our client, mongo in this case, is running on port 57856 and has sent a message to the database (127.0.0.1:27017).

db.collection
This request is for the “test” database’s “foo” collection.
length bytes
The length of the request is 62 bytes. This can be handy to know if your requests are edging towards the maximum request length (16 MB).
id:hex-id id
This is the request id in hexadecimal and decimal (in case you don’t have a computer handy, apparently). Every request to and from the database has a unique id associated with it for tax purposes.
op: content
This is the actual meat of the request: we’re inserting this document. Notice that it’s inserting the float value 1.0, even though we typed 1 in the shell. This is because JavaScript only has one number type, so every number typed in the shell is converted to a double.

The next request in the mongosniff output is a database command: it checks to make sure the insert succeeded (the shell always does safe inserts).

The last message sniffed is a little different: it is going from the database to the shell. It is the database response to the getlasterror command. It shows that there was only one document returned (reply n:1) and that there are no more results waiting at the database (cursorId: 0). If this were a “real” query and there was another batch of results to be sent from the database, cursorId would be non-zero.

Hopefully this will help some people decipher what the hell is going on!