––thursday #7: git-new-workdir

Often I’ll fix a bug (call it “bug A”), kick off some tests, and then get stuck. I’d like to start working on bug B, but I can’t because the tests are running and I don’t want to change the repo while they’re going. Luckily, there’s a git tool for that: git-new-workdir. It basically creates a copy of your repo somewhere else on the filesystem, with all of your local branches and commits.

git-new-workdir doesn’t actually come with git-core, but you should have a copy of the git source anyway, right?

$ git clone https://github.com/git/git.git

Copy the git-new-workdir script from contrib/workdir to somewhere on your $PATH. (There are some other gems in the contrib directory, so poke around.)

Now go back to your repository and do:

$ git-new-workdir ./ ../bug-b

This creates a directory one level up called bug-b, with a copy of your repo.

Thanks to Andrew for telling me about this.

Edited to add: Justin rightly asks, what’s the difference between this and a local clone? The difference is that git-new-workdir symlinks everything in your .git directory, so commits you make in bug-b appear in your original repository.
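To see the sharing in action, here’s a rough sketch of the mechanism on a throwaway repo: symlink the important pieces of .git (config, refs, logs, objects) into the new directory and copy HEAD. This is a simplification of what the real script does (it links a few more things and handles errors more carefully), but it shows why commits propagate:

```shell
set -e
orig=$(mktemp -d) && new=$(mktemp -d)
git -C "$orig" init -q
git -C "$orig" -c user.name=me -c user.email=me@example.com \
    commit -q --allow-empty -m "first commit"

# roughly what git-new-workdir does: share repo state via symlinks
mkdir "$new/.git"
for item in config refs logs objects packed-refs; do
    ln -s "$orig/.git/$item" "$new/.git/$item"
done
cp "$orig/.git/HEAD" "$new/.git/HEAD"   # HEAD is per-workdir

# a commit made in the new workdir...
git -C "$new" -c user.name=me -c user.email=me@example.com \
    commit -q --allow-empty -m "made in workdir"

# ...shows up in the original repository immediately
git -C "$orig" log --oneline
```

One caveat that follows from the shared refs: avoid checking out the same branch in two workdirs at once, since committing in one leaves the other’s working tree out of sync with the branch.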

––thursday #5: diagnosing high readahead

Having readahead set too high can slow your database to a crawl. This post discusses why that is and how you can diagnose it.

The #1 sign that readahead is too high is that MongoDB isn’t using as much RAM as it should be. If you’re running Mongo Monitoring Service (MMS), take a look at the “resident” size on the “memory” chart. Resident memory can be thought of as “the amount of space MongoDB ‘owns’ in RAM.” Therefore, if MongoDB is the only thing running on a machine, we want resident size to be as high as possible. On the chart below, resident is ~3GB:

Is 3GB good or bad? Well, it depends on the machine. If the machine only has 3.5GB of RAM, I’d be pretty happy with 3GB resident. However, if the machine has, say, 15GB of RAM, then we’d like at least 15GB of the data to be in there (the “mapped” field is (sort of) data size, so I’m assuming we have 60GB of data).

Assuming we’re accessing a lot of this data, we’d expect MongoDB’s resident set size to be 15GB, but it’s only 3GB. If we turn down readahead, the resident size jumps to 15GB and our app starts going faster. But why is this?

Let’s take an example: suppose all of our docs are 512 bytes in size (readahead is set in 512-byte increments, called sectors, so 1 doc = 1 sector makes the math easier). If we have 60GB of data then we have ~120 million documents (60GB of data/(512 bytes/doc)). The 15GB of RAM on this machine should be able to hold ~30 million documents.
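The arithmetic above is easy to sanity-check in the shell (using 1024-based GB here; the exact constants don’t matter much at this scale):

```shell
doc=512                                # bytes per document (= 1 sector)
data=$((60 * 1024 * 1024 * 1024))      # 60GB of data
ram=$((15 * 1024 * 1024 * 1024))       # 15GB of RAM
echo "$((data / doc)) docs on disk"    # ~126 million
echo "$((ram / doc)) docs fit in RAM"  # ~31 million
```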

Our application accesses documents randomly across our data set, so we’d expect MongoDB to eventually “own” (have resident) all 15GB of RAM, as 1) it’s the only thing running and 2) it’ll eventually fetch at least 15GB of the data.

Now, let’s set our readahead to 100 (100 512-byte sectors, aka 100 documents): blockdev --setra 100 /dev/sda. What happens when we run our application?

Picture our disk as looking like this, where each o is a document:

...
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
... // keep going for millions more o's

Let’s say our app requests a document. We’ll mark it with “x” to show that the OS has pulled it into memory:

...
ooooooooooooooooooooooooo
ooooxoooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
...

See it on the third line there? But that’s not the only doc that’s pulled into memory: readahead is set to 100 so the next 99 documents are pulled into memory, too:

...
ooooooooooooooooooooooooo
ooooxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxx
xxxxooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
...

This is what your OS is pulling back on every single document request.

Now we have 100 docs in memory, but remember that our application is accessing documents randomly: the likelihood that the next document we access is in that block of 100 docs is almost nil. At this point, there’s 50KB of data in RAM (512 bytes * 100 docs = 51,200 bytes), but MongoDB’s resident size has only increased by 512 bytes (1 doc).

Our app will keep bouncing around the disk, reading docs from here and there and filling up memory with docs MongoDB never asked for, until RAM is completely full of junk that’s never been used. Then the OS will start evicting things to make room for new junk as our app continues to make requests.

Working this out, there’s a 25% chance of our app requesting a doc that’s already in memory, so 75% of the requests are going to go to disk. Say we’re doing 2 requests a second. Then 1 hour of requests is 2 requests/sec * 3600 seconds/hour = 7,200 requests, 5,400 of which are going to disk (.75 * 7,200). If each disk read pulls back 50KB, that’s ~270MB read from disk per hour. If we set readahead to 0, we’d read only ~2.7MB from disk per hour.
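The hourly-IO estimate can be recomputed in the shell (integer arithmetic, so everything is in whole KB):

```shell
reqs=$((2 * 3600))            # 2 requests/sec for an hour
misses=$((reqs * 75 / 100))   # 75% of requests go to disk
echo "$misses disk reads per hour"
echo "$((misses * 50)) KB/hour with readahead=100 (50KB per read)"
echo "$((misses * 512 / 1024)) KB/hour with readahead=0 (512 bytes per read)"
```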

Which brings us to the next symptom of a too-high readahead: unexpectedly high disk IO. Because most of the data we want isn’t in memory, we keep having to go to disk, dragging shopping carts full of junk into RAM, perpetuating the high-disk-IO/low-resident-memory cycle.

The general takeaway is that a DB is not a “normal” workload for an OS. The default settings may screw you over.

––thursday #4: blockdev

Disk IO is slow. You just won’t believe how vastly, hugely, mind-bogglingly slow it is. I mean, you may think your network is slow, but that’s just peanuts to disk IO.

The image below helps visualize how slow (post continues below).

(Originally found on Hacker News and inspired by Gustavo Duarte’s blog.)

The kernel knows how slow the disk is and tries to be smart about accessing it. It not only reads the data you requested, it also returns a bit more. This way, if you’re reading through a file or watching a movie (sequential access), your system doesn’t have to go to disk as frequently because you’re pulling more data back than you strictly requested each time.

You can see how far the kernel reads ahead using the blockdev tool:

$ sudo blockdev --report
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw   256   512  4096          0     80026361856   /dev/sda
rw   256   512  4096       2048     80025223168   /dev/sda1
rw   256   512  4096          0   2000398934016   /dev/sdb
rw   256   512  1024       2048        98566144   /dev/sdb1
rw   256   512  4096     194560      7999586304   /dev/sdb2
rw   256   512  4096   15818752     19999490048   /dev/sdb3
rw   256   512  4096   54880256   1972300152832   /dev/sdb4

Readahead is listed in the “RA” column. As you can see, I have two disks (sda and sdb) with readahead set to 256 on each. But what unit is that 256? Bytes? Kilobytes? Dolphins? If we look at the man page for blockdev, it says:

$ man blockdev
...
       --setra N
              Set readahead to N 512-byte sectors.
...

This means that my readahead is 256 sectors * 512 bytes = 131,072 bytes, or 128KB. So whenever I read from disk, the disk is actually reading at least 128KB of data, even if I only requested a few bytes.
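A tiny helper makes the sector-to-size conversion easy to repeat for other RA values (this is just arithmetic, nothing blockdev-specific):

```shell
ra_kb() { echo $(( $1 * 512 / 1024 )); }   # RA sectors -> kilobytes
ra_kb 256      # the default above: 128 (i.e. 128KB)
ra_kb 65536    # 32768 (i.e. 32MB)
```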

So what value should you set your readahead to? Please don’t set it to a number you find online without understanding the consequences. If you Google for “blockdev setra”, the first result uses blockdev --setra 65536, which translates to 32MB of readahead. That means that, whenever you read from disk, the disk is actually doing 32MB worth of work. Please do not set your readahead this high if you’re doing a lot of random-access reads and writes, as all of the extra IO can slow things down a lot (and if you’re low on memory, you’ll be forcing the kernel to fill up your RAM with data you won’t need).

Getting a good readahead value can help disk IO issues to some extent, but if you are using MongoDB (in particular), please consider your typical document size and access patterns before changing your blockdev settings. I’m not recommending any particular value because what’s perfect for one application/machine can be death for another.

I’m really enjoying these ––thursday posts because every week people have commented with different/better/interesting ways of doing what I talked about (or ways of telling the difference between stalagmites and stalactites), which is really cool. So I’m throwing this out there: how would you figure out what a good readahead setting is? Next week I’m planning to do iostat for ––thursday, which should cover this a bit, but please leave a comment if you have any ideas.

––thursday #3: a handy git prompt

Everyone says to use git branches early and often, but I inevitably lose track of what branch I’m on. My workflow generally goes something like:

  1. Check out a branch
  2. Lunch!
  3. Get back to my desk and make an emergency bug fix
  4. Commit emergency fix
  5. Suddenly realize I’m not on the branch I meant to be on, ideally (but not always) before I try to push

To keep track, I’ve modified my command prompt to display what I’m working on using __git_ps1, which prints a nice “you are here” string for your prompt:

$ __git_ps1
 (master)
$ git checkout myTag123
$ __git_ps1
 ((myTag123))

(Newlines added for clarity; it actually displays as ” (master)$ git checkout…”, which is generally what you want so your prompt doesn’t contain a newline.) I’m not sure what’s up with all the parentheses, but they do let you distinguish between branches (‘(branchname)’) and tags (‘((tagname))’).

__git_ps1 will even show you if you’re in the middle of a bisect or a merge, which is another common workflow for me:

  1. Merge something and there are a ton of conflicts
  2. Boldly decide to deal with it tomorrow and go home for the day
  3. The next day, write an emergency patch for something
  4. Check git status
  5. Suddenly I remember that I’m in the middle of a merge
  6. Do unpleasant surgery to get my git repo back to a sane state

If you tend to fall into a similar workflow, I highly recommend modifying your prompt to something like:

$ PS1='\w$(__git_ps1)> '
~/gitroot/mongo (v2.0)> 

And suddenly life is better, which is what Linux is all about.

You can run PS1=... on a per-shell basis (or export PS1 for all shells this session) or add that line to your ~/.bashrc file to get that prompt in all future shells.

Finally, __git_ps1 isn’t a binary in your path, it’s a function (so it won’t show up if you run which __git_ps1). You can see its source by running type __git_ps1.

Note: unfortunately this doesn’t work for me on OS X, so I think it might be Linux-only.
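If __git_ps1 isn’t defined in your shell (which is the usual reason it “doesn’t work” on OS X), you can generally source it yourself from git’s contrib scripts, and turn on a few extras while you’re at it. The variables below come from git’s completion/git-prompt.sh; the file’s location varies by system, so treat the path here as an assumption to verify:

```shell
# path varies by system; this one is common on Linux distros (an assumption)
source /usr/share/git-core/contrib/completion/git-prompt.sh

export GIT_PS1_SHOWDIRTYSTATE=1       # '*' for unstaged, '+' for staged changes
export GIT_PS1_SHOWUNTRACKEDFILES=1   # '%' if untracked files exist
PS1='\w$(__git_ps1)> '
```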

––thursday #2: diff ‘n patch

I’m trying something new: every Thursday I’ll do a short post on how to do something with the command line.

I always seem to either create or apply patches in the wrong direction. It’s like stalagmites vs. stalactites, which I struggled with until I heard the mnemonic: “Stalagmites might hang from the ceiling… but they don’t.”

Moving right along, you can use diff to get line-by-line changes between any two files. Generally I use git diff because I’m dealing with a git repo, so that’s what I’ll use here.

Let’s get a diff of MongoDB between version 2.0.2 and 2.0.3.

$ git clone git://github.com/mongodb/mongo.git
$ cd mongo
$ git diff r2.0.2..r2.0.3 > mongo.patch

This takes all of the changes between 2.0.2 and 2.0.3 (r2.0.2..r2.0.3) and dumps them into a file called mongo.patch (that’s the > mongo.patch part).

Now, let’s get the code from 2.0.2 and apply mongo.patch, effectively making it 2.0.3 (this is kind of a silly example but if you’re still with me after the stalagmite thing, I assume you don’t mind silly examples):

$ git checkout r2.0.2
Note: checking out 'r2.0.2'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at 514b122... BUMP 2.0.2
$ 
$ patch -p1 < mongo.patch

What intuitive syntax!

What does the -p1 mean? How many forward slashes to remove from the path in the patch, of course.

To take an example, if you look at the last 11 lines of the patch, you can see that it is the diff for the file that changes the version number. It looks like this:

$ tail -n 11 mongo.patch
--- a/util/version.cpp
+++ b/util/version.cpp
@@ -38,7 +38,7 @@ namespace mongo {
      *      1.2.3-rc4-pre-
      * If you really need to do something else you'll need to fix _versionArray()
      */
-    const char versionString[] = "2.0.2";
+    const char versionString[] = "2.0.3";
 
     // See unit test for example outputs
     static BSONArray _versionArray(const char* version){

Note the a/util/version.cpp and b/util/version.cpp. These indicate the file the patch should be applied to, but there are no a or b directories in the MongoDB repository. The a and b prefixes indicate that one is the previous version and one is the new version. And -p says how many slashes to strip from this path. An example may make this clearer:

  • -p0: “apply this patch to a/util/version.cpp” (which doesn’t exist). (Note that omitting -p entirely is not the same as -p0: with no -p, patch strips all leading directories and keeps just the base file name.)
  • -p1: “apply this patch to util/version.cpp” ← bingo, that’s what we want
  • -p2: “apply this patch to version.cpp” (which doesn’t exist)

So, we use -p1, because that makes the patch’s paths match the actual directory structure. If someone sent you a patch with paths like /home/bob/bobsStuff/foo.txt and you’re just trying to patch foo.txt, you’d probably want to use -p4.
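The stripping rule is simple enough to mimic in a few lines of shell, which can help when you’re squinting at an unfamiliar patch. (A sketch: the real patch also collapses runs of adjacent slashes into a single slash, which this version doesn’t.)

```shell
# mimic patch -pN: remove the smallest prefix containing N slashes
strip_p() {
    local path=$1
    local n=$2
    for ((i = 0; i < n; i++)); do
        path=${path#*/}   # drop everything up to and including the next slash
    done
    echo "$path"
}
strip_p a/util/version.cpp 1            # util/version.cpp
strip_p /home/bob/bobsStuff/foo.txt 4   # foo.txt
```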

On the plus side, if you’re using patches generated by git, they’re super-easy to apply. Git chose the intuitive verb “apply” to patch a file. If you have a patch generated by git diff, you can patch your current tree with:

$ git apply mongo.patch

So, aside from the stupid choice of verbiage, that is generally easier.
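You can try the whole round trip on throwaway files instead of the MongoDB repo; all the file and directory names here are ad hoc, and it uses plain diff and patch:

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
mkdir old new
echo 'version = "2.0.2"' > old/version.txt
echo 'version = "2.0.3"' > new/version.txt

# diff exits nonzero when the trees differ, hence the || true
diff -ru old new > demo.patch || true

cp -r old work && cd work
patch -p1 < ../demo.patch   # -p1 strips the leading "old/" / "new/"
cat version.txt             # now matches the "new" version
```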

Did I miss anything? Get anything wrong? Got a suggestion for next week? Leave a comment below and let me know!

––thursday #1: screen

I’m trying something new: every Thursday I’ll go over how to do something with the command line. Let me know what you think.

If you are using a modern-ish browser, you probably use tabs to keep multiple things open at once: your email, your calendar, whatever you’re actually doing, etc. You can do the same thing with the shell using screen: in a single terminal, you can compile a program while you’re editing a file and watching another process out of the corner of your eye.

Note that screen is super handy when SSH’d into a box. SSH in once, then start screen and open up all of the windows you need.

Using screen

To start up screen, run:

$ screen

Now your shell will clear and screen will give you a welcome message.


Screen version 4.00.03jw4 (FAU) 2-May-06

Copyright (c) 1993-2002 Juergen Weigert, Michael Schroeder
Copyright (c) 1987 Oliver Laumann

...




                          [Press Space or Return to end.]

As it says at the bottom, just hit Return to clear the welcome message. Now you’ll see an empty prompt and you can start working normally.

Let’s say we have three things we want to do:

  1. Run top
  2. Edit a file
  3. Tail a log

Go ahead and start up top:

$ top

Well, now we need to edit a file but top’s using the shell. What to do now? Just create a new window. While top is still running, hit ^A c (I’m using ^A as shorthand for Control-a, so this means “hit Control-a, then hit c”) to create a new window. The new window gets put right on top of the old one, so you’ll see a fresh shell and be at the prompt again. But where did top go? Not to worry, it’s still there. We can switch back to it with ^A n or ^A p (next or previous window).

Now we can start up our editor and begin editing a file. But now we want to tail a file, so we create another new window with ^A c and run our tail -f filename. We can continue to use ^A n and ^A p to switch between the three things we’re doing (and open more windows as necessary).

Availability

screen seems pretty ubiquitous: it has been on every Linux machine I’ve ever tried running it on, and even OS X (although it may be part of Xcode, I haven’t checked).

Note for Emacs Users

^A is an annoying escape key, as it is also the go-to-beginning-of-line shortcut in Emacs (and the shell). To fix this, create a .screenrc file and add one line to change it to something else:

# use ^T
escape ^Tt
# or ^Y
escape ^Yy

The escape sequence is 3 characters: caret, T, and t. (It is not using the single special character “^T”.) The traditional escape key is actually Ctrl-^, as the caret is the one character Emacs doesn’t use for anything. In a .screenrc file, this results in the rather bizarre string:

escape ^^^^

…which makes sense when you think about it, but looks a bit weird.

Odds and Ends

As long as you’re poking at the .screenrc file, you might want to turn off the welcome message, too:

startup_message off
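Putting the pieces together, a minimal .screenrc might look like this (the hardstatus line is optional; %w is screen’s escape for the window list):

```
# ~/.screenrc -- a minimal example
# skip the welcome screen
startup_message off
# use Ctrl-t as the escape key (Emacs-friendly)
escape ^Tt
# show the window list on the bottom line of the terminal
hardstatus alwayslastline "%w"
```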

Run ^A ? anytime for help, or check out the manual’s list of default bindings.

Did I miss anything? Get anything wrong? Got a suggestion for next week? Leave a comment below and let me know!