Got any advice?

I was interviewing a potential summer intern yesterday (hey college students, apply to be an intern at 10gen!) and at the end she asked me, “I’ve never been interviewed by a female programmer before. Do you have any advice for me, being a female in computer science?”

I had no idea what to tell her. Other than specific, non-gendered stuff like “learn how to use Linux” and general platitudes like “don’t let the bastards grind you down,” I couldn’t think of anything to say. So, what advice would you give a female CS student?

The most memorable piece of advice I got in college from a female engineer was, “You will cry at work. Try to make it to the bathroom before you start bawling,” which wasn’t terribly helpful. (And I haven’t yet, HA!)

How MongoDB’s Journaling Works

I was working on a section on the gooey innards of journaling for The Definitive Guide, but then I realized it’s an implementation detail that most people won’t care about. However, I had all of these nice diagrams just lying around.

Good idea, Patrick!

So, how does journaling work? Your disk has your data files and your journal files, which we’ll represent like this:

When you start up mongod, it maps your data files to a shared view. Basically, the operating system says: “Okay, your data file is 2,000 bytes on disk. I’ll map that to memory address 1,000,000-1,002,000. So, if you read the memory at memory address 1,000,042, you’ll be getting the 42nd byte of the file.” (Also, the data won’t necessarily be loaded until you actually access that memory.)

This memory is still backed by the file: if you make changes in memory, the operating system will flush these changes to the underlying file. This is basically how mongod works without journaling: it asks the operating system to flush in-memory changes every 60 seconds.

However, with journaling, mongod makes a second mapping, this one to a private view. Incidentally, this is why enabling journaling doubles the amount of virtual memory mongod uses.

Note that the private view is not connected to the data file, so the operating system cannot flush any changes from the private view to disk.

Now, when you do a write, mongod writes this to the private view.

mongod will then write this change to the journal file, creating a little description of which bytes in which file changed.

The journal appends each change description it gets.

At this point, the write is safe. If mongod crashes, the journal can replay the change, even though it hasn’t made it to the data file yet.

The journal will then replay this change on the shared view.

Then mongod remaps the private view to the shared view. This prevents the private view from getting too “dirty” (accumulating too many changes that differ from the shared view it was mapped from).

Finally, at a glacial speed compared to everything else, the shared view will be flushed to disk. By default, mongod requests that the OS do this every 60 seconds.

And that’s how journaling works. Thanks to Richard, who gave the best explanation of this I’ve heard (Richard is going to be teaching an online course on MongoDB this fall, if you’re interested in more wisdom from the source).

Go Get a Hot Water Bottle

If you don’t own one, go order an old-school hot water bottle. You can get one on Amazon for ~$10 and they feel amazing when you have a fever and your feet are freezing. They are also super-easy to use: just fill it up with hot tap water and they let out a nice even heat for ~8 hours. I am just so impressed with this technology. It’s like the Apple of cozy feet.

Anyway, I recommend getting one before you get sick.

––thursday #7: git-new-workdir

Often I’ll fix a bug (call it “bug A”), kick off some tests, and then get stuck. I’d like to start working on bug B, but I can’t because the tests are running and I don’t want to change the repo while they’re going. Luckily, there’s a git tool for that: git-new-workdir. It basically creates a copy of your repo somewhere else on the filesystem, with all of your local branches and commits.

git-new-workdir doesn’t actually come with git-core, but you should have a copy of the git source anyway, right?

$ git clone https://github.com/git/git.git

Copy the git-new-workdir script from contrib/workdir to somewhere on your $PATH. (There are some other gems in the contrib directory, so poke around.)

Now go back to your repository and do:

$ git-new-workdir ./ ../bug-b

This creates a directory one level up called bug-b, with a copy of your repo.

Thanks to Andrew for telling me about this.

Edited to add: Justin rightly asks, what’s the difference between this and a local clone? The difference is that git-new-workdir softlinks everything in your .git directory, so commits you make in bug-b appear in your original repository.

How to Make Your First MongoDB Commit

10gen is hiring a lot of people straight out of college, so I thought this guide would be useful.

Basically, the idea is: you have found and fixed a bug (so you’ve cloned the mongo repository, created a branch named SERVER-1234, and committed your change on it). You’ve had your fix code-reviewed (this page is only accessible to 10gen wiki accounts). Now you’re ready to submit your change, to be used and enjoyed by millions (no pressure). But how do you get it into the main repo?

Here’s the situation: there’s the main MongoDB repo on Github, which you don’t have access to (yet):

However, you can make your own copy of the repo, which you do have access to:

So, you can put your change in your repo and then ask one of the developers to merge it in, using a pull request.

That’s the 1000-foot overview. Here’s how you do it, step-by-step:

  1. Create a Github account.
  2. Go to the MongoDB repository and hit the “Fork” button.

  3. Now, if you go to https://www.github.com/yourUsername/mongo, you’ll see that you have a copy of the repository (replace yourUsername with the username you chose in step 1). You now have this setup:

  4. Add this repository as a remote locally:
    $ git remote add me git@github.com:yourUsername/mongo.git
    

    Now you have this:

  5. Now push your change from your local repo to your Github repo:
    $ git push me SERVER-1234
    
  6. Now you have to make a pull request. Visit your fork on Github and click the “Pull Request” button.

  7. This will pull up Github’s pull request interface. Make sure you have the right branch and the right commits.

  8. Hit “Send pull request” and you’re done!

A Neat C Preprocessor Trick

I’ve been looking at Clang and they define lexer tokens in a way that I thought was clever.

The challenge is: how do you keep a single list of language tokens but use them as both an enum and a list of strings?

Clang defines C token types in a file, TokenKinds.def, with all of the names of the different C language tokens (pretend C only has four tokens for now):

#ifndef TOK
#define TOK(X)
#endif

TOK(comment)
TOK(identifier)
TOK(string_literal)
TOK(char_constant)

#undef TOK

If you just #include this file, the preprocessor defines TOK(X) as “” (nothing), so the whole thing becomes an empty file.

However! When they want a declaration of all possible tokens that could be used, they make an enum of this list like this:

enum TokenKind {
#define TOK(X) X,
#include "clang/Basic/TokenKinds.def"
    NUM_TOKENS
};

Because TOK is defined when TokenKinds.def is included, the preprocessor will spit out something like:

enum TokenKind {
    comment,
    identifier,
    string_literal,
    char_constant,
    NUM_TOKENS
};

This has the nice property that you can check if a type is valid by making sure that it is less than NUM_TOKENS. But if we’re going to put the tokens into that enum, wouldn’t it be clearer just to put them there, instead of in a separate file? Maybe, but doing it this way gives them a nice way to get a string representation of the types, too. In another file, they do:

const char* const TokNames[] = {
#define TOK(X) #X,
#include "clang/Basic/TokenKinds.def"
    0
};

“#X” uses the preprocessor’s stringizing operator: it replaces X with its argument surrounded by quotes, so that turns into:

const char* const TokNames[] = {
    "comment",
    "identifier",
    "string_literal",
    "char_constant",
    0
};

Now if they have a token, they can say TokNames[token.kind] to get the string name of that token. It lets them use the token types efficiently, print them out nicely for debugging, and not have to maintain multiple lists of tokens.

Call for Schemas

The Return of the Mongoose Lemur

I just started working on MongoDB: The Definitive Guide, 2nd Edition! I’m planning to add:

  • Lots of ops info
  • Real-world schema design examples
  • Coverage of new features since 2010… so quite a few

However, I need your help on the schema design part! I want to include some real-world schemas people have used and why they worked (or didn’t). If you’re working on something 1) interesting and 2) non-confidential and you’d like to either share or get some free advice (or both), please email me (kristina at 10gen dot com) or leave a comment below. I’ll set up a little interview with you.

I am particularly looking for “cool” projects (video games, music, TV, sports), recognizable companies (Fortune 50 & HackerNews 500*), and geek elite (Linux development, research labs, robots, etc.). However, if you’re working on something you think is interesting that doesn’t fall into any of those categories, I’d love to hear about it!

* There isn’t really a HackerNews 500, I mean projects that people in the tech world recognize and think are pretty cool (DropBox, Github, etc.).

Git for Interns and New Employees

Think of commits as a trail future developers can follow. Would you like to leave a beautiful, easy-to-follow trail, or make them follow your… droppings?

My interns are leaving today 😦 and I think the most important skill they learned this summer was how to use git. However, I had a hard time finding concise references to send them about my expectations, so I’m writing this up for next year.

How to Create a Good Commit

A commit is an atomic unit of development.

You get a feeling for this as you gain experience, but commit early and often. I’ve never, ever thought that someone broke up a problem into too many commits. Ever.

That said, do not commit:

  1. Whitespace changes as part of non-whitespace commits.
  2. Debugging messages (“print(‘here!’)” and such).
  3. Commented out sections of code.
    Eew.

  4. Any type of binary (in general… if you think you have a special case, ask someone before you commit)
  5. Customer data, images you don’t own, passwords, etc. Assume that anything you commit will be included in the repo forever.
  6. Auto-generated files (any intermediate building files) and files specific to your system. If it mentions your personal directory structure, it probably shouldn’t be committed.

Point #5 deserves a little extra mention: git keeps everything, so when in doubt, don’t commit something dubious. You can always add it later. When I was new at 10gen, I found a memory leak in MongoDB and was told to commit “what was needed to reproduce it.”

I committed a 20GB database to the MongoDB repo.

One emergency surgery later and the repo was back to its svelte self. So it is possible to remove commits if you have to, but try not to commit stuff you shouldn’t. It’s extremely annoying to fix. And embarrassing.

When you’re getting ready to commit, run git gui. This is the #1 best tool I’ve found for beginners learning how to make good commits. You’ll see something that looks sort of like this:

The upper-left pane is unstaged changes and the lower right is staged changes. The big pane shows what you’ve added to and removed from the file currently selected.

Right click on a hunk to stage it, or a single line from the hunk.

Click on this icon: to stage all of the changes in a file.

Note that notes.js has moved to the staging area (if only some parts of notes.js are staged, it will show up in both the staged and unstaged areas).

Before you commit, look at each file in the staging area by clicking on its filename. Any stray hunks make it in? Whitespace changes? Remove those lines by right-clicking and unstaging.

That extra line isn’t part of this change so it shouldn’t be part of the commit.

git gui will also show you when you have trailing whitespace:

And if you have two lines that look identical, it’s probably a whitespace issue (maybe tabs vs. spaces?).

Once you’ve fixed all that, you’re ready to describe your change…

Writing a Good Commit Message

First of all, there are a couple of semantic rules for writing good commit messages:

  • One sentence
  • In the present tense
  • Capitalized
  • No period
  • Less than 80 characters

That describes the form, but just like you can have a valid program that doesn’t do anything, you can have a valid commit message that’s useless.

So what does a good commit message look like? It should clearly summarize what the change did. Not “changed X to Y” (although that’s better than just saying “Y”, which I’ve also seen) but why X had to change to Y.

Examples of good commit messages:

Show error message on failed "edit var" in shell
Very nice “added feature”-type message.
Extra restrictions on DB name characters for Windows only
Would have been nice to have a description below the commit line describing why we needed to change this for Windows, but good “changed code”-type message.
Compile when SIGPIPE is not defined
Nice “fixed bug”-type message.
Whitespace
I think this is the only case where you can get away with a 1-word commit message.

Examples of bad commit messages:

Add stuff
Doesn’t say what was added or why
Fix test, add input form, move text box
A commit should be one thought, this is three. Thus, this should probably be three commits, unless they’re all part of one thought you haven’t told us about.

And once you’ve committed…

When you inevitably mess up a commit and realize that you’ve accidentally committed a mishmash of ideas that break laws in six countries and are riddled with whitespace changes, check out my post on fixing git mistakes.

Or just go ahead and push.

Controlling Collection Distribution

Shard tagging is a new feature in MongoDB version 2.2.0. Its main use is to force writes to go to a local data center, but it can also be used to pin a collection to a shard or set of shards.

Note: to try this out, you’ll have to use 2.2.0-rc0 or greater.

To play with this feature, first you’ll need to spin up a sharded cluster:

> sharding = new ShardingTest({shards:3,chunksize:1})

This command will start up 3 shards, a config server, and a mongos. It’ll also start spewing out the logs from all the servers into stdout, so I recommend putting this shell aside and using a different one from here on in.

Start up a new shell and connect to the mongos (defaults to port 30999) and create some sharded collections and data to play with:

> // remember, different shell
> conn = new Mongo("localhost:30999")
> db = conn.getDB("villains")
>
> // shard db
> sh.enableSharding("villains")
>
> // shard collections
> sh.shardCollection("villains.joker", {jokes:1});
> sh.shardCollection("villains.two-face", {luck:1});
> sh.shardCollection("villains.poison ivy", {flora:1});
> 
> // add data
> for (var i=0; i<100000; i++) { db.joker.insert({jokes: Math.random(), count: i, time: new Date()}); }
> for (var i=0; i<100000; i++) { db["two-face"].insert({luck: Math.random(), count: i, time: new Date()}); }
> for (var i=0; i<100000; i++) { db["poison ivy"].insert({flora: Math.random(), count: i, time: new Date()}); }

Now we have 3 shards and 3 villains. If you look at where the chunks are, you should see that they’re pretty evenly spread out amongst the shards:

> use config
> db.chunks.find({ns: "villains.joker"}, {shard:1, _id:0}).sort({shard:1})
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
{ "shard" : "shard0001" }
{ "shard" : "shard0001" }
{ "shard" : "shard0001" }
{ "shard" : "shard0002" }
{ "shard" : "shard0002" }
{ "shard" : "shard0002" }
> db.chunks.find({ns: "villains.two-face"}, {shard:1, _id:0}).sort({shard:1})
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
{ "shard" : "shard0001" }
{ "shard" : "shard0001" }
{ "shard" : "shard0001" }
{ "shard" : "shard0002" }
{ "shard" : "shard0002" }
{ "shard" : "shard0002" }
> db.chunks.find({ns: "villains.poison ivy"}, {shard:1, _id:0}).sort({shard:1})
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
{ "shard" : "shard0001" }
{ "shard" : "shard0001" }
{ "shard" : "shard0002" }
{ "shard" : "shard0002" }

Or, as Harley would say, “Puddin’.”

However, villains tend to not play well with others, so we’d like to separate the collections: 1 villain per shard. Our goal:

Shard Namespace
shard0000 “villains.joker”
shard0001 “villains.two-face”
shard0002 “villains.poison ivy”

To accomplish this, we’ll use tags. A tag describes a property of a shard, any property (they’re very flexible). So, you might tag a shard as “fast” or “slow” or “east coast” or “rackspace”.

In this example, we want to mark a shard as belonging to a certain villain, so we’ll add villains’ nicknames as tags.

> sh.addShardTag("shard0000", "mr. j")
> sh.addShardTag("shard0001", "harv")
> sh.addShardTag("shard0002", "ivy")

This says, “put any chunks tagged ‘mr. j’ on shard0000.”

The second thing we have to do is to make a rule, “For all chunks created in the villains.joker collection, give them the tag ‘mr. j’.” To do this, we can use the addTagRange helper:

> sh.addTagRange("villains.joker", {jokes:MinKey}, {jokes:MaxKey}, "mr. j")

This says, “Mark every chunk in villains.joker with the ‘mr. j’ tag” (MinKey is negative infinity, MaxKey is positive infinity, so all of the chunks fall in this range).

Now let’s do the same thing for the other two collections:

> sh.addTagRange("villains.two-face", {luck:MinKey}, {luck:MaxKey}, "harv")
> sh.addTagRange("villains.poison ivy", {flora:MinKey}, {flora:MaxKey}, "ivy")

Now wait a couple of minutes (it takes a little while for it to rebalance) and then look at the chunks for these collections.

> use config
> db.chunks.find({ns: "villains.joker"}, {shard:1, _id:0}).sort({shard:1})
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
> db.chunks.find({ns: "villains.two-face"}, {shard:1, _id:0}).sort({shard:1})
{ "shard" : "shard0001" }
{ "shard" : "shard0001" }
{ "shard" : "shard0001" }
{ "shard" : "shard0001" }
{ "shard" : "shard0001" }
{ "shard" : "shard0001" }
{ "shard" : "shard0001" }
{ "shard" : "shard0001" }
{ "shard" : "shard0001" }
{ "shard" : "shard0001" }
> db.chunks.find({ns: "villains.poison ivy"}, {shard:1, _id:0}).sort({shard:1})
{ "shard" : "shard0002" }
{ "shard" : "shard0002" }
{ "shard" : "shard0002" }
{ "shard" : "shard0002" }
{ "shard" : "shard0002" }
{ "shard" : "shard0002" }
{ "shard" : "shard0002" }
{ "shard" : "shard0002" }

Scaling with Tags

Obviously, Two-Face isn’t very happy with this arrangement and immediately requests two servers for his data. We can move the Joker and Poison Ivy’s collections to one shard and expand Harvey’s to two by manipulating tags:

> // move Poison Ivy to shard0000
> sh.addShardTag("shard0000", "ivy")
> sh.removeShardTag("shard0002", "ivy")
>
> // expand Two-Face to shard0002
> sh.addShardTag("shard0002", "harv")

Now if you wait a couple minutes and look at the chunks, you’ll see that Two-Face’s collection is distributed across 2 shards and the other two collections are on shard0000.

> db.chunks.find({ns: "villains.poison ivy"}, {shard:1, _id:0}).sort({shard:1})
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
{ "shard" : "shard0000" }
> db.chunks.find({ns: "villains.two-face"}, {shard:1, _id:0}).sort({shard:1})
{ "shard" : "shard0001" }
{ "shard" : "shard0001" }
{ "shard" : "shard0001" }
{ "shard" : "shard0001" }
{ "shard" : "shard0001" }
{ "shard" : "shard0002" }
{ "shard" : "shard0002" }
{ "shard" : "shard0002" }
{ "shard" : "shard0002" }
{ "shard" : "shard0002" }
“Bad heads, you get EBS.”

However, this still isn’t quite right for Harvey: he’d like one shard to be good and one to be bad. Let’s say we take advantage of Amazon’s new offering and replace shard0002 with SSDs. Then we divide up the traffic: send 50% of Harvey’s writes to the SSD shard and 50% to the spinning disk shard. First, we’ll add tags to the shards, describing them:

> sh.addShardTag("shard0001", "spinning")
> sh.addShardTag("shard0002", "ssd")

The value of the “luck” field is between 0 and 1, so we want to say, “If luck >= .5, send to the SSD.”

> sh.addTagRange("villains.two-face", {luck:MinKey}, {luck:.5}, "spinning")
> sh.addTagRange("villains.two-face", {luck:.5}, {luck:MaxKey}, "ssd")

Now “bad luck” docs will be written to the slow disk and “good luck” documents will be written to SSD.

As we add new servers, we can control what kind of load they get. Tagging gives operators a ton of control over what collections go where.

Finally, I wrote a small script that adds a “home” method to collections to pin them to a single tag. Example usage:

> // load the script
> load("batman.js")
> // put foo on bar
> db.foo.home("bar")
> // put baz on bar
> db.baz.home("bar")
> // move foo to bat
> db.foo.home("bat")

Enjoy!

Summer Reading Blogroll

What are some good ops blogs? Server Density does a nice weekly roundup of sys admin posts, but that’s about all I’ve found. So, anyone know any other good resources? The more basic the better.

In exchange, here are my top-10 “I’m totally doing something productive and learning something new” blogs:

Programming

Daniel Lemire’s Blog
Articles on databases and general musing on CS and higher education.
Embedded in Academia
Everything you ever wanted to know about debugging compilers.
Preshing on Programming
Bring-a-tent-length articles about advanced programming concepts.
Sutter’s Mill
C++ puzzlers.

Security

Schneier on Security
The best general security blog I’ve found.

Science!

How to Spot a Psychopath
General science Q&A, as well as justification for why every household needs 1kg of tungsten, 10,000 LEDs, and temperature-sensitive polymer.
In the Pipeline
A professional chemist’s blog. Sometimes way over my head, but generally pretty interesting.

10gen

On a less technical note, many of my coworkers write excellent blogs, here are two:

Max Schireson’s Blog
10gen’s president, writes about running a company and working at startups.
Meghan Gill’s Blog
10gen’s earliest non-technical hire, who deserves the credit for a lot of MongoDB’s success. Her blog is a really interesting and informative look at what marketing people do.

Whoops, that’s only nine. For the tenth, please leave a link to your favorite tech blog below so I can check it out!

Also, I artificially kept this list short, but there are a ton of terrific blogs I read that didn’t get a mention. If you’re a coworker or a MongoDB Master, I probably subscribe to your blog and I’m really sorry if I didn’t mention it above!