Kristina Chodorow's Blog

How to Choose a Shard Key: The Card Game

Choosing the right shard key for a MongoDB cluster is critical: a bad choice will make you and your application miserable. Shard Carks is a cooperative strategy game to help you choose a shard key. You can try out a few pre-baked strategies I set up online (I recommend reading this post first, though). Also, this won’t make much sense unless you understand the basics of sharding.

Mapping from MongoDB to Shard Carks

This game maps the pieces of a MongoDB cluster to the game “pieces:”

Shard – a player.
Some data – a playing card. In this example, one card is ~12 hours worth of data.
Application server – the dealer: passes out cards (data) to the players (shards).
Chunk – 0-4 cards defined by a range of cards it can contain, “owned” by a single player. Each player can have multiple chunks and pass chunks to other players.

Instructions

Before play begins, the dealer orders the deck to mimic the application being modeled. For this example, we’ll pretend we’re programming a news application, where users are mostly concerned with the latest few weeks of information. Since the data is “ascending” through time, it can be modeled by sorting the cards in ascending order: two through ace for spades, then two through ace of hearts, then diamonds, then clubs for the first deck. Multiple decks can be used to model longer periods of time.

Once the decks are prepared, the players decide on a shard key: the criteria used for chunk ranges. The shard key can be any deterministic strategy that an independent observer could calculate. Some examples: order dealt, suit, or deck number.

Gameplay

The game begins with Player 1 having a single chunk (chunk1). chunk1 has 0 cards and the shard key range [-∞, ∞).

Each turn, the dealer flips over a card, computes the value for the shard key, figures out which player has a chunk range containing that key, and hands the card to that player. Because the first card’s shard key value will obviously fall in the range [-∞, ∞), it will go to Player 1, who will add it to chunk1. The second and the third cards go to chunk1, too. When the fourth card goes to chunk1, the chunk is full (chunks can only hold up to four cards) so the player splits it into two chunks: one with a range of [-∞, mid_chunk1), the other with a range of [mid_chunk1, ∞), where mid_chunk1 is the midpoint shard key value for the cards in chunk1, such that two cards will end up in one chunk and two cards will end up in the other.

The dealer flips over the next card and computes the shard key’s value. If it’s in the [-∞, mid_chunk1) range, it will be added to that chunk. If it’s in the [mid_chunk1, ∞) range, it will be added to that chunk.

Splitting

Whenever a chunk gets four cards, the player splits the chunk into two 2-card chunks. If a chunk has the range [x, z), it can be split into two chunks with ranges [x, y), [y, z), where x < y < z.

Balancing

All of the players should have roughly the same number of chunks at all times. If, after splitting, Player A ends up with more chunks than Player B, Player A should pass one of their chunks to Player B.

Strategy

The goals of the game are for no player to be overwhelmed and for the gameplay to remain easy indefinitely, even if more players and dealers are added. For this to be possible, the players have to choose a good shard key. There are a few different strategies people usually try:

Sample Strategy 1: Let George Do It

The players huddle together and come up with a plan: they’ll choose “order dealt” as the shard key.

The dealer starts dealing out cards: 2 of spades, 3 of spades, 4 of spades, etc. This works for a bit, but all of the cards are going to one player (he has the [x, ∞) chunk, and each card’s shard key value is closer to ∞ than the preceding card’s). He’s filling up chunks and passing them to his friends like mad, but all of the incoming cards are added to this single chunk. Add a few more dealers and the situation becomes completely unmaintainable.

Ascending shard keys are equivalent to this strategy: ObjectIds, dates, timestamps, auto-incrementing primary keys.

Sample Strategy 2: More Scatter, Less Gather

When George falls over dead from exhaustion, the players regroup and realize they have to come up with a different strategy. “What if we go the other direction?” suggests one player. “Let’s have the shard key be the MD5 hash of the order dealt, so it’ll be totally random.” The players agree to give it a try.

The dealer begins calculating MD5 hashes with his calculator watch. This works great at first, at least compared to the last method. The cards are dealt at a pretty much even rate to all of the players. Unfortunately, once each player has a few dozen chunks in front of them, things start to get difficult. The dealer is handing out cards at a swift pace and the players are scrambling to find the right chunk every time the dealer hands them a card. The players realize that this strategy is just going to get more unmanageable as the number of chunks grows.

Sharding keys equivalent to this strategy: MD5 hashes, UUIDs. If you shard on a random key, you lose data locality benefits.

Sample Strategy 3: Combo Plate

What the players really want is something where they can take advantage of the order (like the first strategy) and distribute load across all of the players (like the second strategy). They figure out a trick: couple a coarsely-grained order with the random element. “Let’s say everything in a given deck is ‘equal,'” one player suggests. “If all of the cards in a deck are equivalent, we’ll need a way of splitting chunks, so we’ll also use the MD5 hash as a secondary criteria.”

The dealer passes the first four cards to Player 1. Player 1 splits his chunk and passes the new chunk to Player 2. Now the cards are being evenly distributed between Player 1 and Player 2. When one of them gets a full chunk again, they split it and hand a chunk to Player 3. After a few more cards, the dealer will be evenly distributing cards among all of the players because within a given deck, the order the players are getting the cards is random. Because the cards are being split in roughly ascending order, once a deck has finished, the players can put aside those cards and know that they’ll never have to pick them up again.

This strategy manages to both distribute load evenly and take advantage of the natural order of the data.

Applying Strategy 3 to MongoDB

For many applications where the data is roughly chronological, a good shard key is:

{<coarse timestamp> : 1, <search criteria> : 1}

The coarseness of the timestamp depends on your data: you want a bunch of chunks (a chunk is 200MB) to fit into one “unit” of timestamp. So, if 1 month’s worth of writes is 30GB, 1 month is a good granularity and your shard key could start with {"month" : 1}. If one month’s worth of data is 1 GB you might want to use the year as your timestamp. If you’re getting 500GB/month, a day would work better. If you’re inserting 5000GB/sec, sub-second timestamps would qualify as “coarse.”

If you only use a coarse granularity, you’ll end up with giant unsplittable chunks. For example, say you chose the shard key {"year" : 1}. You’d get one chunk per year, because MongoDB wouldn’t be able to split chunks based on any other criteria. So you need another field to target queries and prevent chunks from getting too big. This field shouldn’t really be random, as in Strategy 3, though. It’s good to group data by the criteria you’ll be looking for it by, so a good choice might be username, log level, or email, depending on your application.

Warning: this pattern is not the only way to choose a shard key and it won’t work well for all applications. Spend some time monitoring, figuring out what your application’s write and read patterns are. Setting up a distributed system is hard and should not be taken lightly.

How to Use Shard Carks

If you’re going to be sharding and you’re not sure what shard key to choose, try running through a few Shark Carks scenarios with coworkers. If a certain player start getting grouchy because they’re having to do twice the work or everyone is flailing around trying to find the right cards, take note and rethink your strategy. Your servers will be just as grumpy and flailing, only at 3am.

If you don’t have anyone easily persuadable around, I made a little web application for people to try out the strategies mentioned above. The source is written in PHP and available on Github, so feel free to modify. (Contribute back if you come up with something cool!)

And that’s how you choose a shard key.

Wireless dongle review

A dongle is a USB thingy (as you can see, I’m very qualified in this area) that lets you connect your computer to the internet wherever you go. It uses the same type of connection your cellphone data plan uses (3G or 4G).

A few months ago, Clear asked if they could send me a free sample dongle, as I am such a prestigious tech blogger. And I, being a sucker for free things (take note marketers) agreed to try out their dongle. And I have to say, it’s been pretty cool having free wifi wherever I go. The good bits:

It is very handy, especially when traveling. Waiting for hours in cold, smelly terminals become much more bearable. If I traveled more, I’d definitely get my own dongle (or try to get work to get one for me).
I could use the dongle on multiple laptops. I was worried about this, it seems like a lot of companies grub for money by binding devices like this to a single machine so you have to buy one for each computer you have (and who has just one computer?). It only supported Mac and Windows, though, so minor ding for that.
Andrew and I watched Law and Order (Netflix) using it and there was no noticeable difference in quality from our landline. I didn’t do a proper speed test, partly because I’m lazy and partly because I didn’t care. (If you know me IRL and want to do one, let me know and I’ll lend the dongle to you.)

But… there aren’t a whole lot of places I go where I don’t have free wifi already. Almost all of the coffeeshops and bookstores (and even bars) I go to already advertise free wifi. I used the dongle maybe once a week. I’ll miss it when my free trial runs out, but I won’t miss it $55-per-month-worth.

Also, I should be able to get the same sort of behavior by tethering my cellphone–if Sprint didn’t cripple their cellphones to prevent you from tethering. I actually don’t like having a phone, period, so when my contract runs out I’ll probably get a phone with just a data plan and a less douchey carrier.

So, my conclusions are: it’s super handy, but my cellphone should really be able to serve the same function. But that’s just me, and it is really cool being able to go online anywhere.

Setting Up Your Interview Toolbox

This post covers a couple “toolbox” topics that are easy to brush up on before the technical interview.

I recently read a post that drove me nuts, written by someone looking for a job. They said:

I can’t seem to crack the on-site coding interviews… [Interviews are geared towards] those who can suavely implement a linked list code library (inserting, deleting, reversing) as well as a data structure using that linked list (i.e. a stack) on a white board, no syntax errors, compilable, all error paths covered, interfaces cleanly buttoned up. Lather, rinse, repeat for binary search trees and sorting algorithms.

These are a programmer’s multiplication tables! If someone asked me “what’s 6×15?” on an interview, I wouldn’t throw my hands up and complain that I learned it 20 years ago, I’d be fucking thrilled that they had given me such a softball question.

Believe me, if you can’t figure out my basic algorithm questions, you do not want me to ask my “fun” questions.

If you’re looking for a job, I’d recommend accepting that interviewers want to see you know your multiplication tables and spend a few hours cramming if you need to. Make sure you have a basic toolbox set up in your brain, covering a couple basic categories:

Data structures: hashes, lists, trees – know how to implement them and common manipulations and searches.
Algorithms: sorts, recursion, search – simple algorithm problems. “Algorithms” covers a lot of ground, but at the very least know how to do the basic sorts (merge, quick, selection), recursion, and tree searches. They come up a lot. Also, make sure you know when to apply them (or they won’t be very useful).
Bit twiddling – this is mainly for C and C++ positions. I like to see if people know how to manipulate their bits (oh la la). This varies on the company, though, I doubt a Web 2.0 site is going to care that you know your bit shifts backwards and forwards (or, rather, left and right).

If you are applying for a language-specific job, the interviewer will probably ask you about some specifics. A good interviewer shouldn’t try to trap you with obscure language trivia, but make sure you’re familiar with the basics. So, if you’re applying for, say, a Java position, get comfortable with java.lang, java.util, how garbage collection works, basic synchronization, and know that Strings are immutable.

Protip: when I was looking for a job, every single place I interviewed asked me about Java’s public/protected/private keywords. Nearly all of them asked about final, too.

Don’t freak out if you get up to the board and can’t remember whether it’s foo.toString() or (String)foo, or if you forget a semicolon. Any reasonable interviewer knows that it’s hard to program on a whiteboard and doesn’t expect compiler-ready code. On the other hand, if your resume says you’ve been doing C for 10 years and you allocate an array of chars as char *x[], we expect you to laugh and understand your mistake when we point it out (I know I might do something like that out of nerves, so I wouldn’t hold it against you as long as you understood the problem).

Good luck out there. Remember that, if a company brings you in for an interview, they want to hire you. Do everything you can to let them!

How I Became a Programmer

cims — NYU's asbestos-filled math and CS building where I spent my undergrad

I started programming when I was 20. My original college plan was to major in mathematics and become a saxophonist (I didn’t feel like starving while I tried to make it as a musician).

Luckily, I had a crush on a computer science major so I tagged along with him to a programming team meeting. Progteam blew my mind: programming was like math, only fun! Majoring in math made me feel smart and dignified, but it was never like “Wow, this it fun.” It was more like “Ow, my brain hurts, but I guess it’s building brain muscles…”

It turned out I was good at computer science, so I decided (somewhat randomly) that I was getting into MIT for grad school, dammit. I knew they’d want to see research, so I asked a professor to mentor an independent research project. Over the next year, I did researched a classic optimization algorithm and wrote a paper on an algorithm I came up with to improve its performance for certain cases.

The problem was that, when the time came to apply to grad school, I wasn’t sure I wanted to go to at all anymore. I had liked learning about optimization and coming up with a new algorithm, but I had hated research, in and of itself. I asked my parents for advice.

“Just apply,” they said. “Keep your options open.”

mitbuildings — The computer science building at MIT

Grad school had been my goal for a while, so I applied to a couple of PhD programs. I half hoped that they would all reject me and make the choice easier. Of course, they all accepted me, even MIT (poor me</sarcasm>). I thought about it some more and told my parents that I still didn’t think I wanted to go.

“Just try a semester,” they said. “You can always leave if you don’t like it.”

I ended up accepting Columbia, not MIT. I had really liked every professor I met at Columbia, which I figured would give me more advisor options. Unfortunately, I continued to hate research and I was thoroughly sick of school. The next three months the most miserable of my life.

“Just stick it out,” said my parents. “Until you get a master’s degree, at least.”

I finally put my foot down. Usually they have good advice but I realized that this was their thing, not mine. I dropped out of grad school and got a job I loved. My parents were happy that I was happy and got over the disappointment that I would never be Dr. Chodorow. I’m still at the same job and couldn’t be happier.

So, in the spirit of Thanksgiving, I’m really thankful that I lucked into discovering computer science. Math kind of sucks.

Firesheep: Internet Snooping made Easy

If you use an open wifi network, people around you can see what you’re doing. They not only can look at your accounts, but log in as you with a double click. Even if you’re non-technical (especially if you’re non-technical!) you should know how this works and how to protect your accounts. Here’s what’s happening:

When you use wireless internet you are sending information through the air from your computer to a router* somewhere. This information is like broadcasting your own little radio station: it can be picked up and seen by anyone in the area. The problem is, your radio station is broadcasting you checking email, updating your OkCupid profile, writing stupid messages to friends on Facebook… activities that you don’t want random “listeners” to know about.

To keep your radio station private, websites support encoding all of the data you send so it looks like gibberish to anyone on the outside. So, when you sign into Gmail (or Amazon or Chase) your computer turns your username and password into gibberish and sends it into the air. The website receives the message, decodes the gibberish, and says “Now that you’ve given me your credentials, I’ll assume you’re Joe Shmoe if you give me the unlikely combination of digits ‘874328972387498234’ every time you make a request.” And then most sites stop encoding anything.

So, when you post a status update to your wall, you send along “874328972387498234” as clear as day and Facebook says “Aha, it’s you. Okay, I’ll post that.”

However, remember that you’re broadcasting this on your own personal radio station. Well, someone finally built a tuner, called Firesheep. If you have Firesheep installed and you sit down in a coffeeshop (or anywhere with an open wifi network), you are logged in as everyone around you to every site the other patrons are visiting.

Important takeaways for non-geeks:

Don’t access any accounts you care about via a public wifi connection. There is an embarrassingly long list of sites built into Firesheep: Amazon, Cisco, Facebook, Flickr, Google, New York Times, Twitter, WordPress, Yahoo, and many others. My mom could figure out how to use Firesheep and it would take a geek ~10 minutes to add a new site.
This “hack” cannot be patched globally by flipping a switch. Each website needs to fix itself. It is analogous to a locksmith discovering that every lock can be unlocked by whistling at it: everyone needs to go and improve their locks, we can’t outlaw whistling.
There’s no easy way, other than not using your accounts, to prevent people from seeing what you’re doing. The easiest ways I can think of off the top of my head are setting up Tor or a VPN, which are beyond the abilities (or at least interest) of most non-geeks I know.
Gmail encodes everything, by default. Your Google account will pop up in Firesheep (see the screenshot above), but people won’t actually be able to access your email. Also, any bank or reasonably professional payment system will be secure (look for the little lock symbol in the corner of your browser or https:// in the address bar). You can log into someone’s Amazon account with Firesheep, but you can’t do any payment stuff.

The code for Firesheep is open source and available on Github. You can try it out by starting up Firefox, downloading Firesheep, going to File->Open File and selecting the file you just downloaded. You may have to select View->Sidebar->Firesheep if it doesn’t pop up automatically.

That’s it, it’s ready to start capturing data from other people on your wifi network.

* Geeks: I know it’s not necessarily a router, but most lay people know that a router is where internet comes out and it’s close enough.

Bending the Oplog to Your Will

Part 3 of the replication internals series: three handy tricks.

DIY triggers
Using the oplog for crash recovery
Creating non-replicated collections

This is the third post in a three-part series on replication. See also parts 1 (replication internals) and 2 (getting to know your oplog).

DIY triggers

MongoDB has a type of query that behaves like the tail -f command: it shows you new data as it’s written to a collection. This is great for the oplog, where you want to see new records as they pop up and don’t want to query over and over.

If you want this type of ongoing query, MongoDB returns a tailable cursor. When this cursor gets to the end of the result set it will hang around and wait for more elements to be added to the collection. As they’re added, the cursor will return them. If no elements are added for a while, the cursor will time out and the client has to requery if they want more results.

Using your knowledge of the oplog’s format, you can use a tailable cursor to do a long pull for activities in a certain collection, of a certain type, at a certain time… almost any criteria you can imagine.

Using the oplog for crash recovery

Suppose your database goes down, but you have a fairly recent backup. You could put a backup into production, but it’ll be a bit behind. You can bring it up-to-date using your oplog entries.

If you use the trigger mechanism (described above) to capture the entire oplog and send it to a non-capped collection on another server, you can then use an oplog replayer to play the oplog over your dump, bringing it as up-to-date as possible.

Pick a time pre-dump and start replaying the oplog from there. It’s okay if you’re not sure exactly when the dump was taken because the oplog is idempotent: you can apply it to your data as many times as you want and your data will end up the same.

Also, warning: I haven’t tried out the oplog replayer I linked to, it’s just the first one I found. There are a few different ones out there and they’re pretty easy to write.

Creating non-replicated collections

The local database contains data that is local to a given server: it won’t be replicated anywhere. This is one reason why it holds all of the replication info.

local isn’t reserved for replication stuff: you can put your own data there, too. If you do a write in the local database and then check the oplog, you’ll notice that there’s no record of the write. The oplog doesn’t track changes to the local database, since they won’t be replicated.

And now I’m all replicated out. If you’re interested in learning more about replication, check out the core documentation on it. There’s also core documentation on tailable cursors and language-specific instructions in the driver documentation.

How not to get a job with a startup

10gen is in super-recruiting mode, trying to scoop up all the great graduates before Google and Microsoft absorb them. I’ve been doing what feels like endless recruiting activities, and I’ve noticed that lot of applicants shoot themselves in the foot. So, here’s what not to do:

First contact

Don’t: contact the startup before you know what they do. I’ve recruited at a couple college job fairs and almost everyone comes up and says, “Hi, I’m a masters student in computer science and I’m looking for a job. Can I give you my resume?” Yes, you can, and I’ll put it on the pile of 200 other resumes.

Also, please don’t walk me through your resume line-by-line: it’s boring. I’ll hate you and I won’t be able to think of a polite way of cutting you off.

Do: say, “I love MongoDB! I’ve been using it with Ruby for <some project> and I would love to work on it full time! I’m really interested in replication/sharding/geospatial/etc. stuff!” Keep in mind: you’re talking to startup employees. Working is our life (which sounds depressing, but we’re doing what we love). It’s annoying to have people apply who are looking for a job, any job, and obviously don’t give a crap what we do.

Startups tend to get romanticized (and I’m about to romanticize them out the wazoo), but working at one definitely isn’t for everyone. The salary isn’t as good, the job security is going to suck, it’s tons more work and investment than a “normal” company, and in all likelihood, after pouring your heart and soul into it for years, it’ll flop.

On the other hand, working at a startup is awesome. You get to do everything: I’ve done C socket programming and jQuery and everything in between. I’m two years out of school and manage release cycles and user communities. I’ve gotten to travel everywhere from Belgium to Brazil and written a book.

It’s a great match if you like being independent: not the Rambo-“don’t tie me down, baby”-independent, the “::snerk::, I like dinosaurs so I wrote a research paper on sauropods”-independent. You have to be willing to work hard under your own steam.

Your resume

Don’t: have a boring resume.

Your resume should prove that we are fools if we don’t bring you in for an interview.

If yours doesn’t, think about what your dream job would look for on your resume. Open source development? Independent research? A penchant for robot design? Now go out and get that stuff on your resume.

Don’t use fluffy language, your resume is going to be read by programmers, not managers. “Did in-depth research to enable optimization of processes” is going to make us groan. “Made a genome-crunching aggregation script 50 times faster by researching how Java memory allocation works” is going to make us go “cool!” Have you done other optimization research? Do you like benchmarking? Do you know a lot about Java internals? Heck, tell us about the human genome.

Your interview is going to be a lot more fun for everyone involved (and much more likely to actually occur) if you make us think, “this person sounds really interesting, I want to talk to them.”

When I was in college I had no idea what I wanted to do, other than a vague idea of “solving interesting problems.” So, you don’t exactly have to be dedicated to the cause to get a job at a startup. Just express some enthusiasm for what they do, write a kick-ass resume, and the rest is up to your technical ability.

Oh, and by the way: if you’re looking for an awesome job, 10gen is recruiting!

Getting to Know Your Oplog

blinkdog — Keeping with the theme: a blink dog.

This is the second in a series of three posts on replication internals. We’ve already covered what’s stored in the oplog, today we’ll take a closer look at what the oplog is and how that affects your application.

Our application could do billions of writes and the oplog has to record them all, but we don’t want our entire disk consumed by the oplog. To prevent this, MongoDB makes the oplog a fixed-size, or capped, collection (the oplog is actually the reason capped collections were invented).

When you start up the database for the first time, you’ll see a line that looks like:

Mon Oct 11 14:25:21 [initandlisten] creating replication oplog of size: 47MB... (use --oplogSize to change)

Your oplog is automatically allocated to be a fraction of your disk space. As the message suggests, you may want to customize it as you get to know your application.

Protip: you should make sure you start up arbiter processes with --oplogSize 1, so that the arbiter doesn’t preallocate an oplog. There’s no harm in it doing so, but it’s a waste of space as the arbiter will never use it.

Implications of using a capped collection

The oplog is a fixed size so it will eventually fill up. At this point, it’ll start overwriting the oldest entries, like a circular queue.

It’s usually fine to overwrite the oldest operations because the slaves have already copied and applied them. Once everyone has an operation there’s no need to keep it around. However, sometimes a slave will fall very far behind and “fall off” the end of the oplog: the latest operation it knows about is before the earliest operation in the master’s oplog.

oplog time ->

   ^         ^    ^        ^
   |         |    |        |
      slave         master

If this occurs, the slave will start giving error messages about needing to be resynced. It can’t catch up to the master from the oplog anymore: it might miss operations between the last oplog entry it has and the master’s oldest oplog entry. It needs a full resync at this point.

Resyncing

On a resync or an initial sync, the slave will make a note of the master’s current oplog time and call the copyDatabase command on all of the master’s databases. Once all of the master’s databases have been copied over, the slave makes a note of the time. Then it applies all of the oplog operations from the time the copy started up until the end of the copy.

Once it has completed the copy and run through the operations that happened during the copy, it is considered resynced. It can now begin replicating normally again. If so many writes occur during the resync that the slave’s oplog cannot hold them all, you’ll end up in the “need to resync” state again. If this occurs, you need to allocate a larger oplog and try again (or try it at a time when the system is has less traffic).

Next up: using the oplog in your application.

Replication Internals

displacer-beast — Displacer beast... seemed related (it's sort of in two places at the same time).

This is the first in a three-part series on how replication works.

Replication gives you hot backups, read scaling, and all sorts of other goodness. If you know how it works you can get a lot more out of it, from how it should be configured to what you should monitor to using it directly in your applications. So, how does it work?

MongoDB’s replication is actually very simple: the master keeps a collection that describes writes and the slaves query that collection. This collection is called the oplog (short for “operation log”).

The oplog

Each write (insert, update, or delete) creates a document in the oplog collection, so long as replication is enabled (MongoDB won’t bother keeping an oplog if replication isn’t on). So, to see the oplog in action, start by running the database with the –replSet option:

$ ./mongod --replSet funWithOplogs

Now, when you do operations, you’ll be able to see them in the oplog. Let’s start out by initializing out replica set:

> rs.initiate()

Now if we query the oplog you’ll see this operation:

> use local
switched to db local
> db.oplog.rs.find()
{ 
    "ts" : { "t" : 1286821527000, "i" : 1 }, 
    "h" : NumberLong(0), 
    "op" : "n", 
    "ns" : "", 
    "o" : { "msg" : "initiating set" } 
}

This is just an informational message for the slave, it isn’t a “real” operation. Breaking this down, it contains the following fields:

ts: the time this operation occurred.
h: a unique ID for this operation. Each operation will have a different value in this field.
op: the write operation that should be applied to the slave. n indicates a no-op, this is just an informational message.
ns: the database and collection affected by this operation. Since this is a no-op, this field is left blank.
o: the actual document representing the op. Since this is a no-op, this field is pretty useless.

To see some real oplog messages, we’ll need to do some writes. Let’s do a few simple ones in the shell:

> use test
switched to db test
> db.foo.insert({x:1})
> db.foo.update({x:1}, {$set : {y:1}})
> db.foo.update({x:2}, {$set : {y:1}}, true)
> db.foo.remove({x:1})

Now look at the oplog:

> use local
switched to db local
> db.oplog.rs.find()
{ "ts" : { "t" : 1286821527000, "i" : 1 }, "h" : NumberLong(0), "op" : "n", "ns" : "", "o" : { "msg" : "initiating set" } }
{ "ts" : { "t" : 1286821977000, "i" : 1 }, "h" : NumberLong("1722870850266333201"), "op" : "i", "ns" : "test.foo", "o" : { "_id" : ObjectId("4cb35859007cc1f4f9f7f85d"), "x" : 1 } }
{ "ts" : { "t" : 1286821984000, "i" : 1 }, "h" : NumberLong("1633487572904743924"), "op" : "u", "ns" : "test.foo", "o2" : { "_id" : ObjectId("4cb35859007cc1f4f9f7f85d") }, "o" : { "$set" : { "y" : 1 } } }
{ "ts" : { "t" : 1286821993000, "i" : 1 }, "h" : NumberLong("5491114356580488109"), "op" : "i", "ns" : "test.foo", "o" : { "_id" : ObjectId("4cb3586928ce78a2245fbd57"), "x" : 2, "y" : 1 } }
{ "ts" : { "t" : 1286821996000, "i" : 1 }, "h" : NumberLong("243223472855067144"), "op" : "d", "ns" : "test.foo", "b" : true, "o" : { "_id" : ObjectId("4cb35859007cc1f4f9f7f85d") } }

You can see that each operation has a ns now: “test.foo”. There are also three operations represented (the op field), corresponding to the three types of writes mentioned earlier: i for inserts, u for updates, and d for deletes.

The o field now contains the document to insert or the criteria to update and remove. Notice that, for the update, there are two o fields (o and o2). o2 give the update criteria and o gives the modifications (equivalent to update()‘s second argument).

Using this information

MongoDB doesn’t yet have triggers, but applications could hook into this collection if they’re interested in doing something every time a document is deleted (or updated, or inserted, etc.) Part three of this series will elaborate on this idea.

Next up: what the oplog is and how syncing works.

Scaling, scaling everywhere

Interested in learning more about scaling MongoDB? Pick up September’s issue of PHP|Architect magazine, the database issue! I wrote an article on scaling your MongoDB database: how to choose good indexes, help handle load using replication, and set up sharding correctly (it’s not PHP-specific).

If you prefer multimedia, I also did an O’Reilly webcast on scaling MongoDB, which you can watch below:

Unfortunately, I had some weird lag problems throughout and at the end it totally cut my audio, so I didn’t get to all of the questions. I asked the O’Reilly people to send me the unanswered questions, so I’ll post the answers as soon as they do (or you can post it again in the comments below).