Firesheep: Internet Snooping made Easy

A demo of Firesheep, courtesy of a fellow bus rider

If you use an open wifi network, people around you can see what you’re doing. They not only can look at your accounts, but log in as you with a double click. Even if you’re non-technical (especially if you’re non-technical!) you should know how this works and how to protect your accounts. Here’s what’s happening:

When you use wireless internet you are sending information through the air from your computer to a router* somewhere. This information is like broadcasting your own little radio station: it can be picked up and seen by anyone in the area. The problem is, your radio station is broadcasting you checking email, updating your OkCupid profile, writing stupid messages to friends on Facebook… activities that you don’t want random “listeners” to know about.

To keep your radio station private, websites support encoding all of the data you send so it looks like gibberish to anyone on the outside. So, when you sign into Gmail (or Amazon or Chase) your computer turns your username and password into gibberish and sends it into the air. The website receives the message, decodes the gibberish, and says “Now that you’ve given me your credentials, I’ll assume you’re Joe Shmoe if you give me the unlikely combination of digits ‘874328972387498234’ every time you make a request.” And then most sites stop encoding anything.

So, when you post a status update to your wall, you send along “874328972387498234” as clear as day and Facebook says “Aha, it’s you. Okay, I’ll post that.”

However, remember that you’re broadcasting this on your own personal radio station. Well, someone finally built a tuner, called Firesheep. If you have Firesheep installed and you sit down in a coffeeshop (or anywhere with an open wifi network), you are logged in as everyone around you to every site the other patrons are visiting.

Important takeaways for non-geeks:

  • Don’t access any accounts you care about via a public wifi connection. There is an embarrassingly long list of sites built into Firesheep: Amazon, Cisco, Facebook, Flickr, Google, New York Times, Twitter, WordPress, Yahoo, and many others. My mom could figure out how to use Firesheep and it would take a geek ~10 minutes to add a new site.
  • This “hack” cannot be patched globally by flipping a switch. Each website needs to fix itself. It is analogous to a locksmith discovering that every lock can be unlocked by whistling at it: everyone needs to go and improve their locks, we can’t outlaw whistling.
  • There’s no easy way, other than not using your accounts, to prevent people from seeing what you’re doing. The easiest ways I can think of off the top of my head are setting up Tor or a VPN, which are beyond the abilities (or at least interest) of most non-geeks I know.
  • Gmail encodes everything, by default. Your Google account will pop up in Firesheep (see the screenshot above), but people won’t actually be able to access your email. Also, any bank or reasonably professional payment system will be secure (look for the little lock symbol in the corner of your browser or https:// in the address bar). You can log into someone’s Amazon account with Firesheep, but you can’t do any payment stuff.

The code for Firesheep is open source and available on Github. You can try it out by starting up Firefox, downloading Firesheep, going to File->Open File and selecting the file you just downloaded. You may have to select View->Sidebar->Firesheep if it doesn’t pop up automatically.

That’s it, it’s ready to start capturing data from other people on your wifi network.

* Geeks: I know it’s not necessarily a router, but most lay people know that a router is where internet comes out and it’s close enough.

Bending the Oplog to Your Will

Brains...

Part 3 of the replication internals series: three handy tricks.

This is the third post in a three-part series on replication. See also parts 1 (replication internals) and 2 (getting to know your oplog).

DIY triggers

MongoDB has a type of query that behaves like the tail -f command: it shows you new data as it’s written to a collection. This is great for the oplog, where you want to see new records as they pop up and don’t want to query over and over.

If you want this type of ongoing query, MongoDB returns a tailable cursor. When this cursor gets to the end of the result set it will hang around and wait for more elements to be added to the collection. As they’re added, the cursor will return them. If no elements are added for a while, the cursor will time out and the client has to requery if they want more results.

Using your knowledge of the oplog’s format, you can use a tailable cursor to do a long pull for activities in a certain collection, of a certain type, at a certain time… almost any criteria you can imagine.

Using the oplog for crash recovery

Suppose your database goes down, but you have a fairly recent backup. You could put a backup into production, but it’ll be a bit behind. You can bring it up-to-date using your oplog entries.

If you use the trigger mechanism (described above) to capture the entire oplog and send it to a non-capped collection on another server, you can then use an oplog replayer to play the oplog over your dump, bringing it as up-to-date as possible.

Pick a time pre-dump and start replaying the oplog from there. It’s okay if you’re not sure exactly when the dump was taken because the oplog is idempotent: you can apply it to your data as many times as you want and your data will end up the same.

Also, warning: I haven’t tried out the oplog replayer I linked to, it’s just the first one I found. There are a few different ones out there and they’re pretty easy to write.

Creating non-replicated collections

The local database contains data that is local to a given server: it won’t be replicated anywhere. This is one reason why it holds all of the replication info.

local isn’t reserved for replication stuff: you can put your own data there, too. If you do a write in the local database and then check the oplog, you’ll notice that there’s no record of the write. The oplog doesn’t track changes to the local database, since they won’t be replicated.

And now I’m all replicated out. If you’re interested in learning more about replication, check out the core documentation on it. There’s also core documentation on tailable cursors and language-specific instructions in the driver documentation.

How not to get a job with a startup

Hugh Mongoose wants you
10gen is in super-recruiting mode, trying to scoop up all the great graduates before Google and Microsoft absorb them. I’ve been doing what feels like endless recruiting activities, and I’ve noticed that lot of applicants shoot themselves in the foot. So, here’s what not to do:

First contact

Don’t: contact the startup before you know what they do. I’ve recruited at a couple college job fairs and almost everyone comes up and says, “Hi, I’m a masters student in computer science and I’m looking for a job. Can I give you my resume?” Yes, you can, and I’ll put it on the pile of 200 other resumes.

Also, please don’t walk me through your resume line-by-line: it’s boring. I’ll hate you and I won’t be able to think of a polite way of cutting you off.

Do: say, “I love MongoDB! I’ve been using it with Ruby for <some project> and I would love to work on it full time! I’m really interested in replication/sharding/geospatial/etc. stuff!” Keep in mind: you’re talking to startup employees. Working is our life (which sounds depressing, but we’re doing what we love). It’s annoying to have people apply who are looking for a job, any job, and obviously don’t give a crap what we do.

Startups tend to get romanticized (and I’m about to romanticize them out the wazoo), but working at one definitely isn’t for everyone. The salary isn’t as good, the job security is going to suck, it’s tons more work and investment than a “normal” company, and in all likelihood, after pouring your heart and soul into it for years, it’ll flop.

On the other hand, working at a startup is awesome. You get to do everything: I’ve done C socket programming and jQuery and everything in between. I’m two years out of school and manage release cycles and user communities. I’ve gotten to travel everywhere from Belgium to Brazil and written a book.

It’s a great match if you like being independent: not the Rambo-“don’t tie me down, baby”-independent, the “::snerk::, I like dinosaurs so I wrote a research paper on sauropods”-independent. You have to be willing to work hard under your own steam.

Your resume

Don’t: have a boring resume.

Your resume should prove that we are fools if we don’t bring you in for an interview.

If yours doesn’t, think about what your dream job would look for on your resume. Open source development? Independent research? A penchant for robot design? Now go out and get that stuff on your resume.

Don’t use fluffy language, your resume is going to be read by programmers, not managers. “Did in-depth research to enable optimization of processes” is going to make us groan. “Made a genome-crunching aggregation script 50 times faster by researching how Java memory allocation works” is going to make us go “cool!” Have you done other optimization research? Do you like benchmarking? Do you know a lot about Java internals? Heck, tell us about the human genome.

Your interview is going to be a lot more fun for everyone involved (and much more likely to actually occur) if you make us think, “this person sounds really interesting, I want to talk to them.”

When I was in college I had no idea what I wanted to do, other than a vague idea of “solving interesting problems.” So, you don’t exactly have to be dedicated to the cause to get a job at a startup. Just express some enthusiasm for what they do, write a kick-ass resume, and the rest is up to your technical ability.

Oh, and by the way: if you’re looking for an awesome job, 10gen is recruiting!

Getting to Know Your Oplog

Keeping with the theme: a blink dog.
This is the second in a series of three posts on replication internals. We’ve already covered what’s stored in the oplog, today we’ll take a closer look at what the oplog is and how that affects your application.

Our application could do billions of writes and the oplog has to record them all, but we don’t want our entire disk consumed by the oplog. To prevent this, MongoDB makes the oplog a fixed-size, or capped, collection (the oplog is actually the reason capped collections were invented).

When you start up the database for the first time, you’ll see a line that looks like:

Mon Oct 11 14:25:21 [initandlisten] creating replication oplog of size: 47MB... (use --oplogSize to change)

Your oplog is automatically allocated to be a fraction of your disk space. As the message suggests, you may want to customize it as you get to know your application.

Protip: you should make sure you start up arbiter processes with --oplogSize 1, so that the arbiter doesn’t preallocate an oplog. There’s no harm in it doing so, but it’s a waste of space as the arbiter will never use it.

Implications of using a capped collection

The oplog is a fixed size so it will eventually fill up. At this point, it’ll start overwriting the oldest entries, like a circular queue.

It’s usually fine to overwrite the oldest operations because the slaves have already copied and applied them. Once everyone has an operation there’s no need to keep it around. However, sometimes a slave will fall very far behind and “fall off” the end of the oplog: the latest operation it knows about is before the earliest operation in the master’s oplog.

oplog time ->

   ^         ^    ^        ^
   |         |    |        |
      slave         master

If this occurs, the slave will start giving error messages about needing to be resynced. It can’t catch up to the master from the oplog anymore: it might miss operations between the last oplog entry it has and the master’s oldest oplog entry. It needs a full resync at this point.

Resyncing

On a resync or an initial sync, the slave will make a note of the master’s current oplog time and call the copyDatabase command on all of the master’s databases. Once all of the master’s databases have been copied over, the slave makes a note of the time. Then it applies all of the oplog operations from the time the copy started up until the end of the copy.

Once it has completed the copy and run through the operations that happened during the copy, it is considered resynced. It can now begin replicating normally again. If so many writes occur during the resync that the slave’s oplog cannot hold them all, you’ll end up in the “need to resync” state again. If this occurs, you need to allocate a larger oplog and try again (or try it at a time when the system is has less traffic).

Next up: using the oplog in your application.

Replication Internals

Displacer beast... seemed related (it's sort of in two places at the same time).

This is the first in a three-part series on how replication works.

Replication gives you hot backups, read scaling, and all sorts of other goodness. If you know how it works you can get a lot more out of it, from how it should be configured to what you should monitor to using it directly in your applications. So, how does it work?

MongoDB’s replication is actually very simple: the master keeps a collection that describes writes and the slaves query that collection. This collection is called the oplog (short for “operation log”).

The oplog

Each write (insert, update, or delete) creates a document in the oplog collection, so long as replication is enabled (MongoDB won’t bother keeping an oplog if replication isn’t on). So, to see the oplog in action, start by running the database with the –replSet option:

$ ./mongod --replSet funWithOplogs

Now, when you do operations, you’ll be able to see them in the oplog. Let’s start out by initializing out replica set:

> rs.initiate()

Now if we query the oplog you’ll see this operation:

> use local
switched to db local
> db.oplog.rs.find()
{ 
    "ts" : { "t" : 1286821527000, "i" : 1 }, 
    "h" : NumberLong(0), 
    "op" : "n", 
    "ns" : "", 
    "o" : { "msg" : "initiating set" } 
}

This is just an informational message for the slave, it isn’t a “real” operation. Breaking this down, it contains the following fields:

  • ts: the time this operation occurred.
  • h: a unique ID for this operation. Each operation will have a different value in this field.
  • op: the write operation that should be applied to the slave. n indicates a no-op, this is just an informational message.
  • ns: the database and collection affected by this operation. Since this is a no-op, this field is left blank.
  • o: the actual document representing the op. Since this is a no-op, this field is pretty useless.

To see some real oplog messages, we’ll need to do some writes. Let’s do a few simple ones in the shell:

> use test
switched to db test
> db.foo.insert({x:1})
> db.foo.update({x:1}, {$set : {y:1}})
> db.foo.update({x:2}, {$set : {y:1}}, true)
> db.foo.remove({x:1})

Now look at the oplog:

> use local
switched to db local
> db.oplog.rs.find()
{ "ts" : { "t" : 1286821527000, "i" : 1 }, "h" : NumberLong(0), "op" : "n", "ns" : "", "o" : { "msg" : "initiating set" } }
{ "ts" : { "t" : 1286821977000, "i" : 1 }, "h" : NumberLong("1722870850266333201"), "op" : "i", "ns" : "test.foo", "o" : { "_id" : ObjectId("4cb35859007cc1f4f9f7f85d"), "x" : 1 } }
{ "ts" : { "t" : 1286821984000, "i" : 1 }, "h" : NumberLong("1633487572904743924"), "op" : "u", "ns" : "test.foo", "o2" : { "_id" : ObjectId("4cb35859007cc1f4f9f7f85d") }, "o" : { "$set" : { "y" : 1 } } }
{ "ts" : { "t" : 1286821993000, "i" : 1 }, "h" : NumberLong("5491114356580488109"), "op" : "i", "ns" : "test.foo", "o" : { "_id" : ObjectId("4cb3586928ce78a2245fbd57"), "x" : 2, "y" : 1 } }
{ "ts" : { "t" : 1286821996000, "i" : 1 }, "h" : NumberLong("243223472855067144"), "op" : "d", "ns" : "test.foo", "b" : true, "o" : { "_id" : ObjectId("4cb35859007cc1f4f9f7f85d") } }

You can see that each operation has a ns now: “test.foo”. There are also three operations represented (the op field), corresponding to the three types of writes mentioned earlier: i for inserts, u for updates, and d for deletes.

The o field now contains the document to insert or the criteria to update and remove. Notice that, for the update, there are two o fields (o and o2). o2 give the update criteria and o gives the modifications (equivalent to update()‘s second argument).

Using this information

MongoDB doesn’t yet have triggers, but applications could hook into this collection if they’re interested in doing something every time a document is deleted (or updated, or inserted, etc.) Part three of this series will elaborate on this idea.

Next up: what the oplog is and how syncing works.

Scaling, scaling everywhere

Interested in learning more about scaling MongoDB? Pick up September’s issue of PHP|Architect magazine, the database issue! I wrote an article on scaling your MongoDB database: how to choose good indexes, help handle load using replication, and set up sharding correctly (it’s not PHP-specific).

If you prefer multimedia, I also did an O’Reilly webcast on scaling MongoDB, which you can watch below:

Unfortunately, I had some weird lag problems throughout and at the end it totally cut my audio, so I didn’t get to all of the questions. I asked the O’Reilly people to send me the unanswered questions, so I’ll post the answers as soon as they do (or you can post it again in the comments below).

Writing MongoDB: The Definitive Guide

Me, with the finished product
MongoDB: The Definitive Guide is now available in bookstores everywhere! (Or at least on Amazon.) Please pick up a copy!

Some interesting things I learned about the process of publishing:

There are professional indexers who write the index.
This amazes me, because we had to proofread our index and I’ve never been so bored in my life. These people must have the exact opposite personality I do. And, in our case, they spelled “Ruby gems” as “Ruby germs.”
Blog posts are a better length
In 500 words, I can edit and polish something until it’s a shimmering jewel of a, uh, blog post. It’s really hard to make a hundred thousand words even have a reasonable flow, never mind be “perfect.”
Illustrations will be assimilated.
When we submitted the manuscript, I had (the night before) whipped up the illustrations in Photoshop that looked like this:

Every document is a beautiful snowflake (because they're all unique)

At the final stage of the editing process, these all got replaced by O’Reilly illustrations, which looked a lot more professional.

Well la-dee-da.

I’m pretty impressed by how well they matched what I was going for, but wish I hadn’t spent so long making those damn snowflakes.

An advance is an advance on sales.

In retrospect, I should have realized this, but I never really thought about it before. If O’Reilly advanced us $100,000 (they didn’t), that just means we wouldn’t get any royalty checks until people bought enough books to give us $100k in royalties. So, essentially, authors write books for free. This kind of amazes me.

All and all, it was really fun and I’d do it again in a heartbeat. In the future, I wouldn’t stick to the schedule quite as rigorously. At the beginning, O’Reilly gave us the following timeline:

  • 3 months = 2 chapters
  • 6 months = first half
  • 9 months = whole book

I write best when I splorch down everything that comes to me as fast as possible and then edit it fifty times. So next time I’d do:

  • 3 months = book of crap
  • 6 months = semi-literate book
  • 9 months = great American (technical) novel.

Andrew suggested we do the National Novel Writing Month, so now I’m trying to think of another thing to write about. I’ll probably do a MongoDB book, but not sure what yet…

Choose your own adventure: MongoDB crash recovery edition

Suppose your application is happily talking to MongoDB and your laptop battery runs out. Or your server bursts into flame. Or velociraptors attack your data center. What now?

To bring your server back up, read through the text until you get to a bold question. Click on the answer that best matches your situation to see the instructions. When you’ve finished an “adventure,” there’ll be a link to bring you back to the top (here).

Is your server physically okay?

Recovering a physically damaged server.

This is beyond the scope of this article. Get a new server and then…

Do you have a backup?

Don’t recover.

If you didn’t do any writes during the session that shutdown uncleanly (this has happened to people), your data is fine. Remove the mongod.lock file and start your database back up.

Try another adventure.

Seriously?

Recover from a backup.

If you have a recent backup, recovery is easy. Remove the entire data directory, replace with the backup. Start the database.

Try another adventure.

Did you do any writes during your last session?

Single server “recovery”

If you have a single instance that shut down uncleanly, you may lose data! Use this as a painful learning experience and always run with replication in the future.

Since you only have this one copy of your data, you’ll have to repair whatever is there. Remove the mongod.lock file and start the database with –repair and any other options you usually use (if you usually use a config file or dbpath, put that in). repair has no way of knowing where mongod put data unless you tell it. It can’t repair data unless it can find it.

Please do not just remove the mongod.lock file and start the database back up. If you’ve got corrupt data, the database will start up fine but you’ll start getting really weird errors when you try to read data. The annoying mongod.lock file is there for a reason!

repair will make a full copy of the uncorrupted data (make sure you have enough disk space!) and remove any corrupted data. It can take a while for large data sets because it looks at every field of every document.

Note that MongoDB keeps a sort of “table of contents” at the beginning of your collection. If it was updating its table of contents when the unclean shutdown happened, it might not be able to find a lot of your data. This is how the “I lost all my documents!” horror stories happen. Run with replication, people!

Better luck next time.

Are you on EBS?

You ran with replication!

Thank you, you get a lollipop! There are lots of ways to recover with various levels of swiftness and ease, but first you need a master. If you are running a replica set (with or without sharding), you don’t need to do anything, the promotion will happen automatically (and you don’t need to change anything in your application, that will failover automatically, too).

If you’re not running a replica set, shut down your slave and restart it with –master. Point your application to the new master.

When you start back up the server that crashed, the way you should start it depends on if you’re using master-slave or replica sets. If you’re using master-slave, start your database back up with –slave and –source pointing to the new master. If you’re running a replica set, just start it with the same arguments you used before.

Are you in a hurry?

Recover quickly without a backup nor messing with your currently up servers.

Here’s where things stand: you have data at point A and you want to get it to point B. If you don’t have a backup, you’re going to have to create a snapshot of whatever’s at A and send it to B. To take a snapshot, you’ll have to make sure the files at A are in a consistent state, so you’ll have to suck it up and fsync and lock it. Or you can use replication, but that’ll take longer.

Next time, make a backup.

Now that you’ve thought it over…

Are you willing to make a server read-only for a bit?

Recover via file system snapshot.

This is generally super-fast, but it might not be supported by your filesystem.

If you’re running on EBS or using ZFS, you can take a file system snapshot of the new master and put it on the server that crashed. Then, start up mongod.

Try another adventure.

How big is your data?

Recover via replication.

This way is the easiest, but it’s also the slowest.

Remove everything in the data directory and start the database back up again. It’ll resync all of the data from the new master.

Try another adventure.

How about XFS (or some other files system that lets you take snapshots)?

Recover with –fastsync.

If you don’t mind making your new master read-only for a bit, you can get your other server back up pretty quickly and easily. First, fsync and lock the master, take a dump its files (or a snapshot, as described above) and put them on the server that went down. Start back up with –fastsync and unlock the master.

Try another adventure.

Pre-create the oplog.

If you have hundreds of gigabytes of data, syncing from scratch may not be practical and the amount of data might be too big to throw around in backups. This way is trickier, but faster than syncing from scratch (unless you’re using ext4, where this won’t give you any added benefit).

Wipe the data directory, then pre-create the local.* files. Make them ~20% of your data size, so if you have 100GB, make 20GB of local files:

for i in {0..10}; do
      echo $i
      head -c 2146435072 /dev/zero > /data/db/local.$i
done

Now start mongod up with an oplog size a bit smaller than the one you just created, e.g., –oplog 17000. It’ll still have to resync, but it’ll cut down on the file preallocation time.

Try another adventure.

Were you running with replication?

Recover via postal service.

If your data is unmovable, it’s unmovable. If you really have that much data, you can get pretty good data transfer rates by priority shipping it on disks. (And don’t feel too ridiculous, even Google does this, sometimes.)

Try another adventure.

Oh, the Mistakes I’ve Seen

A slow database is easily fixed
If you make good choices of fields indexed.
Sometimes the answer is simpler still,
A quick code change may fit the bill.

I’ll be giving an O’Reilly webcast, Scaling with MongoDB, on Friday (9/17). Please sign up if you’re interested in learning some more advanced optimization than what this post gets into. This webcast is, in part, to pimp MongoDB: The Definitive Guide, which will be coming out next week!

These are a few basic tips on making your application better/faster/stronger without knowing anything about indexes or sharding.

Connecting

Connecting to the database is a (relatively) expensive operation. Try to minimize the number of times you connect and disconnect: use persistent connections or connection pooling (depending on your language).

To not waste connections, you have to know what your driver is doing. I see a lot of code like this in PHP:

$connection = new Mongo();
$connection->connect();

What this does is:

  1. The constructor connects to the database.
  2. connect() sees that you’re already connected, assumes you want to reset the connection.
  3. Disconnects from the database.
  4. Connects again.

Gah! You just doubled your execution time.

ObjectIds

ObjectIds seem to make people vaguely uncomfortable, so they convert their ObjectIds into strings (the macaroni and cheese of data types). The problem is, an ObjectId takes up 12 bytes but its string representation takes up 29 bytes (almost two and a half times bigger). The lesson: suck it up and eat your spinachy ObjectIds. You’ll learn to like ’em.

Also, an ObjectId won’t sneakily convert itself into a string on the fly. I see a lot of code like:

id = new ObjectId();
db.foo.insert({"_id" : new ObjectId(id)});
// or, even sillier
db.foo.insert({"_id" : new ObjectId(id.toString())});

If you created an ObjectId and haven’t messed with it, it’s still an ObjectId.

Numbers vs. Strings

MongoDB is type-sensitive and it’s important to use the correct type: numbers for numeric values and strings for strings.

If you have large numbers and you save them as strings (“1234567890” instead of 1234567890), MongoDB may slow down as it strcmps the entire length of the number instead of doing a quicker numeric comparison. Also, “12” is going to be sorted as less than “9”, because MongoDB will use string, not numeric, comparison on the values. This can lead to some surprising results.

Driver-specific

Find out if you’re driver is particularly weaknesses (or strengths). For instance, the Perl driver is one of the fastest drivers, but it sucks at decoding Date types (Perl’s DateTime objects take a long time to create). So, if you want fast Perl programs, avoid dates like the plague or you’ll be puttering along with the Ruby programmers. (Just kidding, Rubyists! Sort of.)

The most important thing is to get to know your language’s documentation and ask if you have any questions.

A Quick Intro to mongosniff

Writing an application on top of a framework on top of a driver on top of the database is a bit like playing telephone: you say “insert foo” and the database says “purple monkey dishwasher.” mongosniff lets you see exactly what the database is hearing and saying.

It comes with the binary distribution, so if you have mongod you should have mongosniff.

To try it out, first start up an instance of mongod normally:

$ ./mongod

When you start up mongosniff you have to tell it to listen on the loopback (localhost) interface. This interface is usually called “lo”, but my Mac calls it “lo0”, so run ifconfig to make sure you have the name right. Now run:

$ sudo ./mongosniff --source NET lo
sniffing... 27017 

Note the “sudo”: this has never worked for me from my user account, probably because of some stupid network permissions thing.

Now start up the mongo shell and try inserting something:

> db.foo.insert({x:1})

If you look at mongosniff‘s output, you’ll see:

127.0.0.1:57856  -->> 127.0.0.1:27017 test.foo  62 bytes  id:430131ca   1124151754
        insert: { _id: ObjectId('4c7fb007b5d697849addc650'), x: 1.0 }
127.0.0.1:57856  -->> 127.0.0.1:27017 test.$cmd  76 bytes  id:430131cb  1124151755
        query: { getlasterror: 1.0, w: 1.0 }  ntoreturn: -1 ntoskip: 0
127.0.0.1:27017  <<--  127.0.0.1:57856   76 bytes  id:474447bf  1195657151 - 1124151755
        reply n:1 cursorId: 0
        { err: null, n: 0, wtime: 0, ok: 1.0 }

There are three requests here, all for one measly insert. Dissecting the first request, we can learn:

source -->> destination
Our client, mongo in this case, is running on port 57856 and has sent a message to the database (127.0.0.1:27017).

db.collection
This request is for the “test” database’s “foo” collection.
length bytes
The length of the request is 62 bytes. This can be handy to know if your requests are edging towards the maximum request length (16 MB).
id:hex-id id
This is the request id in hexadecimal and decimal (in case you don’t have a computer handy, apparently). Every request to and from the database has a unique id associated with it for tax purposes.
op: content
This is the actual meat of the request: we’re inserting this document. Notice that it’s inserting the float value 1.0, even though we typed 1 in the shell. This is because JavaScript only has one number type, so every number typed in the shell is converted to a double.

The next request in the mongosniff output is a database command: it checks to make sure the insert succeeded (the shell always does safe inserts).

The last message sniffed is a little different: it is going from the database to the shell. It is the database response to the getlasterror command. It shows that there was only one document returned (reply n:1) and that there are no more results waiting at the database (cursorId: 0). If this were a “real” query and there was another batch of results to be sent from the database, cursorId would be non-zero.

Hopefully this will help some people decipher what the hell is going on!