Kristina Chodorow's Blog

The Scripting Language of Databases

One of the most common questions non-users ask is “Why should I use MongoDB?”

There are a bunch of fancy answers: you can scale it (webscale!), you can use it for MapReduce, you can store files in it. Those things are all true, but every database worth its salt can scale (there are MySQL clusters with tens of thousands of nodes), every new-ish database I know of supports MapReduce, and filesystems are pretty good at storing files.

I think the reason is much simpler:

MongoDB gets out of your way.

For example, a user (on IRC) asked, “How do I store a form’s data in MongoDB?” Based on the question, I assumed he was using PHP, so I pasted the following three lines:

$m = new Mongo();
$m->test->formResults->insert($_POST);
print_r($m->test->formResults->findOne());

“Hey, it works!” he said.

(For those of you not familiar with MongoDB (or PHP), this stores and retrieves everything in the POST request and can be run with no prior database setup.*)

So, all of the bells and whistles are nice, but I think the real benefit is the simplicity. MongoDB is the scripting language of databases: you can just get stuff done, fast.

In this spirit, 10gen’s first monthly blogging contest topic was to write about something you developed quickly using MongoDB. The entries were cool: people built really interesting applications ridiculously fast.

mojology-1 — A screenshot of Mojology displaying syslog entries

Some of my favorites entries were:

BugRocket’s Rapid(-ish) Development

Ryan Funduk wrote about creating a bug tracker.

“Without MongoDB, I would have easily racked up over a thousand database migrations.”

The Birth of Mojology

Gergely Nagy built an open source application for viewing and doing statistics on syslog messages.

“About four hours [after installing MongoDB], I posted the first version of my mongodb destination driver to the syslog-ng mailing list.”

From 0 to 1 Million in 6 Hours

Bradley Grzesiak wrote about programming VoiceRally.

“Friday, the day after VoiceRally was written, we sent over 1.5 million WebSocket messages.”

Family Spoon and MongoDB

Tom Maiaroto writes about creating a recipe website.

“Yes, you need to be aware of “schema” and you can’t go hog wild, but you also get more forgiveness and MongoDB works with you to solve your problems. It’s very flexible. It’s not something that you need to work around, it’s something that you get to work with. Anytime that you have a situation like that as a developer, your day is going to be much more happy and productive.”

Check out the other entries, as well. It’s too bad we can only choose one to win!

This month, we’re asking people to write about an open source app using MongoDB and the prize is an iPad2!

Edited to add: some of the commenters are upset about my advice to store $_POST in MongoDB. You should not store any user input unsanitized. For people familiar with SQL, the code above does not allow a traditional injection attack with MongoDB (as it would with SQL). After the first flush of success, I told the guy to not do it this way and to go read the documentation. Inserting $_POST was a learning tool, not a solution, and I tried to make that clear over IRC, if not in this post.

Lorenz University: I can has degree?

2007-11-27-357parking — Click on the image to see the original (full size) version in a new window. Big thanks to Wondermark for allowing people to post their comics!

Misadventures in HR (an hilarious blog about… HR) mentioned Lorenz University, a degree mill. I’d never heard of a degree mill before, so I wanted to see how legit it looked from a computer scientist’s point of view.

whois

Every site on the internet has to register contact information the king of the internet, so you can see who’s behind a website. Anyone can look up this info by running “whois domain” on their computer. For example, here are some legit universities’ info:

$ whois nyu.edu
Registrant:
   New York University
   ITS Communications Operations Services
   7 East 12th Street, 5th Floor
   New York, NY 10003
   UNITED STATES
$ whois mit.edu
Registrant:
   Massachusetts Institute of Technology
   Cambridge, MA 02139
   UNITED STATES
$ whois ufl.edu
Registrant:
   University of Florida
   Computing and Network Services
   Space Sciences Research Building
   Gainesville, FL 32611-2050
   UNITED STATES

Most businesses, higher learning institutions, and pretty much any large, legitimate site has their actual address listed there. What does Lorenz U have?

$ whois lorenzuniversity.com
Registrant:
   Domains by Proxy, Inc.
   DomainsByProxy.com
   15111 N. Hayden Rd., Ste 160, PMB 353
   Scottsdale, Arizona 85260
   United States

Domains by Proxy is a service where you can pay them to keep your contact info a secret. It’s good for privacy, but it’s a bit unusual for a university.

Also, protip: most universities are not .com addresses.

Accreditation

At first glance, Lorenz University seem to have some good proof that they’re a valid, accredited institution:

Lorenz University holds valid accreditation from reputable accrediting agencies including IAAFOE and ACTDE. These agencies have clearly mentioned on their official websites that Lorenz University is fully approved by their evaluation committee.

But wait, I’ve never heard of the IAAFOE or the ACTDE. A quick Google search turns up the International Accreditation Association for Online Eduction and the Accreditation Council for Distance Education.

Okay, Lorenz University is accredited by someone, but let’s take a look at who.

$ whois iaafoe.org
Registrant Name:Registration Private
Registrant Organization:Domains by Proxy, Inc.
Registrant Street1:DomainsByProxy.com
Registrant Street2:15111 N. Hayden Rd., Ste 160, PMB 353
Registrant Street3:
Registrant City:Scottsdale
Registrant State/Province:Arizona
Registrant Postal Code:85260
Registrant Country:US

Huh, Domains by Proxy. Again.

$ whois actde.org
Registrant Name:Registration Private
Registrant Organization:Domains by Proxy, Inc.
Registrant Street1:DomainsByProxy.com
Registrant Street2:15111 N. Hayden Rd., Ste 160, PMB 353
Registrant Street3:
Registrant City:Scottsdale
Registrant State/Province:Arizona
Registrant Postal Code:85260
Registrant Country:US

And again! What are the chances?!

Now, let’s take a closer look at these accreditation sites. I used wget to download the entirety of both sites (somehow, I had the feeling that they wouldn’t be that big). Indeed, one site was 10 files and the other was 11:

$ wget -r http://iaafoe.org/
$ wget -r http://actde.org/

Looking at these files, we can see certain similarities:

$ ls actde.org/
ACTDE  CSS  index.asp  index.html  PDF  robots.txt
$ ls iaafoe.org/
IAAFOE  index.asp  index.html  PDF  robots.txt
$ 
$ # is robots.txt non-trivial?
$ wc -l iaafoe.org/robots.txt
30 iaafoe.org/robots.txt
$  diff -s iaafoe.org/robots.txt actde.org/robots.txt 
Files iaafoe.org/robots.txt and actde.org/robots.txt are identical

Also, there’s a funny “Members Login” link on the ACTDE site that—whoops—isn’t actually a link. How hard is it to make a login page that doesn’t log anyone in?

Conclusion

Lorenz University seems to have “accredited” themselves by creating two accreditation websites, and are trying to take advantage of people who think this will help them get a job.

What I’m really curious about is if they’ll accredit other bullshit. The accreditation sites seem to be non-interactive, and don’t have any way of taking money.

P.S. As long as I’m just picking on them… Lorenz University also bought the site lorenzuniversityscam.com, to defend against people calling them a scam. The scam site has a link, “Click here[sic] to visit the official website of Lorenz University and find out all the details about Lorenz University and the application process to get an accredited degrees.” They misspelled “university” in a link to their own site.

Edit: the

Posted bykchodorowMarch 19, 2011Posted inTechnologyTags:Lorenz University, scam25 Comments on Lorenz University: I can has degree?

Implementing Replica Set Priorities

Replica set priorities will, very shortly, be allowed to vary between 0.0 and 100.0. The member with the highest priority that can reach a majority of the set will be elected master. (The change is done and works, but is being held up by 1.8.0… look for it after that release.) Implementing priorities was kind of an interesting problem, so I thought people might be interested in how it works. Following in the grand distributed system lit tradition I’m using the island nation of Replicos to demonstrate.

Replicos is a nation of islands that elect a primary island, called the Grand Poobah, to lead them. Each island cast a vote (or votes) and the island that gets a majority of the votes wins poobahship. If no one gets a majority (out of the total number of votes), no one becomes Grand Poobah. The islands can only communicate via carrier seagull.

Healthy configurations of islands, completely connected via carrier seagulls.

However, due to a perpetual war with the neighboring islands of Entropos, seagulls are often shot down mid-flight, distrupting communications between the Replicos islands until new seagulls can be trained.

The people of Replicos have realized that some islands are better at Poobah-ing than others. Their goal is to elect the island with the highest Poobah-ing ability that can reach a majority of the other islands. If all of the seagulls can make it to their destinations and back, electing a Poobah becomes trivial: an island sends a message saying they want to be Grand Poobah and everyone votes for them or says “I’m better at Poobah-ing, I should be Poobah.” However, it becomes tricky when you throw the Entropos Navy into the mix.

So, let’s Entropos has shot down a bunch of seagulls, leaving us with only three seagulls:

The island with .5 Poobah ability should be elected leader (the island with 1 Poobah ability can’t reach a majority of the set). But how can .5 know that it should be Poobah? It knows 1.0 exists, so theoretically it could ask the islands it can reach to ask 1.0 if it wants to be Poobah, but it’s a pain to pass messages through multiple islands (takes longer, more chances of failure, a lot more edge cases to check), so we’d like to be able to elect a Poobah using only the directly reachable islands, if possible.

One possibility might be for the islands sent a response indicating if they were connected to an island with a higher Poobah ability. In the case above, this would work (only one island is connected to an island with higher Poobah ability, so it can’t have a majority), but what about this case:

Every island, other than .5, is connected to a 1.0, but .5 should be the one elected! So, suppose we throw in a bit more information (which island of higher priority can be reached) and let the island trying to elect itself figure things out? Well, that doesn’t quite work, what if both .5 and 1.0 can reach a majority, but not the same one?

Conclusion: the Poobah-elect can’t figure this out on their own, everyone needs to work together.

Preliminaries: define an island to be Poohable if it has any aptitude for Poobah-ing and can reach a majority of the set. An island is not Poohable if it has no aptitude for Poobah-ing and/or cannot reach a majority of the set. Islands can be more or less Poohable, depending on their aptitude for Poobah-ing.

Every node knows whether or not it, itself, is Poohable: it knows its leadership abilities and if it can reach a majority of the islands. If more than one island (say islands A and B) is Poohable, then there must be at least one island that can reach both A and B [Proof at the bottom].

Let’s have each island keep a list of “possible Poobahs.” So, say we have an island A, that starts out with an empty list. If A is Poohable, it’ll add itself to the list (if it stops being Poohable, it’ll remove itself from the list). Now, whenever A communicates with another island, the other island will either say “add me to your list” or “remove me from your list,” depending on whether it is currently Poohable or not. Every other island does the same, so now each island has a list of the Poohable islands it can reach.

Now, say island X tries to elect itself master. It contacts all of the islands it can reach for votes. Each of the voting islands checks its list: if it has an entry on it that is more Poohable than X, it’ll send a veto. Otherwise X can be elected master. If you check the situations above (and any other situation) you can see that Poohability works, due to the strength of the guarantee that a Poobah must be able to reach a majority of the set.

Proof: suppose a replica set has n members and a node A can reach a majority of the set (at least ⌊n/2+1⌋) and a node B can reach a majority of the set (again, ⌊n/2+1⌋). If the sets of members A and B can reach are disjoint, then there must be ⌊n/2+1⌋+⌊n/2+1⌋ = at least n+1 members in the set. Therefore the set of nodes that A can reach and the set of nodes that B can reach are not disjoint.

Posted bykchodorowMarch 7, 2011Posted inMongoDBTags:replica sets1 Comment on Implementing Replica Set Priorities

“Scaling MongoDB” Update

Vroom vroom

In the last couple weeks, we’ve been getting a lot of questions like: (no one asked this specific question, this is just similar to the questions we’ve been getting)

I ran shardcollection, but it didn’t return immediately and I didn’t know what was going on, so I killed the shell and tried deleting a shard and then running the ‘shard collection’ command again and then I started killing ops and then I turned the balancer off and then I turned it back on and now I’m not sure what’s going on…

Aaaaagh! Stop running commands!

If a single server is like a TIE fighter then a sharded cluster is like the Death Star: you’ve got more power but you’re not making any sudden movements. For any configuration change you make, at least four servers have to talk to each other (usually more) and often a great deal of data has to get processed. If you ran all of the commands above on a big MongoDB install, everything would eventually work itself out (except the killing random operations part, it sort of depends on what got killed), but it could take a long time.

I think these questions stem from sharding being nerve-wracking: the documentation says what commands to run, but then nothing much seems to happen and everything seems slow and the command line doesn’t return a response (immediately). Meanwhile, you have hundreds of gigabytes of production data and MongoDB is chugging along doing… something.

So, I added some new sections to Scaling MongoDB on what to expect when you shard a big collection: if you run x in the shell, you’ll see y in the log, then MongoDB will be doing z until you see w. What it’s doing, what you’ll see, how (and if) you should react. In general: a sharding operation that hasn’t returned yet isn’t done, keep you eye on the logs, and don’t panic.

I’ve also added a section on backing up config servers and updated the chunk size information. If you bought the eBook, you can update it free to the latest version for free to get the new info. (I love this eBook update system!) The update should be going out this week.

Let me know if there’s any other info that you think is missing and I’ll add it for future updates.

Posted bykchodorowFebruary 24, 2011Posted inMongoDBTags:Scaling MongoDB, sharding1 Comment on “Scaling MongoDB” Update

Resizing Your Oplog

The MongoDB replication oplog is, by default, 5% of your free disk space. The theory behind this is that, if you’re writing 5% of your disk space every x amount of time, you’re going to run out of disk in 19x time. However, this doesn’t hold true for everyone, sometimes you’ll need a larger oplog. Some common cases:

Applications that delete almost as much data as they create.
Applications that do lots of in-place updates, which consume oplog entries but not disk space.
Applications that do lots of multi-updates or remove lots of documents at once. These multi-document operations have to be “exploded” into separate entries for each document in the oplog, so that the oplog remains idempotent.

If you fall into one of these categories, you might want to think about allocating a bigger oplog to start out with. (Or, if you have a read-heavy application that only does a few writes, you might want a smaller oplog.) However, what if your application is already running in production when you realize you need to change the oplog size?

Usually if you’re having oplog size problems, you want to change the oplog size on the master. To change its oplog, we need to “quarantine” it so it can’t reach the other members (and your application), change the oplog size, then un-quarantine it.

To start the quarantine, shut down the master. Restart it without the --replSet option on a different port. So, for example, if I was starting MongoDB like this:

$ mongod --replSet foo # default port

I would restart it with:

$ mongod --port 10000

Replica set members look at the last entry of the oplog to see where to start syncing from. So, we want to do the following:

Save the latest insert in the oplog.
Resize the oplog
Put the entry we saved in the new oplog.

So, the process is:

1. Save the latest insert in the oplog.

> use local switched to db local > // "i" is short for "insert" > db.temp.save(db.oplog.rs.find({op : "i"}).sort( ... {$natural : -1}).limit(1).next())

Note that we are saving the last insert here. If there have been other operations since that insert (deletes, updates, commands), that’s fine, the oplog is designed to be able to replay ops multiple times. We don’t want to use deletes or updates as a checkpoint because those could have $s in their keys, and $s cannot be inserted into user collections.

2. Resize the oplog

First, back up the existing oplog, just in case:

$ mongodump --db local --collection 'oplog.rs' --port 10000

Drop the local.oplog.rs collection, and recreate it to be the size that you want:

> db.oplog.rs.drop() true > // size is in bytes > db.runCommand({create : "oplog.rs", capped : true, size : 1900000}) { "ok" : 1 }

3. Put the entry we saved in the new oplog.

> db.oplog.rs.save(db.temp.findOne())

Making this server primary again

Now shut down the database and start it up again with --replSet on the correct port. Once it is a secondary, connect to the current primary and ask it to step down so you can have your old primary back (in 1.9+, you can use priorities to force a certain member to be preferentially primary and skip this step: it’ll automatically switch back to being primary ASAP).

> rs.stepDown(10000) // you'll get some error messages because reconfiguration // causes the db to drop all connections

Your oplog is now the correct size.

Edit: as Graham pointed out in the comments, you should do this on each machine that could become primary.

Posted bykchodorowFebruary 22, 2011Posted inMongoDBTags:oplog, replica sets21 Comments on Resizing Your Oplog

Enchiladas of Doom

Damjan Stanković's Eko light design

Andrew and I are visiting San Francisco this week. Last night, I wanted enchiladas from the Mexican place across the street from the hotel. It was still warm out even though the sun had set hours ago, so we decided to walk over.

Our hotel is on a busy road with three lanes in both directions, but there are lights along the road so there are periods when no cars are coming. We waited until a wave of cars had passed and sprinted across the first three lanes. In the darkness, it looked like the median was flush with the pavement and I charged at it. It was actually raised and my foot hit it six inches before I had expected to encounter anything solid. I staggered and lost my balance but I was still running full-steam, so I tripped my way forward ending up in the middle of the road. I lay there on the highway, stunned, looking at three lanes of cars coming at me.

Get up! screamed my brain. It hadn’t even finished shouting when Andrew scooped me up and half-carried me off of the road. He had almost tripped over me as I fell, but managed to leap over and then turn and pick me up. He is my Batman!

My ankle is still pretty sore and I’m a bit banged up down the side I fell on, but other than that I’m fine. And the enchiladas were delicious.

Also: if you’re a subscriber and you’re only interested in MongoDB-related posts, I created a new RSS feed you can subscribe to for just those posts. The old feed will continue to have all MongoDB posts, plus stuff about my life and other technology.

Posted bykchodorowFebruary 11, 2011Posted inPersonalTags:Travel6 Comments on Enchiladas of Doom

My Life is Awesome

Andrew and I are getting married!

I can’t figure out how to say this eloquently, but: Andrew is so wonderful and I am incredibly lucky to have him. I love him so much. We love doing stuff together, talking about everything together (well, he puts up with more database talk than he’d probably strictly like), and just being with each other.

He has the cutest new haircut, too, but he’s a tough man to get in front of a camera (the picture on the right is from a year ago).

We’re getting married by the justice of the peace on our third anniversary. We already have tons of crap in our apartment and the last thing we need is more crap, so we’re asking wedding guests to skip the presents and donate to one of the following charities instead:

Freedom to Marry Foundation, which fights for gays’ and lesbians’ rights to marry.
The Dakin Animal Shelter, a no-kill animal shelter in my hometown.
Electronic Frontier Foundation, the ACLU of technology.

I am so happy.

Posted bykchodorowFebruary 2, 2011Posted inUncategorized5 Comments on My Life is Awesome

A Short eBook on Scaling MongoDB

I just finished a little ebook for O’Reilly: Scaling MongoDB. I’m excited about it, it was really fun to write and I think it’ll be both fun and instructive to read. It covers:

What a cluster is
How it works (at a high level)
How to set it up correctly
The differences in programming for a cluster vs. a single server
How to administer a cluster
What can go wrong and how to handle it

So, why would you want to get a copy?

It’s a succinct reference for anything that’s likely to come up and covers the common questions I’ve heard people ask in the last year.
I heard some grumbling about my post on choosing a shard key (“Ugh, a metaphor,” -an IRC user). People who don’t like metaphors will be pleased to hear that this book has a straight-forward, serious-business section on choosing a shard key that not only lacks a metaphor but also goes much more into depth than the blog post.
It’s a quick read. There are code examples, of course, and it can be used as a reference, but after banging out 15,000 words in a couple of days, I took the next couple weeks to make them all flow together like fuckin’ Shakespeare.
It can be updated in real time! After MongoDB: The Definitive Guide becoming out-of-date approximately 6 seconds after we handed in the final draft, I’m delighted that new sections, advice, and features can be added as needed. Once you buy the ebook, you can update to the latest “edition” whenever you want, as many times as you want. O’Reilly wants to do automatic updates, but so far the infrastructure isn’t there in traditional ebook readers so you’ll have to update it manually.

You can also get a “print on demand” copy if you’re old school.

I hope you guys will check it out and let me know what you think!

To promote the book, Meghan is forcing me to do a webcast on Friday (February 4th). It’s called How Sharding Works and it’s a combination whitepaper and Magic School Bus tour of sharding. It should cover some of the interesting stuff about sharding that didn’t really fit into the 60 pages I had to work with (or the more practical focus of the book).

Look at the teeth on that guy! (He'll bite you if you make any webscale jokes.)

Posted bykchodorowJanuary 31, 2011Posted inMongoDBTags:book, scaling, sharding, writing11 Comments on A Short eBook on Scaling MongoDB

Why Command Helpers Suck

This is a rant from my role as a driver developer and person who gives support on the mailing list/IRC.

Command helpers are terrible. They confuse users, result in a steeper learning curve, and make MongoDB’s interface seem arbitrary.

The basics: what are command helpers?

Command helpers are wrappers for database commands. Database commands are everything other than CRUD (create, retrieve, update, delete) that you can do with MongoDB. This includes things like dropping a collection, doing a MapReduce, adding a member to a replica set, seeing what arguments you started mongod with, and finding out if the last write operation succeeded. They’re everywhere, if you’ve used MongoDB, you’ve run a database command (even if you weren’t aware of it).

So, what are command helpers? These are wrappers around the raw command, turning something like db.adminCommand({serverStatus:1}) into db.serverStatus(). This makes it slightly quicker to run and look “nicer” than the command. However, there are honey bunches of reasons that they’re a bad idea and should be avoided whenever possible.

Database helpers are unportable

Helpers are extremely unportable. If you know how to run db.serverStatus() in the shell, that’s great, but all you know is how to do it in the shell. If you know how to run the serverStatus command, you know how to get the server status in every language you’ll ever use.

Similarly, each language handles command options differently. Take a command like group: the shell helper chooses one order of options (a single argument “options”, incidentally) and the Python driver chooses another (“key”, “condition”, “initial”, “reduce”, “finalize”) and the PHP driver another (“key”, “initial”, “reduce”, “options”). If you just learn the group command itself, you can execute it in any language you please.

This affects almost everyone using MongoDB, as almost everyone uses at least two languages (JavaScript and something else). I have seen hundreds of questions of the form “How do I run <shell function> using my driver?” If these users knew it was a database command (and knew what a database command was), they wouldn’t have to ask.

Database helpers lock you to a certain API, often an out-of-date one

Suppose the database changes the options for a command. All of the drivers that support helpers for that command are suddenly out-of-date. Conversely, if you have a recent version of a driver and an old version of the database, you can have helpers for features that don’t exist yet or have different options.

An example of old driver/new database: MapReduce’s options changed in version 1.7.4. As far as I know, none of the drivers support the new options, yet.

You can’t support database helpers for everything

Next, there’s just the sheer volume of database commands, which makes it impossible to implement helpers for all of them. Everyone has their favorites: aggregation is important to some people, administration helpers are important to others, etc. If all of them had helpers, not only would there be a ridiculous number of methods polluting the API documentation, but it would leads to tons of compatibility problems between the driver and the database (as mentioned above).

Database helpers conceal what’s going on, giving users less options

Finally, using command helpers keeps people from understanding what’s actually going on, which is pointless and can lead to problems. It’s pointless to conceal the gory details because the details aren’t very gory: all database commands are queries. This means you can deconstruct command helpers as follows (example in PHP):

// the command helper $db->lastError(); // is the same as $db->command(array("getlasterror" => 1)); // is the same as $db->selectCollection('$cmd')->findOne(array("getlasterror" => 1)); // is the same as $db->selectCollection('$cmd')->find(array("getlasterror" => 1))->limit(1)->getNext();

Every command helper is just a find() in disguise! This means you can do (almost) anything with a database command that you could with a query.

This gives you more control. Not only can you use whatever options you want, you can do a few other things:

By default, drivers send all commands to the master, even if slaveOkay is set. If you want to send a command to a slave, you can deconstruct it to a query bypass the driver’s commands-go-to-master logic.

Suppose you have a command that takes a long time to execute and it times out on the client side. If you deconstruct the command into a query, you can (for some drivers) set the client-side timeout on the cursor.

Finally, if you’re using an unfamiliar driver, you might not know what its helpers are called but all drivers have a find() method, so you can always use that.

Exceptions

There are a couple command helpers worth implementing. I think that count and drop (at both the database and collection levels) are common enough to be worth having helpers for. Also, at a higher level (e.g., frameworks on top of the driver and admin GUIs) I think helpers are absolutely fine. However, as someone who has been maintaining a driver and supporting users for the last few years, I think that, at a driver level, command helpers are a terrible idea.

Posted bykchodorowJanuary 25, 2011Posted inMongoDBTags:commands20 Comments on Why Command Helpers Suck

How to Use Replica Set Rollbacks

Rollin' rollin' rollin', keep that oplog rollin'

If you’re using replica sets, you can get into a situation where you have conflicting data. MongoDB will roll back conflicting data, but it never throws it out.

Let’s take an example, say you have three servers: A (arbiter), B, and C. You initialize A, B, and C:

$ mongo B:27017/foo > rs.initiate() > rs.add("C:27017") { "ok" : 1 } > rs.addArb("A:27017") { "ok" : 1 }

Now do a couple of writes to the master (say it’s B).

> B = connect("B:27017/foo") > B.bar.insert({_id : 1}) > B.bar.insert({_id : 2}) > B.bar.insert({_id : 3})

Then C gets disconnected (if you’re trying this out, you can just hit Ctrl-C—in real life, this might be caused by a network partition). B handles some more writes:

> B.bar.insert({_id : 4}) > B.bar.insert({_id : 5}) > B.bar.insert({_id : 6})

Now B gets disconnected. C gets reconnected and the arbiter elects it master, so it starts handling writes.

> C = connect("C:27017/foo") > C.bar.insert({_id : 7}) > C.bar.insert({_id : 8}) > C.bar.insert({_id : 9})

But now B gets reconnected. B has data that C doesn’t have and C has data that B doesn’t have! What to do? MongoDB chooses to roll back B’s data, since it’s “further behind” (B’s latest timestamp is before C’s latest timestamp).

If we query the databases after the millisecond or so it takes to roll back, they’ll be the same:

> C.bar.find() { "_id" : 1 } { "_id" : 2 } { "_id" : 3 } { "_id" : 7 } { "_id" : 8 } { "_id" : 9 } > B.bar.find() { "_id" : 1 } { "_id" : 2 } { "_id" : 3 } { "_id" : 7 } { "_id" : 8 } { "_id" : 9 }

Note that the data B wrote and C didn’t is gone. However, if you look in B’s data directory, you’ll see a rollback directory:

$ ls /data/db journal local.0 local.1 local.ns mongod.lock rollback foo.0 foo.1 foo.ns _tmp $ ls /data/db/rollback foo.bar.2011-01-19T18-27-14.0.bson

If you look in the rollback directory, there will be a file for each rollback MongoDB has done. You can examine what was rolled back with the bsondump utility (comes with MongoDB):

$ bsondump foo.bar.2011-01-19T18-27-14.0.bson { "_id" : 4 } { "_id" : 5 } { "_id" : 6 } Wed Jan 19 13:33:32 3 objects found

If these won’t conflict with your existing data, you can add them back to the collection with mongorestore.

$ mongorestore -d foo -c bar foo.bar.2011-01-19T18-27-14.0.bson connected to: 127.0.0.1 Wed Jan 19 13:36:27 foo.bar.2011-01-19T18-27-14.0.bson Wed Jan 19 13:36:27 going into namespace [foo.bar] Wed Jan 19 13:36:27 3 objects found

Note that you need to specify -d foo and -c bar to get it into the correct collection. If it would conflict, you could restore it into another collection and do a more delicate merge operation.

Now, if you do a find, you’ll get all of the documents:

> B.bar.find() { "_id" : 1 } { "_id" : 2 } { "_id" : 3 } { "_id" : 7 } { "_id" : 8 } { "_id" : 9 } { "_id" : 4 } { "_id" : 5 } { "_id" : 6 }

Hopefully this sort of thing can tide most people over until MongoDB supports multi-master.

Posted bykchodorowJanuary 19, 2011Posted inMongoDBTags:bsondump, mongorestore, replica sets, rollback5 Comments on How to Use Replica Set Rollbacks

Posts pagination

Newer posts 1 … 19 20 21 22 23 … 29 Older posts