Choose your own adventure: MongoDB crash recovery edition

Suppose your application is happily talking to MongoDB and your laptop battery runs out. Or your server bursts into flame. Or velociraptors attack your data center. What now?

To bring your server back up, read through the text until you get to a bold question. Click on the answer that best matches your situation to see the instructions. When you’ve finished an “adventure,” there’ll be a link to bring you back to the top (here).

Is your server physically okay?

Recovering a physically damaged server.

This is beyond the scope of this article. Get a new server and then…

Do you have a backup?

Don’t recover.

If you didn’t do any writes during the session that shut down uncleanly (this has happened to people), your data is fine. Remove the mongod.lock file and start your database back up.
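In commands, that looks something like the following. The /data/db path and the bare mongod invocation are assumptions; substitute your own dbpath and whatever config file or flags you normally start with.

```shell
# Remove the stale lock file left over from the unclean shutdown...
rm /data/db/mongod.lock
# ...then start mongod back up with your usual options.
mongod --dbpath /data/db
```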

Try another adventure.

Seriously?

Recover from a backup.

If you have a recent backup, recovery is easy. Remove the entire data directory, replace it with the backup, and start the database.
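As a sketch, assuming your data lives in /data/db and the backup in /backup/db (both paths are made up; use your own):

```shell
# Throw away the old (possibly corrupt) data directory contents...
rm -rf /data/db/*
# ...copy the backup into place...
cp -R /backup/db/* /data/db/
# ...and start the database as usual.
mongod --dbpath /data/db
```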

Try another adventure.

Did you do any writes during your last session?

Single server “recovery”

If you have a single instance that shut down uncleanly, you may lose data! Use this as a painful learning experience and always run with replication in the future.

Since you only have this one copy of your data, you’ll have to repair whatever is there. Remove the mongod.lock file and start the database with --repair and any other options you usually use (if you usually use a config file or a dbpath, include that). repair has no way of knowing where mongod put data unless you tell it; it can’t repair data it can’t find.

Please do not just remove the mongod.lock file and start the database back up. If you’ve got corrupt data, the database will start up fine but you’ll start getting really weird errors when you try to read data. The annoying mongod.lock file is there for a reason!

repair will make a full copy of the uncorrupted data (make sure you have enough disk space!) and remove any corrupted data. It can take a while for large data sets because it looks at every field of every document.
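Putting that together, a minimal sketch with an assumed dbpath of /data/db (add your usual config file or other options so repair can find your data files):

```shell
# Remove the lock file first...
rm /data/db/mongod.lock
# ...then repair: this copies the uncorrupted data aside (check your
# free disk space!) and discards anything corrupt.
mongod --dbpath /data/db --repair
```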

Note that MongoDB keeps a sort of “table of contents” at the beginning of your collection. If it was updating its table of contents when the unclean shutdown happened, it might not be able to find a lot of your data. This is how the “I lost all my documents!” horror stories happen. Run with replication, people!

Better luck next time.

Are you on EBS?

You ran with replication!

Thank you, you get a lollipop! There are lots of ways to recover with various levels of swiftness and ease, but first you need a master. If you are running a replica set (with or without sharding), you don’t need to do anything: the promotion will happen automatically, and you don’t need to change anything in your application, which will fail over automatically, too.

If you’re not running a replica set, shut down your slave and restart it with --master. Point your application to the new master.

When you start back up the server that crashed, the way you should start it depends on whether you’re using master-slave or replica sets. If you’re using master-slave, start your database back up with --slave and --source pointing to the new master. If you’re running a replica set, just start it with the same arguments you used before.
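For example (the hostname, dbpath, and set name below are placeholders):

```shell
# Master-slave: bring the crashed ex-master back as a slave of the
# newly promoted master.
mongod --dbpath /data/db --slave --source newmaster.example.com:27017

# Replica set: just restart with the same arguments as before, e.g.:
mongod --dbpath /data/db --replSet myset
```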

Are you in a hurry?

Recover quickly without a backup or messing with your currently up servers.

Here’s where things stand: you have data at point A and you want to get it to point B. If you don’t have a backup, you’re going to have to create a snapshot of whatever’s at A and send it to B. To take a snapshot, you’ll have to make sure the files at A are in a consistent state, so you’ll have to suck it up and fsync and lock it. Or you can use replication, but that’ll take longer.

Next time, make a backup.

Now that you’ve thought it over…

Are you willing to make a server read-only for a bit?

Recover via file system snapshot.

This is generally super-fast, but it might not be supported by your filesystem.

If you’re running on EBS or using ZFS, you can take a file system snapshot of the new master and put it on the server that crashed. Then, start up mongod.
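A ZFS sketch, with a made-up pool name and hostname (EBS users would take the snapshot through the AWS console or API instead; either way, lock the master for writes first so the files on disk are consistent):

```shell
# Take a snapshot of the new master's data...
zfs snapshot tank/mongodata@recovery
# ...stream it to the crashed server and receive it there...
zfs send tank/mongodata@recovery | ssh crashed-host zfs receive -F tank/mongodata
# ...then start mongod on the crashed server as usual.
```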

Try another adventure.

How big is your data?

Recover via replication.

This way is the easiest, but it’s also the slowest.

Remove everything in the data directory and start the database back up again. It’ll resync all of the data from the new master.
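For instance, for a master-slave pair (paths and hostname are placeholders; a replica set member would restart with its usual --replSet arguments instead):

```shell
# Empty the data directory on the recovering server...
rm -rf /data/db/*
# ...and start it pointing at the new master; it resyncs everything.
mongod --dbpath /data/db --slave --source newmaster.example.com:27017
```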

Try another adventure.

How about XFS (or some other file system that lets you take snapshots)?

Recover with --fastsync.

If you don’t mind making your new master read-only for a bit, you can get your other server back up pretty quickly and easily. First, fsync and lock the master, take a dump of its files (or a snapshot, as described above), and put them on the server that went down. Start it back up with --fastsync and unlock the master.
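A sketch of the whole dance, with placeholder hostnames and paths (the unlock invocation below is the old-style mongo shell syntax; treat it as an assumption and check the docs for your version):

```shell
# 1. On the new master: flush writes to disk and block new ones.
mongo newmaster.example.com/admin --eval 'printjson(db.runCommand({fsync: 1, lock: 1}))'
# 2. Copy the master's data files onto the server that went down.
scp -r newmaster.example.com:/data/db/* /data/db/
# 3. Start the recovering server; --fastsync says "this data is already
#    a copy of the master's, skip the initial sync".
mongod --dbpath /data/db --slave --source newmaster.example.com:27017 --fastsync
# 4. Back on the new master: release the lock so writes can resume.
mongo newmaster.example.com/admin --eval 'printjson(db.$cmd.sys.unlock.findOne())'
```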

Try another adventure.

Pre-create the oplog.

If you have hundreds of gigabytes of data, syncing from scratch may not be practical and the amount of data might be too big to throw around in backups. This way is trickier, but faster than syncing from scratch (unless you’re using ext4, where this won’t give you any added benefit).

Wipe the data directory, then pre-create the local.* files. Make them ~20% of your data size, so if you have 100GB, make 20GB of local files:

for i in {0..10}; do    # eleven files, local.0 through local.10, ~2GB each
    echo $i
    head -c 2146435072 /dev/zero > /data/db/local.$i
done

Now start mongod up with an oplog size a bit smaller than the files you just created, e.g., --oplogSize 17000. It’ll still have to resync, but it’ll cut down on the file preallocation time.
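For example, for a replica set member (the set name and dbpath are placeholders; --oplogSize is in megabytes):

```shell
# 17000MB is a bit smaller than the local.* files pre-created above,
# so mongod reuses them instead of preallocating fresh ones.
mongod --dbpath /data/db --replSet myset --oplogSize 17000
```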

Try another adventure.

Were you running with replication?

Recover via postal service.

If your data is unmovable, it’s unmovable. If you really have that much data, you can get pretty good data transfer rates by priority shipping it on disks. (And don’t feel too ridiculous, even Google does this, sometimes.)

Try another adventure.

15 thoughts on “Choose your own adventure: MongoDB crash recovery edition”

  1. Correct me if I am wrong please: the single server case without a backup leads us to the same result we get with a replica set that’s on the same power circuit and a power outage … yes?

    1. Yes, exactly. An unclean shutdown means the server may have corrupted data. If all of your servers have unclean shutdowns, all of them may have corrupted data.

  2. No replication case:
    In single server recovery, does repair() start from the first file for that DB or the latest file? In one crash recovery, repair caused loss of all the files related to one DB. Is this possible? Why doesn’t repair() just validate and repair the latest file?

    1. Note that MongoDB keeps a sort of “table of contents” at the beginning of your collection. If it was updating its table of contents when the unclean shutdown happened, it might not be able to find a lot of your data. This is how the “I lost all my documents!” horror stories happen.

      1. Thanks. I noticed that but since couldn’t find more information on this ToC elsewhere, so wanted re-confirmation of the same.

        I have planned a single server deployment as the amount of data I have is quite large and thus I cant afford full replication. To achieve, some sort of recovery against node crash, I have divided my data by creating one DB per day. And this DB will have selective replication set-on in master-slave config (so slave will get replicated data in cycle for a day’s data only). I have shared file-system, accessible from another machine. So, just in case, master node crashes, I plan to start a new MongoD on another machine and point it to crashed instances data-set. At the same time, I plan to replace master’s that days data-files with slave’s replicated data of that day. I know, there might be some data loss due to slave not being in complete sync with the master but with this approach, I hope to recover fast without using slow repair() or any subsequent loss of my complete data during repair(). Do you see any gotcha’s here, I might be missing? TIA for your help.

      2. That setup sounds good. Off of the top of my head, I don’t see any problems with it.

        On the other hand, assuming you’re not planning to deploy in the next couple of days, 1.8.0 will have single-server durability. You might want to stick with your plan so that you can bring your app back up faster, but you won’t be running the data loss risk if you just use one server.

      3. Thanks for verifying the setup. My worry: I hope it will never be the case where Master crashes, while slave couldn’t get the complete data to update in ToC thus causing partial ToC update in slave copy?

        Rel 1.8.0 with durability is much awaited but roadmap doesn’t really give a date for this feature. I would love to check out the durability, the release promises and may want to do away with this master-slave setup altogether, if recovery is faster and doesn’t involve the slow repair().
        Meanwhile, to understand this single server durability: Hopefully, in case of machine crash, even the newer MongoD on another machine will also be able to avoid data loss with available logs etc, by pointing to older data set?

      4. Re your worry: no, replication is a logical copy (“do an insert”), not a raw data copy (“insert these bytes at this file offset”), so your slave can’t get corrupted by the master crashing.

        1.8.0 should be mid-January. Recovery won’t necessarily be fast, just safe. Replication will probably be much faster.

        I don’t understand your last question. Durability means that, if you crash, you can start mongod back up and it’ll replay its transaction logs to get you to the state you were in before the crash.

