How to Use Replica Set Rollbacks

Rollin' rollin' rollin', keep that oplog rollin'

If you’re using replica sets, you can get into a situation where you have conflicting data. MongoDB will roll back conflicting data, but it doesn’t throw it out: the rolled-back documents are saved to disk.

Let’s take an example. Say you have three servers: A (an arbiter), B, and C. You initialize the set on B, then add C and A:

$ mongo B:27017/foo
> rs.initiate()
> rs.add("C:27017")
{ "ok" : 1 }
> rs.addArb("A:27017")
{ "ok" : 1 }

Now do a couple of writes to the master (say it’s B).

> B = connect("B:27017/foo")
> B.bar.insert({_id : 1})
> B.bar.insert({_id : 2})
> B.bar.insert({_id : 3})

Then C gets disconnected (if you’re trying this out, you can just hit Ctrl-C in the terminal running C’s mongod; in real life, this might be caused by a network partition). B handles some more writes:

> B.bar.insert({_id : 4})
> B.bar.insert({_id : 5})
> B.bar.insert({_id : 6})

Now B gets disconnected. C gets reconnected and, with the arbiter’s vote, is elected master, so it starts handling writes.

> C = connect("C:27017/foo")
> C.bar.insert({_id : 7})
> C.bar.insert({_id : 8})
> C.bar.insert({_id : 9})
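
Why could C take over while B was down? Here is a minimal sketch, assuming the usual rule that a member needs votes from a strict majority of the set’s voting members to become master. The function name canElectMaster is made up for illustration, and this is not MongoDB’s election code.

```javascript
// Simplified rule (an assumption of this sketch): a candidate wins only if
// the voters it can reach, itself included, form a strict majority of the set.
function canElectMaster(reachableVoters, totalVoters) {
  return reachableVoters > totalVoters / 2;
}

// C plus arbiter A can see each other: 2 of 3 votes.
const cWithArbiter = canElectMaster(2, 3);
// A node cut off from both of the others only has its own vote.
const isolatedNode = canElectMaster(1, 3);

console.log(cWithArbiter);
console.log(isolatedNode);
```

The same arithmetic explains why B could keep taking writes after C dropped out: B could still reach the arbiter.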

But now B gets reconnected. B has data that C doesn’t have and C has data that B doesn’t have! What to do? MongoDB chooses to roll back B’s data, since it’s “further behind” (B’s latest timestamp is before C’s latest timestamp).
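
To make the idea concrete, here is a toy model in plain JavaScript. It treats each server’s oplog as a simple list of the _ids it inserted, which is a big simplification: the function planRollback and the array representation are invented for this sketch and are not how MongoDB implements rollback.

```javascript
// Toy model only: each "oplog" is an array of inserted _ids, in order.
// Real oplog entries carry timestamps and full operations.
function planRollback(staleOplog, masterOplog) {
  // Walk forward while both logs agree; that prefix is the shared history.
  let common = 0;
  while (
    common < staleOplog.length &&
    common < masterOplog.length &&
    staleOplog[common] === masterOplog[common]
  ) {
    common++;
  }
  return {
    rolledBack: staleOplog.slice(common), // writes only the stale server has
    toApply: masterOplog.slice(common),   // writes the stale server still needs
  };
}

const oplogB = [1, 2, 3, 4, 5, 6]; // B's inserts from the example
const oplogC = [1, 2, 3, 7, 8, 9]; // C's inserts from the example
const plan = planRollback(oplogB, oplogC);

console.log(plan.rolledBack); // _ids 4, 5, 6
console.log(plan.toApply);    // _ids 7, 8, 9
```

In this model, B undoes everything after the last point it shares with C and then catches up on C’s writes.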

If we query the databases once the rollback finishes (nearly instant for a few documents), they’ll be the same:

> C.bar.find()
{ "_id" : 1 }
{ "_id" : 2 }
{ "_id" : 3 }
{ "_id" : 7 }
{ "_id" : 8 }
{ "_id" : 9 }
> B.bar.find()
{ "_id" : 1 }
{ "_id" : 2 }
{ "_id" : 3 }
{ "_id" : 7 }
{ "_id" : 8 }
{ "_id" : 9 }

Note that the data B wrote and C didn’t is gone. However, if you look in B’s data directory, you’ll see a rollback directory:

$ ls /data/db
journal  local.0  local.1  local.ns  mongod.lock  rollback  foo.0  foo.1  foo.ns  _tmp
$ ls /data/db/rollback
foo.bar.2011-01-19T18-27-14.0.bson

If you look in the rollback directory, there will be a file for each collection affected by each rollback MongoDB has done, named after the namespace and the time of the rollback. You can examine what was rolled back with the bsondump utility (comes with MongoDB):

$ bsondump foo.bar.2011-01-19T18-27-14.0.bson
{ "_id" : 4 }
{ "_id" : 5 }
{ "_id" : 6 }
Wed Jan 19 13:33:32      3 objects found

If these won’t conflict with your existing data, you can add them back to the collection with mongorestore.

$ mongorestore -d foo -c bar foo.bar.2011-01-19T18-27-14.0.bson 
connected to: 127.0.0.1
Wed Jan 19 13:36:27 foo.bar.2011-01-19T18-27-14.0.bson
Wed Jan 19 13:36:27      going into namespace [foo.bar]
Wed Jan 19 13:36:27      3 objects found

Note that you need to specify -d foo and -c bar to get it into the correct collection. If it would conflict, you could restore it into another collection and do a more delicate merge operation.
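
What a “more delicate merge” looks like depends on your data, but a first step is usually finding out which rolled-back documents actually collide. Below is a rough sketch in plain JavaScript that works on in-memory arrays (in the shell you could get such arrays with find().toArray()). The helper splitByConflict and the sample data are hypothetical, and it only checks for clashing _id values.

```javascript
// Hypothetical helper: split rolled-back documents into ones whose _id is
// unused in the live collection and ones that collide with a live document.
function splitByConflict(liveDocs, rolledBackDocs) {
  // JSON.stringify is a crude way to compare _ids; it is fine for the
  // numeric ids used here but is an assumption for other _id types.
  const liveIds = new Set(liveDocs.map((doc) => JSON.stringify(doc._id)));
  const safe = [];
  const conflicting = [];
  for (const doc of rolledBackDocs) {
    if (liveIds.has(JSON.stringify(doc._id))) {
      conflicting.push(doc);
    } else {
      safe.push(doc);
    }
  }
  return { safe, conflicting };
}

// Made-up data: pretend C had also inserted a document with _id 5.
const live = [{ _id: 1 }, { _id: 2 }, { _id: 3 }, { _id: 5, from: "C" }];
const rolledBack = [{ _id: 4 }, { _id: 5 }, { _id: 6 }];
const result = splitByConflict(live, rolledBack);

console.log(result.safe);        // documents that can be inserted as-is
console.log(result.conflicting); // documents that need a manual decision
```

The safe documents can go straight back into the collection, while the conflicting ones need a human (or application-specific logic) to decide which version wins.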

Now, if you do a find, you’ll get all of the documents:

> B.bar.find()
{ "_id" : 1 }
{ "_id" : 2 }
{ "_id" : 3 }
{ "_id" : 7 }
{ "_id" : 8 }
{ "_id" : 9 }
{ "_id" : 4 }
{ "_id" : 5 }
{ "_id" : 6 }

Hopefully this sort of thing can tide most people over until MongoDB supports multi-master.

5 thoughts on “How to Use Replica Set Rollbacks”

    1. It is something we’re definitely planning on adding in the future, although it’s not scheduled for release yet (see https://jira.mongodb.org/browse/SERVER-2956). 

  1. Actually, in this case node C must have priority 0. Then when C is restarted, it won’t become the primary node, and when node A is restarted, it will be the primary again. Now C will sync with node A, and C and B will have all the data.
    Am I right?

    1.  I’m not sure what you mean.  If C is priority 0, it won’t be elected primary and there will be no rollback.  A is an arbiter: it cannot be elected primary and nothing can sync from an arbiter.
