Part 3 of the replication internals series: three handy tricks.
This is the third post in a three-part series on replication. See also parts 1 (replication internals) and 2 (getting to know your oplog).
MongoDB has a type of query that behaves like the tail -f command: it shows you new data as it’s written to a collection. This is great for the oplog, where you want to see new records as they pop up and don’t want to query over and over.
For this kind of ongoing query, MongoDB provides a tailable cursor. When this cursor gets to the end of the result set, it hangs around and waits for more elements to be added to the collection. As they’re added, the cursor returns them. If no elements are added for a while, the cursor times out and the client has to requery if it wants more results.
Using your knowledge of the oplog’s format, you can use a tailable cursor to do a long poll for activity in a certain collection, of a certain type, at a certain time… almost any criteria you can imagine.
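As a sketch of what that looks like in Python with PyMongo: the helper below builds an oplog query filter, and the guarded section (which assumes a replica-set member running on localhost, and an illustrative namespace) tails the oplog with it.

```python
# Sketch: querying the oplog with a tailable cursor (PyMongo).
# The namespace and connection details below are illustrative.

def oplog_filter(ns=None, op=None):
    """Build a query matching oplog entries by namespace and/or
    operation type ('i' insert, 'u' update, 'd' delete)."""
    query = {}
    if ns is not None:
        query["ns"] = ns
    if op is not None:
        query["op"] = op
    return query

if __name__ == "__main__":
    # Requires pymongo and a local replica-set member.
    import time
    from pymongo import MongoClient, CursorType

    oplog = MongoClient().local["oplog.rs"]
    # TAILABLE_AWAIT keeps the cursor open at the end of the result
    # set and waits server-side for new entries before returning.
    cursor = oplog.find(oplog_filter(ns="test.users", op="i"),
                        cursor_type=CursorType.TAILABLE_AWAIT)
    while cursor.alive:
        for entry in cursor:
            print(entry)
        time.sleep(1)  # cursor still alive but no new data yet; retry
```

If the cursor does time out, `cursor.alive` goes false and you requery, picking up from the last `ts` value you saw.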
Using the oplog for crash recovery
Suppose your database goes down, but you have a fairly recent backup. You could put a backup into production, but it’ll be a bit behind. You can bring it up-to-date using your oplog entries.
If you use the trigger mechanism (described above) to capture the entire oplog and send it to a non-capped collection on another server, you can then use an oplog replayer to play the oplog over your dump, bringing it as up-to-date as possible.
Pick a time before the dump was taken and start replaying the oplog from there. It’s okay if you’re not sure exactly when the dump was taken, because the oplog is idempotent: you can apply it to your data as many times as you want and your data will end up the same.
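You can see the idempotency property with a toy replayer over an in-memory document store. The entry format below mirrors the oplog’s op/o/o2 fields, but the replayer itself is a hypothetical sketch, not MongoDB’s actual replay logic:

```python
# Toy oplog replayer over an in-memory store, to illustrate
# idempotency. Entry fields mirror the oplog's: 'op' (i/u/d),
# 'o' (the document or delete criteria), 'o2' (update target).

def apply_entry(store, entry):
    if entry["op"] == "i":            # insert: upsert by _id
        store[entry["o"]["_id"]] = entry["o"]
    elif entry["op"] == "u":          # update: replace the whole doc
        store[entry["o2"]["_id"]] = entry["o"]
    elif entry["op"] == "d":          # delete by _id (no-op if gone)
        store.pop(entry["o"]["_id"], None)

def replay(store, entries):
    for entry in entries:
        apply_entry(store, entry)
    return store

entries = [
    {"op": "i", "o": {"_id": 1, "x": 1}},
    {"op": "u", "o2": {"_id": 1}, "o": {"_id": 1, "x": 2}},
    {"op": "d", "o": {"_id": 1}},
    {"op": "i", "o": {"_id": 2, "x": 5}},
]

once = replay({}, entries)
twice = replay(replay({}, entries), entries)
assert once == twice == {2: {"_id": 2, "x": 5}}
```

Because every operation names its target document explicitly, applying the sequence a second time (or starting from a point already reflected in the dump) lands on the same final state.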
Also, warning: I haven’t tried out the oplog replayer I linked to, it’s just the first one I found. There are a few different ones out there and they’re pretty easy to write.
Creating non-replicated collections
The local database contains data that is local to a given server: it won’t be replicated anywhere. This is one reason why it holds all of the replication info.
local isn’t reserved for replication stuff: you can put your own data there, too. If you do a write in the local database and then check the oplog, you’ll notice that there’s no record of the write. The oplog doesn’t track changes to the local database, since they won’t be replicated.
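You can check this yourself. The helper below just encodes the rule (namespaces in local are never replicated); the guarded section, which assumes PyMongo and a local replica-set member, does a write to local and then looks at the newest oplog entry:

```python
# Sketch: writes to the 'local' database never reach the oplog.

def is_replicated(ns):
    """Namespaces in the 'local' database are never replicated."""
    return not ns.startswith("local.")

if __name__ == "__main__":
    # Requires pymongo and a local replica-set member; the
    # 'scratch' collection name is illustrative.
    from pymongo import MongoClient

    client = MongoClient()
    client.local.scratch.insert_one({"note": "stays on this server"})
    # The newest oplog entry is never for local.scratch:
    last = client.local["oplog.rs"].find_one(sort=[("$natural", -1)])
    print(last["ns"] if last else "empty oplog")
```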
And now I’m all replicated out. If you’re interested in learning more about replication, check out the core documentation on it. There’s also core documentation on tailable cursors and language-specific instructions in the driver documentation.
5 thoughts on “Bending the Oplog to Your Will”
Question: how does this work with multiple shards where each shard is replicated? Is there one oplog per shard, or one oplog for the whole configuration? Would I be able to query the mongos process for the oplog of the configuration?
There is one oplog per shard. Each shard is a replica set that doesn’t “know” it’s a shard (okay, it does, but it’s really just a normal replica set). So, if you wanted, you could connect directly to the shard to mess with its oplog.
Honestly, I’m not sure what mongos does if you try to use the local database. It might put it on the config servers or it might use one of the shards or it might give you an error… you could try it and report back!
Thanks for your response. I tried it out: if you use the local db on mongos, it returns an error saying that’s not allowed.