The MongoDB replication oplog is, by default, 5% of your free disk space. The theory behind this is that, if you’re writing 5% of your free disk space in some amount of time x, you’re going to run out of disk in about 19x more time. However, this doesn’t hold true for everyone; sometimes you’ll need a larger oplog. Some common cases:
- Applications that delete almost as much data as they create.
- Applications that do lots of in-place updates, which consume oplog entries but not disk space.
- Applications that do lots of multi-updates or remove lots of documents at once. These multi-document operations have to be “exploded” into separate entries for each document in the oplog, so that the oplog remains idempotent.
If you fall into one of these categories, you might want to think about allocating a bigger oplog to start out with. (Or, if you have a read-heavy application that only does a few writes, you might want a smaller oplog.) However, what if your application is already running in production when you realize you need to change the oplog size?
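If you know ahead of time that the default won’t suit your workload, you can set the size explicitly when you first start a member with the --oplogSize option (the value is in megabytes). A minimal sketch, assuming a replica set named foo and a 10GB oplog:
$ mongod --replSet foo --oplogSize 10240 # oplog size in MB
Note that --oplogSize only has an effect before the oplog collection has been created, which is why a set that’s already running needs the procedure below.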
Usually if you’re having oplog size problems, you want to change the oplog size on the master. To change its oplog, we need to “quarantine” it so it can’t reach the other members (and your application), change the oplog size, then un-quarantine it.
To start the quarantine, shut down the master. Restart it without the --replSet option on a different port. So, for example, if I was starting MongoDB like this:
$ mongod --replSet foo # default port
I would restart it with:
$ mongod --port 10000
Replica set members look at the last entry of the oplog to see where to start syncing from. So, we want to do the following:
- Save the latest insert in the oplog.
- Resize the oplog.
- Put the entry we saved in the new oplog.
So, the process is:
1. Save the latest insert in the oplog.
> use local
switched to db local
> // "i" is short for "insert"
> db.temp.save(db.oplog.rs.find({op : "i"}).sort(
... {$natural : -1}).limit(1).next())
Note that we are saving the last insert here. If there have been other operations since that insert (deletes, updates, commands), that’s fine; the oplog is designed to be able to replay ops multiple times. We don’t want to use deletes or updates as a checkpoint because those could have $s in their keys, and $s cannot be inserted into user collections.
2. Resize the oplog.
First, back up the existing oplog, just in case:
$ mongodump --db local --collection 'oplog.rs' --port 10000
Drop the local.oplog.rs collection, and recreate it to be the size that you want:
> db.oplog.rs.drop()
true
> // size is in bytes
> db.runCommand({create : "oplog.rs", capped : true, size : 1900000})
{ "ok" : 1 }
3. Put the entry we saved in the new oplog.
> db.oplog.rs.save(db.temp.findOne())
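Before restarting, you can double-check that the saved entry made it into the new oplog; there should be exactly one document in it, the insert from step 1:
> db.oplog.rs.count()
1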
Making this server primary again
Now shut down the database and start it up again with --replSet on the correct port. Once it is a secondary, connect to the current primary and ask it to step down so you can have your old primary back (in 1.9+, you can use priorities to force a certain member to be preferentially primary and skip this step: it’ll automatically switch back to being primary ASAP).
> rs.stepDown(10000)
// you'll get some error messages because reconfiguration
// causes the db to drop all connections
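If you’re on 1.9+ and want the priority-based approach instead, a reconfig along these lines makes a member preferentially primary (a sketch; it assumes the member you want to favor is members[0] in your config):
> cfg = rs.conf()
> cfg.members[0].priority = 2 // higher than the default priority of 1
> rs.reconfig(cfg)
Run this on the current primary; once the new config propagates and the favored member has caught up, it will take over as primary on its own.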
Your oplog is now the correct size.
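You can double-check the new size from the shell; db.printReplicationInfo() reports the configured oplog size (the exact wording of its output varies by version):
> use local
switched to db local
> db.printReplicationInfo()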
Edit: as Graham pointed out in the comments, you should do this on each machine that could become primary.
Kristina, thank you very much for your post. The community behind MongoDB is ever growing and all of you at 10Gen are the driving force of that.
You’re welcome! Thanks, we try.
Great info, thanks for this post
Don’t you need to repeat this on each member of the replica set to ensure the correct oplog size in the event of a primary outage?
Yes, although only on machines that could become primary (not priority=0 machines). I’ll add that to the post, as it’s a pretty important point!
Another reason for the oplog to be(come) too small is if you are using LVM to periodically extend your volumes.
The initial oplog size will (of course) be based on the volume size you start with, but doubling the volumes in size a couple of times thereafter will make you run into arcane troubles later, when you need to resync a node… Better to anticipate the maximum volume size you want to support.
Hi,
Just found this – looks really useful.
Would it work to do this on each secondary first, then perform a failover, and do the same on the old primary? Was thinking that this would reduce the number of failovers that happen (we have the same hardware for all our instances so it doesn’t matter which machine is the primary).
Thanks!
Yes, it definitely would.
And to follow up… did this earlier today without any problems 🙂
Maybe add “use local” above “db.oplog.rs.drop()” ?
I’m assuming that you’re already in the local db from the previous shell command.
Add “--port 10000” to the mongodump line.
Fixed, thank you!
Just a heads up: Saving the last item from the oplog will fail when that item contains an update operator (containing “$”). You will get the following error:
Thu Mar 8 12:43:07 uncaught exception: field names cannot start with $ [$set]
In which case one could use the second-to-last, or third-to-last, etc.
We had a load of $set updates in the oplog, so we filtered out with:
db.oplog.rs.find({'o.$set':{$exists:false}}).sort({$natural:-1}).limit(5);
Are there downsides/gotchas to not using the very latest item from the oplog?
I imagine it would, at the very least, re-apply all oplog items after the one you’ve saved, which would be a problem with $inc, $bit, $push, etc.
Any way to force mongo to accept $set as a key?
Thanks!
Good point! You could also look for oplog entries where 'op':'i' (inserts), as those will not have any $-operators.
There is no downside to using an earlier entry. $incs et al are converted into $sets, as there is an expectation that oplog entries may be replayed multiple times. Oplog entries are designed to be idempotent.
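To make that concrete: an update like db.foo.update({_id : 1}, {$inc : {n : 1}}) shows up in the oplog with the resulting value rather than the increment, something like this (the layout varies by version, so treat it as an illustrative shape):
{ "op" : "u", "ns" : "test.foo", "o2" : { "_id" : 1 }, "o" : { "$set" : { "n" : 5 } } }
Applying that entry once or five times leaves n at 5 either way.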
For future reference: fixed the post to store/restore the last insert in the oplog.
Sorry for resurrecting the post. But I’ve stumbled into what I think is a strange situation. Then again, being a MongoDB neophyte could also be the problem.
I’m trying to increase the size of my oplog, and ran into a problem right off the start. I don’t have an oplog.rs, I have an oplog.$main (I’m using a master/slave model). I can query and save data by following the instructions in the docs, but get halted when I try to drop the oplog. Since oplog.rs or oplog.db does nothing for me, I try oplog.$main and get this response.
> db.oplog.$main.drop()
Fri Oct 24 12:34:44.279 drop failed: {
"errmsg" : "exception: can't drop collection with reserved $ character in name",
"code" : 10039,
"ok" : 0
} at src/mongo/shell/collection.js:383
The error is clear but how to work around this. Any thoughts?
Use replica sets, not master/slave. If you think that your use case cannot be satisfied via replica sets, please check that on the mailing list or Stack Overflow first.