The MongoDB replication oplog is, by default, 5% of your free disk space. The theory behind this is that, if you’re writing 5% of your free disk space in some amount of time x, you’re going to run out of disk in about 19x more time. However, this doesn’t hold true for everyone; sometimes you’ll need a larger oplog. Some common cases:
- Applications that delete almost as much data as they create.
- Applications that do lots of in-place updates, which consume oplog entries but not disk space.
- Applications that do lots of multi-updates or remove lots of documents at once. These multi-document operations have to be “exploded” into separate entries for each document in the oplog, so that the oplog remains idempotent.
If you fall into one of these categories, you might want to think about allocating a bigger oplog to start out with. (Or, if you have a read-heavy application that only does a few writes, you might want a smaller oplog.) However, what if your application is already running in production when you realize you need to change the oplog size?
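If you know ahead of time that the default won’t suit your workload, you can set the size explicitly when you first start a member with the --oplogSize option (the value is in megabytes). A minimal sketch, assuming a replica set named foo and a 10GB oplog:
$ mongod --replSet foo --oplogSize 10240 # oplog size in MB
Note that --oplogSize only has an effect before the oplog collection has been created, which is why a set that’s already running needs the procedure below.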
Usually if you’re having oplog size problems, you want to change the oplog size on the master. To change its oplog, we need to “quarantine” it so it can’t reach the other members (and your application), change the oplog size, then un-quarantine it.
To start the quarantine, shut down the master. Restart it without the --replSet option on a different port. So, for example, if I was starting MongoDB like this:
$ mongod --replSet foo # default port
I would restart it with:
$ mongod --port 10000
Replica set members look at the last entry of the oplog to see where to start syncing from. So, we want to do the following:
- Save the latest insert in the oplog.
- Resize the oplog.
- Put the entry we saved in the new oplog.
So, the process is:
1. Save the latest insert in the oplog.
> use local
switched to db local
> // "i" is short for "insert"
> db.temp.save(db.oplog.rs.find({op : "i"}).sort(
... {$natural : -1}).limit(1).next())
Note that we are saving the last insert here. If there have been other operations since that insert (deletes, updates, commands), that’s fine; the oplog is designed to be able to replay ops multiple times. We don’t want to use deletes or updates as a checkpoint because those could have $s in their keys, and $s cannot be inserted into user collections.
2. Resize the oplog.
First, back up the existing oplog, just in case:
$ mongodump --db local --collection 'oplog.rs' --port 10000
Drop the local.oplog.rs collection, and recreate it to be the size that you want:
> db.oplog.rs.drop()
true
> // size is in bytes
> db.runCommand({create : "oplog.rs", capped : true, size : 1900000})
{ "ok" : 1 }
3. Put the entry we saved in the new oplog.
> db.oplog.rs.save(db.temp.findOne())
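Before restarting, you can double-check that the saved entry made it into the new oplog; there should be exactly one document in it, the insert from step 1:
> db.oplog.rs.count()
1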
Making this server primary again
Now shut down the database and start it up again with --replSet on the correct port. Once it is a secondary, connect to the current primary and ask it to step down so you can have your old primary back (in 1.9+, you can use priorities to force a certain member to be preferentially primary and skip this step: it’ll automatically switch back to being primary ASAP).
> rs.stepDown(10000)
// you'll get some error messages because reconfiguration
// causes the db to drop all connections
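If you’re on 1.9+ and want the priority-based approach instead, a reconfig along these lines makes a member preferentially primary (a sketch; it assumes the member you want to favor is members[0] in your config):
> cfg = rs.conf()
> cfg.members[0].priority = 2 // higher than the default priority of 1
> rs.reconfig(cfg)
Run this on the current primary; once the new config propagates and the favored member has caught up, it will take over as primary on its own.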
Your oplog is now the correct size.
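You can double-check the new size from the shell; db.printReplicationInfo() reports the configured oplog size (the exact wording of its output varies by version):
> use local
switched to db local
> db.printReplicationInfo()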
Edit: as Graham pointed out in the comments, you should do this on each machine that could become primary.
Kristina, thank you very much for your post. The community behind MongoDB is ever growing and all of you at 10Gen are the driving force of that.
You’re welcome! Thanks, we try.
Great info, thanks for this post
Don’t you need to repeat this on each member of the replica set to ensure the correct oplog size in the event of a primary outage?
Yes, although only on machines that could become primary (not priority=0 machines). I’ll add that to the post, as it’s a pretty important point!
Another reason for the oplog to be(come) too small is if you are using LVM to periodically extend your volumes.
The initial oplog size will (of course) be based on the volume size you start with, but doubling the volumes in size a couple of times thereafter will make you run into arcane troubles later, when you need to resync a node… Better to anticipate the maximum volume size you want to support.
Hi,
Just found this – looks really useful.
Would it work to do this on each secondary first, then perform a failover, and do the same on the old primary? Was thinking that this would reduce the number of failovers that happen (we have the same hardware for all our instances so it doesn’t matter which machine is the primary).
Thanks!
Yes, it definitely would.
And to follow up… did this earlier today without any problems 🙂
Maybe add “use local” above “db.oplog.rs.drop()” ?
I’m assuming that you’re already in the local db from the previous shell command.
Add “--port 10000” to the mongodump line.
Fixed, thank you!
Just a heads up: Saving the last item from the oplog will fail when that item contains an update operator (containing “$”). You will get the following error:
Thu Mar 8 12:43:07 uncaught exception: field names cannot start with $ [$set]
In which case one could use the second-to-last, or third-to-last, etc.
We had a load of $set updates in the oplog, so we filtered out with:
db.oplog.rs.find({'o.$set':{$exists:false}}).sort({$natural:-1}).limit(5);
Are there downsides/gotchas to not using the very latest item from the oplog?
I imagine it would, at the very least, re-apply all oplog items after the one you’ve saved, which would be a problem with $inc, $bit, $push, etc.
Any way to force mongo to accept $set as a key?
Thanks!
Good point! You could also look for oplog entries where 'op':'i' (inserts), as those will not have any $-operators.
There is no downside to using an earlier entry. $incs et al are converted into $sets, as there is an expectation that oplog entries may be replayed multiple times. Oplog entries are designed to be idempotent.
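To make that concrete: an update like db.foo.update({_id : 1}, {$inc : {n : 1}}) shows up in the oplog with the resulting value rather than the increment, something like this (the layout varies by version, so treat it as an illustrative shape):
{ "op" : "u", "ns" : "test.foo", "o2" : { "_id" : 1 }, "o" : { "$set" : { "n" : 5 } } }
Applying that entry once or five times leaves n at 5 either way.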
For future reference: fixed the post to store/restore the last insert in the oplog.
Sorry for resurrecting the post. But I’ve stumbled into what I think is a strange situation. Then again, being a MongoDB neophyte could also be the problem.
I’m trying to increase the size of my oplog, and ran into a problem right off the start. I don’t have an oplog.rs, I have an oplog.$main (I’m using a master/slave model). I can query and save data by following the instructions in the docs, but get halted when I try to drop the oplog. Since oplog.rs or oplog.db does nothing for me, I try oplog.$main and get this response.
> db.oplog.$main.drop()
Fri Oct 24 12:34:44.279 drop failed: {
"errmsg" : "exception: can't drop collection with reserved $ character in name",
"code" : 10039,
"ok" : 0
} at src/mongo/shell/collection.js:383
The error is clear but how to work around this. Any thoughts?
Use replica sets, not master/slave. If you think that your use case cannot be satisfied via replica sets, please check that on the mailing list or Stack Overflow first.