Replication Internals

[Image: a displacer beast, which seemed related (it's sort of in two places at the same time).]

This is the first in a three-part series on how replication works.

Replication gives you hot backups, read scaling, and all sorts of other goodness. If you know how it works you can get a lot more out of it, from how it should be configured to what you should monitor to using it directly in your applications. So, how does it work?

MongoDB’s replication is actually very simple: the master keeps a collection that describes writes and the slaves query that collection. This collection is called the oplog (short for “operation log”).

The oplog

Each write (insert, update, or delete) creates a document in the oplog collection, so long as replication is enabled (MongoDB won’t bother keeping an oplog if replication isn’t on). So, to see the oplog in action, start by running the database with the --replSet option:

$ ./mongod --replSet funWithOplogs

Now, when you do operations, you’ll be able to see them in the oplog. Let’s start out by initializing our replica set:

> rs.initiate()

Now if we query the oplog, we’ll see this operation:

> use local
switched to db local
> db.oplog.rs.find()
{ 
    "ts" : { "t" : 1286821527000, "i" : 1 }, 
    "h" : NumberLong(0), 
    "op" : "n", 
    "ns" : "", 
    "o" : { "msg" : "initiating set" } 
}

This is just an informational message for the slave; it isn’t a “real” operation. Breaking this down, it contains the following fields:

  • ts: the time this operation occurred.
  • h: a unique ID for this operation. Each operation will have a different value in this field.
  • op: the write operation that should be applied to the slave. n indicates a no-op; this is just an informational message.
  • ns: the database and collection affected by this operation. Since this is a no-op, this field is left blank.
  • o: the actual document representing the op. Since this is a no-op, this field is pretty useless.

To see some real oplog messages, we’ll need to do some writes. Let’s do a few simple ones in the shell: an insert, an update, an upsert (note the third argument to update()), and a remove:

> use test
switched to db test
> db.foo.insert({x:1})
> db.foo.update({x:1}, {$set : {y:1}})
> db.foo.update({x:2}, {$set : {y:1}}, true)
> db.foo.remove({x:1})

Now look at the oplog:

> use local
switched to db local
> db.oplog.rs.find()
{ "ts" : { "t" : 1286821527000, "i" : 1 }, "h" : NumberLong(0), "op" : "n", "ns" : "", "o" : { "msg" : "initiating set" } }
{ "ts" : { "t" : 1286821977000, "i" : 1 }, "h" : NumberLong("1722870850266333201"), "op" : "i", "ns" : "test.foo", "o" : { "_id" : ObjectId("4cb35859007cc1f4f9f7f85d"), "x" : 1 } }
{ "ts" : { "t" : 1286821984000, "i" : 1 }, "h" : NumberLong("1633487572904743924"), "op" : "u", "ns" : "test.foo", "o2" : { "_id" : ObjectId("4cb35859007cc1f4f9f7f85d") }, "o" : { "$set" : { "y" : 1 } } }
{ "ts" : { "t" : 1286821993000, "i" : 1 }, "h" : NumberLong("5491114356580488109"), "op" : "i", "ns" : "test.foo", "o" : { "_id" : ObjectId("4cb3586928ce78a2245fbd57"), "x" : 2, "y" : 1 } }
{ "ts" : { "t" : 1286821996000, "i" : 1 }, "h" : NumberLong("243223472855067144"), "op" : "d", "ns" : "test.foo", "b" : true, "o" : { "_id" : ObjectId("4cb35859007cc1f4f9f7f85d") } }

You can see that each operation now has an ns: “test.foo”. The op field shows the three types of writes mentioned earlier: i for inserts, u for updates, and d for deletes. Notice that the upsert was logged as an insert (i) of the full resulting document, not as an update, since no document matched {x:2}.

The o field now contains the document to insert or the criteria to update and remove. Notice that, for the update, there are two o fields (o and o2). o2 gives the update criteria and o gives the modifications (equivalent to update()’s second argument).
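As a rough sketch (not necessarily the slave’s actual code path), applying that update entry comes down to re-running its o2 and o fields through update():

> use test
switched to db test
> db.foo.update({ "_id" : ObjectId("4cb35859007cc1f4f9f7f85d") }, { "$set" : { "y" : 1 } })

In other words: find the document matching o2, then apply the modifications in o.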

Using this information

MongoDB doesn’t yet have triggers, but applications could hook into this collection if they’re interested in doing something every time a document is deleted (or updated, or inserted, etc.); a minimal sketch from the shell follows. Part three of this series will elaborate on this idea.
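For example, here’s a minimal sketch of that idea: a tailable cursor on the oplog that prints every delete on test.foo as it happens (awaitData just makes the server wait briefly for new results instead of returning right away; a real application would re-create the cursor whenever it dies):

> use local
switched to db local
> var cursor = db.oplog.rs.find({ "op" : "d", "ns" : "test.foo" }).addOption(DBQuery.Option.tailable).addOption(DBQuery.Option.awaitData)
> while (cursor.hasNext()) { printjson(cursor.next()); }  // your "document deleted" hook goes here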

Next up: what the oplog is and how syncing works.

24 thoughts on “Replication Internals”

  1. In the description above of o and o2, it seems that o is giving the update modifications, and o2 looks like the first argument to update().

    Or am I crazy?

  2. No, at the moment it’ll actually log an update op.  This is sort of just an implementation detail, though, not by design.  It may actually change to not being logged in 2.4, which is having a lot of the update code rewritten.

      1. Thanks for the quick reply, Kristina! I was hoping to use this data for tracking changes to documents over a period and ignoring updates that were effectively no-ops (nothing modified). Perhaps I can work out a way to determine this using the current output.

      2. OK. I’m concerned that there’s some chance the document may have changed between when the log entry was created and when the process recording the event reads it. If so, the underlying document may no longer be current.

  3. I am using MongoLab for MongoDB hosting, so while I can access the oplog when MongoDB is on localhost, my db is hosted on the web.

      1. I can see myself getting into this same situation.  Is there a workaround or alternate method for creating triggers if you do not have access to the oplog?

  4. I set up a test to listen to the oplog for changes to a certain collection.  It’s working fine, but I notice there is a delay of about 1 second maybe… 80% of the time.  The other times it executes quickly, completing in under 100ms.  Is there some delay in oplog generation?  Is it polling to generate the entries?  Is there some configuration I could use to decrease the lag, and would that have other penalties, such as overtaxing the CPU?

    I am using the node.js driver.  This is my implementation, though I don’t imagine it will work outside of my application environment:


    # eventChannel module: a shared EventEmitter the watcher publishes to
    {EventEmitter} = require 'events'
    eventChannel = new EventEmitter()
    eventChannel.on 'error', (args...) -> console.log "Event Channel received error:", args...
    module.exports = eventChannel


    # oplog watcher module
    {Server, Db, Timestamp} = require 'mongodb'
    client = new Db 'local', new Server('localhost', 27017, {native_parser: true}), {w: 0}
    eventChannel = config.require 'load/eventChannel'  # app-specific module loader

    getTimestamp = (date) ->
      date ||= new Date()
      time = Math.floor(date.getTime() / 1000)
      new Timestamp 0, time

    getDate = (timestamp) ->
      new Date timestamp.high_ * 1000

    # map oplog op codes to friendlier names (deletes are logged as "d")
    mapOp =
      n: 'noop'
      i: 'insert'
      u: 'update'
      d: 'remove'

    options = {} # raw

    module.exports =
      connect: (opts) ->
        options.merge opts  # assumes an app-provided merge helper
      watch: (collection) ->
        # watch user model
        client.open (err) ->
          console.log 'Error connecting:', err if err
          client.collection 'oplog.rs', (err, oplog) ->
            cursorOptions =
              tailable: true
              tailableRetryInterval: 1000
              numberOfRetries: 1000
            currentTime = getTimestamp()
            cursor = oplog.find {ts: {$gte: currentTime}}, cursorOptions
            stream = cursor.stream()
            stream.on 'data', (data) ->
              if collection
                return unless data.ns is collection
              if options.raw
                event = data
              else
                event =
                  timestamp: getDate data.ts
                  operation: mapOp[data.op] or data.op
                  namespace: data.ns
                  id: data.h.toString()
                  criteria: data.o2
                  data: data.o
              eventChannel.emit 'change', event

    I’ll release this as a simple watcher library once I get it cleaned up.

    1. First, it might be the rate you’re writing to the oplog: oplog queries will hang around for a while waiting for results.  Second, you might want to use the “oplog replay” option (no idea what it’s called in Coffeescript) which makes querying the oplog more efficient.

      1. Kristina,

        Thanks for your reply!

        I changed the tailableRetryInterval in my code to 100ms, and that reduced the delay.  So it looks like the tailable cursor is actually polling for results.  Hmm… not ideal.  I’ll take this up on the node-mongodb-native google group though, as it seems it’s an implementation detail of the driver.

      2. Ok, so I actually found an answer.  If you initialize the cursor with ‘awaitdata: true’ then it will rely on Mongo server functionality to push out the new data.  Here’s an example from the tests in node-mongodb-native:

        https://github.com/mongodb/node-mongodb-native/blob/master/test/cursor_test.js#L2038

        Using this my test runs in ~100ms (20ms lag over just waiting for the update callback), or if the server’s been idle for a little while it’s more like 500ms.  That should be acceptable.
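        For reference, the change amounts to adding that option when building the cursor, roughly like this in my code above (option spelling per this driver version, so treat it as an assumption):

        cursorOptions =
          tailable: true
          awaitdata: true             # ask the server to block briefly for new data
          tailableRetryInterval: 1000
          numberOfRetries: 1000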

      3. You’re referring to the OplogReplay bit mentioned here, I assume?

        http://docs.mongodb.org/meta-driver/latest/legacy/mongodb-wire-protocol/

        I think I’ll have to ask Christian more specifically if his driver handles that… I grepped the code and cannot find it, but maybe he is passing all the options through directly.  What specifically does this flag do?  I can’t seem to find much documentation on it, though I see it mentioned by Scott here:

        https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/b1qiuAIG75A

        Thanks again for your help.  This is an adventure into the lair of the beast, to be certain.  🙂

      4. Yes, that’s the flag I’m talking about.  The oplog doesn’t have any indexes, so when you query mongod has to scan every document.  The oplog replay flag makes the query start at the latest document and jump back by 200MB at a time to try to find a timestamp earlier than the one you’re querying for.  Then, once it’s found the right oplog segment to search, it’ll move forward one document at a time.  It makes querying the oplog for a particular timestamp much faster.
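        For example, from the shell, where the flag is exposed as DBQuery.Option.oplogReplay (it only applies to queries that filter on ts):

        > use local
        switched to db local
        > var last = db.oplog.rs.find().sort({ "$natural" : -1 }).limit(1).next()
        > db.oplog.rs.find({ "ts" : { "$gte" : last.ts } }).addOption(DBQuery.Option.oplogReplay)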
