With a name like Mongo, it has to be good

kchodorowMay 5, 2010MongoDBMongoDB, MongoSF, sharding

I just got back from MongoSF, which was awesome. Over 200 Mongo geeks, three tracks, and language-specific workshops all day.

mongosf_hall — I got to the conference early and took a picture of the venue... an hour later it was packed.

The highlight, for me, was Eliot Horowitz’s talk on sharding. He set up a MongoDB cluster of 25 large EC2 instances and started hammering them.

He pulled up an incredibly snazzy sharding GUI (okay, I wrote it, it’s not actually that snazzy) that displayed a row of stats for each shard. Each shard had about the same level of operations per second, so you could see that Mongo was doing pretty well distributing the data across the shards.

Eliot scrolled down to the bottom of the stats table where it showed the total number of operations per second across all the shards. The cluster was doing 8 million operations per second.

8,000,000 operations per second.

The whole audience burst into applause.

That’s about 320,000 operations per second per box, which is about what would be expected for Mongo on a powerful server (like a large EC2 instance).

Cool things about this:

If your app needs more than 8 million ops/sec, you can just add more machines and Mongo will take care of redistributing the load.
Your app doesn’t need to know if it’s talking to 1, 25, or 7,000 servers. You can focus on writing your app and let Mongo focus on scaling it.

speakers_dinner — Speaker's dinner the night before

If you missed out on MongoSF, there are a bunch of other Mongo conferences coming up: MongoNY at the end of May and MongoUK and MongoFR in June.

37 thoughts on “With a name like Mongo, it has to be good”

Chris says:

May 5, 2010 at 11:51 pm

The sharding talk sounds interesting. Will any videos be posted?

LikeLike

Reply
1. Anonymous says:
  
  November 23, 2010 at 5:41 pm
  
  Yes, videos are available here: http://www.10gen.com/video
  
  LikeLike
  
  Reply
Chris says:

May 5, 2010 at 4:51 pm

The sharding talk sounds interesting. Will any videos be posted?

LikeLike

Reply
Simo says:

May 6, 2010 at 3:54 am

Nice! was this achieved using the experimental auto-sharding functionality or is this readily available in the stable version?

LikeLike

Reply
Simo says:

May 5, 2010 at 8:54 pm

Nice! was this achieved using the experimental auto-sharding functionality or is this readily available in the stable version?

LikeLike

Reply
kristina says:

May 6, 2010 at 12:01 pm

@Chris there was video taken, I’m not sure if/when/where it’ll be up, but I’ll link to it if it is. There are slides at http://www.slideshare.net/mongosf, but these don’t include the demo, of course.

@Slimo this was done with the latest 1.5 branch build, which is what I’d recommend using for anyone trying sharding.

LikeLike

Reply
kristina says:

May 6, 2010 at 5:01 am

@Chris there was video taken, I’m not sure if/when/where it’ll be up, but I’ll link to it if it is. There are slides at http://www.slideshare.net/mongosf, but these don’t include the demo, of course.

@Slimo this was done with the latest 1.5 branch build, which is what I’d recommend using for anyone trying sharding.

LikeLike

Reply
kristina says:

May 9, 2010 at 2:15 pm

Looks like videos are up at http://www.10gen.com/event_mongosf_10apr30

LikeLike

Reply
kristina says:

May 9, 2010 at 7:15 am

Looks like videos are up at http://www.10gen.com/event_mongosf_10apr30

LikeLike

Reply
Nicolas says:

May 11, 2010 at 10:17 pm

Any idea on what was the setup of the cluster and what kind of workload it was?

320k ops seems awesome but if the dataset was small (like fewer than 1 millions documents) and all in memory with only reads, the results are not that surprising.

LikeLike

Reply
1. Anonymous says:
  
  November 23, 2010 at 5:42 pm
  
  It was a mix of reads and writes, I think it was 4:1 reads to writes. I’d imagine almost everything was in memory.
  
  LikeLike
  
  Reply
Nicolas says:

May 11, 2010 at 3:17 pm

Any idea on what was the setup of the cluster and what kind of workload it was?

320k ops seems awesome but if the dataset was small (like fewer than 1 millions documents) and all in memory with only reads, the results are not that surprising.

LikeLike

Reply
kristina says:

May 11, 2010 at 10:36 pm

IIRC, it was ~20,000 each of writes, deletes, updates, inserts, commands, & get mores and ~200,000 reads. I think that it was a quite large data set, but I’ll check with Eliot tomorrow.

LikeLike

Reply
kristina says:

May 11, 2010 at 3:36 pm

IIRC, it was ~20,000 each of writes, deletes, updates, inserts, commands, & get mores and ~200,000 reads. I think that it was a quite large data set, but I’ll check with Eliot tomorrow.

LikeLike

Reply
Noah Gibbs says:

May 12, 2010 at 12:09 am

I agree – an absolutely awesome demo! 🙂

@Nicolas, I think it was five shards of five replica-set servers each. I might misremember.

LikeLike

Reply
Noah Gibbs says:

May 11, 2010 at 5:09 pm

I agree – an absolutely awesome demo! 🙂

@Nicolas, I think it was five shards of five replica-set servers each. I might misremember.

LikeLike

Reply
ian says:

May 11, 2010 at 9:10 pm

was at MongoSF.. the sharding demo was definitely cool, a lot of the other talks/demos were very good as well. overall a great conference, esp. for the price. thanks again 🙂

LikeLike

Reply
ian says:

May 12, 2010 at 4:10 am

was at MongoSF.. the sharding demo was definitely cool, a lot of the other talks/demos were very good as well. overall a great conference, esp. for the price. thanks again 🙂

LikeLike

Reply
kristina says:

May 12, 2010 at 2:19 pm

@Noah unfortunately, replica sets aren’t ready yet. So, it was just 25 shards with no replication, which was actually kind of a problem because Amazon thought that we were DDOSing ourselves, so kept shutting down our instances, thus breaking the demo.

About replica sets: I am totally in love with them, so if you’re interested subscribe to my RSS feed because I’ll be doing a blog post on how to use them ~3 seconds after they’re available in the nightly build.

LikeLike

Reply
kristina says:

May 12, 2010 at 7:19 am

@Noah unfortunately, replica sets aren’t ready yet. So, it was just 25 shards with no replication, which was actually kind of a problem because Amazon thought that we were DDOSing ourselves, so kept shutting down our instances, thus breaking the demo.

About replica sets: I am totally in love with them, so if you’re interested subscribe to my RSS feed because I’ll be doing a blog post on how to use them ~3 seconds after they’re available in the nightly build.

LikeLike

Reply
Stefan Edlich says:

May 12, 2010 at 4:36 pm

8 million ops/s is 320000 ops/s per instance.
As you wrote this is far above a local Mongo, Redis or OrientDB instance.
And this on EC2 with more network letency??
Sounsd impossible. What’s behind this magic?
Regards
Stefan Edlich

LikeLike

Reply
Stefan Edlich says:

May 12, 2010 at 9:36 am

8 million ops/s is 320000 ops/s per instance.
As you wrote this is far above a local Mongo, Redis or OrientDB instance.
And this on EC2 with more network letency??
Sounsd impossible. What’s behind this magic?
Regards
Stefan Edlich

LikeLike

Reply
kristina says:

May 12, 2010 at 6:52 pm

As I wrote, ~320,000 is reasonable for a powerful server. There were lots of clients sending requests and MongoDB can do reads in parallel, which is what there were the most of.

LikeLike

Reply
kristina says:

May 12, 2010 at 11:52 am

As I wrote, ~320,000 is reasonable for a powerful server. There were lots of clients sending requests and MongoDB can do reads in parallel, which is what there were the most of.

LikeLike

Reply
Alex Popescu says:

May 13, 2010 at 3:28 pm

Kristina,

Those numbers sound amazing, but there are a lot of missing details ;-).

1. what was the concurrency level?
2. where there concurrent CRUD operations? (basically I expect things to behave quite differently when sets of READS are (almost completely) separated from sets of WRITES)
3. what was the average data size/op?

You know such benchmarks can be “fabricated” and missing information tends to encourage that kind of thought.

LikeLike

Reply
Alex Popescu says:

May 13, 2010 at 8:28 am

Kristina,

Those numbers sound amazing, but there are a lot of missing details ;-).

1. what was the concurrency level?
2. where there concurrent CRUD operations? (basically I expect things to behave quite differently when sets of READS are (almost completely) separated from sets of WRITES)
3. what was the average data size/op?

You know such benchmarks can be “fabricated” and missing information tends to encourage that kind of thought.

LikeLike

Reply
kristina says:

May 13, 2010 at 4:47 pm

1. I have no idea.
2. Yes.
3. I have no idea.

I think people are taking this post a bit too seriously… I just thought it was a really cool demo of what Mongo’s going to be able to do: scale horizontally to practically infinite capacity. It’s certainly not meant to be a benchmark.

I think we’re working on a more interesting application (one has actual functionality other than hammering the database) that we’ll open source and use as a demo in the future. We’ll also do some real benchmarks against sharding before releasing 1.6, obviously.

LikeLike

Reply
kristina says:

May 13, 2010 at 9:47 am

1. I have no idea.
2. Yes.
3. I have no idea.

I think people are taking this post a bit too seriously… I just thought it was a really cool demo of what Mongo’s going to be able to do: scale horizontally to practically infinite capacity. It’s certainly not meant to be a benchmark.

I think we’re working on a more interesting application (one has actual functionality other than hammering the database) that we’ll open source and use as a demo in the future. We’ll also do some real benchmarks against sharding before releasing 1.6, obviously.

LikeLike

Reply
Pgkylxws says:

May 14, 2010 at 11:08 am

i think it's blog class ho, meet girls on webcams for free, %O,

LikeLike

Reply
Mike says:

May 14, 2010 at 4:04 pm

Do you have any more details on the “incredibly snazzy sharding GUI” that you mentioned in the post? I'm interested in doing some load testing to see how MongoDB will perform for my system and it sounds like that GUI was giving some pretty valuable data — anyway to get access to that program? Thanks!

LikeLike

Reply
kristina1 says:

May 14, 2010 at 4:23 pm

Unfortunately, it's not publicly available yet, but we'll probably release it when 1.6 comes out. It doesn't give any information you can't get from the shell (it basically runs the serverStatus command once per second for each machine), it just presents it in a pretty way. I'm hoping to add a chunk viewer and stuff, too, but it's very, very, very alpha at the moment.

LikeLike

Reply
Bob says:

November 23, 2010 at 5:12 pm

If you are looking for blog post suggestions, how about a post about MongoDB and memory.

LikeLike

Reply
1. Anonymous says:
  
  November 23, 2010 at 5:40 pm
  
  What are you interested in, in particular?
  
  LikeLike
  
  Reply
Pingback: ehcache.net
Pingback: Reflections on MongoSF 2012 « Meghan Gill
Kay says:

October 26, 2013 at 4:39 am

It seems that Eliot’s presentation of the 25 node cluster has been removed from the whole internet. I have doubts about the performance of 320.000 ops/sec/mongod (back in 2010) and would have loved to see his presentation to have some insights into his test setup.

Is there any chance to get the video back or is it gone for ever? Thanks!

LikeLike

Reply
1. kristina1 says:
  
  October 28, 2013 at 11:40 am
  
  I have no idea about the video availability, the 10gen site’s been recreated a few times since this was posted. For advice on speeding thing up, I recommend starting with http://docs.mongodb.org/manual/administration/optimization/ and MongoDB conferences often have talks about performance tuning.
  
  (I know this is a fairly unsatisfying answer, but I was busy being sick with nerves for my talk and wasn’t there when they were setting up the cluster.)
  
  LikeLike
  
  Reply