With a name like Mongo, it has to be good

I just got back from MongoSF, which was awesome. Over 200 Mongo geeks, three tracks, and language-specific workshops all day.

I got to the conference early and took a picture of the venue... an hour later it was packed.

The highlight, for me, was Eliot Horowitz’s talk on sharding. He set up a MongoDB cluster of 25 large EC2 instances and started hammering them.

He pulled up an incredibly snazzy sharding GUI (okay, I wrote it, it’s not actually that snazzy) that displayed a row of stats for each shard. Each shard had about the same level of operations per second, so you could see that Mongo was doing pretty well distributing the data across the shards.

Eliot scrolled down to the bottom of the stats table where it showed the total number of operations per second across all the shards. The cluster was doing 8 million operations per second.

8,000,000 operations per second.

The whole audience burst into applause.

That’s about 320,000 operations per second per box, which is about what would be expected for Mongo on a powerful server (like a large EC2 instance).
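
The per-box figure is just the cluster total divided by the number of instances. As a quick sanity check (a minimal sketch: the two constants are the numbers quoted above, and the function name is made up for illustration):

```python
# Figures quoted in the post; nothing here is measured.
TOTAL_OPS_PER_SEC = 8_000_000
NUM_INSTANCES = 25


def ops_per_instance(total_ops: int, instances: int) -> float:
    """Average throughput per instance, assuming load is spread evenly across shards."""
    return total_ops / instances


per_box = ops_per_instance(TOTAL_OPS_PER_SEC, NUM_INSTANCES)
print(f"{per_box:,.0f} ops/sec per box")  # prints "320,000 ops/sec per box"
```

The even-spread assumption is roughly what the stats table showed, since each shard had about the same level of operations per second.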

Cool things about this:

  • If your app needs more than 8 million ops/sec, you can just add more machines and Mongo will take care of redistributing the load.
  • Your app doesn’t need to know if it’s talking to 1, 25, or 7,000 servers. You can focus on writing your app and let Mongo focus on scaling it.

[Photo: speakers' dinner the night before]

If you missed out on MongoSF, there are a bunch of other Mongo conferences coming up: MongoNY at the end of May and MongoUK and MongoFR in June.

37 Comments

  1. Nice! was this achieved using the experimental auto-sharding functionality or is this readily available in the stable version?

  3. @Chris there was video taken, I’m not sure if/when/where it’ll be up, but I’ll link to it if it is. There are slides at http://www.slideshare.net/mongosf, but these don’t include the demo, of course.

    @Slimo this was done with the latest 1.5 branch build, which is what I’d recommend using for anyone trying sharding.

  5. Any idea on what was the setup of the cluster and what kind of workload it was?

    320k ops seems awesome but if the dataset was small (like fewer than 1 million documents) and all in memory with only reads, the results are not that surprising.

    1. It was a mix of reads and writes, I think it was 4:1 reads to writes. I’d imagine almost everything was in memory.

  7. IIRC, it was ~20,000 each of writes, deletes, updates, inserts, commands, & get mores and ~200,000 reads. I think that it was a quite large data set, but I’ll check with Eliot tomorrow.

  9. was at MongoSF.. the sharding demo was definitely cool, a lot of the other talks/demos were very good as well. overall a great conference, esp. for the price. thanks again 🙂

  11. @Noah unfortunately, replica sets aren’t ready yet. So, it was just 25 shards with no replication, which was actually kind of a problem because Amazon thought that we were DDOSing ourselves, so kept shutting down our instances, thus breaking the demo.

    About replica sets: I am totally in love with them, so if you’re interested subscribe to my RSS feed because I’ll be doing a blog post on how to use them ~3 seconds after they’re available in the nightly build.

  13. 8 million ops/s is 320,000 ops/s per instance.
    As you wrote this is far above a local Mongo, Redis or OrientDB instance.
    And this on EC2 with more network latency??
    Sounds impossible. What’s behind this magic?
    Regards
    Stefan Edlich

  15. As I wrote, ~320,000 is reasonable for a powerful server. There were lots of clients sending requests and MongoDB can do reads in parallel, which is what there were the most of.

  17. Kristina,

    Those numbers sound amazing, but there are a lot of missing details ;-).

    1. what was the concurrency level?
    2. were there concurrent CRUD operations? (basically I expect things to behave quite differently when sets of READS are (almost completely) separated from sets of WRITES)
    3. what was the average data size/op?

    You know such benchmarks can be “fabricated” and missing information tends to encourage that kind of thought.

  19. 1. I have no idea.
    2. Yes.
    3. I have no idea.

    I think people are taking this post a bit too seriously… I just thought it was a really cool demo of what Mongo’s going to be able to do: scale horizontally to practically infinite capacity. It’s certainly not meant to be a benchmark.

    I think we’re working on a more interesting application (one that has actual functionality other than hammering the database) that we’ll open source and use as a demo in the future. We’ll also do some real benchmarks against sharding before releasing 1.6, obviously.

  21. Do you have any more details on the “incredibly snazzy sharding GUI” that you mentioned in the post? I'm interested in doing some load testing to see how MongoDB will perform for my system and it sounds like that GUI was giving some pretty valuable data. Is there any way to get access to that program? Thanks!

  22. Unfortunately, it's not publicly available yet, but we'll probably release it when 1.6 comes out. It doesn't give any information you can't get from the shell (it basically runs the serverStatus command once per second for each machine), it just presents it in a pretty way. I'm hoping to add a chunk viewer and stuff, too, but it's very, very, very alpha at the moment.
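
    For anyone who wants similar numbers before the GUI is released, the polling described above can be approximated in a few lines. This is only a rough sketch, not the GUI's actual code: it assumes the `opcounters` section of `serverStatus` output (cumulative counts since startup), and `fetch_opcounters`, `ops_per_second`, and `poll` are hypothetical helper names. The fetcher uses the third-party `pymongo` driver and opens a new connection per call, which is fine for a sketch but wasteful in practice.

```python
import time
from typing import Callable, Dict, List

# Counter names reported under "opcounters" in serverStatus output.
OPCOUNTER_KEYS = ("insert", "query", "update", "delete", "getmore", "command")


def ops_per_second(prev: Dict[str, int], curr: Dict[str, int],
                   interval_s: float) -> Dict[str, float]:
    """Turn two cumulative opcounters snapshots into per-second rates."""
    rates = {k: (curr.get(k, 0) - prev.get(k, 0)) / interval_s
             for k in OPCOUNTER_KEYS}
    rates["total"] = sum(rates.values())
    return rates


def fetch_opcounters(host: str) -> Dict[str, int]:
    """Hypothetical fetcher: run serverStatus against one machine."""
    # pymongo is imported lazily so the rate math above works without it.
    from pymongo import MongoClient
    client = MongoClient(host)
    try:
        return dict(client.admin.command("serverStatus")["opcounters"])
    finally:
        client.close()


def poll(hosts: List[str],
         fetch: Callable[[str], Dict[str, int]] = fetch_opcounters,
         interval_s: float = 1.0, rounds: int = 5) -> None:
    """Print one row of ops/sec per host, plus a cluster total, each round."""
    prev = {h: fetch(h) for h in hosts}
    for _ in range(rounds):
        time.sleep(interval_s)
        curr = {h: fetch(h) for h in hosts}
        cluster_total = 0.0
        for h in hosts:
            rates = ops_per_second(prev[h], curr[h], interval_s)
            cluster_total += rates["total"]
            print(f"{h:<30} {rates['total']:>12,.0f} ops/sec")
        print(f"{'TOTAL':<30} {cluster_total:>12,.0f} ops/sec")
        prev = curr
```

    Calling `poll(["host1:27017", "host2:27017"])` (placeholder addresses) would print a row per machine once per second; passing a different `fetch` function lets you test the rate calculation without a running server.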

  24. It seems that Eliot’s presentation of the 25 node cluster has been removed from the whole internet. I have doubts about the performance of 320,000 ops/sec/mongod (back in 2010) and would have loved to see his presentation to have some insights into his test setup.

    Is there any chance to get the video back or is it gone for ever? Thanks!

    1. I have no idea about the video availability; the 10gen site’s been recreated a few times since this was posted. For advice on speeding things up, I recommend starting with http://docs.mongodb.org/manual/administration/optimization/. MongoDB conferences also often have talks about performance tuning.

      (I know this is a fairly unsatisfying answer, but I was busy being sick with nerves for my talk and wasn’t there when they were setting up the cluster.)
