Sharding with the Fishes

kchodorowMarch 30, 2010MongoDBmafia, MongoDB, sharding

Sharding is the not-so-revolutionary way that MongoDB scales writes (it’s very similar to techniques described in the Big Table paper and by PNUTS) but many people are unfamiliar with what it is and how it works. If you’ve seen a talk on MongoDB or looked at the website, you’ve probably seen a diagram of sharding that looks like this:

…which probably looks a bit like “I hope I don’t have to understand that.”

However, it’s actually quite simple: it’s exactly how the Mafia is structured (or, at least, how The Godfather taught me it was):

The shards are the peons: someone tells them to do something (e.g., a query or an insert), they do it and report back.
The mongos is the godfather. It knows what data each peon has and gives them orders. It’s basically a router for the requests.
The config server is the consigliere. It knows where all of the data is at any given time and lets the boss know so that he can focus on bossing. The consigliere keeps the organization running smoothly.

For a concrete example, let’s say we have a collection of blog posts. You choose a “shard key,” which is the value Mongo will use to split the data across shards. We’ll choose “date” as the shard key, which means it will be split up based on values in the “date” field. If we have four shards, they might contain data something like:

shard #1: beginning of time up to June 2009
shard #2: July 2009 to November 2009
shard #3: December 2009 to February 2010
shard #4: March 2010 through the end of time

Now that we’ve got our peons set up, let’s ask the godfather for some favors.

Queries

Say you query for all documents created from the beginning of this year (January 1st, 2010) up to the present. Here’s what happens:

You (the client) send the query to the godfather.
The godfather knows which shards contain the data you’re looking for, so he sends the query to shards #3 and #4.
shard #3 and shard #4 execute the query and return the results to the godfather.
The godfather puts together the results he’s received and sends them back to the client.

Note how all of the sharding stuff is done a layer away from the client, so your application doesn’t have to be sharding-aware, it can just query the godfather as though it were a normal mongod instance.

Inserts

Suppose you want to insert a new document with today’s date. Here’s the sequence of events:

You send the document to the godfather.
It sees today’s date and routes it to shard #4.
shard #4 inserts the document.

Again, identical to a single-server setup from the client’s point of view.

So where’s the consigliere?

Suppose you start getting millions of documents inserted with the date September 2009. Shard #2 begins to swell up like a bloated corpse. The consigliere will notice this and, when shard #2 gets too big it will split the data on shard #2 and migrate some of it to other, emptier shards. Then it will let the godfather know that now shard #2 contains July 2009-September 15th 2009 and shard #3 contains September 16th 2009-February 2010.

The consigliere is also responsibly for figuring out what to do when you add a new shard to the cluster. It figures out if it should keep the new shard in reserve or migrate some data to it right away. Basically, it’s the brains of the operation.

Whenever the consigliere moves around data, it lets the godfather know what the final configuration is so that the godfather can continue routing requests correctly.

Leave the gun. Take the cannolis.

This scaling deliciousness is, unfortunately, still very alpha. You can help us out by telling us where our documentation sucks (specifics are better than “it sucks”), testing it out on your machine, and voting for features you’d like to see.

34 thoughts on “Sharding with the Fishes”

Dan says:

March 30, 2010 at 5:34 pm

Three questions come out of this discussion on sharding:

1) If you only have one big, kicking server, are you better off using it for a single mongod or would it be better to divide it up into four or five virtual machines with shards and mongos on them? And by “better off” I mean raw query response.

2) Your “division by date” example is pretty simplistic. How about a division by type of data? Or perhaps chunks that would be normalized in a SQL database.

and finally,
3) Can the consigliere and the godfather live on the same box and either on a shard box?

LikeLike

Reply
Dan says:

March 30, 2010 at 10:34 am

Three questions come out of this discussion on sharding:

1) If you only have one big, kicking server, are you better off using it for a single mongod or would it be better to divide it up into four or five virtual machines with shards and mongos on them? And by “better off” I mean raw query response.

2) Your “division by date” example is pretty simplistic. How about a division by type of data? Or perhaps chunks that would be normalized in a SQL database.

and finally,
3) Can the consigliere and the godfather live on the same box and either on a shard box?

LikeLike

Reply
John Bledsoe says:

March 30, 2010 at 5:39 pm

I think you meant Shard #4 should be March 2010.

LikeLike

Reply
John Bledsoe says:

March 30, 2010 at 10:39 am

I think you meant Shard #4 should be March 2010.

LikeLike

Reply
kristina says:

March 30, 2010 at 5:46 pm

@Dan:
1) I’m actually not clear on this point myself. mongod figures out how much free space you have, but I think if you had the (slightly ridiculous) setup of a server with 100GB of storage and 2GB of ram and another with 8GB of storage and 8GB of ram, I’d imagine most of the data would end up on the 100/2 server. You could ask on the list (http://groups.google.com/group/mongodb-user/).

2) As long as it’s a field, you can shard on it. It shards by differences in field value (so you wouldn’t want to shard on a boolean field, but other than that anything should work). Eventually you’ll be able to shard on compound fields, too, but I don’t think this is working yet.

3) Yes, see http://www.mongodb.org/display/DOCS/Sharding+Introduction#ShardingIntroduction-ServerLayout for an ugly diagram.

@John Indeed I did, thanks.

LikeLike

Reply
kristina says:

March 30, 2010 at 10:46 am

@Dan:
1) I’m actually not clear on this point myself. mongod figures out how much free space you have, but I think if you had the (slightly ridiculous) setup of a server with 100GB of storage and 2GB of ram and another with 8GB of storage and 8GB of ram, I’d imagine most of the data would end up on the 100/2 server. You could ask on the list (http://groups.google.com/group/mongodb-user/).

2) As long as it’s a field, you can shard on it. It shards by differences in field value (so you wouldn’t want to shard on a boolean field, but other than that anything should work). Eventually you’ll be able to shard on compound fields, too, but I don’t think this is working yet.

3) Yes, see http://www.mongodb.org/display/DOCS/Sharding+Introduction#ShardingIntroduction-ServerLayout for an ugly diagram.

@John Indeed I did, thanks.

LikeLike

Reply
Fitz Agard says:

April 2, 2010 at 4:39 pm

Great word illustration!

I’m looking forward to testing out sharding with the geospatial functionality. On an older project I ran into huge performance issues with ton loads of lat/long data and distance queries. I’m going to try to replicate the project using MongoDB.

LikeLike

Reply
Fitz Agard says:

April 2, 2010 at 9:39 am

Great word illustration!

I’m looking forward to testing out sharding with the geospatial functionality. On an older project I ran into huge performance issues with ton loads of lat/long data and distance queries. I’m going to try to replicate the project using MongoDB.

LikeLike

Reply
kristina says:

April 2, 2010 at 5:42 pm

@Fitz cool! Let us know how it goes.

LikeLike

Reply
kristina says:

April 2, 2010 at 10:42 am

@Fitz cool! Let us know how it goes.

LikeLike

Reply
Pingback: Mongo Smash! | deadkarma
Hery says:

June 22, 2010 at 7:27 am

Thanks Kristina, very good illustration! I'm going to try that..

LikeLike

Reply
Navin says:

July 14, 2010 at 12:14 pm

lol @ godfather analogy :)– desperately waiting for mongo sharding to be production ready…in the mean time, my app has to handle sharding based on ranges…

LikeLike

Reply
VJ says:

July 14, 2010 at 5:47 pm

Hello. We've just started using MongoDB 1.5 (CentOS) and are trying out sharding. I've installed mongo on 4 servers (A, B, C, D). Server A runs “mongod –configsvr” and “mongos –configdb”, and servers B, C, D run “mongod –shardsvr”. After setting up & enabling sharding, I inserted some test data, but I only see data on server B. I used a small chunk size (1 MB). Will I not see data on the other servers until I've inserted 1 MB of data?Thanks.

LikeLike

Reply
kristina1 says:

July 14, 2010 at 5:53 pm

You won't see any sharding activity until you've inserted at least 200MB of data! If you just want to try it out and see some data move around, you can wipe everything and start again, starting mongos with the option –chunkSize 1. This will force it to start sharding data after 1 MB has been inserted. Also, make sure sharding is enabled at the db and collection levels.

LikeLike

Reply
VJ says:

July 14, 2010 at 6:35 pm

I am using a chunk size of 1 MB [“mongos –configdb localhost:20000 –chunkSize 1”]. I inserted 1 million records on server A, and I see 1 million records on server B using db.people.count().

LikeLike

Reply
kristina1 says:

July 14, 2010 at 6:47 pm

It sounds like you may not have enabled sharding on the database and the collection. See http://www.mongodb.org/display/DOCS/A+Sample+Co…, particularly the enablesharding and shardcollection commands.Also: if you started mongos at some point without the –chunkSize, it might have registered a bigger chunk size in the configuration and won't go back, even though you're now using a smaller one.

LikeLike

Reply
VJ says:

July 14, 2010 at 6:47 pm

I just noticed these error msgs in the mongos.log on server A. Did we build mongo incorrectly?========================================[Balancer] Wed Jul 14 14:45:54 balancer: chose [shard0] to [shard1] { _id: “test.people-name_MinKey”, lastmod: Timestamp 135000|0, ns: “test.people”, min: { name: MinKey }, max: { name: 1.0 }, shard: “shard0” }[Balancer] Wed Jul 14 14:45:54 moving chunk ns: test.people moving ( shard ns:test.people shard: shard0:68.67.172.15:10000 lastmod: 135|0 min: { name: MinKey } max: { name: 1.0 }) shard0:68.67.172.15:10000 -> shard1:68.67.172.16:10000[Balancer] Wed Jul 14 14:45:54 balancer: MOVE FAILED **** no such cmd { errmsg: “no such cmd”, bad cmd: { moveChunk: “test.people”, from: “68.67.172.15:10000”, to: “68.67.172.16:10000”, filter: { name: { $gte: MinKey, $lt: 1.0 } }, shardId: “test.people-name_MinKey”, configdb: “localhost:20000” }, ok: 0.0 } from: shard0 to: chunk: { _id: “test.people-name_MinKey”, lastmod: Timestamp 135000|0, ns: “test.people”, min: { name: MinKey }, max: { name: 1.0 }, shard: “shard0” }========================================

LikeLike

Reply
VJ says:

July 14, 2010 at 6:54 pm

First of all, thanks for your prompt replies.I had started with a chunk size of 1 MB. Also, I followed the instructions to enable sharding on the collection.========================================> db.runCommand({listshards:1}){ “shards” : [ { “_id” : “shard0”, “host” : “68.67.172.15:10000” }, { “_id” : “shard1”, “host” : “68.67.172.16:10000” }, { “_id” : “shard2”, “host” : “68.67.172.17:10000” } ], “ok” : 1}========================================> db.printShardingStatus();— Sharding Status — sharding version: { “_id” : 1, “version” : 3 } shards: { “_id” : “shard0”, “host” : “68.67.172.15:10000” } { “_id” : “shard1”, “host” : “68.67.172.16:10000” } { “_id” : “shard2”, “host” : “68.67.172.17:10000” } databases: { “_id” : “admin”, “partitioned” : false, “primary” : “config” } { “_id” : “test”, “partitioned” : true, “primary” : “shard0”, “sharded” : { “test.people” : { “key” : { “name” : 1 }, “unique” : false } } } test.people chunks: { “name” : { $minKey : 1 } } –>> { “name” : 1 } on : shard0 { “t” : 135000, “i” : 0 } { “name” : 1 } –>> { “name” : 8012 } on : shard0 { “t” : 135000, “i” : 1 } { “name” : 8012 } –>> { “name” : 14514 } on : shard0 { “t” : 135000, “i” : 2 }::: { “t” : 135000, “i” : 132 } { “name” : 987356 } –>> { “name” : “j” } on : shard0 { “t” : 135000, “i” : 133 } { “name” : “j” } –>> { “name” : { $maxKey : 1 } } on : shard0 { “t” : 135000, “i” : 134 }========================================

LikeLike

Reply
kristina1 says:

July 14, 2010 at 6:56 pm

Thanks for all the information! It looks like everything is set up right. You may have hit a bug, could you send all of this info to the user list? http://groups.google.com/group/mongodb-user

LikeLike

Reply
VJ says:

July 14, 2010 at 7:13 pm

I've submitted the info. Thanks.

LikeLike

Reply
vic_nyc says:

August 5, 2010 at 8:48 pm

Really enjoyed this, thanks a lot. So easy to understand!

LikeLike

Reply
Tom says:

August 11, 2010 at 12:50 pm

Awesome blog

LikeLike

Reply
Pingback: NoSQL Daily – Mon Sep 13 › PHP App Engine
Rohith says:

September 20, 2010 at 7:54 pm

Great man I really enjoyed it

LikeLike

Reply
Pingback: Three Things To Watch Out For With NoSQL | Jeremiah Peschka
Pingback: ehcache.net
Pingback: Sharding Mongodb Overview | Software Development/Product Ideas/Games Development
Bnelson says:

February 10, 2012 at 9:39 am

The way you describe this makes me feel very very smart. You should think about going into teaching full-time 🙂

LikeLike

Reply
1. Anonymous says:
  
  February 10, 2012 at 11:34 am
  
  Haha, thank you!
  
  LikeLike
  
  Reply
Easy_010481 says:

July 12, 2012 at 3:08 am

i have a collection with documents in shape ({_id:{w1:”word 1″,W2:”word 2″}, value: 3}), i want sharding my collection. could i choose _id be shard key? my system is heavy in read..!! thanks

LikeLike

Reply
1. kristina1 says:
  
  July 12, 2012 at 6:53 am
  
  It really depends what your read load is like and what kinds of values w1 & W2 have. You should ask & give as much info as you can on the mailing list: https://groups.google.com/group/mongodb-user.
  
  LikeLike
  
  Reply
  1. Easy_010481 says:
    
    July 14, 2012 at 12:29 am
    
    thanks so much!!!
    Can you give me advice!!! when i finish config a cluster. i should create database directly on shard and then enablesharding or create on mongos and then enablesharding. which databases mongos can see on its cluster!!!
    
    Thanks!!
    
    LikeLike
  2. kristina1 says:
    
    July 16, 2012 at 7:34 am
    
    Please ask questions on the mailing list.
    
    LikeLike