On the mongodb-user mailing list last week, someone asked (basically):
I have 4 servers and I want two shards. How do I set it up?
A lot of people have been asking questions about configuring replica sets and sharding, so here’s how to do it in nitty-gritty detail.
The Architecture
Prerequisites: if you aren’t too familiar with replica sets, see my blog post on them. The rest of this post won’t make much sense unless you know what an arbiter is. Also, you should know the basics of sharding.
Each shard should be a replica set, so we’ll need two replica sets (we’ll call them foo and bar). We want our cluster to keep working if one machine goes down or gets separated from the herd (a network partition), so we’ll spread each set across the available machines, which are imaginatively named server1 through server4.
Each replica set has two hosts and an arbiter. This way, if a server goes down, no functionality is lost (and there won’t be two masters on a single server).
To set this up, run:
server1
$ mkdir -p ~/dbs/foo ~/dbs/bar
$ ./mongod --dbpath ~/dbs/foo --replSet foo
$ ./mongod --dbpath ~/dbs/bar --port 27019 --replSet bar --oplogSize 1
server2
$ mkdir -p ~/dbs/foo
$ ./mongod --dbpath ~/dbs/foo --replSet foo
server3
$ mkdir -p ~/dbs/foo ~/dbs/bar
$ ./mongod --dbpath ~/dbs/foo --port 27019 --replSet foo --oplogSize 1
$ ./mongod --dbpath ~/dbs/bar --replSet bar
server4
$ mkdir -p ~/dbs/bar
$ ./mongod --dbpath ~/dbs/bar --replSet bar
Note that the arbiters are started with --oplogSize 1 (1MB). By default, the oplog takes up about 5% of your free disk space, but arbiters don’t need to hold any data, so a full-sized oplog would be a huge waste of space.
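To see why the default would be wasteful for an arbiter, here’s a back-of-the-envelope sketch in plain JavaScript (the ~5% figure is from above; the disk size is just an example):

```javascript
// Rough default oplog size, using the ~5%-of-free-disk rule of thumb above.
function defaultOplogSizeMB(freeDiskMB) {
  return Math.floor(freeDiskMB * 0.05);
}

// On a machine with ~500GB free, the default oplog would be ~25GB...
var defaultMB = defaultOplogSizeMB(500 * 1024);
console.log(defaultMB); // 25600
// ...while an arbiter started with --oplogSize 1 uses just 1MB.
```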
Putting together the replica sets
Now, we’ll start up our two replica sets. Start the mongo shell and type:
> db = connect("server1:27017/admin")
connecting to: server1:27017
admin
> rs.initiate({"_id" : "foo", "members" : [
... {"_id" : 0, "host" : "server1:27017"},
... {"_id" : 1, "host" : "server2:27017"},
... {"_id" : 2, "host" : "server3:27019", arbiterOnly : true}]})
{
    "info" : "Config now saved locally. Should come online in about a minute.",
    "ok" : 1
}
> db = connect("server3:27017/admin")
connecting to: server3:27017
admin
> rs.initiate({"_id" : "bar", "members" : [
... {"_id" : 0, "host" : "server3:27017"},
... {"_id" : 1, "host" : "server4:27017"},
... {"_id" : 2, "host" : "server1:27019", arbiterOnly : true}]})
{
    "info" : "Config now saved locally. Should come online in about a minute.",
    "ok" : 1
}
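The document passed to rs.initiate() is just JSON, so you can build it programmatically instead of typing it out. A sketch in plain JavaScript — makeReplSetConfig is a hypothetical helper, not part of MongoDB:

```javascript
// Build a replica set config like the ones passed to rs.initiate() above.
// `hosts` are the data-bearing members; `arbiter` is an optional arbiter host.
function makeReplSetConfig(name, hosts, arbiter) {
  var members = hosts.map(function (host, i) {
    return { _id: i, host: host };
  });
  if (arbiter) {
    members.push({ _id: members.length, host: arbiter, arbiterOnly: true });
  }
  return { _id: name, members: members };
}

var fooConfig = makeReplSetConfig(
  "foo",
  ["server1:27017", "server2:27017"],
  "server3:27019"
);
// In the shell you would then run: rs.initiate(fooConfig)
```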
Okay, now we have two replica sets running. Let’s create a cluster.
Setting up Sharding
Since we’re trying to set up a system with no single points of failure, we’ll use three configuration servers. We can have as many mongos processes as we want (one on each appserver is recommended), but we’ll start with one.
server2
$ mkdir ~/dbs/config
$ ./mongod --dbpath ~/dbs/config --port 20000
server3
$ mkdir ~/dbs/config
$ ./mongod --dbpath ~/dbs/config --port 20000
server4
$ mkdir ~/dbs/config
$ ./mongod --dbpath ~/dbs/config --port 20000
$ ./mongos --configdb server2:20000,server3:20000,server4:20000 --port 30000
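The --configdb value is just a comma-separated list of config server hosts; for a setup with no single point of failure you want three of them (one is fine for development). A tiny hypothetical helper that builds the argument:

```javascript
// Build the --configdb argument for mongos from a list of config server hosts.
// MongoDB expects either one config server (dev) or three (production).
function configdbArg(hosts) {
  if (hosts.length !== 1 && hosts.length !== 3) {
    throw new Error("use 1 config server for dev or 3 for production");
  }
  return hosts.join(",");
}

console.log(configdbArg(["server2:20000", "server3:20000", "server4:20000"]));
// server2:20000,server3:20000,server4:20000
```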
Now we’ll add our replica sets to the cluster. Connect to the mongos and run the addshard command:
> mongos = connect("server4:30000/admin")
connecting to: server4:30000
admin
> mongos.runCommand({addshard : "foo/server1:27017,server2:27017"})
{ "shardAdded" : "foo", "ok" : 1 }
> mongos.runCommand({addshard : "bar/server3:27017,server4:27017"})
{ "shardAdded" : "bar", "ok" : 1 }
Edit: you must list all of the non-arbiter hosts in the set for now. This is very lame, because given one host, mongos really should be able to figure out everyone in the set, but for now you have to list them.
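Since every non-arbiter host has to be listed, the addshard string has the shape "setname/host1,host2". A hypothetical helper that builds it from a replica set config, skipping arbiters:

```javascript
// Build the string passed to the addshard command from a replica set config:
// the set name, a slash, then every non-arbiter member's host.
function addshardString(config) {
  var hosts = config.members
    .filter(function (m) { return !m.arbiterOnly; })
    .map(function (m) { return m.host; });
  return config._id + "/" + hosts.join(",");
}

var foo = {
  _id: "foo",
  members: [
    { _id: 0, host: "server1:27017" },
    { _id: 1, host: "server2:27017" },
    { _id: 2, host: "server3:27019", arbiterOnly: true }
  ]
};
console.log(addshardString(foo)); // foo/server1:27017,server2:27017
```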
Tada! As you can see, you end up with one “foo” shard and one “bar” shard. (I actually added that functionality on Friday, so you’ll have to download a nightly to get the nice names. If you’re using an older version, your shards will have the thrilling names “shard0000” and “shard0001”.)
Now you can connect to “server4:30000” in your application and use it just like a “normal” mongod. If you want to add more mongos processes, just start them up with the same configdb parameter used above.
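If you do run several mongos processes, each appserver usually just talks to its local one; if instead your application keeps a list of mongos addresses, a trivial round-robin picker (purely illustrative, with made-up addresses) spreads connections across them:

```javascript
// Rotate through a list of mongos addresses, e.g. one per appserver.
function makeMongosPicker(addresses) {
  var next = 0;
  return function () {
    var addr = addresses[next];
    next = (next + 1) % addresses.length;
    return addr;
  };
}

var pick = makeMongosPicker(["server4:30000", "server2:30000"]);
pick(); // "server4:30000"
pick(); // "server2:30000"
pick(); // "server4:30000" again
```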