There must be 50 ways to start your Mongo

This blog post covers four major ones:

Feel free to jump to the ones that interest you (for instance, sharding).

Just start up a database, Ace

Starting up a vanilla MongoDB instance is super easy, it just needs a port it can listen on and a directory where it can save your info. By default, Mongo listens on port 27017, which should work fine (it’s not a very commonly used port). We’ll create a new directory for database files:

$ mkdir -p ~/dbs/mydb # -p creates parent directories if they don't exist

And then start up our database:

$ cd 
$ bin/mongod --dbpath ~/dbs/mydb

…and you’ll see a bunch of output:

$ bin/mongod --dbpath ~/dbs/mydb
Fri Apr 23 11:59:07 Mongo DB : starting : pid = 9831 port = 27017 dbpath = /data/db/ master = 0 slave = 0  32-bit 

** NOTE: when using MongoDB 32 bit, you are limited to about 2 gigabytes of data
**       see http://blog.mongodb.org/post/137788967/32-bit-limitations for more

Fri Apr 23 11:59:07 db version v1.5.1-pre-, pdfile version 4.5
Fri Apr 23 11:59:07 git version: f86d93fd949777d5fbe00bf9784ec0947d6e75b9
Fri Apr 23 11:59:07 sys info: Linux ubuntu 2.6.31-15-generic #50-Ubuntu SMP Tue Nov 10 14:54:29 UTC 2009 i686 BOOST_LIB_VERSION=1_38
Fri Apr 23 11:59:07 waiting for connections on port 27017
Fri Apr 23 11:59:07 web admin interface listening on port 28017

Now, Mongo will “freeze” like this, which confuses some people. Don’t worry, it’s just waiting for requests. You’re all set to go.

Set up a slave, Maeve

As we’re running master and slave on the same machine, they’ll need separate ports. We’ll use port 10000 for the master and 20000 for the slave. We also need separate directories for data, so we’ll create those:

$ mkdir ~/dbs/master ~/dbs/slave

Now we start the master database:

$ bin/mongod --master --port 10000 --dbpath ~/dbs/master

And then the slave, in a different terminal:

$ bin/mongod --slave --port 20000 --dbpath ~/dbs/slave --source localhost:10000

The “source” option specifies where the master is that the slave should replicate data from.

Now, if we want to add another slave, we need to go though the herculean effort of choosing a port and creating a new directory:

$ mkdir ~/dbs/slave2
$ bin/mongod --slave --port 20001 --dbpath ~/dbs/slave2 --source localhost:10000

Tada! Two slaves, one master. For more information on master-slave, see the core docs on it and my previous post.

This example puts the master server and slave server on the same machine, but people generally have a master on one machine and a slave on another. It works fine to put them on a single machine, it just defeats the point of a bit.

Get auto-failover… Rover

Okay, so there aren’t many people named Rover, but you come up with a rhyme for “auto-failover” (I tried “replica”, too).

Replica pairs are cool because it’s like master-slave, but you get automatic failover: if the master becomes unavailable, the slave will become a master. So, it’s basically the same as master-slave, but the servers know about each other and there is, optionally, an arbiter server that doesn’t do anything other than resolve “disputes” over who is master.

When could the arbiter come it in handy? Suppose the master’s network cable is pulled. The server still thinks it’s master, but no one else knows it’s there. The slave becomes master and the rest of the world goes along happily. When the master’s network cable gets plugged back in, now both servers think they’re master! In this case, the arbiter steps in and gently informs the master who’s behind in the times that he is now a slave.

You don’t have to set up an arbiter, but we will since it’s good practice:

$ mkdir ~/dbs/arbiter ~/dbs/replica1 ~/dbs/replica2
$ bin/mongod --port 50000 --dbpath ~/dbs/arbiter

Now, in separate terminals, you start each of the replicas:

$ bin/mongod --port 60000 --dbpath ~/dbs/replica1 --pairwith localhost:60001 --arbiter localhost:50000

And then the other one:

$ bin/mongod --port 60001 --dbpath ~/dbs/replica2 --pairwith localhost:60000 --arbiter localhost:50000

After they’ve been running for a bit, try killing (Ctrl-C) one, then restarting it, then killing the other one, back and forth.

For more information on replica pairs, see the core docs.

What’s this? Replica pairs are evolving! *voop* *voop* *voop*

Replica pairs have evolved into… replica sets! Well, okay, they haven’t yet, but they’re coming soon. Then you’ll be able to have an arbitrary number of servers in the auto-failover ring.

Make a new cluster, Buster

For the grand finale, sharding. Sharding is how you distribute data with Mongo. If you don’t know what sharding is, check out my previous post explaining how it works.

First of all, download the latest 1.5.x nightly build from the website. Sharding is changing rapidly, you want the latest and greatest here, not stable.

We’re going to be creating a three-node cluster. So, same as ever, create your database directories. We want one directory for the cluster configuration and three directories for our shards (nodes):

$ mkdir ~/dbs/config ~/dbs/shard1 ~/dbs/shard2 ~/dbs/shard3

The config server keeps track of what’s where, so we need to start that up first:

$ bin/mongod --configsvr --port 70000 --dbpath ~/dbs/config

The mongos is just a request router that runs on top of the config server. It doesn’t even need a data directory, we just tell it where to look for the configuration:

$ bin/mongos --configdb localhost:70000

Note the “s”: the router is called “mongos”, not “mongod”. We haven’t specified a port for it, so it’ll listen on the default port (27017).

Okay! Now, we need to set up our shards. Start these each up in separate terminals:

$ bin/mongod --shardsvr --port 71000 --dbpath ~/dbs/shard1
$ bin/mongod --shardsvr --port 71001 --dbpath ~/dbs/shard2
$ bin/mongod --shardsvr --port 71002 --dbpath ~/dbs/shard3

mongos doesn’t actually know about the shards yet, you need to tell it to add these servers to the cluster. The easiest way is to fire up a mongo shell:

$ bin/mongo
MongoDB shell version: 1.5.1-pre-
url: test
connecting to: test
type "help" for help
>

Now, we add each shard to the cluster:

> db = connect("localhost:70000/admin");
connecting to: localhost:70000
admin
> db.runCommand({addshard : "localhost:71000", allowLocal : true})
{
    "added" : "localhost:71000",
    "ok" : 1
}
> db.runCommand({addshard : "localhost:71001", allowLocal : true})
{
    "added" : "localhost:71001",
    "ok" : 1
}
> db.runCommand({addshard : "localhost:71002", allowLocal : true})
{
    "added" : "localhost:71002",
    "ok" : 1
}
>

mongos expects shards to be on remote machines and by default won’t allow you to add local shards (i.e., shards with “localhost” in the name). Since we’re just playing around, we specify “allowLocal” to override this behavior. (Note that “addshard” IS NOT camel-case, and allowLocal IS camel-case, because we’re consistent like that.)

Congratulations, you’re running a distributed database!

What do you do now? Well, use it just like a normal database! Connect to “localhost:27017” and proceed normally (or, as normally as possible… please report any bugs to our bugtracker!). Try the tutorial (since you’ve already got the shell open) or connect through your favorite driver and play around.

Connecting to mongos should be an identical experience to connecting to a normal Mongo server. Behind the scenes, it splits up your requests/data across the shards so you can concentrate on making your application, not scaling it.


P.S. Obviously, this example setup is full of single points of failure, but that’s completely avoidable. I can go over how to set up distributed MongoDB with zero single points of failure in a follow-up post, if people are interested.

Once and Future Presentations

On Monday, I gave a presentation on MongoDB to the San Francisco MySQL user group.  It was a lot of fun, you can watch the recording on ustream:

http://www.ustream.tv/flash/live/1/3708550Streaming Video by Ustream.TV

Apparently the audio was buzzy (I haven’t actually listened to it myself yet).

The audience especially enjoyed this slide about MySQL’s current situation:

One of the guys told me that he was scrambling to take a picture of it but I went to the next slide too fast, so here it is in all it’s glory.

Thanks to everyone at the MySQL meetup for being so awesome, I had a great time!

Future Talks

April 30th: I’ll be in California again, giving a talk called “Map/reduce, geospatial indexing, and other cool features” at MongoSF

May 18-21: I’ll be in Chicago at Tek·X. I’ll be doing a regular session, “MongoDB for Mobile Applications“, and a tutorial on switching apps from MySQL to MongoDB (assuming no knowledge of MongoDB).

Sharding with the Fishes

Sharding is the not-so-revolutionary way that MongoDB scales writes (it’s very similar to techniques described in the Big Table paper and by PNUTS) but many people are unfamiliar with what it is and how it works.  If you’ve seen a talk on MongoDB or looked at the website, you’ve probably seen a diagram of sharding that looks like this:

…which probably looks a bit like “I hope I don’t have to understand that.”

However, it’s actually quite simple: it’s exactly how the Mafia is structured (or, at least, how The Godfather taught me it was):

  • The shards are the peons: someone tells them to do something (e.g., a query or an insert), they do it and report back.
  • The mongos is the godfather. It knows what data each peon has and gives them orders.  It’s basically a router for the requests.
  • The config server is the consigliere. It knows where all of the data is at any given time and lets the boss know so that he can focus on bossing. The consigliere keeps the organization running smoothly.

For a concrete example, let’s say we have a collection of blog posts.  You choose a “shard key,” which is the value Mongo will use to split the data across shards.  We’ll choose “date” as the shard key, which means it will be split up based on values in the “date” field.  If we have four shards, they might contain data something like:

  • shard #1: beginning of time up to June 2009
  • shard #2: July 2009 to November 2009
  • shard #3: December 2009 to February 2010
  • shard #4: March 2010 through the end of time

Now that we’ve got our peons set up, let’s ask the godfather for some favors.

Queries

Say you query for all documents created from the beginning of this year (January 1st, 2010) up to the present.  Here’s what happens:

  1. You (the client) send the query to the godfather.
  2. The godfather knows which shards contain the data you’re looking for, so he sends the query to shards #3 and #4.
  3. shard #3 and shard #4 execute the query and return the results to the godfather.
  4. The godfather puts together the results he’s received and sends them back to the client.

Note how all of the sharding stuff is done a layer away from the client, so your application doesn’t have to be sharding-aware, it can just query the godfather as though it were a normal mongod instance.

Inserts

Suppose you want to insert a new document with today’s date.  Here’s the sequence of events:

  1. You send the document to the godfather.
  2. It sees today’s date and routes it to shard #4.
  3. shard #4 inserts the document.

Again, identical to a single-server setup from the client’s point of view.

So where’s the consigliere?

Suppose you start getting millions of documents inserted with the date September 2009.  Shard #2 begins to swell up like a bloated corpse.  The consigliere will notice this and, when shard #2 gets too big it will split the data on shard #2 and migrate some of it to other, emptier shards.  Then it will let the godfather know that now shard #2 contains July 2009-September 15th 2009 and shard #3 contains September 16th 2009-February 2010.

The consigliere is also responsibly for figuring out what to do when you add a new shard to the cluster.  It figures out if it should keep the new shard in reserve or migrate some data to it right away.  Basically, it’s the brains of the operation.

Whenever the consigliere moves around data, it lets the godfather know what the final configuration is so that the godfather can continue routing requests correctly.

Leave the gun.  Take the cannolis.

This scaling deliciousness is, unfortunately, still very alpha.  You can help us out by telling us where our documentation sucks (specifics are better than “it sucks”), testing it out on your machine, and voting for features you’d like to see.

MapReduce – The Fanfiction

MapReduce is really cool, useful, and powerful, but a lot of people find it hard to wrap their heads around. This post is a fairly silly, non-technical explanation using Star Trek.

The Enterprise found a new planet, as it tends to do.

Kirk wanted to beam down immediately and start surveying the planet but Spock told him to wait a moment. “It usually takes us one hour to survey a planet, correct Captain?  In less than 5 minutes, I can calculate whether the chance of encountering friendly alien females outweighs the risk of attack by brain-eating monsters.”

“Interesting idea, Spock,” said Kirk.  “Go ahead.”

The Data

“Logically,” thought Spock, “if we can survey a whole planet in one hour, we can survey 1/16th of a planet in 3.75 minutes.”  Spock divided the planet into 16 equal-size pieces and summoned 16 red shirts.

“You’ll be beamed down to the surface of the planet with this special data collection device called an ’emitter.’  If you see a brain-eating monster, you press the “brain-eating monster” button on your emitter.  If you see an attractive female alien, you press the “hot alien chick” button.  Press either, neither, or both buttons, as your situation requires.”

The Map Step

The 16 red shirts were beamed down to the 16 parts of the planet.  As they found things, they would press the buttons on their emitter.

Back on the Enterprise, Spock started getting lots of data pairs that looked like:

| type                 | location |
|----------------------|----------|
| Brain-eating monster | 2        |
| Hot alien chick      | 7        |
| Brain-eating monster | 14       |
| Brain-eating monster | 7        |

The Reduce Step

“Computer,” Spock said.  “Initialize a counter to 0 for each new type you get.  Then, for every subsequent data pair with the same type, increment that counter.”

“I dinnae understand,” said Scotty.  “What’s that, then?”

“I basically told the computer to initialize two variables, ‘Brain-eating monster’ and ‘Hot alien chick’, setting them both to zero.  Every time the computer gets a ‘Brain-eating monster’ emit, it increments the ‘Brain-eating monster’ variable.  Every time it gets a ‘Hot alien chick’ emit, it increments the ‘Hot alien chick’ variable.

“Ah, I see,” said Scotty.  “But don’t you lose the location information?”

“Yes,” replied Spock.  “But I don’t actually care about location for this readout.  If I wanted the location, I could give the computer a slightly more complicated algorithm, but right now I just want the count.”

The Result

After 3.75 minutes, Spock beamed up the red shirts who were still alive and presented to Kirk: “There are brain-eating monsters on 7/8ths of the planet, Captain.  1/16 of the planet has hot alien chicks.”

“Excellent work Spock,” Kirk says.  “Let’s boldly go somewhere else.”

And so they did.

Captain’s log, star date 1419.7 (aka a summary of what we did)

  1. Goal – To generate a report on a planet.
  2. Data – 16 pieces of land with various attributes. Each piece of land could be represented by a JSON object such as:
    {
        "location" : 5
        "contains" : ["Brain-eating monsters", "rocks", "poison gas"]
    }
  3. Map – Send attributes for each piece of data back to the processor. In JSON, each emit would look something like:
    {
        "Brain-eating monsters" : 5
    }
  4. Reduce – Sum up the data, grouping by type
  5. Result – How much of each attribute is on the planet

Further reading: Kyle Banker has an excellent (and more technical) explanation of MapReduce.

Bug Reporting: A How-To

This type of bug report drives me nuts:

You have a critical bug! This crashes Java!

for (int i=0; i<10; i++) {
     cursor.next();
}

(I’ve never gotten this exact report and I’m not picking on anyone, it’s a composite.)

This doesn’t crash for me.  It doesn’t even compile for me because the variable “cursor” isn’t defined. If you’re going to use a variable (a function, a framework, etc.) in a code snippet, you have to define it. Let’s try again:

Mongo m = new Mongo();
DB db = m.getDB("bar");
DBCollection coll = db.getCollection("foo");
DBCursor cursor = coll.find();

for (int i=0; i<10; i++) {
     cursor.next();
}

Better! But this is probably crashing because of something in your database.  Unless it crashes regardless of dataset, you need to send me the data that makes it crash.  The basic rule is:

The faster I can reproduce your bug, the faster I can fix it.

Some other tips for submitting bug reports:

  1. Make sure to include information about your environment.  The more the merrier.
  2. If I ask for log messages, please send me the entire log.  If it’s been running for days and the log’s a zillion lines long, send everything from around the time the error happened (before, during, after).  Please, please, please don’t skim the logs, extract a single suspicious-looking line, and send it to me.  I promise that I’m not going to be mad about having to wade through a couple hundred (or thousand) lines to find what I’m looking for.  I would rather quickly skim a bunch of extra info than pry the logs, line by line, from your clutches.
  3. If you are running on a non-traditional setup (e.g., a Linux kernel you modified yourself on a mainframe from 1972), it would be super helpful if you can give me access to your machine.  If your bug is platform-specific to ENIAC, it’s doubtful I’m going to be able to figure out what’s going wrong on my MacBook Air.

And, of course, flatter the developer’s ego when they’ve fixed the bug.  Not applicable for me, of course, but other developers like it when you give them a little “Thanks, you’re awesome!” biscuit when they’ve fixed your bug.

To my users: thank you to everyone who has ever filed a bug report.  I’m really, really happy that you’re using my software and that you care enough about it to submit a bug, instead of just giving up. Seriously, thank you. I have to give a shout-out to Perl developers in particular, here.  More than half the time, people reporting bugs in the MongoDB Perl driver actually include a patch in the bug report that fixes it!  I love you guys.

Sleepy.Mongoose: A MongoDB HTTP Interface

The first half of the MongoDB book is due this week, so I wrote a REST interface for Mongo (I’m a prolific procrastinator).  Anyway, it’s called Sleepy.Mongoose and it’s available at https://github.com/10gen-labs/sleepy.mongoose.

Installing Sleepy.Mongoose

  1. Install MongoDB.
  2. Install the Python driver:
    $ easy_install pymongo
  3. Download Sleepy.Mongoose.
  4. From the mongoose directory, run:
    $ python httpd.py

You’ll see something that looks like:

=================================
|      MongoDB REST Server      |
=================================

listening for connections on http://localhost:27080

Using Sleepy.Mongoose

First, we’re just going to ping Sleepy.Mongoose to make sure it’s awake. You can use curl:

$ curl 'http://localhost:27080/_hello'

and it’ll send back a Star Wars quote.

To really use the interface, we need to connect to a database server. To do this, we post our database server address to the URI “/_connect” (all actions start with an underscore):

$ curl --data server=localhost:27017 'http://localhost:27080/_connect'

This connects to the database running at localhost:27017.

Now let’s insert something into a collection.

$ curl --data 'docs=[{"x":1}]' 'http://localhost:27080/foo/bar/_insert'

This will insert the document {“x” : 1} into the foo database’s bar collection. If we open up the JavaScript shell (mongo), we can see the document we just added:

> use foo
> db.bar.find()
{ "_id" : ObjectId("4b7edc9a1d41c8137e000000"), "x" : 1 }

But why bother opening the shell when we can query with curl?

$ curl -X GET 'http://localhost:27080/foo/bar/_find'
{"ok": 1, "results": [{"x": 1, "_id": {"$oid": "4b7edc9a1d41c8137e000000"}}], "id": 0}

Note that queries are GET requests, whereas the other requests up to this point have been posts (well, the _hello can be either).

A query returns three fields:

  • “ok”, which will be 1 if the query succeeded, 0 otherwise
  • “results” which is an array of documents from the db
  • “id” which is an identifier for that particular query

In this case “id” is irrelevant as we only have one document in the collection but if we had a bunch, we could use the id to get more results (_find only returns the first 15 matching documents by default, although it’s configurable). This will probably be clearer with an example, so let’s add some more documents to see how this works:

$ curl --data 'docs=[{"x":2},{"x":3}]' 'http://localhost:27080/foo/bar/_insert'
{"ok" : 1}

Now we have three documents. Let’s do a query and ask for it to return one result at a time:

$ curl -X GET 'http://localhost:27080/foo/bar/_find?batch_size=1'
{"ok": 1, "results": [{"x": 1, "_id": {"$oid": "4b7edc9a1d41c8137e000000"}}], "id": 1}

The only difference between this query and the one above is the “?batch_size=1” which means “send one document back.” Notice that the cursor id is 1 now, too (not 0). To get the next result, we can do:

$ curl -X GET 'http://localhost:27080/foo/bar/_more?id=1&batch_size=1'
{"ok": 1, "results": [{"x": 2, "_id": {"$oid": "4b7ee0731d41c8137e000001"}}], "id": 1}
$ curl -X GET 'http://localhost:27080/foo/bar/_more?id=1&batch_size=1'
{"ok": 1, "results": [{"x": 3, "_id": {"$oid": "4b7ee0731d41c8137e000002"}}], "id": 1}

Now let’s remove a document:

$ curl --data 'criteria={"x":2}' 'http://localhost:27080/foo/bar/_remove'
{"ok" : 1}

Now if we do a _find, it only returns two documents:

$ curl -X GET 'http://localhost:27080/foo/bar/_find'
{"ok": 1, "results": [{"x": 1, "_id": {"$oid": "4b7edc9a1d41c8137e000000"}}, {"x": 3, "_id": {"$oid": "4b7ee0731d41c8137e000002"}}], "id": 2}

And finally, updates:

$ curl --data 'criteria={"x":1}&newobj={"$inc":{"x":1}}' 'http://localhost:27080/foo/bar/_update'

Let’s do a _find to see the updated object, this time using criteria: {“x”:2}. To put this in a URL, we need to escape the ‘{‘ and ‘}’ characters. You can do this by copy-pasting it into any javascript interpreter (Rhino, Spidermonkey, mongo, Firebug, Chome’s dev tools) as follows:

> escape('{"x":2}')
%7B%22x%22%3A2%7D

And now we can use that in our URL:

$ curl -X GET 'http://localhost:27080/foo/bar/_find?criteria=%7B%22x%22%3A2%7D'
{"ok": 1, "results": [{"x": 2, "_id": {"$oid": "4b7edc9a1d41c8137e000000"}}], "id": 0}

If you’re looking to go beyond the basic CRUD, there’s more documentation in the wiki.

This code is super-alpha. Comments, questions, suggestions, patches, and forks are all welcome.

Note: Sleepy.Mongoose is an offshoot of something I’m actually supposed to be working on: a JavaScript API we’re going to use to make an awesome sharding tool.  Administrating your cluster will be a point-and-click interface.  You’ll be able to see how everything is doing, drag n’ drop chunks, visually split collections… it’s going to be so cool.

“Introduction to MongoDB” Video

This is the video of the talk I gave last Sunday at the NoSQL Devroom at FOSDEM. It’s about why MongoDB was created, what it’s good at (and a bit about what it’s not good for), the basic syntax for it and how sharding and replication work (it covers a lot of ground).

You can also go to Parleys.com to see the video with my slides next to it (they’re a little tough to see below).

http://www.parleys.com/share/parleysshare2.swf?pageId=1864

St. Clementine’s Day

The night before Valentine’s Day, I got Andrew a crate of clementines (they’re already gone).  Yesterday, the Doghouse Diaries ran:

I came down with a cold on Friday and neither of us wanted to do anything for Valentine’s Day so we ended up playing Legend of Zelda most of it. When we got hungry, we started looking through the cabinets and I saw some dried apricots that reminded me of some chocolates Andrew got me for Christmas.

“For future reference,” I said, “I loved those chocolate-covered apricots.  They were so good.”

“You want to make some now?” he asked.

Now, I was sick and tired and that sounded a lot like work.  But it was so freakin easy.  And awesome.  And delicious.   And come on, how much more romantic can you get?  Here’s how to do it yourself:

  1. Take a lump of semisweet baker’s chocolate about twice the size of a Hershey’s bar
  2. Stick it in a bowl in the microwave for 30 seconds
  3. Mush it up and dump in a bunch of dried fruit/nuts/whatever
  4. Spoon each chocolate-covered item onto a sheet of parchment
  5. Stick the parchment in the fridge

An hour later, we had my favorite chocolates.

Then we watched Star Wars.

My life is awesome.  Thank you for being so wonderful, Andrew.