Sleepy.Mongoose: A MongoDB HTTP Interface

The first half of the MongoDB book is due this week, so I wrote a REST interface for Mongo (I’m a prolific procrastinator).  Anyway, it’s called Sleepy.Mongoose and it’s available at https://github.com/10gen-labs/sleepy.mongoose.

Installing Sleepy.Mongoose

  1. Install MongoDB.
  2. Install the Python driver:
    $ easy_install pymongo
  3. Download Sleepy.Mongoose.
  4. From the mongoose directory, run:
    $ python httpd.py

You’ll see something that looks like:

=================================
|      MongoDB REST Server      |
=================================

listening for connections on http://localhost:27080

Using Sleepy.Mongoose

First, we’re just going to ping Sleepy.Mongoose to make sure it’s awake. You can use curl:

$ curl 'http://localhost:27080/_hello'

and it’ll send back a Star Wars quote.

To really use the interface, we need to connect to a database server. To do this, we post our database server address to the URI “/_connect” (all actions start with an underscore):

$ curl --data server=localhost:27017 'http://localhost:27080/_connect'

This connects to the database running at localhost:27017.

Now let’s insert something into a collection.

$ curl --data 'docs=[{"x":1}]' 'http://localhost:27080/foo/bar/_insert'

This will insert the document {“x” : 1} into the foo database’s bar collection. If we open up the JavaScript shell (mongo), we can see the document we just added:

> use foo
> db.bar.find()
{ "_id" : ObjectId("4b7edc9a1d41c8137e000000"), "x" : 1 }

But why bother opening the shell when we can query with curl?

$ curl -X GET 'http://localhost:27080/foo/bar/_find'
{"ok": 1, "results": [{"x": 1, "_id": {"$oid": "4b7edc9a1d41c8137e000000"}}], "id": 0}

Note that queries are GET requests, whereas the other requests up to this point have been posts (well, the _hello can be either).

A query returns three fields:

  • “ok”, which will be 1 if the query succeeded, 0 otherwise
  • “results” which is an array of documents from the db
  • “id” which is an identifier for that particular query

In this case “id” is irrelevant as we only have one document in the collection but if we had a bunch, we could use the id to get more results (_find only returns the first 15 matching documents by default, although it’s configurable). This will probably be clearer with an example, so let’s add some more documents to see how this works:

$ curl --data 'docs=[{"x":2},{"x":3}]' 'http://localhost:27080/foo/bar/_insert'
{"ok" : 1}

Now we have three documents. Let’s do a query and ask for it to return one result at a time:

$ curl -X GET 'http://localhost:27080/foo/bar/_find?batch_size=1'
{"ok": 1, "results": [{"x": 1, "_id": {"$oid": "4b7edc9a1d41c8137e000000"}}], "id": 1}

The only difference between this query and the one above is the “?batch_size=1” which means “send one document back.” Notice that the cursor id is 1 now, too (not 0). To get the next result, we can do:

$ curl -X GET 'http://localhost:27080/foo/bar/_more?id=1&batch_size=1'
{"ok": 1, "results": [{"x": 2, "_id": {"$oid": "4b7ee0731d41c8137e000001"}}], "id": 1}
$ curl -X GET 'http://localhost:27080/foo/bar/_more?id=1&batch_size=1'
{"ok": 1, "results": [{"x": 3, "_id": {"$oid": "4b7ee0731d41c8137e000002"}}], "id": 1}

Now let’s remove a document:

$ curl --data 'criteria={"x":2}' 'http://localhost:27080/foo/bar/_remove'
{"ok" : 1}

Now if we do a _find, it only returns two documents:

$ curl -X GET 'http://localhost:27080/foo/bar/_find'
{"ok": 1, "results": [{"x": 1, "_id": {"$oid": "4b7edc9a1d41c8137e000000"}}, {"x": 3, "_id": {"$oid": "4b7ee0731d41c8137e000002"}}], "id": 2}

And finally, updates:

$ curl --data 'criteria={"x":1}&newobj={"$inc":{"x":1}}' 'http://localhost:27080/foo/bar/_update'

Let’s do a _find to see the updated object, this time using criteria: {“x”:2}. To put this in a URL, we need to escape the ‘{‘ and ‘}’ characters. You can do this by copy-pasting it into any javascript interpreter (Rhino, Spidermonkey, mongo, Firebug, Chome’s dev tools) as follows:

> escape('{"x":2}')
%7B%22x%22%3A2%7D

And now we can use that in our URL:

$ curl -X GET 'http://localhost:27080/foo/bar/_find?criteria=%7B%22x%22%3A2%7D'
{"ok": 1, "results": [{"x": 2, "_id": {"$oid": "4b7edc9a1d41c8137e000000"}}], "id": 0}

If you’re looking to go beyond the basic CRUD, there’s more documentation in the wiki.

This code is super-alpha. Comments, questions, suggestions, patches, and forks are all welcome.

Note: Sleepy.Mongoose is an offshoot of something I’m actually supposed to be working on: a JavaScript API we’re going to use to make an awesome sharding tool.  Administrating your cluster will be a point-and-click interface.  You’ll be able to see how everything is doing, drag n’ drop chunks, visually split collections… it’s going to be so cool.

CouchDB vs. MongoDB Benchmark

Edit (9/1/10): this benchmark is old, silly, and should probably be ignored in favor of more recent and representative ones. I don’t want to take it down for historical purposes, but seriously people, it was never a good benchmark, it’s over a year old at this point, and both databases have changed a lot.

Edit (12/6/09): this is the #1 Google result for “mongodb benchmark”, so I figure I’ll do some community service: if you’re interested in benchmarks, you might want to look at the 3rd party ones listed on the mongodb.org website.


Felix Geisendörfer did a benchmark in PHP that was super-easy for me to port into MongoDB. You can see his post on his blog.

And now… comparing his results for CouchDB with mine for MongoDB’s (I did the graph in Open Office, which is why the quality sucks):

As you can see, MongoDB does, uh, slightly better.  Here are the numbers:

# of Inserts Couch Total Time (sec) Couch Time/Doc (ms) Mongo Total Time (sec) Mongo Time/Doc (ms)
1 .0015 1.46 .0005 .5
2 .0015 .75 .0004 .2096
3 .0017 .56 .0005 .1604
4 .0017 .44 .0005 .1190
5 .0018 .36 .0005 .1060
6 .0019 .32 .0006 .0931
7 .0021 .3 .0006 .0847
8 .0022 .27 .0007 .0789
9 .0023 .25 .0007 .0734
10 .0025 .25 .0007 .0721
50 .0072 .14 .0024 .0476
100 .0136 .14 .0044 .0442
500 .0687 .14 .0253 .0505
1000 .1361 .14 .0372 .0372
2500 .4686 .19 .0278 .0111
5000 .9165 .18 .0488 .0098
7500 1.5116 .2 .0835 .0111
10000 2.3111 .23 .1065 .0107
25000 6.8684 .27 .2711 .0108
50000 15.8227 .32 .5430 .0109
100000 35.3071 .35 .1.7697 .0177
250000 104.0009 .42 6.4533 .0258
500000 230.6021 .46 11.7684 .0235
750000 352.7959 .47 17.0473 .0227
1000000 487.3284 .49 18.4376 .0184

Please let me know if I made any mistakes, all the values were hand-copied.

I ran these tests using the PHP driver on Ubuntu 9.04 on my MacBook Pro.  You can see the test script I forked on Github.

A little analysis: Both DBs start with some overhead, but by 1000 inserts CouchDB seems to be chugging along nicely.  MongoDB takes slightly longer to hit its groove, hitting its peak around 10000.  They both slow a little near the end, as MongoDB starts spending most of its time allocating files and, although I know almost nothing about CouchDB’s structure, I’d guess it’s doing something similar.