If it quacks like an RDBMS…

It might be a turtle duck.
MongoDB feels a lot like a relational database: you can think of documents as rows, do ad hoc queries, and create indexes on fields. There are, however, a ton of differences due to the data model, scalability considerations, and MongoDB’s youth. This can lead to some not-so-pleasant surprises for users. We (the developers) try to document the differences, but there are a few often-overlooked assumptions:

MongoDB assumes that you have a 64-bit machine.

You are limited to ~2GB of data on a 32-bit machine, because MongoDB memory-maps its data files and a 32-bit address space can only map about that much. This is annoying for a lot of people who develop on 32-bit machines. There’ll be a solution for this at some point, but it’s not high on our priority list because people don’t run 32-bit servers in production. (Okay, on rare occasions they do… but MongoDB is only 2 years old, give it a few more releases, we’ll support it eventually!) Speaking of things we’ll support eventually…

MongoDB assumes that you’re using a little-endian system.

Honestly, I assume this, too. When I hear about developers using PPC and Sparc, I picture a “Primitive Computing” diorama at the Natural History Museum.

On the plus side, all of the drivers work on big-endian systems, so you can run the database remotely and still do development on your old-school machine.

MongoDB assumes that you have more than one server.

Again, this is one of those things that’s a “duh” for production but bites people in the ass in development. MongoDB developers have worked super hard on replication, but that only helps if you have more than one server. (Single server durability is in the works for this fall.)

MongoDB assumes you want fast/unsafe, but lets you do slow/safe.

This design decision has turned out to be one of the most controversial we’ve made and has caused the most criticism. We try to make it clear in the documentation, but some people never notice that there’s a “safe” option for writes (that defaults to false), and then get very pissed when something wasn’t written.
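
To make that concrete, here’s roughly what the two modes look like from a driver. This is a minimal sketch using the PyMongo API of this era (the exact option names vary by driver and version, and the collection name is hypothetical):

```python
from pymongo import Connection  # the PyMongo 1.x/2.x-era entry point

db = Connection("localhost", 27017).test

# Default ("unsafe"): the driver sends the insert and returns immediately,
# never asking the server whether the write actually succeeded.
db.widgets.insert({"name": "fire-and-forget"})

# "Safe": the driver follows the write with a getLastError call and
# raises an exception if the server reports that the write failed.
db.widgets.insert({"name": "check-with-server"}, safe=True)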

MongoDB developers assume you’ll complain if something goes wrong.

This isn’t about the database per se, but the core developers are available on IRC, the mailing list, the bug tracker, and Twitter. Most of us subscribe to Google Alerts, Google Blog Search, the #mongodb hashtag, and so on. We try to make sure everyone gets an answer to their question and we’ll often fix bugs within a few hours of them being reported.

So, hopefully this will save some people some pain.

36 thoughts on “If it quacks like an RDBMS…”

  1. Psh, IBM. Practically synonymous with “mainframe” 🙂

    Seriously, though, there are some nice, modern, big-endian servers.

  2. A “safe” write can do a number of things: at the most basic level, it checks with the database to make sure that it got the write. You can also ask the database to do an fsync or replicate the write to N slaves before returning. An unsafe write doesn’t wait for a database response (aside from the TCP ack).

    On my MacBook Air, I tried inserting 100,000 trivial documents with the PHP driver (each one contains a string, a date, some binary data, and an integer). In unsafe mode, it took ~3 seconds. In safe mode, it took ~30 seconds.
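
    If you want to try this yourself, here’s a rough Python equivalent of that PHP test (a hypothetical sketch; timings will obviously vary with your machine and driver version):

    ```python
    import time
    from datetime import datetime

    from bson import Binary
    from pymongo import Connection  # era-appropriate API

    db = Connection().benchmark

    def load(safe, n=100000):
        db.docs.drop()
        start = time.time()
        for i in range(n):
            # each document: a string, a date, some binary data, and an integer
            db.docs.insert({"s": "some string",
                            "d": datetime.utcnow(),
                            "b": Binary(b"\x00" * 16),
                            "i": i},
                           safe=safe)
        return time.time() - start

    print("unsafe: %.1fs" % load(False))  # no per-write round trip
    print("safe:   %.1fs" % load(True))   # one getLastError round trip per insert
    ```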

    1. I just did a test of safe vs. unsafe writes with the Java driver (Mongo 1.6.5, currently used by the application doing unsafe writes). I have a replica set because we definitely need durability. The results are:
      – unsafe: 15,000 records inserted in ~3-4 seconds
      – safe: 15,000 records in ~50 min (yes, that’s correct: 50 minutes; after each insert I perform a db.getLastError(2, 1000, true))

         The tests were made on a Dell Studio XPS with a 64-bit Core i7 processor and 12GB of memory, which I don’t think is slower than a MacBook Air.

         Not even close to MySQL (or any other RDBMS I’ve ever worked with). Is the Java driver a lot slower than the PHP driver? Are newer versions of Mongo faster?

         I need to do a getLastError after each insert because we can’t afford to lose data if the master machine crashes before the data is safely written to the replica.

      1. I am not too familiar with the Java driver (you might want to post this on the user list), but I know that you’re supposed to just use WriteConcern, not call getLastError yourself.  Also, remember that getLastError(2) is making sure that a slave has the write, too, which, AFAIK, is not something a relational db usually does (and is probably what’s taking up most of the time). 

        To compare MongoDB with something like InnoDB, it would be fairer to turn on --journal and set WriteConcern to SAFE. However, I wouldn’t be surprised if that were still a bit slower than InnoDB: MongoDB can’t do anything magical to make writing to disk fast, and our journaling is probably slower than InnoDB’s due to MongoDB’s youth.
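
        In PyMongo terms the same idea looks like this (a hypothetical sketch; the Java driver spells it WriteConcern, and the option names vary across drivers and versions). The point is to let the driver attach the getLastError options to the write instead of issuing a second call yourself:

        ```python
        from pymongo import Connection

        db = Connection().test
        record = {"x": 1}  # stand-in document; collection name is hypothetical

        # The driver bundles the getLastError options into the write itself;
        # no separate db.getLastError() call needed. w=2 waits until a slave
        # also has the write, which is almost certainly where the 50 minutes go.
        db.records.insert(record, safe=True, w=2, wtimeout=1000)
        ```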

  3. I don’t know about the true mainframes, but I think mongo would fit just fine in the IBM AS/400^H^H^HiSeries culture. IBM has been pushing PHP on that platform with good results, and iSeries developers never really embraced SQL, or the relational model, despite the fact that Dr. Codd developed most of the theory for it at IBM.

    BTW, if MongoDB ever got ported to AIX I might be crazy enough to get the thing running on an iSeries.

  4. It’s mostly defined in the language docs, as every language implements it slightly differently (e.g., PHP docs, Python docs, and Java docs).

    http://www.mongodb.org/display/DOCS/Last+Error+Commands is the main page describing the “safe” functionality (which is too hard to find). Maybe it would be clearer to mention this on the inserts, updates, and removes pages (those are the three types of writes) and link to the getLastError page? Feedback welcome.
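
    Under the hood, “safe” is just this command. A minimal PyMongo sketch (hypothetical collection name):

    ```python
    from pymongo import Connection

    db = Connection().test
    db.stuff.insert({"x": 1})  # fire-and-forget

    # "Safe mode" is exactly this, done for you by the driver: ask the
    # server how the last operation on this connection went.
    status = db.command("getlasterror", w=1, wtimeout=1000)
    print(status["err"])  # None if the write succeeded
    ```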

  5. Sounds like some dangerous assumptions if someone wanted to use this in a production application. I don’t see how someone could build a data-driven system using a storage method that can’t guarantee that a transaction has been written.

    I think assuming 64-bit, multi-server and little endian environments is a little dangerous as well.

  6. I just happened to see this yesterday when I was downloading MongoDB… I have a 32-bit machine, and I was disappointed when I saw that limitation.

  7. >I think assuming 64-bit, multi-server and little endian environments is a little dangerous as well.

    The little-endian part doesn’t make sense at all. If your hardware is big-endian, mongo won’t run at all, so you’ll just never deploy mongo. Now, a lot of production hardware is running 32-bit Windows OSes (my prod mongo server is). Also, data centers are filled with 32-bit hardware. However, time will correct that problem, and it’s an issue you will be well aware of if you do your due diligence.

  8. We’re generally a bit hesitant to add new methods to the API. What you should do, if you want this, is create a feature request at http://jira.mongodb.org/ (bugtracker) and drum up a bunch of votes for it (your comment got three likes, so it seems like there is some support). Probably put it in the Ruby driver project, since that language has the most users. We generally implement things users make noise for (see: single server durability).

  9. Why does Mongo sometimes have to repair the DB after a machine crash where I did no writes at all during the time the Mongo instance was up? This is disconcerting.

  10. The “do a repair” message just pops up whenever you crash the database. If you didn’t do any writes in the previous session, you can ignore it, delete the mongod.lock file in the data directory, and start the database.

  11. Unsafe means “fire-and-forget.” I think the largest, most obvious risk is that if the server loses power right after you send the insert/upsert, before it’s had a chance to write it to disk, you lose that data. By default, mongo flushes stuff every 60(?) seconds, so in “unsafe” mode you could lose up to a minute of data. The other potential risk is that the client sends some invalid command/update/insert to the server and assumes that it worked.

    Safe just means: after every command/update/insert, check the server for an error. Many of the drivers do this behind the scenes and throw exceptions if an error is returned…
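
    A duplicate-key insert is a good example of the kind of error that gets silently dropped without safe mode (a sketch with a hypothetical users collection):

    ```python
    from pymongo import Connection
    from pymongo.errors import DuplicateKeyError

    users = Connection().test.users
    users.ensure_index("email", unique=True)

    users.insert({"email": "a@example.com"})  # succeeds
    users.insert({"email": "a@example.com"})  # rejected server-side, but the
                                              # client never hears about it

    try:
        users.insert({"email": "a@example.com"}, safe=True)
    except DuplicateKeyError:
        print("safe mode surfaced the duplicate-key error")
    ```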

  12. There’s quite a difference between ~3 and ~30 seconds. It seems like operating in safe mode isn’t an option if you’re after performance.

  13. I’m coming from a MS SQL Server background and am evaluating MongoDB for my company, with the objective of determining whether it can be used to store data that lives in a lot of our SQL databases.

    If writes in safe mode take the same time as MySQL, I’ll stick with MS SQL, thank you! People need more confidence in using unsafe mode; perhaps you or your co-workers could write a blog post/article on unsafe mode, especially:

    – how MongoDB mitigates the possibility of data loss when in unsafe mode
    – whether we should call getLastError() after every write, and if so, what the overhead is in terms of performance
    – some examples of ‘something’s wrong’ (see your reply below: “Zero. Writes should only fail if something’s wrong.”) and how to recover the data

    Anything else that you think will help.

    thanks

  14. Writing a record with any database will always take X amount of time; MongoDB can’t do magic. A lot of users switch to MongoDB because of the ease of development and its scaling ability, not just speed.

    Writes fail on MongoDB for pretty much the same reasons writes fail on relational databases: the database is down, the network is inaccessible, or you’re writing something invalid. A post covering this stuff is a good idea.

    1. Hi. Whenever I find myself doing bulk inserts or updates, I consider 2 methods:
      1) Make a REALLY HUGE insert. In Postgres at least, and probably MySQL too, you can stack any number of rows in one fat insert to chuck down the pipe.
      2) Create a temporary table and insert or update from that in one operation (preferable). A temporary table will reside in client-side memory, or may be replicated asynchronously. I’m not too sure about the actual implementation and constraints, but it is extremely fast and non-blocking in my experience, since it is designed for such operations.

      Of course, I’ve wrapped this into a few neat functions in Ruby with an ActiveRecord-compliant API, so that any bulk update becomes super-fast and flawless, both in the DB and when coding.

      I avoid transactions like the plague, saving them for those odd writes that REALLY need them, as transactions may lead to surprises, as you Mongo developers know 🙂 I love my transactions, though, whenever I need ’em, unless I can design around them. I/O blocking is not funny when it bites you.

      Indexes go a long way to make things faster also.

      I find these solutions make operations taking many seconds or minutes go down to a few seconds or even milliseconds (an order of magnitude or two).

      Certainly, doing inserts and updates from a loop is an unnecessary waste of I/O and resources, and any application doing that is in need of some optimization. I’m not sure if MongoDB has similar ways to perform bulk operations (there’s a sketch after this comment), but it would be a must for doing bulk updates extremely fast. I’m guessing the “unsafe” option is actually performing a similar trick, just without the “bulk” / almost-transactional properties of a temporary (in-memory) table. I.e., if one insert from the temporary table fails, all the inserts fail. Updates are another beast, unfortunately, since they will not fail on NULL rows or check much beyond connectivity and columns, which is why I’m interested in MongoDB also.

      I’ve yet to find MongoDB or anything “NoSQL” mature enough, but I’m following the development and hoping for maturity within a few years. The project does seem very interesting, but I’m not sure I like the programmatic interface. I kind of like the idea of a descriptive language like SQL that leaves most of the job to the DB…
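
      For what it’s worth, MongoDB’s drivers do have an analogue of the “one fat insert” trick: a single insert call can accept a list of documents and send them in one batch. Here’s a minimal PyMongo sketch (hypothetical collection; unlike the temporary-table trick there’s no transaction, so one bad document doesn’t roll back the rest):

      ```python
      from pymongo import Connection

      events = Connection().test.events

      # Batch insert: many documents in one network round trip, the rough
      # equivalent of stacking many rows into a single SQL INSERT.
      docs = [{"n": i} for i in range(10000)]
      events.insert(docs, safe=True)  # one getLastError for the whole batch
      ```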
