Why Command Helpers Suck

This is a rant written from my perspective as a driver developer and as someone who gives support on the mailing list and IRC.

Command helpers are terrible. They confuse users, result in a steeper learning curve, and make MongoDB’s interface seem arbitrary.

The basics: what are command helpers?

Command helpers are wrappers for database commands. Database commands are everything other than CRUD (create, retrieve, update, delete) that you can do with MongoDB. This includes things like dropping a collection, running a MapReduce, adding a member to a replica set, seeing what arguments you started mongod with, and finding out whether the last write operation succeeded. They’re everywhere: if you’ve used MongoDB, you’ve run a database command (even if you weren’t aware of it).

So, what are command helpers? They are wrappers around the raw command, turning something like db.adminCommand({serverStatus:1}) into db.serverStatus(). This makes the call slightly quicker to type and makes it look “nicer” than the raw command. However, there are honey bunches of reasons that they’re a bad idea and should be avoided whenever possible.
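To make the difference concrete, here is a minimal sketch, in JavaScript (the shell’s language), of what a helper like db.serverStatus() amounts to. The FakeDb class and its methods are hypothetical stand-ins that record the command documents they would send rather than talking to a server; the real shell helper also targets the admin database, which this sketch ignores.

```javascript
// Hypothetical stand-in for a database handle: it records every command
// document it would send instead of talking to a real server.
class FakeDb {
  constructor() {
    this.sent = [];
  }

  // The one generic entry point: a database command is a document whose
  // first key names the command.
  runCommand(doc) {
    this.sent.push(doc);
    return { ok: 1, command: Object.keys(doc)[0] };
  }

  // A command helper: a thin wrapper that hides the document above.
  serverStatus() {
    return this.runCommand({ serverStatus: 1 });
  }
}

const db = new FakeDb();
const viaHelper = db.serverStatus();
const viaCommand = db.runCommand({ serverStatus: 1 });

// Both calls produce the identical command document.
console.log(JSON.stringify(db.sent)); // [{"serverStatus":1},{"serverStatus":1}]
console.log(viaHelper.command === viaCommand.command); // true
```

Once you know the document, the helper adds nothing that you can’t type yourself in any driver.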

Database helpers are unportable

Helpers are extremely unportable. If you know how to run db.serverStatus() in the shell, that’s great, but all you know is how to do it in the shell. If you know how to run the serverStatus command, you know how to get the server status in every language you’ll ever use.

Similarly, each driver handles command options differently. Take a command like group: the shell helper chooses one order of options (a single argument, “options”, incidentally), the Python driver chooses another (“key”, “condition”, “initial”, “reduce”, “finalize”), and the PHP driver yet another (“key”, “initial”, “reduce”, “options”). If you just learn the group command itself, you can execute it in any language you please.
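A sketch of why learning the command beats learning the helpers: three hypothetical helper signatures, mimicking the three argument orders listed above, all collapse into one command document. The function names are made up for illustration; the field names (ns, key, cond, initial, $reduce, finalize) follow the group command of that era, which has since been removed from MongoDB.

```javascript
// The portable form: one command document, the same in every language.
function groupCommand(collection, { key, cond, initial, reduce, finalize }) {
  const spec = { ns: collection, key, cond, $reduce: reduce, initial, finalize };
  // Drop options the caller did not supply.
  for (const name of Object.keys(spec)) {
    if (spec[name] === undefined) delete spec[name];
  }
  return { group: spec };
}

// Hypothetical helpers, one per argument order described above. Each only
// rearranges its inputs into the same document.
function shellStyleGroup(collection, options) {
  return groupCommand(collection, options);
}

function pythonStyleGroup(collection, key, condition, initial, reduce, finalize) {
  return groupCommand(collection, { key, cond: condition, initial, reduce, finalize });
}

function phpStyleGroup(collection, key, initial, reduce, options = {}) {
  return groupCommand(collection, {
    key, initial, reduce, cond: options.condition, finalize: options.finalize,
  });
}

const key = { status: 1 };
const cond = { total: { $gt: 100 } };
const initial = { count: 0 };
const reduce = function (doc, acc) { acc.count += 1; };

const a = shellStyleGroup("orders", { key, cond, initial, reduce });
const b = pythonStyleGroup("orders", key, cond, initial, reduce);
const c = phpStyleGroup("orders", key, initial, reduce, { condition: cond });

// JSON.stringify skips functions, so the reduce function is compared separately.
const same = (x, y) =>
  JSON.stringify(x) === JSON.stringify(y) && x.group.$reduce === y.group.$reduce;
console.log(same(a, b), same(b, c)); // true true
```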

This affects almost everyone using MongoDB, as almost everyone uses at least two languages (JavaScript and something else). I have seen hundreds of questions of the form “How do I run <shell function> using my driver?” If these users knew it was a database command (and knew what a database command was), they wouldn’t have to ask.

Database helpers lock you to a certain API, often an out-of-date one

Suppose the database changes the options for a command. All of the drivers that support helpers for that command are suddenly out-of-date. Conversely, if you have a recent version of a driver and an old version of the database, you can have helpers for features that don’t exist yet or have different options.

An example of old driver/new database: MapReduce’s options changed in version 1.7.4, and as far as I know, none of the drivers support the new options yet.
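To illustrate how a helper freezes an API, here is a sketch of a hypothetical mapreduce helper written against an older server. The OLD_HELPER_OPTIONS whitelist and both function names are made up, and out: { inline: 1 } simply stands in for an option the old helper has never heard of. The helper silently drops it, while building the command document yourself passes it straight through.

```javascript
// Hypothetical helper written against an older server: it copies only the
// options it knows about and silently drops everything else.
const OLD_HELPER_OPTIONS = ["query", "sort", "limit", "finalize"];

function oldMapReduceHelper(collection, map, reduce, options = {}) {
  const cmd = { mapreduce: collection, map, reduce };
  for (const name of OLD_HELPER_OPTIONS) {
    if (options[name] !== undefined) cmd[name] = options[name];
  }
  return cmd; // anything not on the list never reaches the server
}

// The raw command: whatever the server understands, you can send.
function rawMapReduce(collection, map, reduce, options = {}) {
  return { mapreduce: collection, map, reduce, ...options };
}

// These functions are only carried in the document, never called here.
const map = function () { emit(this.status, 1); };
const reduce = function (key, values) { return values.length; };
const opts = { query: { active: true }, out: { inline: 1 } };

console.log("out" in oldMapReduceHelper("orders", map, reduce, opts)); // false
console.log(rawMapReduce("orders", map, reduce, opts).out); // { inline: 1 }
```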

You can’t support database helpers for everything

Next, there’s the sheer volume of database commands, which makes it impossible to implement helpers for all of them. Everyone has their favorites: aggregation is important to some people, administration helpers are important to others, and so on. If all of them had helpers, not only would a ridiculous number of methods pollute the API documentation, but they would also lead to tons of compatibility problems between the driver and the database (as mentioned above).

Database helpers conceal what’s going on, giving users fewer options

Finally, using command helpers keeps people from understanding what’s actually going on, which is pointless and can lead to problems. It’s pointless to conceal the gory details because the details aren’t very gory: all database commands are queries. This means you can deconstruct command helpers as follows (example in PHP):

// legacy PHP driver (the Mongo class); assumes a local mongod and a "test" database
$m = new Mongo();
$db = $m->selectDB("test");

// the command helper
$db->lastError();
// is the same as
$db->command(array("getlasterror" => 1));
// is the same as
$db->selectCollection('$cmd')->findOne(array("getlasterror" => 1));
// is the same as
$db->selectCollection('$cmd')->find(array("getlasterror" => 1))->limit(1)->getNext();

Every command helper is just a find() in disguise! This means you can do (almost) anything with a database command that you could with a query.
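The PHP chain above can be mirrored in a small in-memory model, which makes the layering explicit: the helper calls command(), command() calls findOne() on $cmd, and findOne() is find() with limit(1). FakeServer, Cursor, Collection, and Db are hypothetical classes written for this sketch; they model the $cmd behavior described in this post for servers of that era, not any real driver’s internals.

```javascript
// Hypothetical server: a query against the special "$cmd" collection is
// treated as a command and answered with a single reply document.
class FakeServer {
  query(collectionName, filter, limit) {
    const docs = collectionName === "$cmd"
      ? [{ ok: 1, ran: Object.keys(filter)[0] }] // a command reply is one document
      : []; // ordinary collections are empty in this sketch
    return limit > 0 ? docs.slice(0, limit) : docs;
  }
}

class Cursor {
  constructor(server, collectionName, filter) {
    this.server = server;
    this.collectionName = collectionName;
    this.filter = filter;
    this.limitValue = 0;
    this.results = null;
  }
  limit(n) {
    this.limitValue = n;
    return this;
  }
  // The query is only sent when the first result is requested.
  next() {
    if (this.results === null) {
      this.results = this.server.query(this.collectionName, this.filter, this.limitValue);
    }
    const doc = this.results.shift();
    return doc === undefined ? null : doc;
  }
}

class Collection {
  constructor(server, name) { this.server = server; this.name = name; }
  find(filter) { return new Cursor(this.server, this.name, filter); }
  findOne(filter) { return this.find(filter).limit(1).next(); }
}

class Db {
  constructor(server) { this.server = server; }
  collection(name) { return new Collection(this.server, name); }
  // command() is just findOne() on $cmd...
  command(doc) { return this.collection("$cmd").findOne(doc); }
  // ...and the helper is just command() with a fixed document.
  lastError() { return this.command({ getlasterror: 1 }); }
}

const db = new Db(new FakeServer());
console.log(db.lastError()); // { ok: 1, ran: 'getlasterror' }
console.log(db.command({ getlasterror: 1 })); // { ok: 1, ran: 'getlasterror' }
console.log(db.collection("$cmd").find({ getlasterror: 1 }).limit(1).next()); // { ok: 1, ran: 'getlasterror' }
```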

This gives you more control. Not only can you use whatever options you want, you can do a few other things:

  • By default, drivers send all commands to the master, even if slaveOkay is set. If you want to send a command to a slave, you can deconstruct it into a query to bypass the driver’s commands-go-to-master logic.
  • Suppose you have a command that takes a long time to execute and it times out on the client side. If you deconstruct the command into a query, you can (for some drivers) set the client-side timeout on the cursor.
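Both tricks in the list above can be modeled with a toy router. RoutingDriver, its command and findOne methods, and the timeoutMs option are hypothetical; real drivers expose slaveOkay and cursor timeouts under their own names. Rerouting only makes sense for commands that are safe to run on a slave, such as read-only ones like serverStatus.

```javascript
// Hypothetical driver-side routing, modeled on the behavior described above:
// commands always go to the master, ordinary queries honor slaveOkay.
class RoutingDriver {
  constructor({ slaveOkay }) {
    this.slaveOkay = slaveOkay;
    this.log = []; // [target, collection, document] for every message sent
  }

  send(target, collection, doc) {
    this.log.push([target, collection, doc]);
    return { ok: 1, servedBy: target };
  }

  // The commands-go-to-master rule, regardless of slaveOkay.
  command(doc) {
    return this.send("master", "$cmd", doc);
  }

  // Queries may be read from a slave. A per-cursor client-side timeout is
  // attached to the result only to show where it would be applied.
  findOne(collection, doc, { timeoutMs } = {}) {
    const target = this.slaveOkay ? "slave" : "master";
    const result = this.send(target, collection, doc);
    return timeoutMs === undefined ? result : { ...result, timeoutMs };
  }
}

const driver = new RoutingDriver({ slaveOkay: true });
console.log(driver.command({ serverStatus: 1 }).servedBy); // master
console.log(driver.findOne("$cmd", { serverStatus: 1 }).servedBy); // slave
console.log(driver.findOne("$cmd", { dbStats: 1 }, { timeoutMs: 60000 })); // { ok: 1, servedBy: 'slave', timeoutMs: 60000 }
```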

And if you’re using an unfamiliar driver, you might not know what its helpers are called, but every driver has a find() method, so you can always fall back on that.

Exceptions

There are a couple of command helpers worth implementing. I think that count and drop (at both the database and collection levels) are common enough to deserve helpers. Also, at a higher level (e.g., frameworks on top of the driver and admin GUIs), I think helpers are absolutely fine. However, as someone who has been maintaining a driver and supporting users for the last few years, I think that command helpers at the driver level are a terrible idea.

20 thoughts on “Why Command Helpers Suck”

  1. You have a point but personally I think that the MongoDB perl driver could do with some more helpers (and dropping the evil AUTOLOAD as well), because one way or the other, I end up writing my own layer on top of the perl driver just to have a semi-sane way of performing certain operations.


    1. I think that it’s fine to have command helpers at a higher layer than the driver, as I mentioned. There’s a MongoDB::Admin module that would be the perfect place to put most helpers. I’d like to avoid putting them in the driver so that people don’t think it’s doing “magic.”

      Unfortunately, all of the other drivers support whatever their language’s version of AUTOLOAD is… maybe I could add a configuration option or something that can turn it off.


    1. It can actually be 1, I’ve changed it above to avoid confusion. Back in the day (i.e., a year ago) it had to be negative or MongoDB wouldn’t recognize the query as a command.

      Limits are actually fairly complex. Negative limits mean: “return exactly N results, no more. If fewer than N results fit in a single database response, then only return however many documents fit into one response.” It’s specified by writing -N as your limit; it returns up to N results and then closes the cursor. You can do it on any query, but usually it’s not what you want: if you wanted 6 documents and only 5 fit in the response, you have to do another query to get the 6th document. If you set a positive limit, the driver will automatically fetch the 6th document when you ask for it and then close the cursor.
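      The negative-versus-positive limit behavior described above can be simulated in a few lines. makeServer, runQuery, and the per-response document count are assumptions of this sketch: real servers cap a reply by its size in bytes, not by a number of documents, and a limit of 0 (meaning “no limit”) is not modeled.

```javascript
// Hypothetical server that can fit only `perResponse` documents into one
// reply. It counts round trips so the two limit styles can be compared.
function makeServer(docs, perResponse) {
  let roundTrips = 0;
  return {
    fetch(offset, wanted) {
      roundTrips += 1;
      return docs.slice(offset, offset + Math.min(wanted, perResponse));
    },
    get roundTrips() { return roundTrips; },
  };
}

// limit < 0: a single reply with at most N documents, then the cursor is
// closed, even if fewer than N fit.
// limit > 0: keep fetching more batches until N documents or no more data.
function runQuery(server, limit) {
  const n = Math.abs(limit);
  const results = server.fetch(0, n);
  if (limit < 0) return results; // single response, cursor closed
  while (results.length < n) {
    const batch = server.fetch(results.length, n - results.length);
    if (batch.length === 0) break; // collection exhausted
    results.push(...batch);
  }
  return results;
}

const docs = Array.from({ length: 10 }, (_, i) => ({ _id: i }));

const negServer = makeServer(docs, 5);
console.log(runQuery(negServer, -6).length, negServer.roundTrips); // 5 1

const posServer = makeServer(docs, 5);
console.log(runQuery(posServer, 6).length, posServer.roundTrips); // 6 2
```

      With -6 you get only the 5 documents that fit, in one round trip; with 6 the second round trip fetches the 6th document for you.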


  2. I use pymongo and finding out how to call the commands directly is extremely difficult. What would help is if the doc for Database.server_status() said something like the following, so I could see how to do it normally:

    This is shorthand for Database.admin_command.execute({“server_status”: 1})

    My pet peeve is the “safe” parameter. Who the heck wants to use a database in an unsafe way? It should have been named “async” which then makes it very clear to others what is going on.


    1. Yeah, I’ve been trying to add translations to db commands for my drivers (e.g., http://www.php.net/manual/en/mongocollection.deleteindex.php). I’ve still got a ton to do, though.

      I agree that “safe” isn’t a great name, but we can’t call it “async” because it’s not async: asynchronous means that it doesn’t wait for the response, but will handle it when it gets there. Writes, by default, don’t get a response at all. A better name than “safe” or “async” would probably be “fire-and-forget”, but that’s kind of long.


      1. Your translation is great. You only need the this->db->command line. I actually use pymongo, so finding this information is harder due to different naming conventions, no overview list of functions for each class, the functions documented in random order (instead of case) etc. Some of this is due to Sphinx (the doc tool). And some of it is just because writing docs is time consuming. For example, sometimes you want an overview with a beginning, middle and end. Sometimes you want to know what you can do with a particular item (e.g., a collection), and sometimes you want to know how to do something as quickly as possible. That requires almost three separate sets of documentation.

        As for safe vs async – it really is async. You get the response by calling getlasterror 🙂


      2. Yeah, I totally agree. I’ve found documentation incredibly difficult to do well. You want an introduction to get people started, API documentation to tell them what every little thing does, plus advanced documentation so they know how to put stuff together to do non-obvious things (using slaveOkay and a replica set, for instance). And that seems to be the bare minimum!

        It’s still not async 😛
        You either don’t get a response, or you get a response when you ask for it (synchronous). Also, you can’t exactly call getlasterror whenever; you have to bundle it with the write to make sure they are executed in tandem.
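        The point about bundling can be modeled with a single connection object. Connection and safeInsert are hypothetical names for this sketch (the drivers exposed this as the “safe” option), and real getlasterror replies carry more fields than ok and err. The model assumes, as described above, that plain writes get no reply and that getlasterror reports on the last operation sent over the same connection.

```javascript
// Hypothetical model of one connection: plain writes get no reply, and
// getlasterror reports on the last write sent over this connection.
class Connection {
  constructor() {
    this.lastError = null;
    this.ids = new Set();
  }

  // Fire-and-forget insert: nothing comes back to the client.
  insert(doc) {
    if (this.ids.has(doc._id)) {
      this.lastError = "duplicate key";
    } else {
      this.ids.add(doc._id);
      this.lastError = null;
    }
  }

  // The getlasterror command, answered for this connection only.
  command(doc) {
    if ("getlasterror" in doc) return { ok: 1, err: this.lastError };
    return { ok: 0, errmsg: "unknown command in this sketch" };
  }
}

// A "safe" write: the write and getlasterror are sent back-to-back on the
// same connection, so the reply describes this write and no other.
function safeInsert(conn, doc) {
  conn.insert(doc);
  return conn.command({ getlasterror: 1 });
}

const conn = new Connection();
conn.insert({ _id: 1 }); // fire-and-forget: no feedback at all
console.log(safeInsert(conn, { _id: 1 })); // { ok: 1, err: 'duplicate key' }

// If another write sneaks in before the check, the check describes it instead.
conn.insert({ _id: 2 });
conn.insert({ _id: 2 }); // fails silently
conn.insert({ _id: 3 }); // succeeds and resets the last error
console.log(conn.command({ getlasterror: 1 }).err); // null
```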

        Like

  3. I disagree. First of all, yes, it’s a pain for the driver maintainers at the moment, but fast forward a few years to when features mature. Yes, I speak of a magical time when stored JavaScript will not lock the whole database and I run out of obscure edge-case bugs to annoy driver maintainers about. However, you will get there. At that point, keeping the driver helper methods in sync with the db commands will be easier.

    Second, look at the other side of the coin. In a world where there are no command helpers, I’m resorting to third-party wrapper classes for even the simplest “I can write this in a day” ETL scripts and JSON services. If EVERYONE uses a wrapper class, then Dwight and Eliot will eventually decide that the driver maintainers need to develop best-of-breed wrappers for all the drivers so 10gen can officially support them. Then you will become a wrapper layer maintainer who has to deal with people not knowing what driver calls the wrapper layer makes. When you write an API, some programmers will always think it’s a magical black box.


    1. I see it as: we should wait a couple of years to add the helpers, then. Otherwise, we’ll be stuck supporting unpopular and deprecated (getPrevError, anyone?) helpers forever.

      It’s not exactly going to slow someone down to have to write db.runCommand({x : “foo”}) instead of db.foo.x(). In fact, I’d argue it’ll speed them up, as they won’t come to a grinding halt when the shell does something that they want to do in their program but don’t know how.

      Also, everyone does use frameworks, ODMs and wrappers already, especially in the Ruby and Java communities (but the others, too). One of the goals of the drivers is to be very low-level: you almost need a framework around it to do anything “serious.” We’re happy having these created and maintained by a 3rd party.

      It’s true that developers will always think of it as a black box, and some abstraction is definitely good! However, when it leads to people thinking that “magic” is happening and inhibits their ability to generalize what they’ve learned, I think it’s bad.


      1. > I see it as: we should wait a couple of years to add the helpers, then. Otherwise, we’ll be stuck supporting unpopular and deprecated (getPrevError, anyone?) helpers forever.

        Point taken there. If it were me, I’d be meaner from the get-go and tell people “when something is marked deprecated, it’s disappearing after a year.” However, such top-down autocracy would be contrary to the mongo culture.

        > Also, everyone does use frameworks, ODMs and wrappers already, especially in the Ruby and Java communities (but the others, too).

        This is where I part ways with 99% of the developer community. The only db wrapper library I ever liked for any language was PDO (and PearDB before that). Perhaps mongo will change my thinking on this. Part of my reason for having some spartan DALs when coding against a RDBMS was I stuck a lot of business logic in db constraints and sprocs. When I was developing an app with the official C# driver, I had to move a lot of that to my DAL.


      2. I would love, love, love to be able to mark things as deprecated and remove them. Unfortunately, my users turn into an angry mob whenever I change anything (a couple versions ago I added a new field to a class and I’m still getting complaints).


  4. From the lastError example, I think it’s pretty obvious why the helper methods are (and should be) there.

    There’s a reason wrappers are built around stuff that’s ugly and hard to read and write. Typically I would rather use a wrapper for an HTTP API in my language of choice than use the lower-level classes to do HTTP requests, parse the response by hand, etc. The same thing applies to databases, where we see all modern web frameworks using ORMs (or ODMs in the case of Mongo) to ease reading and writing code. I do agree, however, that it’s important for users to be aware that they’re using code which is part of a wrapper rather than the underlying API, but this is a communication issue rather than anything else.

    From my point of view, Mongo still has problems with its APIs, though. My experience has been that there is an overall inconsistency of language design which makes it tough to get into. It shares the same kind of randomness that I find in PHP and Linux (command line tools) APIs. It’s sometimes even worse than writing SQL ever was, which is something lots of people would agree is a less than pleasant experience. If we implemented the base API of the database in a modern, consistent, and beautiful manner, we would probably see even fewer people preferring ODMs for Mongo.

    Things like “principle of least surprise” which has been a target when creating the Ruby language would make a good impact on the Mongo API as it stands today.

    Alexander made a comment on the Replica Set Configuration page in the docs which kind of sums it up for me:
    Would be nice if config settings followed some convention. As of now there is both bind_ip (underscore), replSet (camelcase) and nohttpinterface (all lowercase).
    Also, the names are shortened despite the fact that they in most cases will be run off a file (not as command line parameters), where the length of the keys is of less importance than the readability.


    1. The lastError example goes a bit beyond helpers. If you didn’t have a helper, you’d just use the second line. It’s to demonstrate how you can tweak things if you know what’s going on.

      I agree with having a wrapper around the driver. I think that the driver should be pretty low-level, though: a MongoDB driver should be like the C standard library. I think having command helpers in the driver is analogous to stdio.h having a function that reads a file and returns a char array of its contents: that kind of functionality is way too high-level for stdio.h! You could have a wrapper library that does it, but people are going to get the wrong impression if it’s part of the standard library.

      I’m not sure what other complaints you have about the API, but I agree about the bind_ip (that drives me nuts). We are trying to standardize on camel case, but it hasn’t been a huge priority. We’ve made more progress camel casing database commands, which have the same inconsistencies… it started out as “getlasterror” and now you can run either that or “getLastError” (I’m old school, so I usually use getlasterror 🙂 ).


  5. Commands are fine and using them instead of wrappers is a good idea.
    The list of commands is ugly and it is hard to understand how to use them.
    The docs usually show the js wrapper version instead of the command. (http://www.mongodb.org/display/DOCS/Validate+Command)
    Mongodb is an amazing product and the support in the mailing list and irc are amazing but the wiki is… not so good


    1. Yeah, all of the MongoDB developers love programming and answering user questions. We’re all more grudging about working on documentation, and it shows :-/

      Everything’s pretty much frozen at the moment, but I’ll try to make the list of commands nicer post-1.8.0.

