Intro to Fail Points

This is probably exclusively of interest to my coworkers, but MongoDB has a new fail points framework. Fail points make it easier to test things that are hard to fake, like page faults or network errors. Basically, you create a glorified boolean called a fail point, which you can turn on and off while mongod is running.

To show how this works, I’ll modify the humble “ping” command. “ping” is fairly simple:

> db.runCommand({"ping" : 1})
{ "ok" : 1 }

I’d like to make the response include a "pong" : 1 field, on command.

The ping command is defined in src/mongo/db/dbcommands_generic.cpp. To add a fail point, we first have to include the fail points header at the top of the file:

#include "repl/multicmd.h"
#include "server.h"
#include "mongo/util/fail_point_service.h"

namespace mongo {

Then, below the namespace mongo line, add a declaration for the fail point:

namespace mongo {

    MONGO_FP_DECLARE(pingPongPoint);

Feel free to call your failpoint whatever you want.

The ping command is defined lower down in the file in a section that looks like this:

    class PingCommand : public Command {
    public:
        PingCommand() : Command( "ping" ) {}
        virtual bool slaveOk() const { return true; }
        virtual void help( stringstream &help ) const { help << "a way to check that the server is alive. responds immediately even if server is in a db lock."; }
        virtual LockType locktype() const { return NONE; }
        virtual bool requiresAuth() { return false; }
        virtual void addRequiredPrivileges(const std::string& dbname,
                                           const BSONObj& cmdObj,
                                           std::vector<Privilege>* out) {} // No auth required
        virtual bool run(const string& badns, BSONObj& cmdObj, int, string& errmsg, BSONObjBuilder& result, bool) {
            // IMPORTANT: Don't put anything in here that might lock db - including authentication
            return true;
        }
    } pingCmd;

Now, in the run() method of the code above, you can trigger certain actions when the fail point is turned on:

        virtual bool run(const string& badns, BSONObj& cmdObj, int, string& errmsg, BSONObjBuilder& result, bool) {
            // IMPORTANT: Don't put anything in here that might lock db - including authentication
            if (MONGO_FAIL_POINT(pingPongPoint)) {
                result.append("pong", 1.0);
            }
            return true;
        }

Now recompile the database. By default, mongod doesn’t allow fail points to be run. To allow fail points to be triggered at all, you have to start mongod with the --setParameter enableTestCommands=1 option.

$ ./mongod --setParameter enableTestCommands=1

Note: as of this writing, you cannot enable fail points with the setParameter command; you must start the database with this option.

The fail point still isn’t turned on, so if you run db.runCommand({ping:1}), you can see that there’s still just the “ok” field. You can enable the fail point with the configureFailPoint command:

> db.adminCommand({"configureFailPoint" : 'pingPongPoint', "mode" : 'alwaysOn'})
{ "ok" : 1 }
> db.runCommand({ping:1})
{ "pong" : 1, "ok" : 1 }
> db.adminCommand({"configureFailPoint" : 'pingPongPoint', "mode" : 'off'})
{ "ok" : 1 }
> db.runCommand({ping:1})
{ "ok" : 1 }

Possible modes are "alwaysOn", "off", and {"times" : 37} (which would be on for the next 37 times the fail point is hit… obviously the value for “times” is configurable).
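To make the three modes concrete, here is a toy model of how such a switch could behave. This is only a sketch: ToyFailPoint, configure(), shouldFail(), and countFires() are hypothetical names made up for illustration, not MongoDB's actual fail point class, and unlike the real thing this version is not thread-safe.

```cpp
#include <cassert>

// Hypothetical stand-in for a fail point; NOT MongoDB's real implementation.
class ToyFailPoint {
public:
    enum Mode { kOff, kAlwaysOn, kTimes };

    // Loosely mirrors the "mode" argument of configureFailPoint.
    // `times` only matters in kTimes mode.
    void configure(Mode mode, int times = 0) {
        assert(times >= 0);
        _mode = mode;
        _timesLeft = times;
    }

    // Returns true if the fail point fires on this hit.
    bool shouldFail() {
        switch (_mode) {
            case kAlwaysOn:
                return true;
            case kTimes:
                if (_timesLeft > 0) {
                    --_timesLeft;
                    return true;
                }
                _mode = kOff;  // budget used up, behave like "off" from now on
                return false;
            default:
                return false;
        }
    }

private:
    Mode _mode = kOff;
    int _timesLeft = 0;
};

// Hits the fail point `hits` times and counts how often it fired.
int countFires(ToyFailPoint& fp, int hits) {
    int fired = 0;
    for (int i = 0; i < hits; ++i) {
        if (fp.shouldFail()) ++fired;
    }
    return fired;
}
```

With this model, configuring kTimes with a value of 3 and then hitting the fail point 5 times fires on the first 3 hits only, which is the behavior described for {"times" : 37} above, just with a smaller number.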

This is a derpy example, but I’ve found it super helpful for debugging concurrency issues where I need to force a thread to block until another thread has done something. You can do that with something like:

while (MONGO_FAIL_POINT(looper)) {
    sleep(0);
}
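If you want to play with the blocking pattern outside of mongod, here is a self-contained sketch of the same idea. It swaps the fail point for a plain std::atomic<bool> (looperEnabled is a hypothetical stand-in for the "looper" fail point, and runBlockedThreadDemo() is made up for this example), so it shows the spin-until-released shape rather than the real framework.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the "looper" fail point: on until someone turns it off.
std::atomic<bool> looperEnabled{true};

// Runs one thread that spins while the flag is on, does some "other" work on
// the calling thread, then turns the flag off. Returns the order of events.
std::vector<std::string> runBlockedThreadDemo() {
    std::vector<std::string> events;
    looperEnabled = true;

    std::thread blocked([&events] {
        // Same shape as while (MONGO_FAIL_POINT(looper)) { sleep(0); }
        while (looperEnabled.load()) {
            std::this_thread::yield();
        }
        events.push_back("blocked thread resumed");
    });

    // The work that has to happen before the blocked thread may continue.
    events.push_back("main thread did its work");
    looperEnabled = false;  // the equivalent of setting the mode to 'off'

    blocked.join();
    assert(events.size() == 2);
    return events;
}
```

The blocked thread only touches events after it has seen the flag go false, and the main thread only clears the flag after its own push_back, so the two writes are ordered by the atomic and the result always lists the main thread's work first.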

If you want to merely delay something, say, to imitate a slow connection, you can use MONGO_FAIL_POINT_BLOCK to pass in information:

MONGO_FAIL_POINT_BLOCK(pingPongPoint, myDelay) {
    const BSONObj& data = myDelay.getData();
    sleep(data["delay"].numberInt());
}

Then you’d pass in a delay like so:

> db.adminCommand({"configureFailPoint" : 'pingPongPoint', "mode" : 'alwaysOn', "data" : {"delay" : 5}})
{ "ok" : 1 }

Now, if you run the ping command, it’ll take 5 seconds to return.
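Here is a rough standalone model of a fail point that carries data, in case the moving parts are easier to see without BSON in the way. Everything in it is an assumption for illustration: ToyFailPointWithData and pingWithOptionalDelay() are hypothetical, a std::map stands in for the "data" document, and the key is "delayMillis" (milliseconds rather than the seconds used above) so the example finishes quickly.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <thread>

// Hypothetical stand-in for a fail point plus its "data" document.
struct ToyFailPointWithData {
    bool enabled = false;
    std::map<std::string, int> data;  // plays the role of the BSON "data" field
};

// Sleeps for data["delayMillis"] milliseconds if the fail point is enabled.
// Returns the delay that was applied (0 when disabled or when no delay is set).
int pingWithOptionalDelay(const ToyFailPointWithData& fp) {
    if (!fp.enabled) {
        return 0;
    }
    auto it = fp.data.find("delayMillis");
    int delay = (it == fp.data.end()) ? 0 : it->second;
    assert(delay >= 0);
    std::this_thread::sleep_for(std::chrono::milliseconds(delay));
    return delay;
}
```

Setting enabled to true and data["delayMillis"] to 20 makes the call block for at least 20 milliseconds, while a disabled fail point returns immediately regardless of what the data holds.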

2 thoughts on “Intro to Fail Points”

    1. I’m not sure if they’ll be any help to a user, but I added two yesterday: rsBgSyncProduce pauses a secondary from pulling new ops from its sync source, and rsSyncApplyStop pauses it from applying new ops to its data. The only other one is throwSockExcep, which makes anything the DB tries to send cause a socket exception.
