On working at 10gen

10gen is trying to hire a gazillion people, so I’m averaging two interviews a day (bleh). A lot of people have asked what it’s like to work on MongoDB, so I thought I’d write a bit about it.

A Usual Day

Coffee: the lynchpin of my day.
  • Get in around 10am.
  • Check if there are any commercial support questions that need to be answered right now.
  • Have a cup of coffee and code until lunch.
  • Eat lunch.
  • If nothing dire has happened, go out for coffee+writing. This refuels my brain and is a creative outlet: that’s where I am now. My coffee does not look nearly as awesome as the coffee on the right.
  • Go back to the office, code all afternoon.
  • Depending on the day, usually between 5:30 and 6:30 the programmers will naturally start discussing problems we had over the day, interviews, support, the latest geek news, etc. Often beers are broken out.
  • Wrap up, go home.

There are some variations on this: as I mentioned, a lot of time lately is taken up by interviewing. Other coworkers spend a lot more time than I do at consults, trainings, speaking at conferences, etc.

Other General Workday Stuff

On Fridays, we have lunch as a team. After lunch, we have a tech talk where someone presents on what they’re working on (e.g., the inspiration for my geospatial post) or general info that’s good to know (e.g., the inspiration for my virtual memory post). This is a nice way to end the week, especially since Fridays often wrap up earlier than other days.

A couple of people use OS X or Windows for development; most people use Linux. You can use whatever you want. I’d like to encourage emacs users, in particular, to apply, as we’re falling slightly behind vi in numbers.

We sit in an open office plan, everyone at tables in a big room (including the CEO and CTO, who are both programmers). The only people in separate rooms are the people who have to be on the phone all day (sales, marketers, basketweavers… I’m not really clear on what non-technical people do).

And speaking of what people actually do, here are three examples of my job (that are more specific than “coding”):

Fixing Other People’s Bugs

Recently, a developer was using MongoDB and IBM’s DB2 with PHP. After he installed the MongoDB driver, PHP started segfaulting all over the place. I downloaded the ibm_db2 PHP extension to take a look.

PHP keeps a “storage unit” for extensions’ long-term memory use. Every extension shares the space and can store things there.

The DB2 extension was basically fire-bombing the storage unit.

It went through the storage, object by object, casting the objects into DB2 types and then freeing them. This worked fine when DB2 was the only PHP extension being used, but broke down when anyone else tried to use that storage. I gave the user a small patch that stopped the DB2 extension from destroying objects it didn’t create, and everything worked fine for them after that.

The Game is Afoot

A user reported that they couldn’t initialize their replica set: a member wasn’t coming online. The trick with this type of bug is to get enough evidence before the user wants to beat you over the head with the 800th log you’ve requested.

I asked them to send the first round of logs. It was weird: nothing was wrong from server1’s point of view. It initialized properly and could connect to everyone in the set. I puzzled over the messages, figuring out that once server1 had created the set, server2 had accepted the connection from server1 but then somehow failed to connect back to server1 and so couldn’t pick up the set config. However, according to server1, it could connect fine to server2 and thought it was perfectly healthy!

I finally realized what must be happening: “It looks like server2 couldn’t connect to any of the others, but all of them could connect to it. Could you check your firewall?”

“Oh, that server was blocking all outgoing connections! Now it’s working fine.”

Elementary, my dear Watson.

You know you’re not at a big company when…

At least it had "handles."

Someone on Sparc complained that the Perl driver wasn’t working at all for them. My first thought was that Sparc is big-endian, so maybe the Perl driver wasn’t flipping memory correctly. I asked Eliot where our Power PC was, and he said we must have forgotten it when we moved: it was still in our old office around the corner.

“Bring someone to help carry it,” he told me. “It’s heavy.”

Pshaw, I thought. How heavy could an old desktop be?

I went around the corner and the other company graciously let me walk into their server room, choose a server, and walk out with it. Unfortunately, it weighed about 50 pounds, and I have a traditional geek physique (no muscles). The trip back to our office involved me staggering a couple steps, putting it down, shaking out my arms, and repeating.

When I got to our office, I just dragged it down the hallway to our server closet. Eliot saw me tugboating the thing down the hallway.

“You didn’t bring someone to help?”

“It’s *oof* fine!”

Unfortunately, once it was all set up, the Perl driver worked perfectly on it. So it wasn’t big-endian specific.

I was now pretty sure it was Sparc-specific (another person had reported the same problem on a Sparc), so I bought an elderly Sparc server for a couple hundred bucks off eBay. When it arrived a couple days later, Eliot showed me how to rack it and I spent a day fighting with the Solaris/Oracle package manager. However, it was all worth it: I tried running the Perl driver and it instantly failed (success!).

After some debugging, I realized that Sparc was much more persnickety than Intel about byte alignment. The Perl driver was playing fast and loose with a byte buffer, casting pieces of it into other types (which Sparc didn’t like). I changed some casts to memcpys and the Perl driver started working beautifully.

But every day is different

The episodes above are a very small sample of what I do: there are hundreds of other things I’ve worked on over the last few years from speaking to working on the database to writing a freakin Facebook app.

So, if this sounded interesting, please go to our jobs website and submit an application!

Getting Started with MMS

Edit: since this was written, Sam has written some excellent documentation on using MMS. I recommend reading through it as you explore MMS.

Telling someone “You should set up monitoring” is kind of like telling someone “You should exercise 20 minutes three times a week.” Yes, you know you should, but your chair is so comfortable and you haven’t keeled over dead yet.

For years*, 10gen has been planning to do monitoring “right,” making it painless to monitor your database. Today, we released the MongoDB Monitoring Service: MMS.

MMS is free hosted monitoring for MongoDB. I’ve been using it to help out paying customers for a while, so I thought I’d do a quick post on useful stuff I’ve discovered (documentation is… uh… a little light, so far).

So, first: you sign up.

There are two options: register a company and register another account for an existing company. For example, let’s say I wanted to monitor the servers for Snail in a Turtleneck Enterprises. I’ll create a new account and company group. Then Andrew, sys admin of my heart, can create an account with Snail in a Turtleneck Enterprises and have access to all the same monitoring info.

Once you’re registered, you’ll see a page encouraging you to download the MMS agent. Click on the “download the agent” link.

This is a little Python program that collects stats from MongoDB, so you need to have pymongo installed, too. Starting from scratch on Ubuntu, do:

$ # prereqs
$ sudo apt-get install python python-setuptools
$ sudo easy_install pymongo
$ # set up agent
$ unzip name-of-agent.zip
$ cd name-of-agent
$ mkdir logs
$ # start agent
$ nohup python agent.py > logs/agent.log 2>&1 &

Last step! Back to the website: see that “+” button next to the “Hosts” title?

Designed by programmers, for Vulcans

Click on that and type a hostname. If you have a sharded cluster, add a mongos. If you have a replica set, add any member.

Now go have a nice cup of coffee. This is an important part of the process.

When you get back, tada, you’ll have buttloads of graphs. They probably won’t have much on them, since MMS will have been monitoring them for all of a few minutes.

Cool stuff to poke

This is the top bar of buttons:

Of immediate interest: click “Hosts” to see a list of hosts.

You’ll see hostname, role, and the last time the MMS agent was able to reach this host. Hosts that it hasn’t reached recently will have a red ping time.

Now click on a server’s name to see all of the info about it. Let’s look at a single graph.

You can click & drag to see a smaller bit of time on the graph. See those icons in the top right? Those give you:

  • Add to dashboard: you can create a custom dashboard with any charts you’re interested in. Click on the “Dashboard” link next to “Hosts” to see your dashboard.
  • Link to a private URL for this chart. You’ll have to be logged in to see it.
  • Email a jpg of this chart to someone.
  • This is maybe the most important one: a description of what this chart represents.

That’s the basics. Some other points of interest:

  • You can set up alerts by clicking on “Alerts” in the top bar.
  • “Events” shows you when hosts went down or came up, became primary or secondary, or were upgraded.
  • Arbiters don’t have their own chart, since they don’t have data. However, there is an “Arbiters” tab that lists them if you have some.
  • The “Last Ping” tab contains all of the info sent by MMS on the last ping, which I find interesting.
  • If you are confused, there is an “FAQ” link in the top bar that answers some common questions.

If you have any problems with MMS, there’s a little form at the bottom to let you complain:

This will file a bug report for you. This is a “private” bug tracker: only 10gen and people in your group will be able to see the bugs you file.

* If you ran mongod --help using MongoDB version 1.0.0 or higher, you might have noticed some options that started with --mms. In other words, we’ve been planning this for a little while.

More PHP Internals: References

By request, a quick post on using PHP references in extensions.

To start, here’s an example of references in PHP we’ll be translating into C:

function not_by_ref($arg) {
   echo "called not_by_ref($arg)\n";
   $arg = 2;
}

function by_ref(&$arg) {
   echo "called by_ref($arg)\n";
   $arg = 3;
}

$x = 1;
echo "x is $x\n";
not_by_ref($x);
echo "x is $x\n";
by_ref($x);
echo "x is $x\n";

This will print:

x is 1
called not_by_ref(1)
x is 1
called by_ref(1)
x is 3

If you want your C extension’s function to officially have a signature with ampersands in it, you have to declare to PHP that you want to pass in refs as arguments. Remember how we declared functions in this struct?

zend_function_entry rlyeh_functions[] = {
  PHP_FE(cthulhu, NULL)
  { NULL, NULL, NULL }
};

The second argument to PHP_FE, NULL, can optionally be the argument spec. For example, let’s say we’re implementing by_ref() in C. We would add this to php_rlyeh.c:

// the 1 indicates pass-by-reference
ZEND_BEGIN_ARG_INFO(arginfo_by_ref, 1)
ZEND_END_ARG_INFO()

zend_function_entry rlyeh_functions[] = {
  PHP_FE(cthulhu, NULL)
  PHP_FE(by_ref, arginfo_by_ref)
  { NULL, NULL, NULL }
};

PHP_FUNCTION(by_ref) {
  zval *zptr = 0;

  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &zptr) == FAILURE) {
    return;
  }

  php_printf("called (the c version of) by_ref(%d)\n", (int)Z_LVAL_P(zptr));
  ZVAL_LONG(zptr, 3);
}

Suppose we also add not_by_ref(). This might look something like:

ZEND_BEGIN_ARG_INFO(arginfo_not_by_ref, 0)
ZEND_END_ARG_INFO()

zend_function_entry rlyeh_functions[] = {
  PHP_FE(cthulhu, NULL)
  PHP_FE(by_ref, arginfo_by_ref)
  PHP_FE(not_by_ref, arginfo_not_by_ref)
  { NULL, NULL, NULL }
};

PHP_FUNCTION(not_by_ref) {
  zval *zptr = 0, *copy = 0;

  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &zptr) == FAILURE) {
    return;
  }

  php_printf("called (the c version of) not_by_ref(%d)\n", (int)Z_LVAL_P(zptr));
  ZVAL_LONG(zptr, 2);
}

However, if we try running this, we’ll get:

x is 1
called (the c version of) not_by_ref(1)
x is 2
called (the c version of) by_ref(2)
x is 3

What happened? not_by_ref used our variable like a reference!

This is really weird and annoying behavior (if anyone knows why PHP does this, please comment below).

To work around it, if you want non-reference behavior, you have to manually make a copy of the argument.

Our not_by_ref() function becomes:

PHP_FUNCTION(not_by_ref) {
  zval *zptr = 0, *copy = 0;

  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &zptr) == FAILURE) {
    return;
  }

  // make a copy (allocate the new zval first, then copy into it)
  MAKE_STD_ZVAL(copy);
  memcpy(copy, zptr, sizeof(zval));

  // set refcount to 1, as we're only using "copy" in this function
  Z_SET_REFCOUNT_P(copy, 1);

  php_printf("called (the c version of) not_by_ref(%d)\n", (int)Z_LVAL_P(copy));
  ZVAL_LONG(copy, 2);

  // release our private copy
  zval_ptr_dtor(&copy);
}


Note that we set the refcount of copy to 1. This is because the refcount for zptr is 2: 1 ref from the calling function + 1 ref from the not_by_ref function. However, we don’t want the copy of zptr to have a refcount of 2, because it’s only being used by the current function.

Also note that memcpy-ing the zval only works because this is a scalar: if this were an array or object, we’d have to use PHP API functions to make a deep copy of the original.

If we run our PHP program again, it gives us:

x is 1
called (the c version of) not_by_ref(1)
x is 1
called (the c version of) by_ref(1)
x is 3

Okay, this is pretty good… but we’re actually missing a case. What happens if we pass in a reference to not_by_ref()? In PHP, this looks like:

function not_by_ref($arg) {
   $arg = 2;
}

$x = 1;
not_by_ref(&$x);
echo "x is $x\n";

…which displays “x is 2”. Unfortunately, we’ve overridden this behavior in our not_by_ref() C function, so we have to special case: if this is a reference, change its value, otherwise make a copy and change the copy’s value.

PHP_FUNCTION(not_by_ref) {
  zval *zptr = 0, *copy = 0;

  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &zptr) == FAILURE) {
    return;
  }

  if (Z_ISREF_P(zptr)) {
    // if this is a reference, make copy point to zptr
    copy = zptr;

    // adding a reference so we can indiscriminately delete copy later
    zval_add_ref(&zptr);
  }
  else {
    // make a copy (allocate the new zval first, then copy into it)
    MAKE_STD_ZVAL(copy);
    memcpy(copy, zptr, sizeof(zval));

    // set refcount to 1, as we're only using "copy" in this function
    Z_SET_REFCOUNT_P(copy, 1);
  }

  php_printf("called (the c version of) not_by_ref(%d)\n", (int)Z_LVAL_P(copy));
  ZVAL_LONG(copy, 2);

  // drops the extra reference or frees the private copy
  zval_ptr_dtor(&copy);
}


Now it’ll behave “properly.”

There may be a better way to do this; please leave a comment if you know of one. However, as far as I know, this is the only way to emulate the PHP reference behavior.

If you would like to read more about PHP references, Derick Rethans wrote a great article on it for PHP Architect.

Playing with Virtual Memory

Linux: the developer's personal gentleman

When you run a process, it needs some memory to store things: its heap, its stack, and any libraries it’s using. Linux provides and cleans up memory for your process like an extremely conscientious butler. You can (and generally should) just let Linux do its thing, but it’s a good idea to understand the basics of what’s going on.

One easy way (I think) to understand this stuff is to actually look at what’s going on using the pmap command. pmap shows you memory information for a given process.

For example, let’s take a really simple C program that prints its own process id (PID) and pauses:


#include <stdio.h>
#include <unistd.h>

int main() {
  printf("run `pmap %d`\n", getpid());
  pause();
}

Save this as mem_munch.c. Now compile and run it with:

$ gcc mem_munch.c -o mem_munch
$ ./mem_munch
run `pmap 25681`

The PID you get will probably be different than mine (25681).

At this point, the program will “hang.” This is because of the pause() function, and it’s exactly what we want. Now we can look at the memory for this process at our leisure.

Open up a new shell and run pmap, replacing the PID below with the one mem_munch gave you:

$ pmap 25681
25681:   ./mem_munch
0000000000400000      4K r-x--  /home/user/mem_munch
0000000000600000      4K r----  /home/user/mem_munch
0000000000601000      4K rw---  /home/user/mem_munch
00007fcf5af88000   1576K r-x--  /lib/x86_64-linux-gnu/libc-2.13.so
00007fcf5b112000   2044K -----  /lib/x86_64-linux-gnu/libc-2.13.so
00007fcf5b311000     16K r----  /lib/x86_64-linux-gnu/libc-2.13.so
00007fcf5b315000      4K rw---  /lib/x86_64-linux-gnu/libc-2.13.so
00007fcf5b316000     24K rw---    [ anon ]
00007fcf5b31c000    132K r-x--  /lib/x86_64-linux-gnu/ld-2.13.so
00007fcf5b512000     12K rw---    [ anon ]
00007fcf5b539000     12K rw---    [ anon ]
00007fcf5b53c000      4K r----  /lib/x86_64-linux-gnu/ld-2.13.so
00007fcf5b53d000      8K rw---  /lib/x86_64-linux-gnu/ld-2.13.so
00007fff7efd8000    132K rw---    [ stack ]
00007fff7efff000      4K r-x--    [ anon ]
ffffffffff600000      4K r-x--    [ anon ]
 total             3984K

This output is how memory “looks” to the mem_munch process. If mem_munch asks the operating system for 00007fcf5af88000, it will get libc. If it asks for 00007fcf5b31c000, it will get the ld library.

This output is a bit dense and abstract, so let’s look at how some more familiar memory usage shows up. Change our program to put some memory on the stack and some on the heap, then pause.


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
  int on_stack, *on_heap;

  // local variables are stored on the stack
  on_stack = 42;
  printf("stack address: %p\n", (void*)&on_stack);

  // malloc allocates heap memory
  on_heap = (int*)malloc(sizeof(int));
  printf("heap address: %p\n", (void*)on_heap);

  printf("run `pmap %d`\n", getpid());
  pause();
}

Now compile and run it:

$ ./mem_munch 
stack address: 0x7fff497670bc
heap address: 0x1b84010
run `pmap 11972`

Again, your exact numbers will probably be different than mine.

Before you kill mem_munch, run pmap on it:

$ pmap 11972
11972:   ./mem_munch
0000000000400000      4K r-x--  /home/user/mem_munch
0000000000600000      4K r----  /home/user/mem_munch
0000000000601000      4K rw---  /home/user/mem_munch
0000000001b84000    132K rw---    [ anon ]
00007f3ec4d98000   1576K r-x--  /lib/x86_64-linux-gnu/libc-2.13.so
00007f3ec4f22000   2044K -----  /lib/x86_64-linux-gnu/libc-2.13.so
00007f3ec5121000     16K r----  /lib/x86_64-linux-gnu/libc-2.13.so
00007f3ec5125000      4K rw---  /lib/x86_64-linux-gnu/libc-2.13.so
00007f3ec5126000     24K rw---    [ anon ]
00007f3ec512c000    132K r-x--  /lib/x86_64-linux-gnu/ld-2.13.so
00007f3ec5322000     12K rw---    [ anon ]
00007f3ec5349000     12K rw---    [ anon ]
00007f3ec534c000      4K r----  /lib/x86_64-linux-gnu/ld-2.13.so
00007f3ec534d000      8K rw---  /lib/x86_64-linux-gnu/ld-2.13.so
00007fff49747000    132K rw---    [ stack ]
00007fff497bb000      4K r-x--    [ anon ]
ffffffffff600000      4K r-x--    [ anon ]
 total             4116K

Note that there’s a new entry between the final mem_munch section and libc-2.13.so. What could that be?

# from pmap
0000000001b84000 132K rw--- [ anon ]
# from our program
heap address: 0x1b84010

The addresses are almost the same. That block ([ anon ]) is the heap. (pmap labels blocks of memory that aren’t backed by a file [ anon ]. We’ll get into what being “backed by a file” means in a sec.)

The second thing to notice:

# from pmap
00007fff49747000 132K rw--- [ stack ]
# from our program
stack address: 0x7fff497670bc

And there’s your stack!

One other important thing to notice: this is how memory “looks” to your program, not how memory is actually laid out on your physical hardware. Look at how much memory mem_munch has to work with. According to pmap, mem_munch can address memory between address 0x0000000000400000 and 0xffffffffff600000 (well, actually 0x00007fffffffffff; beyond that is special). For those of you playing along at home, the full range is almost 17 million terabytes of memory, and even the part below 0x00007fffffffffff is about 128 terabytes. That’s a lot of memory. (If your computer has that kind of memory, please leave your address and times you won’t be at home.)

So, the amount of memory the program can address is kind of ridiculous. Why does the computer do this? Well, lots of reasons, but one important one is that this means you can address more memory than you actually have on the machine and let the operating system take care of making sure the right stuff is in memory when you try to access it.

Memory Mapped Files

Memory mapping a file basically tells the operating system to load the file so the program can access it as an array of bytes. Then you can treat a file like an in-memory array.

For example, let’s make a (pretty stupid) random number generator by creating a file full of random numbers, then mmap-ing it and reading off random numbers.

First, we’ll create a big file called random (note that this creates a 1GB file, so make sure you have the disk space and be patient, it’ll take a little while to write):

$ dd if=/dev/urandom bs=1024 count=1000000 of=/home/user/random
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 123.293 s, 8.3 MB/s
$ ls -lh random
-rw-r--r-- 1 user user 977M 2011-08-29 16:46 random

Now we’ll mmap random and use it to generate random numbers.


#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
  char *random_bytes;
  FILE *f;
  int offset = 0;

  // open "random" for reading
  f = fopen("/home/user/random", "r");
  if (!f) {
    perror("couldn't open file");
    return -1;
  }

  // we want to inspect memory before mapping the file
  printf("run `pmap %d`, then press <enter>", getpid());
  getchar();

  random_bytes = mmap(0, 1000000000, PROT_READ, MAP_SHARED, fileno(f), 0);

  if (random_bytes == MAP_FAILED) {
    perror("error mapping the file");
    return -1;
  }

  while (1) {
    printf("random number: %d (press <enter> for next number)", *(int*)(random_bytes+offset));
    getchar();

    offset += 4;
  }
}

If we run this program, we’ll get something like:

$ ./mem_munch 
run `pmap 12727`, then press <enter>

The program hasn’t done anything yet, so the output of running pmap will basically be the same as it was above (I’ll omit it for brevity). However, if we continue running mem_munch by pressing enter, our program will mmap random.

Now if we run pmap it will look something like:

$ pmap 12727
12727:   ./mem_munch
0000000000400000      4K r-x--  /home/user/mem_munch
0000000000600000      4K r----  /home/user/mem_munch
0000000000601000      4K rw---  /home/user/mem_munch
000000000147d000    132K rw---    [ anon ]
00007fe261c6f000 976564K r--s-  /home/user/random
00007fe29d61c000   1576K r-x--  /lib/x86_64-linux-gnu/libc-2.13.so
00007fe29d7a6000   2044K -----  /lib/x86_64-linux-gnu/libc-2.13.so
00007fe29d9a5000     16K r----  /lib/x86_64-linux-gnu/libc-2.13.so
00007fe29d9a9000      4K rw---  /lib/x86_64-linux-gnu/libc-2.13.so
00007fe29d9aa000     24K rw---    [ anon ]
00007fe29d9b0000    132K r-x--  /lib/x86_64-linux-gnu/ld-2.13.so
00007fe29dba6000     12K rw---    [ anon ]
00007fe29dbcc000     16K rw---    [ anon ]
00007fe29dbd0000      4K r----  /lib/x86_64-linux-gnu/ld-2.13.so
00007fe29dbd1000      8K rw---  /lib/x86_64-linux-gnu/ld-2.13.so
00007ffff29b2000    132K rw---    [ stack ]
00007ffff29de000      4K r-x--    [ anon ]
ffffffffff600000      4K r-x--    [ anon ]
 total           980684K

This is very similar to before, but with an extra line (the one for /home/user/random), which kicks up virtual memory usage a bit (from 4MB to 980MB).

However, let’s re-run pmap with the -x option. This shows the resident set size (RSS): only 4KB of random are resident. Resident memory is memory that’s actually in RAM. There’s very little of random in RAM because we’ve only accessed the very start of the file, so the OS has only pulled the first bit of the file from disk into memory.

$ pmap -x 12727
12727:   ./mem_munch
Address           Kbytes     RSS   Dirty Mode   Mapping
0000000000400000       0       4       0 r-x--  mem_munch
0000000000600000       0       4       4 r----  mem_munch
0000000000601000       0       4       4 rw---  mem_munch
000000000147d000       0       4       4 rw---    [ anon ]
00007fe261c6f000       0       4       0 r--s-  random
00007fe29d61c000       0     288       0 r-x--  libc-2.13.so
00007fe29d7a6000       0       0       0 -----  libc-2.13.so
00007fe29d9a5000       0      16      16 r----  libc-2.13.so
00007fe29d9a9000       0       4       4 rw---  libc-2.13.so
00007fe29d9aa000       0      16      16 rw---    [ anon ]
00007fe29d9b0000       0     108       0 r-x--  ld-2.13.so
00007fe29dba6000       0      12      12 rw---    [ anon ]
00007fe29dbcc000       0      16      16 rw---    [ anon ]
00007fe29dbd0000       0       4       4 r----  ld-2.13.so
00007fe29dbd1000       0       8       8 rw---  ld-2.13.so
00007ffff29b2000       0      12      12 rw---    [ stack ]
00007ffff29de000       0       4       0 r-x--    [ anon ]
ffffffffff600000       0       0       0 r-x--    [ anon ]
----------------  ------  ------  ------
total kB          980684     508     100

If the virtual memory size (the Kbytes column) is all 0s for you, don’t worry about it. That’s a bug in Debian/Ubuntu’s -x option. The total is correct, it just doesn’t display correctly in the breakdown.

You can see that the resident set size, the amount that’s actually in memory, is tiny compared to the virtual memory. Your program can access any memory within a billion bytes of 0x00007fe261c6f000, but if it accesses anything past 4KB, it’ll probably have to go to disk for it*.

What if we modify our program so it reads the whole file/array of bytes?


#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
  char *random_bytes;
  FILE *f;
  int offset = 0;

  // open "random" for reading
  f = fopen("/home/user/random", "r");
  if (!f) {
    perror("couldn't open file");
    return -1;
  }

  random_bytes = mmap(0, 1000000000, PROT_READ, MAP_SHARED, fileno(f), 0);

  if (random_bytes == MAP_FAILED) {
    printf("error mapping the file\n");
    return -1;
  }

  for (offset = 0; offset < 1000000000; offset += 4) {
    int i = *(int*)(random_bytes+offset);

    // to show we're making progress
    if (offset % 1000000 == 0) {
      printf(".");
      fflush(stdout);
    }
  }

  // at the end, wait for signal so we can check mem
  printf("\ndone, run `pmap -x %d`\n", getpid());
  pause();
}

Now the resident set size is almost the same as the virtual memory size:

$ pmap -x 5378
5378:   ./mem_munch
Address           Kbytes     RSS   Dirty Mode   Mapping
0000000000400000       0       4       4 r-x--  mem_munch
0000000000600000       0       4       4 r----  mem_munch
0000000000601000       0       4       4 rw---  mem_munch
0000000002271000       0       4       4 rw---    [ anon ]
00007fc2aa333000       0  976564       0 r--s-  random
00007fc2e5ce0000       0     292       0 r-x--  libc-2.13.so
00007fc2e5e6a000       0       0       0 -----  libc-2.13.so
00007fc2e6069000       0      16      16 r----  libc-2.13.so
00007fc2e606d000       0       4       4 rw---  libc-2.13.so
00007fc2e606e000       0      16      16 rw---    [ anon ]
00007fc2e6074000       0     108       0 r-x--  ld-2.13.so
00007fc2e626a000       0      12      12 rw---    [ anon ]
00007fc2e6290000       0      16      16 rw---    [ anon ]
00007fc2e6294000       0       4       4 r----  ld-2.13.so
00007fc2e6295000       0       8       8 rw---  ld-2.13.so
00007fff037e6000       0      12      12 rw---    [ stack ]
00007fff039c9000       0       4       0 r-x--    [ anon ]
ffffffffff600000       0       0       0 r-x--    [ anon ]
----------------  ------  ------  ------
total kB          980684  977072     104

Now if we access any part of the file, it will be in RAM already. (Probably. Until something else kicks it out.) So, our program can access a gigabyte of memory, but the operating system can lazily load it into RAM as needed.

And that’s why your virtual memory is so damn high when you’re running MongoDB.

Left as an exercise to the reader: try running pmap on a mongod process before it’s done anything, once you’ve done a couple operations, and once it’s been running for a long time.

* This isn’t strictly true**. The kernel actually says, “If they want the first N bytes, they’re probably going to want some more of the file” so it’ll load, say, the first dozen KB of the file into memory but only tell the process about 4KB. When your program tries to access this memory that is in RAM, but it didn’t know was in RAM, it’s called a minor page fault (as opposed to a major page fault when it actually has to hit disk to load new info).

** This note is also not strictly true. In fact, the whole file will probably be in memory before you map anything because you just wrote the thing with dd. So you’ll just be doing minor page faults as your program “discovers” it.

PHP Extensions Made Eldritch: Classes

This is the final section of a 4-part series on writing PHP extensions.

  1. Setting Up PHP – compiling PHP for extension development
  2. Hello, world! – your first extension
  3. Working with the API – the PHP C API
  4. Classes – creating PHP objects in C

branch: oop

This section will cover creating objects. Objects are like associative arrays++: they allow you to attach almost any functionality you want to a PHP variable.

You can create an object in much the same way that you’d create an array:

PHP_FUNCTION(makeObject) {
    // turn the return value into an (empty) object
    object_init(return_value);

    // add a couple of properties
    zend_update_property_string(NULL, return_value, "name", strlen("name"), "yig" TSRMLS_CC);
    zend_update_property_long(NULL, return_value, "worshippers", strlen("worshippers"), 4 TSRMLS_CC);
}

If you call var_dump(makeObject()), you’ll see something like:

object(stdClass)#1 (2) {
  ["name"]=>
  string(3) "yig"
  ["worshippers"]=>
  int(4)
}

branch: cultists

You create a class by designing a class template, stored in a zend_class_entry.

For our extension, we’ll make a new class, Cultist. We want a standard cultist template, but every individual cultist is unique.

I like to give each class its own C file to keep things tidy, but that’s not necessary if it’s more logical to group them together or something. However, it’s my tutorial, so we’re splitting it out.

Add two new files to your extension directory: cultist.c and cultist.h. Add the new C file to your config.m4, so it will get compiled into your extension:

PHP_NEW_EXTENSION(rlyeh, php_rlyeh.c cultist.c, $ext_shared)

Note that there is no comma between php_rlyeh.c and cultist.c.

Now we want to add our Cultist class. Open up cultist.c and add the following code:


#include <php.h>

#include "cultist.h"

zend_class_entry *rlyeh_ce_cultist;

static function_entry cultist_methods[] = {
  PHP_ME(Cultist, sacrifice, NULL, ZEND_ACC_PUBLIC)
  {NULL, NULL, NULL}
};

void rlyeh_init_cultist(TSRMLS_D) {
  zend_class_entry ce;

  INIT_CLASS_ENTRY(ce, "Cultist", cultist_methods);
  rlyeh_ce_cultist = zend_register_internal_class(&ce TSRMLS_CC);

  /* fields */
  zend_declare_property_bool(rlyeh_ce_cultist, "alive", strlen("alive"), 1, ZEND_ACC_PUBLIC TSRMLS_CC);
}

PHP_METHOD(Cultist, sacrifice) {
  // TODO
}

You might recognize the function_entry struct from our original extension: methods are just grouped into function_entrys per class.

The real meat-and-potatoes is in rlyeh_init_cultist. This function defines the class entry for cultist, giving it methods (cultist_methods), constants, and properties.

There are tons of flags that can be set for methods and properties. Some of the most common are:


Currently we’re just using ZEND_ACC_PUBLIC for our sacrifice function, but this could be OR-ed with any of the other flags (for example, if we decided sacrifice2() had a better API, we could change sacrifice’s flags to ZEND_ACC_PUBLIC|ZEND_ACC_DEPRECATED and PHP would warn the user if they tried to use it).
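To make the OR-ing concrete, here is a small standalone sketch in plain C (no PHP headers). The DEMO_ACC_* names and their bit values are made up for illustration; they are not the real ZEND_ACC_* constants, which live in the PHP headers.

```c
#include <assert.h>

/* Hypothetical flag bits for illustration; NOT the real ZEND_ACC_* values. */
enum {
    DEMO_ACC_STATIC     = 1 << 0,
    DEMO_ACC_PUBLIC     = 1 << 1,
    DEMO_ACC_DEPRECATED = 1 << 2
};

/* Combine flags with bitwise OR, the way a method table entry would. */
unsigned int demo_sacrifice_flags(void) {
    return DEMO_ACC_PUBLIC | DEMO_ACC_DEPRECATED;
}

/* Returns 1 if every bit in `wanted` is set in `flags`, 0 otherwise. */
int demo_has_flags(unsigned int flags, unsigned int wanted) {
    assert(wanted != 0); /* asking for "no flags" is almost certainly a bug */
    return (flags & wanted) == wanted;
}
```

Because each flag occupies its own bit, one integer can carry any combination, and checking for a flag is a single mask-and-compare.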

In cultist.h, define all of the functions used above:

#ifndef CULTIST_H
#define CULTIST_H

void rlyeh_init_cultist(TSRMLS_D);

PHP_METHOD(Cultist, sacrifice);

#endif


Now we have to tell the extension to load this class on startup. Thus, we want to call rlyeh_init_cultist in our MINIT function and include the cultist.h header file. Open up php_rlyeh.c and add the following:

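// at the top
#include "cultist.h"

// our existing MINIT function from part 3
PHP_MINIT_FUNCTION(rlyeh) {
    // ... existing setup from part 3 stays here ...
    rlyeh_init_cultist(TSRMLS_C);
    return SUCCESS;
}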

Because we changed config.m4, we have to do phpize && ./configure && make install, not just make install, otherwise cultist.c won’t be added to the Makefile.

Now if we run var_dump(new Cultist());, we will see something like:

object(Cultist)#1 (1) {
  ["alive"]=>
  bool(true)
}

Creating a new class instance

We can also initialize cultists from C. Let’s add a static function to create a cultist. Open cultist.c and add the following:

static function_entry cultist_methods[] = {
  PHP_ME(Cultist, sacrifice, NULL, ZEND_ACC_PUBLIC)
  PHP_ME(Cultist, createCultist, NULL, ZEND_ACC_PUBLIC|ZEND_ACC_STATIC)
  {NULL, NULL, NULL}
};

PHP_METHOD(Cultist, createCultist) {
   object_init_ex(return_value, rlyeh_ce_cultist);
}

Now we can call Cultist::createCultist() to create a new cultist.

What if creating new cultists takes some setup, so we’d like to have a constructor? Well, the constructor is just a method, so we can add that:

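static function_entry cultist_methods[] = {
  PHP_ME(Cultist, __construct, NULL, ZEND_ACC_PUBLIC|ZEND_ACC_CTOR)
  PHP_ME(Cultist, sacrifice, NULL, ZEND_ACC_PUBLIC)
  PHP_ME(Cultist, createCultist, NULL, ZEND_ACC_PUBLIC|ZEND_ACC_STATIC)
  {NULL, NULL, NULL}
};

PHP_METHOD(Cultist, __construct) {
  // do setup
}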

Now PHP will automatically call our Cultist::__construct when we call new Cultist. However, createCultist won’t: it’ll just set the defaults and return. We have to modify createCultist to call a PHP method from C.

Calling method-to-method

branch: m2m

First, add this enormous block to your php_rlyeh.h file:

#define PUSH_PARAM(arg) zend_vm_stack_push(arg TSRMLS_CC)
#define POP_PARAM() (void)zend_vm_stack_pop(TSRMLS_C)
#define PUSH_EO_PARAM()
#define POP_EO_PARAM()

#define CALL_METHOD_BASE(classname, name) zim_##classname##_##name

#define CALL_METHOD_HELPER(classname, name, retval, thisptr, num, param) \
  PUSH_PARAM(param); PUSH_PARAM((void*)num);                            \
  PUSH_EO_PARAM();                                                      \
  CALL_METHOD_BASE(classname, name)(num, retval, NULL, thisptr, 0 TSRMLS_CC); \
  POP_EO_PARAM();                                                       \
  POP_PARAM(); POP_PARAM();

#define CALL_METHOD(classname, name, retval, thisptr)                  \
  CALL_METHOD_BASE(classname, name)(0, retval, NULL, thisptr, 0 TSRMLS_CC);

#define CALL_METHOD1(classname, name, retval, thisptr, param1)         \
  CALL_METHOD_HELPER(classname, name, retval, thisptr, 1, param1);

#define CALL_METHOD2(classname, name, retval, thisptr, param1, param2) \
  PUSH_PARAM(param1);                                                   \
  CALL_METHOD_HELPER(classname, name, retval, thisptr, 2, param2);     \
  POP_PARAM();

#define CALL_METHOD3(classname, name, retval, thisptr, param1, param2, param3) \
  PUSH_PARAM(param1); PUSH_PARAM(param2);                               \
  CALL_METHOD_HELPER(classname, name, retval, thisptr, 3, param3);     \
  POP_PARAM(); POP_PARAM();

These macros let you call PHP functions from C.

Add the following to cultist.c:

#include "php_rlyeh.h"

PHP_METHOD(Cultist, createCultist) {
  object_init_ex(return_value, rlyeh_ce_cultist);
  CALL_METHOD(Cultist, __construct, return_value, return_value);
}


branch: this

We’ve pretty much just been dealing with return_values, but now that we’re working with objects we can also access this. To get this, use the getThis() macro.

For example, suppose we want to set a couple of properties in the constructor:

PHP_METHOD(Cultist, __construct) {
  char *name;
  int name_len;
  // defaults
  long health = 10, sanity = 4;

  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s|ll", &name, &name_len, &health, &sanity) == FAILURE) {
    return;
  }

  zend_update_property_stringl(rlyeh_ce_cultist, getThis(), "name", strlen("name"), name, name_len TSRMLS_CC);
  zend_update_property_long(rlyeh_ce_cultist, getThis(), "health", strlen("health"), health TSRMLS_CC);
  zend_update_property_long(rlyeh_ce_cultist, getThis(), "sanity", strlen("sanity"), sanity TSRMLS_CC);
}

Note the zend_parse_parameters argument: “s|ll”. The pipe character (“|”) means, “every argument after this is optional.” Thus, at least 1 argument is required (in this case, the cultist’s name), but health and sanity are optional.
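As a rough illustration of how the pipe splits a type string, here is a standalone sketch in plain C. The demo_* helpers are hypothetical and only understand one-letter specifiers plus the pipe; the real zend_parse_parameters supports more modifiers than this handles.

```c
#include <assert.h>
#include <stddef.h>

/* Number of specifiers before the pipe: these arguments are mandatory. */
int demo_required_args(const char *spec) {
    int count = 0;
    assert(spec != NULL);
    for (; *spec != '\0' && *spec != '|'; spec++) {
        count++;
    }
    return count;
}

/* Total number of arguments the spec can accept (the pipe is not one). */
int demo_max_args(const char *spec) {
    int count = 0;
    assert(spec != NULL);
    for (; *spec != '\0'; spec++) {
        if (*spec != '|') {
            count++;
        }
    }
    return count;
}
```

For "s|ll" this gives 1 required argument and 3 at most, which matches the constructor: a name is mandatory, health and sanity fall back to their defaults.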

Now, if we create a new cultist, we get something like:

$ php -r 'var_dump(new Cultist("Todd"));'
object(Cultist)#1 (4) {
  ["alive"]=>
  bool(true)
  ["name"]=>
  string(4) "Todd"
  ["health"]=>
  int(10)
  ["sanity"]=>
  int(4)
}

Attaching Structs

As mentioned earlier, you can attach a struct to an object. This lets the object carry around some information that is invisible to PHP, but usable to your extension.

You have to set up the zend_class_entry in a special way when you create it. First, add the struct to your cultist.h file, as well as the extra function declaration we’ll be using:

typedef struct _cult_secrets {
    // required
    zend_object std;

    // actual struct contents
    int end_of_world;
    char *prayer;
} cult_secrets;

zend_object_value create_cult_secrets(zend_class_entry *class_type TSRMLS_DC);
void free_cult_secrets(void *object TSRMLS_DC);

Then, in cultist.c, hook the create function into the class entry and define both functions:

// existing init function
void rlyeh_init_cultist(TSRMLS_D) {
  zend_class_entry ce;

  INIT_CLASS_ENTRY(ce, "Cultist", cultist_methods);
  // new line!
  ce.create_object = create_cult_secrets;
  rlyeh_ce_cultist = zend_register_internal_class(&ce TSRMLS_CC);

  /* fields */
  zend_declare_property_bool(rlyeh_ce_cultist, "alive", strlen("alive"), 1, ZEND_ACC_PUBLIC TSRMLS_CC);
}

zend_object_value create_cult_secrets(zend_class_entry *class_type TSRMLS_DC) {
  zend_object_value retval;
  cult_secrets *intern;
  zval *tmp;

  // allocate the struct we're going to use
  intern = (cult_secrets*)emalloc(sizeof(cult_secrets));
  memset(intern, 0, sizeof(cult_secrets));

  // create a table for class properties
  zend_object_std_init(&intern->std, class_type TSRMLS_CC);
  zend_hash_copy(intern->std.properties,
     &class_type->default_properties,
     (copy_ctor_func_t) zval_add_ref,
     (void *) &tmp,
     sizeof(zval *));

  // create a destructor for this struct
  retval.handle = zend_objects_store_put(intern, (zend_objects_store_dtor_t) zend_objects_destroy_object, free_cult_secrets, NULL TSRMLS_CC);
  retval.handlers = zend_get_std_object_handlers();

  return retval;
}

// this will be called when a Cultist goes out of scope
void free_cult_secrets(void *object TSRMLS_DC) {
  cult_secrets *secrets = (cult_secrets*)object;
  if (secrets->prayer) {
    efree(secrets->prayer);
  }
  efree(secrets);
}
If we want to access this, we can fetch the struct from getThis() with something like:

PHP_METHOD(Cultist, getDoomsday) {
  cult_secrets *secrets;

  secrets = (cult_secrets*)zend_object_store_get_object(getThis() TSRMLS_CC);

  RETURN_LONG(secrets->end_of_world);
}
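If the cast from a generic object to cult_secrets looks like black magic, it may help to see the trick in isolation. Below is a standalone sketch in plain C; demo_object and demo_secrets are hypothetical stand-ins for zend_object and cult_secrets, and nothing here touches the PHP API. It relies on the C rule that a pointer to a struct and a pointer to its first member can be converted into each other, which I assume is why the "required" member has to come first.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for zend_object: the part every object shares. */
typedef struct demo_object {
    int handle;
} demo_object;

/* Stand-in for cult_secrets: shared part first, private fields after. */
typedef struct demo_secrets {
    demo_object std;   /* must be the first member for the cast to work */
    int end_of_world;
} demo_secrets;

/* Allocate the full struct, but hand back only a pointer to its base. */
demo_object *demo_create(int handle, int end_of_world) {
    demo_secrets *intern = calloc(1, sizeof(*intern));
    if (intern == NULL) {
        return NULL;
    }
    intern->std.handle = handle;
    intern->end_of_world = end_of_world;
    return &intern->std;
}

/* Recover the full struct from the base pointer. */
demo_secrets *demo_fetch(demo_object *obj) {
    assert(obj != NULL);
    return (demo_secrets *)obj;
}
```

Code that only knows about demo_object can pass the pointer around freely, while the extension casts it back whenever it needs the hidden fields.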



branch: exceptions

All exceptions must descend from the base PHP Exception class, so this is also an intro to class inheritance.

Aside from extending Exception, custom exceptions are just normal classes. So, to create a new one, open up php_rlyeh.c and add the following:

// include exceptions header
#include <zend_exceptions.h>

zend_class_entry *rlyeh_ce_exception;

void rlyeh_init_exception(TSRMLS_D) {
  zend_class_entry e;

  INIT_CLASS_ENTRY(e, "MadnessException", NULL);
  rlyeh_ce_exception = zend_register_internal_class_ex(&e, (zend_class_entry*)zend_exception_get_default(TSRMLS_C), NULL TSRMLS_CC);
}


Don’t forget to declare rlyeh_init_exception in php_rlyeh.h.

Note that we could add our own methods to MadnessException with the third argument to INIT_CLASS_ENTRY, but we’ll just leave it with the default exception methods it inherits from Exception.

Throwing Exceptions

An exception isn’t much good unless we can throw it. Let’s add a method that can throw it:

zend_function_entry rlyeh_functions[] = {
  PHP_FE(cthulhu, NULL)
  PHP_FE(lookAtMonster, NULL)
  { NULL, NULL, NULL }
};

PHP_FUNCTION(lookAtMonster) {
  zend_throw_exception(rlyeh_ce_exception, "looked at the monster too long", 1000 TSRMLS_CC);
}

The 1000 is the exception code; you can set it to whatever you want (users can access it from the exception with the getCode() method).

Now, if we compile and install, we can run lookAtMonster() and we’ll get:

Fatal error: Uncaught exception 'MadnessException' with message 'looked at the monster too long' in Command line code:1
Stack trace:
#0 Command line code(1): lookAtMonster()
#1 {main}
  thrown in Command line code on line 1

Congratulations, now you’ve stared into the abyss!

This tutorial is an ongoing work. I hope you’ve enjoyed it and please comment below if you think I’ve missed any important topics or anything is unclear.

PHP Extensions Made Eldrich: PHP Variables

This is section 3 of a 4-part introduction to PHP extensions:

  1. Setting Up PHP – compiling PHP for extension development
  2. Hello, world! – your first extension
  3. Working with the API – the PHP C API
  4. Classes – creating PHP objects in C

This section is, unfortunately, longer than all of the other sections combined. The upshot is that this section covers 90% of the functions you’ll use when creating extensions.

Using Variables

In the previous sections, we got PHP set up and created our first extension. In this section, we’ll look at how to use more of the PHP API.

Working with input

branch: zend_parse_parameters

Our existing extension is nice, but it isn’t very interactive. We can modify this function to accept variables as arguments using the zend_parse_parameters function:

PHP_FUNCTION(cthulhu) {
    // boolean type
    zend_bool english = 0;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "b", &english) == FAILURE) {
        return;
    }

    if (english) {
        php_printf("In his house at R'lyeh dead Cthulhu waits dreaming.\n");
    }
    else {
        php_printf("Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn.\n");
    }
}

Try re-compiling the extension and calling cthulhu(true); and cthulhu(false);.

If you try calling cthulhu(); (no arguments), you’ll notice that zend_parse_parameters takes care of warning you about it:

$ php -r 'cthulhu();'

Warning: cthulhu() expects exactly 1 parameter, 0 given in Command line code on line 1

A note on return values

zend_parse_parameters and many other PHP API functions return SUCCESS or FAILURE, which are int values. Irritatingly, SUCCESS is 0 (false in C) and FAILURE is -1 (true in C)! So, you generally can’t say if (some_php_api_func()); you have to say if (some_php_api_func() == SUCCESS).
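Here is a tiny standalone sketch of that trap in plain C. DEMO_SUCCESS and DEMO_FAILURE mirror the 0 and -1 values mentioned above, and demo_api_call is a hypothetical stand-in for any PHP API function that follows this convention.

```c
#include <assert.h>

#define DEMO_SUCCESS 0
#define DEMO_FAILURE (-1)

/* Hypothetical API call: succeeds or fails on demand. */
int demo_api_call(int should_work) {
    return should_work ? DEMO_SUCCESS : DEMO_FAILURE;
}

/* WRONG: treats the return code as a boolean, so the logic is inverted. */
int demo_naive_check(int should_work) {
    if (demo_api_call(should_work)) {
        return 1; /* the author believes this branch means "it worked" */
    }
    return 0;
}

/* RIGHT: compare against the named constant. */
int demo_correct_check(int should_work) {
    return demo_api_call(should_work) == DEMO_SUCCESS;
}
```

The naive version reports success exactly when the call failed, which is why the explicit comparison is worth the extra typing.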

zend_parse_parameters input

The parameters passed to zend_parse_parameters are:

ZEND_NUM_ARGS(): The number of arguments passed in (you can hard-code this, but using ZEND_NUM_ARGS() will automatically grab that info for you).

TSRMLS_CC: You’ll see this magic variable all over the place in PHP extensions. It’s a macro that expands to “, <thread_info>” (or “” if threading is disabled). Note that, because it includes a comma, there’s no comma between ZEND_NUM_ARGS() and TSRMLS_CC. You don’t have to worry about it or do anything with it; just pass it around.

The type string (“b” above): A string describing the arguments you expect. Common values are:

  • “b”: boolean, expects zend_bool.
  • “s”: string, expects char* and int.
  • “l”: long, expects long.
  • “d”: double, expects double.
  • “a”: array, expects zval*.
  • “o”: object, expects zval*.
  • “z”: any type, expects zval*.

Except for “b”, “l”, and “d”, zend_parse_parameters does not create a copy of the parameter; it just returns the address. Thus, you generally shouldn’t free this memory, as the calling function “owns” it.

The options listed above can be combined. For example, suppose we had a function that took a number of times to append a given string to a given array. We’d expect it to look something like:

  int str_len;
  long num;
  char *str;
  zval *arr;

  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "lsa", &num, &str, &str_len, &arr) == FAILURE) {
    return;
  }

  /* function body */

Note that you always pass in the address of the variable, not the variable itself.

The output addresses (&english above): A list of addresses to use to store passed-in values. zend_bool is just a typedefed numeric type to represent booleans.

Note that you must always use “long” for integers (not int). The long type is a different size on 32-bit and 64-bit machines (except on Windows!), so you’ll get weird segfaults if you use another numeric type on certain platforms.
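The comma-carrying macro trick behind TSRMLS_CC is easier to believe once you see it in miniature. This is a standalone sketch in plain C, not PHP's actual definitions: DEMO_DC and DEMO_CC are hypothetical names that play the roles of TSRMLS_DC (in declarations) and TSRMLS_CC (in calls), and demo_ctx stands in for the thread info. It only shows the threaded variant; in a non-threaded build both macros would expand to nothing.

```c
#include <assert.h>

/* Stand-in for the per-thread context that gets passed everywhere. */
typedef struct demo_ctx {
    int calls;
} demo_ctx;

/* Both macros start with a comma, so call sites must NOT add their own. */
#define DEMO_DC , demo_ctx *ctx   /* used in declarations: ", type name" */
#define DEMO_CC , ctx             /* used in calls: ", name" */

int demo_add(int a, int b DEMO_DC) {
    ctx->calls++;
    return a + b;
}

/* Just passes the context along without looking at it. */
int demo_add_twice(int a, int b DEMO_DC) {
    return demo_add(a, b DEMO_CC) + demo_add(a, b DEMO_CC);
}
```

As long as a local variable named ctx is in scope, every call compiles without the caller ever spelling out the extra argument.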


branch: types

You may have noticed above that arrays, objects, and any type are all returned as zvals by zend_parse_parameters. This is because every PHP variable is, under the covers, a C struct called a zval. For example, if you say $x = "foo"; $y = 123; $z = array();, then $x, $y, and $z are all zvals.

If you want to be able to communicate information from C to PHP, it’s important to understand how to work with zvals. A zval is defined as:

struct _zval_struct {
    zvalue_value value;
    zend_uint refcount__gc;
    zend_uchar type;
    zend_uchar is_ref__gc;
};

The main components of this struct are:

value: The actual value of the variable. This is defined as a union of:

typedef union _zvalue_value {
    long lval;
    double dval;
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;
    zend_object_value obj;
} zvalue_value;

These fields correspond to the following types:

  • lval: Longs and booleans
  • dval: Doubles
  • str: Strings
  • ht: Arrays and associative arrays
  • obj: Objects

refcount__gc: A reference count for garbage collection. When you (or PHP) ask for a zval to be destroyed, the zval destructor decrements the refcount. If the refcount is still greater than 0, the destructor just returns. Once the refcount reaches 0, the zval is actually freed.

type: The type of this zval. This tells PHP which union element to look for and what to do when it finds it. There are human-readable macros for each type:

#define IS_NULL           0
#define IS_LONG           1
#define IS_DOUBLE         2
#define IS_BOOL           3
#define IS_ARRAY          4
#define IS_OBJECT         5
#define IS_STRING         6
#define IS_RESOURCE       7
#define IS_CONSTANT       8
#define IS_CONSTANT_ARRAY 9

This tutorial will only cover working with types 0-6. I’ve found that, with object-oriented PHP, the last three are less useful.

Resources are a way of binding C structs to PHP variables (e.g., passing a database connection around with the defunct mysql extension), but objects provide a nicer way of doing struct attachment. The Developer Zone tutorial goes into quite a lot of detail on resources, if you’re interested.

The field type tells PHP which union element to look at for the zval’s value. For example, if zval_p->type == IS_STRING, the value for the zval should be in the zval_p->value.str field.

This field also determines how PHP interprets the value field. For example, lval does double duty for longs and booleans. So, if you have lval set to 1 and zval_p->type==IS_LONG, it will be displayed as 1. If you have zval_p->type==IS_BOOL, it will be displayed as true.
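To see the "double duty" idea without any PHP machinery, here is a standalone sketch of a tagged union in plain C. Everything named demo_* or DEMO_* is hypothetical and much simpler than the real zval; the output strings only imitate the look of var_dump.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum demo_type { DEMO_IS_LONG, DEMO_IS_BOOL, DEMO_IS_DOUBLE };

/* A miniature "zval": one union for the payload, one tag saying how to read it. */
typedef struct demo_val {
    union {
        long lval;    /* used by both DEMO_IS_LONG and DEMO_IS_BOOL */
        double dval;
    } value;
    enum demo_type type;
} demo_val;

/* Render v into buf according to its type tag; returns buf. */
const char *demo_describe(const demo_val *v, char *buf, size_t len) {
    assert(v != NULL && buf != NULL && len > 0);
    switch (v->type) {
    case DEMO_IS_LONG:
        snprintf(buf, len, "int(%ld)", v->value.lval);
        break;
    case DEMO_IS_BOOL:
        snprintf(buf, len, "bool(%s)", v->value.lval ? "true" : "false");
        break;
    case DEMO_IS_DOUBLE:
        snprintf(buf, len, "float(%g)", v->value.dval);
        break;
    default:
        buf[0] = '\0';
        break;
    }
    return buf;
}

/* Convenience check: does v render as the expected string? */
int demo_describes_as(const demo_val *v, const char *expected) {
    char buf[64];
    return strcmp(demo_describe(v, buf, sizeof(buf)), expected) == 0;
}
```

The same lval of 1 reads as int(1) under one tag and bool(true) under the other; only the tag changes, never the stored bits.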

// add function entries for each function you define
zend_function_entry rlyeh_functions[] = {
  PHP_FE(cthulhu, NULL)
  PHP_FE(makeBool, NULL)
  PHP_FE(makeLong, NULL)
  { NULL, NULL, NULL }
};

PHP_FUNCTION(makeBool) {
    Z_TYPE_P(return_value) = IS_BOOL;
    Z_LVAL_P(return_value) = 1;
}

PHP_FUNCTION(makeLong) {
    Z_TYPE_P(return_value) = IS_LONG;
    Z_LVAL_P(return_value) = 1;
}

// don't forget to add declarations for these functions to your header file, too!

return_value is passed into PHP_FUNCTIONs and holds the value that is returned (it defaults to null).

If you compile this new code and run:

var_dump(makeBool());
var_dump(makeLong());

You’ll see something like:

bool(true)
int(1)

is_ref__gc: Whether this is a PHP reference.

Accessing the Contents of a Zval

The internals of a zval are subject to change, so you should always use PHP’s zval macros instead of touching the gooey innards (e.g., don’t actually set a value by putting zval_p->value.lval in your code).

As shown in the example above, you can safely manipulate said innards through these macros:

zval *zval_p;

long l        = Z_LVAL_P(zval_p);
zend_bool b   = Z_BVAL_P(zval_p);
double d      = Z_DVAL_P(zval_p);
char *str     = Z_STRVAL_P(zval_p);
int str_len   = Z_STRLEN_P(zval_p);
HashTable *ht = Z_ARRVAL_P(zval_p);

// objects are a bit complicated... suffice to know that these exist:
Z_OBJ_HANDLER_P(zval_p, h);

As you can see, you can extract each part of a zval’s value using a macro. The ones listed above work on zval pointers (zval*s). If you are using zval or zval**, there are analogous helpers with one fewer or one more P, respectively. For example:

long get_long(zval z) {
  return Z_LVAL(z);
}

// or 

long get_long(zval **zval_pp) {
  return Z_LVAL_PP(zval_pp);
}
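The P and PP variants are just the base macro with the right number of dereferences added. Here is a standalone sketch in plain C of how such a family can be layered; the DEMO_LVAL* macros and demo_val struct are hypothetical and only mimic the naming pattern, not PHP's actual header contents.

```c
#include <assert.h>

typedef struct demo_val {
    long lval;
} demo_val;

/* Base macro works on a struct; each extra P adds one dereference. */
#define DEMO_LVAL(v)       ((v).lval)
#define DEMO_LVAL_P(v_p)   DEMO_LVAL(*(v_p))
#define DEMO_LVAL_PP(v_pp) DEMO_LVAL(**(v_pp))

long demo_sum_all_three(void) {
    demo_val v = { 5 };
    demo_val *v_p = &v;
    demo_val **v_pp = &v_p;

    /* The macros expand to lvalues, so they work for writing as well. */
    DEMO_LVAL_P(v_p) = 7;

    /* All three now read the same field of the same struct. */
    return DEMO_LVAL(v) + DEMO_LVAL_P(v_p) + DEMO_LVAL_PP(v_pp);
}
```

Since all three macros end up at the same field, the write through the pointer is visible through the other two, and the sum is 7 + 7 + 7.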

Creating Zvals

Before using a zval, you must make sure that its refcount, type, and value are set correctly. For scalar types, you can set value and type using a single macro: ZVAL_type.

ZVAL_BOOL(zval_p, 0);
ZVAL_LONG(zval_p, 123);
ZVAL_DOUBLE(zval_p, 12.3);

For strings, it is a little trickier because you have to allocate space for the string or let PHP know that you’ve already allocated space for it.

Thus, ZVAL_STRING takes an argument that tells PHP whether or not to make a copy of the string for the zval. Basically, this should be 0 if you’ve already created a special instance of the string for this zval and 1 if you haven’t.

// "bar" is a string literal (static storage) and zval_p is on the heap,
// so we want to make a copy of "bar" on the heap
ZVAL_STRING(zval_p, "bar", 1);
// this means "copy" ------^

// copy "bar" to the heap
char *str = estrdup("bar");
ZVAL_STRING(zval_p, str, 0);
// "don't copy" ---------^
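The copy flag is really a question of ownership: who frees the bytes later? Here is a standalone sketch of the same idea in plain C, using malloc and free instead of PHP's allocator. demo_str, demo_set_string, and demo_free_string are hypothetical names and only model the contract described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct demo_str {
    char *val;
    size_t len;
} demo_str;

/*
 * duplicate != 0: the box gets its own heap copy; the caller keeps s.
 * duplicate == 0: the box takes over s, which must already be heap memory.
 * Returns 0 on success, -1 if the copy could not be allocated.
 */
int demo_set_string(demo_str *box, char *s, int duplicate) {
    size_t len;

    assert(box != NULL && s != NULL);
    len = strlen(s);
    if (duplicate) {
        char *copy = malloc(len + 1);
        if (copy == NULL) {
            return -1;
        }
        memcpy(copy, s, len + 1);
        s = copy;
    }
    box->val = s;
    box->len = len;
    return 0;
}

/* The box always frees what it holds, which is why ownership matters. */
void demo_free_string(demo_str *box) {
    free(box->val);
    box->val = NULL;
    box->len = 0;
}
```

Passing 0 for a string that was not heap-allocated would make demo_free_string free memory it never owned, which is the same mistake the flag on ZVAL_STRING is there to prevent.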

Which brings us to the next section, memory management.

Memory Management

branch: mm

PHP uses its own memory pool and allocation/deallocation functions, which you should generally use instead of malloc, free, and friends.

PHP has similar functions to the standard C library, only everything is prefixed with an “e”:

void* emalloc(size_t size);
void* ecalloc(size_t nmemb, size_t size);
void* erealloc(void* ptr, size_t size);

void efree(void* ptr);

char* estrdup(const char* str);
char* estrndup(const char* str, uint len);

If you are used to C programming where you check if memory was successfully allocated (x = malloc(sizeof(x)); if (!x) return 0;), know that this is not strictly necessary in PHP. PHP’s memory management functions will exit PHP if you run out of memory, so if emalloc returns, it returned some memory.
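The "never returns NULL" contract is a common C idiom, often called xmalloc. Below is a standalone sketch in plain C that models it. demo_xmalloc is a hypothetical name, and PHP's real emalloc bails out through its own error handling rather than calling exit like this.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Allocate size bytes or terminate the process.
 * Callers never need to check the result for NULL.
 */
void *demo_xmalloc(size_t size) {
    /* malloc(0) may legally return NULL, so always ask for at least 1 byte. */
    void *p = malloc(size ? size : 1);
    if (p == NULL) {
        fprintf(stderr, "out of memory (%lu bytes requested)\n", (unsigned long)size);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

The trade-off is that the caller cannot recover from an allocation failure, which is usually fine for request-scoped work where there is nothing sensible to do anyway.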

Remember how you compiled PHP with --enable-debug at the beginning? Well, here’s the payoff: it will let you know about any memory leaks it detects. For example, try adding a function to your extension:

// add to function_entry table and header file, too

Now, if you recompile your extension and run leak(), you’ll see:

[Wed Aug 10 16:34:42 2011]  Script:  '-'
/Users/k/php-5.3.6/Zend/zend_builtin_functions.c(1360) :  Freeing 0x100AA75E0 (3 bytes), script=-
=== Total 1 memory leaks detected ===

This can make tracking down memory leaks much easier. (Getting friendly with valgrind is a good idea, too.)

Creating and Destroying Zvals

Zvals can be created using emalloc, but I’d recommend generally using a different macro: MAKE_STD_ZVAL. This macro not only allocates a zval, but it also sets the refcount and isref fields, so you don’t have to worry about setting those yourself.

zval *zval_p;

MAKE_STD_ZVAL(zval_p);

If you need to destroy a zval, use zval_ptr_dtor, which takes a zval** (not a zval*).

zval *zval_p;

MAKE_STD_ZVAL(zval_p);
zval_ptr_dtor(&zval_p);
// back to square one

zval_ptr_dtor decrements the refcount by 1. If the refcount is still greater than 0, then zval_ptr_dtor will just return. If this makes the refcount 0, it also destroys the current zval. If this zval is a string, array, or object, PHP will take care of freeing the associated memory. Thus, you should generally not call free on a zval (as this will cause leaks: orphaned strings or objects with no zval pointing to them).

Also, you should always make sure that you have set the zval to the correct type before calling zval_ptr_dtor: if you call it on garbage, it can segfault if it tries to free, say, a string that was actually an invalid pointer.
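The decrement-then-maybe-free behavior can be modeled in a few lines. This is a standalone sketch in plain C with hypothetical demo_* names; it uses malloc-family calls and a simple live counter rather than PHP's allocator, and the real zval_ptr_dtor does more work (it also frees strings, arrays, and objects the zval owns).

```c
#include <assert.h>
#include <stdlib.h>

typedef struct demo_val {
    long lval;
    unsigned int refcount;
} demo_val;

/* Number of demo_vals currently allocated, so the sketch is observable. */
static int demo_live = 0;

/* Like MAKE_STD_ZVAL in spirit: allocate and start with one reference. */
demo_val *demo_make(long n) {
    demo_val *v = calloc(1, sizeof(*v));
    if (v == NULL) {
        return NULL;
    }
    v->lval = n;
    v->refcount = 1;
    demo_live++;
    return v;
}

void demo_addref(demo_val *v) {
    assert(v != NULL);
    v->refcount++;
}

/* Takes a pointer to a pointer, as zval_ptr_dtor does. */
void demo_ptr_dtor(demo_val **v_pp) {
    demo_val *v;

    assert(v_pp != NULL && *v_pp != NULL);
    v = *v_pp;
    assert(v->refcount > 0);
    if (--v->refcount == 0) {
        free(v);
        demo_live--;
    }
}

int demo_live_count(void) {
    return demo_live;
}
```

With two references, the first destructor call only drops the count to 1; the memory goes away on the second call.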

The Persistence of Memory

branch: persistence

Theoretically, all memory allocated with emalloc is freed after each request (I say theoretically because in my experience, it’s not so much freed as leaked). If you want something to hang around for longer than a single request, you’ll need to use persistent memory. Persistent memory hangs around for longer than one request (generally), up to the lifetime of the PHP process.

To allocate persistent memory, use “pe”-prefixed memory allocation functions, instead of “e”-prefixed.

void* pemalloc(size_t size, int persistent);
void* pecalloc(size_t nmemb, size_t size, int persistent);
void* perealloc(void* ptr, size_t size, int persistent);

void pefree(void* ptr, int persistent);

char* pestrdup(const char* str, int persistent);
char* pestrndup(const char* str, uint len, int persistent);

The “persistent” option lets you choose whether you want to allocate persistent memory (1) or transitory memory (0, normal “e”-allocation behavior).
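One way to picture the flag is as a switch between two pools behind a single entry point. The standalone sketch below models that in plain C with hypothetical demo_* names. Both "pools" are just malloc plus a counter here, which is an assumption made to keep the example runnable; the point is only that the flag given to the free call has to match the one given at allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Live allocation counts for the two pretend pools. */
static int demo_request_allocs = 0;
static int demo_persistent_allocs = 0;

/* One entry point; the flag picks which pool the memory is charged to. */
void *demo_pemalloc(size_t size, int persistent) {
    void *p = malloc(size ? size : 1);
    if (p == NULL) {
        abort(); /* mirror the "never returns NULL" contract */
    }
    if (persistent) {
        demo_persistent_allocs++;
    } else {
        demo_request_allocs++;
    }
    return p;
}

/* The flag must match the one used when the memory was allocated. */
void demo_pefree(void *p, int persistent) {
    if (p == NULL) {
        return;
    }
    if (persistent) {
        demo_persistent_allocs--;
    } else {
        demo_request_allocs--;
    }
    free(p);
}

int demo_persistent_live(void) { return demo_persistent_allocs; }
int demo_request_live(void) { return demo_request_allocs; }
```

If you free with the wrong flag in this model, the counters drift apart; with a real allocator the consequences of mixing pools are considerably less polite.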

Search and Destroy: Finding and Cleaning Up Persistent Memory

Suppose your extension allocates a persistent struct in one HTTP request. How do you find it during the next HTTP request?

There are three steps:

  1. You have to create a type for this memory.
  2. You have to link this type to a destructor, so that PHP knows how to clean up the memory.
  3. You have to insert your allocated memory into PHP’s persistent memory hash.

Persistent Gods

To try out persistent memory, we want a struct that should persist for multiple requests. Great Old Ones are pretty darn persistent, so we’ll create an old_one struct in php_rlyeh.h:

typedef struct _old_one {
    char *name;
    int worshippers;
} old_one;

Now we need to create a type for it. Near the beginning of php_rlyeh.c, add an int, named anything you want. This integer will hold the numeric type for Great Old Ones.

// traditionally these start with "le_", which stands 
// for "list entry"
int le_old_one;

Now we need to link the le_old_one type up to a destructor. We’ll do this when our module is first loaded, in the magical PHP_MINIT_FUNCTION(rlyeh) function:

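// add MINIT to the module description:
zend_module_entry rlyeh_module_entry = {
  // ... header, name, and functions unchanged ...
  PHP_MINIT(rlyeh), // MINIT (was NULL)
  // ... remaining entries unchanged ...
};

// add this to php_rlyeh.c
PHP_MINIT_FUNCTION(rlyeh) {
    le_old_one = zend_register_list_destructors_ex(NULL, rlyeh_old_one_pefree, "Great Old One", module_number);
    return SUCCESS;
}

Also, add a line to php_rlyeh.h:

PHP_MINIT_FUNCTION(rlyeh);
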
zend_register_list_destructors_ex says, “make a new type for le_old_one. If you have to automatically free something of this type, call rlyeh_old_one_pefree on it.”

Persistent destructors always take a zend_rsrc_list_entry: this is the container PHP uses to hold list entries (which is how we’re storing persistent memory). So, our destructor would look like:

void rlyeh_old_one_pefree(zend_rsrc_list_entry *rsrc TSRMLS_DC) {
    old_one *god = rsrc->ptr;

    // free the char* field, if set
    if (god->name) {
        pefree(god->name, 1);
    }

    pefree(god, 1);
}

Now we are ready to create some Great Old Ones!

Let’s make a new function: getYig(). If there’s already been an old_one created, it’ll return information about it, otherwise it’ll create a new one.

PHP_FUNCTION(getYig) {
    zend_rsrc_list_entry *le;
    char *key = "yig";

    if (zend_hash_find(&EG(persistent_list), key, strlen(key)+1, (void**)&le) == FAILURE) {
        // need to create a new god
        zend_rsrc_list_entry nle;
        old_one *yig;

        yig = (old_one*)pemalloc(sizeof(old_one), 1);
        yig->name = pestrdup("Yig", 1);
        yig->worshippers = 4;

        php_printf("creating a new god\n");

        nle.ptr = yig;
        nle.type = le_old_one;
        nle.refcount = 1;

        zend_hash_update(&EG(persistent_list), key, strlen(key)+1, (void*)&nle, sizeof(zend_rsrc_list_entry), NULL);
    }
    else {
        old_one *god = le->ptr;

        php_printf("fetched %s: %d worshippers\n", god->name, god->worshippers);
    }
}

Note that zend_hash_update and zend_hash_find take the key length + 1. The PHP API is a bit inconsistent about this: the best way to figure out if a function takes length or length+1 is to look at the source or find an example of it being used in another extension.
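The "+ 1" is the terminating NUL byte. If you ever lose track of which length is which, this standalone snippet in plain C (hypothetical demo_* helpers, no PHP involved) spells out the difference.

```c
#include <assert.h>
#include <string.h>

/* Length as strlen reports it: the characters only, no terminator. */
size_t demo_key_len(const char *key) {
    assert(key != NULL);
    return strlen(key);
}

/* Length including the trailing '\0', as the hash calls above are given. */
size_t demo_key_len_with_nul(const char *key) {
    assert(key != NULL);
    return strlen(key) + 1;
}
```

For a string literal, sizeof("yig") gives the same value as demo_key_len_with_nul("yig"), but that shortcut only works on literals and arrays, not on a char* like key above.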

If you have a web server set up, add your extension to the php.ini it’s using (warning: this is probably a different php.ini than the command-line client uses). Restart it and load a page that calls getYig() a couple of times. The first time you’ll see “creating”, the next times you’ll see “fetched…”.

In the code above, you may notice that we use hash functions (zend_hash_find and zend_hash_update) to manipulate the EG(persistent_list). EG(persistent_list) is actually a HashTable that you can use to store persistent memory. However, the name reveals something that I find interesting about PHP internals: all HashTables (associative arrays) are lists, too (they keep the elements in order and you can access elements by index or key).

And speaking of hashes and lists…


Creating Arrays

branch: array

To create an array or associative array, use array_init().

You can insert new elements to an associative array with one of these functions:

add_assoc_long(zval *zval_p, char *key, long n)
add_assoc_null(zval *zval_p, char *key)
add_assoc_bool(zval *zval_p, char *key, zend_bool b) 
add_assoc_double(zval *zval_p, char *key, double d) 
add_assoc_string(zval *zval_p, char *key, char *str, int duplicate) 
add_assoc_stringl(zval *zval_p, char *key, char *str, int length, int duplicate)
add_assoc_zval(zval *zval_p, char *key, zval *value) 

You can also “push” new elements to the array with related functions:

add_next_index_long(zval *zval_p, long n);
add_next_index_null(zval *zval_p);
add_next_index_bool(zval *zval_p, int b);
add_next_index_resource(zval *zval_p, int r);
add_next_index_double(zval *zval_p, double d);
add_next_index_string(zval *zval_p, const char *str, int duplicate);
add_next_index_stringl(zval *zval_p, const char *str, uint length, int duplicate);
add_next_index_zval(zval *zval_p, zval *value);

Let’s use this to fill in the function we started in the zend_parse_parameters section:

  int str_len, i;
  long num;
  char *str;
  zval *arr;

  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "lsa", &num, &str, &str_len, &arr) == FAILURE) {
    return;
  }

  // sanity check
  if (num < 0 || num > 100) {
    return;
  }

  for (i=0; i<num; i++) {
    add_next_index_stringl(arr, str, str_len, 1);
  }

If we run this function, we can see that it appends strings to the array correctly.

This should output:


Accessing Array Elements

To find an element in an associative array, use one of the zend_hash functions.

PHP_FUNCTION(findMonster) {
  int monster_len;
  char *monster;
  zval *list, **desc;

  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "sa", &monster, &monster_len, &list) == FAILURE) {
    return;
  }

  if (zend_hash_find(Z_ARRVAL_P(list), monster, monster_len+1, (void**)&desc) == FAILURE) {
    RETURN_NULL();
  }

  RETURN_STRING(Z_STRVAL_PP(desc), 1);
}

Note that the fourth argument is the pointer to a pointer to a pointer. I was skeptical about that for a while, but there it is.

Also, we are using a couple of new macros for setting the return value. These RETURN_type macros just set the return_value we were manipulating directly earlier.

Now, if we run something like:

 "The Toad God",
           "Yig" => "Father of Serpents",
           "Ythogtha" => "The Thing in the Pit");

var_dump(findMonster("Yig", $a));


We’ll get “Father of Serpents”.

There are also a couple other hash functions you’ll probably find useful for your code:

int zend_hash_find(const HashTable *ht, const char *arKey, uint nKeyLength, void **pData);
int zend_hash_add(HashTable *ht, const char *arKey, uint nKeyLength, void *pData, uint nDataSize, void **pDest);
int zend_hash_update(HashTable *ht, const char *arKey, uint nKeyLength, void *pData, uint nDataSize, void **pDest);
int zend_hash_num_elements(const HashTable *ht);
int zend_hash_exists(const HashTable *ht, const char *arKey, uint nKeyLength);

Note that these functions do not add references to array elements. Thus, you should not, for example, do zend_hash_find and then call zval_ptr_dtor on the element found or your array will be in a weird half-freed state and PHP will try to double-free the element when the array is properly destroyed. Therefore, if you want to use an array element outside of the context of the array, you should add a reference to it, first. (We avoid that in the situation above by returning duplicates of the element’s string value.)

Iterating Through Arrays

You can iterate through an array, element by element, but it ain’t pretty. Here’s the standard for-loop you need:

HashTable *hindex = Z_ARRVAL_P(zval_p);
HashPosition pointer;
zval **data;

for(zend_hash_internal_pointer_reset_ex(hindex, &pointer);
    zend_hash_get_current_data_ex(hindex, (void**)&data, &pointer) == SUCCESS;
    zend_hash_move_forward_ex(hindex, &pointer)) {

  char *key;
  uint key_len, key_type;
  ulong index;

  key_type = zend_hash_get_current_key_ex(hindex, &key, &key_len, &index, 0, &pointer);

  switch (key_type) {
  case HASH_KEY_IS_STRING:
    // associative array keys
    php_printf("key: %s\n", key);
    break;
  case HASH_KEY_IS_LONG:
    // numeric indexes
    php_printf("index: %lu\n", index);
    break;
  default:
    break;
  }
}

Now let’s never speak of it again.

Instead, let’s move on to objects!

PHP Extensions Made Eldrich: Hello, World!

This is part 2 of a 4-part tutorial on writing PHP extensions:

  1. Setting Up PHP – compiling PHP for extension development
  2. Hello, world! – your first extension
  3. Working with the API – the PHP C API
  4. Classes – creating PHP objects in C

First we need to think of a name for our extension. I’ve been reading some H.P. Lovecraft, so let’s call it “rlyeh”.

For our first extension, we’ll create a new function, cthulhu(). When we call cthulhu() (tee hee), PHP will print “In his house at R’lyeh dead Cthulhu waits dreaming.”

Cheat Sheet

If you don’t want to copy/paste all of the code, you can clone the Github repo for this tutorial and check out sections as you go.

$ git clone git://github.com/kchodorow/rlyeh.git

This part of the tutorial (Hello, world!) is the master branch. Starting in part 3, each “unit” has a branch: <branchname> at the beginning of the section. You can check out this branch if you want to see the code example in context.

For example, if you see branch: oop, you’d do:

$ git checkout -b oop origin/oop

Then you can compare what you’re doing to the “ideal” example code.

Setting Up

Create a directory for your PHP extension, named “rlyeh”. This is where all of the source code for your extension will live.

$ mkdir rlyeh
$ cd rlyeh

A PHP extension consists of at least three files:

  1. “config.m4”, which contains compilation instructions for PHP
  2. “php_extname.c”: source code
  3. “php_extname.h”: a header file

Creating a config.m4 file is wholly lacking in interest, so just cut/paste the one below.

dnl lines starting with "dnl" are comments

PHP_ARG_ENABLE(rlyeh, whether to enable Rlyeh extension, [  --enable-rlyeh   Enable Rlyeh extension])

if test "$PHP_RLYEH" != "no"; then

  dnl this defines the extension
  PHP_NEW_EXTENSION(rlyeh, php_rlyeh.c, $ext_shared)

  dnl this is boilerplate to make the extension work on OS X
  case $build_os in
  darwin1*.*.*)
    AC_MSG_CHECKING([whether to compile for recent osx architectures])
    CFLAGS="$CFLAGS -arch i386 -arch x86_64 -mmacosx-version-min=10.5"
    AC_MSG_RESULT([yes])
    ;;
  darwin*)
    AC_MSG_CHECKING([whether to compile for every osx architecture ever])
    CFLAGS="$CFLAGS -arch i386 -arch x86_64 -arch ppc -arch ppc64"
    AC_MSG_RESULT([yes])
    ;;
  esac
fi


If you want to call your extension something else, global replace “rlyeh” with your extension’s name.

Now for the actual extension: create a file called php_rlyeh.c with the following content:

// include PHP API
#include <php.h>

// header file we'll create below
#include "php_rlyeh.h"

// define the function(s) we want to add
zend_function_entry rlyeh_functions[] = {
  PHP_FE(cthulhu, NULL)
  { NULL, NULL, NULL }
};

// "rlyeh_functions" refers to the struct defined above
// we'll be filling in more of this later: you can use this to specify
// globals, php.ini info, startup and teardown functions, etc.
zend_module_entry rlyeh_module_entry = {
  STANDARD_MODULE_HEADER,
  PHP_RLYEH_EXTNAME,
  rlyeh_functions,
  NULL, // MINIT
  NULL, // MSHUTDOWN
  NULL, // RINIT
  NULL, // RSHUTDOWN
  NULL, // MINFO
  PHP_RLYEH_VERSION,
  STANDARD_MODULE_PROPERTIES
};

// install module
ZEND_GET_MODULE(rlyeh)

// actual non-template code!
PHP_FUNCTION(cthulhu) {
    // php_printf is PHP's version of printf, it's essentially "echo" from C
    php_printf("In his house at R'lyeh dead Cthulhu waits dreaming.\n");
}

That’s a whole lotta template, but it’ll make more sense as you go along.

Learning PHP extension programming is sort of like learning Java as your first programming language: “type ‘public static void main’.” “Why? What does that even mean?” “It doesn’t matter, you’ll learn about it later.”

You also have to make a header file, to declare the cthulhu function as well as the two extension info macros used in php_rlyeh.c (PHP_RLYEH_EXTNAME and PHP_RLYEH_VERSION).

Create a new file, php_rlyeh.h, and add a couple of lines:

#define PHP_RLYEH_EXTNAME "rlyeh"
#define PHP_RLYEH_VERSION "0.01"

PHP_FUNCTION(cthulhu);


You can change the version whenever you do a new release. It can be any string. It’s displayed when you do:

$ php --ri rlyeh

(once the extension is installed).

Speaking of, now all that’s left is to compile and install. Make sure that your custom-compiled-PHP is first in your PATH. If it isn’t, put it there before doing the rest of the install.

$ echo $PATH
$ phpize
Configuring for:
PHP Api Version:         20090626
Zend Module Api No:      20090626
Zend Extension Api No:   220090626
$ ./configure
# lots of checks...
$ make
# compile...

Build complete.
Don't forget to run 'make test'.

$ make install
Installing shared extensions:     $PHPDIR/install-debug-zts/lib/php/extensions/debug-zts-20090626/

Now, add your extension to your php.ini file. PHP is probably expecting a php.ini file in the lib/ subdirectory of your install directory ($PHPDIR/install-debug-zts/lib/php.ini). It probably doesn’t exist yet, so create a new php.ini file with one line:

extension=rlyeh.so

Now you should be able to use your function from PHP without importing, loading, or requiring anything. Do:

$ php -r 'cthulhu();'
In his house at R'lyeh dead Cthulhu waits dreaming.

Your first PHP extension is working!

Next up: a deep dive into the PHP API.

PHP Extensions Made Eldritch: Installing PHP

A PHP extension allows you to connect almost any C/C++ code you want to PHP. This is a 4-part tutorial on how to write an extension:

  1. Setting Up PHP – compiling PHP for extension development
  2. Hello, world! – your first extension
  3. Working with the API – the PHP C API
  4. Classes – creating PHP objects in C

Almost all of the code examples in this tutorial are available on Github.

Zend Developer Zone has an excellent tutorial on writing PHP extensions. I wrote this tutorial because the DevZone article is getting a little old: it doesn’t cover objects, methods, or exceptions and it uses Zend API artifacts dating back to PHP 3. However, it is still an excellent tutorial and I highly recommend reading through it if you’re interested in writing PHP extensions.

Setting Up PHP

Before you start developing an extension, you should compile PHP from source (it’ll make debugging easier later on). If you hate future-you, though, you can just do which phpize and if it returns something, you can continue to the next section.

Compiling PHP yourself isn’t too scary (unless you’re on Windows, in which case welcome to Hell). First, download the source for the version you want to develop with. The current stable release is a good choice.

Unpack the tarball and change to the PHP source directory:

$ tar jxvf php-5.3.6.tar.bz2
$ cd php-5.3.6/
$ PHPDIR=`pwd` # setting this up so I can refer to $PHPDIR later

Note: this tutorial assumes that you’re using version 5.3.*. The API changes every version, so if you’re not using 5.3, this tutorial is going to be very frustrating.

To install PHP, run:

$ mkdir install-debug-zts # install dir
$ ./configure --enable-debug --enable-maintainer-zts --prefix=$PHPDIR/install-debug-zts
$ make install

I recommend using a custom install prefix ($PHPDIR/install-debug-zts in the example above), to keep it separate from any PHP you might have installed previously.

If you install multiple versions of PHP to the default location (/usr/local), it’ll get really annoying really fast: you always have to re-install if you want to try a different build, package managers often install PHP in /usr (so you may have two PHP installs floating around), and the PHP install is oddly coy about overwriting existing files: sometimes it decides to just leave the old versions there if a file already exists.

Thus, it pays to keep things organized in custom installation folders.

There are a couple of configuration options that you should enable, too, for extension development: --enable-debug (debugging info) and --enable-maintainer-zts (thread stuff and memory tracking).

Once make install is done, you’ve got PHP installed! Add $PHPDIR/install-debug-zts/bin to your path with:

$ # this will only add it to the path for this shell
$ PATH=$PHPDIR/install-debug-zts/bin:$PATH

Now you’re ready to make an extension.

Next up: writing your first extension.


Since MongoDB was first created, the Mongo shell prompt has just been:

>

A couple of months ago, my prompt suddenly changed to:

myReplSetName:SECONDARY>

It’s nice to have more information in the prompt, but 1) I don’t care about the replica set name and 2) a programmer’s prompt is very personal. Having it change out from under you is like coming home and finding that someone replaced all of your underwear. It’s just disconcerting.

Anyway, I recently got an intern (well, I’m mentoring him, it’s not like I bought him), Matt Dannenberg, who’s interested in working on shell stuff. He committed some code last week that lets you customize the shell’s prompt (it will be in 1.9.1+).

Basically, you define prompt (a string, or a function that returns one), and it’ll be evaluated every time the prompt is displayed. Immediately, I did:

myReplSetName:SECONDARY> prompt = "> "
> // ah, bliss
> // some sysadmins think > is a weird prompt, as it's also 
> // used for redirections, so they might prefer $
> prompt = "$ "
$ // there we go

Okay, that’s much better. But there is some information I’d like to add to my prompt.

I often forget which database db is referring to, and then I have to type db to check (which is especially annoying when I’m in the middle of a multi-line script). So, let’s just add the current database name to the prompt.

> prompt = function() { return db+"> "; }
test> use foo
foo> use bar

The prompt no longer shows whether I’m connected to a PRIMARY or a SECONDARY (or something else), which is useful information to have. I hate that long string, though, so let’s neaten it up. I want it to be:

foo> I’m connected to the primary
(foo)> I’m connected to a secondary
STATE:foo> I’m connected to a server with state STATE.

This might look something like:

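> states = ["STARTUP", "PRIMARY", "SECONDARY", "RECOVERING", "FATAL",
... "STARTUP2", "UNKNOWN", "ARBITER", "DOWN", "ROLLBACK"]
> prompt = function() { 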
... result = db.isMaster();
... if (result.ismaster) {
...     return db+"> "; 
... }
... else if (result.secondary) {
...    return "("+db+")> ";
... }
... result = db.adminCommand({replSetGetStatus : 1})
... return states[result.myState]+":"+db+"> ";
... }

Also, the default prompt displays if you’re connected to a mongos (with mongos>), which is good for keeping track when you’re running a cluster:

> prompt = function() {
... result = db.adminCommand({isdbgrid : 1});
... if (result.ok == 1) {
...     return "mongos> ";
... }
... return "> ";
... }

Another nice thing would be to have the time each time it displays the prompt: then you can kick off a long-running job, go to lunch, and know what time it finished when you get back.

> prompt = function() { 
... var now = new Date(); 
... return now.getHours()+":"+now.getMinutes()+":"+now.getSeconds()+"> ";
... }
10:30:45> db.foo.count()
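
One wrinkle with the version above: getMinutes() and getSeconds() return plain numbers, so five past ten shows up as 10:5:3 rather than 10:05:03. Here is a minimal sketch of a zero-padded variant. The helper names pad2 and timePrompt are my own, not anything built into the shell.

```javascript
// Zero-pad a number to two digits, e.g. 5 -> "05".
// (pad2 is a hypothetical helper name, not a shell built-in.)
function pad2(n) {
    return (n < 10 ? "0" : "") + n;
}

// Build a "HH:MM:SS> " prompt string from a Date object.
// (timePrompt is likewise a made-up name for this sketch.)
function timePrompt(now) {
    return pad2(now.getHours()) + ":" +
           pad2(now.getMinutes()) + ":" +
           pad2(now.getSeconds()) + "> ";
}

// In the shell you would then hook it up with:
// prompt = function() { return timePrompt(new Date()); };
```

Taking the Date as an argument (instead of calling new Date() inside) keeps the formatting easy to check by hand with a fixed time.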

Defining prompt() as shown above is nice for playing around, but it’s a pain to define your prompt every time you start up the shell. So, you can put it in a file and then either load that file on startup (as a command line argument) or from the shell itself:

$ # load from command line arg:
$ mongo shellConfig.js
MongoDB shell version 1.9.1-
connecting to: test
> // load from the shell itself
> load("/path/to/my/shellConfig.js")

Or, you can use another feature my intern has implemented: mongo will automatically look for (and, if it finds, load) a .mongorc.js file from your home directory on startup.

// my startup file

prompt = /* ... */

// getting "not master and slaveok=false" errors drives me nuts,
// so I'm overriding the getDB() code to ALWAYS set slaveok=true
Mongo.prototype.getDB = function(name) {
    this.setSlaveOk();
    return new DB(this, name);
};

/* and so on... */

Actually, 10gen employees would never have an intern make coffee, as they might mess it up: we have at least five different brewers, two grinders, two pages of close-typed instructions on how to grind/brew, and an RFC on coffee-making protocol.

Keep in mind that this is a “proper” JS file, so you can’t use magic Mongo shell helpers, like use <dbname> (instead, use db.getSisterDB("<dbname>")). If you don’t want .mongorc.js loaded on startup, start the shell with --norc.

Hopefully these things will make life a little easier for people.

Both of these changes are in master and will be in 1.9.1+. They will not be backported to the 1.8 branch. You can use the 1.9 shell with the 1.8 database server, though, if you want to use these features with a production database.

Mongo in Flatland

MongoDB’s geospatial indexing lets you use a collection as a map. It works differently than “normal” indexing, but there’s actually a nice, visual way to see what geospatial indexing does.

Let’s say we have a 16×16 map; something that looks like this:

All of the coordinates in our map (as described above) are somewhere between [0,0] and [16,16], so I’m going to make the min value 0 and the max value 16.

db.map.ensureIndex({point : "2d"}, {min : 0, max : 16, bits : 4})

This essentially turns our collection into a map. (Don’t worry about bits, for now, I’ll explain that below.)

Let’s say we have something at the point [4,6]. MongoDB generates a geohash of this point, which describes the point in a way that makes it easier to find things near it (and still be able to distribute the map across multiple servers). The geohash for this point is a string of bits describing the position of [4,6]. We can find the geohash of this point by dividing our map up into quadrants and determining which quadrant it is in. So, first we divide the map into 4 parts:

This is the trickiest part: each quadrant can be described by two bits, as shown in the table below:

01 11
00 10

[4,6] is in the lower-left quadrant, which matches 00 in the table above. Thus, its geohash starts with 00.

Geohash so far: 00

Now we divide that quadrant again:

[4,6] is now in the upper-right quadrant, so the next two bits in the geohash are 11. Note that the bottom and left edges are included in the quadrant, the top and right edges are excluded.

Geohash so far: 0011

Now we divide that quadrant again:

[4,6] is now in the upper-left quadrant, so the next two bits in the geohash are 01.

Geohash so far: 001101

Now we divide that quadrant again:

[4,6] is now in the lower-left quadrant, so the next two bits in the geohash are 00.

Geohash so far: 00110100

You may wonder: how far do we keep dividing? That’s exactly what the bits setting is for. We set it to 4 when we created the index, so we divide into quadrants 4 times. If we wanted higher precision, we could set bits to something higher.
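
The quadrant-by-quadrant procedure above can be sketched in a few lines of JavaScript. This is only an illustration of the walkthrough, not MongoDB’s actual implementation, and the function name geohash2d is made up for this example. It assumes the same bit layout as the table: the first bit of each pair is 1 for the right half, the second is 1 for the upper half.

```javascript
// Compute a geohash (as a string of bits) for the point [x, y] on a
// square map running from min to max on both axes.
// A sketch of the procedure described in the post; geohash2d is a
// hypothetical name, not a MongoDB function.
function geohash2d(x, y, min, max, bits) {
    var xLo = min, xHi = max, yLo = min, yHi = max;
    var hash = "";
    for (var i = 0; i < bits; i++) {
        var xMid = (xLo + xHi) / 2;
        var yMid = (yLo + yHi) / 2;
        // first bit of the pair: 1 = right half, 0 = left half
        // (the left edge is included, the right edge is excluded)
        if (x >= xMid) { hash += "1"; xLo = xMid; }
        else           { hash += "0"; xHi = xMid; }
        // second bit of the pair: 1 = upper half, 0 = lower half
        // (the bottom edge is included, the top edge is excluded)
        if (y >= yMid) { hash += "1"; yLo = yMid; }
        else           { hash += "0"; yHi = yMid; }
    }
    return hash;
}

// geohash2d(4, 6, 0, 16, 4) returns "00110100"
```

Each pass through the loop is one “divide that quadrant again” step, so bits is simply the number of times the loop runs.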

You can check your math above by using the geoNear command, which returns the geohash for the point you’re searching near:

> db.runCommand({geoNear : "map", near : [4,6]})
{
	"ns" : "test.map",
	"near" : "00110100",
	"results" : [ ],
	"stats" : {
		"time" : 0,
		"btreelocs" : 0,
		"nscanned" : 0,
		"objectsLoaded" : 0,
		"avgDistance" : NaN,
		"maxDistance" : -1
	},
	"ok" : 1
}

As you can see, the “near” field contains exactly the geohash we’d expect from our calculations.

The interesting thing about geohashing is that this makes it easy to figure out what’s near us, because things are sorted according to their position on the map: every document with a point geohash starting with 00 is in the lower-left quadrant, every point starting with 00111111 is very near the middle, but in the lower-left quadrant. Thus, you can eyeball where a point is by looking at its geohash.
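
To make the “eyeballing” concrete, here is a small sketch that reads a geohash two bits at a time and names the quadrant at each level, plus a helper that counts how many leading bits two hashes share. Both function names are invented for this example. Note that a long shared prefix means two points sit in the same small quadrant, but the reverse doesn’t hold: two points just across a quadrant boundary from each other can be close together and still share no prefix at all.

```javascript
// The same two-bit table as in the post, as a lookup.
var QUADRANTS = {
    "00": "lower-left",
    "01": "upper-left",
    "10": "lower-right",
    "11": "upper-right"
};

// Turn a geohash bit string into the list of quadrants it passes
// through, outermost first. (quadrantPath is a hypothetical name.)
function quadrantPath(hash) {
    var path = [];
    for (var i = 0; i + 1 < hash.length; i += 2) {
        path.push(QUADRANTS[hash.substring(i, i + 2)]);
    }
    return path;
}

// Count the leading bits two geohashes have in common.
// (sharedPrefixLength is also a made-up helper.)
function sharedPrefixLength(a, b) {
    var n = 0;
    while (n < a.length && n < b.length && a.charAt(n) === b.charAt(n)) {
        n++;
    }
    return n;
}

// quadrantPath("00110100") returns
// ["lower-left", "upper-right", "upper-left", "lower-left"]
```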

Bits and Precision

Let’s say a wizard casts a ring of fire around him with a radius of 2. Is the point [4,6] caught in that ring of fire?

It’s pretty obvious from the picture that it isn’t, but if we look at the geohash, we actually can’t tell: [4,6] hashes to 00110100, but so does [4.9, 6.9], and any other value in the square between [4,6] and [5,7]. So, in order to figure out whether the point is within the circle, MongoDB must go to the document and look at the actual value in the point field. Thus, setting bits to 4 is a bit low for the data we’re using/queries we’re doing.
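
You can see that loss of precision by going the other way: decoding a geohash back into the square it stands for. The sketch below does that for the 16×16 map, using the same bit layout as the table earlier in the post. cellBounds is a name I made up for illustration; it is not part of MongoDB.

```javascript
// Decode a geohash bit string into the square of the map it covers.
// Returns the [low, high) range on each axis.
// (cellBounds is a hypothetical helper, not a MongoDB function.)
function cellBounds(hash, min, max) {
    var xLo = min, xHi = max, yLo = min, yHi = max;
    for (var i = 0; i + 1 < hash.length; i += 2) {
        var xMid = (xLo + xHi) / 2;
        var yMid = (yLo + yHi) / 2;
        // first bit of the pair picks the left (0) or right (1) half
        if (hash.charAt(i) === "1") { xLo = xMid; } else { xHi = xMid; }
        // second bit of the pair picks the lower (0) or upper (1) half
        if (hash.charAt(i + 1) === "1") { yLo = yMid; } else { yHi = yMid; }
    }
    return { x: [xLo, xHi], y: [yLo, yHi] };
}

// cellBounds("00110100", 0, 16) returns { x: [4, 5], y: [6, 7] }
```

With 4 bits on a 16×16 map, every hash decodes to a 1×1 square, which is why the hash alone can’t say where inside that square a point actually is.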

Generally you shouldn’t bother setting bits; I’ve only set it above for purposes of demonstration. bits defaults to 26, which gives you approximately 1 foot resolution using latitude and longitude. The higher the number of bits, the slower geohashing gets (conversely, lower bit values mean faster geohashing, but more accessing documents on lookup). If you’re doing particularly high or low resolution queries, you might want to play around with different values of bits (in dev, on a representative data set) and see if you get better performance.

Thanks to Greg Studer, who gave a geospatial tech talk last Friday and inspired this post. (Every Friday, a 10gen engineer does a tech talk on something they’re working on, which is a really nice way to keep up with all of the cool stuff coworkers are doing. If you’re ever running an engineering department, I highly recommend them!)