MapReduce – The Fanfiction

MapReduce is really cool, useful, and powerful, but a lot of people find it hard to wrap their heads around. This post is a fairly silly, non-technical explanation using Star Trek.

The Enterprise found a new planet, as it tends to do.

Kirk wanted to beam down immediately and start surveying the planet, but Spock told him to wait a moment. “It usually takes us one hour to survey a planet, correct, Captain?  In less than 5 minutes, I can calculate whether the chance of encountering friendly alien females outweighs the risk of attack by brain-eating monsters.”

“Interesting idea, Spock,” said Kirk.  “Go ahead.”

The Data

“Logically,” thought Spock, “if we can survey a whole planet in one hour, we can survey 1/16th of a planet in 3.75 minutes.”  Spock divided the planet into 16 equal-size pieces and summoned 16 red shirts.

“You’ll be beamed down to the surface of the planet with this special data collection device called an ‘emitter,’” Spock told them.  “If you see a brain-eating monster, press the ‘brain-eating monster’ button on your emitter.  If you see an attractive female alien, press the ‘hot alien chick’ button.  Press either, neither, or both buttons, as your situation requires.”

The Map Step

The 16 red shirts were beamed down to the 16 parts of the planet.  As they found things, they pressed the buttons on their emitters.

Back on the Enterprise, Spock started getting lots of data pairs that looked like:

| type                 | location |
|----------------------|----------|
| Brain-eating monster | 2        |
| Hot alien chick      | 7        |
| Brain-eating monster | 14       |
| Brain-eating monster | 7        |
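
For the curious, here is a minimal Python sketch of what the red shirts are doing. The names `sightings_by_location` and `map_region` are made up for illustration, and the sample sightings simply mirror the four rows above; no particular MapReduce framework spells it exactly this way.

```python
# Hypothetical survey data: what the red shirt in each region happens to see.
# (Only regions with sightings are listed; the values mirror the table above.)
sightings_by_location = {
    2: ["Brain-eating monster"],
    7: ["Hot alien chick", "Brain-eating monster"],
    14: ["Brain-eating monster"],
}


def map_region(location, sightings):
    """Map step: emit one (type, location) pair per button press."""
    for sighting_type in sightings:
        yield (sighting_type, location)


# Each region is mapped independently, so 16 red shirts can work in parallel.
emits = [
    pair
    for location, sightings in sightings_by_location.items()
    for pair in map_region(location, sightings)
]

for sighting_type, location in emits:
    print(sighting_type, location)
```

The important part is that `map_region` only ever looks at one region, which is why Spock can hand each red shirt a piece of the planet and not worry about the others.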

The Reduce Step

“Computer,” Spock said.  “Initialize a counter to 0 for each new type you get.  Then, for every subsequent data pair with the same type, increment that counter.”

“I dinnae understand,” said Scotty.  “What’s that, then?”

“I basically told the computer to initialize two variables, ‘Brain-eating monster’ and ‘Hot alien chick’, setting them both to zero.  Every time the computer gets a ‘Brain-eating monster’ emit, it increments the ‘Brain-eating monster’ variable.  Every time it gets a ‘Hot alien chick’ emit, it increments the ‘Hot alien chick’ variable.”

“Ah, I see,” said Scotty.  “But don’t you lose the location information?”

“Yes,” replied Spock.  “But I don’t actually care about location for this readout.  If I wanted the location, I could give the computer a slightly more complicated algorithm, but right now I just want the count.”
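
Spock's instructions to the computer could be sketched in Python roughly like this. The function names are hypothetical, the sample emits are the four pairs from the table in the map step, and `reduce_locations` is only one guess at what the "slightly more complicated algorithm" might look like.

```python
from collections import defaultdict

# The (type, location) pairs the emitters sent back (sample values only).
emits = [
    ("Brain-eating monster", 2),
    ("Hot alien chick", 7),
    ("Brain-eating monster", 14),
    ("Brain-eating monster", 7),
]


def reduce_counts(pairs):
    """Reduce step: start a counter at 0 for each new type, then increment."""
    counts = {}
    for sighting_type, _location in pairs:
        if sighting_type not in counts:
            counts[sighting_type] = 0  # first time we see this type
        counts[sighting_type] += 1
    return counts


def reduce_locations(pairs):
    """Assumed variant: keep the locations instead of throwing them away."""
    locations = defaultdict(set)
    for sighting_type, location in pairs:
        locations[sighting_type].add(location)
    return dict(locations)


print(reduce_counts(emits))
print(reduce_locations(emits))
```

The counting version answers Scotty's question directly: the location is read and then ignored, so it never makes it into the result.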

The Result

After 3.75 minutes, Spock beamed up the red shirts who were still alive and reported to Kirk: “There are brain-eating monsters on 7/8 of the planet, Captain.  1/16 of the planet has hot alien chicks.”

“Excellent work, Spock,” said Kirk.  “Let’s boldly go somewhere else.”

And so they did.

Captain’s log, stardate 1419.7 (aka a summary of what we did)

  1. Goal – To generate a report on a planet.
  2. Data – 16 pieces of land with various attributes. Each piece of land could be represented by a JSON object such as:
    {
        "location" : 5,
        "contains" : ["Brain-eating monsters", "rocks", "poison gas"]
    }
  3. Map – Send attributes for each piece of data back to the processor. In JSON, each emit would look something like:
    {
        "Brain-eating monsters" : 5
    }
  4. Reduce – Sum up the data, grouping by type
  5. Result – How much of each attribute is on the planet
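
The five steps in the log can be strung together in a short Python sketch. Everything here is an assumption made for illustration: the three sample pieces of land are invented, `map_piece` and `reduce_emits` are hypothetical names, and "how much" is taken to mean the fraction of the 16 pieces that contain an attribute, with each piece counted once.

```python
import json
from collections import defaultdict
from fractions import Fraction

TOTAL_PIECES = 16  # the planet was divided into 16 equal-size pieces

# Data: hypothetical survey results, one JSON object per piece of land.
land_json = """
[
  {"location": 5, "contains": ["Brain-eating monsters", "rocks", "poison gas"]},
  {"location": 7, "contains": ["Brain-eating monsters", "Hot alien chicks"]},
  {"location": 9, "contains": ["rocks"]}
]
"""


def map_piece(piece):
    """Map: emit one {attribute: location} object per attribute."""
    return [{attribute: piece["location"]} for attribute in piece["contains"]]


def reduce_emits(emits):
    """Reduce: group by attribute, collecting the distinct locations."""
    locations = defaultdict(set)
    for emit in emits:
        for attribute, location in emit.items():
            locations[attribute].add(location)
    return locations


pieces = json.loads(land_json)
emits = [emit for piece in pieces for emit in map_piece(piece)]
grouped = reduce_emits(emits)

# Result: the share of the planet on which each attribute was seen.
report = {
    attribute: Fraction(len(locs), TOTAL_PIECES)
    for attribute, locs in grouped.items()
}

for attribute, share in report.items():
    print(f"{attribute}: {share} of the planet")
```

Using `Fraction` keeps the report in the same form Spock gives Kirk, rather than as a decimal.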

Further reading: Kyle Banker has an excellent (and more technical) explanation of MapReduce.

27 thoughts on “MapReduce – The Fanfiction”

  1. This is easily the most accessible explanation of Map/Reduce I’ve ever seen! Finally, I have something to which I can direct all of my treky-but-non-map-reduce-savvy friends.

  3. …much better than – “Why MapReduce is better than my cat at doing dishes!”

    This really is an excellent explanation of MR as a technique. great job!

  6. There’s a minor typo “but right not I just want the count” should read “but right now I just want the count” .. was just a minor distraction, takes nothing away from the story. Brilliant explanation of the use of MapReduce.
