SQL to MongoDB: An Updated Mapping

Rick Osborne's original chart.

The aggregation pipeline code has finally been merged into the main development branch and is scheduled for release in 2.2. It lets you combine simple operations (like finding the max or min, projecting out fields, taking counts or averages) into a pipeline of operations, making a lot of things that were only possible by using MapReduce doable with a “normal” query.

In celebration of this, I thought I’d re-do the very popular MySQL to MongoDB mapping using the aggregation pipeline, instead of MapReduce.

Here is the original SQL:

SELECT
  Dim1, Dim2,
  SUM(Measure1) AS MSum,
  COUNT(*) AS RecordCount,
  AVG(Measure2) AS MAvg,
  MIN(Measure1) AS MMin
  MAX(CASE
    WHEN Measure2  123)
GROUP BY Dim1, Dim2
HAVING (MMin > 0)
ORDER BY RecordCount DESC
LIMIT 4, 8

We can break up this statement and replace each piece of SQL with the new aggregation pipeline syntax:

MongoDB Pipeline MySQL
aggregate: "DenormAggTable"
FROM DenormAggTable
{
    $match : {
        Filter1 : {$in : ['A','B']},
        Filter2 : 'C',
        Filter3 : {$gt : 123}
    }
}
WHERE (Filter1 IN (’A’,’B’))
  AND (Filter2 = ‘C’)
  AND (Filter3 > 123)
{
    $project : {
        Dim1 : 1,
        Dim2 : 1,
        Measure1 : 1,
        Measure2 : 1,
        lessThanAHundred : {
            $cond: [ 
                {$lt: ["$Measure2", 100] },
                "$Measure2", // if
                0]           // else
        }
    }
}
CASE
  WHEN Measure2 < 100
  THEN Measure2
END
{
    $group : {
        _id : {Dim1 : 1, Dim2 : 1},
        MSum : {$sum : "$Measure1"},
        RecordCount : {$sum : 1},
        MAvg : {$avg : "$Measure2"},
        MMin : {$min : "$Measure1"},
        MMax : {$max : "$lessThanAHundred"}
    }
}
SELECT
  Dim1, Dim2,
  SUM(Measure1) AS MSum,
  COUNT(*) AS RecordCount,
  AVG(Measure2) AS MAvg,
  MIN(Measure1) AS MMin
  MAX(CASE
    WHEN Measure2 < 100
    THEN Measure2
  END) AS MMax

GROUP BY Dim1, Dim2
{
    $match : {MMin : {$gt : 0}}
}
HAVING (MMin > 0)
{
    $sort : {RecordCount : -1}
}
ORDER BY RecordCount DESC
{
    $limit : 8
},
{
    $skip : 4
}
LIMIT 4, 8

Putting all of these together gives you your pipeline:

> db.runCommand({aggregate: "DenormAggTable", pipeline: [
{
    $match : {
        Filter1 : {$in : ['A','B']},
        Filter2 : 'C',
        Filter3 : {$gt : 123}
    }
},
{
    $project : {
        Dim1 : 1,
        Dim2 : 1,
        Measure1 : 1,
        Measure2 : 1,
        lessThanAHundred : {$cond: [{$lt: ["$Measure2", 100]}, {
            "$Measure2",
            0]
        }
    }
},
{
    $group : {
        _id : {Dim1 : 1, Dim2 : 1},
        MSum : {$sum : "$Measure1"},
        RecordCount : {$sum : 1},
        MAvg : {$avg : "$Measure2"},
        MMin : {$min : "$Measure1"},
        MMax : {$max : "$lessThanAHundred"}
    }
},
{
    $match : {MMin : {$gt : 0}}
},
{
    $sort : {RecordCount : -1}
},
{
    $limit : 8
},
{
    $skip : 4
}
]})

As you can see, the SQL matches the pipeline operations pretty clearly. If you want to play with it, it’ll be available soon to a the development nightly build.

If you’re at MongoSV today (December 9th, 2011), check out Chris Westin’s talk on the new aggregation framework at 3:45 in room B4.

6 thoughts on “SQL to MongoDB: An Updated Mapping

Leave a comment