SQL to MongoDB: An Updated Mapping

SQL-to-MongoDB2 — Rick Osborne's original chart.

The aggregation pipeline code has finally been merged into the main development branch and is scheduled for release in 2.2. It lets you combine simple operations (like finding the max or min, projecting out fields, taking counts or averages) into a pipeline of operations, making a lot of things that were only possible by using MapReduce doable with a “normal” query.

In celebration of this, I thought I’d re-do the very popular MySQL to MongoDB mapping using the aggregation pipeline, instead of MapReduce.

Here is the original SQL:

SELECT
  Dim1, Dim2,
  SUM(Measure1) AS MSum,
  COUNT(*) AS RecordCount,
  AVG(Measure2) AS MAvg,
  MIN(Measure1) AS MMin
  MAX(CASE
    WHEN Measure2  123)
GROUP BY Dim1, Dim2
HAVING (MMin > 0)
ORDER BY RecordCount DESC
LIMIT 4, 8

We can break up this statement and replace each piece of SQL with the new aggregation pipeline syntax:

MongoDB Pipeline	MySQL
aggregate: "DenormAggTable"	FROM DenormAggTable
{ $match : { Filter1 : {$in : ['A','B']}, Filter2 : 'C', Filter3 : {$gt : 123} } }	WHERE (Filter1 IN (’A’,’B’)) AND (Filter2 = ‘C’) AND (Filter3 > 123)
{ $project : { Dim1 : 1, Dim2 : 1, Measure1 : 1, Measure2 : 1, lessThanAHundred : { $cond: [ {$lt: ["$Measure2", 100] }, "$Measure2", // if 0] // else } } }	CASE WHEN Measure2 < 100 THEN Measure2 END
{ $group : { _id : {Dim1 : 1, Dim2 : 1}, MSum : {$sum : "$Measure1"}, RecordCount : {$sum : 1}, MAvg : {$avg : "$Measure2"}, MMin : {$min : "$Measure1"}, MMax : {$max : "$lessThanAHundred"} } }	SELECT Dim1, Dim2, SUM(Measure1) AS MSum, COUNT(*) AS RecordCount, AVG(Measure2) AS MAvg, MIN(Measure1) AS MMin MAX(CASE WHEN Measure2 < 100 THEN Measure2 END) AS MMax GROUP BY Dim1, Dim2
{ $match : {MMin : {$gt : 0}} }	HAVING (MMin > 0)
{ $sort : {RecordCount : -1} }	ORDER BY RecordCount DESC
{ $limit : 8 }, { $skip : 4 }	LIMIT 4, 8

Putting all of these together gives you your pipeline:

> db.runCommand({aggregate: "DenormAggTable", pipeline: [
{
    $match : {
        Filter1 : {$in : ['A','B']},
        Filter2 : 'C',
        Filter3 : {$gt : 123}
    }
},
{
    $project : {
        Dim1 : 1,
        Dim2 : 1,
        Measure1 : 1,
        Measure2 : 1,
        lessThanAHundred : {$cond: [{$lt: ["$Measure2", 100]}, {
            "$Measure2",
            0]
        }
    }
},
{
    $group : {
        _id : {Dim1 : 1, Dim2 : 1},
        MSum : {$sum : "$Measure1"},
        RecordCount : {$sum : 1},
        MAvg : {$avg : "$Measure2"},
        MMin : {$min : "$Measure1"},
        MMax : {$max : "$lessThanAHundred"}
    }
},
{
    $match : {MMin : {$gt : 0}}
},
{
    $sort : {RecordCount : -1}
},
{
    $limit : 8
},
{
    $skip : 4
}
]})

As you can see, the SQL matches the pipeline operations pretty clearly. If you want to play with it, it’ll be available soon to a the development nightly build.

If you’re at MongoSV today (December 9th, 2011), check out Chris Westin’s talk on the new aggregation framework at 3:45 in room B4.

6 thoughts on “SQL to MongoDB: An Updated Mapping”

Pingback: SQL to MongoDB: An Updated Mapping « Another Word For It
Anonymous says:

December 30, 2011 at 1:16 pm

This kinda stuff shouldn’t be in blog. It should go straight into tutorial! It’s that good! Thanks.

LikeLike

shawny says:

January 15, 2012 at 10:27 pm

I think:
{
$limit : 4
},
{
$skip : 8
}

should be changed into:

{ $limit : 8},{ $skip : 4}

LikeLike

1. Anonymous says:
  
  January 17, 2012 at 2:37 pm
  
  Whoops, thank you! Fixed.
  
  LikeLike
  
Juan Borda says:

December 13, 2012 at 9:28 am

I would like to view this chart reviewed with the new aggregation framework introduced in mongo 2.1 (i can help too)

LikeLike

1. kristina1 says:
  
  December 13, 2012 at 11:28 am
  
  That’d be cool, let me know if you come up with a new version.
  
  LikeLike