Mongo MapReduce FTW!

Submitted by Barrett on Tue, 09/27/2011 - 19:53

One of the systems I've recently inherited makes heavy use of Mongo for data storage, a database I hadn't used before. So when the boss called tonight and said that his boss needed counts of an object in our system, broken down by state, in the next 10 minutes, my thinking went something like...

No problem, that's a simple SQL group-by... Oh, wait. This is Mongo. Oh, crap! How do I do that?! It's a function; Map...Something.

The function I was looking for was MapReduce. Basically, MapReduce is a function applied to a collection which itself takes two functions as parameters. The first, the map function, turns each item in the collection into a key-value pair (or several) by calling emit. The second, the reduce function, then collapses (hence the name MapReduce) all of the values emitted for the same key into a single value, so you end up with one result per distinct key. In my case that value is a count, which makes it Mongo's answer to the SQL group-by.
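To make that flow concrete, here's a rough plain-JavaScript sketch of the map, group, reduce sequence. It is not how Mongo implements it; mapReduceSketch, sampleDocs and the color field are names I made up for illustration, and emit is passed in as an argument here, whereas in Mongo it's a global available inside the map function.

```javascript
// Plain-JavaScript sketch of the map -> group -> reduce flow.
// Illustration only: this is NOT Mongo's implementation.
function mapReduceSketch(docs, map, reduce) {
  var groups = {}; // serialized key -> {key: ..., values: [...]}

  docs.forEach(function (doc) {
    // emit() files each key-value pair under its key.
    var emit = function (key, value) {
      var id = JSON.stringify(key);
      if (!groups[id]) {
        groups[id] = {key: key, values: []};
      }
      groups[id].values.push(value);
    };
    // As in Mongo, the current document is available as `this` inside map.
    map.call(doc, emit);
  });

  // One output row per distinct key.
  return Object.keys(groups).map(function (id) {
    var g = groups[id];
    // A key with a single value is passed through without calling reduce.
    var value = g.values.length === 1 ? g.values[0] : reduce(g.key, g.values);
    return {_id: g.key, value: value};
  });
}

// Made-up sample data: count documents by color.
var sampleDocs = [{color: 'red'}, {color: 'blue'}, {color: 'red'}];

var sketchResult = mapReduceSketch(
  sampleDocs,
  function (emit) { emit(this.color, {count: 1}); },
  function (key, values) {
    var sum = 0;
    values.forEach(function (v) { sum += v.count; });
    return {count: sum};
  }
);
// sketchResult: [{_id: 'red', value: {count: 2}}, {_id: 'blue', value: {count: 1}}]
```

Note that the reduce function returns a value in the same shape as the ones map emits. That matters because reduce may be handed its own earlier output as one of the values.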

For instance, the functions which solved my problem were:

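var map = function() {
  // Group by the state of the embedded user document.
  var key = {'state': this.user.state};
  emit(key, {count: 1});
};

var reduce = function(key, values) {
  // Add up the partial counts emitted (or previously reduced) for this key.
  var sum = 0;
  values.forEach(function(value) {
    sum += value['count'];
  });
  return {count: sum};
};

// Return the results inline rather than writing them to a collection.
db.my_collection.mapReduce(map, reduce, {out: {inline: 1}});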
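With the inline output option, the shell hands back a document whose results array should hold one {_id: key, value: reducedValue} entry per distinct key. Assuming that shape, a small helper like the one below flattens it into something you can read off to the boss. countsByState is a name I made up, and sampleResult is a hand-written stand-in with invented numbers, not real output.

```javascript
// Flatten an inline mapReduce result into a plain {state: count} object.
// Assumes each row looks like {_id: {state: ...}, value: {count: ...}}.
function countsByState(result) {
  var counts = {};
  result.results.forEach(function (row) {
    counts[row._id.state] = row.value.count;
  });
  return counts;
}

// Hand-written sample in the assumed result shape; the numbers are made up.
var sampleResult = {
  results: [
    {_id: {state: 'CA'}, value: {count: 3}},
    {_id: {state: 'NY'}, value: {count: 1}}
  ],
  ok: 1
};

var byState = countsByState(sampleResult); // {CA: 3, NY: 1}
```

In the shell, the same helper could wrap the call directly: countsByState(db.my_collection.mapReduce(map, reduce, {out: {inline: 1}})).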

I haven't worked with it much beyond this immediate usage, but I get the sense that, while it's nowhere near as simple, it's probably significantly more powerful than the SQL group-by clause.
