Saturday, June 15, 2013

Programming Puzzle: Multiply Matrixes with MongoDB


Today I want to share an idea on how to multiply matrixes using MondoDB's facilities. It's not the way you usually multiply them and more likely you never need to do this with Mongo. However the idea provided here can be applied to any MapReduce framework (for example: Hadoop). Frankly speaking it's not a clean and robust way of multiplying the matrixes, it's done just for fun, so don't take this article to seriously.

I'm going to consider two approaches, the first one - using only Mongo's MapReduce facilities and the second - MapReduce + Aggregation Framework. Intrigued? :) Continue reading then...


For those who prefer reading a code instead of text - GitHub: Approach1(multiply_map_reduce.js) and Approach2 (multiply_map_aggregate.js)

To run the code
mongo < multiply_map_reduce.js
mongo < multiply_map_aggregate.js


First of all lets recap the multiplication formula:
\[\mathbf{A} \times \mathbf{B} = \begin{vmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \\ \end{vmatrix} \times \begin{vmatrix} b_{00} & b_{01} \\ b_{10} & b_{11} \\ b_{20} & b_{21} \end{vmatrix} = \]
\[ \begin{vmatrix} a_{00} * b_{00} + a_{01} * b_{10} + a_{02} * b_{20} & a_{00} * b_{01} + a_{01} * b_{11} + a_{02} * b_{21} \\ a_{10} * b_{00} + a_{11} * b_{10} + a_{12} * b_{20} & a_{10} * b_{01} + a_{11} * b_{11} + a_{12} * b_{21} \\ \end{vmatrix} \]

And as a test sample we're using the following equation:

\[\mathbf{A} \times \mathbf{B} = \begin{vmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \\ \end{vmatrix} \times \begin{vmatrix} 0 & 1 \\ 2 & 3 \\ 4 & 5 \end{vmatrix} = \begin{vmatrix} 10 & 13 \\ 28 & 40 \\ \end{vmatrix} \]

Using MapReduce

So let's start with the first approach, we gonna use MapReduce and some 'not-fancy' Javascript code to solve the problem. First let's generate two matrices (in according to the test above):

use matrix;

var row_num = 2;
var col_num = 3;

db.matrix.remove({});

for (i = 0; i < row_num; i++) {
    for (var j = 0; j < col_num; j++) {
        db.matrix.insert({"name": "M1", row: i, col: j, value: col_num * i + j});
        db.matrix.insert({"name": "M2", row: j, col: i, value: row_num * j + i});
    }
}

db.matrix.find({name:"M1"}, {}).sort({row: 1, col: 1});
db.matrix.find({name:"M2"}, {}).sort({row: 1, col: 1});

The matrix M1 and M2 are stored in a single collection called 'matrix'. given that we try to do a calculation by just running a single MapReduce. Before we got to the actual code let's try to understand it's origins.

  • As described in formula above each a(x,y) and b(w,u) can be found on  certain position with a certain pair (a=>b, b=>a). For example b(0,1) always corresponds to second column in multiplied matrix and a(1,0) always corresponds to second row in the result. 
  • Next, how much times each b(x,y) should be emited? Since the column is already fixed in a result, we should emit each b row_num(A), and each a(x,y) should be emited col_num(B) in opposite. 
  • For each position (x,y) in a result matrix for each a(x,n) we have b(n, y) that a(x,n) should be multiplied to. So we can just iterate trhough n = [0..col_num(A)] and emit the pairs of corresponding a(x,n) and b(n,y) (note that col_num(A) == row_num(B)). 
mapFunction = function () {
    if (this.name == "M1") {
        for (var i = 0; i < 2; i++)
            emit({row: this.row, col: i}, {m_index: this.row * 10000 + i * 100 + this.col, value: this.value});
    }
    else if (this.name == "M2") {
        for (var i = 0; i < 2; i++)
            emit({row: i, col: this.col}, {m_index: i * 10000 + this.col * 100 + this.row, value: this.value});
    }
};

Note the m_index field, it's used to determine the corresponding b(n,y) to a given a(x,n) for a given position (x,y). In this implementation I assume that the number of rows/columns is less than 100, but this restriction can be omitted if m_index is a string (for example: m_index:"" + this.row + "," + i + "," + this.col). Also note a name:[1,2] - we

Next define a reduce function:

reduceFunction = function (key, values) {
    var multipliers = {};
    var sum = 0;

    values.forEach(function (x) {
        if (multipliers[x.m_index] != undefined)
            sum += multipliers[x.m_index] * x.value;
        else
            multipliers[x.m_index] = x.value
    });

    return {value: sum};
};

We simply iterate through emitted values and check if we've already added any value with the same m_index, if we have - then simple multiply and add to a sum. Easy :) And the output matches the test case:
{ "_id" : { "row" : 0, "col" : 0 }, "value" : { "value" : 10 } }
{ "_id" : { "row" : 0, "col" : 1 }, "value" : { "value" : 13 } }
{ "_id" : { "row" : 1, "col" : 0 }, "value" : { "value" : 28 } }
{ "_id" : { "row" : 1, "col" : 1 }, "value" : { "value" : 40 } }

Using aggregation

In the second approach I'll try to get rid of Reduce part and use only aggregation. Let's say matrices are stored in different collections.
use matrix

var row_num = 2;
var col_num = 3;

db.matrix1.remove({});
db.matrix2.remove({});

for (i = 0; i < row_num; i++) {
    for (var j = 0; j < col_num; j++) {
        db.matrix1.insert({row: i, col: j, value: col_num * i + j});
        db.matrix2.insert({row: j, col: i, value: row_num * j + i});
    }
}

db.matrix1.find({}, {}).sort({row: 1, col: 1});
db.matrix2.find({}, {}).sort({row: 1, col: 1});

Let's emit them in an intermediate collection 'matrix_intermediate' (in order to run aggregation later) with some additional fields .
map1 = function () {
    for (var i = 0; i < 2; i++)
        emit({matrix: 1, row: this.row, col: i}, {m_index: this.row * 10000 + i * 100 + this.col, value: this.value});
};

map2 = function () {
    for (var i = 0; i < 2; i++)
        emit({matrix: 2, row: i, col: this.col}, {m_index: i * 10000 + this.col * 100 + this.row, value: this.value});
};

reduceFunction = function (key, values) {
    return {value: values};
};

db.matrix1.mapReduce(map1, reduceFunction, { out: "matrix_intermediate" });
db.matrix2.mapReduce(map2, reduceFunction, { out: {merge: "matrix_intermediate"} });
Then we are ready to run aggregation. First of all what we need is to unwind the values field, then group by m_index and multiply a(x,n) by b(n,y). Since we can't just use $group->multiply here is a trick with $group->min-max and then $project->multiply.
db.matrix_intermediate.aggregate([
    {
        $unwind: "$value.value"
    },
    {
        $group: {
            _id: {
                m_index: "$value.value.m_index",
                row: "$_id.row",
                col: "$_id.col"
            },
            value1: {$min: "$value.value.value"},
            value2: {$max: "$value.value.value"}
        }
    },
    {
        $project: {
            _id: {
                row: "$_id.row",
                col: "$_id.col"
            },
            value: {$multiply: ["$value1", "$value2"]}
        }
    },
    {
        $group: {
            _id: {
                row: "$_id.row",
                col: "$_id.col"
            },
            value: {$sum: "$value"}
        }
    },
    {
        $project: {
            _id: 0,
            row: "$_id.row",
            col: "$_id.col",
            value: "$value"
        }
    }
]);
So we got the same result of matrix multiplication as in the first case but documents are formatted a bit different. Feel free to experiment with a larger number of rows-columns.


Hope you find this article interesting (if not so helpful :))

If you are new to MongoDB's MapReduce and aggregation - refer the website. If you want to perform MapReduce from Grails - check out this article

No comments:

Post a Comment