Monday, September 3, 2012

Mongo MapReduce performance

A lot of people complain about Mongo's performance when running MapReduce tasks. The problem is that Mongo's MapReduce is not designed to handle real-time analytic tasks.

The price of using MapReduce is speed: group is not particularly speedy, but MapReduce is slower and is not supposed to be used in “real time.” You run MapReduce as a background job, it creates a collection of results, and then you can query that collection in real time.
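To make that workflow concrete, here is a minimal sketch in plain JavaScript that imitates the map, emit, and reduce steps in memory. The sample `orders` documents, the `runMapReduce` driver, and the `orderTotals` result are all made up for illustration and are not Mongo's implementation. In the mongo shell the real call would look roughly like `db.orders.mapReduce(map, reduce, { out: "order_totals" })`, run as a background job, after which you query `order_totals` as a normal collection.

```javascript
// Hypothetical sample documents standing in for an "orders" collection.
const orders = [
  { customer: "alice", amount: 10 },
  { customer: "bob", amount: 5 },
  { customer: "alice", amount: 7 },
];

// Map: Mongo calls this once per document, with `this` bound to the document.
function map() {
  emit(this.customer, this.amount);
}

// Reduce: receives one key and all values emitted for it.
function reduce(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Illustrative in-memory driver (an assumption, not Mongo's engine).
function runMapReduce(docs, mapFn, reduceFn) {
  const emitted = new Map();
  // Mongo provides a global emit(); we install a stand-in for the map phase.
  globalThis.emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  docs.forEach((doc) => mapFn.call(doc));
  delete globalThis.emit;

  // Results use the same { _id, value } shape as a Mongo output collection.
  const out = [];
  for (const [key, values] of emitted) {
    out.push({ _id: key, value: reduceFn(key, values) });
  }
  return out;
}

// The "background job": build the results once...
const orderTotals = runMapReduce(orders, map, reduce);

// ...then the "real-time" part is just a cheap lookup on the results.
const aliceTotal = orderTotals.find((r) => r._id === "alice").value;
console.log(aliceTotal); // 17
```

The slow part (running JavaScript over every document) happens once, and the lookups that follow never touch the map or reduce functions.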



One of the reasons the MapReduce operation is not very fast is that Mongo (2.2 and below) uses a single-threaded engine to execute JavaScript code, so you can't take advantage of multiple cores. The MongoDB developers do plan to add parallelism to JavaScript execution by migrating to Google's V8 engine.

There is a JavaScript lock, so only one thread can execute JS code at any point in time. But most JS steps of the MR (e.g. a single map()) are very short, and consequently the lock is yielded very often. Note that ticket https://jira.mongodb.org/browse/SERVER-4258 will allow multi-threading.

So if you need to do some grouping and speed matters, you are better off using the aggregation framework introduced in Mongo 2.1.
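A grouping such as "total amount per customer" is expressed in the aggregation framework as a pipeline of stage documents rather than JavaScript functions, so it avoids the JavaScript engine entirely. The sketch below only builds such a pipeline. The `orders` collection and its `customer` and `amount` fields are assumptions for illustration.

```javascript
// Hypothetical pipeline: sum `amount` per `customer`, largest total first.
const pipeline = [
  { $group: { _id: "$customer", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } },
];

// In the mongo shell this would be run as: db.orders.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```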


If you want parallel MapReduce jobs running against your database, use the Hadoop plugin for Mongo to take advantage of Hadoop MapReduce.
