
Aggregation: New framework in MongoDB

Alvin Richards
Technical Director, EMEA
alvin@10gen.com
@jonnyeight

What problem are we solving?


Map/Reduce can be used for aggregation
Currently being used for totaling, averaging, etc.

Map/Reduce is a big hammer
Simpler tasks should be easier
Shouldn't need to write JavaScript
Avoid the overhead of the JavaScript engine

We're seeing requests for help in handling complex documents
Select only matching subdocuments or arrays

How will we solve the problem?


New aggregation framework
Declarative framework (no JavaScript)
Describe a chain of operations to apply
Expression evaluation
Return computed values
Framework: new operations added easily
C++ implementation

Aggregation - Pipelines
Aggregation requests specify a pipeline
A pipeline is a series of operations
Members of a collection are passed through a pipeline to produce a result

e.g. ps -ef | grep -i mongod
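
The pipe analogy can be sketched in plain JavaScript, outside MongoDB. This is only an illustration of the idea: `runPipeline`, `matchStage` and `limitStage` are hypothetical helper names, and the process list is made-up sample data.

```javascript
// Illustration only: a pipeline is a list of stages, and each stage
// consumes the output of the previous one (like `ps -ef | grep -i mongod`).
// runPipeline, matchStage and limitStage are hypothetical helpers.
function runPipeline(docs, stages) {
  return stages.reduce((stream, stage) => stage(stream), docs);
}

const matchStage = (predicate) => (docs) => docs.filter(predicate);
const limitStage = (n) => (docs) => docs.slice(0, n);

// Made-up sample data standing in for the output of `ps -ef`.
const processes = [
  { pid: 1, cmd: "init" },
  { pid: 42, cmd: "mongod --dbpath /data/db" },
  { pid: 43, cmd: "mongos" },
  { pid: 44, cmd: "MONGOD --fork" },
];

// Case-insensitive match, then cap the number of results.
const mongodOnly = runPipeline(processes, [
  matchStage((p) => /mongod/i.test(p.cmd)),
  limitStage(10),
]);
```

Each stage only sees what the previous stage let through, which is why the order of stages matters.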

Example - twitter
{
  "_id" : ObjectId("4f47b268fb1c80e141e9888c"),
  "user" : {
    "friends_count" : 73,
    "location" : "Brazil",
    "screen_name" : "Bia_cunha1",
    "name" : "Beatriz Helena Cunha",
    "followers_count" : 102
  }
}

Find the number of followers and friends by location

Example - twitter
db.tweets.aggregate(
  {$match: {"user.friends_count": { $gt: 0 },
            "user.followers_count": { $gt: 0 } } },
  {$project: { location: "$user.location",
               friends: "$user.friends_count",
               followers: "$user.followers_count" } },
  {$group: {_id: "$location",
            friends: {$sum: "$friends"},
            followers: {$sum: "$followers"} } }
);

In this pipeline:
$match: the predicate
$project: the parts of the document you want to project
$group: the function to apply to the result set

Example - twitter
{
  "result" : [
    ...
    { "_id" : "Far Far Away", "friends" : 344, "followers" : 789 },
  ],
  "ok" : 1
}


Pipeline Operations
$match
Uses a query predicate (like .find({})) as a filter

$project
Uses a sample document to determine the shape of the result (similar to .find()'s optional argument)
This can include computed values

$unwind
Hands out array elements one at a time

$group
Aggregates items into buckets defined by a key


Pipeline Operations (continued)


$sort
Sort documents

$limit
Only allow the specified number of documents to pass

$skip
Skip over the specified number of documents
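
What $skip and $limit do to a stream of documents can be sketched in plain JavaScript. `skipStage` and `limitStage` are hypothetical helpers, not MongoDB APIs, and the documents are made-up sample data.

```javascript
// Plain-JavaScript sketch of $skip and $limit acting on a document stream.
// skipStage and limitStage are hypothetical helpers, not MongoDB APIs.
const skipStage = (n) => (docs) => docs.slice(n);
const limitStage = (n) => (docs) => docs.slice(0, n);

const docs = [{ n: 1 }, { n: 2 }, { n: 3 }, { n: 4 }, { n: 5 }];

// Similar in spirit to the stages {$skip: 1}, {$limit: 2}:
// drop the first document, then let two through.
const skipThenLimit = limitStage(2)(skipStage(1)(docs));

// Reversing the stages changes the result:
// let two through first, then drop the first of those.
const limitThenSkip = skipStage(1)(limitStage(2)(docs));
```

The two results differ (two documents versus one), which shows that the position of each operation in the pipeline is significant.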


Computed Expressions
Available in $project operations
Prefix expression language
{$add: ["$field1", "$field2"]}
{$ifNull: ["$field1", "$field2"]}
Nesting:
{$add: ["$field1", {$ifNull: ["$field2", "$field3"]}]}
Other functions:
$divide, $mod, $multiply
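
How a nested prefix expression is evaluated can be sketched with a tiny evaluator in plain JavaScript. This is a teaching sketch under stated assumptions, not how the server implements expressions: `evaluate` is a hypothetical function, it handles only top-level field names, and the fields `price` and `shipping` are invented for the example.

```javascript
// Minimal evaluator for a few prefix expressions, to show how nesting works.
// Hypothetical sketch: only top-level fields and five operators are handled.
function evaluate(expr, doc) {
  // A string starting with "$" is a field reference, e.g. "$price".
  if (typeof expr === "string" && expr.startsWith("$")) {
    const value = doc[expr.slice(1)];
    return value === undefined ? null : value;
  }
  // Any other literal (number, plain string, null) evaluates to itself.
  if (expr === null || typeof expr !== "object") {
    return expr;
  }
  // An object with one operator key: evaluate the arguments, then apply.
  const [op] = Object.keys(expr);
  const args = expr[op].map((arg) => evaluate(arg, doc));
  switch (op) {
    case "$add": return args.reduce((a, b) => a + b, 0);
    case "$multiply": return args.reduce((a, b) => a * b, 1);
    case "$divide": return args[0] / args[1];
    case "$mod": return args[0] % args[1];
    case "$ifNull": return args[0] === null ? args[1] : args[0];
    default: throw new Error("unsupported operator: " + op);
  }
}

// Invented sample document: shipping is missing a value.
const order = { price: 10, shipping: null };

// The inner $ifNull is evaluated first and supplies a default of 5.
const total = evaluate(
  { $add: ["$price", { $ifNull: ["$shipping", 5] }] },
  order
);
```

The operator comes first and its arguments follow in an array, so nesting is just an expression used as an argument.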


Computed Expressions
String functions
$toUpper, $toLower, $substr

Date field extraction
$year, $month, $dayOfMonth, $hour...

Date arithmetic
$ifNull
Ternary conditional
Return one of two values based on a predicate
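
A $project stage combining these functions might look like the sketch below. It assumes the tweet documents from the earlier example, plus a hypothetical `created_at` field holding a date; the output field names and the threshold of 100 followers are also invented for illustration. The ternary conditional is written with the `$cond` operator.

```javascript
// Sketch of a $project stage using string, date and conditional expressions.
// Assumptions: created_at is a hypothetical date field; the output field
// names and the follower threshold are invented for this example.
const projectStage = {
  $project: {
    // String functions
    screenNameUpper: { $toUpper: "$user.screen_name" },
    screenNamePrefix: { $substr: ["$user.screen_name", 0, 3] },
    // Date field extraction
    createdYear: { $year: "$created_at" },
    // Ternary conditional: [predicate, value if true, value if false]
    popularity: {
      $cond: [
        { $gt: ["$user.followers_count", 100] },
        "popular",
        "regular",
      ],
    },
  },
};

// In the mongo shell this stage would be passed to aggregate, e.g.:
//   db.tweets.aggregate(projectStage)
```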


Projections
$project can reshape results
Include or exclude fields
Computed fields
Arithmetic expressions
Pull fields from nested documents to the top
Push fields from the top down into new virtual documents
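
The reshaping options above can be sketched as a single $project stage applied to the sample tweet document. The output field names (`total`, `counts`) are invented for illustration, and `reshape` is a hypothetical hand-written JavaScript equivalent included only to show the resulting shape.

```javascript
// A $project stage that reshapes the sample tweet document.
// The output field names are invented for this example.
const reshapeStage = {
  $project: {
    _id: 0, // exclude a field
    location: "$user.location", // pull a nested field to the top
    // computed field using an arithmetic expression
    total: { $add: ["$user.friends_count", "$user.followers_count"] },
    // new sub-document built from existing fields
    counts: {
      friends: "$user.friends_count",
      followers: "$user.followers_count",
    },
  },
};

// Hand-written JavaScript equivalent, only to show the resulting shape.
// reshape is a hypothetical helper, not part of MongoDB.
function reshape(doc) {
  return {
    location: doc.user.location,
    total: doc.user.friends_count + doc.user.followers_count,
    counts: {
      friends: doc.user.friends_count,
      followers: doc.user.followers_count,
    },
  };
}

const shaped = reshape({
  user: { friends_count: 73, location: "Brazil", followers_count: 102 },
});
```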


Unwinding
$unwind can stream arrays
Array values are doled out one at a time in the context of their surrounding documents
Makes it possible to filter out elements before returning
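
The effect of unwinding can be sketched in plain JavaScript. `unwind` is a hypothetical helper that mimics the idea for a top-level array field, and the `posts` documents are made-up sample data. How documents without an array are treated here (they are skipped) is a simplification of this sketch, not a statement about the server.

```javascript
// Hypothetical helper mimicking the idea of {$unwind: "$<field>"}:
// each array element is emitted inside a copy of its surrounding document.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const values = doc[field];
    if (!Array.isArray(values)) continue; // simplification of this sketch
    for (const value of values) {
      out.push({ ...doc, [field]: value });
    }
  }
  return out;
}

// Made-up sample data.
const posts = [
  { _id: 1, title: "Aggregation", tags: ["mongodb", "pipeline"] },
  { _id: 2, title: "Indexes", tags: ["mongodb"] },
];

// One output document per array element.
const perTag = unwind(posts, "tags");

// Once unwound, individual elements can be filtered like any other field.
const pipelineTagged = perTag.filter((doc) => doc.tags === "pipeline");
```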


Grouping
$group aggregation expressions
Define a grouping key as the _id of the result
Total grouped column values: $sum
Average grouped column values: $avg
Collect grouped column values in an array or set: $push, $addToSet
Other functions: $min, $max, $first, $last
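
A $group stage using several of these accumulators, with a plain-JavaScript sketch of the bucketing it describes. The input fields (`location`, `name`, `friends`), the output field names, and the helper `groupByLocation` are all invented for illustration; the helper is not how the server implements grouping.

```javascript
// Sketch of a $group stage; field names are invented for this example.
const groupStage = {
  $group: {
    _id: "$location", // grouping key
    friends: { $sum: "$friends" },
    avgFriends: { $avg: "$friends" },
    names: { $push: "$name" },
    uniqueNames: { $addToSet: "$name" },
  },
};

// Hypothetical plain-JavaScript equivalent: one bucket per distinct key.
function groupByLocation(docs) {
  const buckets = new Map();
  for (const doc of docs) {
    let bucket = buckets.get(doc.location);
    if (!bucket) {
      bucket = { _id: doc.location, friends: 0, count: 0, names: [], unique: new Set() };
      buckets.set(doc.location, bucket);
    }
    bucket.friends += doc.friends; // $sum
    bucket.count += 1; // needed for the average
    bucket.names.push(doc.name); // $push keeps duplicates
    bucket.unique.add(doc.name); // $addToSet drops duplicates
  }
  return [...buckets.values()].map((b) => ({
    _id: b._id,
    friends: b.friends,
    avgFriends: b.friends / b.count, // $avg
    names: b.names,
    uniqueNames: [...b.unique],
  }));
}

// Made-up sample data.
const grouped = groupByLocation([
  { location: "Brazil", name: "bia", friends: 73 },
  { location: "Brazil", name: "bia", friends: 27 },
  { location: "UK", name: "al", friends: 10 },
]);
```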


Sorting
$sort can sort documents
Sort specifications are the same as today, e.g., {$sort: {key1: 1, key2: -1}}
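
How a sort specification with mixed directions orders documents can be sketched with a comparator in plain JavaScript. `compareBySpec` is a hypothetical helper, the documents are made-up sample data, and the sketch only handles top-level keys with directly comparable values.

```javascript
// Hypothetical helper: build a comparator from a sort specification
// such as { key1: 1, key2: -1 } (1 = ascending, -1 = descending).
function compareBySpec(spec) {
  const keys = Object.entries(spec); // keys are applied in the order given
  return (a, b) => {
    for (const [key, direction] of keys) {
      if (a[key] < b[key]) return -direction;
      if (a[key] > b[key]) return direction;
    }
    return 0; // equal on every key
  };
}

// Made-up sample data.
const unsorted = [
  { key1: 2, key2: 1 },
  { key1: 1, key2: 1 },
  { key1: 1, key2: 5 },
];

// key1 ascending, ties broken by key2 descending.
const sorted = [...unsorted].sort(compareBySpec({ key1: 1, key2: -1 }));
```

Later keys only matter when all earlier keys compare equal.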


Demo
Demo files are at https://gist.github.com/2036709


Usage Tips
Use $match in a pipeline as early as possible
The query optimizer can then be used to choose an index and avoid scanning the entire collection

Use $sort in a pipeline as early as possible
The query optimizer can sometimes be used to choose an index to scan instead of sorting the result
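
A pipeline that follows both tips might be laid out as in the sketch below, using the tweet documents from the earlier example. The filter value, the output field names and the index mentioned in the comment are assumptions for illustration; whether an index is actually used depends on the indexes that exist on the collection.

```javascript
// Sketch of a pipeline with $match and $sort placed first.
// The filter value and output field names are invented for this example.
const pipeline = [
  // Filter first, so later stages see fewer documents.
  { $match: { "user.location": "Brazil" } },
  // Sort before reshaping, while the original field names still apply.
  { $sort: { "user.followers_count": -1 } },
  { $limit: 10 },
  {
    $project: {
      _id: 0,
      name: "$user.name",
      followers: "$user.followers_count",
    },
  },
];

// Hypothetical supporting index (an assumption, not from the slides):
//   { "user.location": 1, "user.followers_count": -1 }
// In a mongo shell that accepts a pipeline array:
//   db.tweets.aggregate(pipeline)
```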


Driver Support
Initial version is a command
For any language, build a JSON database object, and execute the command

{ aggregate : <collection>, pipeline : [] }

Beware of the result size limit of 16MB
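
Building the command document by hand might look like the sketch below. `buildAggregateCommand` is a hypothetical helper and the example pipeline is invented. The pipeline is written as an array of stages; later server versions also expect additional options such as a cursor specification, which this sketch does not include.

```javascript
// Hypothetical helper that builds the aggregate command document.
function buildAggregateCommand(collectionName, pipeline) {
  if (!Array.isArray(pipeline)) {
    throw new TypeError("pipeline must be an array of stages");
  }
  return { aggregate: collectionName, pipeline: pipeline };
}

// Example: count tweets per location (pipeline invented for illustration).
const command = buildAggregateCommand("tweets", [
  { $group: { _id: "$user.location", tweets: { $sum: 1 } } },
]);

// In the mongo shell the document can be executed with:
//   db.runCommand(command)
// A driver would send the same document through its own command method.
```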


When is this being released?


Now!
2.1.0 - unstable
2.2.0 - stable (soon)


Sharding support
Initial release supports sharding
Mongos analyzes the pipeline,
forwards operations up to $group or $sort to the shards,
combines the shard server results and returns them
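
The split described above can be sketched as a function that cuts a pipeline at the first $group or $sort. `splitForShards` is a hypothetical name, and this is not the mongos implementation. The slide does not say whether the boundary stage itself also runs on the shards, so keeping it on the mongos side is an assumption of this sketch.

```javascript
// Hypothetical sketch: split a pipeline at the first $group or $sort.
// Assumption: the boundary stage is kept on the mongos side.
function splitForShards(pipeline) {
  const boundary = pipeline.findIndex(
    (stage) => "$group" in stage || "$sort" in stage
  );
  if (boundary === -1) {
    // No $group or $sort: every stage can be forwarded.
    return { shardPart: pipeline.slice(), mongosPart: [] };
  }
  return {
    shardPart: pipeline.slice(0, boundary),
    mongosPart: pipeline.slice(boundary),
  };
}

// Example pipeline, invented for illustration.
const parts = splitForShards([
  { $match: { "user.followers_count": { $gt: 0 } } },
  { $project: { location: "$user.location" } },
  { $group: { _id: "$location", tweets: { $sum: 1 } } },
  { $sort: { tweets: -1 } },
]);
```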


Pipeline Operations Future


$out
Saves the document stream to a collection
Similar to M/R $out, but with sharded output
Functions like a tee, so that intermediate results can be saved


Documentation, Bug Reports


http://www.mongodb.org/display/DOCS/Aggregation+Framework

https://jira.mongodb.org/browse/SERVER/component/10840


download at mongodb.org
alvin@10gen.com

conferences, appearances, and meetups


http://www.10gen.com/events

http://bit.ly/mongoE

Facebook | Twitter | LinkedIn


@mongodb

http://linkd.in/joinmongo
