
MongoDB

Advanced
MapReduce with MongoDB

• MongoDB provides the mapReduce command (deprecated since MongoDB 5.0 in favor of the aggregation pipeline)

• Syntax: db.collection-name.mapReduce(mapFunction, reduceFunction, options)
MapReduce Operation

• the map function, defined as a JavaScript function

• the reduce function, also defined as a JavaScript function using the function keyword
MapReduce Operation

• query: the criteria used to select documents first; if not given, MapReduce is applied to all documents

• out: specifies where to store the MapReduce output

• in the example, order_totals will be a new collection
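To make the phases concrete, here is a minimal in-memory sketch of what a mapReduce call does with query and out: select, map, group by key, reduce, store. It is not MongoDB's implementation. The mapReduce function, the plain-object db, and the sample orders are hypothetical, and query is assumed to support only exact-match fields. The sketch follows two documented behaviours: output documents have the shape { _id: key, value: reducedValue }, and reduce is not called for a key with a single emitted value.

```javascript
// In-memory sketch of the mapReduce phases (NOT MongoDB's implementation).
// Assumptions: `query` supports only exact-match fields; `db` is a plain
// object standing in for the database.
let emitted = null; // [key, value] pairs collected during the map phase

// Stand-in for the emit() function MongoDB provides to map functions.
function emit(key, value) {
  emitted.push([key, value]);
}

function mapReduce(db, collectionName, mapFn, reduceFn, options) {
  const query = options.query || {};
  // 1. Selection: keep documents whose fields equal every query field.
  const input = db[collectionName].filter((doc) =>
    Object.keys(query).every((field) => doc[field] === query[field])
  );

  // 2. Map: call mapFn with `this` bound to each selected document.
  emitted = [];
  for (const doc of input) mapFn.call(doc);

  // 3. Group the emitted values by key.
  const groups = new Map();
  for (const [key, value] of emitted) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }

  // 4. Reduce: a key with a single value is passed through unreduced.
  const results = [];
  for (const [key, values] of groups) {
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    results.push({ _id: key, value: value });
  }

  // 5. Output: store the results under the `out` name.
  db[options.out] = results;
  return results;
}

// Hypothetical data: only status "A" orders are selected.
const db = {
  orders: [
    { cust_id: "abc123", status: "A", price: 25 },
    { cust_id: "abc123", status: "A", price: 50 },
    { cust_id: "xyz789", status: "B", price: 10 },
  ],
};
mapReduce(
  db,
  "orders",
  function () { emit(this.cust_id, this.price); },
  function (key, values) { return values.reduce((a, b) => a + b, 0); },
  { query: { status: "A" }, out: "order_totals" }
);
console.log(db.order_totals); // [ { _id: 'abc123', value: 75 } ]
```

The xyz789 order never reaches the map phase because the query filters it out first.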


MapReduce
• The map and reduce functions can be defined outside the mapReduce operator

• and then passed to it by name

db.testMR.mapReduce(
    mapFunction1,
    reduceFunction1,
    { out: "map_reduce_example" }
)
Example
• The orders collection contains documents like the one below
{
    cust_id: "abc123",
    ord_date: new Date("Oct 04, 2012"),
    status: 'A',
    price: 25,
    items: [ { sku: "mmm", qty: 5, price: 2.5 },
             { sku: "nnn", qty: 5, price: 2.5 } ]
}

• The MapReduce task: return the total price per customer


Define Map Function

• The map function reads each input document and emits (produces)

• cust_id and price as a key-value pair

• We can define the map function in JavaScript


var mapFunction1 = function() {
    emit(this.cust_id, this.price);
};

* this refers to the document that the map function is processing
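A small sketch of what mapFunction1 emits for the sample order. Inside MongoDB, emit is supplied by the server; here a hypothetical stand-in records the pairs, and Function.prototype.call binds this to the document the same way the map phase does.

```javascript
// Hypothetical stand-in for MongoDB's emit(): records each key-value pair.
const pairs = [];
function emit(key, value) {
  pairs.push({ key: key, value: value });
}

var mapFunction1 = function () {
  emit(this.cust_id, this.price);
};

// The sample order from the slides.
const order = {
  cust_id: "abc123",
  ord_date: new Date("Oct 04, 2012"),
  status: "A",
  price: 25,
  items: [
    { sku: "mmm", qty: 5, price: 2.5 },
    { sku: "nnn", qty: 5, price: 2.5 },
  ],
};

// Bind `this` to the document, as the map phase does for every input doc.
mapFunction1.call(order);
console.log(pairs); // [ { key: 'abc123', value: 25 } ]
```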


Define the reduce function

• The reduce function receives the cust_id as the key and an array of all prices for that customer as the value

• Using JavaScript, we can define the reduce function as

var reduceFunction1 = function(keyCustId, valuesPrices) {
    return Array.sum(valuesPrices);
};
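The sketch below runs reduceFunction1 outside MongoDB. Array.sum is a mongo shell helper that plain JavaScript lacks, so a stand-in is defined only when it is missing. It also illustrates a documented constraint: MongoDB may call reduce more than once for the same key, feeding an earlier result back in with new values, so reduce must return the same type it receives and give the same answer either way. The prices are hypothetical.

```javascript
// Stand-in for the mongo shell's Array.sum, added only if absent.
if (typeof Array.sum !== "function") {
  Array.sum = function (values) {
    return values.reduce((total, v) => total + v, 0);
  };
}

var reduceFunction1 = function (keyCustId, valuesPrices) {
  return Array.sum(valuesPrices);
};

// Hypothetical prices emitted for one customer.
const prices = [25, 50, 10, 15];

// Reduce everything at once...
const direct = reduceFunction1("abc123", prices);

// ...or reduce part of the values, then "re-reduce" the partial result
// together with the remaining values. Summation gives the same total.
const partial = reduceFunction1("abc123", prices.slice(0, 2));
const rereduced = reduceFunction1("abc123", [partial, ...prices.slice(2)]);

console.log(direct, rereduced); // 100 100
```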
Perform the MapReduce
db.orders.mapReduce(
    mapFunction1,
    reduceFunction1,
    { out: "sum_prices_per_customer" }
)

• We call the mapReduce operator on the collection

• pass the map function name

• pass the reduce function name

• and specify where to store the output
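After the call, the output collection holds one document per customer, which can be inspected in the shell with db.sum_prices_per_customer.find(). As a sanity check, the sketch below computes the equivalent result with a plain group-and-sum in JavaScript. The three orders and the sumPricesPerCustomer helper are hypothetical; the { _id, value } shape matches MongoDB's mapReduce output documents.

```javascript
// Hypothetical orders (only the fields the task needs).
const orders = [
  { cust_id: "abc123", price: 25 },
  { cust_id: "abc123", price: 50 },
  { cust_id: "xyz789", price: 10 },
];

// Group by cust_id and sum prices: the same result the map/reduce pair
// produces, without going through emit().
function sumPricesPerCustomer(docs) {
  const totals = new Map(); // cust_id -> running total
  for (const doc of docs) {
    totals.set(doc.cust_id, (totals.get(doc.cust_id) || 0) + doc.price);
  }
  // Shape each result like a mapReduce output document.
  return Array.from(totals, ([id, total]) => ({ _id: id, value: total }));
}

console.log(sumPricesPerCustomer(orders));
// [ { _id: 'abc123', value: 75 }, { _id: 'xyz789', value: 10 } ]
```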


Other parameters mapReduce takes

• Besides the map function, reduce function, and out, the mapReduce operator can take

• query: performs selection (filters the input documents)

• sort: sorts the input documents

• limit: limits the number of input documents
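All three options shape the input to the map phase, not the output. A minimal sketch of that order of operations (filter, then sort, then limit) follows. The selectInput helper and the orders are hypothetical; query is assumed to be exact-match only and sort a single { field: 1 | -1 } pair, whereas MongoDB accepts full query documents and requires the sort key to be indexed.

```javascript
// Hypothetical helper: how query, sort, and limit select map input.
function selectInput(docs, { query = {}, sort, limit } = {}) {
  // query: keep documents whose fields equal every query field.
  let input = docs.filter((doc) =>
    Object.keys(query).every((f) => doc[f] === query[f])
  );
  // sort: 1 = ascending, -1 = descending (MongoDB's convention).
  if (sort) {
    const [field, dir] = Object.entries(sort)[0];
    input = input.slice().sort((a, b) =>
      a[field] < b[field] ? -dir : a[field] > b[field] ? dir : 0
    );
  }
  // limit: at most `limit` documents reach the map function (0 = no limit).
  if (limit) input = input.slice(0, limit);
  return input;
}

const orders = [
  { cust_id: "a", status: "A", price: 30 },
  { cust_id: "b", status: "B", price: 20 },
  { cust_id: "c", status: "A", price: 10 },
  { cust_id: "d", status: "A", price: 40 },
];

const input = selectInput(orders, {
  query: { status: "A" },
  sort: { price: 1 },
  limit: 2,
});
console.log(input.map((d) => d.cust_id)); // [ 'c', 'a' ]
```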
Example

• Count the number of movies per year, starting from 2005

Sample document:
{
title: "Anthropoid",
year: 2016,
actors: [ ObjectId("8") ]
}
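One possible map/reduce pair for this task: the map function emits the year with a count of 1, the reduce function adds the counts, and the query option keeps only movies from 2005 onward using the $gte operator. The collection name movies, the output name movies_per_year, and every movie except Anthropoid are assumptions. In the shell, the three values would be passed as db.movies.mapReduce(mapYear, reduceCount, options); the local check below uses a hypothetical emit stand-in and applies the $gte filter by hand.

```javascript
// Map: one count per movie, keyed by year.
var mapYear = function () {
  emit(this.year, 1);
};

// Reduce: add up the counts for one year.
var reduceCount = function (year, counts) {
  return counts.reduce((a, b) => a + b, 0);
};

// Options as they would be passed to mapReduce (names are assumptions).
var options = {
  query: { year: { $gte: 2005 } }, // selection happens before map
  out: "movies_per_year",
};

// --- Local check with hypothetical documents ---
const groups = new Map(); // year -> emitted counts
function emit(key, value) {
  // stand-in for MongoDB's emit()
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

const movies = [
  { title: "Anthropoid", year: 2016 },
  { title: "Movie B", year: 2016 },
  { title: "Movie C", year: 2004 },
  { title: "Movie D", year: 2010 },
];

movies
  .filter((m) => m.year >= options.query.year.$gte) // emulate the $gte query
  .forEach((m) => mapYear.call(m));

const perYear = {};
for (const [year, counts] of groups) perYear[year] = reduceCount(year, counts);
console.log(perYear); // { '2010': 1, '2016': 2 }
```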
Example
• Count the number of movies per year, starting from 2005, and sort the result

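The mapReduce sort option orders the input documents (and its key must be indexed); it does not order the output collection. One hedged way to get the per-year counts in order is to sort when reading the output, for example db.movies_per_year.find().sort({ _id: 1 }) in the shell, assuming a hypothetical output collection of that name. The sketch below mimics that sort on hypothetical output documents; the sortByField helper handles numeric fields only.

```javascript
// Hypothetical mapReduce output documents: _id = year, value = count.
const output = [
  { _id: 2016, value: 2 },
  { _id: 2005, value: 3 },
  { _id: 2010, value: 1 },
];

// Sort a copy by a numeric field; direction 1 = ascending, -1 = descending.
function sortByField(docs, field, direction) {
  return docs.slice().sort((a, b) => (a[field] - b[field]) * direction);
}

console.log(sortByField(output, "_id", 1).map((d) => d._id)); // [ 2005, 2010, 2016 ]
console.log(sortByField(output, "value", -1).map((d) => d._id)); // [ 2005, 2016, 2010 ]
```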
Index
• If data is not indexed, the database must scan the entire collection to find documents that match the given conditions

• With indexes, MongoDB can find matching documents efficiently

• a primary feature for performance

• The primary index is on the identifier field (_id)

• created automatically

• Secondary indexes can be created on any field

• created manually
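The sketch below contrasts the two access paths by counting how many entries each one examines. In the shell, a secondary index would be created with db.orders.createIndex({ cust_id: 1 }), and explain("executionStats") reports a COLLSCAN or IXSCAN stage. Here a sorted array searched with binary search stands in for MongoDB's B-tree index; the helper names and the 1000 generated documents are hypothetical.

```javascript
// Collection scan: examine every document.
function collectionScan(docs, field, value) {
  let examined = 0;
  const hits = [];
  for (const doc of docs) {
    examined++;
    if (doc[field] === value) hits.push(doc);
  }
  return { hits, examined };
}

// "Index": [key, position] entries sorted by key (stand-in for a B-tree).
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => [doc[field], pos])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Index lookup: binary search for the first key >= value, then read matches.
function indexLookup(docs, index, value) {
  let lo = 0;
  let hi = index.length;
  let examined = 0;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    examined++;
    if (index[mid][0] < value) lo = mid + 1;
    else hi = mid;
  }
  const hits = [];
  for (let i = lo; i < index.length && index[i][0] === value; i++) {
    examined++;
    hits.push(docs[index[i][1]]);
  }
  return { hits, examined };
}

// Hypothetical collection of 1000 customers.
const customers = Array.from({ length: 1000 }, (_, i) => ({ cust_id: "c" + i }));
const scan = collectionScan(customers, "cust_id", "c500");
const idx = indexLookup(customers, buildIndex(customers, "cust_id"), "c500");
console.log(scan.examined, idx.examined); // 1000, then a much smaller number
```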
Index types
• Default: _id

• Single field

• user-defined on a single field

• Compound

• user-defined on multiple fields

• Multikey index

• used to index content stored in arrays

• one index entry for each array element
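A sketch of the multikey idea: one index entry per array element, each pointing back to its document. In the shell, db.orders.createIndex({ "items.sku": 1 }) would build a multikey index automatically because items is an array. Here a Map from SKU to document _ids stands in for the index; buildMultikeyIndex and the orders are hypothetical, and this sketch lists each document at most once per key.

```javascript
// Hypothetical multikey index builder: Map from element value -> _ids.
function buildMultikeyIndex(docs, arrayField, subField) {
  const index = new Map();
  for (const doc of docs) {
    for (const element of doc[arrayField] || []) {
      const key = element[subField]; // one entry per array element
      if (!index.has(key)) index.set(key, []);
      const ids = index.get(key);
      if (!ids.includes(doc._id)) ids.push(doc._id); // list each doc once per key
    }
  }
  return index;
}

// Hypothetical orders modeled on the sample document's items array.
const orders = [
  { _id: 1, items: [{ sku: "mmm", qty: 5 }, { sku: "nnn", qty: 5 }] },
  { _id: 2, items: [{ sku: "nnn", qty: 1 }] },
];

const skuIndex = buildMultikeyIndex(orders, "items", "sku");
console.log(skuIndex.get("nnn")); // [ 1, 2 ]
console.log(skuIndex.size); // 2
```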


Index Structures
• Ordered

• values in the indexed field are sorted in either ascending or descending order

• Hashed

• indexes a hash of each field value

• Text

• indexes the words in string fields

• useful for full-text search
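The idea behind a text index can be sketched as an inverted index from words to documents. In the shell, db.movies.createIndex({ title: "text" }) creates one and db.movies.find({ $text: { $search: "knight" } }) queries it. This sketch is a simplification: tokenizing is assumed to be lowercasing plus splitting on non-letters, whereas MongoDB's text index also applies language-specific stemming and stop words. The helper names and movie documents are hypothetical.

```javascript
// Hypothetical inverted index: word -> Set of document _ids.
function buildTextIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const words = String(doc[field])
      .toLowerCase()
      .split(/[^a-z]+/)
      .filter(Boolean); // drop empty tokens
    for (const word of words) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(doc._id);
    }
  }
  return index;
}

// Look up one search term (case-insensitive).
function search(index, term) {
  return [...(index.get(term.toLowerCase()) || [])];
}

const movies = [
  { _id: 1, title: "Anthropoid" },
  { _id: 2, title: "The Dark Knight" },
  { _id: 3, title: "Knight and Day" },
];

const titleIndex = buildTextIndex(movies, "title");
console.log(search(titleIndex, "knight")); // [ 2, 3 ]
```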


Behind the scenes: BSON

• MongoDB uses BSON (Binary JSON) to store a binary representation of JSON documents

• JSON objects and arrays are serialized into binary
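To show what "serialized into binary" means, here is a minimal encoder sketch that supports only two BSON element types: int32 (type byte 0x10) and UTF-8 string (0x02). It follows the layout in the BSON specification: a document is a little-endian int32 total length, then the elements, then a 0x00 terminator; each element is a type byte, a null-terminated field name, and the value. It is not a full BSON library; the function names are hypothetical.

```javascript
// Little-endian 32-bit integer bytes.
function int32LE(n) {
  return [n & 0xff, (n >> 8) & 0xff, (n >> 16) & 0xff, (n >> 24) & 0xff];
}

// UTF-8 bytes followed by a 0x00 terminator.
function cstring(s) {
  return [...new TextEncoder().encode(s), 0x00];
}

// Encode a flat document with int32 and string values only.
function encodeBSON(doc) {
  const body = [];
  for (const [name, value] of Object.entries(doc)) {
    if (Number.isInteger(value) && value >= -(2 ** 31) && value < 2 ** 31) {
      // 0x10 = int32 element
      body.push(0x10, ...cstring(name), ...int32LE(value));
    } else if (typeof value === "string") {
      // 0x02 = string element: int32 byte length (incl. 0x00), bytes, 0x00
      const bytes = cstring(value);
      body.push(0x02, ...cstring(name), ...int32LE(bytes.length), ...bytes);
    } else {
      throw new Error("this sketch only encodes int32 and string values");
    }
  }
  const total = 4 + body.length + 1; // length prefix + elements + terminator
  return Uint8Array.from([...int32LE(total), ...body, 0x00]);
}

console.log(encodeBSON({ hello: "world" }).length); // 22
```

The 22-byte result for { hello: "world" } breaks down as 4 (length) + 1 (type) + 6 ("hello" plus terminator) + 4 (string length) + 6 ("world" plus terminator) + 1 (document terminator).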


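Behind the scenes: Replication

• Master/slave replication (current MongoDB versions implement this as replica sets, with one primary and one or more secondaries)

• one replica is the master (primary)

• the other replicas are slaves (secondaries)

• clients perform operations on the master replica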
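A heavily simplified sketch of how the master/primary and slaves/secondaries stay in sync: the primary applies each client write and appends it to an operation log (MongoDB's is called the oplog), and secondaries catch up by replaying entries they have not applied yet. The classes and documents are hypothetical; real replica sets add elections, heartbeats, write concerns, and asynchronous pulling.

```javascript
// One copy of the data plus a count of applied log entries.
class Replica {
  constructor(name) {
    this.name = name;
    this.data = new Map(); // _id -> document
    this.applied = 0; // oplog entries applied so far
  }
  apply(op) {
    if (op.type === "insert" || op.type === "update") {
      this.data.set(op.doc._id, op.doc); // update = full replacement here
    } else if (op.type === "delete") {
      this.data.delete(op.id);
    }
    this.applied++;
  }
}

class ReplicaSet {
  constructor(secondaryNames) {
    this.primary = new Replica("primary");
    this.secondaries = secondaryNames.map((n) => new Replica(n));
    this.oplog = [];
  }
  // Clients send writes to the primary, which records them in the oplog.
  write(op) {
    this.primary.apply(op);
    this.oplog.push(op);
  }
  // Secondaries replay any oplog entries they have not yet applied.
  sync() {
    for (const s of this.secondaries) {
      while (s.applied < this.oplog.length) s.apply(this.oplog[s.applied]);
    }
  }
}

const rs = new ReplicaSet(["secondary1", "secondary2"]);
rs.write({ type: "insert", doc: { _id: 1, cust_id: "abc123", price: 25 } });
rs.write({ type: "update", doc: { _id: 1, cust_id: "abc123", price: 30 } });
console.log(rs.secondaries[0].data.size); // 0 (not yet synced)
rs.sync();
console.log(rs.secondaries[1].data.get(1).price); // 30
```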
Behind the scenes: Sharding
• MongoDB automatically partitions the data

• MongoDB partitions a collection

• using an indexed shard key (for example the _id)

• into chunks

• when a chunk grows beyond the configured size limit, it is split

• In the background

• MongoDB runs a chunk migration process

• to balance load across shards
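A toy model of range-based chunking: each chunk covers a key range on one shard, a chunk is split at its median key once it exceeds a size limit, and a balancing step moves chunks from the fullest shard to the emptiest. Several simplifications are assumed: chunk size is counted in documents (MongoDB measures data size), keys are numbers, MAX_DOCS_PER_CHUNK is a hypothetical limit, and the balancer is a naive loop rather than MongoDB's actual policy.

```javascript
const MAX_DOCS_PER_CHUNK = 4; // hypothetical limit

// A chunk covers shard-key values in [min, max) and lives on one shard.
let chunks = [{ min: -Infinity, max: Infinity, keys: [], shard: "shard0" }];

// Route a key to its chunk; split the chunk if it grows too large.
function insert(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  chunk.keys.push(key);
  if (chunk.keys.length > MAX_DOCS_PER_CHUNK) split(chunk);
}

// Split a chunk at its median key into two ranges on the same shard.
function split(chunk) {
  const sorted = chunk.keys.slice().sort((a, b) => a - b);
  const mid = sorted[Math.floor(sorted.length / 2)];
  const left = { min: chunk.min, max: mid, keys: sorted.filter((k) => k < mid), shard: chunk.shard };
  const right = { min: mid, max: chunk.max, keys: sorted.filter((k) => k >= mid), shard: chunk.shard };
  chunks = chunks.flatMap((c) => (c === chunk ? [left, right] : [c]));
}

// Move chunks from the fullest shard to the emptiest until the chunk
// counts differ by at most one.
function balance(shardNames) {
  for (;;) {
    const counts = shardNames.map((s) => [s, chunks.filter((c) => c.shard === s).length]);
    counts.sort((a, b) => a[1] - b[1]);
    const [emptiest, low] = counts[0];
    const [fullest, high] = counts[counts.length - 1];
    if (high - low <= 1) return;
    chunks.find((c) => c.shard === fullest).shard = emptiest; // "migrate"
  }
}

for (let k = 1; k <= 20; k++) insert(k);
balance(["shard0", "shard1"]);
console.log(chunks.map((c) => `${c.shard} [${c.min}, ${c.max}) ${c.keys.length} docs`));
```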


Summary
• MongoDB is a JSON document database

• Master/slave replication approach

• Query functionality

• CRUD operations

• Create, Read, Update, Delete

• MapReduce

• index structures
