Professional Documents
Culture Documents
CPU
Memory
Disk
C0 C1 C4
C2 C3 …
Chunk server 1 Chunk server 2 Chunk server 3 Chunk server N
C0 C1 C1 C2 C0 C4
C4 C2 C3 … C3
Chunk server 1 Chunk server 2 Chunk server 3 Chunk server N
C0 C1 D0 C1 C2 C0 C4
C4 C2 C3 D0 D1 … D1 C3
Chunk server 1 Chunk server 2 Chunk server 3 Chunk server N
Sample application:
Analyze web server logs to find popular URLs
Term Statistics for Search
Input Intermediate
key-value pairs key-value pairs
k v
map
k v
k v
map
k v
k v
… …
k v k v
Output
Intermediate Key-value groups key-value pairs
key-value pairs
reduce
k v k v v v k v
reduce
Group
k v k v v k v
by key
k v
… …
…
k v k v k v
data
reads
The crew of the space
shuttle Endeavor recently
(The, 1) (crew, 1)
read the
returned to Earth as (crew, 1) (crew, 1)
ambassadors, harbingers (crew, 2)
of a new era of space (of, 1) (space, 1)
sequential
(space, 1)
exploration. Scientists at
(the, 1) (the, 1)
NASA are saying that the (the, 3)
Sequentially
recent assembly of the (space, 1) (the, 1)
Dextre bot is the first step (shuttle, 1)
in a long-term space-based (shuttle, 1) (the, 1)
(recently, 1)
man/mache partnership. (Endeavor, 1) (shuttle, 1)
'"The work we're doing now …
Only
-- the robotics we're doing (recently, 1) (recently, 1)
-- is what we're going to
need …………………….. …. …
Big document (key, value) (key, value) (key, value)
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 21
Word Count Using MapReduce
map(key, value):
// key: document name; value: text of the document
for each word w in value:
emit(w, 1)
reduce(key, values):
// key: a word; value: an iterator over counts
result = 0
for each count v in values:
result += v
emit(key, result)
Group by key:
Collect all pairs with
same key
(Hash merge, Shuffle,
Sort, Partition)
Reduce:
Collect all values
belonging to the
key and output
All phases are distributed with many tasks doing the work
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 24
Map-Reduce
Programmer specifies:
Input 0 Input 1 Input 2
Map and Reduce and input files
Workflow:
Read inputs as a set of key-value-pairs
Map 0 Map 1 Map 2
Map transforms input kv-pairs into a new set
of k'v'-pairs
Sorts & Shuffles the k'v'-pairs to output nodes Shuffle
All k’v’-pairs with a given k’ are sent to the
same reduce
Reduce processes all k'v'-pairs grouped by key Reduce 0 Reduce 1
into new k''v''-pairs
Write the resulting pairs to files
Other examples:
Link analysis and graph processing
Machine Learning algorithms
A B B C A C
a1 b1 b2 c1 a3 c1
a2
a3
b1
b2
⋈ b2 c2 = a3 c2
b3 c3 a4 c3
a4 b3
S
R