• Fault tolerant.
Map reduce
[Figure: data flow of a MapReduce job. The input is split across several map tasks; their output goes through shuffle and sort to the reduce tasks, which write the output.]
Map output, one line per map task:
map 1: (laptop, 10) (usb, 13)
map 2: (laptop, 50) (usb, 57)
map 3: (laptop, 78) (mouse, 25)
map 4: (phone, 49) (mouse, 67) (laptop, 5)

Shuffle, sort (pairs routed by key to the reducers M1 and M2):
M1: (laptop, 10) (laptop, 50) (laptop, 78) (laptop, 5)
M2: (mouse, 25) (mouse, 67) (usb, 13) (usb, 57)

Reduce (values grouped per key, then summed):
M1: (laptop, [10, 50, 78, 5]) -> (laptop, 143)
M2: (mouse, [25, 67]) -> (mouse, 92)
    (usb, [13, 57]) -> (usb, 70)

(The figure does not show (phone, 49) in the shuffle and reduce steps.)
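The three steps of the example can be sketched in a few lines of Python. This is a single-process illustration, not Hadoop code: the raw "product,amount" record format, the `SPLITS` variable, and all function names are assumptions made for the sketch, and the pair values are taken from the final build of the figure.

```python
from collections import defaultdict

# Hypothetical input splits: one list of raw "product,amount" records per map task.
SPLITS = [
    ["laptop,10", "usb,13"],
    ["laptop,50", "usb,57"],
    ["laptop,78", "mouse,25"],
    ["phone,49", "mouse,67", "laptop,5"],
]


def map_record(record):
    """Map: parse one raw record into a (key, value) pair."""
    key, value = record.split(",")
    yield key, int(value)


def shuffle_and_sort(pairs):
    """Shuffle, sort: collect all values per key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_group(key, values):
    """Reduce: collapse the value list of one key into its sum."""
    return key, sum(values)


def run_job(splits):
    """Run map over every record, then shuffle/sort, then reduce per key."""
    mapped = [
        pair
        for split in splits
        for record in split
        for pair in map_record(record)
    ]
    return [reduce_group(key, values) for key, values in shuffle_and_sort(mapped)]


if __name__ == "__main__":
    print(run_job(SPLITS))
    # -> [('laptop', 143), ('mouse', 92), ('phone', 49), ('usb', 70)]
```

In a real cluster the map calls run in parallel on different machines and the shuffle moves data over the network; here everything runs in one loop so that the data flow stays visible.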
MapReduce Hadoop
• Open source from Apache. https://hadoop.apache.org/
https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
• Components
• MapReduce
• PageRank-ing
Inverted index
D1: data base systems
D2: economic base analysis
D3: distributed systems
D4: data analysis

Map output, one map task per document, as (word, character position):
D1: (data, 1) (base, 6) (systems, 11)
D2: (economic, 1) (base, 10) (analysis, 15)
D3: (distributed, 1) (systems, 13)
D4: (data, 1) (analysis, 6)

Shuffle, sort (each pair tagged with its document, then routed by key to M1 or M2):
M1: (analysis, D2:15) (analysis, D4:6) (base, D1:6) (base, D2:10) (data, D1:1) (data, D4:1)
M2: (distributed, D3:1) (economic, D2:1) (systems, D1:11) (systems, D3:13)

Reduce output:
M1: (analysis, [D2:15, D4:6])
    (base, [D1:6, D2:10])
    (data, [D1:1, D4:1])
M2: (distributed, [D3:1])
    (economic, [D2:1])
    (systems, [D1:11, D3:13])
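A possible Python sketch of the inverted index job follows. It assumes that the positions in the figure are 1-based character offsets of each word inside its document, which is how the values for D1 and D2 read; the names `DOCS`, `map_document` and `build_inverted_index` are made up for this illustration.

```python
import re
from collections import defaultdict

# The four documents from the slide.
DOCS = {
    "D1": "data base systems",
    "D2": "economic base analysis",
    "D3": "distributed systems",
    "D4": "data analysis",
}


def map_document(doc_id, text):
    """Map: emit (word, (doc_id, position)) for every word in one document.

    Assumption: position is the 1-based character offset of the word.
    """
    for match in re.finditer(r"\w+", text):
        yield match.group(), (doc_id, match.start() + 1)


def build_inverted_index(docs):
    """Shuffle/sort the map output by word, then reduce to a posting list."""
    postings = defaultdict(list)
    for doc_id, text in docs.items():
        for word, posting in map_document(doc_id, text):
            postings[word].append(posting)
    # Reduce: one sorted posting list per word, words in sorted order.
    return {word: sorted(postings[word]) for word in sorted(postings)}


if __name__ == "__main__":
    index = build_inverted_index(DOCS)
    print(index["analysis"])
    # -> [('D2', 15), ('D4', 6)]
```

Looking up a word in `index` then returns every document that contains it together with the position, which is the purpose of the structure.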
MapReduce: SQL operators
• Selection
• Group by
• Join
EMPLOYEES
emp_id  name             dep_id
100     Steven King      90
102     Lex De Haan      90
108     Nancy Greenberg  100
116     Shelli Baida     30
117     Sigal Tobias     30

DEPARTMENTS
dep_id  dep_name
30      Purchasing
90      Executive
100     Finance
20      Marketing
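One way to express the three operators as MapReduce jobs over the two tables is sketched below in Python. The concrete queries (select the employees of department 30, count employees per department, inner join on dep_id) are hypothetical examples chosen for the illustration, and `run_job` is only a single-process stand-in for the map, shuffle/sort, reduce pipeline, not a Hadoop API. The join shown is a reduce-side join, which is one of several possible strategies.

```python
from collections import defaultdict

EMPLOYEES = [
    (100, "Steven King", 90),
    (102, "Lex De Haan", 90),
    (108, "Nancy Greenberg", 100),
    (116, "Shelli Baida", 30),
    (117, "Sigal Tobias", 30),
]
DEPARTMENTS = [(30, "Purchasing"), (90, "Executive"), (100, "Finance"), (20, "Marketing")]


def run_job(records, mapper, reducer):
    """Single-process stand-in for map -> shuffle, sort -> reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Selection (WHERE dep_id = 30): the mapper filters; the reducer passes rows
# through. On a real cluster this could be a map-only job.
def select_map(emp):
    if emp[2] == 30:
        yield emp[0], emp


def identity_reduce(key, values):
    return values


# Group by (COUNT(*) GROUP BY dep_id): key on dep_id, the reducer counts.
def count_map(emp):
    yield emp[2], 1


def count_reduce(dep_id, ones):
    return [(dep_id, sum(ones))]


# Join on dep_id (reduce-side): tag each row with its table in the mapper,
# then pair employees with departments that share the key in the reducer.
def join_map(tagged_row):
    table, row = tagged_row
    if table == "EMP":
        yield row[2], ("EMP", row[1])
    else:
        yield row[0], ("DEP", row[1])


def join_reduce(dep_id, tagged_values):
    names = [value for tag, value in tagged_values if tag == "EMP"]
    deps = [value for tag, value in tagged_values if tag == "DEP"]
    return [(name, dep) for name in names for dep in deps]


selected = run_job(EMPLOYEES, select_map, identity_reduce)
counts = run_job(EMPLOYEES, count_map, count_reduce)
tagged = [("EMP", e) for e in EMPLOYEES] + [("DEP", d) for d in DEPARTMENTS]
joined = run_job(tagged, join_map, join_reduce)

if __name__ == "__main__":
    print(counts)
    # -> [(30, 2), (90, 2), (100, 1)]
```

Because the join is an inner join, the Marketing department (dep_id 20) produces no output row: no employee shares its key, so its reducer has nothing to pair it with.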