Outline
1 Big Data
2 Hadoop
3 HDFS
4 Map Reduce
Big Data
One big elephant?
Or numerous small chickens?
Issues
How do we handle individual systems going up and down?
How do we combine the data from all the systems?
Map Reduce
HADOOP
History
Doug Cutting
Created an open-source version of the MapReduce system, called Hadoop
Yahoo and others rallied around to support this effort
Hadoop is now a core part of the infrastructure at Facebook, Yahoo, LinkedIn, and Twitter
Core Concepts
HDFS
Map Reduce
HDFS
Hadoop Distributed File System
Runs on a commodity cluster
No high-end servers required
Commodity hardware means a high chance of failure, but HDFS is fault tolerant
Blocks are replicated across data nodes
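To make the replication point concrete, here is a minimal toy sketch in Python. It is not HDFS code: the helper names (`place_replicas`, `readable_blocks`), the node and block names, and the random placement are all assumptions made for illustration. The replication factor of 3 matches the HDFS default; the real placement policy is rack-aware rather than random.

```python
import random

def place_replicas(block_ids, nodes, replication=3, seed=0):
    """Toy placement: put each block on `replication` distinct nodes."""
    rng = random.Random(seed)
    return {b: set(rng.sample(nodes, replication)) for b in block_ids}

def readable_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live node."""
    return {b for b, holders in placement.items() if holders - failed}

# Hypothetical five-node commodity cluster holding eight blocks.
nodes = [f"datanode{i}" for i in range(1, 6)]
blocks = [f"blk_{i}" for i in range(8)]
placement = place_replicas(blocks, nodes)

# Two nodes fail; with three replicas per block, every block stays readable.
survivors = readable_blocks(placement, failed={"datanode1", "datanode2"})
```

With three replicas on distinct nodes, any two simultaneous node failures leave at least one copy of every block, which is why cheap, failure-prone machines are acceptable.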
Services
Masters: Name Node, Secondary Name Node, Job Tracker
Slaves: Data Node, Task Tracker
Name Node: the master node; maintains the file system namespace (the metadata)
Secondary Name Node: periodically updates the fsimage file (a checkpoint of the namespace)
Data Node: the slaves; provide the actual storage
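The split between metadata and storage can be sketched with a small in-memory model. This is an illustrative toy, not the HDFS API: the `NameNode` class, the `put` and `get` helpers, the 4-byte block size, and the round-robin placement are all assumptions chosen to keep the example short.

```python
class NameNode:
    """Holds metadata only: which blocks make up a file and where they live."""

    def __init__(self, datanode_names, replication=2):
        self.datanode_names = datanode_names
        self.replication = replication
        self.namespace = {}   # file path -> ordered list of block ids
        self.locations = {}   # block id -> names of data nodes holding it

    def allocate(self, path, num_blocks):
        ids = []
        for n in range(num_blocks):
            block_id = f"{path}#blk{n}"
            # Round-robin placement, a stand-in for the real placement policy.
            self.locations[block_id] = [
                self.datanode_names[(n + r) % len(self.datanode_names)]
                for r in range(self.replication)
            ]
            ids.append(block_id)
        self.namespace[path] = ids
        return ids

def put(namenode, datanodes, path, data, block_size=4):
    """Client side: ask the name node where blocks go, then store the bytes."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for block_id, chunk in zip(namenode.allocate(path, len(chunks)), chunks):
        for name in namenode.locations[block_id]:
            datanodes[name][block_id] = chunk   # data nodes hold the actual bytes

def get(namenode, datanodes, path):
    """Client side: look up block locations, then read from the first replica."""
    return b"".join(
        datanodes[namenode.locations[b][0]][b] for b in namenode.namespace[path]
    )

# Each data node is modelled as a plain dict of block id -> bytes.
datanodes = {"dn1": {}, "dn2": {}, "dn3": {}}
nn = NameNode(list(datanodes))
put(nn, datanodes, "/logs/a.txt", b"hello hadoop")
```

Note that the name node never touches file contents here: it only records the block list and the block locations, while the data node dicts carry the bytes.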
HDFS Architecture
Map Reduce
Job Tracker: the master; manages the jobs in the cluster
Task Tracker: the slaves; responsible for running the map and reduce tasks
Map Phase
map(inKey, inValue) -> list(outKey, intermediateValue)
Processes an input key/value pair
Produces a set of intermediate pairs
Reduce Phase
reduce(outKey, list(intermediateValue)) -> list(outValue)
Combines all intermediate values for a particular key
Produces a set of merged output values (usually just one)
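The two signatures above can be illustrated with the classic word count, written as a single-process Python sketch. This is not the Hadoop API: `map_fn`, `reduce_fn`, and `run_job` are hypothetical names, and the in-memory grouping step stands in for the shuffle that a real cluster performs between the two phases.

```python
from collections import defaultdict

def map_fn(in_key, in_value):
    """map(inKey, inValue) -> list(outKey, intermediateValue)"""
    for word in in_value.lower().split():
        yield word, 1

def reduce_fn(out_key, intermediate_values):
    """reduce(outKey, list(intermediateValue)) -> list(outValue)"""
    return [sum(intermediate_values)]

def run_job(records, map_fn, reduce_fn):
    # Map phase: turn every input pair into intermediate pairs,
    # grouped by key (the stand-in for the shuffle).
    groups = defaultdict(list)
    for in_key, in_value in records:
        for out_key, value in map_fn(in_key, in_value):
            groups[out_key].append(value)
    # Reduce phase: merge all intermediate values for each key.
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}

# Input pairs are (line number, line text).
records = [(0, "big data big cluster"), (1, "big elephant")]
counts = run_job(records, map_fn, reduce_fn)
# counts == {"big": [3], "cluster": [1], "data": [1], "elephant": [1]}
```

Each reduce call returns a one-element list, matching the "usually just one" merged output value per key.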
Happy Hadooping.... :)