Professional Documents
Culture Documents
1 - Introduction to Hadoop
A. Hammad, A. Garca | September 7, 2011
S TEINBUCH C ENTRE
FOR
C OMPUTING (SCC)
www.kit.edu
What is it for?
Data intensive computing on commodity hardware
Yahoos (re)implementation of Googles Map-Reduce
simple-process huge amounts of data in efficient way
highly scalable filesystem, computing coupled to storage
September 7, 2011
Hadoop tutorial
(Split step)
Map step
(Shuffle step)
Reduce step
September 7, 2011
Hadoop tutorial
Scalability
September 7, 2011
Hadoop tutorial
Scalability
September 7, 2011
Hadoop tutorial
Applications
September 7, 2011
Hadoop tutorial
Use cases
Data size
Access
Updates
Structure
Integrity
Scaling
September 7, 2011
Hadoop tutorial
Traditional RDBMS
MapReduce
Gigabytes
Interactive and batch
Read and write many times
Static schema
High
Nonlinear
Petabytes
Batch
Write once, read many times
Dynamic schema
Low
Linear
September 7, 2011
Hadoop tutorial
More info
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
September 7, 2011
Hadoop tutorial
Questions?
10
September 7, 2011