Database vs. Hadoop
Why is Hadoop able to compete?
What is Hadoop
• Hadoop is a software framework for distributed processing
of large datasets across large clusters of computers
– Large datasets: terabytes or petabytes of data
– Large clusters: hundreds or thousands of nodes
• Hadoop is an open-source implementation of Google's
MapReduce
• Hadoop is based on a simple programming model called
MapReduce
• Hadoop is based on a simple data model: any data will fit
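To make the MapReduce programming model concrete, here is a minimal single-machine sketch of a word count in plain Python. It does not use Hadoop's own API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. In a real cluster the framework would run the map and reduce steps on many nodes and do the grouping itself.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key (done by the framework between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values that share a key into one result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of text records."""
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

The user writes only the map and reduce functions; everything else (splitting input, grouping by key, scheduling) is the framework's job, which is what makes the model simple.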
What is Hadoop (Cont’d)
Hadoop Master/Slave Architecture
• Hadoop is designed as a master-slave shared-nothing architecture
Design Principles of Hadoop
Hadoop: How it Works
Hadoop Distributed File System (HDFS)
Centralized namenode
- Maintains metadata info about files
[Figure: File F divided into blocks 1 2 3 4 5]
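The namenode's role as the keeper of file metadata can be sketched as a toy in-memory table that maps each file name to its list of block numbers. This is an illustrative model only, not the HDFS API: the NameNode class, its methods, and the block size passed in are all assumptions made for the example.

```python
import math


class NameNode:
    """Toy metadata store: file name -> ordered list of block numbers."""

    def __init__(self, block_size):
        # block_size is an assumed parameter, in the same units as file sizes
        self.block_size = block_size
        self.files = {}

    def add_file(self, name, size):
        """Record a file and the numbered blocks it is split into."""
        n_blocks = max(1, math.ceil(size / self.block_size))
        self.files[name] = list(range(1, n_blocks + 1))
        return self.files[name]

    def blocks_of(self, name):
        """Look up which blocks make up a file (metadata only, no data)."""
        return self.files[name]
```

For example, with an assumed block size of 64 units, a file F of size 300 is recorded as blocks 1 through 5. The namenode holds only this mapping; the block contents would live on other nodes.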
Main Properties of HDFS