Apache Hadoop
- Kumaresan Manickavelu
 
Problems With Scale
Failure is the defining difference between distributed and local programming
If components fail, their workload must be picked up by still-functioning units
Nodes that fail and restart must be able to rejoin the group activity without a full group restart
Increased load should cause a graceful decline in performance, not an abrupt failure
Increasing resources should support a proportional increase in load capacity
Data must be stored and shared with the processing units
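
The requirement that surviving units pick up the work of failed components can be sketched in a few lines. This is a toy, single-process illustration and not how Hadoop schedules work; the `reassign` function, the node names, and the task names are all hypothetical.

```python
def reassign(assignments, failed):
    """Move tasks owned by failed nodes onto the least-loaded surviving nodes."""
    # Keep only the nodes that are still functioning (copy their task lists).
    survivors = {n: list(t) for n, t in assignments.items() if n not in failed}
    if not survivors:
        raise RuntimeError("no functioning nodes left")
    # Collect the tasks that lost their owner.
    orphaned = [t for n in sorted(failed) for t in assignments.get(n, [])]
    for task in orphaned:
        # Pick the survivor with the fewest tasks; break ties by node name.
        target = min(survivors, key=lambda n: (len(survivors[n]), n))
        survivors[target].append(task)
    return survivors

# Hypothetical cluster: node-c fails and its two tasks are redistributed.
cluster = {"node-a": ["t1", "t2"], "node-b": ["t3"], "node-c": ["t4", "t5"]}
after = reassign(cluster, failed={"node-c"})
```

No restart of the remaining nodes is needed: only the orphaned tasks move, which is the behaviour the bullets above ask for.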
 
Hadoop Ecosystem
Apache Hadoop is a collection of open-source software for reliable, scalable, distributed computing.
Hadoop Common: The common utilities that support the other Hadoop subprojects.
HDFS: A distributed file system that provides high-throughput access to application data.
MapReduce: A software framework for distributed processing of large data sets on compute clusters.
Pig: A high-level data-flow language and execution framework for parallel computation.
HBase: A scalable, distributed database that supports structured data storage for large tables.
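
To make the MapReduce entry concrete, here is a minimal word-count sketch of the map, shuffle, and reduce steps. It runs in a single Python process and does not use Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are chosen for illustration only.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    # In a real cluster each line (or block of lines) would be mapped on a
    # different node; here everything runs sequentially in one process.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["Hadoop stores data", "Hadoop processes data"])
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, both steps can be spread across many machines, which is what lets the framework scale to large data sets.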