The key to Hadoop's scalability is bringing data and processing together. Its two major
components are:
1. Hadoop Distributed File System (HDFS): This is where our data is stored,
in a distributed file system. It provides scalability, redundancy and fault
tolerance.
2. Hadoop MapReduce: Data in the Hadoop framework is processed through
the Hadoop MapReduce framework, which allows programming for
large-scale data sets in a distributed manner. MapReduce brings the
computing TO the data, minimizing communication and transportation of
data.
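To make the MapReduce model above concrete, here is a minimal in-memory sketch of its three phases (map, shuffle, reduce) as a word count, the classic MapReduce example. This is a plain Python illustration of the programming model, not the Hadoop Java API; all function names here are our own.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as Hadoop does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate the values for one key (sum the counts).
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data", "big compute"])
print(counts)  # {'big': 2, 'data': 1, 'compute': 1}
```

In real Hadoop, each mapper runs on the node holding its block of data (computing brought to the data), and only the shuffled key/value pairs move across the network.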
The basic Hadoop stack has shifted to Hadoop 2.0, where MapReduce has been split
into the components described above. YARN is a resource management system that
allows more flexibility in how we submit jobs; we can map and reduce data across
multiple nodes, etc.
The Apache Hadoop Ecosystem and newer tools (built on top of the framework):
These tools allow us to perform more complex analysis of the data.