
Hadoop is a framework for managing big data. Its primary goal is to allow
distributed processing of large datasets across clusters of commodity
servers. It helps us solve data problems that cannot be computed on a
regular single computer. Its features are:
1. Open Source
2. Reliable
3. Scalable (add servers to increase performance linearly without disturbing
current processing)
4. Highly Fault Tolerant (if one server fails, processing continues on the other
servers)
5. Schema free (manages structured and unstructured data)

The key to Hadoop's scalability is bringing data and processing together. Its two major
components are:
1. Hadoop Distributed File System (HDFS): This is where our data is stored, as a
distributed file system. It provides scalability, redundancy, and fault
tolerance (see the HDFS sketch after this list).
2. Hadoop MapReduce: Processing of data in Hadoop is done through the
MapReduce framework, which lets us program against large-scale datasets in a
distributed manner. MapReduce brings the computing TO the data, which
minimizes communication and transportation of data (see the WordCount
sketch that follows the list).
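
To make the HDFS side concrete, here is a minimal sketch using the Hadoop FileSystem Java API to write a file and read it back. The NameNode address hdfs://namenode:9000 and the path /user/demo/hello.txt are placeholder assumptions; real values come from your cluster's configuration files.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHello {
    public static void main(String[] args) throws Exception {
        // Point the client at the cluster; "hdfs://namenode:9000" is a placeholder address.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        // Write a small file; HDFS replicates its blocks across DataNodes for redundancy.
        Path file = new Path("/user/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("Hello HDFS".getBytes(StandardCharsets.UTF_8));
        }

        // Read it back; the client fetches blocks from whichever replicas are available.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[32];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
        }
    }
}
```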
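
For the MapReduce side, here is a minimal sketch along the lines of Hadoop's classic WordCount example, using the standard org.apache.hadoop.mapreduce Java API. It assumes the input and output HDFS directories are passed as command-line arguments.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: runs on the nodes holding the input blocks and emits (word, 1) pairs.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sums the counts for each word after the shuffle groups them by key.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);  // combiner cuts data moved in the shuffle
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory on HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory on HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a jar and launched with the hadoop jar command, the map tasks are scheduled on the nodes that already hold the input blocks, so only the much smaller (word, count) pairs travel across the network to the reducers.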

The basic Hadoop stack has shifted to Hadoop 2.0, where the original MapReduce layer
has been split into a processing layer and a resource-management layer. YARN is the
resource-management system, and it allows more flexibility in the way we submit jobs;
we can run map and reduce tasks across multiple nodes.

The Apache Hadoop ecosystem and newer tools (built on top of the framework):
these tools allow us to perform more complex analysis of the data.
