
Full name: EL AMINE MEHDI

Apache Hadoop is an open-source, scalable, and fault-tolerant framework written in Java. It efficiently processes large volumes of data on a cluster of commodity hardware. Hadoop is not only a storage system; it is a platform for both large-scale data storage and data processing.
In this lecture, we take a look at how Apache Hadoop works under the
hood. When Apache Hadoop is fed a huge file, the framework divides that
chunk of big data into smaller pieces and stores them across multiple
machines so they can be processed in parallel. This is why Hadoop
interconnects an army of widely available and relatively inexpensive
machines that form a Hadoop cluster. No matter the size of the file the
user feeds to Hadoop, each cluster accommodates three functional layers:
the Hadoop Distributed File System (HDFS) for data storage, Hadoop
MapReduce for processing, and Hadoop YARN for resource management.
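To illustrate the processing layer, here is a minimal word-count sketch using the standard Hadoop MapReduce Java API. The class name and the input/output paths passed on the command line are placeholders for illustration; the sketch assumes a Hadoop client environment with the MapReduce libraries on the classpath.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // The map phase runs in parallel, one task per input split (block).
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer it = new StringTokenizer(value.toString());
      while (it.hasMoreTokens()) {
        word.set(it.nextToken());
        context.write(word, ONE); // emit (word, 1) for every token
      }
    }
  }

  // The reduce phase sums the counts emitted for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Input and output paths are placeholders supplied at submission time.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```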
Then we get a brief introduction to HDFS, a distributed file system that
follows a master/slave architecture. It consists of a single NameNode and
many DataNodes. In the HDFS architecture, a file is divided into one or
more blocks of 128 MB (the size can be changed in the configuration)
and stored on separate DataNodes. DataNodes are responsible for
operations such as block creation, deletion, and replication according to
the NameNode's instructions. Apart from that, they are responsible for
performing read and write operations on the file system.
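To make the configurable block size concrete, the sketch below writes a file through the HDFS client API in Java with an overridden block size. The property name dfs.blocksize is the standard HDFS setting, but the NameNode URI and file path are hypothetical placeholders.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Override the default 128 MB block size (value is in bytes): here 256 MB.
    conf.set("dfs.blocksize", String.valueOf(256L * 1024 * 1024));

    // "hdfs://namenode:9000" is a placeholder for the cluster's NameNode address.
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);

    // Files larger than the block size are split into blocks spread across DataNodes.
    try (FSDataOutputStream out = fs.create(new Path("/user/demo/sample.txt"))) {
      out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
    }
  }
}
```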
The NameNode acts as the master server and the central controller for HDFS. It
holds the file system metadata and maintains the file system namespace.
The NameNode monitors the health of the DataNodes and coordinates
access to data.
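One way to see the NameNode's metadata role is to ask it where a file's blocks live: the client queries the NameNode for block locations, then reads the data directly from the DataNodes it names. The sketch below uses the standard FileSystem API; the NameNode URI and file path are assumptions for illustration.

```java
import java.net.URI;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder NameNode address; real clusters configure this in core-site.xml.
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);

    FileStatus status = fs.getFileStatus(new Path("/user/demo/sample.txt"));

    // The NameNode answers this metadata query; the blocks themselves stay on DataNodes.
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.printf("offset=%d length=%d hosts=%s%n",
          block.getOffset(), block.getLength(), Arrays.toString(block.getHosts()));
    }
  }
}
```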
