
HDFS

• Features of HDFS
– HDFS is a distributed file system which is horizontally scalable and
reliable.
– Data in HDFS is stored on multiple nodes in a distributed manner.
– HDFS is developed in Java.
– The architecture of HDFS is inspired by the Google File System.

Dr.N.G.P. Arts and Science College, Coimbatore, Tamil Nadu, India
Hadoop 1.x - HDFS

• Three major daemons perform the responsibilities of HDFS in Hadoop 1.x.
• They are
– NameNode
– DataNode
– Secondary NameNode


• NameNode
– It is the master server and maintains the metadata of the stored files.
– For each data file, this metadata includes the name, size, owner, group and permissions.
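
The metadata role described above can be pictured with a small Python sketch. It is a toy model, not the real NameNode: the class names `FileMetadata` and `ToyNameNode`, the field names and the sample path are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMetadata:
    """Illustrative per-file record; the field names are assumptions."""
    name: str
    size: int          # size in bytes
    owner: str
    group: str
    permissions: str   # e.g. "rw-r--r--"


class ToyNameNode:
    """Keeps metadata only; the file contents live on the DataNodes."""

    def __init__(self):
        self._namespace = {}

    def register(self, meta):
        # Record the metadata under the file's name.
        self._namespace[meta.name] = meta

    def lookup(self, name):
        # Return the metadata record for a given file name.
        return self._namespace[name]


namenode = ToyNameNode()
namenode.register(
    FileMetadata("/logs/app.log", 1024, "hadoop", "supergroup", "rw-r--r--")
)
print(namenode.lookup("/logs/app.log").owner)  # prints "hadoop"
```

The point of the sketch is that the master holds a description of each file, while the file data itself is kept elsewhere.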


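• Secondary NameNode
– It typically runs on a separate machine in the cluster.
– It maintains an up-to-date copy of the metadata for the NameNode, i.e. it periodically reads the file system edits and merges them into the stored metadata to create a checkpoint.
– If the existing NameNode fails, the most recent checkpoint can be used to set up a new NameNode. The Secondary NameNode is not a hot standby and does not take over automatically.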
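
The idea of applying file system edits to produce updated metadata can be sketched as follows. This is a simplified model under stated assumptions: the metadata is a plain dictionary from path to size, each edit is a hypothetical `(operation, path, size)` tuple, and the helper name `apply_edits` is invented for this example. The real on-disk formats are different.

```python
def apply_edits(fsimage, edits):
    """Return a new metadata snapshot with the logged edits applied in order."""
    merged = dict(fsimage)  # copy, so the old snapshot is left untouched
    for op, path, size in edits:
        if op == "create":
            merged[path] = size
        elif op == "delete":
            merged.pop(path, None)
    return merged


fsimage = {"/a.txt": 100}                      # last saved metadata
edits = [("create", "/b.txt", 250),            # changes logged since then
         ("delete", "/a.txt", None)]

checkpoint = apply_edits(fsimage, edits)
print(checkpoint)  # {'/b.txt': 250}
```

A new NameNode started from `checkpoint` would see the namespace as it stood after the last merged edit.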


• DataNode
– It runs on the slave nodes.
– The machines which host the DataNode daemon store the data files.
– It provides access to the files when requested by the client.
– The DataNode periodically sends heartbeat messages to the
NameNode indicating that it is alive. 
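
The heartbeat mechanism can be illustrated with a minimal sketch. The class `HeartbeatMonitor`, the node identifiers and the 10-second timeout are assumptions made for this example, and time is passed in explicitly so the behaviour is easy to follow. Real clusters use configurable intervals.

```python
class HeartbeatMonitor:
    """Toy stand-in for the NameNode's view of which DataNodes are alive."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self._last_seen = {}

    def heartbeat(self, datanode_id, now):
        # Record the time of the latest heartbeat from this DataNode.
        self._last_seen[datanode_id] = now

    def live_nodes(self, now):
        # A node counts as alive if its last heartbeat is recent enough.
        return sorted(
            node for node, seen in self._last_seen.items()
            if now - seen <= self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=10)
monitor.heartbeat("dn1", now=0)
monitor.heartbeat("dn2", now=0)
monitor.heartbeat("dn1", now=8)    # dn2 sends nothing further
print(monitor.live_nodes(now=15))  # ['dn1']
```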

Hadoop 1.x – MapReduce

• MapReduce is the data processing layer of Hadoop.
• In Hadoop 1.x it is carried out by two daemons:
– Job Tracker
– Task Tracker


• Job Tracker
– A single Job Tracker instance runs on the master server.
– The job to be executed is first submitted to the Job Tracker.
– The Job Tracker then schedules the job's tasks on the Task Trackers running on the DataNode machines in the cluster.


• Task Tracker
– The Task Tracker runs on the slave nodes alongside the DataNode daemons.
– It starts the tasks assigned to it by the Job Tracker.
– It reports the status of the tasks running on its slave machine back to the Job Tracker.
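
The interaction between the two daemons can be sketched in a few lines. Everything here is hypothetical: the classes `ToyJobTracker` and `ToyTaskTracker`, the round-robin assignment and the status strings are illustrative choices, not Hadoop's actual scheduling logic or API.

```python
class ToyTaskTracker:
    """Runs the tasks it is given and returns a status for each."""

    def __init__(self, name):
        self.name = name

    def run(self, task):
        try:
            task()
            return "SUCCEEDED"
        except Exception:
            return "FAILED"


class ToyJobTracker:
    """Single master: accepts a job, hands out tasks, collects statuses."""

    def __init__(self, trackers):
        self.trackers = trackers
        self.status = {}

    def submit(self, job_name, tasks):
        # Assumption: tasks are assigned round-robin across the trackers.
        for i, task in enumerate(tasks):
            tracker = self.trackers[i % len(self.trackers)]
            self.status[(job_name, i)] = (tracker.name, tracker.run(task))
        return self.status


def failing_task():
    raise RuntimeError("simulated task failure")


job_tracker = ToyJobTracker([ToyTaskTracker("tt1"), ToyTaskTracker("tt2")])
report = job_tracker.submit("wordcount", [lambda: None, lambda: None, failing_task])
for key, value in sorted(report.items()):
    print(key, value)
```

Each entry in `report` records which Task Tracker ran a task and the status it returned to the Job Tracker.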
