UNIT II Hadoop
• Features of HDFS
– HDFS is a distributed file system which is horizontally scalable and
reliable.
– Data in HDFS is stored on multiple nodes in a distributed manner.
– HDFS is developed in Java.
– The architecture of HDFS is inspired by the Google File System (GFS).
– Its main daemons are the NameNode, the Secondary NameNode and the DataNode.
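To make "stored on multiple nodes in a distributed manner" concrete, here is a minimal Python sketch. It assumes that a file is cut into fixed-size blocks and that each block is copied to several nodes. The block size, the replication count, the node names and the round-robin placement are all illustrative assumptions; real HDFS placement is more elaborate (it takes racks into account, for instance).

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (illustrative)
REPLICATION = 3                 # assumed number of copies kept per block


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of `file_size` bytes would occupy."""
    sizes = []
    remaining = file_size
    while remaining > 0:
        sizes.append(min(block_size, remaining))
        remaining -= block_size
    return sizes


def place_blocks(num_blocks, nodes, replication=REPLICATION):
    """Simplified round-robin placement: block i is copied to `replication`
    distinct nodes, starting at node i. Not the real HDFS policy."""
    copies = min(replication, len(nodes))
    return {
        i: [nodes[(i + r) % len(nodes)] for r in range(copies)]
        for i in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)
    print(len(blocks), "blocks")
    print(place_blocks(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

Because every block lives on more than one node, losing a single machine does not lose data, and adding machines adds capacity. That is the sense in which the design is reliable and horizontally scalable.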
• NameNode
– It is the master server and maintains the metadata of the stored
files.
– For each data file, the NameNode stores metadata such as the name, size,
owner, group and permissions.
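The per-file metadata listed above can be pictured as a simple record kept in an in-memory table. The sketch below is only an illustration: `FileMetadata` and `NameNodeSketch` are hypothetical names, not Hadoop classes, and the real NameNode keeps more information than these five fields.

```python
from dataclasses import dataclass


@dataclass
class FileMetadata:
    """Hypothetical record holding the fields named in the notes."""
    name: str
    size: int          # size in bytes
    owner: str
    group: str
    permissions: str   # e.g. "rw-r--r--"


class NameNodeSketch:
    """Toy master that holds metadata only; it never stores file contents."""

    def __init__(self):
        self._files = {}  # file name -> FileMetadata

    def register(self, meta):
        self._files[meta.name] = meta

    def stat(self, name):
        return self._files[name]


if __name__ == "__main__":
    nn = NameNodeSketch()
    nn.register(FileMetadata("/logs/app.log", 2048, "alice", "staff", "rw-r--r--"))
    print(nn.stat("/logs/app.log"))
```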
• Secondary NameNode
– It runs on a separate machine in the cluster.
– It checkpoints the metadata for the NameNode: it reads the file system
edit log, applies the edits to the current file system image, and
hands the updated metadata back to the NameNode.
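The idea of reading the file system edits and producing updated metadata can be sketched as replaying a log over a snapshot. This is a simplified model under stated assumptions: the metadata image is a plain dictionary, each edit is a hypothetical `(operation, path, metadata)` tuple, and only `create` and `delete` are modelled.

```python
def checkpoint(image, edits):
    """Return a new metadata image with every logged edit applied in order.

    `image` maps file path -> metadata dict; `edits` is a list of
    (operation, path, metadata) tuples. Both formats are illustrative.
    """
    updated = dict(image)  # copy, so the old image stays untouched
    for operation, path, metadata in edits:
        if operation == "create":
            updated[path] = metadata
        elif operation == "delete":
            updated.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return updated


if __name__ == "__main__":
    old_image = {"/a.txt": {"size": 10}}
    edit_log = [
        ("create", "/b.txt", {"size": 20}),
        ("delete", "/a.txt", None),
    ]
    print(checkpoint(old_image, edit_log))
```

Once the edits are folded into the image, the log can start again from empty, which keeps it from growing without bound.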
• DataNode
– It runs on the slave nodes.
– The machines that host the DataNode daemon store the actual file data.
– It serves that data to clients on request.
– The DataNode periodically sends heartbeat messages to the
NameNode indicating that it is alive.
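The heartbeat mechanism can be sketched as a table of "last seen" timestamps on the master side. `HeartbeatMonitor`, the node names and the timeout value are hypothetical and chosen for illustration; timestamps are passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    """Toy master-side bookkeeping for DataNode heartbeats."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds  # assumed value, not a Hadoop default
        self.last_seen = {}             # node id -> time of last heartbeat

    def record_heartbeat(self, node_id, now):
        self.last_seen[node_id] = now

    def live_nodes(self, now):
        """Nodes whose last heartbeat is at most `timeout` seconds old."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen <= self.timeout
        )

    def dead_nodes(self, now):
        """Nodes that have been silent for longer than `timeout` seconds."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_seconds=30)
    monitor.record_heartbeat("dn1", now=0)
    monitor.record_heartbeat("dn2", now=0)
    monitor.record_heartbeat("dn1", now=25)  # dn2 stays silent
    print(monitor.live_nodes(now=40), monitor.dead_nodes(now=40))
```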
• Job Tracker
– The Job Tracker runs as a single instance on the master server.
• Task Tracker
– The Task Tracker runs on the slave nodes along with the DataNode
daemons.
– The Task Tracker launches the tasks assigned to it by the Job Tracker.
– The Task Tracker reports the status of the tasks running on its slave
machine back to the Job Tracker.
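The division of labour between the two daemons can be sketched as one coordinator handing tasks to several workers and collecting their status. This is a single-process toy under stated assumptions: the class bodies, the round-robin assignment, the status strings and the node names are all illustrative, and tasks run one after another rather than in parallel.

```python
class TaskTracker:
    """Toy worker: launches assigned tasks and remembers their status."""

    def __init__(self, node):
        self.node = node
        self.status = {}  # task id -> "RUNNING" or "DONE"

    def launch(self, task_id, work):
        self.status[task_id] = "RUNNING"
        result = work()  # run the task's function
        self.status[task_id] = "DONE"
        return result

    def report(self):
        """Return a copy of the status table for the Job Tracker."""
        return dict(self.status)


class JobTracker:
    """Toy master: a single instance that assigns tasks to trackers."""

    def __init__(self, trackers):
        self.trackers = trackers
        self.task_status = {}  # task id -> (node, status)

    def run_job(self, tasks):
        results = {}
        for i, (task_id, work) in enumerate(tasks.items()):
            tracker = self.trackers[i % len(self.trackers)]  # round-robin
            results[task_id] = tracker.launch(task_id, work)
        for tracker in self.trackers:
            for task_id, state in tracker.report().items():
                self.task_status[task_id] = (tracker.node, state)
        return results


if __name__ == "__main__":
    job_tracker = JobTracker([TaskTracker("slave1"), TaskTracker("slave2")])
    print(job_tracker.run_job({"t1": lambda: 1 + 1, "t2": lambda: 2 * 3}))
    print(job_tracker.task_status)
```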