
ANATOMY OF FILE WRITE AND READ
• HDFS has a master-slave architecture: the NameNode acts as the master and
the DataNodes act as slaves.
• The NameNode holds all the metadata, while the DataNodes hold the actual data.
• An HDFS client interacts with HDFS to perform write and read operations
(a minimal client sketch in Java follows this list).
• In a write operation, data sent by the client is stored on DataNodes.
• In a read operation, data is retrieved from the DataNodes to the client
machine.
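As a rough illustration of this client-to-HDFS interaction, here is a minimal
sketch using the standard Hadoop FileSystem API. The NameNode URI and the
class name are assumptions made for the example, not values from these notes.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsClientDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The client contacts the NameNode (master) for all metadata;
        // actual file data flows between the client and the DataNodes (slaves).
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // hypothetical address
        try (FileSystem fs = FileSystem.get(conf)) {
            System.out.println("Connected to HDFS at " + fs.getUri());
        }
    }
}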
ANATOMY OF FILE WRITE

Step 1: The client creates a file by calling the create() method.
Step 2: DFS makes an RPC (Remote Procedure Call) to the NameNode to create a new
file (the NameNode performs various checks, e.g. that the file doesn't
already exist and that the client has permission to create it).
Step 3: As the client writes data, the DataOutputStream splits it into data
packets.
Step 4: The data packets are queued in a pipeline (the data queue) and stored on
DataNode 1, DataNode 2, and so on.
Step 5: After a data packet is stored on all the DataNodes, an acknowledgment
packet is sent back to the DataOutputStream so the packet can be removed from
the pipeline.
Step 6: When the client has finished writing data, it calls the close() method
on the stream.
Step 7: The FsImage (filesystem metadata) is updated in the NameNode (a write
sketch in Java follows these steps).
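The write path above can be sketched with the Hadoop FileSystem API. This is a
minimal, illustrative example: the NameNode URI, the file path, and the class
name are assumptions, not part of the original notes.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // hypothetical address
        FileSystem fs = FileSystem.get(conf);

        // Steps 1-2: create() makes the RPC to the NameNode, which checks
        // that the file does not exist and the client has permission.
        try (FSDataOutputStream out = fs.create(new Path("/user/demo/sample.txt"))) {
            // Steps 3-5: data written here is split into packets and
            // pipelined to DataNode 1, DataNode 2, and so on; acknowledged
            // packets are removed from the pipeline.
            out.writeUTF("hello HDFS");
        } // Step 6: close() flushes the remaining packets and completes the file.
        // Step 7: the NameNode then records the completed file in its metadata.
    }
}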
ANATOMY OF FILE READ
Step 1: The client calls the open() method to open a file.
Step 2: DFS calls the NameNode using RPC to determine the locations of the data
blocks. The NameNode returns the addresses of the DataNodes that hold each
block.
Step 3: The client then calls the read() method on the stream, a
DataInputStream that stores the addresses of the DataNodes holding the blocks.
Step 4: Data is streamed from the DataNodes back to the client.
Step 5: When the end of a block is reached, the DataInputStream closes the
connection with that DataNode.
Step 6: When the client has finished reading, it calls the close() method on
the DataInputStream (a read sketch in Java follows these steps).
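The read path can be sketched with the same API. As before, the NameNode URI,
the file path, and the class name are assumptions made for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // hypothetical address
        FileSystem fs = FileSystem.get(conf);

        // Steps 1-2: open() asks the NameNode (via RPC) for the addresses
        // of the DataNodes that hold each block of the file.
        try (FSDataInputStream in = fs.open(new Path("/user/demo/sample.txt"))) {
            // Steps 3-4: read calls stream the data from the DataNodes.
            System.out.println(in.readUTF());
        } // Steps 5-6: close() tears down the DataNode connections.
    }
}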
