
The Design of HDFS

Unit - 2
HDFS Concepts
The Hadoop Distributed File System (HDFS) is based on a distributed file system design and runs on commodity hardware. Unlike some other distributed systems, HDFS is highly fault tolerant and is designed to run on low-cost hardware. HDFS holds very large amounts of data and provides easy access to it. To store such huge data, the files are stored across multiple machines. The files are stored in a redundant fashion to protect the system from possible data loss in case of failure. HDFS also makes applications available for parallel processing.
Features of HDFS
● It is suitable for distributed storage and processing.
● Hadoop provides a command interface to interact with HDFS.
● The built-in servers of the namenode and datanode help users easily check the status of the cluster.
● It provides streaming access to file system data.
● HDFS provides file permissions and authentication.
HDFS Architecture
Given below is the architecture of a Hadoop File System.

HDFS follows a master-slave architecture and has the following elements.
Namenode

The namenode is commodity hardware that contains the GNU/Linux operating system and the namenode software. The namenode is software that can run on commodity hardware. The system hosting the namenode acts as the master server and performs the following tasks −
● Manages the file system namespace.
● Regulates clients' access to files.
● Executes file system operations such as renaming, closing, and opening files and directories.
Datanode
A datanode is commodity hardware running the GNU/Linux operating system and the datanode software. For every node (commodity hardware/system) in a cluster, there is a datanode. These nodes manage the data storage of their system.
● Datanodes perform read-write operations on the file system, as per client requests.
● They also perform operations such as block creation, deletion, and replication according to the instructions of the namenode.
Block
Generally, user data is stored in the files of HDFS. A file in the file system is divided into one or more segments and stored on individual data nodes. These file segments are called blocks. In other words, the minimum amount of data that HDFS can read or write is called a block. The default block size is 64 MB (128 MB in Hadoop 2 and later), but it can be increased as needed by changing the HDFS configuration.
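As a rough illustration, the configured block size can be inspected programmatically through the FileSystem API. This is a minimal sketch, assuming the default configuration files are on the classpath; the path /example is only a placeholder.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockSize {
    public static void main(String[] args) throws Exception {
        // Loads core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Default block size that would be used for files under this path
        long blockSize = fs.getDefaultBlockSize(new Path("/example"));
        System.out.println("Default block size: " + blockSize + " bytes");
    }
}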

Goals of HDFS
Fault detection and recovery − Since HDFS includes a large number of commodity hardware components, failures are frequent. Therefore HDFS should have mechanisms for quick and automatic fault detection and recovery.
Huge datasets − HDFS should scale to hundreds of nodes per cluster to manage applications having huge datasets.
Hardware at data − A requested task can be done efficiently when the computation takes place near the data. Especially where huge datasets are involved, this reduces network traffic and increases throughput.

Command Line Interface to HDFS


HDFS is the primary storage component of the Hadoop ecosystem. It is responsible for storing large data sets of structured or unstructured data across various nodes, and it maintains the metadata in the form of log files. To use the HDFS commands, you first need to start the Hadoop services using the following command:
sbin/start-all.sh
To check that the Hadoop services are up and running, use the following command:
jps
Commands:
ls: This command is used to list the files in a directory. Use -ls -R (the older lsr form is deprecated) for a recursive listing; it is useful when we want the hierarchy of a folder.
Syntax:
bin/hdfs dfs -ls <path>
Example:

bin/hdfs dfs -ls /
mkdir: To create a directory. In Hadoop DFS there is no home directory by default, so let's first create it.
Syntax:
bin/hdfs dfs -mkdir <folder name>
Creating the home directory:
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/username -> write the username of your computer
Example:
bin/hdfs dfs -mkdir /geeks => '/' means absolute path
bin/hdfs dfs -mkdir geeks2 => Relative path -> the folder will be
created relative to the home directory.
touchz: It creates an empty file.
Syntax:
bin/hdfs dfs -touchz <file_path>
Example:
bin/hdfs dfs -touchz /geeks/myfile.txt
copyFromLocal (or) put: To copy files/folders from local file system to hdfs store. This is the most
important command. Local filesystem means the files present on the OS.
Syntax:
bin/hdfs dfs -copyFromLocal <local file path> <dest(present on hdfs)>
Example:
bin/hdfs dfs -copyFromLocal ../Desktop/AI.txt /geeks
(OR)
bin/hdfs dfs -put ../Desktop/AI.txt /geeks
cat: To print file contents.
Syntax:
bin/hdfs dfs -cat <path>
Example:
bin/hdfs dfs -cat /geeks/AI.txt
copyToLocal (or) get: To copy files/folders from hdfs store to local file system.
Syntax:
bin/hdfs dfs -copyToLocal <srcfile(on hdfs)> <local file dest>
Example:

bin/hdfs dfs -copyToLocal /geeks ../Desktop/hero
(OR)
bin/hdfs dfs -get /geeks/myfile.txt ../Desktop/hero
moveFromLocal: This command will move a file from the local file system to HDFS.
Syntax:
bin/hdfs dfs -moveFromLocal <local src> <dest(on hdfs)>
Example:
bin/hdfs dfs -moveFromLocal ../Desktop/cutAndPaste.txt /geeks
cp: This command is used to copy files within HDFS. Let's copy the folder /geeks to /geeks_copied.
Syntax:
bin/hdfs dfs -cp <src(on hdfs)> <dest(on hdfs)>
Example:
bin/hdfs dfs -cp /geeks /geeks_copied
mv: This command is used to move files within HDFS. Let's cut-paste the file myfile.txt from the /geeks folder to /geeks_copied.
Syntax:
bin/hdfs dfs -mv <src(on hdfs)> <dest(on hdfs)>
Example:
bin/hdfs dfs -mv /geeks/myfile.txt /geeks_copied
rmr: This command deletes a file or directory from HDFS recursively. It is a very useful command when you want to delete a non-empty directory. (In recent releases it is deprecated in favour of -rm -r.)
Syntax:
bin/hdfs dfs -rmr <filename/directoryName>
Example:
bin/hdfs dfs -rmr /geeks_copied -> deletes all the content inside the directory and then the directory itself.
du: It will give the size of each file in the directory.
Syntax:
bin/hdfs dfs -du <dirName>
Example:
bin/hdfs dfs -du /geeks
dus: This command will give the total size of a directory/file.
Syntax:

bin/hdfs dfs -dus <dirName>
Example:
bin/hdfs dfs -dus /geeks
stat: It will give the last modified time of a directory or path. In short, it gives the stats of the directory or file.
Syntax:
bin/hdfs dfs -stat <hdfs file>
Example:
bin/hdfs dfs -stat /geeks
setrep: This command is used to change the replication factor of a file/directory in HDFS. By default it is 3 for anything stored in HDFS (as set by dfs.replication in hdfs-site.xml).
Example 1: To change the replication factor to 6 for geeks.txt stored in HDFS.
bin/hdfs dfs -setrep -R -w 6 geeks.txt
Example 2: To change the replication factor to 4 for the directory /geeks stored in HDFS.
bin/hdfs dfs -setrep -R 4 /geeks
count: This command is used to count the number of directories and files inside the given directory. It also shows the size of the content in the directory.
Syntax:
bin/hdfs dfs -count <path>
Example:
bin/hdfs dfs -count /geeks
Anatomy of File Read and Write
Big data is a collection of data sets that are large, complex, and difficult to store and process using available data management tools or traditional data processing applications. Hadoop is an open source framework for writing, running, storing, and processing large datasets in a parallel and distributed manner. It is a solution used to overcome the challenges posed by big data.
Some of the characteristics of HDFS are:
● Fault-Tolerance
● Scalability
● Distributed Storage
● Reliability
● High availability
● Cost-effective
● High throughput
File read operation

File write operation

Java Interface to Hadoop


The Java interface is used for accessing the Hadoop file system programmatically. To interact with the Hadoop file system from code, Hadoop provides several Java classes. The package org.apache.hadoop.fs contains classes useful for manipulating files in the Hadoop file system.
An HDFS cluster primarily consists of a NameNode that manages the file system metadata and DataNodes that store the actual data.
A file in a Hadoop filesystem is represented by a Hadoop Path object. FileSystem is a general filesystem API, so the first step is to retrieve an instance for the filesystem we want to use (HDFS, in this case). There are several static factory methods for getting a FileSystem instance:
public static FileSystem get(Configuration conf) throws IOException
public static FileSystem get(URI uri, Configuration conf) throws IOException
public static FileSystem get(URI uri, Configuration conf, String user) throws IOException
public static LocalFileSystem getLocal(Configuration conf) throws IOException

A Configuration object encapsulates a client's or server's configuration, which is set using configuration files read from the classpath, such as core-site.xml. The first method returns the default filesystem (as specified in core-site.xml, or the default local filesystem if not specified there). The second uses the given URI's scheme and authority to determine the filesystem to use, falling back to the default filesystem if no scheme is specified in the given URI. The third retrieves the filesystem as the given user, which is important in the context of security. The fourth one retrieves a local filesystem instance.
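For instance, a minimal sketch of obtaining a FileSystem for an HDFS URI might look as follows; the hostname and port in the URI are placeholders for your own NameNode address.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class GetFileSystem {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();               // reads core-site.xml from the classpath
        FileSystem defaultFs = FileSystem.get(conf);             // default filesystem
        FileSystem hdfs = FileSystem.get(
                URI.create("hdfs://namenode:8020/"), conf);      // filesystem chosen by the URI scheme
        System.out.println(defaultFs.getUri());
        System.out.println(hdfs.getUri());
    }
}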
With a FileSystem instance in hand, we invoke an open() method to get the input stream for a file. The first method uses a default buffer size of 4 KB; the second one gives the user the option to specify the buffer size.
public FSDataInputStream open(Path f) throws IOException
public abstract FSDataInputStream open(Path f, int bufferSize) throws IOException
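Putting these pieces together, a minimal sketch for printing a file from HDFS to standard output could look like this; the path hdfs://namenode:8020/geeks/AI.txt is only an assumed example.
import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileSystemCat {
    public static void main(String[] args) throws Exception {
        String uri = "hdfs://namenode:8020/geeks/AI.txt";    // assumed example path
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        InputStream in = null;
        try {
            in = fs.open(new Path(uri));                     // returns an FSDataInputStream
            IOUtils.copyBytes(in, System.out, 4096, false);  // copy with a 4 KB buffer, keep System.out open
        } finally {
            IOUtils.closeStream(in);
        }
    }
}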

FSDataInputStream

The open() method on FileSystem actually returns an FSDataInputStream rather than a
standard java.io class. This class is a specialization of java.io.DataInputStream with
support for random access, so you can read from any part of the stream:
package org.apache.hadoop.fs;
public class FSDataInputStream extends DataInputStream implements Seekable, PositionedReadable {
}

The Seekable interface permits seeking to a position in the file and provides a query method for the current offset from the start of the file (getPos()). Calling seek() with a position that is greater than the length of the file will result in an IOException. Unlike the skip() method of java.io.InputStream, which positions the stream at a point later than the current position, seek() can move to an arbitrary, absolute position in the file.
public interface Seekable {
    void seek(long pos) throws IOException;
    long getPos() throws IOException;
}
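As a small illustration of seek(), the following sketch reads a file from HDFS twice by seeking back to the start of the stream after the first pass; again, the file path is only an assumption.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileSystemDoubleCat {
    public static void main(String[] args) throws Exception {
        String uri = "hdfs://namenode:8020/geeks/AI.txt";    // assumed example path
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);  // first pass
            in.seek(0);                                      // go back to the start of the file
            IOUtils.copyBytes(in, System.out, 4096, false);  // second pass
        } finally {
            IOUtils.closeStream(in);
        }
    }
}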

FSDataInputStream also implements the PositionedReadable interface for reading parts of a file at a given offset:
public interface PositionedReadable {
    public int read(long position, byte[] buffer, int offset, int length) throws IOException;
    public void readFully(long position, byte[] buffer, int offset, int length) throws IOException;
    public void readFully(long position, byte[] buffer) throws IOException;
}

The read() method reads up to length bytes from the given position in the file into the buffer at the given offset in the buffer. The return value is the number of bytes actually read; callers should check this value, as it may be less than length. The readFully() methods will read length bytes into the buffer, unless the end of the file is reached first, in which case an EOFException is thrown.
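A brief sketch of a positioned read, under the same assumed file path as above, might look like this; the offset and buffer size are arbitrary example values.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PositionedReadExample {
    public static void main(String[] args) throws Exception {
        String uri = "hdfs://namenode:8020/geeks/AI.txt";    // assumed example path
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        try (FSDataInputStream in = fs.open(new Path(uri))) {
            byte[] buffer = new byte[128];
            // Read up to 128 bytes starting at byte offset 1024 of the file;
            // the stream's current position is not changed by this call
            int bytesRead = in.read(1024L, buffer, 0, buffer.length);
            System.out.println("Read " + bytesRead + " bytes at offset 1024");
        }
    }
}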
Finally, bear in mind that calling seek() is a relatively expensive operation and should be done sparingly. You should structure your application access patterns to rely on streaming data (by using MapReduce, for example) rather than performing a large number of seeks.
Writing Data
The FileSystem class has a number of methods for creating a file. The simplest is the
method that takes a Path object for the file to be created and returns an output stream to
write to.
public FSDataOutputStream create(Path f) throws IOException

There are overloaded versions of this method that allow you to specify whether to
forcibly overwrite existing files, the replication factor of the file, the buffer size to use
when writing the file, the block size for the file, and file permissions.
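For illustration, here is a hedged sketch of one of these overloads; the path, replication factor, buffer size, and block size shown are only example choices, not recommendations.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateWithOptions {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/geeks/output.dat");   // placeholder path
        // Overwrite if the file exists, 4 KB buffer, replication factor 2, 128 MB blocks
        FSDataOutputStream out = fs.create(file, true, 4096, (short) 2, 128L * 1024 * 1024);
        out.writeUTF("hello");
        out.close();
    }
}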
Note: The create() methods create any parent directories of the file to be written that don't already exist. Though convenient, this behavior may be unexpected. If you want the write to fail when the parent directory doesn't exist, you should check for the existence of the parent directory first by calling the exists() method. Alternatively, use FileContext, which allows you to control whether parent directories are created or not.
There's also an overloaded method for passing a callback interface, Progressable, so your application can be notified of the progress of the data being written to the datanodes.
package org.apache.hadoop.util;
public interface Progressable {
    public void progress();
}

As an alternative to creating a new file, you can append to an existing file using the append() method (there are also some other overloaded versions):
public FSDataOutputStream append(Path f) throws IOException
The append operation allows a single writer to modify an already written file by opening it and writing data from the final offset in the file. With this API, applications that produce unbounded files, such as logfiles, can write to an existing file after having closed it. The append operation is optional and is not implemented by all Hadoop filesystems.
The example below shows how to copy a local file to a Hadoop filesystem. We illustrate progress by printing a period every time the progress() method is called by Hadoop, which is after each 64 KB packet of data is written to the datanode pipeline.
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

public class CopyFileWithProgress {
    public static void main(String[] args) throws Exception {
        String localSrc = args[0];                 // local file to copy
        String dst = args[1];                      // destination path on HDFS
        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        // create() returns an output stream; the Progressable callback prints a
        // period each time a packet of data is written to the datanode pipeline
        OutputStream out = fs.create(new Path(dst), new Progressable() {
            public void progress() {
                System.out.print(".");
            }
        });
        // Copy with a 4 KB buffer and close both streams when done
        IOUtils.copyBytes(in, out, 4096, true);
    }
}

FSDataOutputStream
The create() method on FileSystem returns an FSDataOutputStream, which, like
FSDataInputStream, has a method for querying the current position in the file:
package org.apache.hadoop.fs;
public class FSDataOutputStream extends DataOutputStream implements Syncable {
    public long getPos() throws IOException {
        // ...
    }
}

However, unlike FSDataInputStream, FSDataOutputStream does not permit seeking. This is because HDFS allows only sequential writes to an open file or appends to an already written file. In other words, there is no support for writing anywhere other than the end of the file, so there is no value in being able to seek while writing.
FileSystem also provides a method to create a directory:
public boolean mkdirs(Path f) throws IOException

This method creates all of the necessary parent directories if they don't already exist and returns true if it is successful.
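As a quick sketch of using mkdirs(), assuming default configuration on the classpath; the directory path is just a placeholder.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MakeDirectory {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Creates the directory and any missing parents; returns true on success
        boolean created = fs.mkdirs(new Path("/user/username/newdir"));  // placeholder path
        System.out.println("Directory created: " + created);
    }
}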
Hadoop File System
The Hadoop Distributed File System (HDFS) is the primary data storage system used
by Hadoop applications. HDFS employs a NameNode and DataNode architecture to
implement a distributed file system that provides high-performance access to data
across highly scalable Hadoop clusters.
Hadoop itself is an open source distributed processing framework that manages data
processing and storage for big data applications. HDFS is a key part of the Hadoop ecosystem of technologies. It provides a reliable means of managing pools of big data and supporting related big data analytics applications.
How does HDFS work?
HDFS enables the rapid transfer of data between compute nodes. At its outset, it was
closely coupled with MapReduce, a framework for data processing that filters and
divides up work among the nodes in a cluster, and it organizes and condenses the results
into a cohesive answer to a query. Similarly, when HDFS takes in data, it breaks the
information down into separate blocks and distributes them to different nodes in a
cluster.

With HDFS, data is written on the server once, and read and reused numerous times
after that. HDFS has a primary NameNode, which keeps track of where file data is kept
in the cluster.
HDFS also has multiple DataNodes on a commodity hardware cluster -- typically one
per node in a cluster. The DataNodes are generally organized within the same rack in
the data center. Data is broken down into separate blocks and distributed among the
various DataNodes for storage. Blocks are also replicated across nodes, enabling highly
efficient parallel processing.
The NameNode knows which DataNode contains which blocks and where the
DataNodes reside within the machine cluster. The NameNode also manages access to
the files, including reads, writes, creates, deletes and the data block replication across
the DataNodes.
The NameNode operates in conjunction with the DataNodes. As a result, the cluster can
dynamically adapt to server capacity demand in real time by adding or subtracting
nodes as necessary.
The DataNodes are in constant communication with the NameNode to determine if the
DataNodes need to complete specific tasks. Consequently, the NameNode is always
aware of the status of each DataNode. If the NameNode realizes that one DataNode isn't
working properly, it can immediately reassign that DataNode's task to a different node
containing the same data block. DataNodes also communicate with each other, which
enables them to cooperate during normal file operations.
Moreover, HDFS is designed to be highly fault-tolerant. The file system replicates
-- or copies -- each piece of data multiple times and distributes the copies to individual
nodes, placing at least one copy on a different server rack than the other copies.
Replica placement
HDFS, as the name says, is a distributed file system designed to store large files. A large file is divided into blocks of a defined size, and these blocks are stored across the machines in a cluster. These blocks of the file are replicated for reliability and fault tolerance. For better reliability, the Hadoop framework has a well-defined replica placement policy.
Rack-aware replica placement policy
Large HDFS instances run on a cluster of computers that commonly spreads across many racks, so rack awareness is also part of the replica placement policy in Hadoop.
If two nodes placed in different racks have to communicate, that communication has to go through switches.
If machines are on the same rack, the network bandwidth between those machines is generally greater than the network bandwidth between machines in different racks.
HDFS replica placement policy
Taking rack awareness and fault tolerance into consideration, the replica placement policy followed by the Hadoop framework is as follows:
For the default case, when the replication factor is three:
1. Put one replica on the same machine where the client application (the application using the file) is running, if the client is on a DataNode. Otherwise, choose a random DataNode for storing the replica.
2. Store another replica on a node in a different (remote) rack.
3. The last replica is also stored on the same remote rack, but on a different node.
If the replication factor is greater than 3, the policy described above is followed for the first 3 replicas. From replica number 4 onward, the node location is determined randomly while keeping the number of replicas per rack below the upper limit, which is basically (replicas - 1) / racks + 2.
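As a small worked sketch of that upper limit (integer division, with assumed example values for the replication factor and rack count):
public class ReplicasPerRackLimit {
    public static void main(String[] args) {
        int replicas = 10;   // assumed replication factor
        int racks = 3;       // assumed number of racks
        // Upper limit on replicas placed in any single rack
        int limit = (replicas - 1) / racks + 2;   // (10 - 1) / 3 + 2 = 5
        System.out.println("Max replicas per rack: " + limit);
    }
}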
HDFS Replication pipelining
While replicating blocks across DataNodes, HDFS uses pipelining. Rather than the client writing to all the chosen DataNodes, the data is pipelined from one DataNode to the next.
For the default replication factor of 3, the replication pipelining works as follows:
The NameNode provides a list of DataNodes that will host the replicas of a block. The client gets this list of 3 DataNodes from the NameNode and writes to the first DataNode in the list. The first DataNode starts receiving the data in portions, writes each portion to its local storage, and then transfers that portion to the second DataNode in the list. The second DataNode follows the same procedure: it writes the portion to its local storage and transfers the portion to the third DataNode in the list.

For a replication factor of 3, the following image shows the placement of replicas.

Coherency Model

Parallel Copying with distcp

Keeping an HDFS cluster Balanced

● Hadoop is an open source framework for storing and processing big data in a distributed and parallel environment; the data itself is stored in an HDFS cluster.
● Initially, the DataNodes in an HDFS cluster are roughly equally utilized.
● As the cluster is used, its storage becomes unbalanced, for example when DataNodes are added to the cluster or removed/deleted from it.
● Some DataNodes become almost full while others remain nearly empty.
Why HDFS data becomes unbalanced
The following are the reasons why HDFS clusters become unbalanced:
🡺 Addition of datanodes.
🡺 Block allocation in HDFS.
🡺 Behaviour of the client applications.

To overcome the above problems, Hadoop introduced a tool, i.e., the HDFS balancer.

● The HDFS balancer is a command-line tool for balancing the data across the storage of the DataNodes in an HDFS cluster.
● With use, an HDFS cluster becomes unbalanced because DataNodes are added to the cluster, DataNodes are removed from the cluster, or data is copied between them.
● Hadoop provides the balancer to redistribute the data, within a configurable threshold.
● To run the cluster balancing utility, run the balancer command shown below.
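A typical invocation looks like the following (a sketch; the available options depend on your Hadoop version and configuration):
bin/hdfs balancer -threshold 10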
NOTE
● The balancer balances data across the DataNodes of an HDFS cluster; it does not balance data between the individual disks within a single DataNode.
● Threshold − the allowed difference between the utilization of each DataNode and the total utilization of the HDFS cluster.
● The default threshold value is 10 (percent), and it is configurable.
