
Big Data Huawei Course

HDFS
NOTICE
This document was generated from Huawei study material. Treat the
information in this document as supporting material.

Centro de Inovação EDGE - Big Data Course


Table of Contents

1. HDFS – Hadoop Distributed File System
2. Basic System Architecture
3. HDFS Data Write Process
4. HDFS Data Read Process
5. Key Design of HDFS Architecture
5.1. HDFS High Availability (HA)
5.2. Metadata Persistence
5.3. HDFS Federation
5.4. Data Replication
5.5. Data Storage Policies
5.6. Unified File System
5.7. Space Reclamation
5.8. Data Organization
5.9. Access Mode




1. HDFS – Hadoop Distributed File System

• Highly fault-tolerant.
• Developed based on the Google File System (GFS) and runs on commodity hardware.
• Component failures are frequent, so high fault tolerance ensures the file system
remains accessible even if a node fails.
• The time to read the whole dataset matters more than the latency of reading the
first record.
• HDFS is unsuitable for storing massive numbers of small files.
o The file system metadata is loaded into the memory of a dedicated server
called the NameNode, so the number of files the system can store is limited
by NameNode memory capacity; each file, directory, and data block entry
takes about 150 bytes.
• HDFS is unsuitable for random writes.
o A file in HDFS is written by a single writer, and writes are always appended
at the end of the file. There is no support for multiple writers or for arbitrary
modifications within the file.
• HDFS is unsuitable for low-latency reads.
o Applications that require access latencies in the tens-of-milliseconds range
do not work well with HDFS.
• HDFS is well suited to storing large files with streaming data access.
• HDFS is optimized for high data throughput, which may come at the cost of
higher latency.
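The small-files limitation above can be made concrete with a back-of-the-envelope calculation. This sketch assumes the figure cited above of roughly 150 bytes of NameNode heap per file, directory, or block entry; the real overhead varies by Hadoop version.

```python
# Rough sketch of why many small files strain the NameNode.
# Assumes ~150 bytes of NameNode heap per metadata entry (file or block),
# as cited above; real per-entry cost varies by Hadoop version.

BYTES_PER_ENTRY = 150

def namenode_heap_bytes(num_files, blocks_per_file=1):
    """Approximate NameNode heap used by file + block metadata."""
    return num_files * (1 + blocks_per_file) * BYTES_PER_ENTRY

# 100 million single-block small files vs. the same block count in
# 100,000 large files of 1,000 blocks each:
small = namenode_heap_bytes(100_000_000)                 # ~30 GB of heap
large = namenode_heap_bytes(100_000, blocks_per_file=1000)  # ~15 GB of heap
print(small / 1e9, large / 1e9)
```

The block entries dominate either way, but packing the data into large files removes 99.9% of the file entries, roughly halving the heap in this toy model.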

2. Basic System Architecture

• NameNode: stores and generates the metadata of the file system. It also knows
the DataNodes on which all the blocks of a given file are located.
o HDFS RUNS ONLY ONE NAMENODE INSTANCE (not considering the HDFS
Federation technique).
• DataNode: stores the actual data and reports back to the NameNode on the data
blocks under its management. DataNodes also perform operations such as block
creation, deletion, and replication according to the NameNode's instructions.
o HDFS CAN RUN MULTIPLE DATANODE INSTANCES.
o Inside a DataNode, data is kept in blocks, the minimum unit of data that
HDFS can read or write.
o A file in the file system is divided into one or more segments stored on
individual DataNodes. These file segments are called blocks; the default
block size is 128 MB, but it can be changed in the HDFS configuration as
needed.
• The client allows services to access HDFS and returns data obtained from the
NameNode and DataNodes to those services.
o AN HDFS CLUSTER RUNS MULTIPLE CLIENT INSTANCES.
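The fixed-size block splitting described above is simple arithmetic; a minimal sketch, assuming the default 128 MB block size:

```python
# Sketch: how HDFS splits a file into fixed-size blocks (default 128 MB).
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB; configurable in HDFS

def split_into_blocks(file_size):
    """Return (block_count, last_block_size) for a file of file_size bytes."""
    full, remainder = divmod(file_size, BLOCK_SIZE)
    if remainder:
        return full + 1, remainder          # last block is partially filled
    return full, BLOCK_SIZE if full else 0

# A 300 MB file occupies three blocks: 128 MB + 128 MB + 44 MB.
print(split_into_blocks(300 * 1024 * 1024))
```

Note that the last block only occupies as much DataNode disk space as it actually holds; a small file does not waste a full 128 MB on disk.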

3. HDFS Data Write Process

• A service application invokes the API provided by the HDFS client to request a
data write.
• The HDFS client creates the file by calling the create method of the
DistributedFileSystem object.
• DistributedFileSystem sends an RPC request to the NameNode to create a new
file in the namespace.
• DistributedFileSystem also returns an FSDataOutputStream to the client for
writing data into it.
• After receiving the service data, the HDFS client obtains the data block number
and locations from the NameNode, connects to the DataNodes, and establishes a
pipeline of DataNodes to which the data is to be written.
• The HDFS client then uses a proprietary protocol to write data to DataNode1;
DataNode1 forwards the data to DataNode2, and DataNode2 to DataNode3.
• After the data is written, the DataNodes send confirmation messages back to the
HDFS client. Once all writes are confirmed, the service invokes the HDFS client to
close the file and connects to the NameNode to confirm that the data write is
complete.
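The pipeline forwarding in the steps above can be sketched as a toy simulation. The lists standing in for DataNodes are obviously hypothetical; a real client streams packets over RPC through an FSDataOutputStream.

```python
# Toy simulation of the HDFS write pipeline: the client hands a packet to
# the first DataNode, each node stores it and forwards it down the chain,
# and acknowledgements flow back up to the client.
def pipeline_write(packet, pipeline):
    """Write packet through an ordered list of DataNode stores; return
    True only if every node in the pipeline acknowledged the write."""
    acks = []
    for node in pipeline:        # DataNode1 -> DataNode2 -> DataNode3
        node.append(packet)      # each node persists the packet locally
        acks.append(True)        # then acks back toward the client
    return all(acks)

dn1, dn2, dn3 = [], [], []       # stand-ins for three DataNodes' storage
ok = pipeline_write(b"block-data", [dn1, dn2, dn3])
print(ok, dn1 == dn2 == dn3)
```

The point of the pipeline design is that the client sends each packet once; replication bandwidth is spent between DataNodes, not at the client.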

4. HDFS Data Read Process

• A service application invokes an API provided by the HDFS client to open a file.
• The HDFS client opens the file by calling the open method of the
DistributedFileSystem object.
• DistributedFileSystem sends an RPC request to the NameNode to locate the
blocks of the file to be read.
• DistributedFileSystem returns an FSDataInputStream to the HDFS client for
reading the data.
• The HDFS client connects to multiple DataNodes according to the information
obtained from the NameNode.
• After the data is read, the service application invokes the close API to close the
connection.
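The two-step read path above — locate blocks via the NameNode, then fetch them from DataNodes — can be sketched with plain dicts standing in for the NameNode metadata and DataNode storage (the path and block ids below are made up):

```python
# Toy read path: block locations come from NameNode metadata, and the
# client then pulls each block's bytes from a DataNode replica.
namenode_meta = {  # hypothetical metadata: file path -> ordered block ids
    "/data/file.txt": ["blk_1", "blk_2"],
}
datanode_blocks = {  # block id -> replica contents on some DataNode
    "blk_1": b"hello ",
    "blk_2": b"world",
}

def read_file(path):
    blocks = namenode_meta[path]   # 1) locate the blocks via the NameNode
    # 2) read each block from a DataNode and reassemble the file
    return b"".join(datanode_blocks[b] for b in blocks)

print(read_file("/data/file.txt"))
```

Note that the file contents never pass through the NameNode: it serves only block locations, and the bulk data flows directly between client and DataNodes.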



5. Key Design of HDFS Architecture

5.1. HDFS High Availability (HA)

• Mainly reflected in the Active/Standby NameNode pair elected via ZooKeeper,
which solves the NameNode single-point-of-failure problem.
• ZooKeeper is mainly used to store status files and active/standby state
information.
• ZKFC (ZooKeeper Failover Controller) monitors the active/standby status of the
NameNodes.



5.2. Metadata Persistence

• Metadata persistence is achieved through a process called checkpointing.
• First, the standby NameNode notifies the active NameNode to generate a new
log file called "EditLog.new", so that the active NameNode can continue recording
operations to "EditLog.new". The standby NameNode then obtains the old EditLog
from the active NameNode and periodically downloads the FSImage file, which
stores the file system image.
• The standby NameNode merges the two and generates a new metadata file
called "FSImage.ckpt", which it uploads to the active NameNode.
• The uploaded file is renamed FSImage, overwriting the original FSImage file, and
"EditLog.new" is renamed "EditLog".
• The NameNode triggers this metadata persistence operation every hour or
whenever the EditLog reaches 64 MB.
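The merge step above is, at its core, a replay of logged operations onto a snapshot. A minimal sketch, using a set of paths as a stand-in for the namespace image and (op, path) tuples as a stand-in for EditLog records:

```python
# Toy checkpoint: replay EditLog operations onto an FSImage snapshot,
# producing the merged image that becomes FSImage.ckpt.
def checkpoint(fsimage, editlog):
    """Apply (op, path) entries from editlog to a copy of fsimage."""
    image = set(fsimage)
    for op, path in editlog:
        if op == "create":
            image.add(path)
        elif op == "delete":
            image.discard(path)
    return image  # uploaded and renamed FSImage; EditLog.new becomes EditLog

old_image = {"/a", "/b"}
edits = [("create", "/c"), ("delete", "/a")]
print(sorted(checkpoint(old_image, edits)))
```

The design rationale mirrors write-ahead logging: the EditLog makes every mutation durable cheaply, while the periodic merge keeps restart time bounded by keeping the FSImage close to current.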




5.3. HDFS Federation

• The NameNode keeps a reference to every file and block in the file system in
memory, which means that on very large clusters with many files, memory
becomes a limiting factor for scaling.
• HDFS Federation allows a cluster to scale by adding NameNodes, each managing
a portion of the file system namespace. For example, one NameNode might
manage all the files rooted under the user directory while a second NameNode
handles the files under the database directory; each NameNode has its own
corresponding standby NameNode.
• Under federation, each NameNode manages a namespace volume, which is
made up of the metadata for the namespace and a block pool containing all the
blocks of the files in that namespace.
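The partitioning above boils down to routing each path to the NameNode that owns its namespace volume. A sketch in the spirit of a client-side mount table (the NameNode names and mount points below are hypothetical):

```python
# Sketch of federation routing: a client-side mount table maps path
# prefixes to the NameNode owning that namespace volume.
MOUNT_TABLE = {
    "/user": "namenode-1",
    "/database": "namenode-2",
}

def resolve_namenode(path):
    """Pick the NameNode whose namespace volume contains this path,
    preferring the longest matching mount prefix."""
    for prefix, nn in sorted(MOUNT_TABLE.items(),
                             key=lambda kv: len(kv[0]), reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return nn
    raise KeyError(f"no NameNode mounted for {path}")

print(resolve_namenode("/user/alice/data"))   # handled by namenode-1
print(resolve_namenode("/database/table1"))   # handled by namenode-2
```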

5.4. Data Replication

• HDFS is highly fault-tolerant; this is achieved through its data replication
mechanism.
• HDFS replicates file blocks and stores the replicas on different DataNodes.
• An application can specify the number of replicas of a file at the time it is
created, and this number can be changed at any time afterwards.
• HDFS uses an intelligent replica placement policy: the first replica is placed on
the same node as the client; the second replica is placed on a remote rack; the
third replica is placed on another rack if the first two replicas are on the same
rack, otherwise the third and first replicas are placed on different nodes of the
same rack. Any further replicas are placed randomly across the cluster.
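The placement rules above can be sketched as follows. The node and rack names are made up, and a real placement policy also weighs DataNode load and free space; this only traces the rack logic for the common case where the client runs on a cluster node.

```python
import random

# Toy version of the replica placement policy described above, for the
# case where the 2nd replica lands on a remote rack (so the 3rd goes to
# a different node on the 1st replica's rack).
def place_replicas(client_node, cluster):
    """cluster: dict node -> rack. Returns [first, second, third] nodes."""
    first = client_node                        # 1st: same node as the client
    remote = [n for n, r in cluster.items() if r != cluster[first]]
    second = random.choice(remote)             # 2nd: a node on a remote rack
    same_rack = [n for n, r in cluster.items() # 3rd: different node on the
                 if r == cluster[first] and n != first]  # 1st replica's rack
    third = random.choice(same_rack)
    return [first, second, third]

cluster = {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackB"}
print(place_replicas("n1", cluster))
```

The design trade-off: keeping two replicas on one rack limits cross-rack write traffic, while the remote-rack replica survives the loss of an entire rack.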

5.5. Data Storage Policies

• By default, the HDFS NameNode automatically selects DataNodes to store data
replicas, but the way HDFS stores data can be configured according to actual
needs.
• Store data on different storage devices (RAM_DISK, DISK, ARCHIVE, SSD) –
Layered Storage: storage policies for different scenarios are formulated by
combining the four types of storage devices.
• Store data with labels – Tag Storage: users can flexibly configure HDFS data
block placement policies based on service requirements and data features. One
tag is set for each HDFS directory and one or more tags for each DataNode.
• Store data in highly reliable node groups – Node Group Storage:
o Forces key data to be stored on highly reliable nodes.
o The system can force data to be saved to specified node groups by
modifying the DataNode storage policies.
o The first replica is written to a mandatory rack group; if there is no available
node in that rack group, the data write fails.
o The second replica is written to a random node in a non-mandatory rack
group where a local client is located. If the local client is located in the
mandatory rack group, the second replica is written to a node in another
rack group, and the third replica is written to a node of yet another rack
group.
o If the number of replicas is greater than the number of available rack
groups, the extra replicas are stored in random rack groups.
• Colocation: stores associated data, or data that is going to be associated, on the
same storage node.
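The tag-storage idea above — one tag per directory, one or more tags per DataNode — reduces to a membership test when selecting candidate nodes. A sketch with hypothetical directory and DataNode names:

```python
# Sketch of tag (label) storage: replicas for a directory go only to
# DataNodes whose tag set includes that directory's tag. All names and
# tags below are hypothetical.
DIR_TAGS = {"/hot": "ssd", "/cold": "archive"}
DATANODE_TAGS = {
    "dn1": {"ssd"},
    "dn2": {"ssd", "archive"},
    "dn3": {"archive"},
}

def eligible_datanodes(directory):
    """Return DataNodes whose tags match the directory's tag."""
    tag = DIR_TAGS[directory]
    return sorted(dn for dn, tags in DATANODE_TAGS.items() if tag in tags)

print(eligible_datanodes("/hot"))    # SSD-tagged nodes
print(eligible_datanodes("/cold"))   # archive-tagged nodes
```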

5.6. Unified File System

Externally, HDFS presents itself as a single unified file system.



5.7. Space Reclamation

A recycle bin (trash) mechanism is provided, and the number of replicas can be set
dynamically.

5.8. Data Organization

Data is stored in blocks in HDFS.

5.9. Access Mode

Data can be accessed through Java APIs, HTTP, or shell commands.



