HDFS
NOTICE
This document was generated from Huawei study material. Treat the information
in this document as supporting material.
• Highly fault-tolerant
• HDFS is developed based on the Google File System (GFS) and runs on commodity hardware.
• Component failures are frequent, so high fault tolerance ensures that the file system
remains accessible even if a node fails.
• The time to read the whole dataset matters more than the latency of reading the
first record.
• HDFS is not suitable for storing massive numbers of small files.
o The file system metadata is loaded into the memory of a dedicated server called the
NameNode, so the number of files that can be stored is limited by the NameNode's
memory capacity: each file, directory, and data block takes about 150 bytes of
metadata. For example, one million files, each occupying one block, need roughly
300 MB of NameNode memory for metadata alone.
• HDFS is not suitable for random writes (a minimal append sketch follows this list).
o Files in HDFS are written by a single writer, and writes are always made at the
end of the file. There is no support for multiple writers or for arbitrary modifications
within a file.
• HDFS is not suitable for low-latency reads.
o Applications that require low-latency access to data, in the tens-of-milliseconds
range, do not work well with HDFS.
• HDFS is suitable for storing large files and for streaming data access.
• HDFS is optimized for delivering high data throughput, which may come at the cost of
high latency.
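To make the append-only write model above concrete, here is a minimal sketch using the standard Hadoop FileSystem API. It assumes a reachable cluster with append enabled; the hdfs://namenode:9000 address and the /tmp/app.log path are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendOnlyExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // hypothetical cluster address

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/tmp/app.log"); // hypothetical existing file
            // New data can only be added at the end of an existing file;
            // there is no API for overwriting bytes in the middle of a file.
            try (FSDataOutputStream out = fs.append(file)) {
                out.writeBytes("one more log line\n");
            }
        }
    }
}
```

While the stream is open, the single writer holds a lease on the file, which is why concurrent writers are not supported.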
• NameNode: stores and generates the metadata of the file system. It also knows the
DataNodes on which all the blocks of a given file are located.
o HDFS RUNS ONLY ONE NAMENODE INSTANCE (not considering the HDFS Federation
technique).
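The following sketch illustrates that block-to-DataNode mapping through the public getFileBlockLocations call; the cluster address and file path are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // hypothetical cluster address

        try (FileSystem fs = FileSystem.get(conf)) {
            FileStatus status = fs.getFileStatus(new Path("/data/large-file.dat")); // hypothetical path
            // The client asks the NameNode (via the FileSystem API) which DataNodes
            // hold each block of the file.
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.println("offset=" + block.getOffset()
                        + " hosts=" + String.join(",", block.getHosts()));
            }
        }
    }
}
```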
• A service application invokes an API provided by the HDFS client to request data writing.
• The HDFS client creates the file by calling the create method of the DistributedFileSystem
object.
• DistributedFileSystem sends an RPC request to the NameNode to create a new file in the
namespace.
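A minimal sketch of this write path with the standard Hadoop client API; the create call below is what triggers the RPC to the NameNode, and the cluster address and output path are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // hypothetical cluster address

        try (FileSystem fs = FileSystem.get(conf)) {
            // create() asks the NameNode to add the new file to the namespace and
            // returns an output stream that writes data blocks to DataNodes.
            try (FSDataOutputStream out = fs.create(new Path("/user/demo/output.txt"))) { // hypothetical path
                out.writeBytes("hello hdfs\n");
            }
        }
    }
}
```

Closing the stream completes the write; the data itself is streamed to DataNodes chosen by the NameNode, not to the NameNode.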
• A service application invokes an API provided by the HDFS client to open a file.
• The HDFS client opens the file by calling the open method of the DistributedFileSystem
object.
• DistributedFileSystem sends an RPC request to the NameNode to locate the blocks of
the file to be read.
• DistributedFileSystem returns an FSDataInputStream to the HDFS client for the client to
read data.
• The HDFS client connects to multiple DataNodes according to the information obtained
from the NameNode.
• After data reading is complete, the service application invokes the close API to close the
connection.
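The corresponding read path as a minimal sketch; open returns the FSDataInputStream mentioned above, and the cluster address and path are hypothetical.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // hypothetical cluster address

        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream in = fs.open(new Path("/user/demo/output.txt")); // hypothetical path
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            // The stream fetches each block from a suitable DataNode, chosen from
            // the block location information returned by the NameNode.
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```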
• The NameNode keeps a reference to every file and block in the file system in memory,
which means that on very large clusters with many files, memory becomes a limiting
factor for scaling.
• HDFS Federation allows a cluster to scale by adding NameNodes, each of which manages a
portion of the file system namespace. For example, one NameNode might manage all the
files rooted under the user directory and a second NameNode might handle files under the
database directory, and each NameNode has its own corresponding standby NameNode.
• Under federation, each NameNode manages a namespace volume, which is made up of
the metadata for the namespace and a block pool containing all the blocks for the files
in that namespace.
• By default, the HDFS NameNode automatically selects DataNodes to store data replicas,
but the way HDFS stores data can be configured according to actual needs.
• Store data on different storage devices (RAM_DISK, DISK, ARCHIVE, SSD) – Layered Storage:
storage policies for different scenarios are formulated by combining the four types of
storage devices (a small API sketch follows this list).
• Store data with labels – Tag Storage: users can flexibly configure HDFS data block
placement policies based on service requirements and data features. Set one tag for each
HDFS directory and one or more tags for each DataNode.
• Store data in highly reliable node groups – Node Group Storage:
o Key data can be forced onto highly reliable nodes.
o The system can force data to be saved to specified node groups by modifying the
DataNode storage policies.
o The first replica is written to a mandatory rack group; if there is no available
node in this rack group, the data write fails.
o The second replica is written to a random node in the non-mandatory rack
group where the local client is located. If the local client is located in the mandatory
rack group, the second replica is written to a node in another rack group,
and the third replica is also written to a node in another rack group.
o If the number of replicas is greater than the number of available rack groups, the
extra replicas are stored in random rack groups.
• Colocation: store associated data, or data that is going to be associated, on the same
storage node.
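As referenced under Layered Storage above, here is a hedged sketch of assigning a storage policy to a directory through the client API. It assumes a Hadoop release where FileSystem.setStoragePolicy is available (older releases expose it only on DistributedFileSystem); built-in policy names such as ALL_SSD, ONE_SSD, HOT, COLD, and LAZY_PERSIST map replicas onto the device types listed above. The path and cluster address are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StoragePolicyExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // hypothetical cluster address

        try (FileSystem fs = FileSystem.get(conf)) {
            Path hotData = new Path("/data/hot"); // hypothetical directory
            // Replicas of files under this directory will be placed on SSD devices,
            // according to the built-in ALL_SSD policy.
            fs.setStoragePolicy(hotData, "ALL_SSD");
        }
    }
}
```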
A recycle bin (trash) mechanism is provided, and the number of replicas can be set dynamically.
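A minimal sketch of changing the replica count of an existing file at run time via the public FileSystem API; the cluster address and path are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // hypothetical cluster address

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/user/demo/output.txt"); // hypothetical path
            // Request a replication factor of 2 for this file; the NameNode then
            // schedules replica additions or removals on the DataNodes.
            boolean accepted = fs.setReplication(file, (short) 2);
            System.out.println("replication change accepted: " + accepted);
        }
    }
}
```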