Previous year last question

HDFS blocks are large compared to disk blocks in order to minimize the cost of
seeks. If a file were split into many small blocks, a large share of the read
time would be spent seeking (locating the start of each block) rather than
transferring data. Many small blocks also burden the NameNode (master), because
the NameNode keeps the metadata for every block and must track each one.
If the block is large enough, the time it takes to transfer the data from the
disk is significantly longer than the time to seek to the start of the block.
Thus, transferring a large file made of multiple blocks operates at close to
the disk transfer rate.
By default, each block is processed by its own Mapper. So, with small blocks,
there would be a very large number of Mappers, each processing only a little
data, which is inefficient.
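
The arithmetic behind this argument can be sketched in a few lines of Python. The 10 ms seek time and 100 MB/s transfer rate below are illustrative assumptions, not measurements, and the helper names are made up for this example. The 128 MB figure is the default HDFS block size in Hadoop 2 and later; the 4 KB figure stands in for a typical disk block.

```python
import math

# Assumed, illustrative disk characteristics (not measured values).
SEEK_MS = 10.0
TRANSFER_MB_PER_S = 100.0


def seek_overhead(block_size_mb, seek_ms=SEEK_MS, rate=TRANSFER_MB_PER_S):
    """Fraction of the time to read one block that is spent seeking."""
    seek_s = seek_ms / 1000.0
    transfer_s = block_size_mb / rate
    return seek_s / (seek_s + transfer_s)


def block_count(file_size_mb, block_size_mb):
    """Blocks needed for one file; by default also roughly the number of Mappers."""
    return math.ceil(file_size_mb / block_size_mb)


# Compare a 4 KB block, a 1 MB block and a 128 MB block for a 1 GB file.
for size_mb in (4 / 1024, 1, 128):
    print(
        f"block={size_mb:g} MB  "
        f"seek share={seek_overhead(size_mb):.1%}  "
        f"blocks for 1 GB={block_count(1024, size_mb)}"
    )
```

Under these assumptions, seeking takes about half of the read time with 1 MB blocks but under 1% with 128 MB blocks, and the 1 GB file needs 8 blocks (and about 8 Mappers) instead of 1024.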

Difference between HDFS and network-attached storage (NAS)

1) HDFS is the primary storage system of Hadoop. It is designed to
store very large files on a cluster of commodity hardware.
Network-attached storage (NAS) is a file-level computer
data storage server.
NAS provides data access to a heterogeneous group of
clients.

2) HDFS distributes blocks across all the machines in a
Hadoop cluster.
NAS stores data on dedicated hardware.

3) HDFS is designed to work with the MapReduce framework.
In MapReduce, computation moves to the data
instead of data moving to the computation.
NAS is not suitable for MapReduce, as it stores data
separately from the computation.
September 20, 2018 at 4:03 pm #5730

DataFlair Team
1) NAS stands for network-attached storage, which is a
file-level computer data storage server connected to a
computer network, providing data access to a
heterogeneous group of clients.
HDFS stands for Hadoop Distributed File System, which is
a Java-based file system that provides scalable and reliable
data storage and is designed to span large clusters of
commodity hardware.

2) In HDFS, data blocks are distributed across the local
drives of all machines in a cluster, whereas in NAS data is
stored on a dedicated server.

3) HDFS runs on commodity hardware, which is
cost-effective, whereas NAS is a high-end storage device,
which is expensive.

4) HDFS includes features like rack awareness and data
locality, which make it more scalable and effective than
NAS.
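
To make rack awareness and data locality more concrete, here is a toy Python sketch of how a scheduler might prefer a node that already holds a replica of a block, then a node in the same rack, and only then any other node. This is not Hadoop's actual scheduler code; the function names, node names and rack names are all hypothetical and exist only for illustration.

```python
def locality_level(replica_nodes, candidate, rack_of):
    """Return 0 if node-local, 1 if rack-local, 2 if off-rack."""
    if candidate in replica_nodes:
        return 0  # the block is on the candidate's own local disk
    replica_racks = {rack_of[node] for node in replica_nodes}
    if rack_of[candidate] in replica_racks:
        return 1  # the block is on another node in the same rack
    return 2  # the block must be fetched across racks


def pick_node(replica_nodes, free_nodes, rack_of):
    """Pick the free node closest to the data (ties go to the first listed)."""
    return min(free_nodes, key=lambda n: locality_level(replica_nodes, n, rack_of))


# Hypothetical four-node cluster spread over two racks.
rack_of = {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackB"}

print(pick_node({"n1", "n3"}, ["n2", "n4", "n3"], rack_of))  # node-local choice
print(pick_node({"n1"}, ["n4", "n2"], rack_of))              # rack-local choice
```

With a NAS, every task would fall into the last case, because the data always sits on a separate storage server and has to cross the network.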