Professional Documents
Culture Documents
Data Sizes
1024 bytes
= 1 KB
KB
Kilobyte
1024 KB
= 1 MB
MB
Megabyte
1024 MB
= 1 GB
GB
Gigabyte
1024 GB
= 1 TB
TB
Terabyte
1024 TB
= 1 PB
PB
Petabyte
Why Hadoop?
The data flow has been increased tremendously hence the storage and analysis of data has been
causing a problem to process in estimated time. When compared to previous years data size and
read and write operation speeds, we can say that a framework to handle data in parallel fashion
is much useful.
It took 20 years to reach 100 mbps transfer speed from 4.4mbps speed. But it is not considerable
when compared to disk sizes which increased from 1370 MB to TBs of current data storages.
So we need to write or transfer data in parallel fashion to attain reasonable speeds. One should
think of data loss when looking for such frameworks. Hadoop satisfies all these requirements by
providing replication factor of 3 by default to write in parallel fashion and data loss is almost
impossible to happen.
MapReduce
Data Size
GB
PB
Access
Batch
Updates
Structure
Static schema
Dynamic schema
Integrity
High
Low
Scaling
Non Linear
Linear