
Given 1 PB of data to migrate into Hadoop, tell me every aspect and task involved in the migration.


My humble attempt:

Understand vision - how the data will be used

Get data growth rate

Understand how future-proofed your solution should be - 2 years, 3 years, etc.

Assume the usable disk space per node on commodity hardware

Assume a replication factor - default of 3

Calculate the disk space for the projected years

Add 30% extra to the space calculation to allow for Hadoop's own working space - MR intermediate data and other aspects

Factor in NameNode (NN) server requirement and HA of the same

Factor in JobTracker (JT) server requirement and HA of the same

Factor in ZooKeeper requirement and HA of the same

Decide format of data storage - text file, sequence file/Avro/RCFile, compression & codec - although this might impact space requirement - minimize it

Factor in HBase master server requirement and HA of the same if HBase is to run

Factor in racks/switches

Factor in number of environments - dev/test/QA/prod, parallel/production - and cluster size

Plan out users/groups/authentication/authorization and any data encryption needed

Plan the load tasks, and data integrity checks
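
The space calculation in the list (growth rate, planning horizon, replication factor, 30% extra) can be sketched as a small script. This is only a rough illustration under stated assumptions: the growth is assumed to compound yearly, 1 PB is taken as 1000 TB, compression savings are ignored, and the 20% growth rate, 3-year horizon and 48 TB of usable disk per node are made-up example figures, not recommendations. The function names are hypothetical.

```python
import math


def raw_storage_tb(initial_tb, annual_growth_rate, years,
                   replication=3, overhead=0.30):
    """Estimate raw disk (TB) needed at the end of the planning horizon.

    Assumes compound yearly growth; ignores compression savings.
    """
    # Logical data volume after `years` of growth.
    logical_tb = initial_tb * (1 + annual_growth_rate) ** years
    # Every block is stored `replication` times.
    replicated_tb = logical_tb * replication
    # Extra headroom for MR intermediate data and other working space.
    return replicated_tb * (1 + overhead)


def datanodes_needed(raw_tb, usable_tb_per_node):
    """Round up to whole nodes for a given usable capacity per node."""
    return math.ceil(raw_tb / usable_tb_per_node)


if __name__ == "__main__":
    # Example figures only (assumptions, not from the original answer).
    raw = raw_storage_tb(initial_tb=1000, annual_growth_rate=0.20, years=3)
    nodes = datanodes_needed(raw, usable_tb_per_node=48)
    print(f"raw storage: {raw:.1f} TB, data nodes: {nodes}")
```

The node count covers data nodes only; the NN, JT, ZooKeeper and HBase master servers and their HA counterparts are counted separately, as are the additional environments.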

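For the data integrity checks, one possible approach is to build a manifest (relative path, size, digest) on the source side and another on the loaded side, then compare the two. The sketch below assumes both sides can be read as ordinary file paths, for example a sample pulled back to a staging area; at 1 PB the digests would realistically be computed in a distributed job. SHA-256 and the function names are illustrative choices, not something the original answer prescribes.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path relative to `root` to (size, digest)."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): (path.stat().st_size,
                                            file_digest(path))
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(source, target):
    """Return (missing, mismatched) paths, judged from the source side."""
    missing = sorted(set(source) - set(target))
    mismatched = sorted(
        key for key in source.keys() & target.keys()
        if source[key] != target[key]
    )
    return missing, mismatched
```

A load task would pass only when both returned lists are empty; anything listed is a candidate for re-copying.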