
Distributed storage & horizontal scalability

Increasing the number of systems & operating them in parallel.

Vertical scalability

Increasing the disk size & RAM of a single system.


HDFS -> used for storage -> a distributed file system -> built from one NameNode [master] & multiple
DataNodes [slaves], with the number of DataNodes growing with the data size -> HDFS fault tolerance is
based on 2 factors: the Replication factor & the Block size -> the Block size by default is 64MB (128MB in
Hadoop 2 & later) -> the total blocks required is (File Size)/(Block Size), rounded up, since a partial
last block still occupies a block -> each block is distributed according to the Replication factor -> the
same block is replicated over N DataNodes, where N is the Replication factor (see the arithmetic sketch
below).
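
A minimal sketch of the block arithmetic above. The 1GB file size, 64MB block size & replication factor
of 3 are illustrative assumptions, not values read from a real cluster:

// Sketch of the HDFS block arithmetic described above.
public class HdfsBlockMath {
    public static void main(String[] args) {
        long fileSize = 1024L * 1024 * 1024;   // 1 GB file (assumption)
        long blockSize = 64L * 1024 * 1024;    // 64 MB default block size
        int replicationFactor = 3;             // common default (assumption)

        // Blocks required: file size / block size, rounded up,
        // because a partial final block still occupies one block.
        long blocks = (fileSize + blockSize - 1) / blockSize;

        // Each block is copied to N DataNodes, so raw storage is N times larger.
        long rawBytes = fileSize * replicationFactor;

        System.out.println("Blocks required: " + blocks);                          // 16
        System.out.println("Total block replicas: " + blocks * replicationFactor); // 48
        System.out.println("Raw storage used (MB): " + rawBytes / (1024 * 1024));  // 3072
    }
}

For this input the division is exact (16 blocks); the rounding up only matters when the file size is not
an exact multiple of the block size.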
MAP REDUCE -> framework used for processing data -> native support for Java -> a Mapper & Reducer
combination -> the Mapper performs parallel processing of the input supplied to the MapReduce
framework -> the framework distributes the instruction set among the DataNodes for parallel
processing -> the Reducer merges the results obtained from the parallel processing on the different
DataNodes & aggregates them (see the WordCount sketch below).

HIVE -> SQL-like query (HiveQL) support for analytics
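
Hive queries are usually run from the Hive shell, but HiveServer2 also exposes HiveQL over JDBC. A
hedged Java sketch, assuming a HiveServer2 on localhost:10000 & a hypothetical sales table (the Hive
JDBC driver jar must be on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC endpoint; host, port, database & the "sales"
        // table are illustrative assumptions.
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT region, COUNT(*) FROM sales GROUP BY region")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}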

PIG -> scripting (Pig Latin) with user-defined function support for analytics

SQOOP -> for importing/exporting data between DBMS/RDBMS systems & HDFS

FLUME -> for importing streaming data into HDFS

HBASE -> a NoSQL database -> column-based storage -> the database of the Hadoop ecosystem, running
directly on top of HDFS
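
A short sketch of the HBase Java client API, showing the column-based addressing (row key + column
family + qualifier). The users table & its info column family are illustrative assumptions & must
already exist in the cluster:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) {

            // Write: one cell addressed by row key + column family + qualifier.
            Put put = new Put(Bytes.toBytes("user1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("city"), Bytes.toBytes("Pune"));
            table.put(put);

            // Read the cell back by row key.
            Result result = table.get(new Get(Bytes.toBytes("user1")));
            byte[] city = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("city"));
            System.out.println("city = " + Bytes.toString(city));
        }
    }
}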

APACHE OOZIE -> a workflow scheduler used to coordinate & chain all of the above processes

Overview of the Hadoop ecosystem
