
MAP REDUCE

• MapReduce is a software framework and programming model used for processing huge
amounts of data.
• MapReduce programs work in two phases, namely, Map and Reduce.
• Map tasks deal with splitting and mapping the data, while Reduce tasks shuffle and reduce
the data (see the sketch after this list).
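
To make the two phases concrete, below is a minimal sketch of the classic word-count job
written against Hadoop's Java MapReduce API: the Mapper emits (word, 1) pairs, the framework
shuffles and groups them by key, and the Reducer sums the counts. The input and output paths
taken from command-line arguments are assumptions for illustration.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: split each input line into words and emit (word, 1).
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: the framework has already shuffled and grouped the
    // values by word; here we simply sum the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // optional local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}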
APPLICATIONS OF MAPREDUCE

• Log analysis, data analysis, recommendation engines, fraud detection, user-behavior
analysis, scheduling problems, and resource planning are among the applications that use
MapReduce.
PRACTICAL EXAMPLES OF MAPREDUCE

• Social Networks
• Entertainment
• Electronic Commerce
• Data Warehouse
• Search and Advertisement mechanisms
FEATURES OF HDFS

• HDFS is the Hadoop Distributed File System, used for storing large datasets, ranging in
size from megabytes to petabytes, across multiple nodes in a Hadoop cluster.
• Features of HDFS include (a short client-API sketch follows this list):
• Cost Effective
• Large Datasets
• Replication
• Fault tolerance and reliability
• High Availability
• Data Integrity
• High Throughput
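
As a concrete illustration of replication and the write path, here is a minimal sketch using
the standard Hadoop Java client API. The file path /user/demo/hello.txt and the message are
hypothetical, and the program is assumed to run on a machine configured to reach the cluster
(otherwise FileSystem.get falls back to the local filesystem).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Replication: ask HDFS to keep three copies of each block
        // (three is the usual default in Hadoop clusters).
        conf.set("dfs.replication", "3");

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/demo/hello.txt"); // hypothetical path
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeUTF("Hello, HDFS!"); // hypothetical payload
        }
        fs.close();
    }
}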
FEATURES OF GFS

Google File System (GFS) is a scalable distributed file system created by Google Inc. to
accommodate Google's expanding data-processing requirements.
Features of GFS include:
• GFS was designed for high fault tolerance.
• Master and chunk servers can be restarted in a few seconds; with such fast recovery, the
window of time in which data is unavailable is greatly reduced.
• The shadow master handles the failure of the GFS master.
• For data integrity, GFS computes checksums on every 64 KB block in each chunk (see the
sketch after this list).
• GFS achieves the goals of high availability and high performance.
• It demonstrates how to support large-scale processing workloads on commodity hardware
that is designed to tolerate frequent component failures and optimized for huge files.
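
The notes above do not specify which checksum algorithm GFS uses, so the sketch below uses
Java's built-in CRC32 purely as an illustrative stand-in to show per-64 KB-block
checksumming over a file; the file path taken from args[0] is an assumption.

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.CRC32;

public class BlockChecksums {
    // GFS checksums data at 64 KB granularity within each chunk.
    private static final int BLOCK_SIZE = 64 * 1024;

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] block = new byte[BLOCK_SIZE];
            int n;
            long index = 0;
            while ((n = in.read(block)) > 0) {
                // CRC32 stands in for whatever checksum GFS actually uses.
                CRC32 crc = new CRC32();
                crc.update(block, 0, n);
                System.out.printf("block %d: checksum %08x%n", index++, crc.getValue());
            }
        }
    }
}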
COMPARISON BETWEEN HDFS AND GFS

• Platform: HDFS is cross-platform; GFS works on Linux only.
• Implementation language: HDFS is written in Java; GFS is written in C/C++.
• Node roles: HDFS divides nodes into NameNodes and DataNodes; GFS divides nodes into
MasterNodes and ChunkServers.
• Data integrity: In HDFS, data written by a client is sent through a pipeline of
DataNodes, and the checksum is verified by the last DataNode in the pipeline; in GFS,
ChunkServers use checksums to detect corruption of the stored data.
