
MapReduce & Pig & Spark

Ioanna Miliou
Giuseppe Attardi

Advanced Programming
Università di Pisa
Hadoop
• The Apache™ Hadoop® project develops open-source software for
reliable, scalable, distributed computing.

• A framework that allows for the distributed processing of large data sets
across clusters of computers using simple programming models.

• It is designed to scale up from single servers to thousands of machines,
each offering local computation and storage.

• It is designed to detect and handle failures at the application layer.

The core of Apache Hadoop consists of a storage part, known as the Hadoop
Distributed File System (HDFS), and a processing part called MapReduce.
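To make the storage side concrete, the following sketch uses the HDFS Java client API (org.apache.hadoop.fs.FileSystem) to write a small file and read it back. The path and file contents are only illustrative, and the code assumes a Hadoop configuration whose fs.defaultFS points at an HDFS cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        // Connect to the file system named in the Hadoop configuration
        // (fs.defaultFS); this is HDFS when a cluster is configured.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Write a small file (path and contents are placeholders).
        Path path = new Path("/user/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.writeBytes("hello HDFS\n");
        }

        // Read it back and print the contents.
        try (FSDataInputStream in = fs.open(path)) {
            byte[] buf = new byte[64];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n));
        }
    }
}

The same FileSystem API also works against the local file system when no cluster is configured, which is convenient for testing.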
Hadoop
• The project includes these modules:

– Hadoop Common: The common utilities that support the other
Hadoop modules.

– Hadoop Distributed File System (HDFS): A distributed file
system that provides high-throughput access to application
data.

– Hadoop YARN: A framework for job scheduling and cluster
resource management.

– Hadoop MapReduce: A YARN-based system for parallel
processing of large data sets (a minimal example is sketched
after this list).
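To illustrate the MapReduce programming model itself, here is a minimal word-count job along the lines of the standard Hadoop tutorial example: the mapper emits (word, 1) pairs for each token of the input, and the reducer sums the counts per word. The class names and the input/output paths taken from the command line are placeholders.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every token in a line of input.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reduce phase: sum the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

A job like this is typically packaged into a jar and submitted with hadoop jar wordcount.jar WordCount <input dir> <output dir>; YARN then schedules the map and reduce tasks across the cluster.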
