
Historic Problems with Data Collection

DATA 1201 – Module 3 - Week 6


Topics to learn & discuss

Notifications

Key Terms and Concepts

Summary & To-do List


Key Terms and Concepts (continued)

MapReduce
MapReduce is a programming model and an associated
implementation for processing and generating big data sets
with a parallel, distributed algorithm on a cluster.
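As a rough illustration of the model just defined, here is a minimal sketch of the three MapReduce phases (map, shuffle, reduce) applied to the classic word-count problem. It is written in Python for readability, not in Hadoop's Java API. The function names are hypothetical, and everything runs in one process; on a real cluster each input split would be mapped on a different node.

```python
from collections import defaultdict
from typing import Iterable, Iterator


# Map phase: each input split is turned into (key, value) pairs on its own,
# so different splits could be processed on different cluster nodes.
def map_words(split: str) -> Iterator[tuple[str, int]]:
    for word in split.lower().split():
        yield word, 1


# Shuffle phase: group every value emitted for the same key.
def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


# Reduce phase: combine all values for one key into a single result.
def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # Sequential stand-in for work a cluster would run in parallel.
    mapped = (pair for split in splits for pair in map_words(split))
    grouped = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


splits = ["the cat sat", "the dog sat", "the end"]
print(word_count(splits))  # {'the': 3, 'cat': 1, 'sat': 2, 'dog': 1, 'end': 1}
```

Because `map_words` looks at only one split and `reduce_counts` looks at only one key, both phases can be spread across many machines; only the shuffle step needs to move data between them.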

http://www.youtube.com/watch?v=vADJy7ZtopY

MapReduce Example



Check the detailed MapReduce video online in the Week 5 & 6 folder.

Hadoop
Hadoop is a software framework for distributed processing of
large data sets on compute clusters of commodity hardware,
based on the MapReduce programming model. Hadoop is
a collection of programs that take care of scheduling
tasks, monitoring them, and re-executing any failed
tasks.
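The schedule / monitor / re-execute cycle described above can be sketched as a small retry loop. This is a toy, single-process illustration under stated assumptions, not how Hadoop itself is implemented: `TaskFailed`, `run_with_retries`, `make_flaky`, and the `max_attempts` limit are all hypothetical names chosen for the example.

```python
from typing import Callable


class TaskFailed(Exception):
    """Hypothetical signal that a task crashed (e.g. its node went down)."""


def run_with_retries(tasks: dict[str, Callable[[], int]], max_attempts: int = 3) -> dict[str, int]:
    results: dict[str, int] = {}
    pending = list(tasks)                      # schedule: every task starts in the queue
    attempts = {name: 0 for name in tasks}
    while pending:
        name = pending.pop(0)
        attempts[name] += 1
        try:
            results[name] = tasks[name]()      # run the task
        except TaskFailed:
            # monitor: the failure was detected, so re-queue the task
            if attempts[name] >= max_attempts:
                raise RuntimeError(f"task {name} failed {max_attempts} times")
            pending.append(name)               # re-execute later
    return results


def make_flaky(value: int, failures: int) -> Callable[[], int]:
    """Hypothetical task that fails `failures` times, then returns `value`."""
    state = {"left": failures}

    def task() -> int:
        if state["left"] > 0:
            state["left"] -= 1
            raise TaskFailed()
        return value

    return task


tasks = {"t1": make_flaky(10, 0), "t2": make_flaky(20, 1), "t3": make_flaky(30, 2)}
print(run_with_retries(tasks))  # {'t1': 10, 't2': 20, 't3': 30}
```

The caller never sees the two failures of `t3`; re-running failed work automatically is what lets a cluster of unreliable commodity machines still finish the job.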

https://www.youtube.com/watch?v=aReuLtY0YMI
Hadoop & MapReduce
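Hadoop and MapReduce are currently the BEST attempt to work around the CAP theorem, which states that a distributed data store cannot simultaneously guarantee all three of Consistency, Availability, and Partition tolerance.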

Hadoop tries to provide eventual Consistency (if you wait long enough, the answer will be consistent),
Availability, and Partition tolerance.
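To make "eventual consistency" concrete, here is a toy model of the general idea: a write is accepted immediately by one replica (so the system stays available), and the other replicas only catch up later. The class names and the explicit `sync()` step are hypothetical; this is a sketch of the concept, not HDFS's actual replication protocol.

```python
from typing import Optional


class Replica:
    """One copy of the data, e.g. stored on a separate server (toy model)."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}


class EventuallyConsistentStore:
    """Hypothetical store: writes land on one replica and reach the others later."""

    def __init__(self, n_replicas: int = 3) -> None:
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending: list[tuple[str, str]] = []  # writes not yet copied everywhere

    def write(self, key: str, value: str) -> None:
        # Accept the write right away on replica 0 (stays available).
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def read(self, key: str, replica: int) -> Optional[str]:
        # A read may hit a replica that has not caught up yet (stale answer).
        return self.replicas[replica].data.get(key)

    def sync(self) -> None:
        # "Wait long enough": push every pending write to the remaining replicas.
        for key, value in self.pending:
            for r in self.replicas[1:]:
                r.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("x", "v1")
print(store.read("x", 2))  # None: replica 2 has not seen the write yet
store.sync()
print(store.read("x", 2))  # v1: all replicas now agree
```

Between `write` and `sync` the replicas disagree; after `sync` every read returns the same value, which is exactly the "wait long enough" guarantee described above.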

Hadoop does this through TWO big differences from other data stores:
1. Data is replicated many times across many different servers.
2. Queries are performed using Java code that breaks a larger job down into many smaller tasks, each of which
can be run on a separate node.
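The two differences listed above can be sketched together: each data block is copied to several nodes, and the job is split into one task per block, run on a surviving node that already holds a copy. This is a simplified illustration in Python (the slide notes that real Hadoop jobs are written in Java). The round-robin placement, the function names, and the node/block labels are hypothetical; a replication factor of 3 mirrors HDFS's default, but real HDFS placement is more sophisticated than this.

```python
def place_blocks(blocks: list[str], nodes: list[str], replication: int = 3) -> dict[str, list[str]]:
    """Assign each block to `replication` distinct nodes (round-robin, for illustration)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds node count")
    placement: dict[str, list[str]] = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(replication)]
    return placement


def plan_tasks(placement: dict[str, list[str]], live_nodes: set[str]) -> dict[str, str]:
    """One small task per block, run on the first live node that stores a copy."""
    plan: dict[str, str] = {}
    for block, holders in placement.items():
        alive = [n for n in holders if n in live_nodes]
        if not alive:
            raise RuntimeError(f"all copies of {block} are lost")
        plan[block] = alive[0]
    return plan


placement = place_blocks(["b0", "b1", "b2", "b3"], ["n1", "n2", "n3", "n4"])
# Node n3 has failed, but every block still has a copy somewhere else.
print(plan_tasks(placement, live_nodes={"n1", "n2", "n4"}))
# {'b0': 'n1', 'b1': 'n2', 'b2': 'n4', 'b3': 'n4'}
```

Sending each task to a node that already stores its block (often called data locality) avoids moving large amounts of data over the network, and the extra copies mean a single failed server does not stop the job.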