You are on page 1of 27

Big Data

What is Big Data


• Big data as name suggests, it represents massive amount of data which is
generated from different sources in different formats such as structured, semi
structured and unstructured which can't be handled by traditional tools.

• Example: Let's take a simple example Netflix


• Netflix has over 190 million subscribers. The company collects huge amount of
data everyday from it's subscribers. Based on the past search and watch data it
sends you suggestions of the next movie you can watch.
Classification of Big Data
• There are 5 V's of big data:
• Volume: Amount of data
• Velocity: Speed at which data is generated
• Variety: Different types/forms of data
• Veracity: Accuracy of data
• Value: Usability of data
5 V’s of Big Data
1. Volume
Volume, the first of the 5 V's of big data, refers to the amount of data that exists. Volume is like the
base of big data, as it is the initial size and amount of data that is collected. If the volume of data is
large enough, it can be considered big data. What is considered to be big data is relative, though, and
will change depending on the available computing power that's on the market.
2. Velocity
The next of the 5 V's of big data is velocity. It refers to how quickly data is generated and how
quickly that data moves. This is an important aspect for companies that need their data to flow
quickly, so it's available at the right times to make the best business decisions possible.
3. Variety
The next V in the five 5 V's of big data is variety. Variety refers to the diversity of data types. An
organization might obtain data from a number of different data sources, which may vary in value.
Data can come from sources in and outside an enterprise as well. The challenge in variety concerns
the standardization and distribution of all data being collected.
5 V’s of Big Data
4. Veracity
Veracity is the fourth V in the 5 V's of big data. It refers to the quality and accuracy of data. Gathered
data could have missing pieces, may be inaccurate or may not be able to provide real, valuable
insight. Veracity, overall, refers to the level of trust there is in the collected data.
5. Value
The last V in the 5 V's of big data is value. This refers to the value that big data can provide, and it
relates directly to what organizations can do with that collected data. Being able to pull value from
big data is a requirement, as the value of big data increases significantly depending on the insights
that can be gained from them.
Big Data Framework
• Nowadays, there’s probably no single Big Data software that wouldn’t be able to process
enormous volumes of data. Special Big Data frameworks have been created to implement and
support the functionality of such software. They help rapidly process and structure huge chunks of
real-time data.
Hadoop Framework
• Hadoop is great for reliable, scalable, distributed calculations. However, it can also be exploited as
common-purpose file storage. It can store and process petabytes of data. This solution consists of
three key components:

• HDFS file system, responsible for the storage of data in the Hadoop cluster;

• MapReduce system, intended to process large volumes of data in a cluster;

• YARN, a core that handles resource management.


Hadoop Framework
• HDFS: Hadoop distributed file system distributes data on different systems as well as create copy
of data on each system.

• MapReduce: Distributes the processing of data on your cluster.


• Divides your data up into partitions that are mapped(transformed) and reduced(aggregated) by
mapper and reducer functions as user define.

• YARN: Yet Another Resource Negotiation, a system that manages the resources on the cluster like
what nodes are available and when. what tasks need to be done by which node.
Hadoop Distributed File System

HDFS distributes a large file into


small chucks of data (default 128
MB) and distribute it over the
machines.

Hadoop Distributed File System


also creates copy of data on each
system
Hadoop Distributed File System
Map Reduce (Count no occurrence of words)

You might also like