
Module 1

Introduction to Big Data
Big Data Analytics
BEITC802 Prof. Priyanka Bandagale
FAMT, Ratnagiri
Learning Objectives
• In this lesson you will learn about:
• Characteristics of Big Data
• The V’s of Big Data
• Types of Big Data
• The impact of Big Data
Prerequisite for the course
• Database
• DBMS
• Data Mining
Evolution of Big Data
• 1970s and before (Mainframes): basic data storage; primitive and
structured data
• 1980s and 1990s (Relational): data-intensive applications; data
utilization (Machine Learning, Image Processing)
• 2000s and beyond: data driven (structured and unstructured data)
Introduction to Big Data

What is Big Data?

What makes data “Big” Data?
Big Data Definition
• No single standard definition…
“Big Data” is data whose scale, diversity, and complexity
require new architecture, techniques, algorithms, and
analytics to manage it and extract value and hidden
knowledge from it…

Facts and Figures
• Walmart handles 1 million customer transactions per hour.
• Facebook handles 40 billion photos from its user base.
• Amazon Prime, which offers videos, music, and Kindle books in a
one-stop shop, also relies heavily on big data.
• With over 100 million subscribers, Netflix collects huge amounts
of data, which is key to the industry status it boasts. Netflix
used predictive data analysis to craft its show House of Cards,
since the data suggested that it would be a hit with consumers.

“Big Data Computation”


Lots of data
• 2.5 quintillion bytes of data are generated every day!
• A quintillion is 10^18
• Data come from many quarters.
• Social media sites
• Sensors
• Digital photos
• Business transactions
• Location-based data

Source: IBM http://www-01.ibm.com/software/data/bigdata/


Characteristics of Big Data

• Big Data is high-volume, high-velocity, and/or high-variety
information assets that demand cost-effective, innovative
forms of information processing that enable enhanced insight,
decision making and process automation.
• There is no one definition of Big Data, but certain elements
are common across the different definitions, such as velocity,
volume, variety, and veracity.
• These are the V's of Big Data.
Characteristics of Big Data:
1-Scale (Volume)
• Data Volume
• 44x increase from 2009 to 2020
• From 0.8 zettabytes to 35 zettabytes
• Data volume is increasing exponentially
Bits -> Bytes -> Kilobytes -> Megabytes -> Gigabytes ->
Terabytes -> Petabytes -> Exabytes -> Zettabytes ->
Yottabytes
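
The figures quoted on the volume slides can be checked with a little arithmetic. The sketch below is only an illustration: it assumes decimal (SI) units, where each step of the ladder from bytes upward is a factor of 1000, and it treats 2009 to 2020 as 11 years of steady compound growth. The helper name human_readable is hypothetical and not part of the course material.

```python
# Decimal (SI) unit ladder, starting at bytes; each step is a factor of 1000.
# (Assumption: the slides do not say whether decimal or binary units are meant.)
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1000:
            return f"{value:g} {unit}"
        value /= 1000
    return f"{value:g} {UNITS[-1]}"


# 2.5 quintillion bytes per day, with a quintillion taken as 10^18.
daily = human_readable(2.5e18)

# A 44x increase on 0.8 zettabytes, to compare with the quoted 35 zettabytes.
projected_zb = round(0.8 * 44, 1)

# Compound annual growth rate implied by 0.8 ZB -> 35 ZB over 11 years.
annual_growth = (35 / 0.8) ** (1 / 11) - 1

print(daily)                   # 2.5 EB
print(projected_zb)            # 35.2
print(f"{annual_growth:.0%}")  # about 41% per year
```

Under these assumptions the daily figure is 2.5 exabytes, and a steady yearly growth of roughly 41% is enough to produce the 44x increase, which is what "increasing exponentially" means in practice.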

• Internal Data Source
• External Data Source
• Both
Characteristics of Big Data:
2-Complexity (Variety)
• Various formats, types, and structures
• Text, numerical, images, audio, video,
sequences, time series, social media
data, multi-dim arrays, etc…
• Static data vs. streaming data
• A single application can be
generating/collecting many types of
data

To extract knowledge, all these types of data need to be linked
together
Characteristics of Big Data:
3-Speed (Velocity)
• Data is being generated fast and needs to be processed fast
• Online Data Analytics
• Late decisions -> missed opportunities
• Examples
• E-Promotions: based on your current location, your purchase history, and
what you like -> send promotions right now for the store next to you
• Healthcare monitoring: sensors monitor your activities and body ->
any abnormal measurements require immediate reaction
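
The healthcare monitoring example can be sketched as a small streaming check that handles each reading as it arrives instead of waiting for a complete batch. Everything specific here is an assumption made for illustration: the function name flag_abnormal, the (timestamp, heart rate) reading format, and the 50 to 120 beats-per-minute limits are hypothetical and do not come from the slides.

```python
from typing import Iterable, Iterator, Tuple

Reading = Tuple[int, float]  # (timestamp in seconds, heart rate in bpm)


def flag_abnormal(
    readings: Iterable[Reading],
    low: float = 50.0,    # illustrative lower limit (assumption)
    high: float = 120.0,  # illustrative upper limit (assumption)
) -> Iterator[Reading]:
    """Yield each out-of-range reading as soon as it is seen."""
    for timestamp, bpm in readings:
        if bpm < low or bpm > high:
            yield timestamp, bpm


# A short simulated sensor stream; a real system would read from a device or queue.
stream = [(0, 72.0), (1, 75.0), (2, 131.0), (3, 70.0), (4, 44.0)]

alerts = list(flag_abnormal(stream))
print(alerts)  # [(2, 131.0), (4, 44.0)]
```

Because flag_abnormal is a generator, an alert is available the moment the abnormal reading is consumed, which is the point of the velocity characteristic: a late decision is a missed opportunity.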
Other Characteristics of Big Data
• Veracity
• Value
• Volatility
Types of Big Data
• Structured Data
• Unstructured Data
• Semi-structured Data
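
The difference between the three types is easiest to see when the same fact appears in each form. The sketch below is a hypothetical illustration using only the Python standard library; the order record, field names, and message text are invented for the example.

```python
import csv
import io
import json
import re

# Structured: fixed columns, as in a database table or spreadsheet.
structured = "order_id,customer,amount\n1001,Asha,250.0\n"
rows = list(csv.DictReader(io.StringIO(structured)))
structured_amount = float(rows[0]["amount"])

# Semi-structured: self-describing and nested, with no fixed schema (JSON).
semi_structured = (
    '{"order_id": 1001, "customer": {"name": "Asha"},'
    ' "items": [{"sku": "A1", "price": 250.0}]}'
)
doc = json.loads(semi_structured)
semi_structured_amount = sum(item["price"] for item in doc["items"])

# Unstructured: free text; the amount has to be dug out with pattern matching.
unstructured = "Hi, this is Asha. I paid Rs. 250 for order 1001 but it has not arrived."
match = re.search(r"Rs\.\s*(\d+(?:\.\d+)?)", unstructured)
unstructured_amount = float(match.group(1)) if match else None

print(structured_amount, semi_structured_amount, unstructured_amount)
# 250.0 250.0 250.0
```

The amount is read directly from a named column in the structured case, reached by walking a nested document in the semi-structured case, and recovered only by a fragile text pattern in the unstructured case.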
Structured Data
• Databases such as Oracle, DB2, MySQL, etc.
• Spreadsheets
• OLTP systems
Figure:- Sources of Structured Data


Semi-Structured Data
• XML
• JSON
• Other markup languages
Figure:- Sources of Semi-Structured Data


Unstructured Data
• Web pages
• Images
• Audio
• Videos
• Social media data
• Chats
Figure:- Sources of Unstructured Data
Any Questions?
