
Golam Kaderye

Lecturer (Dept. of CSE), IUS


Website: https://sites.google.com/view/golamkaderye
Lecture No 1

Chapter 1: Introduction to Big Data Analytics

1.1 Big Data Overview


1.1.1 Data Structures
1.1.2 Analyst Perspective on Data Repositories
1.2 State of the Practice in Analytics
1.2.1 BI Versus Data Science
1.2.2 Current Analytical Architecture
1.2.3 Drivers of Big Data
1.2.4 Emerging Big Data Ecosystem and a New Approach to Analytics
1.3 Key Roles for the New Big Data Ecosystem
1.4 Examples of Big Data Analytics
Exercises

1.1 Big Data Overview

Data is created at every moment, and at an ever-increasing rate.

Sources: Mobile phones, social media and medical diagnostics all create new data.
Data must be stored somewhere for some purpose.
Challenge: To analyze a vast amount of data.
Examples:
✓ Credit card companies
✓ Mobile phone companies
✓ Social media
Definition of big data:
Big data is data whose scale, distribution, diversity and timeliness require the use of new technical
architecture and analytics to enable insights that unlock new sources of business value.

The dimensions of big data are the 3Vs:


✓ Volume (the amount of data)
✓ Variety (the number of types of data)
✓ Velocity (the speed at which data is generated and must be processed)


VOLUME
Within the social media space, for example, Volume refers to the amount of data generated through
websites, portals and online applications. Especially for B2C companies, Volume encompasses all the available
data that must be assessed for relevance. Consider the following: Facebook has 2 billion users, YouTube
1 billion users, Twitter 350 million users and Instagram 700 million users. Every day, these users contribute
billions of images, posts, videos, tweets etc. This gives a sense of the very large amount (Volume) of data
that is generated every minute and every hour.

VELOCITY
Velocity refers to the speed at which data is generated. Staying with the social media example, every
day 900 million photos are uploaded to Facebook, 500 million tweets are posted on Twitter, 0.4 million hours
of video are uploaded to YouTube and 3.5 billion searches are performed on Google. This is like a nuclear
data explosion. Big Data technologies help a company absorb this explosion: they accept the incoming flow
of data and at the same time process it fast enough that it does not create bottlenecks.
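
To make these daily totals easier to picture, the short sketch below converts them into average per-second rates. It uses only the figures quoted above; the variable and function names are illustrative assumptions, and real traffic is not spread evenly across the day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in one day

# Daily totals as quoted in the notes above.
daily_events = {
    "Facebook photo uploads": 900_000_000,
    "Twitter tweets": 500_000_000,
    "Google searches": 3_500_000_000,
}


def per_second(daily_count: int) -> float:
    """Average number of events per second for a given daily total."""
    return daily_count / SECONDS_PER_DAY


# Average rate for each source, keyed by the same names as above.
rates = {name: per_second(count) for name, count in daily_events.items()}

for name, rate in rates.items():
    print(f"{name}: about {rate:,.0f} per second")
```

Even as averages, these rates are in the thousands to tens of thousands of events per second, which is why ingestion and processing speed matter as much as storage.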

VARIETY
Variety in Big Data refers to all the structured and unstructured data that can be generated either by
humans or by machines. The most commonly added data on social media are texts, tweets, pictures and
videos. Other unstructured data such as emails, voicemails, hand-written text, ECG readings and audio
recordings are also important elements under Variety. Variety is all about the ability to classify the
incoming data into various categories.


FIGURE 1-1 What’s driving the data deluge


FIGURE 1-2 Examples of what can be learned through genotyping, from 23andme.com


1.1.1 Data Structures

✓ Big data can come in multiple forms.
✓ Most Big Data is unstructured or semi-structured in nature.
✓ Architectures used to analyze it include:
    ✓ Distributed computing environments
    ✓ Massively Parallel Processing (MPP)
    ✓ RDBMS
✓ Unstructured data accounts for an estimated 80% to 90% of data growth.
✓ There are four types of data:
    ✓ Unstructured
    ✓ Quasi-structured
    ✓ Semi-structured
    ✓ Structured


FIGURE 1-3 Big Data Growth is increasingly unstructured

Unstructured data:
Data that has no inherent structure.
Examples:
Text documents
PDFs
Images
Video

FIGURE 1-7 Example of unstructured data: video about Antarctica expedition


Quasi-structured data:
Textual data with erratic data formats that can be formatted with effort, tools and time.
Example:
Web clickstream data

FIGURE 1-6 Example of EMC Data Science search results
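
As a rough illustration of how quasi-structured data "can be formatted with effort, tools and time", the sketch below turns a small clickstream into uniform records using Python's standard `urllib.parse` module. The URLs and the `parse_click` helper are hypothetical and only serve the example.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical clickstream: URLs visited in order during one browsing session.
clickstream = [
    "https://search.example.com/search?q=data+science",
    "https://www.example.org/courses/data_science.aspx?ref=search",
    "https://video.example.net/watch?v=abc123",
]


def parse_click(url: str) -> dict:
    """Split one URL into host, path and query parameters."""
    parts = urlparse(url)
    # parse_qs returns a list of values per key; keep the first value only.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"host": parts.netloc, "path": parts.path, "params": params}


records = [parse_click(url) for url in clickstream]

for record in records:
    print(record)
```

Each URL follows a slightly different layout, so the fields have to be extracted before the clicks can be analyzed as rows of host, path and parameters.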


Semi-structured data:


Textual data files with a discernible pattern that enables parsing.
Example:
Extensible Markup Language (XML)

FIGURE 1-5 Example of semi-structured data
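
The "discernible pattern that enables parsing" can be seen in a minimal sketch with Python's standard `xml.etree.ElementTree` module. The XML content and its tag names are made up for illustration and are not taken from the figure.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: the tags describe the data they contain.
xml_text = """
<catalog>
  <book id="b1">
    <title>Intro to Analytics</title>
    <price>29.50</price>
  </book>
  <book id="b2">
    <title>Data at Scale</title>
    <price>41.00</price>
  </book>
</catalog>
""".strip()

root = ET.fromstring(xml_text)

# Walk the repeating <book> elements and pull out their fields.
books = [
    {
        "id": book.get("id"),
        "title": book.findtext("title"),
        "price": float(book.findtext("price")),
    }
    for book in root.findall("book")
]

for entry in books:
    print(entry)
```

Because the tags are self-describing, a generic parser can recover the fields without a fixed schema being declared in advance.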


Structured data:
Data containing a defined data type, format and structure.
Examples:
✓ Transaction data
✓ Online Analytical Processing (OLAP) data cubes
✓ RDBMS
✓ CSV files
✓ Spreadsheets (MS Excel)

FIGURE 1-4 Example of structured data
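
A small sketch of what "a defined data type, format and structure" means in practice: every row of a CSV file has the same columns, and each column can be converted to one known type. The transaction rows and column names below are invented for the example, and Python's standard `csv` module is used.

```python
import csv
import io
from datetime import date

# Hypothetical transaction data in CSV form: fixed columns, one record per row.
csv_text = """transaction_id,customer,amount,date
1001,Alice,250.00,2024-01-05
1002,Bob,99.90,2024-01-05
1003,Alice,40.10,2024-01-06
"""

# Convert each column to its defined type while reading.
rows = []
for raw in csv.DictReader(io.StringIO(csv_text)):
    rows.append(
        {
            "transaction_id": int(raw["transaction_id"]),
            "customer": raw["customer"],
            "amount": float(raw["amount"]),
            "date": date.fromisoformat(raw["date"]),
        }
    )

# With typed columns, aggregation is straightforward.
totals = {}
for row in rows:
    totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]

print(totals)
```

No parsing effort is needed beyond type conversion, which is the practical difference from the quasi-structured and semi-structured cases.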

Thank you