
Introduction to Big Data

Sizes of data
Name Symbol Value (bytes)
Kilobyte KB 10^3
Megabyte MB 10^6
Gigabyte GB 10^9
Terabyte TB 10^12
Petabyte PB 10^15
Exabyte EB 10^18
Zettabyte ZB 10^21
Yottabyte YB 10^24
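The table above can be turned into a small conversion helper. This is a minimal sketch that assumes the decimal (power-of-1000) units listed in the table; the names `UNITS`, `to_bytes`, and `format_bytes` are illustrative and do not come from the slides.

```python
# Decimal units from the table: each step up is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(amount, unit):
    """Convert an amount in the given unit (e.g. 0.5, "GB") to a byte count."""
    return int(amount * 1000 ** UNITS.index(unit))


def format_bytes(n):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(n)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1000,
        # or at the largest unit we know about.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


if __name__ == "__main__":
    print(format_bytes(10**12))               # one terabyte
    print(format_bytes(to_bytes(0.5, "GB")))  # the 0.5 GB file, shown in MB
```

Note that many operating systems report sizes in powers of 1024 instead, so the numbers shown by a file manager can differ slightly from these decimal values.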
What actually is Big Data?
Big data: data so large that it becomes difficult to process
using traditional systems.
Examples:
• Have you ever tried opening a 0.5 GB file on your machine?
• It is difficult to edit a 10 TB file in limited time on a traditional system.
• Try attaching a 100 MB file to an e-mail.
[Figure: data that is difficult to process with a traditional system. 100 GB image: unable to view. 100 MB document: unable to send. 100 TB video: unable to edit. What counts as too large depends on the capability of the system.]
• 12+ TBs of tweet data every day
• ? TBs of data every day
• 25+ TBs of log data every day
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones worldwide
• 100s of millions of GPS-enabled devices sold annually
• 2+ billion people on the Web by end of 2011
• 76 million smart meters in 2009… 200M by 2014
Challenges

Capture Search

Storage Sharing

Curation Transfer

Analysis Visualization
What is “big data”?

• “Big Data” is data whose scale, diversity, and complexity require new
architecture, techniques, algorithms, and analytics to manage it and
extract value and hidden knowledge from it…
• “Big Data are high-volume, high-velocity, and/or high-variety
information assets that require new forms of processing to enable
enhanced decision making, insight discovery and process
optimization”
• Complicated (intelligent) analysis may make small data
“appear” to be “big”.
• Bottom line: Any data that exceeds our current capability of processing
can be regarded as “big”
Big Data Everywhere!
• Lots of data is being collected
and warehoused
• Web data, e-commerce
• Purchases at department/
grocery stores
• Bank/credit card
transactions
• Social networks
Characteristics of Big Data:
1-Scale (Volume)
• Data Volume
• 44x increase from 2009 to 2020
• From 0.8 zettabytes to 35 zettabytes
• Data volume is increasing exponentially

Exponential increase in
collected/generated data
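The figures on this slide can be checked with a few lines of arithmetic. This is a sketch that takes the slide's 0.8 ZB (2009) and 35 ZB (2020) at face value and assumes steady compound growth between the two years, which is a simplification; the variable names are illustrative.

```python
# Figures quoted on the slide.
start_year, end_year = 2009, 2020
start_zb, end_zb = 0.8, 35.0

# Overall growth factor: should be close to the "44x" quoted above.
growth_factor = end_zb / start_zb

# Implied compound annual growth rate, assuming the same percentage
# increase every year (an assumption, not a claim from the slide).
years = end_year - start_year
annual_rate = growth_factor ** (1 / years) - 1

if __name__ == "__main__":
    print(f"Growth factor: about {growth_factor:.1f}x over {years} years")
    print(f"Implied annual growth: about {annual_rate:.0%}")
```

Under that assumption, the data volume grows by roughly 40% per year, which is what "increasing exponentially" means in practice: a constant percentage increase, not a constant amount.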
Characteristics of Big Data:
2-Complexity (Variety)
• Various formats, types, and structures
• Text, numerical, images, audio, video,
sequences, time series, social media data,
multi-dim arrays, etc…
• Static data vs. streaming data
• A single application can be
generating/collecting many types of data

To extract knowledge, all these types of data need to be linked together.
Characteristics of Big Data:
3-Speed (Velocity)
• Data is being generated fast and needs to be processed fast
• Online Data Analytics
• Late decisions → missed opportunities
• Examples
• E-Promotions: based on your current location, your purchase
history, and what you like → send promotions right now for the store next to
you

• Healthcare monitoring: sensors monitoring your activities and body
→ any abnormal measurements require immediate reaction
Big Data:
3V’s

Some Make it 4V’s

Type of Data

• Relational Data (Tables/Transaction Data)


• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
• Social Network, Semantic Web (RDF), …
• Streaming Data
• You can only scan the data once
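The "scan the data once" constraint on streaming data can be illustrated with a one-pass statistic. This is a minimal sketch using Welford's online update for mean and variance; the `sensor_readings` generator and its values are hypothetical stand-ins for a real stream.

```python
from typing import Iterable, Iterator, Tuple


def running_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, population variance) in a single pass.

    Each value is seen exactly once and then discarded, so memory use
    stays constant no matter how long the stream is.
    """
    count, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # Welford's update of the squared deviations
    variance = m2 / count if count else 0.0
    return count, mean, variance


def sensor_readings() -> Iterator[float]:
    """Hypothetical stream: a generator can only be consumed once."""
    yield from (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)


if __name__ == "__main__":
    count, mean, variance = running_stats(sensor_readings())
    print(count, mean, variance)
```

A two-pass approach (first compute the mean, then the deviations) would require storing or re-reading the data, which a stream does not allow.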
Classification of Big Data
1. Structured Data
• Data that has a defined length and format.
• Ex. numbers, dates, and groups of words and numbers called strings.
• It is usually stored in a database.
2. Unstructured Data
• No fields
• Massive data, ex.
• Newspapers
• Music (audio)
• Movies (video)
• X-rays
• Pictures
• Applications
3. Semi-Structured Data
• Data that has some structure but no fixed format (schema) attached to it.
Ex.
– Data within an email
– Data in a Doc file
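The email example shows why such data is called semi-structured: the headers are named fields, while the body is free text. A minimal sketch using Python's standard `email` module follows; the message content and addresses are made up for illustration.

```python
from email import message_from_string

# A made-up message: named header fields, a blank line, then free text.
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com
Subject: Quarterly report

Hi Bob, the report is attached. Sales were up in the north region.
"""

msg = message_from_string(RAW_EMAIL)

# Structured part: header fields can be looked up by name.
headers = {name: msg[name] for name in ("From", "To", "Subject")}

# Unstructured part: the body is plain text with no fields to query.
body = msg.get_payload()

if __name__ == "__main__":
    print(headers)
    print(body)
```

The headers could be loaded straight into database columns, whereas extracting anything from the body (such as which region the sender mentions) needs text analysis.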
Lifecycle of Data: 4 “A”s
[Figure: a cycle of four stages. Acquisition → scattered data → Aggregation → integrated data → Analysis → knowledge → Application → log data → back to Acquisition.]
What to do with these data?

• Aggregation and Statistics


• Data warehouse and OLAP (Online Analytical Processing)
• Indexing, Searching, and Querying
• Keyword based search
• Pattern matching (XML/RDF)
• Knowledge discovery
• Data Mining
• Statistical Modeling
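Keyword-based search, one of the items listed above, is commonly built on an inverted index that maps each word to the documents containing it. This is a minimal sketch over a tiny made-up collection; `DOCS`, `build_index`, and `search` are illustrative names, and real search engines add tokenization, ranking, and distributed storage on top of this idea.

```python
from collections import defaultdict

# Tiny made-up document collection: id -> text.
DOCS = {
    1: "big data needs distributed storage",
    2: "streaming data arrives fast",
    3: "distributed computation scales out",
}


def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    index = build_index(DOCS)
    print(search(index, "data"))
    print(search(index, "distributed data"))
```

The index is built once, so each query only touches the entries for its own words instead of rescanning every document.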
Traditional Data Analytics vs. Big Data Analytics
Traditional Data Analytics | Big Data Analytics
Clean data | Clean, messy, and noisy data
TBs of data | PBs of data / lots of data / big data
Often know in advance the questions to ask | Often don’t know all the questions to ask
Architecture does not lend itself to high computation | Needs distributed storage and computation
Typically, answers are factual | Typically, answers are probabilistic in nature
Structured | Structured and unstructured
Dealing with 1-2 domain data sets | Dealing with dozens of domain data sets
Aspect | Traditional Data Analytics | Big Data Analytics
Hardware | Proprietary | Commodity
Cost | High | Low
Expansion | Scale up | Scale out
Loading | Batch, slow | Batch and real-time, fast
Reporting | Summarized | Deep
Analytics | Operational | Operational, historical, and predictive
Data | Structured | Structured and unstructured
Architecture | Physical | Physical or virtual
Agility | Reactive | Proactive, sense and respond
Risk | High | Low
Applications of Big Data Analytics
1. Using Big Data Analytics to Boost Customer Acquisition and Retention
• In 2015, Coca-Cola strengthened its data strategy by
building a digital-led loyalty program.

2. Use of Big Data Analytics to Solve Advertisers’ Problems and Offer
Marketing Insights
• Netflix uses big data analytics for targeted advertising. With over 100
million subscribers, the company collects huge amounts of data, which is the key to
the industry status Netflix boasts. If you are a subscriber, you
are familiar with how they send you suggestions for the next movie you
should watch. This is done using your past search and watch
data, which gives them insight into what interests the
subscriber most.
3. Big Data Analytics for Risk Management
• UOB Bank from Singapore is an example of a brand that uses big data
to drive risk management. For a financial institution, there is huge
potential for incurring losses if risk management is not well thought
out. UOB recently tested a risk management system that is based
on big data. The system enables the bank
to reduce the calculation time of the value at risk. Initially, it took
about 18 hours, but with the big data risk management system,
it takes only a few minutes.
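For readers unfamiliar with the term, value at risk (VaR) is a loss threshold that is expected not to be exceeded with a given confidence over a given period. The sketch below uses the simple historical-simulation approach on a handful of made-up daily returns. It is purely illustrative and is not a description of UOB's system, whose method the source does not give.

```python
def historical_var(returns, confidence_pct=90):
    """Historical-simulation VaR: the loss at the given empirical quantile.

    `returns` are past fractional returns (e.g. -0.02 for a 2% loss).
    `confidence_pct` is an integer percentage to avoid float rounding
    when computing the position in the sorted list.
    """
    losses = sorted(-r for r in returns)  # losses as positive numbers, ascending
    n = len(losses)
    # 1-based rank ceil(confidence * n), converted to a 0-based index.
    rank = (confidence_pct * n + 99) // 100
    return losses[max(rank, 1) - 1]


# Made-up daily returns for illustration only.
DAILY_RETURNS = [0.012, -0.004, 0.007, -0.021, 0.003,
                 -0.015, 0.009, -0.002, 0.005, -0.030]

if __name__ == "__main__":
    var_90 = historical_var(DAILY_RETURNS, confidence_pct=90)
    print(f"90% one-day VaR: {var_90:.1%} of portfolio value")
```

With ten observations this is trivial, but a bank revalues a large portfolio across many scenarios and risk factors, so the calculation lends itself to being split across many machines and run in parallel.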
4. Big Data Analytics as a Driver of Innovation and Product
Development
• Amazon Fresh and Whole Foods are a good example of how big data
can help improve innovation and product development. Amazon
leverages big data analytics to move into a large market. Its data-driven
logistics gives Amazon the expertise it needs to create
and deliver greater value. By focusing on big data
analytics, Amazon Whole Foods is able to understand how customers
buy groceries and how suppliers interact with the grocer. This data
provides insight whenever further changes need to be implemented.
5. Use of Big Data in Supply Chain Management
• PepsiCo is a consumer packaged goods company that relies on huge
volumes of data for efficient supply chain management. The
company is committed to replenishing retailers’ shelves
with appropriate volumes and types of products. The company’s
clients provide reports that include their warehouse inventory and
POS inventory, and this data is used to reconcile
and forecast production and shipment needs. This way, the
company ensures retailers have the right products, in the right
volumes and at the right time.
Conclusion
• Big data analytics is an important investment for a growing business.
• By implementing big data analytics, businesses can achieve
competitive advantage, reduce the cost of operation, and drive customer
retention.
• There are various sources of customer data that businesses can leverage.
• Data is becoming readily available to all organizations.
• It is fair to say that organizations already have data at
their disposal.
• It is up to individual organizations to ensure they implement
appropriate data analysis systems that can handle the huge volume of data.
• Does your business have a big data analysis mechanism in place?
