Sizes of data
Name       Symbol  Value (bytes)
Kilobyte   KB      10^3
Megabyte   MB      10^6
Gigabyte   GB      10^9
Terabyte   TB      10^12
Petabyte   PB      10^15
Exabyte    EB      10^18
Zettabyte  ZB      10^21
Yottabyte  YB      10^24
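To make the table concrete, here is a minimal Python sketch that prints a byte count in the largest fitting unit. The function name human_size and the sample values are illustrative assumptions, not part of the original slides; it uses the decimal (power-of-1000) units listed above.

```python
# Decimal (SI) units from the table: each step up is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_size(num_bytes):
    """Return a byte count as text in the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the last unit available.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000

print(human_size(10**3))       # one kilobyte
print(human_size(5 * 10**8))   # the 0.5 GB file, expressed in megabytes
print(human_size(10**13))      # ten terabytes
```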
What is actually Big Data?
Big data: data so large that it becomes difficult to process
using a traditional system.
Examples:
• Have you ever tried opening a 0.5 GB file on your machine?
• It is difficult to edit a 10 TB file in limited time on a traditional system.
• Attaching a 100 MB file to an e-mail.
[Figure: a 100 TB video is difficult to process on a traditional system (unable to view, unable to edit, unable to send); this depends on the capability of the system.]
• 4.6 billion camera phones worldwide
• 30 billion RFID tags today (1.3B in 2005)
• 12+ TBs of tweet data every day
• ? TBs of data every day
• 100s of millions of GPS-enabled devices sold annually
• 25+ TBs of log data every day
• 2+ billion people on the Web by end of 2011
• 76 million smart meters in 2009… 200M by 2014
Challenges
• Capture
• Storage
• Curation
• Analysis
• Search
• Sharing
• Transfer
• Visualization
What is “big data”?
• “Big Data” is data whose scale, diversity, and complexity require new
architecture, techniques, algorithms, and analytics to manage it and
extract value and hidden knowledge from it…
• "Big Data are high-volume, high-velocity, and/or high-variety
information assets that require new forms of processing to enable
enhanced decision making, insight discovery and process
optimization."
• Complicated (intelligent) analysis of data may make small data
“appear” to be “big”.
• Bottom line: Any data that exceeds our current capability of processing
can be regarded as “big”
Big Data Everywhere!
• Lots of data is being collected
and warehoused
• Web data, e-commerce
• purchases at department/
grocery stores
• Bank/Credit Card
transactions
• Social Network
Characteristics of Big Data:
1-Scale (Volume)
• Data Volume
• 44x increase from 2009 to 2020
• From 0.8 zettabytes (ZB) to 35 ZB
• Data volume is increasing exponentially
Exponential increase in
collected/generated data
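The "44x" figure can be checked from the two volumes on the slide. The short Python sketch below recomputes the overall factor and, assuming a constant yearly growth rate over the 11 years (an assumption made here for illustration, not a claim from the slides), the implied yearly multiplier.

```python
start_zb, end_zb = 0.8, 35.0    # volumes quoted on the slide, in zettabytes
years = 2020 - 2009             # 11-year span

factor = end_zb / start_zb      # overall growth factor, which the slide rounds to 44x
# Assumption: growth compounds at the same rate every year.
annual = factor ** (1 / years)  # constant yearly multiplier that yields `factor`

print(f"overall growth: {factor:.2f}x")
print(f"implied yearly growth: {(annual - 1) * 100:.0f}%")
```

Under that assumption the data volume grows by roughly two fifths each year, which is what "increasing exponentially" means in practice.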
Characteristics of Big Data:
2-Complexity (Variety)
• Various formats, types, and structures
• Text, numerical, images, audio, video,
sequences, time series, social media data,
multi-dim arrays, etc…
• Static data vs. streaming data
• A single application can be
generating/collecting many types of data
Some Make it 4V’s
Type of Data
Movies (video)
X-ray pictures
3. Semi-Structured Data
Data that does not have a proper format attached to it.
Ex.
–Data within an e-mail
–Data in a Doc file
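The e-mail example can be illustrated with a small Python sketch using the standard library email module. The message text and addresses are made up for illustration. The header fields can be read by name, while the body is free text with no fixed format, which is one way to see why e-mail is usually described as semi-structured.

```python
from email import message_from_string

# Hypothetical message, invented for this example.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Quarterly report\n"
    "\n"
    "Hi Bob, the report is attached. Call me if the numbers look off.\n"
)

msg = message_from_string(raw)

# Header fields are addressable by name, like columns in a record.
sender = msg["From"]
subject = msg["Subject"]

# The body is plain free text; nothing describes its internal layout.
body = msg.get_payload()

print(sender, "|", subject)
print(body)
```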
Lifecycle of Data: 4 “A”s
[Figure: data lifecycle. Labels: Acquisition, Aggregation, Analysis; Scattered Data, Integrated Data.]