and Mining
Introduction
People
• Instructor: Dr. Ping Deng
– E-mail: pideng@gmu.edu
– Office: ENGR 4608
– Office hours: W 2-4 PM
• TA: Archange Destine
– E-mail: adestine@masonlive.gmu.edu
– Office hours: TR 3-4 PM @ ENGR 4456
Textbooks
• NoSQL Distilled: A Brief Guide to the Emerging World of
Polyglot Persistence by Sadalage and Fowler
• Data Science for Business: What you need to know about data
mining and data-analytic thinking by Provost and Fawcett
Recommended Books
• Introducing Data Science: Big data, machine learning, and
more using Python tools by Cielen, Meysman, and Ali
• Python for Data Analysis by Wes McKinney
• Data Mining: Concepts and Techniques by Han and Kamber
• Fundamentals of Database Systems by Elmasri and Navathe
• Massive collections of records – think 10s of PBs
• To some, Big Data means using a NoSQL system or a parallel relational DBMS
• Typically housed on large clusters of low-cost processors
• Facebook has 2700 nodes in its cluster with 60 PB of storage. Truly stunning.
How much data?
• Google processes 20 PB a day (2008)
• Facebook has 2.5 PB of user data + 15 TB/day (2009)
• eBay has 6.5 PB of user data + 50 TB/day (2009)
• 2.5 quintillion bytes generated each day (IBM estimate)
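To put these figures on a common scale, the Python sketch below converts them using decimal (SI) prefixes, so 1 PB = 10^15 bytes. The slides do not say whether decimal or binary units are meant, so that choice is an assumption, and the helper names (`to_unit`, `per_second`) are hypothetical. The per-second rates are simple averages over a 24-hour day.

```python
# Assumption: decimal (SI) byte units; the slides do not specify.
UNITS = {"MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18}
SECONDS_PER_DAY = 24 * 60 * 60


def to_unit(n_bytes, unit):
    """Express a byte count in the given decimal unit."""
    return n_bytes / UNITS[unit]


def per_second(bytes_per_day):
    """Average rate in bytes per second, assuming volume is spread evenly over the day."""
    return bytes_per_day / SECONDS_PER_DAY


# IBM estimate: 2.5 quintillion (2.5 * 10**18) bytes generated per day
ibm_daily = 25 * 10**17
print(f"IBM estimate: {to_unit(ibm_daily, 'EB'):.1f} EB/day")

# Daily volumes quoted on the slide, converted to average per-second rates
daily = {
    "Google (2008)": 20 * UNITS["PB"],
    "Facebook (2009)": 15 * UNITS["TB"],
    "eBay (2009)": 50 * UNITS["TB"],
}
for name, volume in daily.items():
    rate = per_second(volume)
    print(f"{name}: {to_unit(rate, 'GB'):.3f} GB/s on average")
```

Seen this way, 2.5 quintillion bytes is 2.5 exabytes, and Google's 20 PB/day works out to roughly 231 GB every second.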
"640K ought to be enough for anybody."
Some Big Data Stats
Data by sector (petabytes):
• Manufacturing: 966
• Government: 848
• Communications: 715
• Banking: 619
• Health care: 434
• Retail: 364
• Education: 269
• Transportation: 227
1 zettabyte = 1 million petabytes = 1 billion terabytes = 1 trillion gigabytes
Sources: "Big Data: The Next Frontier for Innovation, Competition and Productivity"; US Bureau of Labor Statistics; McKinsey Global Institute analysis
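The zettabyte equivalences follow from each decimal prefix being 1000 times the one before it. The Python check below assumes decimal (SI) prefixes rather than binary (1024-based) ones; `bytes_in` and `convert` are illustrative names, not something from the slides.

```python
# Assumption: decimal (SI) prefixes, each 1000x the previous one.
PREFIXES = ["K", "M", "G", "T", "P", "E", "Z"]


def bytes_in(prefix):
    """Bytes in one <prefix>B, e.g. bytes_in("P") == 10**15."""
    return 1000 ** (PREFIXES.index(prefix) + 1)


def convert(amount, from_prefix, to_prefix):
    """Convert a whole number of units; exact when converting to a smaller unit."""
    return amount * bytes_in(from_prefix) // bytes_in(to_prefix)


for prefix, name in [("P", "petabytes"), ("T", "terabytes"), ("G", "gigabytes")]:
    print(f"1 zettabyte = {convert(1, 'Z', prefix):,} {name}")
```

This prints 1,000,000 petabytes, 1,000,000,000 terabytes and 1,000,000,000,000 gigabytes. With binary prefixes the ratios would instead be 2^20, 2^30 and 2^40.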
Why the Sudden Explosion of Interest?
• An increased number and variety of data sources that generate
large quantities of data
– Sensors (e.g. location, acoustical, …)
– Web 2.0 (e.g. Twitter, wikis, …)
– Web clicks
• Realization that data was too valuable to delete