
What’s Big Data?

How does it relate to Data Science?

Marco Brambilla, Emanuele Della Valle - Data Science for Business Innovation 1
From data to analysis and execution
[Figure: intensity of Analytical Capability, Data Availability, and Execution Capability over time, 1940s to 2030s]

The appearance of “Big Data”
[Figure: intensity of Analytical Capability, Data Availability, and Execution Capability over time, 1940s to 2030s]

What's Big Data?

Volume

Data at scale
Terabytes to exabytes of data
accumulated on ever cheaper
storage

What's Big Data?
Variety

Data in many forms


Structured, unstructured,
text, images, videos,
multimedia

What's Big Data?
Velocity

Data in motion
Analysis of streaming data
to enable decisions within
fractions of a second
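
To make "data in motion" concrete, the following is a minimal sketch of analyzing a stream one reading at a time, so that a decision can be taken as each event arrives instead of after a batch job. The class `SlidingWindowMean`, the function `detect_spikes`, the window size, the threshold, and the sample readings are all illustrative assumptions, not part of the original slides.

```python
from collections import deque


class SlidingWindowMean:
    """Running mean over the last `size` readings, updated in O(1) per event."""

    def __init__(self, size):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def update(self, value):
        # When the window is full, the oldest reading is about to be evicted,
        # so remove its contribution from the running total first.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)


def detect_spikes(stream, size=3, threshold=1.5):
    """Yield each reading that exceeds `threshold` times the mean of the window before it."""
    tracker = SlidingWindowMean(size)
    current_mean = None
    for reading in stream:
        # Decide immediately, using only what has been seen so far.
        if current_mean is not None and reading > threshold * current_mean:
            yield reading
        current_mean = tracker.update(reading)


# Hypothetical sensor readings; in practice `stream` could be any iterator.
readings = [10, 11, 9, 30, 10, 12]
spikes = list(detect_spikes(readings))
print(spikes)
```

Because `detect_spikes` is a generator that keeps only a fixed-size window, memory use stays constant however long the stream runs.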

What's Big Data?
Veracity

Data Uncertainty
Managing the reliability and
predictability of inherently
imprecise data types

The one certainty about uncertainty is that it is not likely to go away
What's Big Data? (cont.)
• Volume: data at scale
• Velocity: data in motion
• Variety: data in many forms
• Veracity: data uncertainty
• More Vs
• Value
• Volatility
• Validity
• …

Big Data techs are like "crude oil"
• … that we have to
• Extract
• Transport in mega-tankers
• Ship through pipelines
• Store in massive silos
• …

Data Science is "refining crude oil"

Big Data and Data Science are
Closing the Gap
[Figure: intensity of Analytical Capability, Data Availability, and Execution Capability over time, 1940s to 2030s]

What's Data Science?
• The Science [and Art] of…
• Discovering what we don’t know from data
• Obtaining predictive, actionable insight from data
• Creating Data Products that have business impact now
• Communicating relevant business stories from data
• Building confidence in decisions that drive business value

Who's a Data Scientist?
• Drew Conway, 2010
[Figure: Venn diagram of Computer Science, Math & Statistics Knowledge, and Substantive Expertise; the pairwise overlaps are labeled Machine Learning, Traditional Research, and Danger Zone!, with Data Science at the center]

Statistical Modeling: The Two Cultures
• Data modeling (a.k.a. statistical analysis)
• y = F(x, random noise, parameters)
• [Figure: y ← linear regression, logistic regression ← x]
• Validation: yes/no using goodness-of-fit
• Algorithmic modeling (a.k.a. Machine Learning)
• y = algorithm(x)
• [Figure: y ← unknown ← x; methods: decision tree, neural nets]
• Validation: predictive accuracy
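
The contrast between the two validation styles can be sketched on toy data. The example below fits a straight line (data modeling) and checks goodness-of-fit with R² on the data used for fitting, then fits a one-split regression tree (a deliberately tiny stand-in for the decision trees named above) and scores both models by error on held-out points. All function names, the data, and the choice of mean absolute error are assumptions made for illustration; which model predicts better depends entirely on the data.

```python
from statistics import mean


def fit_line(xs, ys):
    """Data modeling: assume y = a + b*x + noise and estimate a, b by least squares."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def fit_stump(xs, ys):
    """Algorithmic modeling: a one-split regression tree, no assumed form for nature."""
    best = None
    for t in sorted(set(xs))[1:]:  # candidate thresholds that leave both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x < t else mr


def r_squared(model, xs, ys):
    """Goodness of fit: share of the variance in ys explained by the model."""
    my = mean(ys)
    ss_res = sum((y - model(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def mean_abs_error(model, xs, ys):
    """Predictive accuracy: average absolute error on points not used for fitting."""
    return mean(abs(y - model(x)) for x, y in zip(xs, ys))


# Hypothetical data in which y is roughly 2x.
train_x, train_y = [1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
test_x, test_y = [2.5, 4.5], [5.0, 9.1]

line = fit_line(train_x, train_y)
stump = fit_stump(train_x, train_y)

fit_quality = r_squared(line, train_x, train_y)    # data-modeling style check
line_error = mean_abs_error(line, test_x, test_y)  # algorithmic style check
stump_error = mean_abs_error(stump, test_x, test_y)
print(fit_quality, line_error, stump_error)
```

On this nearly linear toy data the line wins on held-out error; on data with a sharp threshold effect the ordering could reverse, which is why the algorithmic culture judges models by predictive accuracy.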
Statistical Modeling: The Two Cultures
• Starting with data

[Figure: y (response variable) ← nature ← x (independent variable)]

• Two goals of analyzing data


• Descriptions: how nature associates responses to inputs
• Predictions: response for future input variables
Data-driven decision (by Gartner)

[Figure: Data → Analytics (Human-Centered & Machine-Centered) → Decision → Action; labels: Decision Support, Decision Automation]
• Descriptive: What happened?
• Diagnostic: Why did it happen?
• Predictive: What will happen?
• Prescriptive: What should I do?
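
As a rough illustration of the four questions, the sketch below answers each one on a tiny, made-up sales history. The records, the constants `UNIT_MARGIN` and `PROMO_COST`, and the naive group-mean forecast are hypothetical and chosen only to keep the example short; they are not taken from Gartner's guide.

```python
from statistics import mean

# Hypothetical weekly records: (units_sold, promotion_ran)
history = [(100, False), (105, False), (140, True),
           (98, False), (150, True), (145, True)]

# Descriptive: what happened?
average_units = mean(units for units, _ in history)

# Diagnostic: why did it happen? Compare weeks with and without a promotion.
with_promo = mean(units for units, promo in history if promo)
without_promo = mean(units for units, promo in history if not promo)
uplift = with_promo - without_promo


# Predictive: what will happen? A naive forecast using the matching group mean.
def forecast(promo):
    return with_promo if promo else without_promo


# Prescriptive: what should I do? Recommend the action with the higher expected profit.
UNIT_MARGIN = 2.0   # assumed profit per unit
PROMO_COST = 50.0   # assumed cost of running a promotion


def recommend():
    expected_gain = (forecast(True) - forecast(False)) * UNIT_MARGIN - PROMO_COST
    return "run promotion" if expected_gain > 0 else "skip promotion"


print(average_units, uplift, forecast(True), recommend())
```

If a person reads the recommendation and decides, this corresponds to decision support; if the output of `recommend()` triggers the promotion directly, it corresponds to decision automation.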

Credits
• Big Data [sorry] & Data Science: What Does a Data Scientist Do? Carlos Somohano, 2013
• https://www.slideshare.net/datasciencelondon/big-data-sorry-data-science-what-does-a-data-scientist-do-world
• 2017 Planning Guide for Data and Analytics. John Hagerty (Gartner), 2016
• https://www.gartner.com/binaries/content/assets/events/keywords/catalyst/catus8/2017_planning_guide_for_data_analytics.pdf
• Big Data: the next frontier for innovation, competition, and productivity. McKinsey Global Institute. May, 2011.
• http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
• Analytics: The real-world use of big data. IBM Institute for Business Value in collaboration with Saïd Business School at the University of Oxford. 2012
• http://www-03.ibm.com/systems/hu/resources/the_real_word_use_of_big_data.pd
• Big Data & Analytics: Next Generation Architecture and Capabilities. Marc Andrews, 2014
• https://www.ibm.com/partnerworld/wps/servlet/RedirectServlet?cmsId=isv_ast_smp_ecosystem-webcasts&attachmentName=Data_Warehouse_deck.pdf

23/07/2018 @manudellavalle - http://emanueledellavalle.org 19
