AGENDA
1 What is Data Science
What Is Data Science All About?
They care about volume and velocity and whatever other buzzwords describe data that is too big for you to analyze in Excel.
Data science is only useful when the data are used to answer a question.
“I have this really hard question, can I answer it with my data?”
Moneyball
can we build a winning
baseball team if we ...
Voter Turnout
“How do we find the
people who vote for
Barack Obama and ...
“The goal is to turn data into
Reports
Presentations
Dashboards
2 Data Science Ecosystem
Data
Ecosystem
An ecosystem is the network
of organizations—including
suppliers, distributors,
customers, competitors,
government agencies, and so
on—involved in the delivery of
a specific product or service
through both competition and
cooperation.
Commercial
Landscape
3 Data Science Building Blocks
Data Science Building Blocks
• Python
• R
• Feature Engineering
• Data wrangling
• EDA

• Tableau
• Power BI
• Matplotlib
• Seaborn
• ggplot
• D3.js
• Gephi

• Classification
• Regression
• Reinforcement learning
• Deep learning
• Clustering

• Jupyter
• Colaboratory
• Spyder
• PyCharm

• Statistics
• Linear Algebra
• Differential calculus

• AWS
• Azure

• Scrapy
• urllib
• Beautiful Soup
3 Data Science Team
Data science is a team sport
Data Science “Unicorn”
The data science unicorn is a
somewhat mythical person who is a
leader in data science, technology,
and business. ...
Data Scientist
Design experiments
Pull and clean data
Analyze data
Communicate results
Data Engineer
Data Science
is a Marathon
Roadmap to data science
4 Data Science Methodology
CRISP-DM
From Problem to Approach
Express the problem in the context of statistical and machine learning techniques.
Clustering:
“Are there groups of users that seem to behave similarly to each other?”
“Predicting revenue in the next quarter?”
Recommendation/Personalization:
“How can I target discounts to specific customers?”
“Does this patient have cancer A, cancer B, or are they healthy?”
Outlier Detection
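As a rough illustration of the clustering question above, here is a minimal k-means sketch in plain Python. The user data, the two features, and the choice of starting centroids are all made up for this example; a real project would more likely use a library implementation.

```python
from math import dist


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the cluster; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [min(range(len(centroids)), key=lambda i: dist(p, centroids[i])) for p in points]
    return labels, centroids


# Hypothetical users: (sessions per week, average minutes per session)
users = [(1, 5), (2, 6), (1, 7), (9, 30), (10, 28), (11, 33)]
labels, centroids = k_means(users, centroids=[users[0], users[3]])
print(labels)
```

Users with the same label behave similarly on these two features, which is one way to answer "are there groups of users?".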
From Requirements to Collection
From Understanding to Preparation
Then initial insights about the data are gained.
Data preparation encompasses all activities to construct and clean the data set.
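A small sketch of what "construct and clean" can look like in practice. The record layout (`name`, `age`) and the cleaning rules are assumptions chosen for illustration, not a prescribed procedure.

```python
def clean_records(records):
    """Normalize names, parse ages, and drop incomplete or duplicate rows."""
    cleaned, seen = [], set()
    for rec in records:
        name = (rec.get("name") or "").strip().title()
        try:
            age = int(rec.get("age"))
        except (TypeError, ValueError):
            age = None  # missing or unparseable age
        if not name or age is None:
            continue  # drop incomplete rows
        key = (name, age)
        if key in seen:
            continue  # drop duplicates
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


# Hypothetical raw data with typical problems: whitespace, duplicates, missing values
raw = [
    {"name": " alice ", "age": "34"},
    {"name": "Alice", "age": 34},
    {"name": "bob", "age": None},
    {"name": "", "age": "20"},
    {"name": "carol", "age": "n/a"},
    {"name": "dave", "age": "41"},
]
prepared = clean_records(raw)
print(prepared)
```

Dropping incomplete rows is only one option; imputing missing values is another, and the right choice depends on the question being asked.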
From Deployment to Feedback
Once finalized, the model is deployed into a production environment.
• May start in a limited / test environment
▸ Solution owner
▸ Marketing
▸ Application developers
▸ IT administration
Getting Feedback:
• How well did the model perform?
• Iterative process for model refinement and redeployment
• A/B testing
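One common way to evaluate an A/B test is a two-proportion z-test, sketched below with the standard library only. The conversion counts are invented for illustration, and the function name is hypothetical; this is one possible analysis, not necessarily the one intended by the slides.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical experiment: A = current model, B = redeployed model
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the difference between the two variants is unlikely to be chance alone, which feeds back into the refinement and redeployment loop.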
Let’s start our
hands-on data science
journey!
Stay Tuned
THANKS!
Any questions?