
Data Science Career Track

Introduction to Data Science


Some Basic Rules
Before we Start!

AGENDA

01 What is Data Science
What is Data Science all about?

02 Data Science Ecosystem
Data science world and commercial landscape

03 Data Science Building Blocks
and team roles and skills

04 Data Science Methodology
Steps of a data science project
HELLO!
Meet Mohamed
I am here because I would love to start a career in Data Science!
Join me throughout my journey!

1 What is Data Science
What is Data Science all about?

They care about volume and velocity and whatever other buzzwords describe data that is too big for you to analyze in Excel.

Data science is only useful when the data are used to answer a question.
“I have this really hard question, can I answer it with my data?”

Moneyball
Can we build a winning baseball team if we have a really limited budget?

Voter Turnout
“How do we find the people who vote for Barack Obama and make sure that those people end up at the polls on polling day?”

“The goal is to turn data into information, and information into insight.”
–Carly Fiorina

“Numbers have an important story to tell. They rely on you to give them a voice.”
–Stephen Few

The outputs of a data science project
• Reports
• Presentations
• Interactive web pages
• Apps
• Dashboards

2 Data Science Ecosystem
Data Ecosystem
An ecosystem is the network of organizations (including suppliers, distributors, customers, competitors, government agencies, and so on) involved in the delivery of a specific product or service through both competition and cooperation.

Commercial
Landscape

3 Data Science Building Blocks
Data Science Building Blocks

• The foundation of knowledge and techniques that are used in data analysis and inference.
• Knowledge of programming languages, data structures, databases, and algorithms.
• The knowledge and experience of the business domain problems that data science will try to solve.

• Python
• R
• Feature Engineering
• Data wrangling
• EDA

• Classification
• Regression
• Reinforcement learning
• Deep learning
• Clustering

• Statistics
• Linear Algebra
• Differential calculus

• Tableau
• Power BI
• Matplotlib
• Seaborn
• ggplot
• D3.js
• Gephi

• Jupyter
• Colaboratory
• Spyder
• PyCharm

• AWS
• Azure

• Scrapy
• urllib
• Beautiful Soup
3 Data Science Team
Data science is a team sport
Data Science “Unicorn”
The data science unicorn is a somewhat mythical person who is a leader in data science, technology, and business. ...

Data Scientist

What does a data scientist do?

Design experiments
Pull and clean data
Analyze data
Communicate results

Data Engineer

What does a data engineer do? + friendly!


Build data infrastructure
Manage data storage and use
Implement production tools

Data Science
is a Marathon

Roadmap to data science

4 Data Science Methodology
CRISP-DM

From Problem to Approach

Every project begins with these questions:
▸ Project objective?
▸ Business sponsors play the most critical role
▸ What are we trying to do: what is the goal?
▸ How do you define “success” and how can you measure it?

Example: Traffic
Problem: Traffic congestion wastes time and money
Clear question: How can we optimize traffic light duration using data on traffic patterns, weather, and pedestrian traffic?
Measurable outcomes:
- % decrease in commute time
- % decrease in length/duration of traffic jams
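
The measurable outcomes in the traffic example are percentage decreases, so it helps to fix the formula before any data arrives. The sketch below is only an illustration: the function name and the commute figures are assumptions, not values from the slides.

```python
def percent_decrease(before: float, after: float) -> float:
    """Return the percentage decrease from `before` to `after`."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) / before * 100.0


# Hypothetical figures, for illustration only (not from the slides).
baseline_commute_min = 40.0  # average commute before the change
new_commute_min = 34.0       # average commute after the change

improvement = percent_decrease(baseline_commute_min, new_commute_min)
print(f"Commute time decreased by {improvement:.1f}%")
```

Agreeing on the baseline period and the unit (minutes per trip here) is part of defining “success”.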

From Problem to Approach

Express the problem in the context of statistical and machine learning techniques:
“Predicting revenue in the next quarter?”
“Does this patient have cancer A, cancer B, or are they healthy?”

Clustering:
“Are there groups of users that seem to behave similarly to each other?”
Recommendation/Personalization:
“How can I target discounts to specific customers?”
Outlier Detection
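
To make the clustering question concrete (“Are there groups of users that seem to behave similarly?”), here is a minimal sketch of one possible technique, a tiny one-dimensional k-means. The slides do not name an algorithm, so k-means is an assumption chosen for illustration, and the weekly session counts are made up.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Very small 1-D k-means (requires k >= 2).

    Returns (centroids, cluster index for each value).
    """
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels


# Hypothetical weekly sessions per user: a light-use group and a heavy-use group.
sessions = [1, 2, 2, 3, 20, 22, 25]
centroids, labels = kmeans_1d(sessions)
print(centroids, labels)
```

In practice a library implementation would usually be preferred; the point here is only that the question maps onto an assign-and-update loop over the data.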

From Requirements to Collection

The chosen analytic approach determines the data requirements.
▸ Content
▸ Formats
▸ Representations

Initial data collection is performed.
• Available data?
• Obtain data?
• Revise data requirements or collect more data?
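
One way to act on the “Available data?” question is to compare what was collected against a written list of requirements. The sketch below assumes a hypothetical set of fields for the traffic example; the field names and types are illustrative, not taken from the slides.

```python
# Hypothetical requirements for the traffic example: field name -> expected type.
REQUIRED_FIELDS = {"intersection_id": str, "vehicles_per_hour": int, "weather": str}


def check_requirements(records, required=REQUIRED_FIELDS):
    """Return the problems found: fields that are missing or have the wrong type."""
    missing, wrong_type = set(), set()
    for record in records:
        for field, expected in required.items():
            if field not in record:
                missing.add(field)
            elif not isinstance(record[field], expected):
                wrong_type.add(field)
    return {"missing": sorted(missing), "wrong_type": sorted(wrong_type)}


sample = [
    {"intersection_id": "A1", "vehicles_per_hour": 420, "weather": "rain"},
    {"intersection_id": "B7", "vehicles_per_hour": "380"},  # count stored as text, no weather
]
report = check_requirements(sample)
print(report)
```

A non-empty report is the signal to revise the data requirements or collect more data.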

From Understanding to Preparation
Then data understanding is gained.
• Initial insights about data
• Descriptive statistics and visualization
• Additional data collection to fill gaps, if needed

Data preparation encompasses all activities to construct and clean the data set.
Data cleaning
• Missing or invalid values
• Eliminating duplicate rows
• Formatting properly
Combining multiple data sources
Transforming data
Feature engineering
Text analysis
Accelerate data preparation by automating common steps
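
The three data cleaning items (missing or invalid values, duplicate rows, formatting) can be sketched in plain Python as follows. The column names and sample rows are hypothetical, and dropping bad rows is only one possible policy; imputing values is another.

```python
def clean_rows(rows):
    """Drop rows with missing/invalid values, normalize formatting, remove duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        city = (row.get("city") or "").strip().title()  # format names consistently
        try:
            temp = float(row.get("temp_c"))             # missing or invalid -> skip row
        except (TypeError, ValueError):
            continue
        if not city:
            continue
        key = (city, temp)
        if key in seen:                                 # eliminate duplicate rows
            continue
        seen.add(key)
        cleaned.append({"city": city, "temp_c": temp})
    return cleaned


# Hypothetical raw data, for illustration only.
raw = [
    {"city": " cairo ", "temp_c": "31.5"},
    {"city": "Cairo", "temp_c": "31.5"},      # duplicate once formatted
    {"city": "Giza", "temp_c": None},         # missing value
    {"city": "alexandria", "temp_c": "n/a"},  # invalid value
    {"city": "Luxor", "temp_c": "38"},
]
print(clean_rows(raw))
```

Wrapping the steps in one function is a small instance of automating common preparation steps so they can be rerun on new data.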
From Modeling to Evaluation
▸ Developing predictive or descriptive models
▸ May try using multiple algorithms
▸ Highly iterative process

Model evaluation is performed during model development and before model deployment
• Understand the model’s quality
• Ensure that it properly addresses the business problem
Diagnostic measures
• Suitable to the modeling technique used
• Training/Testing set
• Refine model as needed
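
The training/testing idea can be shown with a deliberately simple model. In this sketch the “model” is just a threshold on one feature and the data are synthetic; both are assumptions for illustration, not the method used in the course.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train):
    """Toy model: midpoint between the two class means of a single feature."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, rows):
    """Share of rows where predicting 1 for x > threshold matches the label."""
    correct = sum((x > threshold) == bool(y) for x, y in rows)
    return correct / len(rows)


# Synthetic, well-separated data: (feature value, label).
data = [(x, 0) for x in range(1, 11)] + [(x, 1) for x in range(20, 30)]
train, test = train_test_split(data)
model = fit_threshold(train)
print(f"test accuracy: {accuracy(model, test):.2f}")
```

The model is fitted only on the training rows and scored only on the held-out rows, which is what makes the accuracy figure a fair diagnostic. Real data are rarely this cleanly separated, so the refine-and-re-evaluate loop matters.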

From Deployment to Feedback
Once finalized, the model is deployed into a production environment.
• May start in a limited / test environment
▸ Solution owner
▸ Marketing
▸ Application developers
▸ IT administration

Getting Feedback:
• How well did the model perform?
• Iterative process for model refinement and redeployment
• A/B testing
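
For the A/B testing item, one common way to compare two variants is a two-proportion z-test. The slides do not say which test to use, so this is an assumed choice, and the conversion counts below are hypothetical.

```python
import math


def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic for the difference between two conversion rates (pooled variance)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / std_err


# Hypothetical counts: A is the current model, B is the redeployed model.
z = two_proportion_z(success_a=100, n_a=1000, success_b=130, n_b=1000)
print(f"z = {z:.2f}")
```

As a rule of thumb, an absolute z value above about 1.96 corresponds to a difference that is significant at the 5% level in a two-sided test.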

Let’s start our data science hands-on journey!
Stay tuned

THANKS!
Any questions?
