
Intro to Career in Data Science

Md. Rabiul Islam


Content

1. FAQ - Summary of the course


2. Understanding the Role: Data Science Career Overview
3. Demand, supply and the Job Market
4. Salaries in Data Science
5. Typical Data Science Course
6. Excelling as Data Scientist
Career & Jobs

in Data Science
Questions - FAQs
5-10 Minutes
Course Info
Data Science Jobs and Salaries. Made for non-technical people and beginners wanting to move
into Data Science.

Made from experience and inputs gathered from meetups and classes in New York, NY.
The course introduces Data Science: its vocabulary, salaries, job roles, course content and
career strategy.

How to get the best salary from your data science courses? Which skills are most in
demand?

Learn about the new jobs created in data science, big data, machine learning and data
analytics.
Data Science Career Overview
● Why choose a career in
Data Science?
● Data Science is
Interdisciplinary
● Data Science Position and
Titles
● Thoughts on Higher
Education
Bootcamp vs MS Data Science

Bootcamp                                      MS Data Science
18-20k                                        50-80k
Duration 12-24 weeks                          1 year (20-30 hours per week)
Full time (50-70 hours per week)              10 courses
Very rigorous                                 Normal workload
Portfolio centric (GitHub) & interview        Course centric
Entry very hard - includes technical round    Entry relatively easy - based on GRE
Instructors focused on job / networking       Instructors focused on pedantic style
  (meetup, eventbrite)
Graduation means getting a job                Graduation means completing coursework
Intro to Data Science - What is Data Science
What is Data Science?
Why do we need Data Science
What does a Data Scientist do
What a Data Scientist's typical day looks like
Data Science Roles
Salary Range
How does a typical Data Science project work
Future of Data Science
Will Data Science be in Demand
Prerequisites
How much Math / Stats is required?
How many Machine Learning algorithms should I
know?
Importance of having a Master or PhD
What are the minimum requirements to start a
career in Data Science?
What background do you need to start a career in
Data Science?
What is the most important habit for becoming a
good Data Scientist?
What are the best 3 qualities to have in Data Science?
Pathways to study Data Science
What are the main skills to learn?
Statistics topics to learn
What programming languages you should know
R vs. Python vs Scala
Do I need to know R & Python or will only one
do?
Recommended Books
Resources to learn
How to become a Top Level Data Scientist
Where to get practical experience
Portfolio & Resume Prep
Should I create a blog or portfolio in order to get a DS
job?
Best places to promote your skills
Facebook / Linkedin Groups
Github Kaggle
How to stand out from the crowd
How to prepare a CV / Interview
What questions should you expect
How to prepare yourself for the interview
Understanding the
Role
Data Science vs Artificial
Intelligence vs Machine Learning
Data Science vs Data Analyst vs
Business Intelligence
5-10 Minutes
Data Scientist
The title “data scientist” is relatively new and not yet clearly defined. Because it lacks
specificity, it is sometimes perceived as an elevated synonym for “data analyst,” but that
is not the case. A data scientist combines analytic, machine learning, data mining, and
statistical skills with experience in algorithms and coding.

Data scientists also have expertise in tools and languages such as R, SAS, Python, Matlab,
SQL, Hive, Pig, and Spark. Perhaps the most important skill a data scientist possesses,
however, is the ability to explain the significance of data in a way that others can
easily understand.
Data Science vs Data Analytics
Data Analytics vs Data Science
BI vs Data Science
Scope of Business Analytics

● Descriptive analytics
- uses data to understand past and present
● Predictive analytics
- analyzes past performance to predict future outcomes
● Prescriptive analytics
- uses optimization techniques to identify the best decisions
Scope of Business Analytics

Example 1.1 Retail Markdown Decisions

● Most department stores clear seasonal inventory by reducing
prices.
● The question is:
When to reduce the price and by how much?
● Descriptive analytics: examine historical data for similar
products (prices, units sold, advertising, …)
● Predictive analytics: predict sales based on price
● Prescriptive analytics: find the best sets of pricing and
advertising to maximize sales revenue
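
The three steps of the markdown example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the price and sales history is made up, demand is assumed to be a straight line in price, and the function names (`describe`, `fit_demand`, `best_price`) are hypothetical, not taken from the course.

```python
# Hypothetical weekly history for one product: (price, units sold).
history = [(50.0, 100), (45.0, 125), (40.0, 150), (35.0, 175), (30.0, 200)]

def describe(rows):
    """Descriptive: summarize what happened in the past."""
    prices = [p for p, _ in rows]
    units = [u for _, u in rows]
    return {
        "mean_price": sum(prices) / len(prices),
        "mean_units": sum(units) / len(units),
    }

def fit_demand(rows):
    """Predictive: least-squares line units = intercept + slope * price."""
    n = len(rows)
    mean_p = sum(p for p, _ in rows) / n
    mean_u = sum(u for _, u in rows) / n
    sxy = sum((p - mean_p) * (u - mean_u) for p, u in rows)
    sxx = sum((p - mean_p) ** 2 for p, _ in rows)
    slope = sxy / sxx
    intercept = mean_u - slope * mean_p
    return intercept, slope

def best_price(intercept, slope, candidates):
    """Prescriptive: pick the candidate price with the highest predicted revenue."""
    def predicted_revenue(price):
        predicted_units = max(0.0, intercept + slope * price)
        return price * predicted_units
    return max(candidates, key=predicted_revenue)

summary = describe(history)
intercept, slope = fit_demand(history)
# Assumed search grid: whole-dollar prices from 20 to 60.
recommended = best_price(intercept, slope, [float(p) for p in range(20, 61)])
print(summary, intercept, slope, recommended)
```

With this made-up history the fitted line is units = 350 - 5 × price, and the candidate price with the highest predicted revenue is 35. A real markdown decision would also account for advertising, inventory limits and timing, which this sketch leaves out.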
Data for Business Analytics

Four Types of Data Based on Measurement Scale:

● Categorical (nominal) data
● Ordinal data
● Interval data
● Ratio data
Data for Business Analytics

Example 1.3
Classifying Data Elements in a Purchasing Database

[Table: the database columns are classified, left to right, as Categorical, Categorical, Categorical, Categorical, Ratio, Ratio, Ratio, Ratio, Interval, Interval]
Data for Business Analytics

Categorical (nominal) Data

● Data placed in categories according to a specified
characteristic
● Categories bear no quantitative relationship to one another
● Examples:
- customer’s location (America, Europe, Asia)
- employee classification (manager, supervisor,
associate)
Data for Business Analytics

Ordinal Data
● Data that is ranked or ordered according to some relationship
with one another
● No fixed units of measurement
● Examples:
- college football rankings
- survey responses
(poor, average, good, very good, excellent)
Data for Business Analytics

Interval Data
● Ordinal data but with constant differences between
observations
● No true zero point
● Ratios are not meaningful
● Examples:
- temperature readings
- SAT scores
Data for Business Analytics

Ratio Data
● Continuous values with a natural zero point
● Ratios are meaningful
● Examples:
- monthly sales
- delivery times
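
The four measurement scales differ in which calculations make sense. The short Python sketch below illustrates this with made-up observations (all values and variable names are hypothetical): counting for categorical data, ranking for ordinal data, differences for interval data, and ratios for ratio data.

```python
from statistics import mode

# Hypothetical observations, one list per measurement scale.
locations = ["America", "Europe", "Asia", "Europe"]            # categorical
ratings = ["poor", "good", "excellent", "good", "average"]     # ordinal
temps_c = [10.0, 20.0]                                         # interval (Celsius)
monthly_sales = [2000.0, 4000.0]                               # ratio

# Categorical: only counting is meaningful, so report the most frequent category.
most_common_location = mode(locations)

# Ordinal: order is meaningful, so a median rank is valid (a mean is not).
scale = ["poor", "average", "good", "very good", "excellent"]
ranks = sorted(scale.index(r) for r in ratings)
median_rating = scale[ranks[len(ranks) // 2]]

# Interval: differences are meaningful, ratios are not.
temp_difference = temps_c[1] - temps_c[0]
naive_ratio = temps_c[1] / temps_c[0]                          # 2.0, but misleading
kelvin_ratio = (temps_c[1] + 273.15) / (temps_c[0] + 273.15)   # ratio on a true-zero scale

# Ratio: a natural zero makes "twice as much" a valid statement.
sales_ratio = monthly_sales[1] / monthly_sales[0]

print(most_common_location, median_rating, temp_difference, kelvin_ratio, sales_ratio)
```

The temperature lines show why ratios fail on an interval scale: 20 °C is 10 degrees warmer than 10 °C, but it is not "twice as hot", because the Celsius zero is arbitrary. Monthly sales of 4000 really are twice sales of 2000.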
Unstructured Data
MapReduce Big Data
NoSQL Databases
Cleaning and Wrangling

http://159.89.224.205/wp-content/uploads/2016/02/tumblr_inline_o21df5eSYo1sleek4_540.png
Big data draws from two kinds of sources: structured data and
unstructured data. Structured data is organized, typically by categories,
in a way that makes it easy for a computer to sort, read and organize automatically.
Unstructured data, the fastest-growing form of big data, is more likely to
come from human input: customer reviews, emails, videos, social media
posts, etc.
Typically, businesses employ data scientists to handle this unstructured
data, whereas other IT personnel are responsible for managing and
maintaining structured data.
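
A small Python sketch can make the distinction concrete. The order record and the review text are invented for illustration, and `word_counts` is a hypothetical helper. It shows that a structured record can be used directly, while free text first has to be cleaned and turned into something countable.

```python
import re
from collections import Counter

# Structured: fields are already named and typed (hypothetical order record).
order = {"order_id": 1001, "units": 3, "price": 19.99}
order_total = order["units"] * order["price"]

# Unstructured: free text from a hypothetical customer review.
review = "Great phone, great battery. Shipping was slow, but support was great!"

def word_counts(text):
    """Lowercase the text, extract the words, and count each one."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

counts = word_counts(review)
print(order_total, counts.most_common(2))
```

Real unstructured data needs far more wrangling than this (spelling variants, languages, images, audio), which is part of why the work tends to fall to data scientists.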
How many Machine Learning algorithms should I know?

Decision tree
Random forest
Logistic regression
Support vector machine
Naive Bayes
k-nearest neighbors
k-means
AdaBoost
Neural network
Markov
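
To give a feel for how simple some of these algorithms are at their core, here is a minimal sketch of k-nearest neighbors in plain Python. The training points, the labels and the function name `knn_predict` are made up for illustration. In practice a library implementation would normally be used instead.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training items by Euclidean distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training set: (features, label).
train = [
    ((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((2.0, 1.5), "low"),
    ((6.0, 6.0), "high"), ((6.5, 7.0), "high"), ((7.0, 6.5), "high"),
]

print(knn_predict(train, (1.2, 1.4)))
print(knn_predict(train, (6.4, 6.2)))
```

Knowing a handful of algorithms at this level of detail, including when each one fails, is usually more valuable than knowing many by name.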
Artificial Intelligence vs Machine Learning
Machines Will Do Half Our Work By 2025 (Forbes Sep 2018).

Artificial Intelligence is the broader concept of machines being able to carry out tasks in a
way that we would consider “smart”. Artificial intelligences are devices designed to act
intelligently. Related topics: ML and neural networks, Python automation.

Source: https://www.forbes.com/sites/patrickwwatson/2018/09/27/machines-will-do-half-our-work-by-2025/#204a1b255e2a
http://blogs-images.forbes.com/louiscolumbus/files/2017/05/Data-Science-and-Analytics-Demand-by-industry.jpg
Supply & Demand of
Data Science
Professionals
10-15 Minutes
Demand and Supply of Data Science Professionals
Bridging The Data Scientist Talent Gap Starts With Defining The Current Role (Forbes
June 2018). Demand for data science and analytics skills is rising: new job postings are
projected to reach 2.72M in 2020 (BHWS PWC 2017). Annual demand for the fast-growing new
roles of data scientist, data developer, and data engineer will reach nearly 700,000
openings by 2020. By 2020, the number of jobs for all US data professionals will
increase by 364,000 openings to 2,720,000, according to IBM.
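
The IBM figures imply a growth rate that is easy to check. The sketch below assumes the baseline is simply the 2020 projection minus the stated increase. The source does not give the baseline or its year, so treat the result as a rough reading of the numbers rather than a reported statistic.

```python
projected_2020 = 2_720_000   # jobs for all US data professionals by 2020 (IBM figure above)
increase = 364_000           # stated increase in openings

# Assumption: baseline = projection minus increase.
baseline = projected_2020 - increase
growth_pct = 100 * increase / baseline

print(baseline, round(growth_pct, 1))
```

Under that assumption the baseline is about 2.36 million jobs and the implied growth is roughly 15%.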
IT Spending, Freelancing and Hiring Trends

IT spending is projected to reach about $3.85 trillion in 2019, up 2.8% from 2018.
36% of the workforce is contract-based or freelance talent, with projections showing
freelancers will outnumber non-freelancers in the U.S. by 2027. Predictive analytics
algorithms monitor 3GB of data every second streaming from millions of
network interfaces. What's Coming: Tech Hiring Predictions For 2019 (Forbes June 2018).
The Amazing Ways Verizon Uses AI And Machine Learning To Improve Performance.
Future Jobs - Machines taking away Jobs

Deep Learning is used by Google in its voice and image recognition algorithms, and by
Netflix and Amazon to decide what you want to watch or buy next. ML is described as a
sub-discipline of AI. The Workforce Needs AI -- But AI Needs Human Workers, Too (Forbes
Nov 2018). AI is expected to be able to write a high school essay and drive a truck
better than a human can, to have a 50% chance of outperforming humans at all tasks within
45 years, and to automate all jobs in the next century. 14-54% of the U.S. workforce
could see their jobs automated in the next two decades. Let The Robots Take Over: How
The Future Of AI Will Create More Jobs (Forbes Dec 2018).
Future of Job Market

75% of finance departments will employ automation by 2020. Jobs
taken away by Artificial Intelligence. Robots Aren't Coming For Jobs: AI Is Already
Taking Them (Forbes Oct 2018).

Credit Suisse is using deep neural networks, random forests and NLP to eliminate
analyst jobs (Waterstechnology 2019). What Is The Difference Between Deep
Learning, Machine Learning and AI? (Forbes Dec 2017). 10 Amazing Examples Of How
Deep Learning AI Is Used In Practice? (Forbes Dec 2018). Machine Learning And AI Will
Disrupt All Careers. Eight Ways Big Data And AI Are Changing The Business World
(Forbes 2018)
NYU Center for Data Science
Salary of Data Science
Professionals
5-10 Minutes
https://www.burning-glass.com/wp-content/uploads/The_Quant_Crunch.pdf
Data Science Job - Demand & Salary
Data Scientist has been named the best job in America for three years running, with a
median base salary of $110,000 and 4,524 job openings. Data Scientist Is the Best
Job In America According Glassdoor's 2018 Rankings (Forbes Jan 18).
Data Science Jobs
Data science is a fast growing and lucrative field, with the BLS predicting jobs in this
field will grow 11 percent by 2024. Data scientist is also shaping up to be a satisfying
long-term career path. According to data from Robert Half's 2018 Technology and IT
Salary Guide, the average salary for data scientists, based on experience, breaks down as
follows:
25th percentile: $100,000
50th percentile: $119,000
75th percentile: $142,750
95th percentile: $168,000
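
One way to use these bands is to estimate roughly where a given offer falls. The sketch below interpolates linearly between the Robert Half figures quoted above. The linear interpolation and the function name `estimate_percentile` are assumptions made for illustration, since real salary distributions are not linear between percentiles.

```python
# Percentile -> salary, from the Robert Half 2018 figures quoted above.
bands = [(25, 100_000), (50, 119_000), (75, 142_750), (95, 168_000)]

def estimate_percentile(salary):
    """Linearly interpolate a percentile for `salary` between the reported bands."""
    if salary <= bands[0][1]:
        return float(bands[0][0])    # clamped: at or below the lowest reported band
    if salary >= bands[-1][1]:
        return float(bands[-1][0])   # clamped: at or above the highest reported band
    for (p_lo, s_lo), (p_hi, s_hi) in zip(bands, bands[1:]):
        if s_lo <= salary <= s_hi:
            return p_lo + (p_hi - p_lo) * (salary - s_lo) / (s_hi - s_lo)

print(estimate_percentile(109_500))
print(estimate_percentile(130_000))
```

Offers outside the reported range are clamped to the 25th or 95th percentile, because the quoted data says nothing about salaries beyond those bands.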
http://blogs-images.forbes.com/louiscolumbus/files/2017/05/highest-paying-skills.jpg
Data Science Courses
& Bootcamp
5-10 Minutes
Introduction
● The difference between Data Science, Machine Learning, Artificial Intelligence and
Data Analytics. How do the industry and HR use these terms when writing job
descriptions?
● You will learn to use Python to help you acquire, parse and model your data.
● A significant portion of the course will be a hands-on approach to the
fundamental modeling techniques and machine learning algorithms that
enable you to build robust predictive models of real-world data and test their
validity.
● Scala, Hadoop and similar technologies are faster and may be one level closer
to production. The aim of the course remains to develop an analytical
thought process. Many data wrangling terms and concepts stay the same
because they are language agnostic.
What is inside the typical Data Science Course
● Mathematics
● Statistics
● Statistical techniques in Python & Data Visualization
● Machine Learning
● Big Data Engineering
● Deep Learning

Pre-work: Introductory Python (Optional), Data Analysis and Visualization with Python,
Statistics
How much Math / Stats is required?
Logarithm, exponential, polynomial functions, rational numbers.
Basic geometry and theorems, trigonometric identities.
Real and complex numbers and basic properties.
Series, sums, and inequalities.
Graphing and plotting, Cartesian and polar coordinate systems, conic sections.
Linear algebra (and ideally basic multivariate calculus)
Regression: linear regression and the things that violate the assumptions of linear models
(e.g., autocorrelation in time series data, non-independent observations)
Probability theory, especially Bayes' Law and the Central Limit Theorem
Numerical analysis (e.g., time series analysis and forecasting)
Core machine learning methods (clustering, decision trees, k-NN)
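
As an example of the level of probability involved, Bayes' Law fits in a few lines of Python. The spam-filter numbers below are invented for illustration, and `posterior` is a hypothetical helper name.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) via Bayes' Law for a binary hypothesis H and evidence E.

    prior:               P(H)
    likelihood:          P(E | H)
    false_positive_rate: P(E | not H)
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical spam filter: 20% of mail is spam; the word "offer" appears
# in 60% of spam messages and in 5% of non-spam messages.
p_spam_given_offer = posterior(prior=0.20, likelihood=0.60, false_positive_rate=0.05)
print(round(p_spam_given_offer, 2))
```

With these made-up inputs, a message containing "offer" is spam with probability 0.12 / (0.12 + 0.04) = 0.75, even though only 20% of all mail is spam.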
Excelling as Data
Scientist
5-10 Minutes
Excelling as Data Scientist
What Does It Take To Excel As A Data Scientist These Days? (Nov 2018). Companies are
only using about 12% of the data they have. Core Curriculum: Hadoop, Spark, Machine
Learning, Visualization. Specialization: Deep Learning, Data Engineering & Big
Data, Automation (DevOps).
Technical Skills for Data Scientists

Math (e.g. linear algebra, calculus and probability)
Statistics (e.g. hypothesis testing and summary statistics)
Machine learning tools and techniques (e.g. k-nearest neighbors, random forests, ensemble methods, etc.)
Software engineering skills (e.g. distributed computing, algorithms and data structures)
Data mining
Data cleaning and munging
Data visualization (e.g. ggplot and d3.js) and reporting techniques
Unstructured data techniques
R and/or SAS languages
SQL databases and database querying languages
Python (most common), C/C++, Java, Perl
Big data platforms like Hadoop, Hive & Pig
Cloud tools like Amazon S3
