THE ULTIMATE DATA CAREER ROADMAP
your guide to the data world
By Avery Smith
HI, I’M AVERY
Thank you for being here and downloading this roadmap! I am so happy to go on this data journey with you.
Figure: Venn diagram of Computer Science, Statistics, and Domain Knowledge, with overlaps labeled Machine Learning, Software Development, and Traditional Research, and Data Science at the center.
All of these solutions, using data to solve business problems, we call data science. Data science is what happens when domain knowledge, computer science, and statistics meet.
I think a simple definition for data science is:
Data Science (n): Using data to provide value
That’s it; pretty simple, and pretty inclusive. I hope this makes it feel like you’ve already done data science in your life, which, I’d argue, you have.
Figure: areas of the data world: Business Intelligence, Data Mining, Data Storage, Data Viz, Machine Learning, Cloud Computing, Data Engineering, and Deep Learning.
I’M A CHEMIST...
Chemists can use data to:
Analyze mixtures to check composition
Predict reaction outcomes
Run more efficient experiments
I’M A BLANK...
If you don’t see your profession or industry here, but would like to, send me a message and I’ll add it to the list.
MEET REAL-LIFE DATA PROFESSIONALS
I’ll split this roadmap in two and let you decide: would you like to focus on analytics or engineering?
SUMMARY STATISTICS
Summary statistics are calculations like averages, sums, percentages, or variance. You’re looking at counts and how many times something occurred.
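To make these calculations concrete, here’s a minimal sketch using Python’s built-in statistics module and collections.Counter. The list of daily sales is made up for illustration. It computes a sum, an average, a sample variance, a count of how often each value occurred, and a percentage.

```python
import statistics
from collections import Counter

# Hypothetical daily sales figures, made up for illustration
sales = [12, 15, 9, 15, 20, 15, 9]

total = sum(sales)                    # sum of all values
average = statistics.mean(sales)      # arithmetic mean (average)
spread = statistics.variance(sales)   # sample variance
counts = Counter(sales)               # how many times each value occurred
share_15 = counts[15] / len(sales) * 100  # percentage of days with exactly 15 sales

print(total, average, spread)
print(counts.most_common(1))          # the most frequent value and its count
print(f"{share_15:.1f}%")
```

statistics.variance gives the sample variance (dividing by n - 1); statistics.pvariance is the population version.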
DATA VISUALIZATION
Data visualization is the gateway topic into data in general. It is the art of visualizing, or seeing, data with our human eyes. We, as humans, are not calculation machines; we aren’t computers. It is very hard for us to process multiple numbers at once. But we are really good at seeing things and creating a story out of what we see.
It’s always easier to understand data by seeing it, which is why data visualization is key to finding insights and understanding the data.
TYPES OF DATA VIZ
There are all sorts of different types of data visualization. All graphs are data viz, so you’ve probably seen pie charts, bar charts, and line charts before. All are great examples of data viz. But there are actually dozens more charts you probably haven’t heard of before:
SIMPLE TEXT
SCATTERPLOT
TABLE
LINE
HEATMAP
SLOPEGRAPH
BAR
STACKED BAR
WATERFALL
SQUARE AREA
DOT PLOT
CHOROPLETH MAP
SYMBOL MAP
DONUT CHART
SANKEY DIAGRAM
AREA CHART
PICTOGRAM
PIE CHART
SPIDER CHART
GANTT CHART
TREEMAP
BAR MEKKO CHART
CHORD CHART
CIRCULAR CHART
TERNARY DIAGRAM
GAUGE CHART
SUNBURST CHART
PARETO PLOT
HISTOGRAM
IF YOU COMBINE MULTIPLE CHARTS ON ONE PAGE, YOU CAN CREATE A DASHBOARD.
WHAT IS A DASHBOARD?
Dashboards are interactive displays of
data that often have multiple data
visualizations.
DASHBOARD SOFTWARE
Nowadays, there are many software tools that let you make beautiful, interactive dashboards pretty easily. In the end, they’re all pretty similar. Some have particular strengths or are more accessible, but they aren’t too different.
To reach this new level, you’ll need some sort of “data wrangling” tool.
SQL
SQL stands for Structured Query Language and is basically database talk. It’s a way to communicate with a database and ask it questions using a specific syntax.
R
R is a programming language used primarily for data analysis and statistics. It has functions that can help you understand your data in just a few lines of code.
PYTHON
Python is a complete, high-level, general-purpose programming language that can do nearly anything. You can do data analysis, build websites, do cybersecurity, and make games, all with one language. There are many great tools for data analysis in Python that make analytics fairly simple.
In a query, SELECT chooses which columns you want to look at, FROM describes the table you’re pulling from, and WHERE filters the rows. Other WHERE conditions could use > or < with numeric data.
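Here’s a minimal sketch of a SELECT / FROM / WHERE query, run through Python’s built-in sqlite3 module so it works without installing a database server. The employees table, its columns, and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database with a small hypothetical "employees" table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "Sales", 52000), ("Ben", "Data", 71000), ("Cy", "Data", 64000)],
)

# SELECT picks the columns, FROM names the table, WHERE filters the rows
rows = conn.execute(
    "SELECT name, salary FROM employees WHERE department = 'Data'"
).fetchall()
print(rows)

# A numeric WHERE condition using >
high_earners = conn.execute(
    "SELECT name FROM employees WHERE salary > 60000"
).fetchall()
print(high_earners)
```

Without an ORDER BY, SQL doesn’t guarantee the order of the returned rows.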
Instead of returning individual rows and columns, you can aggregate the data in those columns using functions like COUNT, AVG, MIN, MAX, and SUM, which provide quick calculations over your columns.
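A similar sketch for aggregation, again using sqlite3 with a hypothetical orders table: one query collapses every row into a single summary row holding COUNT, AVG, MIN, MAX, and SUM.

```python
import sqlite3

# Hypothetical table of orders, made up for illustration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Ana", 20.0), ("Ben", 35.0), ("Ana", 15.0), ("Cy", 50.0)],
)

# Aggregate functions turn many rows into one row of summary numbers
summary = conn.execute(
    "SELECT COUNT(*), AVG(amount), MIN(amount), MAX(amount), SUM(amount) FROM orders"
).fetchone()
print(summary)
```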
If you want to start working with multiple tables, you’d use a clause called JOIN that combines tables. Other common keywords are ORDER BY, which allows you to sort results; LIKE, which allows you to search on substrings; and GROUP BY, which allows for grouping of records.
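Putting JOIN, LIKE, GROUP BY, and ORDER BY together, here’s a hypothetical sketch with two made-up tables (customers and orders) in an in-memory SQLite database. It totals order amounts per customer, keeps only customers whose city starts with “Bo”, and lists the largest total first.

```python
import sqlite3

# Two hypothetical tables: customers and their orders
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Ana", "Boston"), (2, "Ben", "Austin"), (3, "Cy", "Boise")])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 20.0), (2, 35.0), (1, 15.0), (3, 50.0)])

# JOIN matches each order to its customer, LIKE keeps cities starting with "Bo"
# (% is a wildcard), GROUP BY totals per customer, ORDER BY sorts largest first
query = """
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.city LIKE 'Bo%'
    GROUP BY c.name
    ORDER BY total DESC
"""
results = conn.execute(query).fetchall()
print(results)
```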
CODING BASICS
First, if you’ve never coded before, or haven’t in many years, don’t worry; coding isn’t as hard as you think. Programming is just a conversation between a human and a computer. A few definitions will help you get started:
VARIABLE
A variable is just like it was in 9th grade math class: a letter that represents something else. You can name a variable nearly anything, and it can represent any number or data structure. Here’s a simple example of a variable; let’s call it “a”. We can define “a” as the number 4. Then we can write a new expression, something like “a + 4”, and the program would return 8.
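Here’s the “a + 4” example written in Python, as one possible choice of language:

```python
# Define a variable named "a" and assign it the number 4
a = 4

# Use the variable in a new expression; Python evaluates a + 4
result = a + 4
print(result)  # 8
```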
FUNCTION
A function is a way to package multiple lines of code under one name, so you can run them with a single line and don’t need to rewrite them every time you use them.
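As a sketch, here’s a small Python function (describe is a hypothetical name chosen for illustration) that bundles three calculations. Once it’s defined, each use is a single line.

```python
def describe(numbers):
    """Return the count, total, and average of a list of numbers."""
    count = len(numbers)
    total = sum(numbers)
    average = total / count
    return count, total, average

# Each call is one line instead of repeating the three calculations
print(describe([2, 4, 6]))    # (3, 12, 4.0)
print(describe([10, 20]))     # (2, 30, 15.0)
```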
LOOP
A loop is a way to do a repeated calculation many times, quickly and efficiently.
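Here’s a minimal Python for loop that applies the same calculation (adding a hypothetical 8% tax) to every price in a made-up list:

```python
# Hypothetical prices, made up for illustration
prices = [10.0, 25.0, 40.0]

# The for loop repeats the same calculation for each price in the list
with_tax = []
for price in prices:
    with_tax.append(round(price * 1.08, 2))

print(with_tax)  # [10.8, 27.0, 43.2]
```

The same loop works whether the list has three prices or three million.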
ADVANCED STATISTICS
At this point, you might be ready to do something more advanced. You might want to learn some statistical methods to derive deeper insight from your data. Here are some things you might want to learn.
HYPOTHESIS TESTING
You have a theory that something is true; that’s your hypothesis. The competing assumption, that there’s no real effect or difference, is your null hypothesis. This technique lets you compute how likely a result at least as extreme as yours would be if the null hypothesis were true (the p-value), often using a distribution like the normal distribution.
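One of the simplest hypothesis tests is the one-sample z-test. The sketch below uses Python’s standard-library statistics.NormalDist and a made-up scenario, and it assumes the population standard deviation is known. With real data you’d more often use a t-test (for example, scipy.stats.ttest_1samp).

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical scenario: a process historically averages 50 units with a
# known standard deviation of 10. A new sample of 36 runs averages 53.
mu_0 = 50          # mean under the null hypothesis (no change)
sigma = 10         # assumed known population standard deviation
n = 36
sample_mean = 53

# z: how many standard errors the sample mean sits from mu_0
z = (sample_mean - mu_0) / (sigma / sqrt(n))

# Two-sided p-value: chance of a result at least this extreme
# if the null hypothesis were true
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05       # significance level chosen before looking at the data
print(f"z = {z:.2f}, p = {p_value:.4f}")
print("reject null" if p_value < alpha else "fail to reject null")
```

Here z is 1.8 and p is about 0.072, so at the 0.05 level this made-up sample doesn’t give enough evidence to reject the null hypothesis.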
ANOVA
ANOVA, or analysis of variance, allows you to find differences between groups that are statistically significant.
Don’t let stats scare you off either. You can do this!
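Here’s a minimal sketch of one-way ANOVA in plain Python, using made-up scores for three hypothetical groups. It computes the F statistic, which compares variation between the group means to variation within each group. Turning F into a p-value requires the F distribution, which a library such as SciPy provides (scipy.stats.f_oneway returns both).

```python
from statistics import mean

# Hypothetical test scores for three teaching methods (made-up data)
groups = {
    "A": [85, 90, 88, 92],
    "B": [78, 82, 80, 84],
    "C": [90, 95, 93, 94],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)
k = len(groups)          # number of groups
n = len(all_scores)      # total number of observations

# Between-group sum of squares: how far each group mean is from the grand mean
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
# Within-group sum of squares: spread of scores around their own group mean
ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)

# F = between-group mean square / within-group mean square
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(f"F = {f_stat:.2f}")
```

A large F means the group means differ by much more than the scatter inside each group would explain.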
DIMENSIONALITY REDUCTION
Sometimes, you have too many columns for our little human brains to deal with, and you’ll want fewer dimensions so you can simplify the problem into something a bit more digestible. You can use a technique called PCA (principal component analysis) to do this and start to identify trends in your data.
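A minimal PCA sketch with NumPy, assuming NumPy is installed and using a small made-up data set: center each column, take the singular value decomposition, and project onto the first two principal components. scikit-learn’s sklearn.decomposition.PCA wraps the same idea in one class.

```python
import numpy as np

# Hypothetical data: 6 samples with 3 columns (features), made up for illustration
X = np.array([
    [2.5, 2.4, 1.0],
    [0.5, 0.7, 0.2],
    [2.2, 2.9, 1.1],
    [1.9, 2.2, 0.9],
    [3.1, 3.0, 1.3],
    [2.3, 2.7, 1.0],
])

# 1. Center each column so it has mean 0
X_centered = X - X.mean(axis=0)

# 2. SVD of the centered data; the rows of Vt are the principal components,
#    ordered from most to least variance
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# 3. Share of the total variance explained by each component
explained = S**2 / np.sum(S**2)

# 4. Project onto the first 2 components: 3 columns become 2
X_reduced = X_centered @ Vt[:2].T
print(explained.round(3))
print(X_reduced.shape)  # (6, 2)
```

If the first one or two components explain most of the variance, you can work with those instead of every original column.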
WHAT IS MACHINE LEARNING?
At this point, you might feel like you’ve done a lot of descriptive analytics, but want to try
predictive analytics. This is where you can start to learn about something like Machine Learning.
It’s a fairly broad topic, and there’s a lot of confusion in the space. Here’s what it is not. It’s not robots taking over the world. It’s not computers learning automatically by themselves. It’s not the apocalypse. Okay? It’s much happier, and honestly simpler, than that.
Here’s my definition: smart algorithms that look for patterns and trends in data sets. Or, more simply put, you can think of machine learning as computers imitating the human thought process via complex mathematics.
There are lots of different branches of machine learning. There’s computer vision, natural language processing (NLP), regression, neural networks, classification, clustering, reinforcement learning, and deep learning.
But there are two main types of machine learning: supervised and unsupervised.
You can take this one step further and make it insanely powerful by using multiple x’s (inputs) instead of just one. That way you can build a predictive model using lots of potential predictors.
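The passage above extends a one-input model to several inputs. Assuming the underlying technique is linear regression (the one-input version isn’t shown in this section), here’s a minimal multiple linear regression sketch using NumPy’s least-squares solver. The houses are hypothetical, and their prices were generated from a made-up rule (50 + 0.1 × square feet + 10 × bedrooms, in thousands of dollars) so you can check that the fit recovers it.

```python
import numpy as np

# Hypothetical houses: [square feet, bedrooms]
X = np.array([
    [1400, 3],
    [1600, 3],
    [1700, 4],
    [1875, 4],
    [1100, 2],
])
# Prices in thousands, generated from the made-up rule described above
y = np.array([220.0, 240.0, 260.0, 277.5, 180.0])

# Prepend a column of ones so the model also learns an intercept
X_design = np.column_stack([np.ones(len(X)), X])

# Least squares picks the coefficients that minimize squared prediction error
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(coef.round(3))  # intercept, per-square-foot, per-bedroom

# Predict the price of a new hypothetical 1500 sq ft, 3-bedroom house
new_house = np.array([1.0, 1500, 3])
predicted = new_house @ coef
print(round(predicted, 1))
```

Real prices never follow a rule this exactly, so with real data the fitted coefficients are estimates and the predictions carry error.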
You can see an example of this with the House Price Prediction Data Set.
K-NEAREST NEIGHBORS
Instead of predicting a quantitative number, you can use machine learning to predict what group something belongs to. This is called classification, and it’s commonly taught with the Iris Flower Data Set.
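Here’s a minimal k-nearest neighbors classifier in plain Python. The measurements are made up to resemble petal sizes and are not rows from the real Iris data set, and knn_predict is a hypothetical helper name. It finds the k closest labeled points by Euclidean distance and takes a majority vote among their labels.

```python
from collections import Counter
from math import dist

# Hypothetical labeled flowers: (petal length, petal width) in cm, made up
# for illustration; these are not rows from the real Iris data set
training = [
    ((1.4, 0.2), "setosa"),
    ((1.3, 0.3), "setosa"),
    ((1.5, 0.2), "setosa"),
    ((4.5, 1.5), "versicolor"),
    ((4.1, 1.3), "versicolor"),
    ((4.7, 1.4), "versicolor"),
    ((6.0, 2.5), "virginica"),
    ((5.8, 2.2), "virginica"),
    ((5.6, 2.1), "virginica"),
]

def knn_predict(point, data, k=3):
    """Classify point by majority vote among its k nearest neighbors."""
    # Sort training examples by Euclidean distance to the new point, keep k
    nearest = sorted(data, key=lambda item: dist(point, item[0]))[:k]
    # Count the labels of those k examples and return the most common one
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((1.6, 0.3), training))  # setosa
print(knn_predict((4.4, 1.4), training))  # versicolor
```

Choosing an odd k helps avoid ties when there are two classes; libraries such as scikit-learn offer a ready-made version (KNeighborsClassifier).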
I know this is ~30 pages of information, but don't worry; that's why I'm here. Here are some ways I can help:
Listen to my podcast!
I interview data professionals who have
been in your shoes. We go over what it's like
to be a data professional, and how to get
there.
DATA CAREER JUMPSTART
Avery Smith