
V0.

THE ULTIMATE

DATA CAREER
ROADMAP
your guide to the data world
By Avery Smith
HI I'M AVERY
Thank you for being
here and downloading
this roadmap! I am so
happy to go on this
data journey with you.

Over the past few years, I've:

Started my own data analytics consultancy
Worked as a data scientist for ExxonMobil
Interned as a data analyst for the Utah Jazz
Instructed Data Engineering at MIT
Taught hundreds of people about data analytics

The best part about all of this? You can do it as well.

You can find your spot in the data kingdom, enjoying your job with excellent work-life balance and a potential six-figure salary.

And this is the guide to get you started!


WHAT IS DATA?
Data (n): recorded information of events

It's the factual recording of what has occurred. It can answer things like: who, how many, how big, where, what category, what color, how hot, etc.

HOW IS DATA USEFUL?

Knowing exactly what has happened historically allows us to make better decisions about the unknown or into the future.

Seeing into the future would be an amazing power, but impossible. Seeing crystal clear into the past to understand patterns and trends to help us predict the future is the closest thing we have, and hence we have data.

Data allows us to see what outcomes occurred due to previous acts and efforts.
HOW IS DATA ACTUALLY USEFUL?

Thousands of organizations are using data right at this second to help solve real-world business problems. Here are 24 of them:

Let's deep dive into 3 of these organizations and learn how they are using data in their products:
INSTAGRAM EXAMPLE
You open Instagram, and sure enough, the first post is from your favorite celebrity, Justin Bieber, yet again. The picture is awesome and you comment “Love this photo, Justin. Can't wait for the new album!” When you open the app the next day, another post from Justin... Why this pattern when you follow 1,273 other accounts?

When Instagram first came out, the picture at the top of your feed was the most recent post. Now, it uses data from your previous interactions on the app to decide which photo you might like the most, and puts that one at the top.

Instagram makes money from ads on the app, which are highly dependent on having as many eyeballs on the app as possible. They want you to stay on the app as much as possible to garner those views, and the more you like what you see, the longer you'll stay.

They use your previous likes, comments, view time, etc. to predict what you might like to see today, aiming to get you to stay glued to the app for eternity! And I mean, with looks and vocals like Bieber, who could blame you!

To learn more about this, check out Netflix's The Social Dilemma.
UBER EXAMPLE
You've just touched down in your local airport after a trip to London. You exit the plane, pick up your bags at baggage claim, and head to the curb to catch an Uber home. You open up the app and whack! It's $56.82 to take you back to your home. You're flabbergasted! Just a week ago, it was only $22.81 to take you to the airport! Why did the price more than double?

Fundamentally, Uber is a matchmaker. It has drivers looking for jobs, and riders looking for rides; Uber simply brings the two parties together and makes an economic trade. This entire market follows market economics and is based upon supply and demand. If there are 100 drivers and only 5 riders, drivers have a choice to make.

They can either lower prices to try to win one of the riders' money, or do nothing for the next hour. This drives prices down for the riders. When there are 5 drivers and 100 riders, the equilibrium flips. Now, riders have to decide who wants the ride the most, which drives prices up.

This entire market calculation is happening behind the scenes of Uber's price algorithm. And hence, when you arrive late on a busy Sunday night, you're not going to get the same price as on an open Tuesday morning.

To learn more about this, read this article.
POST OFFICE EXAMPLE
It’s your friend Jose’s birthday, and he lives down in
Brazil. You write him a letter and drop it off at the
Post Office with thousands of other letters. How
does that letter get to Jose in Brazil? Does a
human look at it?

Despite the old-fashioned feel of physical mail, the post office has some cutting-edge technology. Letters are viewed with a camera, and the characters of the address that you wrote in your sub-par handwriting are read and captured by a computer.

The computer has been trained with millions of hand-written addresses (data) every year, and has learned to actually read. With the computer's help, your birthday card reaches Jose on time.

To learn more about this, read this summary.
What is Data Science?

[Diagram: Venn diagram of Computer Science, Statistics, and Domain Knowledge, with Machine Learning, Software Development, and Traditional Research in the overlaps and DATA SCIENCE at the center]

All of these solutions, using data to solve business problems, we call data science. Data science is what happens when domain knowledge, computer science, and statistics meet.

I think a simple definition for data science is: Data Science (n): using data to provide value.

That's it; pretty simple, and pretty inclusive. I hope this makes it feel like you've actually already done data science in your life which, I'd argue, you have.

What about all these other data terms?

[Word cloud: Data Science, Data Analytics, AI, Big Data, Business Intelligence, Data Mining, Data Storage, Data Viz, Machine Learning, Cloud Computing, Data Engineering, Deep Learning]

Business Intelligence, Data Visualizations, Data Mining, Data Wrangling, AI, Deep Learning...most of these are just a part of data science.

Data Engineering is pretty different, however.

Let's explore these more in depth on the next page...


WHAT IS DATA ANALYTICS?
Data analytics and data science are pretty much synonymous. Some will argue there is a difference, but I haven't seen a compelling enough argument to sway me.

WHAT IS MACHINE LEARNING?
Machine learning uses computer algorithms that help humans understand processes by finding patterns within a data set. These are algorithms where you, as a human, do not need to set explicit rules.

WHAT IS DEEP LEARNING?
Just a fancy subset of machine learning focused around mimicking the way a human brain thinks. Usually, it uses algorithms called neural networks, which once again mimic human learning patterns.

WHAT IS BIG DATA?
Ah, another marketing ploy. Big data is somewhat meaningless and is often just a substitute term for data science in general. In reality, big data can be described as working with data sets that are probably too large to open on your small laptop, but the term is used more generically than that.

WHAT IS AI? THAT'S DIFFERENT, RIGHT?
Kind of. AI is used more as a marketing term than anything else. Honestly, AI could be simple math, or could be machine learning, or something a bit more complicated. It's said to be a computer system that can do tasks that usually require a human brain. Things like spam filters, self-driving vehicles, and Netflix recommenders could be considered AI. It is a muddied term, frankly, and hence I try not to actually use it.

WHAT IS DATA VISUALIZATION?
Data visualization is presenting data in a way that human eyes and brains can digest, usually involving graphs or tables (and yes, a table is a data visualization).

WHAT IS BUSINESS INTELLIGENCE?
Just data analytics, rebranded for businesses. A bit more emphasis on descriptive analytics.

WHAT IS DATA ENGINEERING?
THIS ONE IS DIFFERENT! Data engineering is the art of collecting, storing, moving, and deploying data. These are the people that make data accessible. It is related to data science, and has some overlap, but I put this in a completely different category of study.

I'M AN ENGINEER...
Engineers can use data to:
Analyze if processes are working correctly
Find when and what went wrong
Simulate complex scenarios

I'M IN FINANCE...
Finance professionals can use data to:
Get real-time stock market insight
More accurately anticipate risk
Detect and identify fraud

I'M IN MARKETING...
Marketers can use data to:
Understand their demographic better
Monitor costly ad campaigns
Predict customer churn

I'M IN HEALTHCARE...
Health care workers can use data to:
Develop personalized medicine
Find anomalies in X-rays or other scans
Optimize employee scheduling to minimize hospital wait times

I'M A CHEMIST...
Chemists can use data to:
Analyze mixtures to check composition
Predict reaction outcomes
Run more efficient experiments

I'M A BLANK...
If you don't see your profession or industry here, but would like to, send me a message and I'll add it to the list.
MEET REAL-LIFE DATA PROFESSIONALS
Click picture to follow them on LinkedIn

Rachel Castellino, Incoming Data Scientist @ Facebook
Monica Royal, Product Manager @ Northwestern Mutual
Zach Wilson, Tech Lead @ AirBnb
Matt Francis, Data Scientist @ General Mills
Ellie Bublick, Analytics Manager @ Dentsu
Abe Diaz, Financial Analyst @ NSU Florida
Matt Sharpe, Data Engineer @ MX
Mary MacCarthy, Data Journalist @ Northwestern Mutual
Mark Freeman II, Senior Data Scientist @ Humu
5 Reasons You Should Have a Data Career
#1 - DATA CAREERS ARE FUN, CREATIVE, AND INTERESTING
You'll spend 90,000 hours working in your life. Do you really want to spend those 90,000 hours doing something you hate?

In a data position, you get to see things from a whole different level that others cannot see. You get to understand why things ended up the way they did, and what might happen in the future. You get to learn and try new algorithms, languages, and software. You're constantly learning, improving your skills, and providing insight.

#2 - YOU BECOME A SUPERHERO


Every company wants to make good choices, but often they’re just going with their
gut feeling. You get to show them their gut feeling is wrong, and they should choose
another path. Or show that it is right, and give them the peace of mind in their choice.
You get a seat at the table. Your boss and the CEO trust you. You bring true, tangible
value to your team. You help make tools that make million-dollar decisions. Your
knowledge transforms numbers and Excel tables into revenue generating insights
and decisions.

#3 - DATA SKILLS ARE IN DEMAND


Think about it: today's big companies (Facebook, Amazon, Netflix, Uber) all have products that rely on data science. And companies like these aren't going anywhere; they're only growing bigger. I personally think data will follow a similar track to software development and web development from the early tech days until now: jobs might change in nature, but the demand will always be present.

#4 - THE PAY IS GOOD


Because there is such a high demand, and because they bring so much value,
people with data skills are compensated extremely well. Many entry positions offer
over $100,000 salaries with additional compensation and benefits. Now like I said
earlier, money isn’t everything, but it is something. And getting paid well to do what
you do best is a great feeling and unlocks a bit of financial freedom.

#5 - YOU GAIN FLEXIBILITY


A lot of data jobs offer some version of working remotely. This means you’ll be able
to work wherever you want. This means you can take extended trips, work during the
day, and play in the evenings. This means you can experience living in many parts of
the world. Some jobs will even be pretty flexible on timing and when you work.
EVERYONE ASKS THAT
Well, that's the question everyone wants to ask. But first, I'd ask you: where do you want to go? For example, let's take Alice from Alice in Wonderland.

She learned from the cat that if you don't know where you want to go, it doesn't matter what path you take.

So what are the potential end destinations? Well, those roles we talked about earlier! We can separate those roles largely by analytics or engineering.

If you don't know where you're going, any road will take you there
CHESHIRE CAT
ANALYTICS VS ENGINEERING
Analytics is finding insight from data, and engineering is making infrastructure
to power data activities.

I’ll split this roadmap in two and have you decide, would you like to focus on
analytics or engineering?

[Diagram: You, branching to Analytics or Engineering]

If you're interested in analytics:
Yay! I'd love to help you more. Let's keep reading.

If you're interested in engineering:
I'm not the best person to cover this. I'd suggest connecting with Joe, Andreas, or Zach.

The Easy Way To Get Started
WHERE SHOULD I START?
You can think of a coordinate system with one axis for value and the other for difficulty. The value axis represents how much value is created by doing an activity, and the difficulty axis represents how difficult that activity is. And here's the secret: start in the corner of high value, low difficulty. This is going to help you deliver value at your job quickly, and get you excited and motivated to learn more.

So what is low-effort, high-value?


WHAT IS HIGH VALUE - LOW EFFORT?
There are 4 types of analytics: Descriptive, Diagnostic, Predictive, and Prescriptive.

Descriptive answers the question “WHAT HAPPENED?” It literally describes what happened.

Diagnostic is “WHY DID IT HAPPEN?”, talking about the why behind the description; diagnosing it.

Predictive is “WHAT WILL HAPPEN?” It is projecting into the future at the unknown and trying to predict what will happen.

Prescriptive is one step further and asks “HOW CAN WE MAKE IT HAPPEN?” as in how we can fabricate our desired outcome, based on our previous results.

The highest value to effort ratio is by far descriptive analytics.


WHAT IS DESCRIPTIVE ANALYTICS?
Descriptive analytics is, well...describing! You're focusing on finding out what happened or what is happening in a summary sort of way. The purpose is to help understand past decisions and hopefully improve the decision-making process moving forward.

Descriptive analytics is typically summary statistics plus data visualization.

SUMMARY STATISTICS
Summary statistics are calculations like averages,
sums, percentages, or variance. You’re looking at
counts, and how many times something occurred.
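
To make those calculations concrete, here is a minimal sketch in Python using only the standard library. The `orders` data and every variable name are made up for illustration; the point is just that each summary statistic named above is a single line of code.

```python
from collections import Counter
from statistics import mean, pvariance

# Hypothetical data: one row per order, with the item color and the price paid.
orders = [
    ("red", 20.0),
    ("blue", 35.0),
    ("red", 25.0),
    ("green", 40.0),
    ("red", 30.0),
]

prices = [price for _, price in orders]
colors = [color for color, _ in orders]

total_revenue = sum(prices)           # sum
average_price = mean(prices)          # average
price_variance = pvariance(prices)    # population variance
color_counts = Counter(colors)        # how many times each color occurred
red_share = 100 * color_counts["red"] / len(orders)  # percentage of orders

print(total_revenue, average_price, price_variance)
print(color_counts, red_share)
```

A spreadsheet can do the same work; the code version simply becomes easier to repeat once the data changes every week.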

DATA VISUALIZATION
Data visualization is the gateway topic into data in general. It is the art of
visualizing, or seeing data, with our human eyes. We, as humans, are not
calculation machines; we aren’t computers. It is very hard for us to
process multiple numbers at one given time. But we are really good at
seeing things and creating a story out of what we see.

It’s always easier to understand data by seeing it, and hence data
visualization is key to finding insights and understanding the data.
TYPES OF DATA VIZ
There are all sorts of different types of data visualization. All graphs are data viz, so you've probably seen pie charts, bar charts, and line charts before. All are great examples of data viz. But there are actually dozens more charts you probably haven't heard of before.

SIMPLE TEXT
SCATTERPLOT
TABLE
LINE
HEATMAP
SLOPEGRAPH
BAR
STACKED BAR
WATERFALL
SQUARE AREA
DOT PLOT
CHOROPLETH MAP
SYMBOL MAP
DONUT CHART
SANKEY DIAGRAM
AREA CHART
PICTOGRAM
PIE CHART
SPIDER CHART
GANTT CHART
TREEMAP
BAR MEKKO CHART
CHORD CHART
CIRCULAR CHART
TERNARY DIAGRAM
GAUGE CHART
SUNBURST CHART
PARETO PLOT
HISTOGRAM

IF YOU COMBINE MULTIPLE CHARTS ON ONE PAGE, YOU CAN CREATE A DASHBOARD.
WHAT IS A DASHBOARD?
Dashboards are interactive displays of
data that often have multiple data
visualizations.

DASHBOARD SOFTWARE?
Nowadays, there are many software tools that allow you to make beautiful, interactive dashboards pretty easily. In the end, they're all pretty similar. Some have particular strengths, or are more easily accessible, but they aren't too different.

Some of the most common ones are Tableau, Power BI, Spotfire, and Google Data Studio.
SOLID START...BUT YOU WANT MORE
So far, we’ve focused on all the high-value, low-effort activities of analytics.
Eventually, you’re ready for more automation and more advanced opportunities.
Once again, we are still focusing on the correct quadrant, but these are more
value, more effort.

To reach this new level, you’ll need some sort of “data wrangling” tool.

DATA WRANGLING TOOLS
At this point, you've graduated from tools like Excel, and are ready for something with a bit more firepower. Welcome! The tools most often used in these situations are: SQL, Python, and R.

SQL
SQL stands for structured query
language and is basically database
talk. It’s a way to communicate with a
database and ask questions using
certain syntax.
R
R is a programming language used primarily for data analysis and statistics. It has functions that can help you understand your data in just a few lines of code.

PYTHON
Python is a complete, high-level, general-purpose programming language that can do nearly anything. You can do data analysis, build websites, do cybersecurity, make games, all with one language. There are many great tools for data analysis in Python that make analytics fairly simple.

WHICH SHOULD YOU LEARN?


All of them are useful, and you can’t go wrong with any of them. As a
data analyst, you’re expected to know the basics of SQL, and if you
know any R or Python, that’s a bonus. As a data scientist, there’s an
expectation that you should know at least 2 of the 3 well.

But Python is my favorite.


SQL BASICS
Starting off with SQL isn't so bad. Knowing a few statements will get you pretty far. Here's what you should know! Most queries follow a pretty basic recipe that goes like so:

SELECT column1, column2 FROM tableA WHERE column1 = 'red'

Here, SELECT chooses which columns you want to look at, FROM names the table you're pulling from, and WHERE filters the rows. Other WHERE conditions could use > or < with numeric data.
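
If you'd like to try that recipe without installing a database, one option is Python's built-in `sqlite3` module. The sketch below is only an illustration: the table contents are invented, and `column2` is assumed to hold numbers so that the `>` filter has something to work on.

```python
import sqlite3

# Build a tiny throwaway database in memory (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (column1 TEXT, column2 INTEGER)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?)",
    [("red", 10), ("blue", 25), ("red", 40), ("green", 5)],
)

# SELECT picks the columns, FROM names the table, WHERE filters the rows.
red_rows = conn.execute(
    "SELECT column1, column2 FROM tableA WHERE column1 = 'red'"
).fetchall()

# A numeric WHERE condition using >.
big_rows = conn.execute(
    "SELECT column1, column2 FROM tableA WHERE column2 > 20"
).fetchall()

print(red_rows)
print(big_rows)
```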

Instead of collecting rows and columns, you can aggregate the data
from those columns using keywords like COUNT, AVG, MIN, MAX and
SUM which provide quick calculations of your columns.
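
As a rough illustration of those aggregate keywords, here is a small sketch using Python's built-in `sqlite3` module. The table and its four rows are hypothetical; all five aggregates are computed in one query.

```python
import sqlite3

# Same kind of throwaway in-memory table as before (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (column1 TEXT, column2 INTEGER)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?)",
    [("red", 10), ("blue", 25), ("red", 40), ("green", 5)],
)

# One row comes back, holding one value per aggregate.
summary = conn.execute(
    "SELECT COUNT(*), AVG(column2), MIN(column2), MAX(column2), SUM(column2) "
    "FROM tableA"
).fetchone()

row_count, average, smallest, largest, total = summary
print(row_count, average, smallest, largest, total)
```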

If you want to start working with multiple tables, you’d use a statement
called JOIN that combines tables. Other common commands are ORDER
BY which allows you to sort results, LIKE which allows you to search on
substrings, and GROUP BY which allows for grouping of records.
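
Here is one way those commands might fit together, again sketched with Python's built-in `sqlite3` module. Both tables, the `family` column, and all the rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (column1 TEXT, column2 INTEGER)")
conn.execute("CREATE TABLE tableB (column1 TEXT, family TEXT)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?)",
    [("red", 10), ("blue", 25), ("red", 40), ("green", 5)],
)
conn.executemany(
    "INSERT INTO tableB VALUES (?, ?)",
    [("red", "warm"), ("blue", "cool"), ("green", "cool")],
)

# JOIN combines the tables, GROUP BY makes one row per family,
# and ORDER BY sorts the result from the largest total to the smallest.
grouped = conn.execute(
    """
    SELECT b.family, COUNT(*) AS n, SUM(a.column2) AS total
    FROM tableA AS a
    JOIN tableB AS b ON a.column1 = b.column1
    GROUP BY b.family
    ORDER BY total DESC
    """
).fetchall()

# LIKE searches on substrings: % matches any run of characters,
# so '%re%' finds both 'red' and 'green'.
matches = conn.execute(
    "SELECT column1 FROM tableA WHERE column1 LIKE '%re%' ORDER BY column2"
).fetchall()

print(grouped)
print(matches)
```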
CODING BASICS
First, if you’ve never coded before, or haven’t in many years, don’t worry;
it isn’t as hard as you think. CODING ISN’T THAT HARD. Programming is
just a conversation between a human and a computer. A few definitions
will help you get started:

VARIABLE
A variable is just like it was in 9th grade math class: a name that represents something else. You can name a variable nearly anything, and it can represent any number or data structure. Here's a simple example of a variable; let's call it “a”. We can define “a” as the number 4. Then we can write a new expression, something like “a + 4”, and the program would return 8.
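
In Python, that exact example looks like this (the name `result` is just an extra variable added here to hold the answer):

```python
a = 4           # define the variable "a" as the number 4
result = a + 4  # use "a" in a new expression
print(result)   # prints 8
```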

FUNCTION
A function is just a way to condense multiple lines of code into one line of code, so that you don't need to rewrite them if you use them multiple times.
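
As a small sketch of the idea in Python, the hypothetical function below wraps a percent-change calculation so it can be reused with one line. The two prices are the ones from the Uber example earlier in this guide.

```python
def percent_change(old, new):
    """Return the percentage change going from old to new."""
    return 100 * (new - old) / old

# Reuse the same logic with a single line each time.
uber_jump = percent_change(22.81, 56.82)
print(round(uber_jump, 1))
```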

LOOP
A loop is a way to repeat a calculation many times, quickly and efficiently.
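
A minimal Python sketch of a loop: the same two steps are repeated for every price in a list. The first two prices come from the Uber example earlier in this guide; the third is made up.

```python
ride_prices = [22.81, 56.82, 30.00]

total = 0.0
expensive_rides = 0
for price in ride_prices:      # repeat the indented lines once per price
    total += price             # keep a running total
    if price > 25:
        expensive_rides += 1   # count the rides above $25

average = total / len(ride_prices)
print(total, average, expensive_rides)
```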
ADVANCED STATISTICS
At this point, you might be ready to do something more advanced. You
might want to learn some statistics methods to derive deeper insight
from your data. Here are some things you might want to learn.

HYPOTHESIS TESTING
You have a theory that something is true; that's your hypothesis. The default assumption that nothing interesting is going on is your null hypothesis. This technique lets you compute how likely your observed result would be if the null hypothesis were true, often based on a normal distribution. Learn more here.
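
One simple version of this idea is a one-sample z-test, sketched below with Python's standard library. It is only an illustration under stated assumptions: the wait times are invented, the null hypothesis is that the true average is 12 minutes, and the population standard deviation (0.6) is assumed to be known, which is rarely the case in practice.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test_p_value(sample, null_mean, known_sd):
    """Two-sided p-value for the null hypothesis 'true mean == null_mean'."""
    standard_error = known_sd / sqrt(len(sample))
    z = (mean(sample) - null_mean) / standard_error
    # Probability of a result at least this extreme under a normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical wait times in minutes.
wait_times = [12.1, 11.4, 13.0, 12.6, 11.9, 12.8, 13.3, 12.2, 12.7]
z, p = z_test_p_value(wait_times, null_mean=12.0, known_sd=0.6)
print(round(z, 2), round(p, 3))
```

A small p-value means the data would be surprising if the null hypothesis were true. A common (but arbitrary) cutoff is 0.05.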

ANOVA
ANOVA, or analysis of variance, allows you to find differences between groups that are statistically significant. Learn more here.

Don't let stats scare you off either. You can do this!
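
To show what ANOVA actually computes, here is a hedged sketch of the one-way ANOVA F statistic in plain Python. The three groups of daily sign-ups are made-up numbers, and the sketch stops at the F statistic: turning it into a p-value needs the F distribution, which a statistics library would normally handle for you.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic: variation between group means vs. variation within groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical daily sign-ups under three different ad campaigns.
campaign_a = [1, 2, 3]
campaign_b = [2, 3, 4]
campaign_c = [5, 6, 7]
f_stat = one_way_anova_f([campaign_a, campaign_b, campaign_c])
print(f_stat)
```

The larger the F statistic, the more the group averages differ compared with the noise inside each group.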

DIMENSIONALITY REDUCTION
Sometimes, you have too many columns for our little human brains to deal with, and you'll want fewer dimensions so you can simplify the problem down to something a bit more digestible. You can use a technique called PCA (principal component analysis) and start to identify trends in your data. Read more here.
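
For a feel of what PCA does, here is a deliberately tiny sketch in plain Python that handles only two columns, using the closed-form eigenvalues of a 2x2 covariance matrix. This is an illustration under that assumption, not how you would do PCA on real data (a library would handle any number of columns), and the height/weight numbers are invented.

```python
from math import sqrt
from statistics import mean

def first_principal_component(points):
    """For 2-column data, return (unit direction, share of variance explained)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    n = len(points)

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] of the centered data.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    trace = sxx + syy
    det = sxx * syy - sxy ** 2
    largest = trace / 2 + sqrt(max(trace ** 2 / 4 - det, 0.0))

    # Matching eigenvector (with simple fallbacks when the columns are uncorrelated).
    if sxy != 0:
        vx, vy = largest - syy, sxy
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    length = sqrt(vx ** 2 + vy ** 2)
    return (vx / length, vy / length), largest / trace

# Hypothetical (height in cm, weight in kg) records.
people = [(150, 50), (160, 58), (170, 65), (180, 75), (190, 82)]
direction, explained = first_principal_component(people)
print(direction, explained)
```

If the explained share is close to 1, a single combined dimension captures nearly everything the two original columns were telling you.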
WHAT IS MACHINE LEARNING?
At this point, you might feel like you’ve done a lot of descriptive analytics, but want to try
predictive analytics. This is where you can start to learn about something like Machine Learning.

It’s a bit broad of a topic and there’s a lot of confusion in the space. Here’s what it is not. It’s not
robots taking over the world. It’s not computers learning automatically by themselves. It’s not the
apocalypse. Okay? Much happier, and honestly simpler than that.

Here’s my definition: smart algorithms that look for patterns and trends in data sets. And
maybe more simply put you can think of machine learning as computers imitating human
thought process via complex mathematics.

There’s lots of branches of different types of machine learning. There’s computer vision, there’s
natural language processing or NLP, there’s regression, there’s neural networks, there’s
classification, there’s clustering, there’s reinforcement learning, and deep learning.

But there are two main types of machine learning: supervised and unsupervised.

SUPERVISED LEARNING
Supervised Learning is when we have previous results of our target outcome. AKA we know the answers for historical data, and we can use that to make a model on the future, or unknown outcomes.

UNSUPERVISED LEARNING
Unsupervised Learning is more experimental and exploratory. You have no previous recorded outcomes and need to tease out insight by running algorithms to dissect potential patterns or trends.

MULTIVARIATE LINEAR REGRESSION


You may remember regression from your earlier math days: y = mx + b, where you're fitting a linear relationship between one variable and another. This is actually machine learning! So you probably have already done machine learning before; wahoo!

You can take this one step further and make it insanely powerful by just using multiple x's (inputs) instead of the one. That way you can make a predictive model using lots of potential predictors.

You can read more about this via example by looking at the House Price
Prediction Data Set.
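
As a starting point, here is a minimal sketch of the single-input case, y = mx + b, fitted by least squares in plain Python. The house sizes and prices are invented (they are not from the House Price Prediction Data Set), and the multi-input version is usually left to a library rather than written by hand.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    mx, my = mean(xs), mean(ys)
    m = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    b = my - m * mx
    return m, b

# Hypothetical training data: house size (square feet) and price (thousands of dollars).
sizes = [1000, 1500, 2000, 2500]
prices = [200, 290, 410, 500]

m, b = fit_line(sizes, prices)
predicted_price = m * 1800 + b   # predict for a house we have not seen
print(m, b, predicted_price)
```

This is supervised learning in miniature: the known prices are the historical answers, and the fitted line is the model used for the unknown house.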

K-NEAREST NEIGHBORS
Instead of predicting a quantitative number, you can use machine learning to predict what group something belongs to. This is called classification, and it's often taught with the Iris Flower Data Set.

Basically, you're identifying the “closest” examples of what has occurred previously, and weighting them together to predict what will happen with this record.
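
A bare-bones sketch of the idea in plain Python is below. It makes two simplifying assumptions: the neighbors get an equal vote (a simple majority) rather than a distance-based weighting, and the flower measurements are made-up values in the style of the Iris data, not rows from the real data set.

```python
from collections import Counter
from math import dist

def knn_predict(training, new_point, k=3):
    """Predict a label from the k closest training examples (majority vote)."""
    nearest = sorted(training, key=lambda row: dist(row[0], new_point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (petal length, petal width) measurements with known species.
training = [
    ((1.4, 0.2), "setosa"),
    ((1.3, 0.2), "setosa"),
    ((1.5, 0.3), "setosa"),
    ((4.7, 1.4), "versicolor"),
    ((4.5, 1.5), "versicolor"),
    ((4.9, 1.5), "versicolor"),
]

print(knn_predict(training, (1.6, 0.3)))
print(knn_predict(training, (4.6, 1.3)))
```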
HOLY CRAP!
THAT'S A LOT!
YES, IT IS, BUT DON'T WORRY

I know this is ~30 pages of information, but don't worry; that's why I'm
here. Here are some ways I can help:

Listen to my podcast!
I interview data professionals who have
been in your shoes. We go over what it's like
to be a data professional, and how to get
there.

Schedule a coaching call!


Need a personal plan? A resume review?
Some career coaching? Stuck on a bug? Or
just want to talk data science? Schedule a
coaching call with me and let's chat!

Join my Data Community Discord


Want to engage with like-minded peers?
Want free monthly events?
Want technical questions answered?
Want some new friends? COME AND JOIN!

Join Data Career Jumpstart


DCJ is a complete data career bootcamp. I
teach you descriptive analytics, data viz,
Python, R, SQL, dashboards, EVERYTHING.
Live coaching calls weekly + more.
YOU'RE INVITED TO JOIN

DATA CAREER
JUMPSTART

Data Career Jumpstart is my signature program that will be your blueprint for becoming a data professional...
Learn all the data tools necessary
Update your resume + full LinkedIn Bootcamp
Even learn how to land your first freelancing gig

Click Here to Learn More


ANY QUESTIONS?
Feel free to message me and ask me anything!

If you have any questions about data, don't hesitate to contact me with my chat widget @ DataCareerJumpstart.com

Avery Smith
