
Published in Towards Data Science


Jason Jung
Oct 5, 2018 · 7 min read
SWE/Data Scientist @ Shopify. Alum @ Northwestern and @ UCLA. Creator @ https://salary.ninja. Visit me @ https://jasjung.github.io

How to Data Science without a Degree
Thoughts and advice from a data scientist

Me at GoDaddy office

INTRODUCTION
Hey there. I
want to show you how to become a Data Scientist without a degree (or for free). Ironically, I do have a degree, one that was even made for data science (Master’s in Analytics from Northwestern). But to give you a little background, I used to be an accountant at Deloitte. Isn’t that crazy? I was far from data science or anything technical. Because I came from a non-technical background, I had to learn a lot of things online on my own, after work and even during my Master’s program, to catch up to my peers’ level. Having gone through the experience myself, I can tell you that a degree is very helpful, but not necessary. Because I have been on both sides, getting a degree and learning things online, I think I can give you a unique perspective. Getting a Master’s in data science is a sure and fast way to get into the field, but luckily you don’t have to if you don’t want to spend $60–90k on tuition. It will require a lot of self-discipline, though.

If a friend asks me how to get into data science, this post would be for them. I hope you find my advice valuable and relevant, as I have gone through the process myself and found these resources useful. Before we get into the details, let’s find out what data science is about.

WHAT DO YOU DO AS A DATA SCIENTIST?
Skip this section if you already know this.

Well, from my experience working as a data scientist at a few companies like GoDaddy, HERE, and GoGo, data scientists solve problems by applying machine learning to big data. Some examples are: predicting customers’ probability of canceling a subscription, identifying data anomalies, running ad-hoc analyses on gigabytes or terabytes of data, clustering customers into meaningful groups, using text analytics to find topics in customer chat transcripts, and calculating revenue projections. The list goes on.

As a data scientist, you get thrown a lot of different types of problems. To be competent, you need to have a strong foundation in math, statistics, and programming. You need to know when to use certain techniques and algorithms depending on the problem and the data. In the end, you often need to present the results and techniques to executives and less technical audiences.

Also, as a data scientist, you need to continue to learn and adapt. Because the field is changing rapidly, it is important to stay up-to-date and learn new techniques. Even today, I spend a lot of time studying.

WHAT IT TAKES TO BECOME A DATA SCIENTIST (FOR FREE)

Free online resources

Does data scientist work sound exciting to you? Great. This is a good time to be alive to learn for free. I tried to focus on free or cheap options because who doesn’t like free things? It just takes your commitment and perseverance. I will describe this process in three phases.

Keep in mind that there are other great resources besides the ones mentioned below. These just happen to be the ones I took and found useful.

PHASE 1: INFANCY
In order to be good at data science, you need to have good fundamentals in programming, statistics, and math. At a minimum, I recommend you learn the following:

University-level introduction to computer science course (for me it was C++).

University-level lower-division math courses such as multivariable calculus, differential equations, and linear algebra. This will directly impact your understanding of the low-level math of deep learning, such as backpropagation and matrix operations.

University-level introduction to statistics and probability that teaches you R.

The good news is that they do not have to be taken at a university. To learn the skills I mentioned above online, I recommend these:

Math: Multivariable calculus, differential equations, and linear algebra from Khan Academy.

Statistics: Statistics in R and Intro to data science: Data Science Specialization by Johns Hopkins University on Coursera.

Python: Codecademy.com for general programming in Python.

To see examples of what data science can do, check out Kaggle.com, where people learn and compete in data science projects. Also, check out DataCamp.com, which provides hands-on tutorials on various data science topics in both R and Python.

By the end of Phase 1, you should be comfortable with performing simple machine learning techniques like logistic/linear regression and decision trees in either R or Python. On a side note, I recommend learning both R and Python. Even though I mostly use Python lately, it is useful to know both depending on the problem you are trying to solve.
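
To make the Phase 1 goal a bit more concrete, here is a minimal sketch of logistic regression written from scratch in Python with only the standard library. It is not taken from any of the courses above. The toy data (months of inactivity versus whether a customer cancelled) and the names `fit_logistic` and `predict_proba` are made up for illustration, and in real work you would more likely rely on an existing R or Python package.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Difference between predicted probability and label, per example.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradient of the average log loss with respect to w and b.
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_proba(x, w, b):
    """Predicted probability that the label is 1 for input x."""
    return sigmoid(w * x + b)


# Hypothetical toy data: months of inactivity and whether the customer cancelled.
months_inactive = [0, 1, 2, 3, 4, 5, 6, 7]
cancelled = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(months_inactive, cancelled)
print(f"P(cancel | 1 month inactive)  = {predict_proba(1, w, b):.2f}")
print(f"P(cancel | 6 months inactive) = {predict_proba(6, w, b):.2f}")
```

If you can read this and explain why the weights are updated the way they are, you are in good shape for the machine learning courses in the next phase.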

PHASE 2: ADOLESCENCE
Now you should have a better idea of data science and statistical methods. In Phase 2, you want to go deeper and focus on machine learning. I found that online resources like Coursera do not usually go as deep as a university-level course. Thankfully, Stanford’s AI Lab provides amazing courses online for free, so you can access world-class lectures, lecture notes, and many other course materials at no cost. I recommend you take a Coursera course and watch the matching Stanford lectures during the same period, if they are available. For example, DeepLearning.ai on Coursera gives you a very good and practical side of deep learning, whereas Stanford’s CS231n Computer Vision course delves much deeper.

In this phase, take the following:


Machine Learning: Andrew Ng’s Machine Learning Course on Coursera. I took this but did not pay for the certification because the homework did not use Python or R. It is still very useful for understanding the fundamentals of machine learning.

Machine Learning: Stanford CS229 Machine Learning Course. These are old lecture videos by Andrew Ng, but still very good.

Text Analytics: Applied Text Mining in Python on Coursera. I have not taken this course, but text analytics and natural language processing (NLP) are very common and desired skills for a data scientist.

PySpark: DataCamp’s Introduction to PySpark Course. PySpark is the Python API for the Spark distributed computing framework. Simply put, it allows you to use Python on very large data. I use it on a weekly basis.

Deep Learning: Andrew Ng’s DeepLearning.ai on Coursera. I paid for the certification because the homework on this course is very good. Since it’s very affordable, I would recommend you pay for it.

Computer Vision: Stanford CS231n Convolutional Neural Networks for Visual Recognition Course.

Natural Language Processing: Stanford CS224n Natural Language Processing with Deep Learning Course.

(2019–10–01 Update) PyTorch and Practical Deep Learning: Fast.ai. I have heard so many good things about this free course, and PyTorch has been gaining popularity fast. I plan on checking out this course myself.

Again, there are other resources like DataCamp, Udacity, edX, and fast.ai that you can check out to learn various topics.

PHASE 3: INDEPENDENCE
During this phase, you should prepare for interviews and continue to learn new and deeper topics. If you have mastered the materials through Phase 2, I think you should have enough knowledge to apply for an entry-level job. However, there are a few more things that are critical for you to pass the interview process.

First, personal projects. If you are in a data science program, most classes make you complete machine learning projects, which are very good for practicing your skills and for showing employers what you have done. So I really suggest you try some personal projects, the easiest option being Kaggle. Also, even though it is not necessary, I suggest you keep some example code and completed projects on GitHub to show to future employers.

Second, you will most likely be interviewed on SQL. When I started working at GoDaddy, I did not know much SQL. When I was interviewing, I just learned a little bit on W3Schools.com and Codecademy, and by googling SQL interview questions. Although it depends on the company, SQL is not as important as your machine learning and programming skills, and it is relatively easy to learn on the job. Check out Leetcode.com to practice your SQL and programming skills.
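
If you want a quick way to practice SQL without setting up a database server, one option is the `sqlite3` module that ships with Python. The sketch below is only an illustration of a typical join-and-aggregate exercise. The tables, column names, and the helpers `build_demo_db` and `top_customers` are all hypothetical and are not taken from any actual interview or from the sites mentioned above.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory database with made-up customers and orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            amount REAL
        );
        INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
        INSERT INTO orders (customer_id, amount)
        VALUES (1, 30.0), (1, 20.0), (2, 70.0), (3, 5.0);
        """
    )
    return conn


def top_customers(conn, limit=2):
    """Return (name, total_spent) for the highest-spending customers."""
    query = """
        SELECT c.name, SUM(o.amount) AS total_spent
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY total_spent DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


conn = build_demo_db()
for name, total in top_customers(conn):
    print(name, total)
```

Joins, `GROUP BY`, and ordering by an aggregate cover a good share of what I would practice first. Window functions and subqueries are natural next steps once these feel comfortable.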

Finally, by this stage, you should have enough knowledge to explore different machine learning topics and learn deeper. It is up to you to focus on whatever topic sounds interesting to you, whether it be RNNs, CNNs, NLP, or something else. As for me, I am trying to learn reinforcement learning these days.

CONCLUSION
This was my very first Medium post and I
really hope you found it useful. I tried to
focus on specific classes you should take
instead of specific tools or Python/R
packages you need to learn because the
classes will teach those things.

If you want to see example machine learning code, check out my GitHub repository, which I constantly update with new things I learn. I plan on sharing more of the projects I am working on, along with other random thoughts, on Medium! If you have any questions or want me to cover anything specific in the future, ask away in the comments below.

Thank you for reading!

(LinkedIn, Twitter)

Check out my other articles and projects!

Medium: Why are there so many different types of data scientist? Machine Learning Engineer vs Data Scientist (Is Data Science Over?)

Medium: Curious about what happens after you become a data scientist? Check out my new article, How to Stay Up To Date as a Data/Research Scientist.

Project: My latest project www.Salary.Ninja and the corresponding article Welcome to Salary Ninja.

YouTube: My program (AlphaBlitz) that beats Facebook’s Word Blitz Game using deep learning. Like and subscribe! :)

Medium: How I Built and Deployed My First Web Application with Django in 5 Weeks.

Medium: How I Made Over $2000 With My First Four Medium Articles.

Medium: I Worked as a Software Engineer, Machine Learning Engineer, and Data Scientist within Four Years. These Are My Key Takeaways.
