

How do I learn statistics for data science?

William Chen, studied Statistics at Harvard University (2014)
For any aspiring data scientist, I would highly recommend learning statistics with a
heavy focus on coding up examples, preferably in Python or R.

My favorite series is the Statistical Learning series. It's a great primer on statistical
modeling / machine learning with applications in R.
The Elements of Statistical Learning

An Introduction to Statistical Learning

If you want something with a Python focus, I would check out Think Stats
There are official PDF versions generously available for FREE at:
data mining, inference, and prediction. 2nd Edition.

Page on


Greg Ryslik, Led data science teams at Bay Area companies


I wouldn't focus so much on learning statistics for data science, but more on just
learning statistics. Data Science itself is a combination of two fields,
statistics/mathematics and computer science. There were data scientists who sat at
the intersection of those two fields long before the term was coined.

Many of the answers above (which are great!) are targeted specifically at machine
learning. In getting a broader perspective, you gain the ability to not only implement
the models but also understand how they connect and relate to the deeper
mathematics behind them. As such, this post is aimed more at the general field.

The statistics that are immediately useful to data science typically fall
into one of two categories, either 1) inference or 2) model fitting.

1) With regard to inference, that typically covers topics such as:

1) Parameter Estimation
2) Hypothesis testing
3) Bayesian Analysis
4) Identifying the best estimator
5) Other Statistical Theory
Some classic books on these topics include:
(more introductory): Statistical Inference: George Casella: 9788131503942: Books
(more advanced): Theory of Point Estimation (2nd English Edition): E.L. Lehmann,
George Casella: 9783698745156: Books

2) With regard to model fitting, there are a multitude of topics:

1) Linear Regression
2) Non-linear Regression
3) Categorical Data Analysis
4) Time Series & Longitudinal Analysis
5) Machine Learning
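
To make the first item on this list concrete, here is a minimal sketch of simple linear regression fit by ordinary least squares, using only the Python standard library. The function name `fit_line` and the data are made up for illustration; this is not taken from any of the books mentioned here.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Illustrative data that lies exactly on the line y = 1 + 2x.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
intercept, slope = fit_line(xs, ys)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

Coding the closed-form solution by hand once is a good way to see where the estimates come from before relying on a library.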
Some famous intro books include:
Linear Models: Applied Linear Statistical Models w/Student CD-ROM: Michael H.
Kutner, John Neter, Christopher J. Nachtsheim, William Li: 9780071122214: Books
Categorical Data: An Introduction to Categorical Data Analysis
(9780471226185): Alan Agresti: Books

3) Finally, there are also a variety of topics that are very helpful with things like
A/B testing, missing data, etc.

These include things like:

1) Design of Experiments (very helpful in A/B testing)
2) Bootstrapping (helpful when the parameter of interest is hard to calculate)
3) Sample size calculations (useful when trying to understand how many samples you need)
4) Multiple comparisons (what happens if you run many tests)
5) A ton of others.
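
As an illustration of the bootstrapping item above, here is a rough sketch of a percentile bootstrap confidence interval for a median, a quantity whose sampling distribution is awkward to derive by hand. The function name `bootstrap_ci`, the sample values, and the defaults (2000 resamples, a 95% interval, a fixed seed) are all assumptions chosen for the example.

```python
import random
from statistics import median


def bootstrap_ci(data, stat=median, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample with replacement, recompute the statistic each time, then sort.
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up sample, for illustration only.
sample = [12, 15, 9, 21, 14, 30, 11, 17, 13, 16]
low, high = bootstrap_ci(sample)
print(f"sample median = {median(sample)}, approx. 95% CI = ({low}, {high})")
```

The percentile method is the simplest bootstrap interval; the books above cover when it works well and when a refinement is needed.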

You will encounter many of the above as you work through 1) and 2).

If you're interested in a potential introductory syllabus, I'll be teaching a bootcamp
shortly. The course and syllabus can be found here:

Statistical Foundations - Metis

Hope this helps!



Ferris Jumah, Data and Products


Working list; please suggest edits. Needs classifications.

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second
Edition [1]
-Hastie, Tibshirani, Friedman

Statistical Inference [2]

-Casella, Berger
--Excellent starting text for moving on to more advanced material

Bayesian Data Analysis [3]

-Gelman, Carlin, Stern, Rubin

Mining of Massive Datasets [4]

-Rajaraman, Ullman, Leskovec

All of Statistics [5]


Also, for a very comprehensive list, see

What are some good resources for learning about statistical analysis?
[1] data mining, inference, and prediction. 2nd Edition. (download/buy)
[2] Statistical Inference: George Casella, Roger L. Berger: 9780534243128: Books
[3] Home page for the book, "Bayesian Data Analysis"
[4] Mining of Massive Datasets - The Stanford University InfoLab
[5] All of Statistics: A Concise Course in Statistical Inference (Springer Texts in
Statistics): Larry Wasserman: 9780387402727: Books


Brian Feeny, Harvard Grad Student
There are many books that focus on statistics as it applies to data science;
however, I do believe you should approach statistics holistically, and not just in the
frame of reference of Data Science. For that, I recommend the following book:

Statistics, 4th Edition (9780393929720): David Freedman, Robert Pisani, Roger Purves

This is the same book (loosely) followed by Andrew Conway in his Coursera course
Statistics One. I would try to find the International version, as it is identical to
the US version, but can be had for around $30.

The first chapter or two are rather confusing, but I find the rest of the book very well
laid out. Andrew Conway is very knowledgeable in Statistics, and no doubt he has
recommended this book for good reason.

That said, I recommend using no single resource. Statistics is far too important to
Data Science. You must master it, and like most things, that is a constant work in
progress. I am addicted to Statistics, and I think this book is partially to blame.



Carl Shan, reads a lot, has written a few


To brush up on some basic statistics without dropping a load of cash on a
textbook/degree, I'd suggest starting off by reading a series of short primers
(10-12 page PDFs per topic) meant for the novice statistician and social science
researcher, written by MIT EECS PhD student Ramesh Sridharan.

He taught a 1-mo course at MIT for researchers brushing up on basic or intermediate
statistics, and uploaded all of his PDFs. (You can check out the website here: Statistics
for Research Projects )

I stumbled across his notes while looking up some details regarding the
Kolmogorov-Smirnov test, a non-parametric test for differences between two
distributions, and found his notes to be incredibly lucid and clear. (A non-parametric
test is one that doesn't assume the data come from any particular parametric family of
distributions, and is thus "parameter"-free.)
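
For readers who want to see what the Kolmogorov-Smirnov test actually measures, here is a small sketch of the two-sample KS statistic: the largest vertical gap between the two samples' empirical CDFs. It is not taken from Ramesh's notes; the function name `ks_statistic` and the numbers are made up, and it computes only the statistic, not a p-value.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # Both ECDFs are step functions that only change at observed values,
    # so checking every observed value is enough to find the largest gap.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Made-up samples, for illustration only.
same = ks_statistic([1, 2, 3], [1, 2, 3])
shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
disjoint = ks_statistic([1, 2, 3], [4, 5, 6])
print(same, shifted, disjoint)
```

Notice that no distributional parameters appear anywhere in the computation. In practice, a library routine such as SciPy's `scipy.stats.ks_2samp` also returns a p-value.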

If you have some mathematical or technical maturity, you may find his notes
similarly helpful in getting up to speed. If not, I still think his notes are a great initial
entry point into quickly getting a lay of the land.

The link to his 6-7 notes, totaling ~70 pages, is here: Statistics for Research

Note that he doesn't have any notes on predictive modeling, which is a key part of
machine learning. I emailed him asking why, and he told me that he didn't have the
chance to write anything detailed for the topic. I'm considering drafting a short
primer myself...

Shailesh Upadhyay, former Associate at Indian School of Business (2010-
Originally Answered: How do I learn statistics and probability for data science?

To become a good data scientist, you need to build a strong foundation in the
following:
Fundamental statistics (topics like descriptive & inferential statistics;
parametric & non-parametric tests, simple & multiple regression, etc.)

Proficiency with at least one statistical computing language like R, SAS,
STATA, etc. Python programmers who have done data analysis also have an
edge.

Good knowledge of/experience with advanced modeling techniques, such as
time series analysis, matrix factorization, mixed-effect models, and machine
learning techniques such as boosting and random forests.

Algorithmic thinking: the ability to think about and solve problems at a level of
abstraction that is beyond any specific programming language goes a long way.

An understanding of how relational databases work. SQL experience helps.

Experience with large data sets & distributed computing using Hadoop/Hive
is an added advantage if you want to continue excelling as a data scientist.

A few online resources and moocs that can help you get started are:

1. Data Analyst (a good place to get a feel for data and practice)

2. Managing Big Data with MySQL - Coursera (learn to use a relational DB in
business analysis)

3. Practical Machine Learning - Coursera (a primer to start machine learning)


Hope this helps.

