
Important Links in Data Science

I have often seen that many people know various Machine Learning algorithms and have gone through
various courses on the Internet, yet lack a basic understanding of the workflow in Data Science
projects. That is completely fine; it comes with practice. From my own practice, I have put together
a list of articles that really helped me understand Data Science in depth. Hope it helps you!

Rishabh Agrawal

Some Essentials
1. How to Do a Data Science Project from Scratch:
a. https://www.freecodecamp.org/news/how-to-build-a-data-science-project-from-scratch-dc4f096a62a1/

2. Machine Learning Is an Iterative Process:
a. https://elitedatascience.com/machine-learning-iteration
b. https://blog.insightdatascience.com/how-to-deliver-on-machine-learning-projects-c8d82ce642b0

3. Additional: To learn Machine Learning in Python thoroughly, I also suggest going through the
scikit-learn User Guide in full. You don't need any other course to learn scikit-learn. I'll be
referring to several articles from this User Guide below too!
a. https://scikit-learn.org/stable/user_guide.html

Complete Data Preparation

Mastering Data Preparation in Python: Read this two-page blog post and the links it references to
learn the basic workflow involved in any Data Science project.

a. https://www.kdnuggets.com/2019/06/7-steps-mastering-data-preparation-python.html
b. https://www.kdnuggets.com/2017/06/7-steps-mastering-data-preparation-python.html/2

Building on the two-page blog post above, here are additional links for each step involved:

1. Exploratory Data Analysis:
a. https://www.analyticsvidhya.com/blog/2016/01/guide-data-exploration/
b. https://www.kaggle.com/ekami66/detailed-exploratory-data-analysis-with-python (this
article is based on the ProbStats course from Stanford and shows how to actually use
statistics in real life. ProbStats course link:
https://lagunita.stanford.edu/login?next=/courses/course-v1%3AOLI%2BProbStat%2BOpen_Jan2017/course/)
c. http://seaborn.pydata.org/tutorial/distributions.html

2. Working with Missing Data:
a. http://pandas.pydata.org/pandas-docs/stable/missing_data.html
b. https://towardsdatascience.com/how-to-handle-missing-data-8646b18db0d4
c. https://clevertap.com/blog/how-to-treat-missing-values-in-your-data-part-i/
d. https://clevertap.com/blog/how-to-treat-missing-values-in-your-data-part-ii

3. Outlier Removal:
a. https://www.kdnuggets.com/2017/06/7-steps-mastering-data-preparation-python.html/2
(go through all the links given in the Outliers section of this article)
b. https://www.analyticsvidhya.com/blog/2016/01/guide-data-exploration/#three
c. https://scikit-learn.org/stable/modules/outlier_detection.html

4. Learning from Imbalanced Data:
a. https://www.kdnuggets.com/2016/08/learning-from-imbalanced-classes.html
b. https://www.kdnuggets.com/2017/06/7-techniques-handle-imbalanced-data.html
c. https://www.analyticsvidhya.com/blog/2017/03/imbalanced-classification-problem/
d. https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
e. https://www.analyticsvidhya.com/blog/2016/09/this-machine-learning-project-on-imbalanced-data-can-add-value-to-your-resume/

5. Data Transformations: The best reference is in the scikit-learn User Guide itself
a. http://scikit-learn.org/stable/modules/preprocessing.html

6. Dealing with Categorical Variables:
a. https://www.analyticsvidhya.com/blog/2015/11/easy-methods-deal-categorical-variables-predictive-modeling/
b. https://towardsdatascience.com/understanding-feature-engineering-part-2-categorical-data-f54324193e63

7. Feature Engineering:
a. https://www.freecodecamp.org/news/how-to-build-a-data-science-project-from-scratch-dc4f096a62a1/
b. https://www.analyticsvidhya.com/blog/2016/12/introduction-to-feature-selection-methods-with-an-example-or-how-to-select-the-right-variables/

8. How Not to Use PCA:
a. https://medium.com/data-design/how-to-not-be-dumb-at-applying-principal-component-analysis-pca-6c14de5b3c9d
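
Several of the preparation steps listed above (missing data, outliers, transformations, categorical variables) can be sketched in a few lines of plain Python. This is only an illustrative sketch and not the method of any linked article: the function names, the toy `ages` and `cities` data, the choice of median imputation, and the 1.5 x IQR threshold are all assumptions made here. For real projects, pandas and scikit-learn offer much more complete tools for each step.

```python
from statistics import mean, median, pstdev, quantiles


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_outlier_mask(values, k=1.5):
    """Return True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # Quartiles from statistics.quantiles (its default 'exclusive' method).
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Encode labels as 0/1 rows, one column per sorted category."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


# Hypothetical toy data: one missing age and one implausible age.
ages = [22, 25, None, 24, 23, 95, 26]
cities = ["Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Pune", "Mumbai"]

filled = impute_median(ages)        # step: missing data
mask = iqr_outlier_mask(filled)     # step: outliers (the age 95 is flagged)
kept = [(a, c) for a, c, m in zip(filled, cities, mask) if not m]

scaled = standardize([a for a, _ in kept])          # step: transformation
categories, encoded = one_hot([c for _, c in kept])  # step: categorical data

print(filled)
print(mask)
print(categories)
```

The order used here (impute, then filter outliers, then scale and encode) is one reasonable choice for a toy example; on a real dataset the order and the technique for each step depend on the data, which is what the linked articles discuss in detail.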
