Scikit-Learn Cheat Sheet for ML Basics

This document provides a summary of a scikit-learn cheat sheet for machine learning in Python. It introduces scikit-learn and its basic workflow, including loading and preprocessing data, creating and fitting models, making predictions, and evaluating performance. Code examples are provided for common machine learning tasks like classification, regression, clustering, and cross-validation.

Uploaded by

burhan ök
  • Introduction to Scikit-Learn
  • Data Science Cheat Sheet
  • Preparing Your Data
  • Feature Engineering
  • Model Selection and Training
  • Model Evaluation
  • Advanced Model Evaluation
  • Model Tuning Techniques
  • Further Resources

09.07.2019 Scikit-Learn Cheat Sheet: Python Machine Learning (article) - DataCamp

Scikit-Learn Cheat Sheet: Python Machine Learning

A handy scikit-learn cheat sheet to machine learning with Python, including code examples.

Most of you who are learning data science with Python will already have heard of scikit-learn, the open source Python library that implements a wide variety of machine learning, preprocessing, cross-validation and visualization algorithms behind a unified interface.

If you're still quite new to the field, you should be aware that machine learning, and thus also this Python library, is a must-know for every aspiring data scientist.

That's why DataCamp has created a scikit-learn cheat sheet for those of you who have already started learning about the Python package but still want a handy reference sheet. Or, if you still have no idea how scikit-learn works, this machine learning cheat sheet might come in handy for getting a quick first idea of the basics you need to get started.

Either way, we're sure that you're going to find it useful when you're tackling machine learning problems!

This scikit-learn cheat sheet will introduce you to the basic steps that you need to go through to implement machine learning algorithms successfully: you'll see how to load your data, how to preprocess it, how to create a model that you can fit to your data and use to predict target labels, how to validate your model, and how to tune it further to improve its performance.


In short, this cheat sheet will kickstart your data science projects: with the help of code examples, you'll have created, validated and tuned your machine learning models in no time.

So what are you waiting for? Time to get started!


Python For Data Science Cheat Sheet: Scikit-learn


Scikit-learn is an open source Python library that implements a range of machine learning, preprocessing, cross-validation and visualization algorithms using a unified interface.

A Basic Example

>>> from sklearn import neighbors, datasets, preprocessing
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.metrics import accuracy_score
>>> iris = datasets.load_iris()
>>> X, y = iris.data[:, :2], iris.target
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33)
>>> scaler = preprocessing.StandardScaler().fit(X_train)
>>> X_train = scaler.transform(X_train)
>>> X_test = scaler.transform(X_test)
>>> knn = neighbors.KNeighborsClassifier(n_neighbors=5)
>>> knn.fit(X_train, y_train)
>>> y_pred = knn.predict(X_test)
>>> accuracy_score(y_test, y_pred)

Loading The Data

Your data needs to be numeric and stored as NumPy arrays or SciPy sparse matrices. Other types that are convertible to numeric arrays, such as Pandas DataFrame, are also acceptable.

>>> import numpy as np
>>> X = np.random.random((10,5))
>>> y = np.array(['M','M','F','F','M','F','M','M','F','F'])  # one label per row of X
>>> X[X < 0.7] = 0
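As a rough illustration of the input types mentioned above, the sketch below fits the same estimator on a dense NumPy array and on a SciPy sparse copy of it. The integer labels, the fixed seed and the choice of `KNeighborsClassifier` are assumptions made for this example only.

```python
import numpy as np
from scipy import sparse
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)           # fixed seed, chosen only for reproducibility
X_dense = rng.random_sample((10, 5))
X_dense[X_dense < 0.7] = 0               # mostly zeros, so a sparse format makes sense
y = np.array([0, 0, 1, 1, 0, 1, 0, 0, 1, 1])  # hypothetical numeric labels

X_sparse = sparse.csr_matrix(X_dense)    # same values, stored in compressed sparse row format

# Both containers are accepted by fit() and predict().
knn_dense = KNeighborsClassifier(n_neighbors=3).fit(X_dense, y)
knn_sparse = KNeighborsClassifier(n_neighbors=3).fit(X_sparse, y)
pred_dense = knn_dense.predict(X_dense)
pred_sparse = knn_sparse.predict(X_sparse)
```

Not every estimator supports sparse input, so it is worth checking the documentation of the one you plan to use.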

Preprocessing The Data

Standardization

>>> from sklearn.preprocessing import StandardScaler
>>> scaler = StandardScaler().fit(X_train)
>>> standardized_X = scaler.transform(X_train)
>>> standardized_X_test = scaler.transform(X_test)
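To see what standardization does, the following sketch checks the column means and standard deviations of the transformed training data. The synthetic data and the seed are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)  # fixed seed, chosen only for reproducibility
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # synthetic training data
X_test = rng.normal(loc=5.0, scale=2.0, size=(20, 3))    # synthetic test data

# The scaler learns the mean and standard deviation from the training set only.
scaler = StandardScaler().fit(X_train)
standardized_X = scaler.transform(X_train)
standardized_X_test = scaler.transform(X_test)

train_means = standardized_X.mean(axis=0)  # close to 0 for every column
train_stds = standardized_X.std(axis=0)    # close to 1 for every column
```

The test set is transformed with the training statistics, so its column means and standard deviations will generally be near, but not exactly, 0 and 1.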

Normalization

>>> from sklearn.preprocessing import Normalizer
>>> scaler = Normalizer().fit(X_train)
>>> normalized_X = scaler.transform(X_train)
>>> normalized_X_test = scaler.transform(X_test)
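Unlike standardization, `Normalizer` works row by row: with its default settings it rescales each sample to unit L2 norm. A minimal sketch with made-up values:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical samples, picked so the norms are easy to verify by hand.
X = np.array([[3.0, 4.0],
              [1.0, 0.0],
              [6.0, 8.0]])

normalized_X = Normalizer().fit_transform(X)    # default norm is 'l2'
row_norms = np.linalg.norm(normalized_X, axis=1)  # every row now has length 1
```

Here `[3, 4]` and `[6, 8]` both map to `[0.6, 0.8]`, since only the direction of each sample is kept.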

Binarization

>>> from sklearn.preprocessing import Binarizer
>>> binarizer = Binarizer(threshold=0.0).fit(X)
>>> binary_X = binarizer.transform(X)

Encoding Categorical Features

>>> from sklearn.preprocessing import LabelEncoder
>>> enc = LabelEncoder()
>>> y = enc.fit_transform(y)
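The sketch below shows what the encoder produces for string labels like the ones used earlier, and how to map the integers back. The variable names `y_encoded` and `decoded` are introduced here for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

y = np.array(['M', 'M', 'F', 'F', 'M', 'F', 'M', 'M', 'F', 'F'])

enc = LabelEncoder()
y_encoded = enc.fit_transform(y)            # classes are sorted, so 'F' -> 0 and 'M' -> 1
decoded = enc.inverse_transform(y_encoded)  # recovers the original strings
```

The learned mapping is available as `enc.classes_`, which is useful when reporting predictions in terms of the original labels.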

Imputing Missing Values

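>>> from sklearn.impute import SimpleImputer  # replaces the Imputer class removed in scikit-learn 0.22
>>> imp = SimpleImputer(missing_values=0, strategy='mean')  # imputes column by column
>>> imp.fit_transform(X_train)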
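In recent scikit-learn releases, mean imputation is provided by `SimpleImputer` in `sklearn.impute`. The sketch below treats zeros as missing, as the snippet above does, on a small made-up array so the result can be checked by hand.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data in which 0 marks a missing value.
X_train = np.array([[1.0, 0.0],
                    [3.0, 4.0],
                    [0.0, 6.0]])

imp = SimpleImputer(missing_values=0, strategy='mean')
X_imputed = imp.fit_transform(X_train)
# Column 0: mean of 1 and 3 is 2; column 1: mean of 4 and 6 is 5.
```

Each missing entry is replaced by the mean of the non-missing values in its column.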

Generating Polynomial Features

>>> from sklearn.preprocessing import PolynomialFeatures
>>> poly = PolynomialFeatures(5)
>>> poly.fit_transform(X)

Training And Test Data

>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
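When neither `test_size` nor `train_size` is given, `train_test_split` holds out 25% of the samples for testing. A small sketch on made-up data shows the resulting sizes:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(20, 2)  # 20 hypothetical samples with 2 features each
y = np.arange(20)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
n_train, n_test = len(X_train), len(X_test)  # 15 and 5 for 20 samples
```

Fixing `random_state` makes the shuffle reproducible, so repeated runs yield the same split.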

Create Your Model

Supervised Learning Estimators

Linear Regression

>>> from sklearn.linear_model import LinearRegression
>>> lr = LinearRegression()  # the normalize parameter was removed in scikit-learn 1.2; scale features beforehand if needed

Support Vector Machines (SVM)

>>> from sklearn.svm import SVC
>>> svc = SVC(kernel='linear')

Naive Bayes

>>> from sklearn.naive_bayes import GaussianNB
>>> gnb = GaussianNB()

KNN

>>> from sklearn import neighbors
>>> knn = neighbors.KNeighborsClassifier(n_neighbors=5)

Unsupervised Learning Estimators

Principal Component Analysis (PCA)

>>> from sklearn.decomposition import PCA
>>> pca = PCA(n_components=0.95)
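A float between 0 and 1 for `n_components` asks PCA to keep as many components as are needed to explain at least that fraction of the variance. The sketch below tries this on the iris data; the dataset choice is an assumption for illustration.

```python
from sklearn import datasets
from sklearn.decomposition import PCA

X = datasets.load_iris().data  # 150 samples, 4 features

pca = PCA(n_components=0.95)   # keep enough components for at least 95% of the variance
X_reduced = pca.fit_transform(X)

n_kept = pca.n_components_                       # number of components actually selected
explained = pca.explained_variance_ratio_.sum()  # fraction of variance they explain
```

After fitting, `pca.n_components_` reports how many components were selected, which is usually fewer than the number of input features.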

K Means

>>> from sklearn.cluster import KMeans
>>> k_means = KMeans(n_clusters=3, random_state=0)

Model Fitting

Supervised learning

>>> lr.fit(X, y)
>>> knn.fit(X_train, y_train)
>>> svc.fit(X_train, y_train)

Unsupervised Learning

>>> k_means.fit(X_train)
>>> pca_model = pca.fit_transform(X_train)

Prediction

Supervised Estimators

>>> y_pred = svc.predict(np.random.random((2,5)))
>>> y_pred = lr.predict(X_test)
>>> y_pred = knn.predict_proba(X_test)

Unsupervised Estimators

>>> y_pred = k_means.predict(X_test)
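Note that `predict` returns class labels, while `predict_proba` returns one probability per class for each sample. A short sketch on the iris data (an assumption for illustration) makes the difference in shape visible:

```python
import numpy as np
from sklearn import datasets, neighbors
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=33)

knn = neighbors.KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

labels = knn.predict(X_test)       # shape (n_samples,): one label per sample
proba = knn.predict_proba(X_test)  # shape (n_samples, n_classes): rows sum to 1
row_sums = proba.sum(axis=1)
```

The columns of `proba` follow the order of `knn.classes_`.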

Evaluate Your Model's Performance

Classification Metrics

Accuracy Score

>>> knn.score(X_test, y_test)
>>> from sklearn.metrics import accuracy_score
>>> accuracy_score(y_test, y_pred)

Classification Report

>>> from sklearn.metrics import classification_report
>>> print(classification_report(y_test, y_pred))

Confusion Matrix

>>> from sklearn.metrics import confusion_matrix
>>> print(confusion_matrix(y_test, y_pred))
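To make the classification metrics concrete, here is a small worked sketch with made-up labels. In the confusion matrix, rows correspond to true classes and columns to predicted classes.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels for a three-class problem.
y_test = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

acc = accuracy_score(y_test, y_pred)   # 4 of 6 predictions are correct
cm = confusion_matrix(y_test, y_pred)
# cm is:
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]
```

The diagonal of the matrix counts the correct predictions, so its sum divided by the number of samples equals the accuracy.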

Regression Metrics

Mean Absolute Error

>>> from sklearn.metrics import mean_absolute_error
>>> y_true = [3, -0.5, 2]
>>> mean_absolute_error(y_true, y_pred)

Mean Squared Error

>>> from sklearn.metrics import mean_squared_error
>>> mean_squared_error(y_test, y_pred)

R2 Score

>>> from sklearn.metrics import r2_score
>>> r2_score(y_true, y_pred)
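The three regression metrics can be checked by hand on a tiny example. The `y_true` values are the ones used above; the `y_pred` values are hypothetical and chosen only to keep the arithmetic simple.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [3, -0.5, 2]
y_pred = [2.5, 0.0, 2]  # hypothetical predictions; errors are 0.5, -0.5 and 0

mae = mean_absolute_error(y_true, y_pred)  # (0.5 + 0.5 + 0) / 3 = 0.333...
mse = mean_squared_error(y_true, y_pred)   # (0.25 + 0.25 + 0) / 3 = 0.1666...
r2 = r2_score(y_true, y_pred)              # 1 - 0.5 / 6.5 = 0.923...
```

For R2, 0.5 is the residual sum of squares and 6.5 is the total sum of squares around the mean of `y_true`, which is 1.5.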

Clustering Metrics

Adjusted Rand Index

>>> from sklearn.metrics import adjusted_rand_score
>>> adjusted_rand_score(y_true, y_pred)

Homogeneity

>>> from sklearn.metrics import homogeneity_score
>>> homogeneity_score(y_true, y_pred)

V-measure

>>> from sklearn.metrics import v_measure_score
>>> v_measure_score(y_true, y_pred)
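These clustering metrics compare groupings rather than label values, so a clustering that matches the ground truth up to a renaming of the clusters still gets a perfect score. A minimal sketch with made-up labels:

```python
from sklearn.metrics import adjusted_rand_score, homogeneity_score, v_measure_score

labels_true = [0, 0, 1, 1]  # hypothetical ground-truth classes
labels_pred = [1, 1, 0, 0]  # same grouping, with the cluster ids swapped

ari = adjusted_rand_score(labels_true, labels_pred)
homogeneity = homogeneity_score(labels_true, labels_pred)
v_measure = v_measure_score(labels_true, labels_pred)
# All three equal 1.0, because the two partitions are identical.
```

This is why cluster ids returned by an estimator such as `KMeans` do not need to be aligned with the class labels before scoring.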

Cross-Validation

>>> from sklearn.model_selection import cross_val_score
>>> print(cross_val_score(knn, X_train, y_train, cv=4))

>>> print(cross_val_score(lr, X, y, cv=2))
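`cross_val_score` returns one score per fold, so the usual next step is to summarize them. The sketch below is self-contained and uses the iris data as an assumed example.

```python
from sklearn import datasets, neighbors
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
knn = neighbors.KNeighborsClassifier(n_neighbors=5)

# Four folds give four accuracy scores; the estimator is refit for each fold.
scores = cross_val_score(knn, iris.data, iris.target, cv=4)
mean_score = scores.mean()
spread = scores.std()
```

Reporting the mean together with the standard deviation gives a sense of how stable the estimate is across folds.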

Tune Your Model

Grid Search

>>> from sklearn.model_selection import GridSearchCV
>>> params = {"n_neighbors": np.arange(1,3), "metric": ["euclidean", "cityblock"]}
>>> grid = GridSearchCV(estimator=knn, param_grid=params)
>>> grid.fit(X_train, y_train)
>>> print(grid.best_score_)
>>> print(grid.best_estimator_.n_neighbors)
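For reference, a self-contained version of the grid search might look like the sketch below. The iris data, the split seed and the explicit `cv=4` are assumptions for illustration; the parameter grid is the one shown above.

```python
import numpy as np
from sklearn import datasets, neighbors
from sklearn.model_selection import GridSearchCV, train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=33)

knn = neighbors.KNeighborsClassifier()
params = {"n_neighbors": np.arange(1, 3), "metric": ["euclidean", "cityblock"]}

# Every combination in the grid (2 x 2 = 4 here) is evaluated with cross-validation.
grid = GridSearchCV(estimator=knn, param_grid=params, cv=4)
grid.fit(X_train, y_train)

best_params = grid.best_params_     # the combination with the highest mean CV score
best_score = grid.best_score_       # that mean cross-validated score
test_score = grid.score(X_test, y_test)  # accuracy of the refit best model on held-out data
```

By default the best combination is refit on the whole training set, so `grid` can be used directly for prediction afterwards.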

Randomized Parameter Optimization

>>> from sklearn.model_selection import RandomizedSearchCV
>>> params = {"n_neighbors": range(1,5), "weights": ["uniform", "distance"]}
>>> rsearch = RandomizedSearchCV(estimator=knn,
...                              param_distributions=params,
...                              cv=4,
...                              n_iter=8,
...                              random_state=5)
>>> rsearch.fit(X_train, y_train)
>>> print(rsearch.best_score_)

Going Further
Begin with our scikit-learn tutorial for beginners, in which you'll learn in an easy, step-by-step way how to explore handwritten digits data, how to create a model for it, how to fit your data to your model and how to predict target values. In addition, you'll make use of Python's data visualization library matplotlib to visualize your results.
