
Cheatsheet: Scikit-Learn

Scikit-Learn is one of the most popular and widely used libraries for machine learning in Python.

Pre-Processing
Function Description
1 sklearn.preprocessing.StandardScaler Standardize features by removing the mean and scaling to unit variance
2 sklearn.impute.SimpleImputer Imputation transformer for completing missing values (replaces sklearn.preprocessing.Imputer, which was removed in scikit-learn 0.22)
3 sklearn.preprocessing.LabelBinarizer Binarize labels in a one-vs-all fashion
4 sklearn.preprocessing.OneHotEncoder Encode categorical integer features using a one-hot a.k.a one-of-K scheme
5 sklearn.preprocessing.PolynomialFeatures Generate polynomial and interaction features
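The pre-processing transformers listed above all share the fit/transform interface. The following is a minimal sketch on a made-up toy matrix, assuming a recent scikit-learn release in which the imputer is available as sklearn.impute.SimpleImputer; the array values and variable names are illustrative only.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelBinarizer, PolynomialFeatures, StandardScaler

# Toy feature matrix with one missing value (made-up numbers).
X = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0]])

# Fill the missing entry with its column mean: (2 + 6) / 2 = 4.
X_filled = SimpleImputer(strategy="mean").fit_transform(X)

# Rescale each column to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X_filled)

# Expand two features a, b into the degree-2 terms 1, a, b, a^2, a*b, b^2.
X_poly = PolynomialFeatures(degree=2).fit_transform(X_filled)

# Turn three string labels into one indicator column per class.
y_onehot = LabelBinarizer().fit_transform(["cat", "dog", "bird"])
```

Each transformer learns its statistics in fit and applies them in transform, so the same fitted object can later be reused on unseen data.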

Regression
Function Description
1 sklearn.tree.DecisionTreeRegressor A decision tree regressor
2 sklearn.svm.SVR Epsilon-Support Vector Regression
3 sklearn.linear_model.LinearRegression Ordinary least squares Linear Regression
4 sklearn.linear_model.Lasso Linear Model trained with L1 prior as regularizer (a.k.a the Lasso)
5 sklearn.linear_model.SGDRegressor Linear model fitted by minimizing a regularized empirical loss with SGD
6 sklearn.linear_model.ElasticNet Linear regression with combined L1 and L2 priors as regularizer
7 sklearn.ensemble.RandomForestRegressor A random forest regressor
8 sklearn.ensemble.GradientBoostingRegressor Gradient Boosting for regression
9 sklearn.neural_network.MLPRegressor Multi-layer Perceptron regressor
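To illustrate how the regressors above are used, here is a small sketch that fits ordinary least squares and the Lasso on synthetic, noise-free data. The data-generating formula, the alpha value, and the variable names are assumptions chosen for the example, not recommendations.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data (illustrative): y depends on the first two features only.
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(100, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5

# Ordinary least squares recovers the generating coefficients on noise-free data.
ols = LinearRegression().fit(X, y)

# The L1 penalty shrinks the coefficients toward zero; alpha sets its strength.
lasso = Lasso(alpha=0.1).fit(X, y)

print("OLS coefficients:  ", ols.coef_, "intercept:", ols.intercept_)
print("Lasso coefficients:", lasso.coef_)
```

The other regressors in the table follow the same fit/predict pattern, so swapping in, say, RandomForestRegressor only changes the constructor line.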

Classification
Function Description
1 sklearn.neural_network.MLPClassifier Multi-layer Perceptron classifier
2 sklearn.tree.DecisionTreeClassifier A decision tree classifier
3 sklearn.svm.SVC C-Support Vector Classification
4 sklearn.linear_model.LogisticRegression Logistic Regression (a.k.a logit, MaxEnt) classifier
5 sklearn.linear_model.SGDClassifier Linear classifiers (SVM, logistic regression, and others) with SGD training
6 sklearn.naive_bayes.GaussianNB Gaussian Naive Bayes
7 sklearn.neighbors.KNeighborsClassifier Classifier implementing the k-nearest neighbors vote
8 sklearn.ensemble.RandomForestClassifier A random forest classifier
9 sklearn.ensemble.GradientBoostingClassifier Gradient Boosting for classification
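As a rough sketch of how the classifiers above are trained and compared, the example below fits two of them on the bundled iris dataset. The choice of dataset, split size, and hyperparameters (max_iter, n_neighbors) is an assumption made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Every classifier exposes the same fit / predict / score methods.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
}
test_accuracy = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
print(test_accuracy)
```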

Clustering
Function Description
1 sklearn.cluster.KMeans K-Means clustering
2 sklearn.cluster.DBSCAN Perform DBSCAN clustering from vector array or distance matrix
3 sklearn.cluster.AgglomerativeClustering Agglomerative Clustering
4 sklearn.cluster.SpectralBiclustering Spectral bi-clustering
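The clustering estimators above take only X and expose the result as labels_. A minimal sketch on two synthetic, well-separated blobs follows; the blob positions and the eps and min_samples values are assumptions picked to suit this toy data.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

# Two well-separated 2-D blobs of 50 points each (synthetic, illustrative).
rng = np.random.RandomState(0)
blob_a = rng.normal(loc=0.0, scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.3, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# K-Means needs the number of clusters up front.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# DBSCAN infers the number of clusters from density; label -1 marks noise.
dbscan = DBSCAN(eps=1.0, min_samples=5).fit(X)

print("K-Means labels:", kmeans.labels_[:3], kmeans.labels_[-3:])
print("DBSCAN clusters found:", len(set(dbscan.labels_) - {-1}))
```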

Dimensionality Reduction
Function Description
1 sklearn.decomposition.PCA Principal component analysis (PCA)
2 sklearn.decomposition.LatentDirichletAllocation Latent Dirichlet Allocation with online variational Bayes algorithm
3 sklearn.decomposition.SparseCoder Sparse coding
4 sklearn.decomposition.DictionaryLearning Dictionary learning
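As an illustration of the decomposition classes above, this sketch applies PCA to synthetic 3-D points that vary almost entirely along a single direction. The data and the choice of two components are assumptions for the example.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic 3-D points lying close to one line, plus a little noise.
rng = np.random.RandomState(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(200, 3))

# Project onto the two directions of largest variance.
pca = PCA(n_components=2).fit(X)
X_reduced = pca.transform(X)

# Fraction of the total variance captured by each kept component.
print(pca.explained_variance_ratio_)
```

Because the points lie near a line, the first component should account for nearly all of the variance.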

Model Selection
Function Description
1 sklearn.model_selection.KFold K-Folds cross-validator
2 sklearn.model_selection.StratifiedKFold Stratified K-Folds cross-validator
3 sklearn.model_selection.TimeSeriesSplit Time Series cross-validator
4 sklearn.model_selection.train_test_split Split arrays or matrices into random train and test subsets
5 sklearn.model_selection.GridSearchCV Exhaustive search over specified parameter values for an estimator
6 sklearn.model_selection.RandomizedSearchCV Randomized search on hyperparameters
7 sklearn.model_selection.cross_val_score Evaluate a score by cross-validation
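The model-selection tools above are typically combined: a splitter defines the folds, cross_val_score evaluates one configuration, and GridSearchCV tries several. The sketch below assumes the iris dataset and a small, arbitrary grid over n_neighbors.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Five shuffled folds that preserve the class proportions.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One accuracy value per fold for a fixed hyperparameter setting.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=cv)

# Try each candidate value and keep the one with the best mean CV score.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7]},
    cv=cv,
)
search.fit(X, y)

print("fold scores:", scores)
print("best params:", search.best_params_, "best score:", search.best_score_)
```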

Metric
Function Description
1 sklearn.metrics.accuracy_score Classification Metric: Accuracy classification score
2 sklearn.metrics.log_loss Classification Metric: Log loss, a.k.a logistic loss or cross-entropy loss
3 sklearn.metrics.roc_auc_score Classification Metric: Compute the area under the Receiver Operating Characteristic (ROC) curve
4 sklearn.metrics.mean_absolute_error Regression Metric: Mean absolute error regression loss
5 sklearn.metrics.r2_score Regression Metric: R^2 (coefficient of determination) regression score function
6 sklearn.metrics.label_ranking_loss Ranking Metric: Compute Ranking loss measure
7 sklearn.metrics.mutual_info_score Clustering Metric: Mutual Information between two clusterings
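The metric functions above all take the true values first and the predictions second. A small sketch with hand-checkable, made-up inputs:

```python
from sklearn.metrics import (
    accuracy_score,
    log_loss,
    mean_absolute_error,
    r2_score,
    roc_auc_score,
)

# Binary classification: true labels, hard predictions, and
# predicted probabilities of the positive class (made-up values).
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

acc = accuracy_score(y_true, y_pred)  # 3 of 4 labels match -> 0.75
auc = roc_auc_score(y_true, y_prob)   # 3 of 4 (negative, positive) pairs ranked correctly -> 0.75
ll = log_loss(y_true, y_prob)         # mean negative log-likelihood of the true labels

# Regression: true targets and predictions (made-up values).
r_true = [3.0, -0.5, 2.0, 7.0]
r_pred = [2.5, 0.0, 2.0, 8.0]

mae = mean_absolute_error(r_true, r_pred)  # (0.5 + 0.5 + 0 + 1) / 4 -> 0.5
r2 = r2_score(r_true, r_pred)              # 1 - 1.5 / 29.1875

print(acc, auc, ll, mae, r2)
```

Note that accuracy_score expects hard labels, while roc_auc_score and log_loss expect scores or probabilities.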

Miscellaneous
Function Description
1 sklearn.datasets.load_boston Load and return the Boston house-prices dataset (regression); removed in scikit-learn 1.2
2 sklearn.datasets.make_classification Generate a random n-class classification problem
3 sklearn.feature_extraction.FeatureHasher Implements feature hashing, a.k.a the hashing trick
4 sklearn.feature_selection.SelectKBest Select features according to the k highest scores
5 sklearn.pipeline.Pipeline Pipeline of transforms with a final estimator
6 sklearn.semi_supervised.LabelPropagation Label Propagation classifier for semi-supervised learning
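Several of the utilities above combine naturally. The sketch below generates a synthetic problem with make_classification and chains scaling, SelectKBest, and a classifier in a Pipeline. The step names, the f_classif scoring function, and all parameter values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-class problem: 10 features, of which 3 carry signal.
X, y = make_classification(
    n_samples=200,
    n_features=10,
    n_informative=3,
    n_redundant=0,
    n_clusters_per_class=1,
    random_state=0,
)

# Transforms run in order; the last step is the final estimator.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=3)),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)

# Accuracy on the training data, shown only to confirm the pipeline runs.
train_accuracy = pipe.score(X, y)
print(train_accuracy)
```

Wrapping the steps in a Pipeline means the scaler and selector are fitted only on the data passed to fit, which helps avoid leaking test data when the pipeline is used inside cross-validation.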

For more infographics, visit www.analyticsvidhya.com
