
For each model family below, the sheet lists: task, properties, main sklearn models, key sklearn hyperparameters, whether you should scale features, multi-target/label support, whether training is deterministic, whether the fitted model has predict_proba(), whether it has feature_importances_, how to increase regularization (or get a similar effect), the typical loss function, typical evaluation metrics, and the training, prediction, and space complexity.

Linear regression
  Task: Regression
  Properties: Linear, deterministic regressor
  Main sklearn models: linear_model.LinearRegression (no regularization), linear_model.Lasso (L1 regularization), linear_model.Ridge (L2 regularization), linear_model.ElasticNet (L1 and L2), linear_model.ElasticNetCV, linear_model.SGDRegressor
  Key sklearn hyperparameters: alpha
  Should scale features: Yes
  Multi-target/label: Yes
  Deterministic: Yes
  Has predict_proba(): No
  Has feature_importances_: No (use coef_, and only if the data is scaled)
  To increase regularization (or similar effect): Increase alpha (usually a squared L2 and/or L1 penalty)
  Typical loss function: Mean squared error
  Typical evaluation metric: R², MSE, RMSE
  Training complexity: O(p²n + p³) for n examples and p model parameters
  Prediction complexity: O(p)
  Space complexity: O(p)
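
For instance, a minimal sketch of a scaled Ridge fit (the load_diabetes toy dataset and alpha=1.0 are illustrative assumptions, not taken from the sheet):

```python
# Illustrative only: Ridge regression with scaled features so coef_ is comparable across features.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))  # larger alpha => stronger L2 penalty
model.fit(X, y)
print(model.score(X, y))                 # R² on the training data
print(model.named_steps["ridge"].coef_)  # per-feature weights (meaningful because data is scaled)
```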

Logistic regression
  Task: Classification
  Properties: Binary classifier (multiclass via OVR), deterministic (depends on solver)
  Main sklearn models: linear_model.LogisticRegression, linear_model.SGDClassifier
  Key sklearn hyperparameters: penalty, C
  Should scale features: Yes (some sources say No, but this is not the case in my experience)
  Multi-target/label: No
  Deterministic: Yes
  Has predict_proba(): Yes
  Has feature_importances_: No
  To increase regularization (or similar effect): Decrease C (usually an L2 penalty), or increase alpha for SGDClassifier
  Typical loss function: Cross-entropy, aka log loss, aka logistic loss, aka deviance
  Typical evaluation metric: Likelihood ratio, weighted F1
  Training complexity: O(np)
  Prediction complexity: O(p)
  Space complexity: O(p)
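
A minimal sketch of a scaled, L2-penalized logistic regression scored with weighted F1 (the breast-cancer toy dataset, the split, and C=1.0 are illustrative assumptions):

```python
# Illustrative only: logistic regression in a scaling pipeline; smaller C => stronger regularization.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = make_pipeline(StandardScaler(), LogisticRegression(penalty="l2", C=1.0, max_iter=1000))
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)  # class probabilities are available
print(f1_score(y_test, clf.predict(X_test), average="weighted"))
```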

Ridge regression classification
  Task: Classification
  Properties: Linear decision boundary, binary classifier
  Main sklearn models: linear_model.RidgeClassifier
  Key sklearn hyperparameters: alpha
  Should scale features: Yes
  Multi-target/label: No
  Deterministic: Depends on solver
  Has predict_proba(): No
  Has feature_importances_: No
  To increase regularization (or similar effect): Increase alpha
  Typical loss function: Penalized least squares on targets encoded as ±1 (not cross-entropy)
  Typical evaluation metric: Weighted F1
  Training complexity: O(p²n + p³) for n examples and p model parameters
  Prediction complexity: O(p)
  Space complexity: O(p)
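
A minimal sketch of RidgeClassifier (dataset and alpha value are illustrative assumptions); note that it exposes decision_function() rather than predict_proba():

```python
# Illustrative only: RidgeClassifier with scaled features; alpha controls the L2 penalty.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
clf = make_pipeline(StandardScaler(), RidgeClassifier(alpha=1.0))
clf.fit(X, y)
print(clf.predict(X)[:5])            # hard class labels
print(clf.decision_function(X)[:5])  # signed distances; no probabilities available
```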

k-nearest neighbours
  Task: Classification or Regression
  Properties: Instance-based, non-parametric, multiclass classifier
  Main sklearn models: neighbors.KNeighborsClassifier, neighbors.KNeighborsRegressor
  Key sklearn hyperparameters: n_neighbors
  Should scale features: Yes
  Multi-target/label: Yes
  Deterministic: Yes
  Has predict_proba(): Yes
  Has feature_importances_: No
  To increase regularization (or similar effect): Increase n_neighbors
  Typical loss function: None
  Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
  Training complexity: O(1) for brute force, or O(n·log(n)·p) for a k-d tree, for n examples and p features
  Prediction complexity: For brute force, O(npk) for k neighbours and n training examples of dimension p (number of features ~ number of parameters); for a k-d tree, O(k·log(n))
  Space complexity: O(1) for brute force, or O(npk) for a k-d tree
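
A minimal sketch of k-nearest neighbours with scaled features (the dataset and n_neighbors=15 are illustrative assumptions); a larger n_neighbors smooths the decision boundary, which is the closest thing kNN has to regularization:

```python
# Illustrative only: kNN classification; "training" just stores the examples.
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15))
clf.fit(X, y)
print(clf.predict_proba(X)[:3])  # empirical class frequencies among the neighbours
```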

Linear support vector machine
  Task: Classification or Regression
  Properties: Linear decision boundary, binary classifier (multiclass via OVR)
  Main sklearn models: svm.SVC, svm.SVR, svm.LinearSVC, svm.LinearSVR, svm.NuSVC, svm.NuSVR, linear_model.SGDClassifier
  Key sklearn hyperparameters: C, or alpha for SGDClassifier
  Should scale features: Yes
  Multi-target/label: No
  Deterministic: Only if probability=False
  Has predict_proba(): Only if probability=True
  Has feature_importances_: No
  To increase regularization (or similar effect): Decrease C (squared L2 penalty), or increase alpha for SGDClassifier
  Typical loss function: Hinge loss
  Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
  Training complexity: O(n²p) for n examples and p model parameters
  Prediction complexity: O(sp) for s support vectors and p model parameters
  Space complexity: O(sp) for s support vectors and p model parameters
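
A minimal sketch of a linear SVM via LinearSVC (dataset and C=0.1 are illustrative assumptions); smaller C means a stronger squared-L2 penalty, and LinearSVC has no predict_proba():

```python
# Illustrative only: linear SVM with scaled features.
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
clf = make_pipeline(StandardScaler(), LinearSVC(C=0.1))
clf.fit(X, y)
print(clf.decision_function(X)[:5])  # signed margins; no probabilities from LinearSVC
```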

Nonlinear support vector machine
  Task: Classification or Regression
  Properties: Kernel-based, non-linear decision boundary, binary classifier (multiclass via OVR)
  Main sklearn models: svm.SVC, svm.SVR, svm.NuSVC, svm.NuSVR
  Key sklearn hyperparameters: kernel, C
  Should scale features: Yes
  Multi-target/label: No
  Deterministic: Only if probability=False
  Has predict_proba(): Only if probability=True
  Has feature_importances_: No
  To increase regularization (or similar effect): Decrease C (squared L2 penalty)
  Typical loss function: Hinge loss
  Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
  Training complexity: O(n²p + n³) for n examples and p model parameters
  Prediction complexity: O(sp) for s support vectors and p model parameters
  Space complexity: O(sp) for s support vectors and p model parameters
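
A minimal sketch of an RBF-kernel SVC (dataset and hyperparameter values are illustrative assumptions); setting probability=True enables predict_proba() but makes fitting slower and non-deterministic:

```python
# Illustrative only: kernel SVM with scaled features and probability estimates enabled.
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, probability=True))
clf.fit(X, y)
print(clf.predict_proba(X)[:3])  # available only because probability=True
```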

Decision tree
  Task: Classification or Regression
  Properties: Non-parametric, multiclass classifier
  Main sklearn models: tree.DecisionTreeClassifier, tree.DecisionTreeRegressor
  Key sklearn hyperparameters: max_features, max_depth, min_samples_leaf, min_samples_split
  Should scale features: No
  Multi-target/label: Yes
  Deterministic: Yes
  Has predict_proba(): Yes
  Has feature_importances_: Yes
  To increase regularization (or similar effect): Decrease max_depth, max_features, or increase min_samples_split, min_samples_leaf
  Typical loss function: Gini (per split, not global, so not strictly a loss function per se)
  Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
  Training complexity: O(nzp) for n examples and p model parameters, if depth is limited to z
  Prediction complexity: O(z) for max depth z
  Space complexity: O(z)
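
A minimal sketch of a depth-limited decision tree (dataset and hyperparameter values are illustrative assumptions); no feature scaling is needed, and shrinking max_depth or raising min_samples_leaf acts as regularization:

```python
# Illustrative only: a shallow decision tree with split-based feature importances.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5, random_state=0)
clf.fit(X, y)
print(clf.feature_importances_)  # per-feature importance derived from the splits
```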

Random forest
  Task: Classification or Regression
  Properties: Stochastic, ensemble multiclass classifier
  Main sklearn models: ensemble.RandomForestClassifier, ensemble.RandomForestRegressor
  Key sklearn hyperparameters: n_estimators, max_features, max_depth, min_samples_leaf, min_samples_split
  Should scale features: No
  Multi-target/label: Yes
  Deterministic: No
  Has predict_proba(): Yes
  Has feature_importances_: Yes
  To increase regularization (or similar effect): Decrease max_depth, max_features, or increase min_samples_split, min_samples_leaf
  Typical loss function: Gini (per split, not global, so not strictly a loss function per se)
  Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
  Training complexity: O(nzpt) for n examples, p model parameters, max depth z, and t trees
  Prediction complexity: O(zt)
  Space complexity: O(zt)
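
A minimal sketch of a random forest (dataset and hyperparameter values are illustrative assumptions); random_state pins down the otherwise stochastic bootstrap and feature subsampling:

```python
# Illustrative only: a random forest averaging predict_proba over its trees.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
clf = RandomForestClassifier(n_estimators=200, max_depth=8, max_features="sqrt", random_state=0)
clf.fit(X, y)
print(clf.predict_proba(X)[:3])  # probabilities averaged over the trees
print(clf.feature_importances_)
```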

Extremely randomized trees
  Task: Classification or Regression
  Properties: Stochastic, ensemble multiclass classifier (ExtraTrees is to ExtraTree as RandomForest is to DecisionTree)
  Main sklearn models: ensemble.ExtraTreesClassifier, ensemble.ExtraTreesRegressor
  Key sklearn hyperparameters: n_estimators, max_features, max_depth, min_samples_leaf, min_samples_split
  Should scale features: No
  Multi-target/label: Yes
  Deterministic: No
  Has predict_proba(): Yes
  Has feature_importances_: Yes
  To increase regularization (or similar effect): Decrease max_depth, max_features, or increase min_samples_split, min_samples_leaf
  Typical loss function: Gini (per split, not global, so not strictly a loss function per se)
  Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
  Training complexity: O(nzpt) for n examples, p model parameters, max depth z, and t trees
  Prediction complexity: O(zt)
  Space complexity: O(zt)
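
A minimal sketch of extremely randomized trees (dataset and hyperparameter values are illustrative assumptions); the interface mirrors the random forest, but split thresholds are drawn at random, which typically speeds up training:

```python
# Illustrative only: ExtraTrees with the same knobs as a random forest.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier

X, y = load_breast_cancer(return_X_y=True)
clf = ExtraTreesClassifier(n_estimators=200, max_depth=8, random_state=0)
clf.fit(X, y)
print(clf.feature_importances_)
```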

Gradient boosted trees
  Task: Classification or Regression
  Properties: Stochastic, ensemble multiclass classifier
  Main sklearn models: ensemble.GradientBoostingClassifier, ensemble.GradientBoostingRegressor, ensemble.HistGradientBoostingClassifier, ensemble.HistGradientBoostingRegressor
  Key sklearn hyperparameters: n_estimators, max_features, max_depth, min_samples_leaf, min_samples_split
  Should scale features: No
  Multi-target/label: No
  Deterministic: No
  Has predict_proba(): Yes
  Has feature_importances_: Yes
  To increase regularization (or similar effect): Decrease max_depth, max_features, or increase min_samples_split, min_samples_leaf
  Typical loss function: Cross-entropy, aka log loss, aka deviance
  Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
  Training complexity: O(nzpt) for n examples, p model parameters, max depth z, and t trees
  Prediction complexity: O(zt)
  Space complexity: O(zt)
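
A minimal sketch of gradient boosted trees (dataset and hyperparameter values are illustrative assumptions); shallow trees and a modest learning rate are the usual regularization levers:

```python
# Illustrative only: gradient boosting with shallow trees; feature_importances_ is available after fit.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)
clf = GradientBoostingClassifier(n_estimators=200, max_depth=3, learning_rate=0.1, random_state=0)
clf.fit(X, y)
print(clf.predict_proba(X)[:3])
print(clf.feature_importances_)
```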

Multilayer perceptron
  Task: Classification or Regression
  Properties: Deep feedforward artificial neural network
  Main sklearn models: neural_network.MLPClassifier, neural_network.MLPRegressor
  Key sklearn hyperparameters: hidden_layer_sizes, activation, alpha, learning_rate_init, max_iter
  Should scale features: Yes
  Multi-target/label: Yes
  Deterministic: No
  Has predict_proba(): Yes
  Has feature_importances_: No
  To increase regularization (or similar effect): Increase alpha (L2 penalty)
  Typical loss function: Cross-entropy, aka log loss, aka deviance (classification) or squared error (regression)
  Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
  Training complexity: 😬
  Prediction complexity: O(p) for p parameters (weights)
  Space complexity: O(p)
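
A minimal sketch of a small multilayer perceptron (dataset, layer sizes, and learning-rate values are illustrative assumptions); features are scaled and alpha is the L2 penalty on the weights:

```python
# Illustrative only: a small MLP classifier in a scaling pipeline.
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), activation="relu",
                  alpha=1e-3, learning_rate_init=1e-3, max_iter=500, random_state=0),
)
clf.fit(X, y)
print(clf.predict_proba(X)[:3])
```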
