
Top Machine Learning Algorithms

Each entry below gives the algorithm's description, advantages, disadvantages, and common use cases.

Supervised Learning

Linear Models

Linear Regression
Description: A simple algorithm that models a linear relationship between inputs and a continuous numerical output variable.
Advantages: Explainable method; interpretable results through its output coefficients; faster to train than other machine learning models.
Disadvantages: Assumes linearity between inputs and output; sensitive to outliers; can underfit with small, high-dimensional data.
Use cases: Stock price prediction; predicting housing prices; predicting customer lifetime value.
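
A minimal sketch of how this looks in practice, using scikit-learn on synthetic placeholder data (the coefficients and data here are illustrative, not from any real dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic placeholder data: 100 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)
print(model.coef_)       # per-feature coefficients make the model interpretable
print(model.intercept_)
```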

Logistic Regression
Description: A simple algorithm that models a linear relationship between inputs and a categorical output (1 or 0).
Advantages: Interpretable and explainable; less prone to overfitting when using regularization; applicable for multi-class predictions.
Disadvantages: Assumes linearity between inputs and output; can overfit with small, high-dimensional data.
Use cases: Credit risk score prediction; customer churn prediction.
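
A brief scikit-learn sketch on synthetic data; C is the inverse regularization strength mentioned above:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Smaller C means stronger regularization, which helps against overfitting
clf = LogisticRegression(C=1.0).fit(X, y)
print(clf.predict_proba(X[:3]))  # predicted class probabilities
```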

Ridge Regression
Description: Part of the regression family: it penalizes features that have low predictive outcomes by shrinking their coefficients closer to zero. Can be used for classification or regression.
Advantages: Less prone to overfitting; best suited where the data suffer from multicollinearity; explainable and interpretable.
Disadvantages: All the predictors are kept in the final model; doesn't perform feature selection.
Use cases: Predictive maintenance for automobiles; sales revenue prediction.
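
A minimal sketch with scikit-learn's Ridge, assuming synthetic data with an artificially induced collinear column:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
X[:, 1] = X[:, 0] + rng.normal(scale=0.01, size=50)  # induce multicollinearity
y = X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=50)

# alpha controls the L2 penalty that shrinks coefficients toward (not to) zero
model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_)  # all predictors stay in the model, just shrunken
```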

Lasso Regression
Description: Part of the regression family: it penalizes features that have low predictive outcomes by shrinking their coefficients to zero. Can be used for classification or regression.
Advantages: Less prone to overfitting; can handle high-dimensional data; no need for separate feature selection.
Disadvantages: Can lead to poor interpretability, as it can keep highly correlated variables.
Use cases: Predicting housing prices; predicting clinical outcomes based on health data.
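
A minimal sketch with scikit-learn's Lasso on synthetic data, illustrating how the L1 penalty zeroes out weak predictors:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))  # many features, only two of them useful
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# alpha controls the L1 penalty; larger alpha drives more coefficients to exactly zero
model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)  # irrelevant features end up with coefficients of exactly 0
```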

Tree-Based Models

Decision Tree
Description: Decision Tree models make decision rules on the features to produce predictions. Can be used for classification or regression.
Advantages: Explainable and interpretable; can handle missing values.
Disadvantages: Prone to overfitting; sensitive to outliers.
Use cases: Customer churn prediction; credit score modeling; disease prediction.
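
A short scikit-learn sketch on synthetic data; export_text prints the learned rules, illustrating the interpretability noted above:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=300, n_features=4, random_state=0)

# Capping max_depth is one common way to curb overfitting
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(clf))  # the learned decision rules, readable as plain text
```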

Random Forests
Description: An ensemble learning method that combines the output of multiple decision trees.
Advantages: Reduces overfitting; higher accuracy compared to other models.
Disadvantages: Training complexity can be high; not very interpretable.
Use cases: Credit score modeling; predicting housing prices.
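
A minimal scikit-learn sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Each of the 200 trees sees a bootstrap sample; predictions are aggregated by vote
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(clf.predict(X[:5]))
```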

Gradient Boosting Regression
Description: Gradient Boosting Regression employs boosting to make predictive models from an ensemble of weak predictive learners.
Advantages: Better accuracy compared to other regression models; can handle multicollinearity; can handle non-linear relationships.
Disadvantages: Sensitive to outliers and can therefore cause overfitting; computationally expensive and has high complexity.
Use cases: Predicting car emissions; predicting ride-hailing fare amounts.
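
A brief sketch using scikit-learn's GradientBoostingRegressor on synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=6, noise=5.0, random_state=0)

# Each shallow tree (a weak learner) fits the residual errors of the ensemble so far
model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(X, y)
print(model.predict(X[:3]))
```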

XGBoost
Description: A gradient boosting algorithm that is efficient and flexible. Can be used for both classification and regression tasks.
Advantages: Provides accurate results; captures non-linear relationships.
Disadvantages: Hyperparameter tuning can be complex; does not perform well on sparse datasets.
Use cases: Churn prediction; claims processing in insurance.
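
A minimal sketch, assuming the xgboost Python package is installed; the parameter values are illustrative, not tuned settings:

```python
from sklearn.datasets import make_classification
from xgboost import XGBClassifier  # assumes the xgboost package is available

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

model = XGBClassifier(n_estimators=300, learning_rate=0.05, max_depth=4)
model.fit(X, y)
print(model.predict_proba(X[:3]))
```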

LightGBM Regressor
Description: A gradient boosting framework that is designed to be more efficient than other implementations.
Advantages: Can handle large amounts of data; computationally efficient, with fast training speed; low memory usage.
Disadvantages: Can overfit due to leaf-wise splitting and high sensitivity; hyperparameter tuning can be complex.
Use cases: Predicting flight times for airlines; predicting cholesterol levels based on health data.
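
A minimal sketch, assuming the lightgbm Python package is installed:

```python
from sklearn.datasets import make_regression
from lightgbm import LGBMRegressor  # assumes the lightgbm package is available

X, y = make_regression(n_samples=500, n_features=10, noise=3.0, random_state=0)

# num_leaves bounds the leaf-wise tree growth; too high a value invites overfitting
model = LGBMRegressor(num_leaves=31, learning_rate=0.05, n_estimators=200)
model.fit(X, y)
print(model.predict(X[:3]))
```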

Unsupervised Learning

Clustering

K-Means
Description: K-Means is the most widely used clustering approach; it determines K clusters based on Euclidean distances.
Advantages: Scales to large datasets; simple to implement and interpret; results in tight clusters.
Disadvantages: Requires the expected number of clusters from the beginning; has trouble with varying cluster sizes and densities.
Use cases: Customer segmentation; recommendation systems.
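
A minimal scikit-learn sketch on synthetic blob data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# n_clusters (K) must be chosen up front; n_init reruns the fit with different seeds
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])
print(km.cluster_centers_)
```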

A "bottom-up" approach where each data use cases


1. There is no need to specify the number
 1. Doesn’t always result in the best clustering

Fraud detectio
Hierarchical
 point is treated as its own cluster—and then of clusters
2. Not suitable for large datasets due to high
Document clustering based on similarity 2. The resulting dendrogram is informative
Clustering the closest two clusters are merged together complexity
iteratively
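
A brief sketch using scikit-learn's AgglomerativeClustering, one common implementation of this bottom-up approach, on synthetic data:

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Ward linkage merges the two clusters whose union has the smallest variance increase
hc = AgglomerativeClustering(n_clusters=3, linkage="ward").fit(X)
print(hc.labels_[:10])
```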

Gaussian Mixture Models
Description: A probabilistic model for modeling normally distributed clusters within a dataset.
Advantages: Computes a probability for an observation belonging to a cluster; can identify overlapping clusters; more accurate results compared to K-means.
Disadvantages: Requires complex tuning; requires setting the number of expected mixture components or clusters.
Use cases: Customer segmentation; recommendation systems.
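
A minimal scikit-learn sketch on synthetic data, showing the soft cluster assignments:

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# n_components (the number of Gaussians) must be set up front
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
print(gmm.predict_proba(X[:3]))  # soft assignments: a probability per cluster
```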

Association

Apriori
Description: A rule-based approach that identifies the most frequent itemsets in a given dataset, using prior knowledge of frequent itemset properties.
Advantages: Results are intuitive and interpretable; exhaustive approach, as it finds all rules based on confidence and support.
Disadvantages: Generates many uninteresting itemsets; computationally and memory intensive; results in many overlapping itemsets.
Use cases: Product placements; recommendation engines; promotion optimization.
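
A minimal sketch using the mlxtend library (an assumption; the original names no implementation) on a toy basket dataset:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules  # assumes mlxtend is installed

# Toy one-hot basket data: each row is a transaction, each column an item
baskets = pd.DataFrame(
    [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 1, 1]],
    columns=["bread", "butter", "milk"],
).astype(bool)

frequent = apriori(baskets, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```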