
Lecture 6

XGBoost
Dimensionality Reduction

Kazi Shah Nawaz Ripon | Faculty of Computer Sciences | 18.02.2021


Need to Know Before We Start…

Regularization in Machine Learning



Overfitting
Can you recall Overfitting?
It is a phenomenon that occurs when a machine learning model is constrained to the training
set and is unable to perform well on unseen data.

This happens because the model tries too hard to capture the noise in the training dataset.
By noise we mean the data points that don't really represent the true properties of the data,
but rather random chance.
Learning such data points makes the model more flexible, at the risk of overfitting.



Regularisation
Regularisation is a technique used to reduce error by fitting the function appropriately to
the given training set and avoiding overfitting.
Regularisation discourages learning a more complex or flexible model, so as to avoid the
risk of overfitting.
In tree-based methods, regularisation is usually understood as defining a minimum gain that
must be exceeded before another split is made.

This minimum gain can be set to anything in (0, ∞).
The commonly used regularisation techniques are:
L1 regularisation
L2 regularisation
Dropout regularisation



How Does Regularisation Solve The Problem?
You penalize your loss function, L(X, Y), by adding a multiple of an L1 (LASSO)
or an L2 (Ridge) norm of the weight vector w.

You get the following equation:

L(X, Y) + λN(w)

(N is the L1 norm, the L2 norm, or any other norm.)
λ is the regularisation coefficient, which controls the strength of the penalty.



Regularisation
A regression model which uses the L1 regularisation technique is called LASSO (Least Absolute
Shrinkage and Selection Operator) regression.

A regression model that uses the L2 regularisation technique is called Ridge regression.

Lasso regression adds the "absolute value of magnitude" of the coefficients as a penalty term to
the loss function (L).

Ridge regression adds the "squared magnitude" of the coefficients as a penalty term to the loss
function (L).

Note that during regularisation the output function (ŷ) does not change. The change is
only in the loss function.
Regularisation
The loss function before regularisation: prediction cost.

The loss function after regularisation: prediction cost + regularisation cost.

lambda (λ) is a hyperparameter known as the regularisation constant, and it is greater than zero.
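
To make the formula concrete, here is a minimal NumPy sketch (not from the original slides) of a loss with an added L1 or L2 penalty; X, y, w, and lam are hypothetical toy values.

```python
import numpy as np

def regularised_loss(X, y, w, lam=0.1, norm="l2"):
    """Prediction cost (MSE) plus lambda times a norm of the weights."""
    prediction_cost = np.mean((X @ w - y) ** 2)        # L(X, Y)
    if norm == "l1":                                   # LASSO-style penalty
        regularisation_cost = lam * np.sum(np.abs(w))
    else:                                              # Ridge-style penalty
        regularisation_cost = lam * np.sum(w ** 2)
    return prediction_cost + regularisation_cost

# Toy usage
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
w = rng.normal(size=3)
y = X @ w + 0.1 * rng.normal(size=20)
print(regularised_loss(X, y, w, lam=0.5, norm="l1"))
```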



XGBoost



XGBoost: Aim



XGBoost: Quick Start
I would like to show you how easy and powerful XGBoost is with this Quick
start.

Let’s solve a real problem.



Problem Description
Pima Indians Diabetes Prediction
Predict the onset of diabetes based on diagnostic measures.
https://www.kaggle.com/uciml/pima-indians-diabetes-database
Tabular data: 768 rows x 9 columns
768 people
8 input features and 1 output

Input features (diagnostic measures): X ∈ R^(768 x 8)
Pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, diabetes
pedigree function, age

Output: y ∈ R^(768 x 1)
Whether the person has diabetes (0 or 1).
Code
Around 30 lines (including some pre-processing).

The core part of the code is sketched below.

Very simple, isn't it?
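
The original slide showed the code as an image. Below is a hedged reconstruction sketch; the file name diabetes.csv, the Outcome column name, and the 80/20 split are assumptions, not taken from the slides.

```python
# Quick-start sketch: XGBoost on the Pima Indians Diabetes dataset
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# 768 rows x 9 columns: 8 diagnostic features + 1 binary outcome
data = pd.read_csv("diabetes.csv")          # assumed local copy of the Kaggle CSV
X = data.drop(columns=["Outcome"])
y = data["Outcome"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = XGBClassifier(n_estimators=100, eval_metric="logloss")
model.fit(X_train, y_train)

print("Train error:", 1 - accuracy_score(y_train, model.predict(X_train)))
print("Test error :", 1 - accuracy_score(y_test, model.predict(X_test)))
```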



Results
Train error: 1.9% / Test error: 24%.
Quite good performance.
The exact numbers vary from trial to trial.

Also, this library is really fast.

It takes < 10 seconds to fit the model on my laptop.



Results and Further Usage
XGBoost also tells you how important each feature is.

The 5th feature is the most important and the 0th feature is the least important.

5th feature: BMI / 0th feature: Pregnancies

The importance scores can be used as a basis for feature selection when using other models
(such as the Deep Learning models covered later).
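
The importance plot on the slide can be reproduced with something like the following sketch, assuming the fitted model from the quick-start code above.

```python
import matplotlib.pyplot as plt
from xgboost import plot_importance

# Bar chart of feature importances (by default, how often a feature is used to split)
plot_importance(model)
plt.show()

# Or as raw scores, in the same order as the input columns
print(model.feature_importances_)
```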
XGBoost: What Is It?
XGBoost stands for eXtreme Gradient Boosting.

It is an algorithm that has recently been dominating applied machine learning
and Kaggle competitions for structured or tabular data.

It is an implementation of gradient boosted decision trees designed for speed
and performance.

It is a machine learning library, like NumPy, TensorFlow, and PyTorch.



XGBoost: What Is It?
The "eXtreme" refers to speed enhancements such as parallel computing and
cache awareness that make XGBoost approximately 10 times faster than
traditional Gradient Boosting.

In addition, XGBoost includes a unique split-finding algorithm to optimize trees,
along with built-in regularization that reduces overfitting.

Generally speaking, XGBoost is a faster, more accurate version of Gradient
Boosting.



XGBoost and Gradient Boosting
XGBoost is an implementation of gradient boosting machines created by Tianqi
Chen, now with contributions from many developers.

Both XGBoost and GBM follow the principle of gradient boosting.

There are, however, differences in the modelling details. Specifically, XGBoost uses a
more regularized model formalization to control overfitting, which gives it better
performance.

In addition to the shrinkage parameter, XGBoost employs many other configurable
strategies that are not found in "traditional" GBM implementations. Check out the list:
https://xgboost.readthedocs.io/en/latest/parameter.html#learning-task-parameters



Why XGBoost?
Easy to implement in scikit-learn.

It is an ensemble, so it scores better than individual models.

It is regularized, so default models often don’t overfit.


Very fast (for ensembles).

It learns from its mistakes (gradient boosting).

Has extensive hyperparameters for fine-tuning.


It includes hyperparameters to scale imbalanced data and fill null values.



XGBoost Model Features
The implementation of the model supports the features of the scikit-learn and R
implementations, with new additions like regularization.

Three main forms of gradient boosting are supported:

Gradient Boosting algorithm, also called gradient boosting machine, including
the learning rate.

Stochastic Gradient Boosting with sub-sampling at the row, column, and
column-per-split levels.

Regularized Gradient Boosting with both L1 and L2 regularization.



XGBoost System Features
The library provides a system for use in a range of computing environments, including:

Parallelization of tree construction using all of your CPU cores during training.

Distributed Computing for training very large models using a cluster of machines.

Out-of-Core Computing for very large datasets that don't fit into memory.

Cache Optimization of data structures and algorithms to make the best use of
hardware.



XGBoost Algorithm Features
The implementation of the algorithm was engineered for efficiency of compute time
and memory resources.

A design goal was to make the best use of available resources to train the model.

Some key algorithm implementation features include:

Sparse Aware implementation with automatic handling of missing data values.

Block Structure to support the parallelization of tree construction.

Continued Training so that you can further boost an already fitted model on
new data.



Fine Tuning XGBoost Model
Tuning is the way to supercharge the model and increase its performance.

Let us look at an example comparing an untuned and a tuned XGBoost model
based on their RMSE scores.

Later, we will go through the descriptions of the XGBoost hyperparameters.



Fine Tuning XGBoost Model
Untuned model output (RMSE): 34624.229980
Tuned model output (RMSE): 29812.683594

Around a 14% reduction in the RMSE score.
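
The outputs above came from code shown as images on the slide. A hedged sketch of how such an untuned-versus-tuned comparison is commonly produced (the synthetic dataset and the parameter grid are placeholders, not from the slides):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, cross_val_score
from xgboost import XGBRegressor

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)

# Untuned model: library defaults
untuned = XGBRegressor(objective="reg:squarederror")
untuned_rmse = -cross_val_score(
    untuned, X, y, scoring="neg_root_mean_squared_error", cv=4).mean()

# Tuned model: small grid search over a few common hyperparameters
param_grid = {"n_estimators": [100, 300],
              "learning_rate": [0.05, 0.1],
              "max_depth": [3, 5]}
search = GridSearchCV(XGBRegressor(objective="reg:squarederror"),
                      param_grid, scoring="neg_root_mean_squared_error", cv=4)
search.fit(X, y)

print("Untuned RMSE:", untuned_rmse)
print("Tuned RMSE  :", -search.best_score_)
```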
XGBoost Hyperparameters
For each base learner in XGBoost, there are different parameters that can
be tuned to increase the model performance.

There is a plethora of tuning parameters for the tree-based learners used to
build the model.

The full list is available at:
http://xgboost.readthedocs.io/en/latest/parameter.html#general-parameters

The most common ones are:



XGBoost Hyperparameters
learning rate: Gradient boosting involves creating and adding trees to the model sequentially.
New trees are created to correct the residual errors in the predictions from the existing
sequence of trees.
A problem with gradient boosted decision trees is that they are quick to learn and overfit
training data.
One effective way to slow down learning in the gradient boosting model is to use a learning
rate, also called shrinkage (or eta in XGBoost).
It affects how quickly the model fits the residual error using additional base learners.
Setting values less than 1.0 has the effect of making smaller corrections for each tree added to the
model. This in turn means that more trees must be added to the model.
It is common to have small values in the range of 0.1 to 0.3, as well as values less than
0.1.
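
As a hedged illustration of the trade-off (smaller learning rate, more trees), with arbitrary values and a synthetic dataset:

```python
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Larger corrections per tree: fewer trees needed, but noise is fitted more aggressively
XGBClassifier(learning_rate=0.3, n_estimators=100).fit(X, y)

# Smaller corrections per tree (more shrinkage): more trees are usually required
XGBClassifier(learning_rate=0.05, n_estimators=600).fit(X, y)
```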



XGBoost Hyperparameters
max_depth: It is a positive integer value and is responsible for how deep each tree
will grow during any boosting round.

Increasing this value will make the model more complex and more likely to
overfit.

XGBoost aggressively consumes memory when training a deep tree.

Default = 6.



XGBoost Hyperparameters
subsample: The fraction of the training samples (randomly selected) that will be
used to train each tree.
It ranges from 0 to 1.
Setting it to 0.5 means that XGBoost would randomly sample half of the
training data prior to growing trees.
Lower values prevent overfitting, but values that are too small might lead to underfitting.
High values may lead to overfitting.
Default = 1.



XGBoost Hyperparameters
colsample_bytree: The fraction of features (randomly selected) that will be used to train each tree.
Ranges from 0 to 1.
Default = 1.

Example: imagine we have a dataset with 16 features; for simplicity, let's use 0.5.
For the first tree, a random half of the 16 features is made available.


XGBoost Hyperparameters
colsample_bylevel: The fraction of features (randomly selected) that will be used at each
level of depth when training each tree.
Ranges from 0 to 1.
Default = 1.

This comes into play every time we reach a new level of depth in a tree.

Before making any further splits, we take all the features that are left after applying
colsample_bytree and filter them again using colsample_bylevel.

On the next level of depth, we repeat this step, so you get a different set of features
at each level.



XGBoost Hyperparameters
colsample_bynode: The fraction of features (randomly selected) that will be used for
each split.
Ranges from 0 to 1.
Default = 1.
Occurs once every time a new split is evaluated.
Features are subsampled from the set of features chosen for the current level.



XGBoost Hyperparameters
Why do I need these parameters?
By limiting the number of features for building each tree we may end up with trees
that gained different insights from the data.

They learn how to optimise for the target variable using different sets of features.

So, if you have enough data you can try tuning colsample parameters!
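
A hedged sketch of how the sampling parameters above are set in the scikit-learn wrapper (the values are arbitrary examples):

```python
from xgboost import XGBRegressor

model = XGBRegressor(
    subsample=0.8,          # 80% of the rows are sampled for each tree
    colsample_bytree=0.5,   # half of the features are available per tree
    colsample_bylevel=0.8,  # of those, 80% are available at each depth level
    colsample_bynode=0.8,   # of those, 80% are considered at each split
    random_state=0,
)
```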



XGBoost Hyperparameters
n_estimators: The number of trees you want to build.

objective: Determines the loss function to be used, e.g. reg:linear (reg:squarederror in
recent versions) for regression problems, reg:logistic for classification problems that
output only a decision, and binary:logistic for classification problems that output a
probability.

max_depth: Maximum depth of a tree.
Increasing this value will make the model more complex and more likely to overfit.
Default = 6.



XGBoost Hyperparameters
gamma: Controls whether a given node will split based on the expected reduction
in loss after the split.
Minimum loss reduction required to make a further partition on a leaf node of
the tree.
A higher value leads to fewer splits.
The larger gamma is, the more conservative the algorithm will be.
range: [0, ∞].
Default = 0.
There is no universally "good" gamma for a dataset on its own.
Gamma depends on both the training set and the other parameters you use.
XGBoost Hyperparameters
alpha: L1 regularization on leaf weights.
A large value leads to more regularization.
Default = 0.

lambda: This is responsible for L2 regularization on leaf weights.
A large value leads to more regularization.
L2 regularization is smoother than L1 regularization.
Default = 1.
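
A hedged sketch of the regularisation-related parameters; in the scikit-learn wrapper, alpha and lambda are exposed as reg_alpha and reg_lambda (the values are arbitrary examples):

```python
from xgboost import XGBRegressor

model = XGBRegressor(
    gamma=1.0,        # minimum loss reduction required to make a further split
    reg_alpha=0.1,    # alpha: L1 penalty on leaf weights
    reg_lambda=1.0,   # lambda: L2 penalty on leaf weights (default 1)
    random_state=0,
)
```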



XGBoost Hyperparameters
eta: Step size shrinkage used in each update to prevent overfitting.
A problem with gradient boosted decision trees is that they are quick to learn and overfit
training data.
One effective way to slow down learning in the gradient boosting model is to use a
learning rate, also called shrinkage (or eta in XGBoost).
Range: [0, 1].
Default = 0.3.
To get the most out of XGBoost, eta should be set as low as possible. However, as eta gets
lower, you need many more steps (rounds) to reach the optimum:
Increasing eta makes computation faster (because fewer rounds are needed) but makes it
harder to reach the best optimum.
Decreasing eta makes computation slower (because more rounds are needed) but makes it
easier to reach the best optimum.
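
A hedged sketch of the eta/rounds trade-off using cross-validation with early stopping; the synthetic dataset and parameter values are placeholders:

```python
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
dtrain = xgb.DMatrix(X, label=y)

# With a low eta, many boosting rounds are needed; early stopping picks the point
# where the cross-validated error stops improving.
result = xgb.cv(
    params={"eta": 0.05, "objective": "binary:logistic", "max_depth": 4},
    dtrain=dtrain,
    num_boost_round=1000,
    nfold=5,
    early_stopping_rounds=20,
)
print("Rounds actually used:", len(result))
```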
Tuning eta

[Figure: the effect of increasing the value of eta on the model.]
What Makes XGBoost so Popular?
Speed and performance.
Core algorithm is parallelizable.
Consistently outperforms single-algorithm methods.
State-of-the-art performance in many ML tasks.
Out-of-Core computing (large datasets that do not fit in memory).



When to Use XGBoost
You have a large number of training samples.
Greater than 1000 training samples and fewer than 100 features.
The number of features < the number of training samples.

You have a mixture of categorical and numeric features, or just numeric features.



When Not to Use XGBoost
Image recognition.

Computer vision.

Natural language processing and understanding problems.

When the number of training samples is significantly smaller than the number
of features.



Dimensionality Reduction



Curse of Dimensionality
Increasing the number of features will not always improve classification accuracy.

In practice, the inclusion of more features might actually lead to worse performance.

It has been estimated that, as the number of dimensions increases, the number of training
examples required increases exponentially with the dimensionality d (i.e., k^d, where k is
the number of bins per feature).

[Figure: a feature space divided into 3^1, 3^2, and 3^3 bins as the dimensionality grows from 1 to 3.]
Curse of Dimensionality
The curse of dimensionality is the phenomenon whereby an increase in the dimensionality
of a data set results in exponentially more data being required to produce a representative
sample of that data set.

The ability to generalize correctly becomes exponentially harder as the dimensionality of
the training dataset grows, as the training set covers a dwindling fraction of the input space.

Models also become more efficient with a reduced feature set, which boosts learning rates
and diminishes computation costs by removing redundant features.


Curse of Dimensionality
With one dimension there are only 10 possible positions. 10 data points are required to
create a representative sample which 'covers' the problem space.
With two dimensions there are 10^2 = 100 possible positions. 100 data points are required
to create a representative sample which 'covers' the problem space.
With just three dimensions there are 10^3 = 1000 possible positions. 1000 data points are
required to create a representative sample which 'covers' the problem space, and so on.
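
The same arithmetic as a tiny sketch (k bins per feature, d dimensions):

```python
# Number of cells (and, roughly, required samples) for k bins per feature in d dimensions
k = 10
for d in (1, 2, 3):
    print(f"d = {d}: {k ** d} possible positions")
```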
What is Dimensionality Reduction?
In machine learning classification problems, there are often too many factors on the
basis of which the final classification is done.
These factors are basically variables called features.
The higher the number of features, the harder it gets to visualize the training set
and then work on it.

Sometimes, most of these features are correlated, and hence redundant.


This is where dimensionality reduction algorithms come into play.



What is Dimensionality Reduction?
Dimensionality reduction is the process of reducing the number of random variables
under consideration, by obtaining a set of principal variables.

It can be divided into feature selection and feature extraction.

Feature extraction: This reduces the data in a high-dimensional space to a lower-dimensional
space, i.e., a space with a smaller number of dimensions.

Feature selection: Finding a subset of the original set of variables, or features, to get a
smaller subset which can be used to model the problem.



Dimensionality Reduction
Feature extraction: finds a set of new features (i.e., through some mapping f(), which may be
linear or non-linear) from the existing features:

x = [x_1, x_2, ..., x_N]^T  --f(x)-->  y = [y_1, y_2, ..., y_K]^T,  where K << N.

Feature selection: chooses a subset of the original features:

x = [x_1, x_2, ..., x_N]^T  -->  y = [x_i1, x_i2, ..., x_iK]^T,  where K << N.
Dimensionality Reduction
Feature Extraction methods: Independent Component Analysis, Principal Component Analysis,
Isomap, Autoencoder, Locally Linear Embedding, t-distributed Stochastic Neighbor Embedding, ...

Feature Selection methods: Remove features with missing values, Remove features with low
variance, Remove highly correlated features, Univariate feature selection, Recursive feature
elimination, Feature selection using SelectFromModel, ...
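
As a hedged illustration of the feature-selection side of the table, a minimal scikit-learn sketch; the synthetic data, threshold, and estimator are arbitrary choices, not from the slides.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import VarianceThreshold, SelectFromModel

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

# Remove features with zero variance
X_var = VarianceThreshold(threshold=0.0).fit_transform(X)

# Keep only the features a fitted model considers important
selector = SelectFromModel(RandomForestClassifier(random_state=0)).fit(X_var, y)
X_selected = selector.transform(X_var)
print(X.shape, "->", X_selected.shape)
```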



Principal Component Analysis (PCA)



What is PCA?
Principal Component Analysis (PCA) is an unsupervised,
non-parametric statistical technique primarily used for
dimensionality reduction in machine learning.
The goal of PCA is to reduce the dimensionality of the data
while retaining as much as possible of the variation present
in the original dataset.
High dimensionality means that the dataset has a large
number of features.
The primary problem associated with high-dimensionality in
the machine learning field is model overfitting, which
reduces the ability to generalize beyond the examples in the
training set.
→ Curse of Dimensionality
What is PCA?
Curse of Dimensionality → "Many algorithms that work fine in low dimensions become
intractable when the input is high-dimensional."

The goal of PCA is to reduce the dimensionality (number of variables) of the data by
extracting the important ones from a large pool, while retaining as much as possible of the
variation present in the original dataset.

In other words, this method combines highly correlated variables to form a smaller set of
artificial variables, called "principal components", that account for most of the variance
in the data.



What is PCA?
The goal of PCA is to reduce the dimensionality of the
data while retaining as much as possible of the variation
present in the original dataset.

PCA reduces the number of variables in your data by extracting the important ones from a
large pool, with the aim of retaining as much information as possible.

More formally, PCA is the identification of linear combinations of highly correlated
variables to form a smaller set of artificial variables (principal components) that capture
the maximum variability within a set of data.



Principal Components
Principal components are new variables
that are constructed as linear combinations or
mixtures of the initial variables.

Geometrically speaking, principal components represent the directions of the data that
explain a maximal amount of variance, that is to say, the lines that capture most of the
information in the data.



Principal Components
These combinations are done in such a way that
the new variables (i.e., principal components)
are uncorrelated and most of the information
within the initial variables is squeezed or
compressed into the first components.

So, the idea is that 10-dimensional data gives you 10 principal components, but PCA tries
to put the maximum possible information in the first component, then the maximum remaining
information in the second, and so on, until you have something like the scree plot
sketched below.
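
The scree plot itself is not reproduced here; a hedged sketch of how to produce one with scikit-learn (the Iris dataset is just a convenient stand-in):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)
pca = PCA().fit(X)

# Scree plot: fraction of the total variance explained by each principal component
components = range(1, len(pca.explained_variance_ratio_) + 1)
plt.bar(components, pca.explained_variance_ratio_)
plt.xlabel("Principal component")
plt.ylabel("Explained variance ratio")
plt.show()
```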



Intuition
You want to predict the GDP of the USA for 2017.
You have lots of information available:
👉 the U.S. GDP for the first quarter of 2017,
👉 the U.S. GDP for the entirety of 2016, 2015, and so on.
👉 Publicly available economic indicators, like the unemployment rate, inflation rate, etc.
👉 U.S. Census data from 2010 estimating how many Americans work in each industry.
👉 American Community Survey data updating those estimates in between each census.
👉 How many members of the House and Senate belong to each political party.
👉 Stock price data, the number of IPOs occurring in a year, and
👉 how many CEOs seem to be mounting a bid for public office.



Intuition
A lot of variables to consider.
But this can present problems.
Do you understand the relationships between each variable?
Do you have so many variables that you are in danger of overfitting your
model to your data?

You might ask the question, “How do you take all of the variables you have
collected and focus on only a few of them?”

In technical terms, you want to “reduce the dimension of your feature space.”



Intuition
By reducing the dimension of your feature space, you have fewer relationships
between variables to consider, and
You are less likely to overfit your model.

Ways:
Feature Elimination

Feature Extraction



Feature Elimination
Reducing the feature space by eliminating features.
GDP example: instead of considering every single variable, drop all variables
except the three you think will best predict what the U.S.’s GDP will look like.

Advantages: simplicity and maintaining interpretability of variables.

Disadvantage: you gain no information from the variables you've dropped.
If you only use last year's GDP, you miss out on whatever the dropped variables could
have contributed to the model.
By eliminating features, you have also entirely eliminated any benefits those dropped
variables would bring.
PCA
PCA (feature extraction), however, doesn’t run into this problem.
Suppose:
There are ten independent variables.
PCA will create ten “new” independent variables, where each “new”
independent variable is a combination of each of the ten “old” independent
variables.
However, these new independent variables are created in a specific way and are ordered
by how well they predict our dependent variable.



PCA
Where does the dimensionality reduction come into play?
You can keep as many of the new independent variables as you want, but drop the
"least important" ones.
Because the new variables are ordered by how well they predict the dependent variable,
you know which variables are the most and least important.

But here is the kicker: because these new independent variables are combinations of the
old ones, it is still possible to keep the most valuable parts of the old variables, even
when one or more of these "new" variables are dropped!



PCA: Step by Step
In short: (1) standardization, (2) covariance matrix computation, (3) computing the
eigenvectors and eigenvalues of the covariance matrix to identify the principal components.

In detail:
Step 1: Standardize the dataset.
Step 2: Calculate the covariance matrix for the features in the dataset.
Step 3: Calculate the eigenvalues and eigenvectors for the covariance matrix.
Step 4: Sort the eigenvalues and their corresponding eigenvectors.
Step 5: Pick the top k eigenvalues and form a matrix of their eigenvectors.
Step 6: Transform the original matrix.
A from-scratch sketch of these steps is given below.
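
A from-scratch NumPy sketch of the six steps; the toy data and the choice of k = 2 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))           # toy data: 100 samples, 5 features
k = 2                                    # number of principal components to keep

# Step 1: standardize the dataset
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the features
cov = np.cov(X_std, rowvar=False)

# Step 3: eigenvalues and eigenvectors of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 4: sort eigenvalues (and their eigenvectors) in decreasing order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 5: pick the top-k eigenvectors as the projection matrix
W = eigvecs[:, :k]

# Step 6: transform the original (standardized) matrix
X_pca = X_std @ W
print(X_pca.shape)                       # (100, 2)
```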



PCA: Algorithm
