Introduction
• The number of input variables or features in a dataset
is referred to as its dimensionality.
• Dimensionality reduction refers to techniques that
reduce the number of input variables in a dataset.
• More input features often make a predictive modeling
task more challenging, a problem generally referred to
as the curse of dimensionality.
• High-dimensional statistics and dimensionality
reduction techniques are often used for data
visualization. Nevertheless, these techniques can also be
used in applied machine learning to simplify a
classification or regression dataset in order to better fit
a predictive model.
What is Dimensionality Reduction?
In machine learning, the final classification is often
based on a large number of factors. These factors
are known as variables or features. The higher the
number of features, the harder it becomes to
visualize the training set and then work with it.
Moreover, many of these features are often
correlated, and hence redundant. This is where
dimensionality reduction algorithms come into
play.
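As a minimal sketch of this idea (assuming Python with NumPy, which the notes do not specify), the data below has three features, two of which are noisy copies of the first. Because the features are correlated, nearly all of the variance lies along a single direction, and projecting onto it loses almost nothing:

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 samples, 3 features; features 2 and 3 are noisy copies of feature 1,
# so the data is highly correlated and effectively one-dimensional.
f1 = rng.normal(size=200)
X = np.column_stack([f1,
                     f1 + 0.01 * rng.normal(size=200),
                     f1 + 0.01 * rng.normal(size=200)])

# Centre the data and find the top principal direction via SVD.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_reduced = Xc @ Vt[:1].T          # project onto the first component

# Fraction of total variance captured by the first component.
explained = S[0] ** 2 / np.sum(S ** 2)
print(X_reduced.shape)             # (200, 1)
```

Here `explained` is very close to 1, confirming that one dimension is enough to represent the redundant three-feature dataset.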
Motivation
• When we deal with real problems and real
data, we often face high-dimensional data
with up to millions of features.
• In its original high-dimensional form the
data represents itself, but sometimes we
still need to reduce its dimensionality.
• A common reason to reduce dimensionality
is visualization, although that is not
always the case.
Dimensionality Reduction Methods
The various methods used for dimensionality
reduction include:
• Principal Component Analysis (PCA)
• Linear Discriminant Analysis (LDA)
• Generalized Discriminant Analysis (GDA)
• Factor Analysis (FA)
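Three of the methods above can be sketched with scikit-learn (an assumption: the notes do not name a library; GDA has no stock scikit-learn implementation, so it is omitted):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)   # 150 samples, 4 features, 3 classes

# PCA and FA are unsupervised; LDA is supervised and uses the labels y.
X_pca = PCA(n_components=2).fit_transform(X)
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)
X_fa = FactorAnalysis(n_components=2).fit_transform(X)

print(X_pca.shape, X_lda.shape, X_fa.shape)   # each (150, 2)
```

Note that LDA can produce at most (number of classes - 1) components, which is why two components is the maximum for the three-class iris data.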
Principal Component Analysis
What Is Principal Component Analysis (PCA)?
PCA transforms a set of possibly correlated variables into a
smaller set of uncorrelated variables, the principal
components, that capture most of the variance in the data.
Arguments
x: a matrix or data frame.
center: a logical or numeric value giving the centring option. If TRUE, columns are
centred by their means; if FALSE, no centring is done; if a numeric vector, its length
must be equal to the number of columns of the data frame df and gives the values
used for decentring.
scale: a logical value indicating whether the column vectors should be normed for the
row.w weighting.
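The center and scale arguments above amount to standard preprocessing before PCA. A minimal Python sketch of the same semantics (an assumption, since the original documents an R function; the helper name `preprocess` is hypothetical):

```python
import numpy as np

def preprocess(x, center=True, scale=True):
    """Mimic the centring/scaling options described above:
    center=True subtracts column means, center=False skips centring,
    and a numeric vector of per-column offsets gives the decentring.
    scale=True norms each column by its root-mean-square."""
    x = np.asarray(x, dtype=float)
    if center is True:
        x = x - x.mean(axis=0)
    elif center is not False:
        offsets = np.asarray(center, dtype=float)
        # length must equal the number of columns, as in the R function
        assert offsets.shape == (x.shape[1],)
        x = x - offsets
    if scale:
        norms = np.sqrt((x ** 2).mean(axis=0))
        x = x / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return x

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Xs = preprocess(X)
print(Xs.mean(axis=0))   # columns are centred at zero
```

After this step, every column has mean zero and unit root-mean-square, so no single feature dominates the principal components merely because of its measurement scale.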