Aim: Reducing the number of variables in a dataset while retaining as much of the information as possible.
In the case of image data, PCA can be used to identify the most important features in the images
and compress them into a smaller set of variables.
(1) Preprocess the images by flattening them into a vector of pixel values.
(2) Then, you would use PCA to identify the most important principal components of the
image data.
(3) Finally, you would reconstruct the images using only the most important principal components.
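The three steps above can be sketched as follows. This is a minimal illustration, assuming scikit-learn's `PCA` and a hypothetical stand-in batch of random 8x8 "images"; the component count of 10 is an arbitrary choice.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in data: 100 grayscale "images" of 8x8 pixels.
rng = np.random.default_rng(0)
images = rng.random((100, 8, 8))

# (1) Flatten each image into a 64-element vector of pixel values.
X = images.reshape(len(images), -1)          # shape (100, 64)

# (2) Identify and keep the 10 most important principal components.
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)             # shape (100, 10)

# (3) Reconstruct approximate images from those components alone.
X_restored = pca.inverse_transform(X_reduced).reshape(images.shape)
```

The reconstruction is lossy: only the variance captured by the kept components survives, which is exactly the compression trade-off described above.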
Consider the dimension of the data given as D. Let us reduce it to a lower dimension M.
Subtracting the mean makes variance and covariance calculation easier by simplifying
their equations.
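The simplification can be seen directly in NumPy: once each feature has zero mean, the covariance matrix is just a scaled product of the centered data with itself. The small 2-D dataset below is a hypothetical example.

```python
import numpy as np

# Hypothetical data: 5 samples, 2 features.
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0]])

# Subtract the mean of each feature (column) from the data.
X_centered = X - X.mean(axis=0)

# With zero-mean data, covariance reduces to X^T X / (n - 1).
cov = X_centered.T @ X_centered / (len(X) - 1)

# np.cov centers the data internally, so both results agree.
print(np.allclose(cov, np.cov(X, rowvar=False)))
```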
Step 3: Calculate the eigenvectors V and eigenvalues D of the covariance matrix.
Eigenvectors are plotted as diagonal dotted lines on the plot. (Note: they are perpendicular to each other.)
One of the eigenvectors goes through the middle of the points, like drawing a line of
best fit.
The second eigenvector gives us the other, less important, pattern in the data.
The eigenvector with the highest eigenvalue is the principal component of the data set.
In our example, the eigenvector with the largest eigenvalue is the one that points down the middle of the data.
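Step 3 can be sketched with NumPy's `eigh`, which is designed for symmetric matrices such as a covariance matrix. The correlated 2-D sample data here is a hypothetical illustration.

```python
import numpy as np

# Hypothetical correlated 2-D data.
rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0], [[3.0, 1.5], [1.5, 1.0]], size=200)
X = X - X.mean(axis=0)                 # subtract the mean first

cov = np.cov(X, rowvar=False)

# eigh returns eigenvalues D in ascending order and eigenvectors V
# as columns; sort descending so the largest eigenvalue comes first.
D, V = np.linalg.eigh(cov)
order = np.argsort(D)[::-1]
D, V = D[order], V[:, order]

# The eigenvector with the highest eigenvalue is the principal component.
principal_component = V[:, 0]

# Eigenvectors of a symmetric matrix are perpendicular to each other.
print(np.isclose(V[:, 0] @ V[:, 1], 0))
```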
Features of FS:
- Reducing overfitting.
- Improving learning efficiency, such as reducing storage requirements and computational cost.
The selection can be represented as a binary array, with each element corresponding to a feature: its value is 1 if the feature is currently selected by the algorithm and 0 otherwise.
There are a total of 2^M subsets, where M is the number of features of a data set.
Here M = 3, giving 2^3 = 8 subsets.
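The 2^M count can be checked by enumerating every binary mask for a hypothetical set of M = 3 features:

```python
from itertools import product

features = ["f1", "f2", "f3"]          # M = 3 hypothetical features
M = len(features)

# Every subset as a binary mask: 1 = selected, 0 = not selected.
subsets = list(product([0, 1], repeat=M))
print(len(subsets))                    # 2**3 = 8 subsets

# e.g. the mask (1, 0, 1) selects f1 and f3.
selected = [f for f, bit in zip(features, (1, 0, 1)) if bit]
print(selected)
```

The exponential growth of this search space is what motivates the greedy strategies (forward, backward, bidirectional, random) instead of exhaustive search.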
Types:
1. Filter Method
2. Wrapper Method
(1) Filter Method:
The selection of features is independent of any machine learning algorithms.
Features are selected on the basis of their scores in various statistical tests.
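A filter-method sketch, assuming scikit-learn's `SelectKBest` with an ANOVA F-test as the statistical score; the dataset is hypothetical, built so that feature 0 carries the signal.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical data: 100 samples, 5 features.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)
X = rng.random((100, 5))
X[:, 0] += y            # make feature 0 correlate with the labels

# Score every feature with an ANOVA F-test and keep the top 2;
# no learning algorithm is involved, only the statistical scores.
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
print(selector.get_support())   # boolean mask of selected features
```

Because the scores are computed independently of any model, filter methods are fast but can miss feature interactions that only a trained model would reveal.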
(2) Wrapper Method:
A learning algorithm is used to evaluate candidate feature subsets. Common search strategies:
1. Forward Selection:
We start with having no feature in the model.
In each iteration, we keep adding the feature which best improves our model.
2. Backward Elimination:
We start with all the features.
Remove the least significant feature at each iteration.
3. Bidirectional Generation:
We perform both forward selection and backward elimination concurrently.
4. Random Generation:
It starts the search in a random direction.
The choice of adding or removing a feature is a random decision.
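Forward selection, the first of the strategies above, can be sketched as a greedy loop: start with no features and repeatedly add whichever feature most improves the cross-validated score of the wrapped model. The dataset, the logistic-regression model, and the stopping point of 3 features are all illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical dataset with 6 features, 3 of them informative.
X, y = make_classification(n_samples=200, n_features=6,
                           n_informative=3, random_state=0)

model = LogisticRegression(max_iter=1000)
selected, remaining = [], list(range(X.shape[1]))

# Start with no features; in each iteration add the feature that
# best improves the model's cross-validated accuracy.
for _ in range(3):                      # stop after choosing 3 features
    scores = {f: cross_val_score(model, X[:, selected + [f]], y).mean()
              for f in remaining}
    best = max(scores, key=scores.get)
    selected.append(best)
    remaining.remove(best)

print(selected)     # indices of the chosen features
```

Backward elimination is the mirror image: start from `selected = all features` and greedily remove the feature whose removal hurts the score least.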