
Coursework 1: ML for Computer Vision

Duy-Thao DO
School of Computing, KAIST
20235707
thaodo@kaist.ac.kr

Bartlomiej Grochowski
Department of Aerospace Engineering, KAIST
20236437
bgrochowski@kaist.ac.kr

I. EIGENFACES

A. By Computing Directly from the Covariance Matrix

Before performing PCA, we compute the mean image (i.e. the "mean face" of the Eigenfaces method). The result is illustrated in Fig.1.

Fig. 1: The mean image.

Computing eigenvectors and eigenvalues directly from the covariance matrix S = (1/N)AA^T, which is a Dim×Dim matrix, gives eigenvectors of dimension Dim and a Dim-dimensional eigenvalue vector.

The eigenvectors are then sorted by their eigenvalues in descending order (as in Fig.3) so that we can identify the principal components (PCs). The five largest PCs are visualized in face form (the so-called eigenfaces) in Fig.2. There are 1496 eigenvectors with non-zero eigenvalues (out of Dim = 2576 dimensions).

Fig. 3: The eigenvalues chart.

Fig. 2: The top 5 (ghostly-looking) eigenfaces.

We choose between 100 and 250 of the largest PCs for the face recognition tasks in later sections of this work, because the eigenvectors with very small eigenvalues are highly similar to one another (as in Fig.17), so they are not really helpful, and can even be harmful, for the classification task (they capture nonsensical "features" caused by face rotation or translation).

B. A Low-dimensional Computation Approach

Computing eigenvectors and eigenvalues directly from the covariance matrix S = (1/N)AA^T, which is Dim×Dim, has the downside of high computational cost, so it is not practical or cost-efficient when Dim×Dim is too large. This approach instead computes (1/N)A^T A, which is only an N×N matrix (N being the number of samples). This is clearly low-cost (low-dimensional) when N is small.

Compared with the first approach, there are 416 non-negative eigenvalues out of N = 416 in the eigenvalue vector. After sorting in descending order as in the previous part, the Frobenius norm of the difference of the two eigenvalue vectors (taking only the first 416 values from the AA^T method) is 2 × 10^-9, i.e. roughly 0. Thus the eigenvalue vectors are nearly identical; the corresponding eigenvectors, however, differ in shape, since they live in different spaces (N-dimensional versus Dim-dimensional; an eigenvector v of A^T A maps to an eigenvector of AA^T via u = Av, up to normalization).

Both approaches are mathematically equivalent, but the first method is more costly here because there are only 416 samples in the training set versus 2576 pixels per image (i.e. N < Dim); on the other hand, the eigenvectors from the first method are directly ready to be visualized as in Fig.2. Hereinafter, we use the first PCA method to extract features from images. The latter method is cost-efficient and fits cases with few samples and a huge number of raw feature dimensions, such as real-world medical data.

C. Face Image Reconstruction

Each face can be reconstructed as a linear combination of the best M eigenvectors (i.e. PCs, also called bases), as in slide 20 of the Manifold Learning chapter.

We pick 3 images (the 1st, 9th and 17th images from the training set) to visualize how good the PCA-based reconstruction is in Fig.4. The more PCs we use for reconstruction, the closer the quality is to the original.

We use the distortion measure from slide 26 of the Quadratic Optimization chapter, which is the L2-norm of the difference between the original and the reconstructed image. The total average error is calculated for several values of M and plotted in Fig.5.
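The low-dimensional trick and the reconstruction pipeline described above can be summarized in a short NumPy sketch. This is only a minimal illustration under our own assumptions (a data matrix `faces` with one flattened image per row; the function names are not from the coursework script):

```python
import numpy as np

def pca_low_dim(faces, M):
    """Eigenfaces via the N x N trick: eigen-decompose (1/N) A^T A instead of (1/N) A A^T.
    faces: (N, Dim) matrix, one flattened image per row (assumed layout)."""
    N = faces.shape[0]
    mean = faces.mean(axis=0)
    A = (faces - mean).T                  # (Dim, N), columns are centred faces
    S_small = (A.T @ A) / N               # (N, N) low-dimensional surrogate
    vals, vecs = np.linalg.eigh(S_small)  # returned in ascending order
    order = np.argsort(vals)[::-1][:M]    # keep the M largest eigenvalues
    vals, vecs = vals[order], vecs[:, order]
    U = A @ vecs                          # map back: u_i = A v_i (Dim-dimensional eigenfaces)
    U /= np.linalg.norm(U, axis=0)        # normalize each eigenface
    return mean, U, vals

def reconstruct(x, mean, U):
    """Project one face onto the M eigenfaces and reconstruct it."""
    w = U.T @ (x - mean)                  # weights (projections) of the face
    x_hat = mean + U @ w
    err = np.linalg.norm(x - x_hat)       # L2 distortion measure
    return x_hat, err
```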

Fig. 4: Some reconstructed images compared with the original samples.

Fig. 5: The reconstruction errors for different numbers of bases M.

D. PCA-based Face Recognition

Each normalized training face ϕ is represented by its projections (or weights) ω, as in step 8 of slide 20 from the Manifold Learning chapter. In this way, we extract features (weights) from each image based on the largest M eigenvectors.

We then use these features as input to a k-nearest neighbor (KNN) classifier for the face recognition task. There are several configurations to consider: the number of nearest neighbors K, the number of top PCs M, and the type of distance. In detail, M ranges from 100 to 250 in steps of 10 (i.e. 100, 110, ..., 250); K ranges from 3 to 11 in steps of 2 (i.e. 3, 5, ...); and we test 3 distances (Manhattan, Cosine, Euclidean). The recognition accuracy for these configurations is visualized in Fig.6.

Fig. 6: Classification accuracy results by KNN with M[100, 250], K[3, 11].

From Fig.6, it is better to keep the number of nearest neighbors low (3 or 5); Cosine and Euclidean distance also gave better results. Regarding the number of top PCs, we think 100 to 150 are enough to achieve the best result. Hereinafter, we search over other learning methods considering M only from 120 to 150. We also visualize the confusion matrix for the best case of 63.46% accuracy (M=130, K=3, Euclidean distance) in Fig.8. Some successful and failed cases of this model are shown in Fig.7.

Fig. 7: Successful and failed cases by the best KNN model.

Fig. 8: Confusion matrix from the best KNN model.

In general, the memory cost of the nearest-neighbor classifier for different values of K (from 3 to 11) is not very different. This is likely because KNN does not hold weights and classifies directly from the training set. The results are similar for elapsed time. In total, the whole script takes around 420MB, with nearly 0 ms for training and 78 ms for the test set.
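A compact sketch of the grid search described above, using scikit-learn's KNeighborsClassifier on the PCA weights (the variable names such as `W_train` are our own; the actual coursework script may differ):

```python
from sklearn.neighbors import KNeighborsClassifier

def knn_grid(W_train, y_train, W_test, y_test):
    """W_train/W_test: PCA weight matrices with at least 250 components per sample."""
    best = (0.0, None)
    for M in range(100, 251, 10):                 # number of top PCs
        for K in range(3, 12, 2):                 # number of neighbors
            for metric in ("manhattan", "cosine", "euclidean"):
                clf = KNeighborsClassifier(n_neighbors=K, metric=metric)
                clf.fit(W_train[:, :M], y_train)
                acc = clf.score(W_test[:, :M], y_test)
                if acc > best[0]:
                    best = (acc, (M, K, metric))
    return best
```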
II. INCREMENTAL PCA

To begin with Incremental PCA, we split the training and testing data into 4 equal subsets. Then we converted every subset into an eigenspace model {µ_i, N_i, P_i, Λ_i}, i=1,2,3,4, and merged the models one by one to form the model of the entire training data. The resulting eigenfaces look similar (ghostly) to the ones in Fig.2 and are shown in Fig.18.

The training time for Batch PCA (using the 2nd approach) was 2.0238s, the PCA trained on the 1st subset took 1.8773s, and the Incremental PCA took 2.4851s. The training time values include everything from obtaining the mean image to projecting the training data onto the A vector. In the case of Incremental PCA, this includes computing all the eigenspace models, merging them, and projecting onto the A vector as well. The relatively small differences in training time are a result of the eigen-decomposition of the covariance matrices, which takes a significant amount of time for all the algorithms. Incremental PCA does not necessarily decrease the total learning time, but its advantage is the efficient merging of more data into an existing trained model.
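A minimal sketch of merging two eigenspace models, following the usual covariance-merging recipe for incremental PCA (this is our own simplified version, not the exact merge used in the coursework; it keeps all components rather than truncating):

```python
import numpy as np

def merge_eigenmodels(mu1, N1, P1, L1, mu2, N2, P2, L2):
    """Merge two eigenspace models (mean, sample count, eigenvectors as columns, eigenvalues)."""
    N3 = N1 + N2
    mu3 = (N1 * mu1 + N2 * mu2) / N3
    d = (mu1 - mu2).reshape(-1, 1)
    # Orthonormal basis spanning both subspaces plus the mean-difference direction
    Phi, _ = np.linalg.qr(np.hstack([P1, P2, d]))
    # Rotate both covariances (and the between-set term) into that basis
    G1, G2, g = Phi.T @ P1, Phi.T @ P2, Phi.T @ d
    S3 = (N1 / N3) * (G1 * L1) @ G1.T \
       + (N2 / N3) * (G2 * L2) @ G2.T \
       + (N1 * N2 / N3**2) * (g @ g.T)
    vals, R = np.linalg.eigh(S3)
    order = np.argsort(vals)[::-1]
    return mu3, N3, Phi @ R[:, order], vals[order]
```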

To compare the reconstruction abilities of the three models, we used M=130 bases in all the following reconstructions. The resulting faces (the first and last from the dataset) for these models are shown in Fig.9.

Fig. 9: Comparison of reconstructions of the first (top) and last (bottom) image.

The reconstruction does not differ significantly between Batch and Incremental PCA. As expected, the PCA based on the first subset performs better at reconstructing images from its own training data, but fails at reconstructing other images. This is also reflected in the mean reconstruction error of each model: Batch PCA achieved 442.27, 1st subset PCA had a value of 914.54, and Incremental PCA 584.88 (rounded to 2 decimal places).

For the comparison of face recognition accuracy, we again used the KNN classification method with K=3 and Euclidean distance, as it performed best in the previous chapter. The resulting accuracy values are 63.46% for Batch PCA, 19.23% for 1st subset PCA, and 61.54% for Incremental PCA. The confusion matrices of the latter 2 methods are shown in Fig.10.

(a) First subset PCA. (b) Incremental PCA.

Fig. 10: Confusion matrices.

III. LDA ENSEMBLE FOR FACE RECOGNITION

A. PCA-LDA

LDA face recognition works by applying LDA to the PCA data and using a KNN classifier. The parameters for this method are Mpca, Mlda, the number of neighbors K, and the distance metric. The Fisherfaces for this method are shown in Fig.19.

Upon iterating this method for Mpca from 100 to 250 in steps of 10, Mlda from 10 to 100 in steps of 10, K from 1 to 19 in steps of 2, and Manhattan, Cosine and Euclidean distance, the average accuracy was 81.92%, already a big improvement. The highest average (and the highest single) accuracy was achieved with Cosine distance, so only this metric will be considered further. The accuracies for the K where the average accuracy is highest and where the highest accuracy lies are shown below:

Fig. 11: Classification accuracy results for cosine, K[7, 11], Mpca[100, 250], Mlda[10, 100].

The highest accuracy was achieved for K = 11, Mpca = 190, Mlda = 60 and the Cosine metric. The confusion matrix, together with example success and failure cases, is shown below. The rank of the within-class scatter matrix for this case was 190 (=Mpca), whereas the rank of the between-class scatter matrix was 51 (=C-1, the highest possible).

Fig. 12: Confusion matrix for PCA-LDA.

Fig. 13: Example recognitions for the best PCA-LDA model.
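A short sketch of the PCA-LDA-KNN pipeline with scikit-learn (variable names and defaults are placeholders; this swaps in scikit-learn's LDA for the coursework's own implementation, which caps the LDA dimension at C-1 = 51 here, so the report's Mlda = 60 is not reachable with it):

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def pca_lda_knn(X_train, y_train, X_test, y_test, M_pca=190, M_lda=50, K=11):
    """PCA to M_pca dimensions, LDA to M_lda dimensions, then cosine KNN.
    Note: sklearn's LDA requires M_lda <= min(C - 1, M_pca)."""
    model = make_pipeline(
        PCA(n_components=M_pca),
        LinearDiscriminantAnalysis(n_components=M_lda),
        KNeighborsClassifier(n_neighbors=K, metric="cosine"),
    )
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)
```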

B. PCA-LDA Ensemble
Using the above method, the accuracy might be improved even further with ensemble learning. To limit the computational parameters for optimization, the metric will be cosine only, and Mpca will be fixed to the previously optimal 190. The variables will thus be: the number of base models T [10, 150, step 10], the randomness parameter ratio Nρ [0.1, 1.0, step 0.05], the random features ratio [0.1, 1.0, step 0.05], and the fusion method. Mlda [10, 100] and K [3, 19] will be chosen randomly for each base, together with the learning samples and the dropped features. The selected fusion methods are majority, average and max (as min and product would not work due to 0 probabilities for some classes when using KNN bases). A sketch of how such a committee can be assembled is shown below.
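A minimal sketch of one such randomized PCA-LDA committee with majority fusion. This is our own simplified illustration: the parameter names `rpr` and `rfr` mirror the ratios above, integer class labels are assumed, and every class is assumed to keep enough samples after subsampling.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def fit_committee(X, y, T=100, rpr=0.95, rfr=0.65, M_pca=190, seed=0):
    """Train T randomized PCA-LDA-KNN bases on random sample/feature subsets."""
    rng = np.random.default_rng(seed)
    bases = []
    n, d = X.shape
    for _ in range(T):
        rows = rng.choice(n, size=int(rpr * n), replace=False)   # random training samples
        cols = rng.choice(d, size=int(rfr * d), replace=False)   # random kept features
        M_lda = int(rng.integers(10, 51))                        # random LDA dimension (<= C - 1)
        K = int(rng.choice(np.arange(3, 20, 2)))                 # random neighbor count
        base = make_pipeline(PCA(n_components=min(M_pca, len(rows) - 1, len(cols))),
                             LinearDiscriminantAnalysis(n_components=M_lda),
                             KNeighborsClassifier(n_neighbors=K, metric="cosine"))
        base.fit(X[np.ix_(rows, cols)], y[rows])
        bases.append((base, cols))
    return bases

def predict_majority(bases, X):
    """Majority-vote fusion over the base predictions (integer labels assumed)."""
    votes = np.stack([b.predict(X[:, cols]) for b, cols in bases])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```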

Upon iterating through all the parameters, three optimal solutions with accuracy equal to 97.1% became apparent: [T=100, rpr=1, rfr=0.7, fusion='average'], [T=100, rpr=0.95, rfr=0.6, fusion='average'], and [T=120, rpr=0.9, rfr=0.65, fusion='majority']. It is worth noting that these results are based on pseudo-random choices of dropout and sample selection, but based on these three close-by results the following control variables will be considered: T=100, rpr=0.95, rfr=0.65, and average fusion.

The next iteration using only this model again yielded an accuracy of 97.1%. The error of the committee machine was 0.0288, while the average error of the individual models was 0.5368, higher by a factor of 18.6. The confusion matrix of this ensemble is shown in Fig.14.

Fig. 14: Confusion matrix for PCA-LDA Ensemble.
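The committee-versus-individual error comparison above can be reproduced with a snippet like the following (a sketch only; it uses plain misclassification rate with average fusion, whereas the report's exact error definition may differ):

```python
import numpy as np

def committee_vs_individual_error(probs, y_true):
    """probs: (T, n_samples, n_classes) per-base class probabilities.
    Returns (error of the average-fused committee, mean error of individual bases)."""
    individual = np.array([(p.argmax(axis=1) != y_true).mean() for p in probs])
    fused = probs.mean(axis=0).argmax(axis=1)
    return (fused != y_true).mean(), individual.mean()
```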

IV. GENERATIVE & DISCRIMINATIVE SUBSPACE LEARNING

From our survey, we found several methods that learn a subspace balancing both reconstruction and discriminative features: the Generative Adversarial Net (GAN) [3] and Partial Least Squares Discriminant Analysis (PLS-DA) [2]. We pick PLS-DA to learn a latent space that maximises the covariance between the input set X and the class membership matrix Y, and thus formulate the objective mathematically as below (same as [1]):

    J = max_{ω^T ω = 1} Σ_{c=1}^{C} [cov(Xω, Y_c)]^2

where cov(·,·) denotes the covariance function, C is the number of classes, ω is a weight vector for constructing the PLS components t = Xω, and X = [X^(1), ..., X^(C)] is the centred sample matrix. After a few equivalent transformations, the objective becomes:

    J = ω^T X^T Š X ω

where Š = diag(Š^(1), ..., Š^(C)) and Š^(c) is an n_c × n_c matrix with all entries equal to 1/(n − 1)^2. The solution of this objective is to find the eigenvectors of S̃_b = X^T Š X.
V. RANDOM FOREST CLASSIFIER

Similarly to Q1, we leverage the Eigenfaces (PCA) method to extract features. The Random Forest (RF) classifier has several hyper-parameters: the number of trees, the maximum depth of a tree, the degree of randomness, and the type of weak learner; also the number of top PCs M. We consider the number of trees nT in [100, 400] (step 20); the maximum tree depth dt in [5, 15]; for randomness, the random seed in {0, 1} and the maximum number of features considered per split, either √nF or 0.5nF (where nF is the full number of features); and two types of weak learner, axis-aligned and 2-pixel tests. To keep the report brief, we fixed Gini impurity as the measure of split quality.

With axis-aligned learners, RF gave the best result of 75% accuracy at the configuration M=100, nT=360, dt=10, seed=0 and √nF. The confusion matrix for this case is illustrated in Fig.16. Some test cases are also shown in Fig.15. The training duration is 3147 ms, while testing takes 35 ms for the whole set.

Fig. 15: Example cases from the best (axis-aligned) RF model.

Fig. 16: Confusion matrix from the best RF (PCA-based) model.

For the 2-pixel test, since the scikit-learn library is limited in this respect, we convert the input (of dimension M) to pairwise differences (giving C(M,2) combinations), as in slide 24 of the RF chapter, before training. Experimental results show the best case at M=100, nT=280, dt=15, seed=0 and √nF. This best configuration gives 69.23% test accuracy. It, however, costs 19531 ms for training and 30 ms for testing (the confusion matrix and samples are omitted for brevity).

While tuning RF (both axis-aligned and 2-pixel test), we noticed that if we set the maximum number of features per split above 0.4nF, the accuracy hardly gets over 60%. Also, with M > 130, none of the configurations reaches 70%. Compared with the best 2-pixel-test RF, the best axis-aligned RF gives better accuracy and faster training, and is only slightly slower at inference. On the other hand, compared with KNN in Q1, an ensemble model like the RF classifier shows a superior accuracy result (75% vs 63.46%) and faster inference time (30ms vs 78ms), but at the price of considerable training time. Compared to Q3, the LDA-based model was superior to this PCA-based RF by a major gap (a 29% gain), which proves that LDA extracts better discriminative features.
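A sketch of the two RF variants with scikit-learn, including the pairwise-difference conversion used to emulate the 2-pixel test (a simplified illustration; `W` denotes the PCA weight matrix and is our own naming):

```python
import numpy as np
from itertools import combinations
from sklearn.ensemble import RandomForestClassifier

def pairwise_differences(W):
    """Emulate the 2-pixel test: replace M PCA features with all C(M,2) pairwise differences."""
    idx = list(combinations(range(W.shape[1]), 2))
    return np.stack([W[:, i] - W[:, j] for i, j in idx], axis=1)

def fit_rf(W_train, y_train, n_trees=360, depth=10, seed=0, two_pixel=False):
    """Axis-aligned RF on PCA weights, or on pairwise differences when two_pixel=True."""
    if two_pixel:
        W_train = pairwise_differences(W_train)
    rf = RandomForestClassifier(n_estimators=n_trees, max_depth=depth,
                                max_features="sqrt", criterion="gini",
                                random_state=seed)
    rf.fit(W_train, y_train)
    return rf
```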

REFERENCES
[1] Muhammad Aminu and Noor Atinah Ahmad. Locality preserving partial
least squares discriminant analysis for face recognition. Journal of King
Saud University - Computer and Information Sciences, 34(2):153–164,
2022.
[2] Anne-Laure Boulesteix and Korbinian Strimmer. Partial least squares: a
versatile tool for the analysis of high-dimensional genomic data. Briefings
in bioinformatics, 8(1):32–44, 2007.
[3] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David
Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Gener-
ative adversarial nets. Advances in neural information processing systems,
27, 2014.

APPENDIX A
FIRST APPENDIX

Fig. 17: The 301st to 305th eigenfaces.

Fig. 18: The top 5 eigenfaces of PCA of the first batch (top)
and Incremental PCA (bottom).

Fig. 19: The top 5 Fisherfaces for Mpca = 100.
