Then the 2n × 2n covariance matrix S can be calculated by

S = \frac{1}{N} \sum_{i=1}^{N} dx_i \, dx_i^T. \qquad (10)

Now we perform PCA on S:

S p_k = \lambda_k p_k, \qquad (11)

where p_k is the eigenvector of S corresponding to the kth largest eigenvalue λ_k, and

p_k^T p_k = 1. \qquad (12)

Let P be the matrix of the first t eigenvectors:

P = [p_1, p_2, \cdots, p_t]. \qquad (13)

Then we can approximate a shape in the training set by

x = \bar{x} + P b, \qquad (14)

where b = [b_1, b_2, \cdots, b_t]^T is the vector of weights for different deformation patterns. By varying the parameters b_k, we can generate new examples of the shape. We can also limit each b_k to constrain the deformation patterns of the shape. Typical limits are

-3\sqrt{\lambda_k} \le b_k \le 3\sqrt{\lambda_k}, \qquad (15)

where k = 1, 2, \cdots, t. Another important issue of ASMs is how to search for the shape in new images using point distribution models. This problem is beyond the scope of our paper, and here we only focus on the statistical model itself.
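As a concrete illustration of Eqs. (10)-(15), the following is a minimal NumPy sketch of the statistical shape model. The function names and the array layout (one aligned 2n-dimensional shape vector per row) are our own choices, not the paper's:

import numpy as np

def build_shape_model(X, t):
    # Point distribution model from aligned shapes.
    # X : (N, 2n) array, one aligned shape vector per row.
    # t : number of deformation modes to keep.
    # Returns the mean shape x_bar, the (2n, t) matrix P of
    # eigenvectors (Eq. (13)), and the t largest eigenvalues.
    x_bar = X.mean(axis=0)
    D = X - x_bar                        # rows are dx_i
    S = D.T @ D / X.shape[0]             # Eq. (10)
    lam, V = np.linalg.eigh(S)           # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:t]      # keep the t largest, Eq. (11)
    return x_bar, V[:, idx], lam[idx]

def generate_shape(x_bar, P, lam, b):
    # New shape x = x_bar + P b (Eq. (14)), with each b_k
    # clipped to the limits of Eq. (15).
    b = np.clip(b, -3.0 * np.sqrt(lam), 3.0 * np.sqrt(lam))
    return x_bar + P @ b

Varying a single component of b while keeping the others at zero visualizes one deformation pattern at a time, which is how the deformation figures in Section 3.3 are produced.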
First, we assume that the projected new features have zero mean:

\frac{1}{N} \sum_{i=1}^{N} \phi(x_i) = 0. \qquad (16)

The covariance matrix of the projected features is M × M, calculated by

C = \frac{1}{N} \sum_{i=1}^{N} \phi(x_i) \phi(x_i)^T. \qquad (17)

Its eigenvalues and eigenvectors are given by

C v_k = \lambda_k v_k, \qquad (18)

where k = 1, 2, \cdots, M. From Eq. (17) and Eq. (18), we have

\frac{1}{N} \sum_{i=1}^{N} \phi(x_i) \{\phi(x_i)^T v_k\} = \lambda_k v_k, \qquad (19)

which can be rewritten as

v_k = \sum_{i=1}^{N} a_{ki} \phi(x_i). \qquad (20)

Now by substituting v_k in Eq. (19) with Eq. (20), we have

\frac{1}{N} \sum_{i=1}^{N} \phi(x_i) \phi(x_i)^T \sum_{j=1}^{N} a_{kj} \phi(x_j) = \lambda_k \sum_{i=1}^{N} a_{ki} \phi(x_i). \qquad (21)

If we define the kernel function

\kappa(x_i, x_j) = \phi(x_i)^T \phi(x_j), \qquad (22)
and multiply both sides of Eq. (21) by \phi(x_l)^T, we have

\frac{1}{N} \sum_{i=1}^{N} \kappa(x_l, x_i) \sum_{j=1}^{N} a_{kj} \kappa(x_i, x_j) = \lambda_k \sum_{i=1}^{N} a_{ki} \kappa(x_l, x_i). \qquad (23)

We can use the matrix notation

K^2 a_k = \lambda_k N K a_k, \qquad (24)

where

K_{i,j} = \kappa(x_i, x_j), \qquad (25)

and a_k is the N-dimensional column vector of a_{ki}:

a_k = [a_{k1}, a_{k2}, \cdots, a_{kN}]^T. \qquad (26)

a_k can be solved by

K a_k = \lambda_k N a_k, \qquad (27)

and the resulting kernel principal components can be calculated using

y_k(x) = \phi(x)^T v_k = \sum_{i=1}^{N} a_{ki} \kappa(x, x_i). \qquad (28)

If the projected dataset {\phi(x_i)} does not have zero mean, we can use the Gram matrix \tilde{K} to substitute the kernel matrix K. The Gram matrix is given by

\tilde{K} = K - 1_N K - K 1_N + 1_N K 1_N, \qquad (29)

where 1_N is the N × N matrix with all elements equal to 1/N (Bishop, 2006).

The power of kernel methods is that we do not have to compute \phi(x_i) explicitly. We can directly construct the kernel matrix from the training data set {x_i} (Weinberger et al., 2004). Two commonly used kernels are the polynomial kernel

\kappa(x, y) = (x^T y)^d, \qquad (30)

or

\kappa(x, y) = (x^T y + c)^d, \qquad (31)

where c > 0 is a constant, and the Gaussian kernel

\kappa(x, y) = \exp(-\|x - y\|^2 / 2\sigma^2) \qquad (32)

with parameter σ.

The standard steps of kernel PCA dimensionality reduction can be summarized as:

1. Construct the kernel matrix K from the training data set {x_i} using Eq. (25).
2. Compute the Gram matrix \tilde{K} using Eq. (29).
3. Solve Eq. (27), with \tilde{K} substituted for K, to obtain the vectors a_k.
4. Compute the kernel principal components y_k(x) using Eq. (28).
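As a concrete illustration of these steps, here is a minimal NumPy sketch of Gaussian kernel PCA. Since \tilde{K} a_k = \lambda_k N a_k (Eq. (27)), the eigenvalues returned by an eigensolver equal \lambda_k N, and requiring v_k^T v_k = 1 in Eq. (20) fixes the scale to a_k^T a_k = 1/(\lambda_k N) (Bishop, 2006). The function and variable names are ours:

import numpy as np

def gaussian_kernel_matrix(X, sigma):
    # K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), Eqs. (25) and (32).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def kernel_pca(X, sigma, m):
    # Returns the coefficient matrix A (column k holds a_k) and the
    # training-data features Y, with Y[i, k] = y_k(x_i) as in Eq. (28).
    N = X.shape[0]
    K = gaussian_kernel_matrix(X, sigma)
    one_N = np.full((N, N), 1.0 / N)
    K_tilde = K - one_N @ K - K @ one_N + one_N @ K @ one_N   # Eq. (29)
    mu, U = np.linalg.eigh(K_tilde)        # mu[k] = lambda_k * N, ascending
    idx = np.argsort(mu)[::-1][:m]         # keep the m largest
    mu, U = mu[idx], U[:, idx]
    A = U / np.sqrt(mu)                    # scale so that v_k^T v_k = 1
    Y = K_tilde @ A                        # Eq. (28) on the training set
    return A, Y

The projection of a new point x follows Eq. (28) with a correspondingly centered kernel row; this is what the classification experiment of Section 3.2 relies on.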
2.2. Reconstructing Pre-Images

So far, we have discussed how to generate new features y_k(x) using kernel PCA. This is enough for applications such as feature extraction and data classification. However, for some other applications, we need to approximately reconstruct the pre-images {x_i} from the kernel PCA features {y_i}. This is the case in active shape models, where we not only need to use PCA features to describe the deformation patterns, but also have to reconstruct the shapes from the PCA features (Romdhani et al., 1999; Twining & Taylor, 2001).

In standard PCA, the pre-image x_i can simply be approximated by Eq. (6). However, Eq. (6) cannot be used for kernel PCA (Gökhan H. Bakır et al., 2004). For kernel PCA, we define a projection operator P_m which projects \phi(x) to its approximation

P_m \phi(x) = \sum_{k=1}^{m} y_k(x) v_k, \qquad (33)

where v_k is the eigenvector of the matrix C, which is defined by Eq. (17). If m is large enough, we have P_m \phi(x) \approx \phi(x). Since finding the exact pre-image x is difficult, we turn to finding an approximation z such that

\phi(z) \approx P_m \phi(x). \qquad (34)

This can be achieved by minimizing

\rho(z) = \|\phi(z) - P_m \phi(x)\|^2. \qquad (35)

2.3. Pre-Images for Gaussian Kernels

There are some existing techniques to compute z for specific kernels (Mika et al., 1999). For a Gaussian kernel \kappa(x, y) = \exp(-\|x - y\|^2 / 2\sigma^2), z should satisfy

z = \frac{\sum_{i=1}^{N} \gamma_i \exp(-\|z - x_i\|^2 / 2\sigma^2) \, x_i}{\sum_{i=1}^{N} \gamma_i \exp(-\|z - x_i\|^2 / 2\sigma^2)}, \qquad (36)

where

\gamma_i = \sum_{k=1}^{m} y_k a_{ki}. \qquad (37)

Since Eq. (36) expresses z as a fixed point, z can be computed by the iteration

z_{t+1} = \frac{\sum_{i=1}^{N} \gamma_i \exp(-\|z_t - x_i\|^2 / 2\sigma^2) \, x_i}{\sum_{i=1}^{N} \gamma_i \exp(-\|z_t - x_i\|^2 / 2\sigma^2)}. \qquad (38)
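A minimal NumPy sketch of this iteration follows, reusing the conventions of the kernel PCA sketch in Section 2.1 (X holds the training data row-wise, A the coefficients a_k). The iteration cap and convergence tolerance are our choices; the initial guess is the training mean, as suggested later in Section 4.1:

import numpy as np

def reconstruct_preimage(X, A, y, sigma, n_iter=100, tol=1e-8):
    # Approximate the pre-image z of the kernel PCA feature vector y
    # by iterating the fixed-point condition of Eq. (36), i.e. Eq. (38).
    gamma = A @ y                        # gamma_i = sum_k y_k a_ki, Eq. (37)
    z = X.mean(axis=0)                   # initial guess z_0, Eq. (42)
    for _ in range(n_iter):
        w = gamma * np.exp(-((z - X) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
        z_new = w @ X / w.sum()          # one step of Eq. (38)
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z

Note that the denominator can approach zero if z drifts far from all training points, in which case a different initial guess is needed; Section 4.1 returns to this point.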
3. Experiments

In this section, we show the setup and results of our three experiments. The first two experiments are classification problems without pre-image reconstruction. The third experiment combines active shape models with kernel PCA. Before we work on real data, we would like to generate some synthetic data with two classes and test the algorithms on them.

(Figure: scatter plots of the two synthetic classes, class 1 and class 2, and their projections by polynomial kernel PCA.)

Table 1. Classification error rates on training data and testing data for standard PCA and Gaussian kernel PCA with σ = 22546.

Error Rate      PCA      Kernel PCA
Training Data   8.82%    6.86%
Testing Data    23.08%   11.54%

The face images are aligned, and each image has 168 × 192 pixels. Example images of the Yale Face Database B are shown in Figure 5.

Figure 5. Example images from the Yale Face Database B.

3.2.2. Classification Results

We use the 168 × 192 pixel intensities as the original features for each image, thus the original feature vector is 32256-dimensional. Then we use standard PCA and Gaussian kernel PCA to extract the 9 most significant features from the training data, and record the eigenvectors.
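For reference, the feature extraction and classification can be sketched as follows, reusing gaussian_kernel_matrix() and kernel_pca() from Section 2.1. The paper does not spell out the classifier here, so the nearest-centroid rule below is our assumption, and train_imgs, train_labels and test_imgs are hypothetical arrays of flattened 32256-dimensional face vectors:

import numpy as np

def kpca_features_for_test(X_train, X_test, A, sigma):
    # Eq. (28) for test points, with the same centering as Eq. (29).
    N = X_train.shape[0]
    K = gaussian_kernel_matrix(X_train, sigma)
    sq = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    K_t = np.exp(-sq / (2.0 * sigma ** 2))
    one_N = np.full((N, N), 1.0 / N)
    one_MN = np.full((X_test.shape[0], N), 1.0 / N)
    K_t_c = K_t - one_MN @ K - K_t @ one_N + one_MN @ K @ one_N
    return K_t_c @ A

def nearest_centroid_predict(Y_train, train_labels, Y_test):
    # Assumed classifier: assign each test point to the class whose
    # mean training feature vector is closest in kernel PCA space.
    classes = np.unique(train_labels)
    centroids = np.array([Y_train[train_labels == c].mean(axis=0)
                          for c in classes])
    d = ((Y_test[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return classes[d.argmin(axis=1)]

With A, Y_train = kernel_pca(train_imgs, sigma, 9), the test features are obtained as Y_test = kpca_features_for_test(train_imgs, test_imgs, A, sigma) before classification.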
3.3. Kernel PCA-Based Active Shape Models

In ASMs, the shape of an object is described with point distribution models, and standard PCA is used to extract the principal deformation patterns from the shape vectors {x_i}. If we use kernel PCA instead of standard PCA here, it is promising that we will be able to discover more hidden deformation patterns.

3.3.1. Data Description

3.3.2. Experimental Results

In our work, we first normalize all the shape vectors by restricting both the x coordinates and the y coordinates to the range [0, 1]. Then we perform standard PCA and Gaussian kernel PCA on the normalized shape vectors. For standard PCA, the reconstruction of the shape is given by

x = \bar{x} + P b. \qquad (39)

The effects of varying the first, the second and the third PCA feature are shown in Figure 7, Figure 8 and Figure 9, respectively. The face is drawn by using line segments to represent the eyebrows, the eyes and the nose, using circles to represent the eyeballs, using a quadrilateral to represent the mouth, and fitting a parabola to represent the contour of the face.

Figure 9. The effect of varying the third PCA feature for ASM.

The effects of varying the first, the second and the third Gaussian kernel PCA feature are shown in Figure 10, Figure 11 and Figure 12, respectively.

Figure 10. The effect of varying the first Gaussian kernel PCA feature for ASM.

Figure 11. The effect of varying the second Gaussian kernel PCA feature for ASM.

Figure 12. The effect of varying the third Gaussian kernel PCA feature for ASM.

By observation, we can see that the first PCA feature affects the orientation of the human face (left or right), and the second PCA feature to some extent determines some microexpression, from amazement to calmness, of the human face. In contrast, the Gaussian kernel PCA features seem to determine some very different microexpressions than the PCA features.
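To make the kernel PCA-based ASM concrete, a minimal sketch follows, reusing kernel_pca() from Section 2.1 and reconstruct_preimage() from Section 2.3. Starting the scan from the mean feature vector is our choice of illustration, not a prescription from the experiments above:

import numpy as np

def vary_kpca_feature(shapes, sigma, m, k, values):
    # Generate deformed shapes by varying the k-th Gaussian kernel PCA
    # feature and reconstructing the pre-image of each modified vector.
    # shapes : (N, 2n) normalized shape vectors with coordinates in [0, 1].
    A, Y = kernel_pca(shapes, sigma, m)
    y = Y.mean(axis=0)                   # start from the mean feature vector
    deformed = []
    for v in values:
        y_mod = y.copy()
        y_mod[k] = v                     # vary one deformation pattern
        deformed.append(reconstruct_preimage(shapes, A, y_mod, sigma))
    return np.array(deformed)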
4. Discussions

In this section, we address two concerns: first, how do we select the parameters for Gaussian kernel PCA; second, what is the intuitive explanation of Gaussian kernel PCA.

4.1. Parameter Selection

Parameter selection for kernel PCA directly determines the performance of the algorithm. For Gaussian kernel PCA, the most important parameter is the σ in the kernel function defined by Eq. (32). The Gaussian kernel is a function of the distance \|x - y\| between
two vectors x and y. Ideally, if we want to separate different classes in the new feature space, then the parameter σ should be smaller than inter-class distances and larger than intra-class distances. However, we do not know how many classes there are in the data, so it is not easy to estimate the inter-class or intra-class distances. Alternatively, we can set σ to a small value to capture only the neighborhood information of each data point. For this purpose, for each data point x_i, let the distance from x_i to its nearest neighbor be d_i^{NN}. In our experiments, we use this parameter selection strategy:

\sigma = 5 \cdot \operatorname{mean}_i (d_i^{NN}). \qquad (41)

This strategy, in our experiments, ensures that σ is large enough to capture neighboring data points, and is much smaller than inter-class distances. When using different datasets, this strategy may need modifications.
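This selection rule is simple to implement; a minimal sketch follows (the brute-force pairwise-distance computation is our own and only suits moderately sized data sets):

import numpy as np

def select_sigma(X):
    # sigma = 5 * mean_i(d_i^NN), Eq. (41): five times the mean distance
    # from each training point to its nearest neighbor.
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)          # ignore the zero self-distance
    return 5.0 * d.min(axis=1).mean()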
For the pre-image reconstruction of Gaussian kernel PCA, the initial guess z_0 determines whether the iterative algorithm (38) converges. We can always simply use the mean of the training data as the initial guess:

z_0 = \operatorname{mean}_i (x_i). \qquad (42)

4.2. Intuitive Explanation of Gaussian Kernel PCA

We can see that in our synthetic data classification experiment, Gaussian kernel PCA with a properly selected parameter σ can perfectly separate the two classes in an unsupervised manner, which is impossible for standard PCA. In the human face image classification experiment, Gaussian kernel PCA has a lower training error rate and a much lower testing error rate than standard PCA. From these two experiments, we can see that Gaussian kernel PCA reveals more complex hidden structures of the data than standard PCA. An intuitive understanding of Gaussian kernel PCA is that it makes use of the distances between different training data points, as k-nearest neighbor and clustering methods do. With a well-selected σ, Gaussian kernel PCA will have a proper capture range, which enhances the connection between data points that are close to each other in the original feature space. Then, by applying eigenvector analysis, the eigenvectors will describe the directions in a high-dimensional space in which different clusters of data are scattered to the greatest extent.

In this paper we mostly use the Gaussian kernel for kernel PCA. This is because it is intuitive, easy to implement, and makes it possible to reconstruct the pre-images. However, we note that there are techniques to find more powerful kernel matrices by learning-based methods (Weinberger et al., 2004; 2005).

5. Conclusion

In this paper, we discussed the theories of PCA, kernel PCA and ASMs. Then we focused on pre-image reconstruction for Gaussian kernel PCA, and used this technique to design kernel PCA-based ASMs. We tested kernel PCA dimensionality reduction on synthetic data and human face images, and found that Gaussian kernel PCA succeeded in revealing more complicated structures of the data than standard PCA, and in achieving much lower classification error rates. We also implemented the Gaussian kernel PCA-based ASM and tested it on human face images. We found that Gaussian kernel PCA-based ASMs are promising in providing more deformation patterns than traditional ASMs. A potential application is to combine traditional ASMs and Gaussian kernel PCA-based ASMs for microexpression recognition on human face images. In addition, we proposed a parameter selection method to find proper parameters for Gaussian kernel PCA, which works well in our experiments.

Acknowledgment

The author would like to thank Johan Van Horebeek (horebeek@cimat.mx) and Flor Martinez (flower10@cimat.mx) from CIMAT for reviewing our code and reporting the bugs.

References

Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006.

Cootes, Timothy F., Taylor, Christopher J., Cooper, David H., and Graham, Jim. Active shape models — their training and application. Computer Vision and Image Understanding, 61(1):38–59, 1995.

Demers, David and Cottrell, Garrison W. Non-linear dimensionality reduction. In Advances in Neural Information Processing Systems 5, pp. 580–587. Morgan Kaufmann, 1993.

Georghiades, A. S., Belhumeur, P. N., and Kriegman, D. J. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):643–660, 2001.

Gökhan H. Bakır, Weston, Jason, and Schölkopf,