Appendix II: Derivation of

Principal Component
Analysis

2.1

Eigenvectors of the Covariance Matrix Derivation

2.1.1

Principal Component Feature Extraction

Consider a random vector x ∈ ℜ

n

for which a basis u is to be determined such that an approxi-

mation of x can be obtained by a linear combination of m orthogonal basis vectors:

xˆ = y 1 u 1 + y 2 u 2 + … + y m u m

[154]

where m < n and y j is the weighting coefficient of basis vector u j formed by taking the inner
product of x with u j :
T

yj = x uj .

[155]

By forming a k × n observation matrix X , the rows of which are the observations x k , we can
express Equation 154 and Equation 155 respectively in the following form:
T

ˆ = YU T
X
m

[156]

Y m = XU m

[157]

where U m is an n × m matrix whose columns are an uncorrelated basis for X , and Y m is a

k × m matrix whose columns are the coefficients for each column vector in U m . By the orthog157

then. We refer to Equation 156 as the PCAfeature re-synthesis equation and Equation 157 as the PCA-feature projection equation where the columns of Y m correspond to the estimated features and the columns of U m are the projection vectors which perform the linear transformation of the input to the new uncorrelated basis. U j = 1 . The problem.e. is defined as the element-wise operation: ∞ E[ψ(x)] = ∫ ψ ( x ) px ( x ) dx [161] –∞ where p x ( x ) is the probability density function of the random variable x . for deriving a PCA is to obtain the matrix U m such that the residual error in approximating x with xˆ is minimized: n ε = x – xˆ = ∑ y ju j [159] j = m+1 that is. say ψ ( x ) . A suitable criteria is the minimization of the expectation of the mean-square residual error: 2 2 ξ = E [ ε ] = E [ x – xˆ ] [160] where the expectation of an arbitrary function of x . i. the expansion of all the unused features results in a minimal signal which is the residual error ε . Since the expectation operator is linear and due to the condition that the columns of U m are orthonormal it follows that: 158 . it follows that: T Um Um = Im [158] where I m is a m × m diagonal matrix with unit entries.Eigenvectors of the Covariance Matrix Derivation onality of U m and by the imposition of an additional constraint that the columns of U m have unit norm.

[166] Thus. n ∂u j .Eigenvectors of the Covariance Matrix Derivation n n   T  ζ = E [ ε ε ] = E  ∑ y i u i   ∑ y j u j i = m + 1 j = m + 1  T n ∑ = 2 E[y j ] [162] j = m+1 from which it follows that: 2 E[y j ] T = E [ ( u j x ) ( x u j ) ] = u j E [ xx ]u j = u j Ru j T T T T [163] where R is the correlation matrix for x . [165] It is well known that the solutions to this equation constitute the eigenvectors of the correlation matrix R . …. we can re-express the residual error as the sum of the eigenvalues of the unused portion of the basis: n ζ = ∑ λj [167] j = m+1 159 . j = m + 1. This is a necessary condition for the minimum and gives the characteristic equation: ∂ ζ' = 2 ( Ru j – λju j ) = 0. the problem is equivalent to finding the eigenvectors of the covariance matrix Q x . It is also worth noting that the correlation matrix is related to the covariance matrix by the following expression: T Q x = R x – mm . for zero-mean or centered data. Since the columns of U are now determined to be the eigenvectors. Now by substitution of Equation 163 into Equation 162 we arrive at the quantity to be minimized: n ∑ ζ = T u j Ru j [164] j = m+1 We can express the minimization as a multi-variate differential equation by using a set of Lagrangian multipliers λ j and setting the derivative of the expectation of the mean-square error with respect to the basis components u j to zero.

Eigenvectors of the Covariance Matrix Derivation and the solution to the minimization reduces to ordering the basis vectors u j such that the columns with the smallest eigenvalues occur in the unused portion of the basis which also implies that the m columns of U m which are used for reconstruction should comprise the eigenvectors with the m largest eigenvalues. Since the eigenvalues form the diagonal elements of the covariance of the transformed data with all other elements equal to zero we can express the solution to the eigenvalue decomposition as a diagonalization of the input covariance matrix Q x . 160 . Now that we have arrived at the form of the solution for optimal orthonormal basis reconstruction (in the square-error sense) we must find a general form for representing the solution.