You are on page 1of 28

# Dimensionality Reduction

Lecturer:

Javier Hernandez Rivera

30th September 2010 MAS 622J/1.126J: Pattern Recognition & Analysis
"A man's mind, stretched by new ideas, may never return to it's original Oliver Wendell Holmes Jr. dimensions"

Outline

Before
    

Linear Algebra Probability Likelihood Ratio ROC ML/MAP Accuracy, Dimensions & Overfitting (DHS 3.7) Principal Component Analysis (DHS 3.8.1) Fisher Linear Discriminant/LDA (DHS 3.8.2) Other Component Analysis Algorithms

Today
   
2

Accuracy. Dimensions & Overfitting  Classification accuracy depends upon the dimensionality? if  Complexity (computational/temporal) Big O notation O(c) < O(log n) < O(n) < O(n log n) < O(n^2) < O(n^k) < O(2^n) < O(n!) < O(n^n) Training Testing 3 .

Overfitting Regression Classification Error Overfitting Complexity 4 .

Dimensionality Reduction  Optimal data representation 2D: Evaluating students’ performance # study hours # study hours # As Unnecessary complexity 5 # As .

Dimensionality Reduction  Optimal data representation  3D: Find the most informative point of view 6 .

n projected data WT 2 Rk.d k basis of d dim 7 . n samples xij dim i of sample j Y 2 Rk.n d dim.Principal Component Analysis  Assumptions for new basis:    Large variance has important structure Linear projection Orthogonal basis T wj wi = 1 i = j T wj wi = 0 i 6= j Y = WT X X 2 Rd.

Principal Component Analysis  Optimal W? 1. 8 Maximize projected variance (Hotelling. 1933) . Minimize projection cost (Pearson. 1901) minkx ¡ x ^k 2.

Unit variance: xij = ¾ 3. Zero mean: xij = xij ¡ ¹i Pn 1 ¾i = n¡1 j =1 (xij ¡ ¹i )2 x 2. Compute eigenvectors of ∑: WT 5. Covariance matrix: § = XXT 4. Project X onto the k principal components ij j Y kxn = k<d WT kx d x X dx n 9 .Principal Component Analysis Covariance Algorithm Pn 1 ¹i = n j =1 xij 1.

9 or 0.95) n ¸ i=1 i 14% of Energy Normalized eigenvalue of 3rd PC There are d eigenvectors Choose first p eigenvectors based on their eigenvalues Final data has only k dimensions 10 .Principal Component Analysis  How many PCs? Y = WX kx n k x d dx n Pk ¸i =1 Pi > Threshold (0.

PCA Applications: EigenFaces X dxn Images from faces: • Aligned • Gray scale • Same size • Same lighting eig vect (X XT) W dxd eigenFaces d = # pixels/image n = # images Sirovich and Kirby. 1987 Matthew Turk and Alex Pentland. 1991 11 .

PCA Applications: EigenFaces Faces EigenFaces   Eigenfaces are standardized face ingredients A human face may be considered to be a combination of these standard faces 12 .

PCA Applications: EigenFaces Dimensionality Reduction W x x Outlier Detection argmaxckycar ¡ yc k > µ2 ycar Classification argminckynew ¡ yc k < µ1 ynew 13 .

PCA Applications: EigenFaces Reconstruction x = y1 w1 + y2 w2 + … + yk wk Reconstructed images (adding 8 PC at each step) Reconstructing with partial information 14 .

000 x 10.k are k eigenvectors of XTX 15 .Principal Component Analysis  Any problems? Let’s define a samples as an image of100x100 pixels d = 10.000  X XT is 10.000  O(d2d3) computing eigenvectors becomes infeasible 100 px 100 px  Possible solution (just if n<<d) Use WT = (XE)T where E є Rn.

Project data Y kxn = W kxd x X dx n k <= n < d Why W = U? 16 . Compute SVD of X: X = U§VT 4. Projection matrix: W = U 5. Unit Variance: n¡1 ¾j 3. Zero Mean: xij = xij ¡ ¹i Pn xij 1 2 ¾ = ( x ¡ ¹ ) x = i ij i ij j =1 2.Principal Component Analysis SVD Algorithm Pn 1 ¹i = n j =1 xij 1.

17 .Principal Component Analysis X dx n = U dx d x S dx n x VT nxn Data Columns are data points Right Singular Vectors Columns are eigenvectors of XXT Singular Values Diagonal matrix of sorted values Left Singular Vectors Rows are eigenvectors of XTX MATLAB: [U S V] = svd(A).

Hernandez.PCA Applications: Action Unit Recognition T. 2009 18 . J. Simon.

g.html 19 .Applications        Visualize high dimensional data (e. AnalogySpace) Find hidden relationships (e.mcgill.g.cs. topics in articles) Compress information Avoid redundancy Reduce algorithm complexity Outlier detection Denoising Demos: http://www.ca/~sqrt/dimr/dimreduction.

PCA for pattern recognition Principal Component Analysis • higher variance • bad for discriminability Fisher Linear Discriminant Linear Discriminant Analysis • smaller variance • good discriminability 20 .

Linear Discriminant Analysis  Assumptions for new basis:   y = wT x Maximize distance between projected class means Minimize projected class variance 21 .

Compute T y = w x 3.Linear Discriminant Analysis Objective argmaxw J (w) = mi = 1 ni wT SB w wT SW w x2Ci SW = SB = (m2 ¡ m1)(m2 ¡ m1)T P2 P j P x T x2Cj (x ¡ mj )(x ¡ mj ) Algorithm 1. Compute class means ¡1 w = SW (m2 ¡ m1 ) 2. Project data 22 .

23 .PCA vs LDA   PCA: Perform dimensionality reduction while preserving as much of the variance in the high dimensional space as possible. LDA: Perform dimensionality reduction while preserving as much of the class discriminatory information as possible.

McDuff.Other Component Analysis Algorithms  Independent Component Analysis (ICA) PCA: uncorrelated PCs ICA: independent PCs Blind source separation M. Poh. Picard 2010 Recall: independent  uncorrelated uncorrelated X independent 24 . D. and R.

Independent Component Analysis Sound Source 1 Mixture 1 Output 1 Sound Source 2 Mixture 2 I C A Output 2 Output 3 Sound Source 3 25 Mixture 3 .

Other Component Analysis Algorithms  Curvilinear Component Analysis (CCA) PCA/LDA: linear projection CCA: non-linear projection while preserving proximity between points 26 .

What you should know?  Why dimensionality reduction?  PCA/LDA assumptions  PCA/LDA algorithms  PCA vs LDA  Applications  Possible extensions 27 .

M. Turk and A. J. Martinez. Kak .Recommended Readings  Tutorials     A tutorial on PCA. Belhumeur. Kriegman PCA versus LDA. Swets. A. Weng Eigenfaces vs. and D. A. Smith Fisher Linear Discriminat Analysis.edu/~cdecoro/eigenfaces/) C.princeton. J. L. Hespanha. M. DeCoro  Publications     28 Eigenfaces for Recognition. Welling Eigenfaces (http://www. J. Fisherfaces: Recognition Using Class Specific Linear Projection.cs. Shlens A tutorial on PCA. Pentland Using Discriminant Eigenfeatures for Image Retrieval. D. P.