
Max Welling Department of Computer Science University of Toronto 10 King’s College Road Toronto, M5S 3G5 Canada welling@cs.toronto.edu

Abstract

This is a note to explain kPCA.

1 PCA

Let's first see what PCA is when we do not worry about kernels and feature spaces. We will always assume that we have centered data, i.e. $\sum_i x_i = 0$. This can always be achieved by a simple translation of the axes. Our aim is to find meaningful projections of the data. However, we are facing an unsupervised problem where we don't have access to any labels. If we had, we should be doing Linear Discriminant Analysis. Due to this lack of labels, our aim will be to find the subspace of largest variance, where we choose the number of retained dimensions beforehand. This is clearly a strong assumption, because it may happen that there is interesting signal in the directions of small variance, in which case PCA is not a suitable technique (and we should perhaps use a technique called independent component analysis). However, usually it is true that the directions of smallest variance represent uninteresting noise.

To make progress, we start by writing down the sample covariance matrix $C$,

$$C = \frac{1}{N} \sum_i x_i x_i^T \qquad (1)$$

The eigenvalues of this matrix represent the variance in the eigen-directions of data-space. The eigen-vector corresponding to the largest eigenvalue is the direction in which the data is most stretched out. The second direction is orthogonal to it and picks the direction of largest variance in that orthogonal subspace, etc. Thus, to reduce the dimensionality of the data, we project the data onto the retained eigen-directions of largest variance:

$$C = U \Lambda U^T \;\Rightarrow\; C = \sum_a \lambda_a u_a u_a^T \qquad (2)$$

and the projection is given by,

$$y_i = U_k^T x_i \quad \forall i \qquad (3)$$

where $U_k$ denotes the $d \times k$ sub-matrix containing the first $k$ eigenvectors as columns. As a side effect, we can now show that the projected data are de-correlated in this new basis:

$$\frac{1}{N} \sum_i y_i y_i^T = \frac{1}{N} \sum_i U_k^T x_i x_i^T U_k = U_k^T C U_k = U_k^T U \Lambda U^T U_k = \Lambda_k \qquad (4)$$

where $\Lambda_k$ is the (diagonal) $k \times k$ sub-matrix corresponding to the largest eigenvalues. Another convenient property of this procedure is that the reconstruction error in $L_2$-norm from $y$ to $x$,

$$\sum_i \| x_i - P_k x_i \|^2 \qquad (5)$$

is minimal, where $P_k = U_k U_k^T$ is the projection onto the subspace spanned by the columns of $U_k$.
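To make this concrete, here is a minimal NumPy sketch of Eqs. (1)-(4), assuming the data-cases are stored as the rows of a matrix X; the function and variable names are illustrative and not part of the note.

```python
import numpy as np

def pca_project(X, k):
    """PCA as in Eqs. (1)-(4): X is N x d (rows are data-cases), k is the
    number of retained dimensions. Returns the projections Y (N x k) and U_k."""
    X = X - X.mean(axis=0)            # center the data so that sum_i x_i = 0
    N = X.shape[0]
    C = (X.T @ X) / N                 # Eq. (1): C = (1/N) sum_i x_i x_i^T
    lam, U = np.linalg.eigh(C)        # Eq. (2): C = U Lambda U^T (eigenvalues ascending)
    order = np.argsort(lam)[::-1]     # sort eigen-directions by decreasing variance
    U_k = U[:, order[:k]]             # d x k sub-matrix of the first k eigenvectors
    Y = X @ U_k                       # Eq. (3): y_i = U_k^T x_i for all i
    # Eq. (4): (1/N) Y^T Y is (approximately) the diagonal matrix Lambda_k.
    return Y, U_k

# Example usage:
# X = np.random.randn(100, 5)
# Y, U_k = pca_project(X, k=2)
```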

Now imagine that there are more dimensions than data-cases, i.e. some dimensions remain unoccupied by the data. In this case it is not hard to show that the eigen-vectors that span the projection space must lie in the subspace spanned by the data-cases. This can be seen as follows,

$$\lambda_a u_a = C u_a = \frac{1}{N} \sum_i x_i x_i^T u_a = \frac{1}{N} \sum_i (x_i^T u_a)\, x_i = \sum_i \frac{x_i^T u_a}{N \lambda_a}\, x_i = \sum_i \alpha_i^a x_i \qquad (6)$$

where $u_a$ is some arbitrary eigen-vector of $C$. The last expression can be interpreted as: "every eigen-vector can be exactly written (i.e. losslessly) as some linear combination of the data-vectors, and hence it must lie in their span".

This also implies that instead of the eigenvalue equation $C u = \lambda u$ we may consider the $N$ projected equations $x_i^T C u = \lambda\, x_i^T u \;\; \forall i$. From these equations the coefficients $\alpha_i^a$ can be computed efficiently in a space of dimension $N$ (and not $d$) as follows,

$$x_i^T C u_a = x_i^T \frac{1}{N} \sum_k x_k x_k^T \sum_j \alpha_j^a x_j = \frac{1}{N} \sum_{j,k} [x_i^T x_k][x_k^T x_j]\, \alpha_j^a = \lambda_a x_i^T u_a = \lambda_a x_i^T \sum_j \alpha_j^a x_j = \lambda_a \sum_j [x_i^T x_j]\, \alpha_j^a \qquad (7)$$

We now rename the matrix $[x_i^T x_j] = K_{ij}$ to arrive at,

$$K^2 \alpha_a = N \lambda_a K \alpha_a \;\Rightarrow\; K \alpha_a = \tilde{\lambda}_a \alpha_a \quad \text{with } \tilde{\lambda}_a = N \lambda_a \qquad (8)$$

So, we have derived an eigenvalue equation for $\alpha$ which in turn completely determines the eigenvectors $u$. By requiring that $u$ is normalized we find,

$$u_a^T u_a = 1 \;\Rightarrow\; \sum_{i,j} \alpha_i^a \alpha_j^a [x_i^T x_j] = \alpha_a^T K \alpha_a = N \lambda_a\, \alpha_a^T \alpha_a = 1 \;\Rightarrow\; \|\alpha_a\| = 1/\sqrt{N \lambda_a} \qquad (9)$$

Finally, when we receive a new data-case $t$ and we would like to compute its projections onto the new reduced space, we compute,

$$u_a^T t = \sum_i \alpha_i^a x_i^T t = \sum_i \alpha_i^a K(x_i, t) \qquad (10)$$

This equation should look familiar: it is central to most kernel methods. Obviously, the whole exposition was set up so that in the end we only need the matrix $K$ to do our calculations. This implies that we are now ready to kernelize the procedure by replacing $x_i \rightarrow \Phi(x_i)$ and defining $K_{ij} = \Phi(x_i)\Phi(x_j)^T$, where $\Phi(x_i) = \Phi_{ia}$.
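As a concrete illustration of Eqs. (8)-(10), here is a minimal NumPy sketch of the kernelized procedure. The choice of an RBF kernel and all function and variable names are illustrative assumptions, not part of the note; centering of the kernel matrix is deferred to the next section.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Example kernel K(a, b) = exp(-gamma * ||a - b||^2); any valid kernel works."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_pca_fit(X, k, kernel=rbf_kernel):
    """Solve Eq. (8), K alpha_a = lambda~_a alpha_a, and rescale alpha_a as in Eq. (9)."""
    K = kernel(X, X)                      # K_ij = Phi(x_i) Phi(x_j)^T, computed implicitly
    lam_t, A = np.linalg.eigh(K)          # eigenvalues lambda~_a = N lambda_a, ascending
    order = np.argsort(lam_t)[::-1][:k]
    lam_t, A = lam_t[order], A[:, order]
    A = A / np.sqrt(lam_t)                # Eq. (9): ||alpha_a|| = 1 / sqrt(N lambda_a)
    return A, X

def kernel_pca_project(A, X, T, kernel=rbf_kernel):
    """Eq. (10): u_a^T t = sum_i alpha_i^a K(x_i, t) for each new data-case t (row of T)."""
    return kernel(T, X) @ A               # rows hold the k projections of each t
```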

2 Centering Data in Feature Space

It is in fact very difficult to explicitly center the data in feature space. But we know that the final algorithm only depends on the kernel matrix, so if we can center the kernel matrix we are done as well. A kernel matrix is given by $K_{ij} = \Phi_i \Phi_j^T$. We now center the features using,

$$\Phi_i^c = \Phi_i - \frac{1}{N} \sum_k \Phi_k \qquad (11)$$

Hence the kernel in terms of the new features is given by,

$$K_{ij}^c = \Big(\Phi_i - \frac{1}{N}\sum_k \Phi_k\Big)\Big(\Phi_j - \frac{1}{N}\sum_l \Phi_l\Big)^T \qquad (12)$$
$$= \Phi_i \Phi_j^T - \Big[\frac{1}{N}\sum_k \Phi_k\Big]\Phi_j^T - \Phi_i\Big[\frac{1}{N}\sum_l \Phi_l^T\Big] + \Big[\frac{1}{N}\sum_k \Phi_k\Big]\Big[\frac{1}{N}\sum_l \Phi_l^T\Big] \qquad (13)$$
$$= K_{ij} - \kappa_i 1_j^T - 1_i \kappa_j^T + k\, 1_i 1_j^T \qquad (14)$$

with

$$\kappa_i = \frac{1}{N} \sum_k K_{ik} \qquad (15)$$

and

$$k = \frac{1}{N^2} \sum_{ij} K_{ij} \qquad (16)$$

Hence, we can compute the centered kernel in terms of the non-centered kernel alone, and no features need to be accessed. At test-time we need to compute,

$$K^c(t_i, x_j) = \Big[\Phi(t_i) - \frac{1}{N}\sum_k \Phi(x_k)\Big]\Big[\Phi(x_j) - \frac{1}{N}\sum_l \Phi(x_l)\Big]^T \qquad (17)$$

Using a similar calculation (left for the reader) you can find that this can be expressed easily in terms of $K(t_i, x_j)$ and $K(x_i, x_j)$ as follows,

$$K^c(t_i, x_j) = K(t_i, x_j) - \kappa(t_i) 1_j^T - 1_i \kappa(x_j)^T + k\, 1_i 1_j^T \qquad (18)$$
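The double-centering formulas (14)-(16) and (18) translate directly into a few lines of NumPy. Below is a minimal sketch under the same assumptions as before; function and variable names are illustrative.

```python
import numpy as np

def center_kernel(K):
    """Eqs. (14)-(16): center an N x N training kernel without touching the features."""
    kappa = K.mean(axis=1, keepdims=True)        # Eq. (15): kappa_i = (1/N) sum_k K_ik
    k_bar = K.mean()                             # Eq. (16): k = (1/N^2) sum_ij K_ij
    return K - kappa - kappa.T + k_bar           # Eq. (14), via broadcasting of 1_i 1_j^T

def center_test_kernel(K_tx, K_xx):
    """Eq. (18): center an M x N test kernel K(t_i, x_j) using the training kernel K(x_i, x_j)."""
    kappa_t = K_tx.mean(axis=1, keepdims=True)   # kappa(t_i) = (1/N) sum_k K(t_i, x_k)
    kappa_x = K_xx.mean(axis=1, keepdims=True)   # kappa(x_j) = (1/N) sum_k K(x_j, x_k)
    k_bar = K_xx.mean()
    return K_tx - kappa_t - kappa_x.T + k_bar
```

In practice one would apply center_kernel to $K$ before solving the eigenvalue problem of Eq. (8), and center_test_kernel to $K(t, x)$ before computing the projections of Eq. (10).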
