Dimensionality Reduction Slides, PCA, FA, LDA, ISOMAP




ETHEM ALPAYDIN

The MIT Press, 2010

alpaydin@boun.edu.tr

http://www.cmpe.boun.edu.tr/~ethem/i2ml2e

Why Reduce Dimensionality?

Reduces time complexity: less computation

Reduces space complexity: fewer parameters

Saves the cost of observing the feature

Simpler models are more robust on small datasets

More interpretable: simpler explanation

Data visualization (structure, groups, outliers, etc.) when plotted in 2 or 3 dimensions

Lecture Notes for E. Alpaydın, Introduction to Machine Learning 2e, © 2010 The MIT Press (V1.0)

Feature Selection vs Extraction

Feature selection: Choosing k < d important features, ignoring the remaining d − k

Subset selection algorithms

Feature extraction: Project the original x_i, i = 1, ..., d dimensions to new k < d dimensions, z_j, j = 1, ..., k

Principal components analysis (PCA), linear discriminant analysis (LDA), factor analysis (FA)


Subset Selection

There are 2^d possible subsets of d features

Forward search: Add the best feature at each step

The set of selected features F is initially empty

At each iteration, find the best new feature: j = argmin_i E(F ∪ x_i)

Add x_j to F if E(F ∪ x_j) < E(F)

Backward search: Start with all features and remove one at a time, if possible

Floating search (add k, remove l)
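The forward-search loop above can be sketched as follows; the error function E is a stand-in (here, training MSE of a least-squares fit), since the slides leave E abstract:

```python
import numpy as np

def forward_select(X, y, error, k):
    """Greedy forward search: start with F = {} and at each step add the
    feature x_j minimizing E(F u {x_j}), stopping when nothing improves E(F)."""
    selected, remaining = [], list(range(X.shape[1]))
    best_err = np.inf
    while remaining and len(selected) < k:
        # Evaluate E(F u {x_i}) for every candidate feature i
        err, j = min((error(X[:, selected + [i]], y), i) for i in remaining)
        if err >= best_err:            # no candidate improves E(F): stop
            break
        selected.append(j)
        remaining.remove(j)
        best_err = err
    return selected

def lsq_error(Xs, y):
    """Illustrative error E: training MSE of a least-squares fit."""
    w, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return float(np.mean((Xs @ w - y) ** 2))
```

In practice E would be the validation error of whatever model is being trained, which is what makes the O(d²) sequence of fits worthwhile compared to trying all 2^d subsets.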


Principal Components Analysis (PCA)

Find a low-dimensional space such that when x is

projected there, information loss is minimized.

The projection of x on the direction of w is: z = wTx

Find w such that Var(z) is maximized

Var(z) = Var(wᵀx) = E[(wᵀx − wᵀμ)²]

= E[(wᵀx − wᵀμ)(wᵀx − wᵀμ)]

= E[wᵀ(x − μ)(x − μ)ᵀw]

= wᵀ E[(x − μ)(x − μ)ᵀ] w = wᵀΣw

where Var(x) = E[(x − μ)(x − μ)ᵀ] = Σ


Maximize Var(z) subject to ||w|| = 1:

max_{w1}  w1ᵀΣw1 − α(w1ᵀw1 − 1)

Setting the derivative with respect to w1 to zero gives Σw1 = αw1, so w1 is an eigenvector of Σ

Choose the one with the largest eigenvalue for Var(z) to be maximum

Second principal component: maximize Var(z2), subject to ||w2|| = 1 and w2 orthogonal to w1:

max_{w2}  w2ᵀΣw2 − α(w2ᵀw2 − 1) − β(w2ᵀw1 − 0)

and so on.


What PCA does

z = Wᵀ(x − m)

where the columns of W are the eigenvectors of Σ, and m is the sample mean

This centers the data at the origin and rotates the axes


How to choose k ?

Proportion of Variance (PoV) explained:

PoV = (λ1 + λ2 + ··· + λk) / (λ1 + λ2 + ··· + λk + ··· + λd)

where the λi are sorted in descending order

Typically, stop at PoV > 0.9

Scree graph plots PoV against k; stop at the elbow
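The projection z = Wᵀ(x − m) and the PoV rule can be sketched in a few lines of numpy (function and variable names are my own):

```python
import numpy as np

def pca(X, pov=0.9):
    """PCA via eigendecomposition of the sample covariance Sigma.
    Keeps the smallest k whose proportion of variance explained exceeds pov."""
    m = X.mean(axis=0)                      # sample mean m
    Sigma = np.cov(X, rowvar=False)         # sample covariance Sigma
    lam, W = np.linalg.eigh(Sigma)          # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]           # sort descending: lam_1 >= ... >= lam_d
    lam, W = lam[order], W[:, order]
    pov_k = np.cumsum(lam) / lam.sum()      # PoV for k = 1..d
    k = int(np.searchsorted(pov_k, pov) + 1)
    Z = (X - m) @ W[:, :k]                  # z = W^T (x - m)
    return Z, W[:, :k], lam, k
```

Plotting `pov_k` against k gives exactly the scree-style graph described above.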


Factor Analysis

Find a small number of factors z which, when combined, generate x:

x_i − μ_i = v_i1 z_1 + v_i2 z_2 + ... + v_ik z_k + ε_i

where E[z_j] = 0, Var(z_j) = 1, Cov(z_i, z_j) = 0 for i ≠ j,

the ε_i are the noise sources, with Var(ε_i) = ψ_i, Cov(ε_i, ε_j) = 0 for i ≠ j, Cov(ε_i, z_j) = 0,

and the v_ij are the factor loadings
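Under these assumptions the model implies Cov(x) = VVᵀ + Ψ, which a sample covariance should reproduce; a small simulation checks this (V, ψ, μ below are made-up values):

```python
import numpy as np

# Hypothetical loadings V, noise variances psi, and mean mu for a
# d=5, k=2 factor model; only the implied covariance structure matters.
rng = np.random.default_rng(0)
d, k, n = 5, 2, 200_000
V = rng.normal(size=(d, k))                    # factor loadings v_ij
psi = rng.uniform(0.1, 0.5, size=d)            # noise variances psi_i
mu = rng.normal(size=d)

z = rng.normal(size=(n, k))                    # E[z_j]=0, Var(z_j)=1, uncorrelated
eps = rng.normal(size=(n, d)) * np.sqrt(psi)   # independent noise eps_i
X = mu + z @ V.T + eps                         # x - mu = V z + eps

C_model = V @ V.T + np.diag(psi)               # Cov(x) = V V^T + Psi
C_sample = np.cov(X, rowvar=False)
```

Fitting FA means going in the other direction: estimating V and Ψ from C_sample alone.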


PCA vs FA

PCA: from x to z,  z = Wᵀ(x − μ)

FA: from z to x,  x − μ = Vz + ε


Factor Analysis

In FA, factors zj are stretched, rotated and translated to

generate x


Multidimensional Scaling

Given pairwise distances between N points,

dij, i,j =1,...,N

place on a low-dim map s.t. distances are preserved.

z = g(x | θ)   Find θ that minimizes the Sammon stress:

E(θ | X) = Σ_{r,s} ( ||z^r − z^s|| − ||x^r − x^s|| )² / ||x^r − x^s||²

= Σ_{r,s} ( ||g(x^r | θ) − g(x^s | θ)|| − ||x^r − x^s|| )² / ||x^r − x^s||²
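The stress of a candidate map can be computed directly; a sketch (minimizing it over the z^r, e.g. by gradient descent on θ, is the MDS fitting step itself, omitted here):

```python
import numpy as np

def sammon_stress(Z, X):
    """E = sum over pairs of (||z^r - z^s|| - ||x^r - x^s||)^2 / ||x^r - x^s||^2.
    Summing over r < s counts each pair once; the slide's double sum merely
    doubles E and does not change the minimizer."""
    n = len(X)
    E = 0.0
    for r in range(n):
        for s in range(r + 1, n):
            dx = np.linalg.norm(X[r] - X[s])   # distance in the input space
            dz = np.linalg.norm(Z[r] - Z[s])   # distance on the map
            E += (dz - dx) ** 2 / dx ** 2
    return E
```

The normalization by ||x^r − x^s||² is what distinguishes Sammon stress from plain stress: small original distances are weighted up, so local structure is preserved preferentially.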


Map of Europe by MDS


Linear Discriminant Analysis

Find a low-dimensional

space such that when x is

projected, classes are

well-separated.

Find w that maximizes

J(w) = (m1 − m2)² / (s1² + s2²)

where, with r^t = 1 if x^t is in class 1, the projected class mean and scatter are

m1 = Σ_t wᵀx^t r^t / Σ_t r^t

s1² = Σ_t (wᵀx^t − m1)² r^t

and m2, s2² are defined analogously over class 2


Between-class scatter (writing m_1, m_2 for the class-mean vectors, so that the scalar m1 = wᵀm_1):

(m1 − m2)² = (wᵀm_1 − wᵀm_2)²

= wᵀ(m_1 − m_2)(m_1 − m_2)ᵀw

= wᵀS_B w   where S_B = (m_1 − m_2)(m_1 − m_2)ᵀ

Within-class scatter:

s1² = Σ_t (wᵀx^t − m1)² r^t

= Σ_t wᵀ(x^t − m_1)(x^t − m_1)ᵀw r^t = wᵀS_1 w

where S_1 = Σ_t r^t (x^t − m_1)(x^t − m_1)ᵀ

and s1² + s2² = wᵀS_W w with S_W = S_1 + S_2


Fisher's Linear Discriminant

Find w that maximizes

J(w) = wᵀS_B w / wᵀS_W w = |wᵀ(m_1 − m_2)|² / wᵀS_W w

LDA solution: w = c · S_W⁻¹(m_1 − m_2)

Parametric solution:

w = Σ⁻¹(μ_1 − μ_2)

when p(x | C_i) ~ N(μ_i, Σ)
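The closed-form direction w ∝ S_W⁻¹(m_1 − m_2) is straightforward to compute; a minimal two-class sketch:

```python
import numpy as np

def fisher_w(X1, X2):
    """Two-class Fisher discriminant direction w = S_W^{-1}(m_1 - m_2),
    where S_W = S_1 + S_2 is the total within-class scatter."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - m1).T @ (X1 - m1)   # within-class scatter of class 1
    S2 = (X2 - m2).T @ (X2 - m2)   # within-class scatter of class 2
    return np.linalg.solve(S1 + S2, m1 - m2)
```

Projecting each x onto this w reduces the two-class problem to a single dimension in which the projected means are maximally separated relative to the within-class spread.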


K>2 Classes

Within-class scatter:

S_i = Σ_t r_i^t (x^t − m_i)(x^t − m_i)ᵀ

S_W = Σ_{i=1}^{K} S_i

Between-class scatter:

S_B = Σ_{i=1}^{K} N_i (m_i − m)(m_i − m)ᵀ   where m = (1/K) Σ_{i=1}^{K} m_i

Find W that maximizes

J(W) = |Wᵀ S_B W| / |Wᵀ S_W W|

The solution is given by the largest eigenvectors of S_W⁻¹ S_B

S_B has maximum rank K − 1


Isomap

Geodesic distance is the distance along the manifold that

the data lies in, as opposed to the Euclidean distance in

the input space


Isomap

Instances r and s are connected in the graph if ||x^r − x^s|| < ε or if x^s is one of the k nearest neighbors of x^r

The edge length is ||x^r − x^s||

For two nodes r and s not connected by an edge, the distance is the length of the shortest path between them

Once the N×N distance matrix is thus formed, use MDS to find a lower-dimensional mapping
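A sketch of this construction, using a k-NN graph and Floyd–Warshall shortest paths (the resulting matrix would then be passed to MDS):

```python
import numpy as np

def geodesic_distances(X, k=5):
    """Isomap distance matrix: build a k-NN graph with Euclidean edge
    lengths, then take shortest-path distances (Floyd-Warshall)."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)   # Euclidean ||x_r - x_s||
    G = np.full((n, n), np.inf)                            # inf = not connected
    np.fill_diagonal(G, 0.0)
    for r in range(n):
        for s in np.argsort(D[r])[1:k + 1]:                # k nearest neighbors of r
            G[r, s] = G[s, r] = D[r, s]                    # edge length ||x_r - x_s||
    for m in range(n):                                     # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, [m]] + G[[m], :])
    return G
```

On data sampled from a curved manifold the shortest-path distances approximate geodesic distances, which is exactly why they, and not the raw Euclidean ones, are handed to MDS.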


Optdigits after Isomap (with neighborhood graph).


Matlab source from http://web.mit.edu/cocosci/isomap/isomap.html


Locally Linear Embedding

1. Given x^r, find its neighbors x^s_(r)

2. Find the weights W_rs that minimize the reconstruction error

E(W | X) = Σ_r || x^r − Σ_s W_rs x^s_(r) ||²

3. Find the new coordinates z^r that minimize

E(z | W) = Σ_r || z^r − Σ_s W_rs z^s_(r) ||²
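Step 2 can be solved in closed form per point via the local Gram matrix; a sketch (the regularization term is a common practical addition, not in the slides):

```python
import numpy as np

def lle_weights(X, k=5):
    """Step 2 of LLE: for each x_r, find weights W_rs (summing to 1) that
    best reconstruct x_r from its k nearest neighbors."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    W = np.zeros((n, n))
    for r in range(n):
        nbrs = np.argsort(D[r])[1:k + 1]       # k nearest neighbors of x_r
        Z = X[nbrs] - X[r]                     # neighbors centered on x_r
        G = Z @ Z.T                            # local Gram matrix
        G += 1e-3 * np.trace(G) * np.eye(k)    # regularize near-singular G
        w = np.linalg.solve(G, np.ones(k))
        W[r, nbrs] = w / w.sum()               # enforce sum_s W_rs = 1
    return W
```

Step 3 then takes the z^r from the bottom nonzero eigenvectors of (I − W)ᵀ(I − W); the same weights that reconstruct x locally are reused to place z.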


LLE on Optdigits

