Principal Component Analysis (PCA)



The idea: find which features strongly correlate with each other; if the correlation is high, then (at least) one of those features can be eliminated.

Features with the highest variances are the most interesting.

Example: calculating covariance between two features X and Y, where mx = avg(X) and my = avg(Y).

Sample    X      Y            X-mx     Y-my
  1      2.5    2.4           0.69     0.49
  2      0.5    0.7          -1.31    -1.21
  3      2.2    2.9           0.39     0.99
  4      1.9    2.2    ===>   0.09     0.29
  5      3.1    3.0           1.29     1.09
  6      2.3    2.7           0.49     0.79
  7      2.0    1.6           0.19    -0.31
  8      1.0    1.1          -0.81    -0.81
  9      1.5    1.6          -0.31    -0.31
 10      1.1    0.9          -0.71    -1.01

avg   = 1.81      1.91
stdev = 0.785211  0.846496

Cov(X,Y) = (1/n)*[(x1-mx)(y1-my) + (x2-mx)(y2-my) + ... + (xn-mx)(yn-my)]

Notice that Cov(X,X) is the same thing as variance.

Cov(X,Y) = 0.5539
Cov(X,X) = 0.5549
Cov(Y,Y) = 0.6449

Cov(Y,X) = Cov(X,Y)
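These numbers can be checked with a few lines of code. A sketch in Python/NumPy (the document leaves the choice of software package open); note that the stdev row in the table uses the sample (n-1) convention, while the covariance formula above divides by n:

```python
import numpy as np

# Verifying the worked example: population covariance (divide by n,
# as in the formula above).
X = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
Y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

n = len(X)
mx, my = X.mean(), Y.mean()                 # 1.81 and 1.91

cov_xy = np.sum((X - mx) * (Y - my)) / n    # Cov(X,Y)
cov_xx = np.sum((X - mx) ** 2) / n          # Cov(X,X) = variance of X
cov_yy = np.sum((Y - my) ** 2) / n          # Cov(Y,Y) = variance of Y
```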

For easy reference, tabulate the result by putting all the covariances into a covariance matrix Cov.

Elements of Cov are Cij, where Cij = Cov(Xi, Xj). In our case, X1 = X and X2 = Y.

Cov = [ 0.5549  0.5539
        0.5539  0.6449 ]

At this point, we could manually program this. However, what if we have many features?

Since the covariance can be represented as a matrix, we could use matrix math.

If we could come up with covariance formulas that use matrices, that would make our life easier, because there is tried-and-tested matrix code available.

We would feed matrices into a software package/library function, which would do the matrix math.

So let's work out the matrix math.

In general, the input data is a flat file, i.e. a matrix of n samples (i.e. rows) and f features (i.e. columns):

X = [ x11  x12  x13  ...  x1f
      x21  x22  x23  ...  x2f
      ...
      xn1  xn2  xn3  ...  xnf ]

To transpose a matrix means to switch rows and columns; i.e. if the original matrix had elements Xij,

the transposed matrix has elements Xji.

Transpose(X) = [ x11  x21  ...  xn1
                 x12  x22  ...  xn2
                 x13  x23  ...  xn3
                 ...
                 x1f  x2f  ...  xnf ]
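A minimal sketch of the transpose in NumPy (an assumed tool; the definition itself is language-independent):

```python
import numpy as np

# A small matrix and its transpose: element (i, j) of X becomes
# element (j, i) of Transpose(X).
X = np.array([[1, 2, 3],
              [4, 5, 6]])   # 2 x 3

XT = X.T                    # 3 x 2; rows of X become columns of XT
```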

Cov = (1/n) * [Transpose(X - mx) * (X - mx)]

Here mx is a matrix of size n x f that contains the column averages:

Each of its n rows is [avg(1st column) avg(2nd column) avg(3rd column) ... avg(last column)]

Covariance is a square matrix of dimension f x f (because we are comparing each feature to all other features).

Cov has elements Cij, i = 1, ..., f and j = 1, ..., f:
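The matrix formula can be sketched in NumPy (an assumed tool) on the two-feature data set from the example above; broadcasting stands in for the explicit n x f matrix mx:

```python
import numpy as np

# Sketch of Cov = (1/n) * Transpose(X - mx) * (X - mx) for the
# example data: rows are samples, columns are the features X and Y.
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

n, f = X.shape               # n samples (rows), f features (columns)
mx = X.mean(axis=0)          # column averages; broadcasting plays the
                             # role of the n x f matrix mx
D = X - mx                   # deviations from the column means
Cov = (D.T @ D) / n          # f x f covariance matrix
```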

------------------------------------------------------------------------------------------------------------------------------------

At this point, we can just let the software calculate the covariance. But let's go one more step and calculate Cij.

We can calculate Cij using column vectors.

PS - this way may look "upside down", but is stating the problem correctly - we are comparing

columns (i.e. features).

A vector is a matrix with only one column (or one row).

For example, if we have a row vector V = [1 2], then its transpose is:

Transpose(V) = [ 1
                 2 ]

i.e., just V "flipped over".

Obviously, transposing a transpose gives back the original: Transpose(Transpose(V)) = V.

Let us assume that we label column vectors Xk, where each Xk is a vector and represents the k-th feature.

For example, in the example above:

X1 = Transpose[2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]

X2 = Transpose[2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

Cij can be obtained using scalar multiplication of two vectors as:

Cij = (1/n) * [Transpose(Xi - mi) * (Xj - mj)]

where Xi and Xj are the ith and jth column vectors of the input matrix as shown above,

and mi and mj are the averages of the ith and jth columns, respectively.
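The scalar-product version can be sketched in NumPy (an assumed tool) using the two columns of the example:

```python
import numpy as np

# Cij as the scalar (dot) product of two mean-centred column vectors,
# divided by n, matching the element-wise formula above.
X1 = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
X2 = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

n = len(X1)
C12 = np.dot(X1 - X1.mean(), X2 - X2.mean()) / n   # Cov(X1, X2)
```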

-----------------------------------------------------------------------------------------------------------------------------------------------------------

So far, we have calculated covariance, but we are not done yet with PCA: we need to find the eigenvalues of the

covariance matrix Cov.

The eigenvalues are the solutions to the following equation:

Cov * Ei = Li*Ei

Ei are eigenvectors (each Ei has dimension f x 1).

Li are the corresponding eigenvalues (i.e. the variances along the principal directions), i = 1, ..., f.

I is the identity matrix (square matrix with all 1's on the diagonal and 0's otherwise).

We will ask the software to do that for us. Most likely, it will solve the following equation to find Li:

determinant(Cov - L*I) = 0
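Asking the software to do this might look as follows; a sketch using NumPy (an assumed tool), where `eigh` is the routine for symmetric matrices like Cov:

```python
import numpy as np

# Letting the software solve Cov * Ei = Li * Ei for the covariance
# matrix computed earlier.
Cov = np.array([[0.5549, 0.5539],
                [0.5539, 0.6449]])

L, E = np.linalg.eigh(Cov)    # eigenvalues (ascending), eigenvectors as columns
order = np.argsort(L)[::-1]   # re-sort in decreasing order
L, E = L[order], E[:, order]
# Each column E[:, i] satisfies Cov @ E[:, i] = L[i] * E[:, i]
```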

Once you find the eigenvalues, sort them in decreasing order. Hopefully, the first few values are prominently larger than the rest,

and can be considered the most important. The smallest values represent features that can be discarded.

To find out how many features to keep, pick m<=f highest eigenvalues and calculate:

R = SUM[i=1,m]Li / SUM[i=1,f]Li

If R > Threshold, taking only those m features and discarding other features would be

a good representation of the f-dimensional data set.
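The selection of m can be sketched as follows (NumPy again; the eigenvalues and the 0.95 threshold here are illustrative choices, not values prescribed by the text):

```python
import numpy as np

# Compute R for m = 1, ..., f and pick the smallest m whose R
# exceeds the chosen threshold.
eigenvalues = np.array([1.1556, 0.0442])   # sorted in decreasing order
threshold = 0.95

R = np.cumsum(eigenvalues) / eigenvalues.sum()   # R for m = 1, 2, ..., f
m = int(np.searchsorted(R, threshold) + 1)       # smallest m with R > threshold
```

With these illustrative eigenvalues the first component alone already carries more than 95% of the total variance, so m = 1.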
