
Meadhbh Healy – DTU Compute

Principal Component Analysis


Learning Objectives

A student who has met the objectives of the course will be able to:
• Describe the advantages of compressing your data from higher to lower
dimensions
• Explore the importance of standardising your data
• Evaluate the first linear component from a scatterplot

March 23, 2023 DTU Compute Principal Component Analysis 2


Cancerous Growths

Figure: Let’s say we had images of human skin growths...



Cancer v Benign

Figure: We need to put them in different classes



Dataset from images

• Many different properties of images, e.g. size, shape, smoothness
• Dataset has, say, 30 dimensions



You could use... Principal Component Analysis

The principal components of your data are the eigenvectors of your data’s covariance matrix
• PCA is a tool that helps make data more efficient to work with
• Reduces the dimensions of the data by keeping only the components that capture the most variation



Activity 1

• Think, pair, share... Think about possible advantages and disadvantages of compressing your data



Advantage - What visualisations might look like...



Plots with PCA

Figure: PCA plot of cancer dataset



Disadvantage - may lose information

Figure: 3d-2d information
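The loss can be measured. A minimal sketch, assuming made-up 3-D data (the array `X` and its scales are hypothetical, not the cancer dataset): project onto the top two principal components, map back to 3-D, and compare with the original.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 3-D data: most variation lies in the first two directions.
X = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.2])
centred = X - X.mean(axis=0)

# Eigen-decomposition of the covariance matrix, largest eigenvalue first.
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centred, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

W = eigenvectors[:, :2]          # keep the top two components
projected = centred @ W          # 3-D -> 2-D
reconstructed = projected @ W.T  # back to 3-D; the third direction is gone

# Fraction of the total variance kept, and fraction lost in reconstruction.
retained = eigenvalues[:2].sum() / eigenvalues.sum()
lost = np.sum((centred - reconstructed) ** 2) / np.sum(centred ** 2)
print(f"variance retained: {retained:.3f}, lost: {lost:.3f}")
```

The two fractions sum to 1, so the variance carried by the dropped component is exactly the information lost.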



What do we need to perform PCA?

Categorical data is data with a fixed set of possible outcomes, e.g. small, medium and large
Continuous data is measurable data, e.g. income

• Data must be numeric
• Categorical data must be encoded as numbers
• Continuous data must be standardized
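One common way to encode categorical data is one-hot encoding. A minimal sketch, assuming a toy `sizes` column (the helper `one_hot` and the data are illustrative, not from the slides):

```python
import numpy as np

def one_hot(labels):
    """Return the sorted category names and one 0/1 column per category."""
    categories, codes = np.unique(labels, return_inverse=True)
    return categories, np.eye(len(categories))[codes]

# Hypothetical categorical column.
sizes = np.array(["small", "large", "medium", "small"])
categories, encoded = one_hot(sizes)

print(categories)  # column order of the encoded matrix
print(encoded)     # each row has a single 1 marking its category
```

For ordered categories such as small, medium and large, mapping to 0, 1, 2 is another option; which encoding suits PCA best depends on the data.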



Step 1. Why we standardize

We standardize the data because:
• Attributes may be on different scales
• Attributes with larger variance would otherwise contribute more to the components



How we standardize

Figure: Z-score normalisation
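The figure is not reproduced here. Assuming it shows the usual z-score, z = (x - mean) / standard deviation, each attribute ends up with mean 0 and standard deviation 1. A minimal sketch with made-up numbers (`standardise` and `X` are illustrative names):

```python
import numpy as np

def standardise(X):
    """Z-score each column: subtract its mean, divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # population standard deviation; assumes no constant column
    return (X - mean) / std

# Hypothetical data: column 0 is on a much larger scale than column 1.
X = np.array([[10.0, 0.2],
              [20.0, 0.4],
              [30.0, 0.6]])
Z = standardise(X)
print(Z)  # both columns are now on the same scale
```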



Activity 2 - Quiz

Please log into https://www.menti.com/al2v6e3kbmc6


Voting code - 8387 1568



Step 2. Covariance Matrix

• Measures how the input variables vary from their means with respect to each other
• Shows whether there is any relationship between them
• Highly correlated variables may contain redundant information



Covariance Matrix, part 2

Figure: Covariance formula


Figure: 3×3 matrix
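The figures are not reproduced here. Assuming the sample covariance, cov(x, y) = sum of (x_i - mean of x)(y_i - mean of y), divided by (n - 1), three attributes give a symmetric 3×3 matrix with the variances on the diagonal. A minimal sketch on made-up data (`covariance_matrix` and `X` are illustrative names; some texts divide by n instead):

```python
import numpy as np

def covariance_matrix(X):
    """Sample covariance of the columns of X (divides by n - 1)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    centred = X - X.mean(axis=0)  # subtract each column's mean
    return centred.T @ centred / (n - 1)

# Hypothetical data: 4 observations of 3 attributes.
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 4.1, 0.4],
              [3.0, 6.2, 0.9],
              [4.0, 7.9, 0.1]])
C = covariance_matrix(X)
print(C)  # 3x3, symmetric; entry (i, j) is the covariance of attributes i and j
```

The result should agree with `np.cov(X, rowvar=False)`, which uses the same n - 1 convention by default.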



Step 3. Eigenvectors and Eigenvalues

Figure: Principal Components
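The eigenvectors of the covariance matrix give the directions of the principal components, and the eigenvalues give the variance along each direction. A minimal sketch on made-up 2-D data that lies roughly along the line y = 2x (`principal_components` and the data are illustrative):

```python
import numpy as np

def principal_components(X):
    """Return eigenvalues (largest first) and matching eigenvectors as columns."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]

# Hypothetical scatterplot data: points close to the line y = 2x.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2.0 * t + 0.1 * rng.normal(size=200)])

eigenvalues, components = principal_components(X)
first_pc = components[:, 0]  # axis of greatest variation (sign is arbitrary)
print(first_pc, eigenvalues)
```

Here the first component should point close to the direction (1, 2), up to sign, which is the axis you would draw through the long side of the point cloud.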



Eigenvectors and Eigenvalues



Activity 3
Please mark the axis of greatest variation on the scatterplot



Activity 3 - Answer
Please mark the axis of greatest variation on the scatterplot

