
Announcements

Unsupervised learning
1. Unsupervised Learning (Ch. 12)

Supervised vs. Unsupervised Learning
• Supervised Learning: both $X$ and $Y$ are known
• Unsupervised Learning: only $X$ is known

[Figure: two panels, "Supervised Learning" vs. "Unsupervised Learning"]

Challenges of Unsupervised Learning
• Tend to be more subjective
• No simple goal like prediction of a response
• Part of an exploratory data analysis
• Hard to assess the results obtained
• The reason is simple: there is no way to check our work, because we don't
know the true answer; the problem is "unsupervised"

Roadmap for today
1. Principal Component Analysis (Ch. 12.2)

What Does PCA Do?
• Dimension = the number of columns/variables ($p$) in the data matrix $X$

$(n \times p) \rightarrow (n \times M)$, with $M < p$

• PCA looks to find a low-dimensional representation of the
observations that explains a good fraction of the variance

Transformation of X to Z
• Transforms the feature matrix $X$ into a lower-dimensional principal
feature matrix $Z$

$X\,(n \times p) \rightarrow Z\,(n \times M)$, with $M < p$
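In matrix form (writing $\Phi$ for the $p \times M$ matrix whose columns are the loading vectors defined on the next slide; this restatement is standard, not verbatim from the slide):

\[
\underset{n \times M}{Z} \;=\; \underset{n \times p}{X}\;\underset{p \times M}{\Phi}
\]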

Why would this be useful?
PCA as a Rotation/Projection
• Animations and short tutorial
https://www.ty-penguin.org.uk/~auj/old_aber_pages/talks/depttalk96/pca.active.html
https://www.youtube.com/watch?v=FgakZw6K1QQ
• Input variables → principal components

Terms
• Scores and loadings

$z_{i1} = \phi_{11} x_{i1} + \phi_{21} x_{i2} + \cdots + \phi_{p1} x_{ip}$

The $z_{i1}$ (one per observation) are the scores of the 1st PC; the
$\phi_{11}, \ldots, \phi_{p1}$ are the loadings of the 1st PC

Algorithm
Typically standardize the individual columns of the X matrix (mean zero
and standard deviation one) before PCA

PCA transforms the original variables (the columns of the X matrix) so that
the sum of squares of the resulting scores is maximized

Equivalent to minimizing the sum of squared perpendicular distances from
the observations to the line
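In symbols, the first-PC problem (the standard formulation from ISLR 12.2, restated here for reference):

\[
\max_{\phi_{11},\ldots,\phi_{p1}}\; \frac{1}{n}\sum_{i=1}^{n}\Bigl(\sum_{j=1}^{p}\phi_{j1}x_{ij}\Bigr)^{2}
\quad \text{subject to} \quad \sum_{j=1}^{p}\phi_{j1}^{2}=1
\]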

This maximizes the spread of data:
Why is this good?
Further components
• $X$ is the $n \times p$ data matrix

• Once we have the first principal component loading vector $\phi_1$
(a $p \times 1$ vector)

• Score: $z_1 = X\phi_1$ (an $n \times 1$ vector)

• Solve for the loadings of the 2nd PC:
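In symbols, a sketch of the 2nd-PC problem (the same form as for the first PC plus an orthogonality constraint; restated here, not verbatim from the slide):

\[
\max_{\phi_2}\; \frac{1}{n}\,\lVert X\phi_2 \rVert^2
\quad \text{subject to} \quad \lVert \phi_2 \rVert = 1,\;\; \phi_2 \perp \phi_1
\]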

Properties of Principal Components
• The principal components (columns of the Z matrix) are orthogonal,
i.e., uncorrelated
• They are ordered according to the decreasing variance in the data
they capture: $Z_1$ has the largest variance, $Z_2$ has the second largest
variance, etc.
• The principal component scores can be used in further supervised
learning (e.g., as predictors in regression)

Example: PCA in Two-dimensional Case

R Implementation
Perform PCA on the USArrests dataset, which is part of base R (no new
package needs to be installed)

For each of the 50 US states, the data set contains the number of arrests
per 100,000 residents for each of three crimes: Assault, Murder, and Rape.
UrbanPop (the percent of each state's population living in urban areas) is
also recorded.

states=row.names(USArrests)
states

names(USArrests)

##find the mean and var of columns of the table
apply(USArrests, 2, mean)
apply(USArrests, 2, var)

Perform PCA
##apply PCA
pr.out=prcomp(USArrests, scale=TRUE)
names(pr.out)

pr.out$center #means of the X variables
pr.out$scale #stds of the X variables

pr.out$rotation #loading matrix

dim(USArrests)

dim(pr.out$x) #x: score vectors (Z matrix)
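As a quick sanity check (not from the original slides; it only uses objects created above), the scores in pr.out$x equal the standardized data multiplied by the loading matrix, and the score columns are uncorrelated:

##sanity check: Z = (standardized X) %*% (loading matrix)
Z <- scale(USArrests) %*% pr.out$rotation
all.equal(unname(Z), unname(pr.out$x)) #should be TRUE
round(cor(pr.out$x), 10) #off-diagonals are ~0: scores are uncorrelated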

Plot of the First Two PCs
biplot(pr.out, scale=0)

Proportion of Variance Explained (PVE)
pr.out$sdev

pr.var=pr.out$sdev^2
pr.var

pve=pr.var/sum(pr.var)
pve
\[
\mathrm{PVE}(PC_m) \;=\;
\frac{\sum_{i=1}^{n}\Bigl(\sum_{j=1}^{p}\phi_{jm}x_{ij}\Bigr)^{2}}
{\sum_{i=1}^{n}\sum_{j=1}^{p}x_{ij}^{2}}
\]
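Equivalently, base R reports the same quantities directly (summary() on a prcomp object is standard):

summary(pr.out) #standard deviation, PVE, and cumulative PVE per PC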

Scree Plot
plot(pve, xlab="Principal Component",
     ylab="Proportion of Variance Explained",
     ylim=c(0,1), type='b')

[Scree plot: Proportion of Variance Explained (0 to 1) vs. Principal Component (1 to 4)]
Plot of Cumulative PVE
plot(cumsum(pve), xlab="Principal Component",
     ylab="Cumulative Proportion of Variance Explained",
     ylim=c(0,1), type='b')
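For example, a simple way to pick the number of components from the cumulative PVE (the 90% threshold below is an arbitrary choice, used here only for illustration):

##smallest number of PCs whose cumulative PVE reaches 90%
which(cumsum(pve) >= 0.9)[1]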

Other Uses for PCA:
Preprocessing for Supervised Learning
Using principal components as input:
• See the R sketch below
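A minimal sketch of using PC scores as regression predictors (the response y below is a made-up placeholder, since USArrests contains no outcome variable; pr.out is the prcomp object fit earlier):

##sketch: PC scores as regression predictors
set.seed(1)
y <- rnorm(nrow(USArrests)) #placeholder response, for illustration only
scores <- pr.out$x[, 1:2]   #keep the first two PCs
summary(lm(y ~ scores))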
Handling Missing Data (See ISLR 12.3 for Details)

• Common approaches:
– Delete rows with missing data
– Estimate missing data with the average for that variable (the column mean)

The Problem with Deleting:

Excerpt of Netflix movie rating data. Movies are rated from 1 (worst) to
5 (best). Most customers have viewed <1% of the movies on Netflix.
The Problem with Averaging:
• Often, input data are correlated.
– E.g., customers who like Avengers 1 typically also like Avengers 2.
– Assume the average ratings for the two movies are (3, 3)
– A customer who liked Avengers 1 but hasn't seen Avengers 2,
i.e., (5, NA), would be represented by (5, 3)
Better Approach:
Estimate Missing Values with PCA
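A minimal sketch of the iterative idea behind ISLR 12.3's matrix-completion algorithm (the function name impute_pca and the fixed iteration count are choices made here for illustration; ISLR's Algorithm 12.1 instead iterates until the objective stops decreasing, and its lab standardizes the matrix first):

##iteratively impute NAs with a rank-M PCA/SVD approximation
impute_pca <- function(X, M = 1, n_iter = 20) {
  miss <- is.na(X)
  Xhat <- X
  ##start by filling missing entries with the column means
  Xhat[miss] <- matrix(colMeans(X, na.rm = TRUE),
                       nrow(X), ncol(X), byrow = TRUE)[miss]
  for (i in seq_len(n_iter)) {
    s <- svd(Xhat, nu = M, nv = M)        #rank-M fit of the current matrix
    approx <- s$u %*% (s$d[1:M] * t(s$v)) #U_M D_M V_M'
    Xhat[miss] <- approx[miss]            #refill only the missing entries
  }
  Xhat
}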
Questions
• What is the purpose of PCA?
• What are “scores” and “loadings” in PCA?
• Should we standardize each variable before PCA?
• If the data consists of $p$ variables (i.e., the $X$ matrix has $p$ columns),
how many principal components will we obtain initially?
• How are the initial principal components ordered?
• How can we reduce the number of principal components?
• How can the final (reduced) principal components be used in
statistical learning?

