Unsupervised learning
1. Unsupervised Learning (Ch. 12)
Supervised vs. Unsupervised Learning
• Supervised Learning: both X and Y are known
• Unsupervised Learning: only X is known
Challenges of Unsupervised Learning
• Tend to be more subjective
• No simple goal like prediction of a response
• Part of an exploratory data analysis
• Hard to assess the results obtained
• The reason is simple: there is no way to check our work because we don't
know the true answer; the problem is "unsupervised"
Roadmap for today
1. Principal Component Analysis (Ch. 12.2)
What Does PCA Do?
• Reduces the dimension (# of columns/variables) of the data
Transformation of X to Z
• Transforms the feature matrix X (n × p) into a lower-dimensional principal
feature matrix Z (n × M), with M ≤ p
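As a rough illustration of the X to Z transformation, the sketch below runs prcomp on the built-in USArrests data and keeps the first M score columns. The names X, Z, and M and the choice M = 2 are assumptions made for illustration only.

```r
# Sketch: reduce the n x p feature matrix X to an n x M score matrix Z.
X   <- as.matrix(USArrests)          # n = 50 rows, p = 4 columns
M   <- 2                             # assumed number of components to keep
pca <- prcomp(X, scale. = TRUE)      # standardize the columns, then rotate
Z   <- pca$x[, 1:M, drop = FALSE]    # first M principal component scores
dim(X)                               # 50 4
dim(Z)                               # 50 2
```

Z has the same number of rows as X but fewer columns, which is the sense in which PCA reduces dimension.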
Why would this be useful?
PCA as a Rotation/Projection
• Animations and short tutorial
https://www.ty-penguin.org.uk/~auj/old_aber_pages/talks/depttalk96/pca.active.html
https://www.youtube.com/watch?v=FgakZw6K1QQ
• Input variables → principal components
Terms
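• Scores and loadings: the first principal component is
$z_{i1} = \phi_{11} x_{i1} + \phi_{21} x_{i2} + \dots + \phi_{p1} x_{ip}$
where the $z_{i1}$ are the scores and the $\phi_{j1}$ are the loadings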
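The relationship between scores and loadings can be checked numerically. The sketch below assumes the USArrests data and standardized columns; the names Xs, phi, Z, and z11 are illustrative.

```r
# Sketch: scores are linear combinations of the standardized columns,
# weighted by the loadings.
Xs  <- scale(USArrests)                   # center and scale each column
pca <- prcomp(USArrests, scale. = TRUE)
phi <- pca$rotation                       # p x p matrix of loadings
Z   <- Xs %*% phi                         # n x p matrix of scores

# First score of the first observation, written out term by term
z11 <- sum(phi[, 1] * Xs[1, ])
all.equal(unname(z11), unname(pca$x[1, 1]))     # TRUE
all.equal(Z, pca$x, check.attributes = FALSE)   # TRUE
```

Each column of phi holds the loadings of one component, and each column of Z holds that component's scores.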
Algorithm
Typically standardize the individual columns of the X matrix (mean zero
and standard deviation one) before PCA
Find the linear combination of the original variables (columns of the X
matrix) whose scores have the largest sum of squares, with the loadings
constrained to have sum of squares equal to one
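A minimal sketch of this maximization, assuming the USArrests data. It finds the first loading vector through an eigendecomposition of the covariance matrix, which is a route chosen here for illustration (prcomp itself works through a singular value decomposition). The names Xs, phi1, z1, and v are hypothetical.

```r
# Sketch: the first loading vector is the unit vector that maximizes
# the sum of squared scores.
Xs   <- scale(USArrests)                  # mean 0, standard deviation 1
eig  <- eigen(cov(Xs))                    # eigenvectors of the covariance matrix
phi1 <- eig$vectors[, 1]                  # first loading vector
z1   <- Xs %*% phi1                       # first principal component scores
sum(phi1^2)                               # 1: the normalization constraint

# Any other unit vector gives a sum of squares no larger than that of z1
set.seed(1)
v <- rnorm(ncol(Xs))
v <- v / sqrt(sum(v^2))                   # rescale to unit length
sum((Xs %*% v)^2) <= sum(z1^2)            # TRUE
```

The sign of a loading vector is arbitrary, so phi1 may differ from the first column of prcomp's rotation matrix by a factor of -1.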
This maximizes the spread of data:
Why is this good?
Further components
• X is the data matrix
Properties of Principal Components
• The principal components (columns of the Z matrix) are orthogonal,
i.e., uncorrelated
• They are ordered by decreasing variance captured from the data:
Z1 has the largest variance, Z2 has the second largest
variance, etc.
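Both properties can be checked on fitted scores. The sketch below assumes the USArrests data; the names Z and pc_var are illustrative.

```r
# Sketch: principal component scores are uncorrelated and ordered by variance.
pca    <- prcomp(USArrests, scale. = TRUE)
Z      <- pca$x                  # columns are the principal components
round(cor(Z), 10)                # identity matrix: off-diagonal correlations are 0
pc_var <- apply(Z, 2, var)       # variance of each component
pc_var                           # values appear in decreasing order
```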
Example: PCA in Two-dimensional Case
R Implementation
Perform PCA on the USArrests dataset, which ships with base R in the
datasets package (no need to install a new package)
For each of the 50 states in the US, the data set contains the number of arrests per 100,000 residents for each
of three crimes: Assault, Murder, and Rape. UrbanPop (the percent of the population in each state living in
urban areas) is also recorded.
states=row.names(USArrests)
states
names(USArrests)
Perform PCA
## Apply PCA; scale.=TRUE standardizes each variable to standard deviation one
pr.out=prcomp(USArrests, scale.=TRUE)
names(pr.out)
dim(USArrests)
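The elements listed by names(pr.out) can be inspected one at a time. The sketch below refits the model so that it runs on its own; the comments describe what each element of a prcomp result holds.

```r
# Sketch: inspect the pieces of a prcomp fit.
pr.out <- prcomp(USArrests, scale. = TRUE)
pr.out$center      # column means used for centering
pr.out$scale       # column standard deviations used for scaling
pr.out$rotation    # loading vectors, one column per principal component
dim(pr.out$x)      # 50 x 4 matrix of principal component scores
```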
Plot of the First Two PCs
biplot(pr.out, scale=0)
Proportion of Variance Explained (PVE)
pr.out$sdev
pr.var=pr.out$sdev^2
pr.var
pve=pr.var/sum(pr.var)
pve
$$\mathrm{PVE}(PC_m) = \frac{\sum_{i=1}^{n} \left( \sum_{j=1}^{p} \phi_{jm} x_{ij} \right)^2}{\sum_{i=1}^{n} \sum_{j=1}^{p} x_{ij}^2}$$
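The PVE formula can be compared with the sdev-based calculation. The sketch below assumes the columns have been standardized, so the x values in the formula are the scaled data; the names Xs, num, den, pve_formula, and pve_sdev are illustrative.

```r
# Sketch: compute PVE directly from the formula and compare with sdev^2.
Xs     <- scale(USArrests)                      # centered, scaled data
pr.out <- prcomp(USArrests, scale. = TRUE)
num    <- colSums((Xs %*% pr.out$rotation)^2)   # sum over i of squared scores
den    <- sum(Xs^2)                             # total sum of squares
pve_formula <- num / den
pve_sdev    <- pr.out$sdev^2 / sum(pr.out$sdev^2)
all.equal(unname(pve_formula), pve_sdev)        # TRUE
```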
Scree Plot
plot(pve, xlab="Principal Component",
ylab="Proportion of Variance Explained",
ylim=c(0,1),type='b')
[Figure: scree plot; x-axis: Principal Component, y-axis: Proportion of Variance Explained (0.0 to 1.0)]
Plot of Cumulative PVE
plot(cumsum(pve), xlab="Principal Component",
ylab="Cumulative Proportion of Variance Explained",
ylim=c(0,1),type='b')
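The cumulative PVE is often used to pick how many components to keep. The sketch below assumes a cutoff of 90%, a value chosen purely for illustration; the names threshold and M are hypothetical.

```r
# Sketch: smallest number of components whose cumulative PVE reaches a cutoff.
pr.out    <- prcomp(USArrests, scale. = TRUE)
pve       <- pr.out$sdev^2 / sum(pr.out$sdev^2)
threshold <- 0.90                           # assumed cutoff, for illustration only
M         <- which(cumsum(pve) >= threshold)[1]
M                                           # number of components to keep
```

The cutoff is a judgment call; the scree plot and the cumulative plot are usually read together when making it.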
Other Uses for PCA:
Preprocessing for Supervised Learning
Using Principal components as input:
• See R
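One way this could look in R is sketched below. It is not the example from the lecture: the built-in mtcars data, the response mpg, the four predictors, and M = 2 are all assumptions chosen for illustration.

```r
# Sketch: use the first M principal component scores as regression inputs.
y   <- mtcars$mpg                                 # assumed response
X   <- mtcars[, c("disp", "hp", "wt", "qsec")]    # assumed predictors
pca <- prcomp(X, scale. = TRUE)                   # PCA on the predictors only
M   <- 2                                          # assumed number of components
dat <- data.frame(y = y, pca$x[, 1:M])            # columns: y, PC1, PC2
fit <- lm(y ~ PC1 + PC2, data = dat)              # regress y on the scores
coef(fit)
```

The response is never used when computing the components, so the PCA step itself remains unsupervised.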
Handling Missing Data (See ISLR 12.3 for Details)
• Common approaches:
– Delete rows with missing data
– Estimate missing data with the average for that variable (column mean)
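Both common approaches can be sketched on a copy of USArrests with a few entries blanked out at random. The missing values here are simulated for illustration; the names X_complete and X_imputed are hypothetical.

```r
# Sketch: row deletion versus column-mean imputation.
set.seed(1)
X <- as.matrix(USArrests)
X[sample(length(X), 10)] <- NA            # blank out 10 of the 200 entries

# Approach 1: delete rows with any missing value
X_complete <- na.omit(X)
nrow(X_complete)                          # fewer than 50 rows remain

# Approach 2: replace each missing value with the mean of its column
X_imputed <- X
for (j in seq_len(ncol(X))) {
  miss <- is.na(X[, j])
  X_imputed[miss, j] <- mean(X[, j], na.rm = TRUE)
}
anyNA(X_imputed)                          # FALSE
```

Deletion discards every observed value in an affected row, while mean imputation keeps all rows.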
The Problem with Deleting: