
Chapter 10: Exercise 10

a
set.seed(2)
x = matrix(rnorm(20*3*50, mean=0, sd=0.001), ncol=50)
x[1:20, 2] = 1
x[21:40, 1] = 2
x[21:40, 2] = 2
x[41:60, 1] = 1

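The idea is to separate the three classes using only the first two of the 50 variables: class 1 (rows 1-20) is centred near (0, 1), class 2 (rows 21-40) near (2, 2), and class 3 (rows 41-60) near (1, 0). The remaining 48 variables are pure noise with standard deviation 0.001.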
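As a quick check of this design, the class means of the first two columns can be tabulated. This is a minimal sketch; `true.labels` and `centroids` are names introduced here for illustration and are not part of the original solution.

```r
# Regenerate the simulated data exactly as above
set.seed(2)
x = matrix(rnorm(20*3*50, mean=0, sd=0.001), ncol=50)
x[1:20, 2] = 1
x[21:40, 1] = 2
x[21:40, 2] = 2
x[41:60, 1] = 1

# True class of each row: 20 observations per class
true.labels = rep(1:3, each=20)

# Mean of columns 1 and 2 within each class (3 x 2 matrix, one row per class)
centroids = apply(x[, 1:2], 2, function(col) tapply(col, true.labels, mean))
round(centroids, 2)
```

The three rows should sit at roughly (0, 1), (2, 2) and (1, 0), so the classes differ only in these two coordinates.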

b
pca.out = prcomp(x)
summary(pca.out)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 1.01 0.582 0.00173 0.00167 0.00165 0.00158 0.00154
## Proportion of Variance 0.75 0.250 0.00000 0.00000 0.00000 0.00000 0.00000
## Cumulative Proportion 0.75 1.000 0.99997 0.99997 0.99997 0.99997 0.99998
## PC8 PC9 PC10 PC11 PC12 PC13
## Standard deviation 0.0015 0.00147 0.00141 0.00139 0.00134 0.0013
## Proportion of Variance 0.0000 0.00000 0.00000 0.00000 0.00000 0.0000
## Cumulative Proportion 1.0000 0.99998 0.99998 0.99998 0.99998 1.0000
## PC14 PC15 PC16 PC17 PC18 PC19
## Standard deviation 0.00126 0.00124 0.00123 0.00116 0.00112 0.00109
## Proportion of Variance 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## Cumulative Proportion 0.99999 0.99999 0.99999 0.99999 0.99999 0.99999
## PC20 PC21 PC22 PC23 PC24 PC25
## Standard deviation 0.00102 0.00101 0.000985 0.000938 0.000932 0.000908
## Proportion of Variance 0.00000 0.00000 0.000000 0.000000 0.000000 0.000000
## Cumulative Proportion 0.99999 0.99999 0.999990 0.999990 0.999990 0.999990
## PC26 PC27 PC28 PC29 PC30
## Standard deviation 0.000867 0.000823 0.000801 0.000749 0.000712
## Proportion of Variance 0.000000 0.000000 0.000000 0.000000 0.000000
## Cumulative Proportion 1.000000 1.000000 1.000000 1.000000 1.000000
## PC31 PC32 PC33 PC34 PC35
## Standard deviation 0.000697 0.000673 0.000632 0.000591 0.000565
## Proportion of Variance 0.000000 0.000000 0.000000 0.000000 0.000000
## Cumulative Proportion 1.000000 1.000000 1.000000 1.000000 1.000000
## PC36 PC37 PC38 PC39 PC40
## Standard deviation 0.000538 0.000532 0.000476 0.000448 0.000426
## Proportion of Variance 0.000000 0.000000 0.000000 0.000000 0.000000
## Cumulative Proportion 1.000000 1.000000 1.000000 1.000000 1.000000
## PC41 PC42 PC43 PC44 PC45
## Standard deviation 0.000391 0.000377 0.000314 0.000296 0.000273
## Proportion of Variance 0.000000 0.000000 0.000000 0.000000 0.000000
## Cumulative Proportion 1.000000 1.000000 1.000000 1.000000 1.000000
## PC46 PC47 PC48 PC49 PC50
## Standard deviation 0.00025 0.000191 0.000147 0.000129 7.79e-05
## Proportion of Variance 0.00000 0.000000 0.000000 0.000000 0.00e+00
## Cumulative Proportion 1.00000 1.000000 1.000000 1.000000 1.00e+00
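The proportions in the summary can also be recomputed directly from the standard deviations stored in `pca.out$sdev`. The following is a small sketch, assuming the data are generated as in part (a); `pve` is a name introduced here for illustration.

```r
# Regenerate the data from part (a)
set.seed(2)
x = matrix(rnorm(20*3*50, mean=0, sd=0.001), ncol=50)
x[1:20, 2] = 1
x[21:40, 1] = 2
x[21:40, 2] = 2
x[41:60, 1] = 1

pca.out = prcomp(x)

# Proportion of variance explained by each component
pve = pca.out$sdev^2 / sum(pca.out$sdev^2)
round(pve[1:3], 3)

# Share of variance captured by the first two components together
sum(pve[1:2])
```

The first two components should account for essentially all of the variance, which is why the scores on Z1 and Z2 are enough to separate the classes.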

pca.out$x[,1:2]
## PC1 PC2
## [1,] -0.7079 -7.077e-01
## [2,] -0.7072 -7.069e-01
## [3,] -0.7062 -7.059e-01
## [4,] -0.7081 -7.078e-01
## [5,] -0.7073 -7.071e-01
## [6,] -0.7072 -7.069e-01
## [7,] -0.7068 -7.065e-01
## [8,] -0.7075 -7.072e-01
## [9,] -0.7059 -7.056e-01
## [10,] -0.7074 -7.071e-01
## [11,] -0.7070 -7.067e-01
## [12,] -0.7066 -7.063e-01
## [13,] -0.7076 -7.073e-01
## [14,] -0.7080 -7.078e-01
## [15,] -0.7060 -7.058e-01
## [16,] -0.7089 -7.087e-01
## [17,] -0.7067 -7.064e-01
## [18,] -0.7073 -7.070e-01
## [19,] -0.7066 -7.063e-01
## [20,] -0.7070 -7.067e-01
## [21,] 1.4141 -7.089e-05
## [22,] 1.4141 -6.894e-05
## [23,] 1.4141 -7.309e-05
## [24,] 1.4141 -6.935e-05
## [25,] 1.4141 -7.045e-05
## [26,] 1.4141 -6.983e-05
## [27,] 1.4141 -7.448e-05
## [28,] 1.4141 -6.910e-05
## [29,] 1.4141 -7.290e-05
## [30,] 1.4141 -7.055e-05
## [31,] 1.4141 -6.990e-05
## [32,] 1.4141 -7.065e-05
## [33,] 1.4141 -7.014e-05
## [34,] 1.4141 -7.006e-05
## [35,] 1.4141 -7.296e-05
## [36,] 1.4141 -6.608e-05
## [37,] 1.4141 -7.286e-05
## [38,] 1.4141 -6.920e-05
## [39,] 1.4141 -6.744e-05
## [40,] 1.4141 -6.950e-05
## [41,] -0.7064 7.064e-01
## [42,] -0.7070 7.070e-01
## [43,] -0.7074 7.074e-01
## [44,] -0.7077 7.077e-01
## [45,] -0.7078 7.078e-01
## [46,] -0.7057 7.057e-01
## [47,] -0.7065 7.065e-01
## [48,] -0.7058 7.058e-01
## [49,] -0.7075 7.075e-01
## [50,] -0.7074 7.074e-01
## [51,] -0.7079 7.079e-01
## [52,] -0.7074 7.074e-01
## [53,] -0.7068 7.069e-01
## [54,] -0.7062 7.062e-01
## [55,] -0.7068 7.068e-01
## [56,] -0.7069 7.069e-01
## [57,] -0.7063 7.063e-01
## [58,] -0.7064 7.064e-01
## [59,] -0.7071 7.071e-01
## [60,] -0.7077 7.078e-01

# Colour each point by its true class (20 observations per class)
plot(pca.out$x[,1:2], col=rep(2:4, each=20), xlab="Z1", ylab="Z2", pch=19)

c
km.out = kmeans(x, 3, nstart=20)
table(km.out$cluster, c(rep(1,20), rep(2,20), rep(3,20)))
##
## 1 2 3
## 1 20 0 0
## 2 0 20 0
## 3 0 0 20

Perfect match: each cluster contains exactly one of the true classes.

d
km.out = kmeans(x, 2, nstart=20)
km.out$cluster

## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [36] 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

With K = 2, two of the true classes (1 and 3) are merged into a single cluster, while class 2 keeps a cluster of its own.
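A plausible reason why classes 1 and 3 are the ones that end up together is that their centres are the closest pair. The sketch below uses the noise-free class centres implied by the assignments in part (a); the matrix `centers` is introduced here for illustration and ignores the small noise term.

```r
# Noise-free class centres in the two informative variables
# (read off the assignments in part (a); the noise is ignored)
centers = rbind(class1 = c(0, 1),
                class2 = c(2, 2),
                class3 = c(1, 0))

# Pairwise Euclidean distances between the class centres
d = dist(centers)
round(d, 3)
```

Classes 1 and 3 are sqrt(2) apart, whereas class 2 is sqrt(5) from each of the others, so merging classes 1 and 3 costs the least within-cluster variation.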

e
km.out = kmeans(x, 4, nstart=20)
km.out$cluster

## [1] 4 4 4 4 4 1 4 1 4 4 1 1 1 1 4 1 4 1 1 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [36] 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3

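With K = 4, one of the true classes (class 1) is split across two clusters (9 observations in cluster 1 and 11 in cluster 4), while classes 2 and 3 are recovered intact. Observations within a class differ only by the small noise term, so the split is essentially arbitrary.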

f
km.out = kmeans(pca.out$x[,1:2], 3, nstart=20)
table(km.out$cluster, c(rep(1,20), rep(2,20), rep(3,20)))

##
## 1 2 3
## 1 0 0 20
## 2 0 20 0
## 3 20 0 0

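Perfect match once again. The cluster numbers are permuted relative to the true classes, but the labels assigned by kmeans are arbitrary.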
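Because the cluster numbers are arbitrary, a "perfect match" means that every row and every column of the table has exactly one non-zero cell. The check below is a minimal sketch; `is.perfect.match` is a hypothetical helper, and the three matrices reproduce the tables and cluster assignments shown in parts (c), (f) and (d).

```r
# TRUE if each cluster maps to exactly one class and vice versa,
# regardless of how the clusters happen to be numbered
is.perfect.match = function(tab) {
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

# Tables as reported in parts (c) and (f): rows are clusters, columns are classes
tab.c = matrix(c(20, 0, 0,
                 0, 20, 0,
                 0, 0, 20), nrow=3, byrow=TRUE)
tab.f = matrix(c(0, 0, 20,
                 0, 20, 0,
                 20, 0, 0), nrow=3, byrow=TRUE)

# Part (d), K = 2: cluster 1 holds classes 1 and 3, cluster 2 holds class 2
tab.d = matrix(c(20, 0, 20,
                 0, 20, 0), nrow=2, byrow=TRUE)

is.perfect.match(tab.c)
is.perfect.match(tab.f)
is.perfect.match(tab.d)
```

The first two calls return TRUE and the last returns FALSE, since one cluster in part (d) covers two classes.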

g
km.out = kmeans(scale(x), 3, nstart=20)
km.out$cluster
## [1] 1 1 1 1 1 3 3 3 2 3 1 3 3 3 1 3 2 3 3 1 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2
## [36] 2 2 2 2 2 1 1 1 3 1 3 1 1 1 3 2 3 3 1 1 3 1 3 3 1

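Poorer results than (b): scaling each variable to unit standard deviation affects the distances between observations. The 48 noise variables are inflated to the same scale as the two informative ones, so noise dominates the distances and the clusters no longer follow the true classes.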
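One way to see the effect of scaling is to compare how much of the total variance sits in the two informative columns before and after `scale()`. This is a sketch under the same data-generating assumptions as part (a); `share` is a hypothetical helper introduced here.

```r
# Regenerate the data from part (a)
set.seed(2)
x = matrix(rnorm(20*3*50, mean=0, sd=0.001), ncol=50)
x[1:20, 2] = 1
x[21:40, 1] = 2
x[21:40, 2] = 2
x[41:60, 1] = 1

# Fraction of the total column variance carried by columns 1 and 2
share = function(m) sum(apply(m[, 1:2], 2, var)) / sum(apply(m, 2, var))

share(x)         # unscaled: the two informative columns carry nearly all the variance
share(scale(x))  # scaled: every column has variance 1, so the share is 2/50
```

After scaling, the two informative variables contribute only 2 of the 50 equally weighted dimensions, which matches the degraded clustering seen above.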
