a
data = read.csv("./Ch10Ex11.csv", header=F)
dim(data)
## [1] 1000 40
b
dd = as.dist(1 - cor(data))
plot(hclust(dd, method="complete"))
plot(hclust(dd, method="single"))
plot(hclust(dd, method="average"))
The dendrograms show two or three groups of samples, depending on the linkage method.
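Reading group counts off a dendrogram is subjective, so it can help to cut each tree and tabulate the cluster labels. The following is a minimal sketch on simulated data, not on Ch10Ex11.csv: the matrix `sim`, the shifted gene blocks, and the assumption that the first 20 columns are one group and the last 20 the other are all invented for illustration.

```r
# Simulated stand-in: 1000 genes (rows) x 40 samples (columns).
# Assumption: columns 1-20 form one group and 21-40 the other; shifts are invented.
set.seed(1)
sim = matrix(rnorm(1000 * 40), nrow = 1000, ncol = 40)
sim[1:100, 1:20] = sim[1:100, 1:20] + 3       # genes 1-100 raised in group 1
sim[101:200, 21:40] = sim[101:200, 21:40] + 3 # genes 101-200 raised in group 2
truth = rep(c("healthy", "diseased"), each = 20)

# Same correlation-based dissimilarity between samples as used above
dd.sim = as.dist(1 - cor(sim))

# Cut each dendrogram into two clusters; one column of labels per linkage method
groups = sapply(c("complete", "single", "average"),
                function(m) cutree(hclust(dd.sim, method = m), k = 2))

# Cross-tabulate cluster labels against the simulated grouping
for (m in colnames(groups)) {
  cat(m, "linkage\n")
  print(table(cluster = groups[, m], truth))
}
```

On the real data, the same `cutree` call applied to `hclust(dd, ...)` gives labels that can be compared across linkage methods, for example with `k = 2` and `k = 3`.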
c
To find which genes differ most between the healthy and diseased patients, we can examine the loading
vectors produced by PCA and see which genes contribute most to the directions of greatest variance.
pr.out = prcomp(t(data))
summary(pr.out)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 11.941 6.0682 5.9348 5.8312 5.7521 5.7003 5.6345
## Proportion of Variance 0.127 0.0327 0.0313 0.0302 0.0294 0.0289 0.0282
## Cumulative Proportion 0.127 0.1594 0.1907 0.2209 0.2503 0.2792 0.3074
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 5.5773 5.5494 5.5062 5.4885 5.4602 5.4023 5.3344
## Proportion of Variance 0.0276 0.0274 0.0269 0.0268 0.0265 0.0259 0.0253
## Cumulative Proportion 0.3350 0.3624 0.3893 0.4160 0.4425 0.4685 0.4938
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 5.2776 5.2159 5.200 5.1514 5.1160 5.0559 5.0384
## Proportion of Variance 0.0248 0.0242 0.024 0.0236 0.0232 0.0227 0.0226
## Cumulative Proportion 0.5185 0.5427 0.567 0.5903 0.6135 0.6362 0.6588
## PC22 PC23 PC24 PC25 PC26 PC27 PC28
## Standard deviation 5.0187 4.9597 4.9139 4.864 4.8180 4.8081 4.7348
## Proportion of Variance 0.0224 0.0219 0.0215 0.021 0.0206 0.0205 0.0199
## Cumulative Proportion 0.6812 0.7030 0.7245 0.745 0.7661 0.7866 0.8066
## PC29 PC30 PC31 PC32 PC33 PC34 PC35
## Standard deviation 4.7010 4.6556 4.6162 4.5673 4.5303 4.495 4.3650
## Proportion of Variance 0.0196 0.0193 0.0189 0.0185 0.0182 0.018 0.0169
## Cumulative Proportion 0.8262 0.8455 0.8644 0.8829 0.9012 0.919 0.9360
## PC36 PC37 PC38 PC39 PC40
## Standard deviation 4.3586 4.2670 4.2028 4.1392 5.25e-15
## Proportion of Variance 0.0169 0.0162 0.0157 0.0152 0.00e+00
## Cumulative Proportion 0.9529 0.9691 0.9848 1.0000 1.00e+00
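# total_load and indices are not defined in the text; the two definitions below are assumed
total_load = apply(pr.out$rotation, 1, sum)
indices = order(abs(total_load), decreasing=T)
total_load[indices[1:10]]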
(*) I’m not sure this is the correct way to aggregate the loading vector.
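One alternative to aggregating over every component is to rank genes by the magnitude of their loading on the first component alone, since the summary above shows PC1 explaining 0.127 of the variance against roughly 0.03 for each of the others. This is a sketch under stated assumptions rather than a definitive answer: it runs on simulated data, and `sim`, `pr.sim`, `pve`, `pc1.load`, and `top` are names introduced here for illustration.

```r
# Simulated stand-in: 1000 genes (rows) x 40 samples (columns).
# Assumption: genes 1-50 are shifted in samples 21-40; this pattern is invented.
set.seed(1)
sim = matrix(rnorm(1000 * 40), nrow = 1000, ncol = 40)
sim[1:50, 21:40] = sim[1:50, 21:40] + 3

# PCA with samples as rows and genes as variables, as in pr.out above
pr.sim = prcomp(t(sim))

# Proportion of variance explained, computed from the component standard deviations
pve = pr.sim$sdev^2 / sum(pr.sim$sdev^2)
round(pve[1:3], 3)

# Rank genes by the absolute value of their PC1 loading
pc1.load = pr.sim$rotation[, 1]
top = order(abs(pc1.load), decreasing = TRUE)[1:10]
top             # indices of the ten genes with the largest PC1 loadings
pc1.load[top]   # their loadings
```

Taking the absolute value matters because the sign of a principal component is arbitrary. Restricting to PC1 only makes sense when that component clearly separates the two patient groups, which can be checked by plotting `pr.out$x[, 1]` against the sample index.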