R For Biochemists: Principal Component Analysis With Published CLL Gene Expression Data (Herishanu Et Al, Blood 2011)
Over the last few years, my understanding of chronic lymphocytic leukaemia (CLL) has developed. Leukaemic cell proliferation has been shown to be a key element of the disease, so the site of CLL cell proliferation was an important question. A gene expression analysis published in Blood by Herishanu et al in 2011 showed a proliferative gene expression pattern in the leukaemic cells from the lymph nodes, distinct from other sites of the body (peripheral blood or bone marrow).
I've been exploring the gene expression data using R. As a first step, I've carried out a principal component analysis similar to that shown in Figure 1A of the paper.
The PCA is convincing because the three groups of samples cluster among themselves. The 3D plot is useful because the three components separate the groups while two dimensions do not.
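One way to check how much a third component adds is to look at the cumulative proportion of variance explained. The sketch below does this on simulated data, not the CLL dataset: the matrix `toy`, the sample names and the size of the shifts are all invented for illustration. It uses the same `princomp()` call and variance formula as the script further down.

```r
# Simulated stand-in for an expression matrix: genes in rows, samples in columns.
# Everything here (toy, site, effect sizes) is hypothetical.
set.seed(1)
n.genes <- 500
site <- rep(c("PB", "BM", "LN"), times = 4)
toy <- matrix(rnorm(n.genes * length(site)), nrow = n.genes,
              dimnames = list(NULL, paste0(site, "#", rep(1:4, each = 3))))

# give each non-PB site its own block of shifted genes
toy[1:40, site == "LN"] <- toy[1:40, site == "LN"] + 2
toy[41:80, site == "BM"] <- toy[41:80, site == "BM"] + 1

p <- princomp(toy)

# proportion of variance per component, then the running total
var.explained <- p$sdev^2 / sum(p$sdev^2)
cum.var <- cumsum(var.explained)
print(round(cum.var[1:3], 3))
```

If the running total is still climbing noticeably at the third component, a 2D plot is leaving out structure that a 3D plot can show.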
There are other ways of visualising PCA plots... Sources are indicated as part of the script.
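As one example of another way, base R's `pairs()` draws every pairwise combination of the first three components without needing a 3D package. This is only a sketch on simulated data: `toy`, `site.cols` and `site.pch` are invented names, and deriving the site from the column name prefix is an assumption about how the samples are labelled.

```r
# Simulated genes x samples matrix with names like "PB#1", "BM#1", "LN#1".
# All names and values here are hypothetical.
set.seed(3)
n.genes <- 300
sample.names <- paste0(rep(c("PB", "BM", "LN"), times = 6), "#", rep(1:6, each = 3))
toy <- matrix(rnorm(n.genes * length(sample.names)), nrow = n.genes,
              dimnames = list(NULL, sample.names))

# shift a block of genes in the lymph node samples
ln <- grepl("^LN", sample.names)
toy[1:30, ln] <- toy[1:30, ln] + 2

p <- princomp(toy)
loadings <- unclass(p$loadings)   # plain matrix, one row per sample

# site of each sample, taken from the text before the "#"
site <- sub("#.*$", "", rownames(loadings))
site.cols <- c(PB = "green", BM = "red", LN = "blue")
site.pch <- c(PB = 15, BM = 16, LN = 17)

# scatterplot matrix of the first three components
pairs(loadings[, 1:3], col = site.cols[site], pch = site.pch[site])
```

Looking up colour and symbol by site name keeps the two in step with each other, which is handy when a legend is added later.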
SCRIPT START
# install.packages(c("RCurl", "scatterplot3d"))
library(RCurl)
library(scatterplot3d)
# A few years ago a gene expression profile paper was published in Blood by Herishanu et al
# that gave us a new insight into chronic lymphocytic leukaemia.
# http://www.bloodjournal.org/content/117/2/563.long
# I have written this script to explore the data
# and to reproduce the principal component analysis.
str(sampID)
sampID$V1 <- as.character(sampID$V1)
plot(hclust(dist(t(data))))
# http://www.bloodjournal.org/content/117/2/563/tab-figures-only
# the twelve with all 3 compartments: #1, #2, #3, #4, #8, #9, #10, #11, #12, #13, #25, #26
# let's explore this concept with just two samples to see if it works
# pick out data for patient #26
samp26 <- data[,grep("#26", sampID$Name)]
plot(hclust(dist(t(twosamp))))
# cluster by patient
plot(hclust(dist(t(twosamp.s))))
# pat # 1
colnames(data)
pat_1 <- cbind(data[,1], data[,27], data[,46])
colnames(pat_1) <- c("PB#1", "BM#1", "LN#1")
pat_1.s <- pat_1 - rowMeans(pat_1)
# pat # 2
pat_2 <- cbind(data[,2], data[,28], data[,47])
colnames(pat_2) <- c("PB#2", "BM#2", "LN#2")
pat_2.s <- pat_2 - rowMeans(pat_2)
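data.reqd <- c("#3", "#4", "#8", "#9", "#10", "#11", "#12", "#13", "#25", "#26")
# start from the two patients centred by hand above
# (assumed initialisation: data.n is not defined earlier in this listing)
data.n <- cbind(pat_1.s, pat_2.s)
for (i in seq_along(data.reqd)) {
  # columns for this patient, centred on the per-gene mean across the three sites
  samp <- data[, grep(data.reqd[i], sampID$Name, fixed = TRUE)]
  samp <- samp - rowMeans(samp)
  data.n <- cbind(data.n, samp)
}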
# this does cluster the LN samples distinctly but there is still some overlap
# of Peripheral Blood and Bone Marrow.
# PCA from the paper looks nice and convincing
# http://www.r-bloggers.com/computing-and-visualizing-pca-in-r/
# uses function princomp()
plot(princomp(data.n)$loadings)
# http://www.r-bloggers.com/visualizing-principal-components/
p <- princomp(data.n)
loadings <- p$loadings[]
p.variance.explained <- p$sdev^2 / sum(p$sdev^2)
#*****************************************************************
# 2-D Plot
#******************************************************************
x <- loadings[,1]
y <- loadings[,2]
z <- loadings[,3]
cols <- as.factor(colourby)
cols <- gsub("PB", "green", cols)
cols <- gsub("BM", "red", cols)
cols <- gsub("LN", "blue", cols)
#*****************************************************************
# 3-D Plot, for good examples of 3D plots
# http://statmethods.wordpress.com/2012/01/30/getting-fancy-with-3-d-scatterplots/
#******************************************************************
# plot all samples' loadings on the first, second, and third principal components
# and colour the points according to the site they were taken from
s3d = scatterplot3d(x, y, z,
xlab='Comp.1', ylab='Comp.2', zlab='Comp.3',
color=cols, pch = symbols,
main = "Principal Component Analysis in 3D \n CLL cell gene expression")
s3d.coords = s3d$xyz.convert(x, y, z)
text(s3d.coords$x, s3d.coords$y,
labels=colnames(data.n), col=cols, cex=.8, pos=4)
# Labels can be added but I'm not sure they help in this example.
# change the order and the angle to make it look a bit more like the figure
s3d = scatterplot3d(y, x, z,
xlab='PC2', ylab='PC1', zlab='PC3',
color=cols, pch = symbols,
angle=-25,
grid = FALSE)
# add legend
legend(10, -4.5, # location
bty="n", # suppress legend box
title="Site of sample",
c("PB", "BM", "LN"),
pch=symbols, col = cols, horiz=TRUE)
SCRIPT END
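The per-patient centring step in the script (subtracting `rowMeans()` within each patient's three samples) can be tried out on simulated data to see what it does to the clustering. This is a minimal sketch, not the author's data or exact method: `sim`, `patient.of` and the effect sizes are made up. It matches patient labels exactly with `==` rather than `grep()`, since a pattern such as "#1" would also match "#10" to "#13".

```r
# Simulated data: a strong per-patient effect plus a smaller lymph node effect.
# All names and numbers here are hypothetical.
set.seed(2)
n.genes <- 200
patients <- paste0("#", 1:4)
sites <- c("PB", "BM", "LN")

sim <- NULL
for (pt in patients) {
  patient.effect <- rnorm(n.genes, sd = 2)
  block <- sapply(sites, function(s) {
    site.effect <- if (s == "LN") c(rep(1.5, 20), rep(0, n.genes - 20)) else 0
    patient.effect + site.effect + rnorm(n.genes, sd = 0.3)
  })
  colnames(block) <- paste0(sites, pt)
  sim <- cbind(sim, block)
}

# patient label of each column, e.g. "PB#1" -> "#1"
patient.of <- sub("^[A-Z]+", "", colnames(sim))

# centre each gene within each patient
centred <- sim
for (pt in unique(patient.of)) {
  idx <- which(patient.of == pt)
  centred[, idx] <- sim[, idx] - rowMeans(sim[, idx])
}

# compare how the samples group before and after centring
hc.raw <- hclust(dist(t(sim)))
hc.centred <- hclust(dist(t(centred)))
print(table(cutree(hc.raw, k = 4), patient.of))
print(table(cutree(hc.centred, k = 2), sub("#.*$", "", colnames(centred))))
```

The two tables show which samples end up together: the first is cross-tabulated against patient, the second against site, so it is easy to see whether centring moves the grouping from patient towards site.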