
Microarray data analysis using Bioconductor

Alex Sánchez
Statistics and Bioinformatics Research Group
Departament d’Estadística. Universitat de Barcelona
April 12, 2007

Contents
1 Bioconductor Classes
  1.1 Biobase
    1.1.1 class phenoData
    1.1.2 class MIAME
    1.1.3 class exprSet
    1.1.4 Computing on exprSet objects
    1.1.5 Exercises
  1.2 affy
    1.2.1 class AffyBatch
    1.2.2 Computing on AffyBatch
    1.2.3 cdfenvs
    1.2.4 Exercises

1 Bioconductor Classes
Object-oriented design provides a convenient way to represent data and the
actions that can be performed on them. A class can be thought of as a
template: a description of what constitutes each instance of the class. An
instance is a concrete realization of that description. The attributes of a
class are its data components, and its methods are the functions, or actions,
that an instance of the class can perform [1].
The R language implements these object-oriented concepts through the package
methods.
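
The vocabulary above (class, instance, attribute, method) can be sketched with
the methods package. The class Hybridization and the generic meanIntensity are
hypothetical names invented for this illustration; they are not part of
Bioconductor.

```r
library(methods)  # usually attached already; loaded explicitly for Rscript

# A class is a template: a hypothetical 'Hybridization' with two attributes.
setClass("Hybridization",
         representation(sample = "character", intensities = "numeric"))

# An instance is one realization of that template.
h <- new("Hybridization", sample = "a", intensities = c(120, 85, 430))

# A method is a function attached to the class through a generic.
setGeneric("meanIntensity", function(object) standardGeneric("meanIntensity"))
setMethod("meanIntensity", "Hybridization",
          function(object) mean(object@intensities))

h@sample          # attribute (slot) access
meanIntensity(h)  # method call
```

Attributes are reached with the @ operator, while methods are called like
ordinary functions and dispatched on the class of their argument.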

1.1 Biobase
The package Biobase contains basic structures for microarray data.
> library(Biobase)

1.1.1 class phenoData


Class phenoData contains covariate information, i.e. information about the
hybridization experiments. This is particularly convenient for exploratory
analysis, when the important covariates are not known in advance.

> samplenames <- letters[1:10]
> dataf <- data.frame(treated = sample(c(TRUE, FALSE),
+ 10, replace = TRUE), sex = sample(c("Male",
+ "Female"), 10, replace = TRUE), mood = sample(c("Happy",
+ "Don't care", "Grumpy"), 10, replace = TRUE),
+ names = samplenames, row.names = "names")
> dataf.desc <- list("Treated with dark chocolate",
+ "Sex", "Mood while eating")
> pdata <- new("phenoData", pData = dataf, varLabels = dataf.desc)

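Covariates can also be read from a file. First of all, we set several
variables to hold important values (workingDir is assumed to already contain
the path to the working directory):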


> dataDir <- file.path(workingDir, "datos.celltypes")
> targetsFile <- "celltypes.txt"
> anotPackage <- "mouse4302"

> my.targets <- read.phenoData(file.path(dataDir,
+ targetsFile), header = TRUE, row.names = 1)
> print(my.targets)

phenoData object with 2 variables and 6 cases
varLabels
    age: read from file
    treat: read from file
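
The kind of file read above can be sketched with base R alone. The layout
below (tab-delimited, a header line, file names in the first column) and the
column name FileName are assumptions made for illustration; only two of the
six rows are reproduced, and the real celltypes.txt may be organized
differently.

```r
# Hypothetical targets file: tab-delimited, first column = CEL file names.
targets_path <- tempfile(fileext = ".txt")
writeLines(c(
  "FileName\tage\ttreat",
  "Aged LPS 80L.CEL\tAged\tLPS",
  "Aged Medium 81m.CEL\tAged\tMED"
), targets_path)

# row.names = 1 turns the first column into row names, one row per sample.
targets <- read.table(targets_path, header = TRUE, sep = "\t",
                      row.names = 1, stringsAsFactors = FALSE)

rownames(targets)      # sample (file) names
table(targets$treat)   # number of samples per treatment
```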

> print(pData(my.targets))

                    age  treat
Aged LPS 80L.CEL    Aged LPS
Aged LPS 86L.CEL    Aged LPS
Aged LPS 88L.CEL    Aged LPS
Aged Medium 81m.CEL Aged MED
Aged Medium 82m.CEL Aged MED
Aged Medium 84m.CEL Aged MED

[1] This lab is based on some of Laurent Gautier's excellent labs.

1.1.2 class MIAME


Class MIAME was created to adapt Bioconductor data structures to the “Minimum
Information About a Microarray Experiment” standard. In practice, people
often skip it.

> my.desc <- new("MIAME", name = "LPS_Experiment",
+ lab = "National Cancer Institute", contact = "Lakshman Chelvaraja",
+ title = "Molecular basis of age associated cytokine dysregulation in LPS stimulated m
+ url = "http://www.jleukbio.org/cgi/content/abstract/79/6/1314")
> print(my.desc)

Experiment data
Experimenter name: LPS_Experiment
Laboratory: National Cancer Institute
Contact information: Lakshman Chelvaraja
Title: Molecular basis of age associated cytokine dysregulation in LPS stimulated macroph
URL: http://www.jleukbio.org/cgi/content/abstract/79/6/1314
PMIDs:
No abstract available.

1.1.3 class exprSet


Instances of class exprSet contain expression data. The class mainly consists
of:
exprs a matrix of expression values (one gene per row, one hybridization
    experiment per column).
phenoData an instance of class phenoData.
description an instance of class MIAME.

> data(sample.exprSet.1)
> m <- exprs(sample.exprSet.1)[, c(1:3, 13:15)]
> colnames(m) <- NULL
> eset <- new("exprSet", exprs = m, phenoData = my.targets,
+ description = my.desc)
> eset

Expression Set (exprSet) with
    500 genes
    6 samples
phenoData object with 2 variables and 6 cases
varLabels
    age: read from file
    treat: read from file

> description(eset)

Experiment data
Experimenter name: LPS_Experiment
Laboratory: National Cancer Institute
Contact information: Lakshman Chelvaraja
Title: Molecular basis of age associated cytokine dysregulation in LPS stimulated macroph
URL: http://www.jleukbio.org/cgi/content/abstract/79/6/1314
PMIDs:
No abstract available.

Instances of class exprSet can be exported to delimited text files, a format
that Microsoft Excel reads without trouble.
> exprs2excel(eset, file = "myeset.csv")
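
A similar export can be sketched with base R, writing an expression matrix to
a tab-delimited file with write.table. This is an illustration on simulated
values, not the implementation of exprs2excel; the object names m_demo,
out_path and back are hypothetical.

```r
# Simulated 4 genes x 3 samples matrix (made-up values, for illustration only).
set.seed(1)
m_demo <- matrix(round(rnorm(12, mean = 8), 2), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("a", "b", "c")))

out_path <- tempfile(fileext = ".txt")
# col.names = NA writes an empty header cell above the row names,
# so the columns line up when the file is opened in a spreadsheet.
write.table(m_demo, file = out_path, sep = "\t", quote = FALSE,
            col.names = NA)

# Reading the file back recovers the same matrix.
back <- read.table(out_path, header = TRUE, sep = "\t", row.names = 1)
```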

1.1.4 Computing on exprSet objects


Computing on instances of exprSet can be done either directly on the
attributes (for instance, on the matrix returned by exprs) or with dedicated
functions such as esApply.

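> # Working directly on the expression matrix
> m <- exprs(eset)
> gmean <- apply(m, 1, mean)
> gsd <- apply(m, 1, sd)
> # The same computations with the dedicated function esApply
> gmean <- esApply(eset, 1, mean)
> gsd <- esApply(eset, 1, sd)
> # esApply makes phenoData variables such as treat visible to the function
> ttestTreat <- function(x) {
+ xs <- split(x, treat)
+ pval <- t.test(xs[[1]], xs[[2]])$p.value
+ return(pval)
+ }
> pvalttest <- esApply(eset, 1, ttestTreat)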
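
For comparison, the same gene-by-gene t-test can be sketched on a plain
matrix with apply. The matrix expr and the factor treat below are simulated
for illustration, and the grouping variable is passed explicitly because a
plain matrix carries no phenoData.

```r
set.seed(42)
# Simulated log-expression matrix: 5 genes x 6 samples (made-up values).
expr <- matrix(rnorm(30, mean = 8), nrow = 5,
               dimnames = list(paste0("gene", 1:5), NULL))
# Covariate with one entry per column, mirroring the 'treat' variable.
treat <- factor(c("LPS", "LPS", "LPS", "MED", "MED", "MED"))

ttest_treat <- function(x, group) {
  xs <- split(x, group)              # one vector of values per level
  t.test(xs[[1]], xs[[2]])$p.value   # Welch two-sample t-test
}

# One p-value per row (gene).
pvals <- apply(expr, 1, ttest_treat, group = treat)
```

With only three samples per group these p-values are purely illustrative.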

Bioconductor packages should be “exprSet compliant”. This means that,
ideally, one should be able to define computations that take exprSets as
inputs instead of, for instance, matrices.
This compliance is not compulsory, which means that authors are free to
decide how they prefer to represent their data.

1.1.5 Exercises
1. Obtain an expression matrix from somewhere on the internet and create a
data frame describing this dataset.
2. Create an instance of class exprSet using the matrix and the data frame
created.
3. Export it to an MS Excel-friendly format.

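4. Load the data package golubEsets. The exprSet to use is golubTrain.
5. For each gene, compute the ratio

   $$r_{\mathrm{ALL.AML}} = \frac{\mathrm{mean}(\mathrm{ALL})}{\mathrm{mean}(\mathrm{AML})}$$

6. For each gene, compute the product

   $$p_{\mathrm{ALL.AML}} = \mathrm{mean}(\mathrm{ALL}) \times \mathrm{mean}(\mathrm{AML})$$

7. Plot $\log(r_{\mathrm{ALL.AML}})$ vs $\log(p_{\mathrm{ALL.AML}})$.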
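
The arithmetic of the ratio and the product can be sketched on simulated
data. The matrix x and the factor label below are made up and only stand in
for an expression matrix and its ALL/AML labels; they are not the golubTrain
data.

```r
set.seed(7)
# Simulated positive expression values: 100 genes x 9 samples.
x <- matrix(rexp(900, rate = 1 / 500) + 50, nrow = 100)
label <- factor(c(rep("ALL", 5), rep("AML", 4)))

# Per-gene mean within each group.
mean_all <- rowMeans(x[, label == "ALL"])
mean_aml <- rowMeans(x[, label == "AML"])

r <- mean_all / mean_aml   # ratio
p <- mean_all * mean_aml   # product

# log(r) is a difference of logs, log(p) a sum of logs.
log_r <- log(r)
log_p <- log(p)
```

A call such as plot(log_p, log_r) then draws one point per gene. The
logarithm is undefined for genes whose group means are not positive, so real
data may need filtering first.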

1.2 affy
1.2.1 class AffyBatch
The class AffyBatch extends the class exprSet. This means that it inherits
the characteristics of its ancestor. It therefore also consists of:

• a matrix of probe intensities
• an attribute of class phenoData
• an attribute description, to be MIAME compliant

Because it extends the class exprSet, it also contains supplementary
attributes:

cdfName The name of the chip type. This name is used to resolve the cdfenv.
nrow, ncol The size of the Affymetrix array.

> library(affy)
> data(affybatch.example)
> affybatch.example

AffyBatch object
size of arrays=100x100 features (240 kb)
cdf=cdfenv.example (150 affyids)
number of samples=3
number of genes=150
annotation=
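
The idea of one class extending another can be sketched with the methods
package. The classes ToySet and ToyBatch and the generic nSamples are
hypothetical stand-ins invented for this illustration; they only mimic the
relationship between exprSet and AffyBatch.

```r
library(methods)

# Hypothetical parent class holding a data matrix.
setClass("ToySet", representation(values = "matrix"))

# Hypothetical child class: inherits 'values' and adds chip-specific slots.
setClass("ToyBatch", contains = "ToySet",
         representation(chipName = "character",
                        nrow = "numeric", ncol = "numeric"))

# A method defined for the parent class only.
setGeneric("nSamples", function(object) standardGeneric("nSamples"))
setMethod("nSamples", "ToySet", function(object) ncol(object@values))

b <- new("ToyBatch", values = matrix(0, nrow = 4, ncol = 3),
         chipName = "toychip", nrow = 2, ncol = 2)

is(b, "ToySet")   # a ToyBatch is also a ToySet
nSamples(b)       # the parent's method applies to the child
```

Because the child inherits from the parent, code written for the parent class
keeps working on instances of the child.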

1.2.2 Computing on AffyBatch


bgcorrect Correct for background signal.
    abatch.bg <- bg.correct(affybatch.example, method="rma")
normalize Normalize probe intensities (to make them comparable).
    abatch.n <- normalize(affybatch.example, method="qspline")
computeExprSet Compute a summary expression value.
    eset <- computeExprSet(affybatch.example, pmcorrect.method="pmonly", summary.method="m
expresso The function expresso is a wrapper around the processing methods
    bgcorrect, normalize and computeExprSet, applied in sequence.
    eset <- expresso(affybatch.example, widget=TRUE)
mas, rma These functions are wrappers around expresso. They define
    popular/standard settings for pre-processing.
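
To give an idea of what a summary step does, the sketch below applies median
polish to the probes of a single probe set using medpolish from the stats
package. It is a toy illustration on simulated log2 intensities, assuming one
row per probe and one column per array; it is not the code used by affy.

```r
# Simulated log2 PM intensities for one probe set:
# 5 probes (rows) x 3 arrays (columns); values are made up.
set.seed(3)
pm <- matrix(rnorm(15, mean = 9, sd = 1), nrow = 5,
             dimnames = list(paste0("probe", 1:5), paste0("array", 1:3)))

# Median polish fits: value = overall + probe effect + array effect + residual.
fit <- medpolish(pm, trace.iter = FALSE)

# One summary value per array: overall level plus that array's effect.
summary_value <- fit$overall + fit$col
```

Each array ends up with a single expression value for the probe set, and the
use of medians limits the influence of outlying probes.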

1.2.3 cdfenvs
The cdfenvs are objects of class environment. They contain associative
mappings between probe set identifiers and the indices of the rows in the
matrix that contains the probe intensities.
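
Such a mapping can be sketched with a plain R environment. The identifiers,
the indices and the two-column pm/mm layout of each entry are assumptions
made for this illustration; toy_cdf is not a real cdfenv.

```r
# Hypothetical CDF-like environment: probe set identifier -> row indices.
toy_cdf <- new.env(hash = TRUE)
assign("1000_at", cbind(pm = c(11, 12, 13), mm = c(21, 22, 23)),
       envir = toy_cdf)
assign("1001_at", cbind(pm = c(14, 15), mm = c(24, 25)), envir = toy_cdf)

# Simulated intensity matrix: 30 probes (rows) x 2 arrays (columns).
intensities <- matrix(seq_len(60), nrow = 30)

ls(toy_cdf)                              # available probe set identifiers
idx <- get("1000_at", envir = toy_cdf)   # look up one probe set
# Rows of the intensity matrix that belong to this probe set (PM probes).
pm_values <- intensities[idx[, "pm"], , drop = FALSE]
```

Looking up an identifier returns the row indices, which are then used to
subset the intensity matrix.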

1.2.4 Exercises
1. Load the library affydata.
2. Load the dataset Dilution. This dataset is used for the exercises.
3. Use hist to plot the intensity distributions.
4. Process the AffyBatch using no background correction, the qspline
normalization method, the pmonly perfect match correction method and the
medianpolish summary method.
