Microarray data analysis using Bioconductor
Alex Sánchez
Statistics and Bioinformatics Research Group
Departament d’Estadística. Universitat de Barcelona
April 12, 2007
Contents
1 Bioconductor Classes
  1.1 Biobase
    1.1.1 class phenoData
    1.1.2 class MIAME
    1.1.3 class exprSet
    1.1.4 Computing on exprSet objects
    1.1.5 Exercises
  1.2 affy
    1.2.1 class AffyBatch
    1.2.2 Computing on AffyBatch
    1.2.3 cdfenvs
    1.2.4 Exercises
1 Bioconductor Classes
Object-oriented design provides a convenient way to represent data and the
actions that can be performed on them. A class can be thought of as a template:
a description of what constitutes each instance of the class. An instance of a
class is a concrete realization of that description. The attributes of a class
are its data components, and the methods of a class are functions, that is, the
actions an instance of the class is capable of (1).
The R language implements these object concepts through the package methods.
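As an illustration of these notions, here is a minimal sketch that uses only the
package methods. The class Hybridization, its attributes and the generic
meanIntensity are hypothetical names invented for this example; they are not
part of Bioconductor.

```r
library(methods)

# Hypothetical class: the template says every instance has a name
# and a numeric vector of intensities (the attributes)
setClass("Hybridization",
         representation(name = "character", intensities = "numeric"))

# A generic function and the method that implements it for the class
setGeneric("meanIntensity",
           function(object) standardGeneric("meanIntensity"))
setMethod("meanIntensity", "Hybridization",
          function(object) mean(object@intensities))

# An instance is a realization of the template
h <- new("Hybridization", name = "array1", intensities = c(10, 20, 30))

meanIntensity(h)          # call the method on the instance
is(h, "Hybridization")    # check the class of the instance
```

Attributes are reached with the @ operator here only to keep the sketch short;
classes usually provide accessor functions for that purpose.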
1.1 Biobase
The package Biobase contains basic structures for microarray data.
> library(Biobase)
1.1.1 class phenoData

Class phenoData contains covariate information, i.e. information about the
hybridization experiments. This is particularly convenient for exploratory
analysis, where the important covariates are not known in advance.
> samplenames <- letters[1:10]
> dataf <- data.frame(treated = sample(c(TRUE, FALSE),
+     10, replace = TRUE), sex = sample(c("Male",
+     "Female"), 10, replace = TRUE), mood = sample(c("Happy",
+     "Don't care", "Grumpy"), 10, replace = TRUE),
+     names = samplenames, row.names = "names")
> dataf.desc <- list("Treated with dark chocolate",
+     "Sex", "Mood while eating")
> pdata <- new("phenoData", pData = dataf, varLabels = dataf.desc)
A phenoData object can also be read from a file. First of all, we set several
variables to hold important values (workingDir is assumed to already contain
the path to the working directory):
> dataDir <- file.path(workingDir, "datos.celltypes")
> targetsFile <- "celltypes.txt"
> anotPackage <- "mouse4302"
> my.targets <- read.phenoData(file.path(dataDir,
+     targetsFile), header = TRUE, row.names = 1)
> print(my.targets)
phenoData object with 2 variables and 6 cases
 varLabels
  age: read from file
  treat: read from file
> print(pData(my.targets))
                     age treat
Aged LPS 80L.CEL    Aged   LPS
Aged Medium 81m.CEL Aged   MED

(1) This lab is based on some of Laurent Gautier's excellent labs.
1.1.2 class MIAME

Class MIAME was created to adapt Bioconductor data structures to the “Minimum
Information About a Microarray Experiment” standard. In practice, people tend
to skip it.
> my.desc <- new("MIAME", name = "LPS_Experiment",
+ lab = "National Cancer Institute", contact = "Lakshman Chelvaraja",
+ title = "Molecular basis of age associated cytokine dysregulation in LPS stimulated m
+ url = "http://www.jleukbio.org/cgi/content/abstract/79/6/1314")
> print(my.desc)
Experiment data
Experimenter name: LPS_Experiment
Laboratory: National Cancer Institute
Contact information: Lakshman Chelvaraja
Title: Molecular basis of age associated cytokine dysregulation in LPS stimulated macroph
URL: http://www.jleukbio.org/cgi/content/abstract/79/6/1314
PMIDs:
No abstract available.
1.1.3 class exprSet

Instances of class exprSet contain expression data. The class mainly consists
of:
exprs a matrix of expression values (one gene per row, one hybridization
experiment per column)
phenoData an instance of class phenoData
description an instance of class MIAME
> data(sample.exprSet.1)
> m <- exprs(sample.exprSet.1)[, c(1:3, 13:15)]
> colnames(m) <- NULL
> eset <- new("exprSet", exprs = m, phenoData = my.targets,
+ description = my.desc)
> eset
Expression Set (exprSet) with
        500 genes
        6 samples
         phenoData object with 2 variables and 6 cases
         varLabels
                age: read from file
                treat: read from file
> description(eset)
Experiment data
Experimenter name: LPS_Experiment
Laboratory: National Cancer Institute
Contact information: Lakshman Chelvaraja
Title: Molecular basis of age associated cytokine dysregulation in LPS stimulated macroph
URL: http://www.jleukbio.org/cgi/content/abstract/79/6/1314
PMIDs:
No abstract available.
Instances of class exprSet can be exported to tab-delimited files, a format
that Microsoft Excel can open directly.
> exprs2excel(eset, file = "myeset.csv")
1.1.4 Computing on exprSet objects

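Computations on instances of exprSet can be done on the attributes directly, or
using dedicated functions such as esApply.
> # Extract the expression matrix and work on it directly
> m <- exprs(eset)
> gmean <- apply(m, 1, mean)
> gsd <- apply(m, 1, sd)
> # Same computations with esApply, which operates on the exprSet itself
> gmean <- esApply(eset, 1, mean)
> gsd <- esApply(eset, 1, sd)
> # Per-gene t-test; esApply makes the phenoData variables (here treat)
> # visible to the function
> ttestTreat <- function(x) {
+     xs <- split(x, treat)
+     pval <- t.test(xs[[1]], xs[[2]])$p.value
+     return(pval)
+ }
> pvalttest <- esApply(eset, 1, ttestTreat)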
Bioconductor packages should be “exprSet compliant”. This means that, ideally,
one should be able to define computations that take exprSets as data inputs
instead of, for instance, matrices.
This compliance is not compulsory, which means that authors are free to decide
how they prefer to represent their data.
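For comparison, the same per-gene computations can be sketched on a plain
matrix, without Biobase. The matrix below is simulated, and the factor treat is
a hypothetical stand-in for the phenoData variable of the same name; the
grouping is passed explicitly as an argument because apply, unlike esApply,
knows nothing about covariates.

```r
set.seed(1)

# Simulated expression matrix: 5 genes (rows) x 6 samples (columns)
m <- matrix(rnorm(30, mean = 8), nrow = 5,
            dimnames = list(paste0("gene", 1:5), paste0("s", 1:6)))

# Hypothetical covariate playing the role of the phenoData variable 'treat'
treat <- factor(rep(c("LPS", "MED"), each = 3))

# Per-gene mean and standard deviation
gmean <- apply(m, 1, mean)
gsd <- apply(m, 1, sd)

# Per-gene two-sample t-test; the grouping factor is an explicit argument
ttestTreat <- function(x, group) {
    xs <- split(x, group)
    t.test(xs[[1]], xs[[2]])$p.value
}
pvalttest <- apply(m, 1, ttestTreat, group = treat)
```

With a matrix the covariates have to travel separately from the data, which is
precisely the bookkeeping that an exprSet is meant to remove.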
1.1.5 Exercises
1. Obtain an expression matrix from somewhere on the web and create a data
frame describing this dataset.
2. Create an instance of class exprSet using the matrix and the data frame
created.
3. Export it to an MS Excel-friendly format.
4. Load the data package golubEsets. The exprSet to use is golubTrain.
5. For each gene, compute the ratio
rALL.AML = mean(ALL) / mean(AML)
6. For each gene, compute the product
pALL.AML = mean(ALL) × mean(AML)
7. Plot log(rALL.AML) vs log(pALL.AML).
1.2 affy
1.2.1 class AffyBatch
The class AffyBatch extends the class exprSet. This means that it inherits the
characteristics of its ancestor. It therefore also consists of:
a matrix of probe intensities
an attribute of class phenoData
an attribute description, to be MIAME compliant
Because it extends the class exprSet, it contains supplementary attributes:
cdfName The name of the chip type. This name is used to resolve the cdfenv.
nrow, ncol The size of the Affymetrix array.
> library(affy)
> data(affybatch.example)
> affybatch.example
AffyBatch object
size of arrays=100x100 features (240 kb)
cdf=cdfenv.example (150 affyids)
number of samples=3
number of genes=150
annotation=
1.2.2 Computing on AffyBatch

bgcorrect Correct for background signal
abatch.bg <- bg.correct(affybatch.example, method="rma")
normalize Normalize probe intensities (to make them comparable)
abatch.n <- normalize(affybatch.example, method="qspline")
computeExprSet Compute a summary expression value
eset <- computeExprSet(affybatch.example, pmcorrect.method="pmonly", summary.method="m
expresso The function expresso is a wrapper around the processing methods
bgcorrect, normalize and computeExprSet, applied in sequence.
eset <- expresso(affybatch.example, widget=TRUE)
mas, rma These functions are wrappers around expresso. They define
popular/standard settings for pre-processing.
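To give an idea of what a summary step does, the following sketch applies
median polish to a single simulated probe set using the base function
medpolish. It only illustrates the general technique, under the assumption of
log2 perfect match intensities arranged as probes by arrays; it is not the code
used inside the affy package, and the data and object names are invented.

```r
set.seed(2)

# Simulated log2 PM intensities for one probe set: 11 probes x 3 arrays
pm <- matrix(rnorm(33, mean = 9, sd = 0.5), nrow = 11,
             dimnames = list(paste0("probe", 1:11), paste0("array", 1:3)))

# Make the third array clearly brighter than the others
pm[, 3] <- pm[, 3] + 2

# Median polish fits: intensity = overall + probe effect + array effect + residual
fit <- medpolish(pm, trace.iter = FALSE)

# One summary expression value per array: overall level plus array effect
expr <- fit$overall + fit$col
expr
```

Because the fit is based on medians, a few aberrant probes have little
influence on the summary value of each array.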
1.2.3 cdfenvs
The cdfenvs are objects of class environment. They contain associative mappings
between probe set identifiers and the indices of the rows in the matrix that
contains the probe intensities.
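The idea of an environment used as an associative mapping can be sketched in
base R. The probe set identifiers, the indices and the intensity matrix below
are made up, and the two-column pm/mm layout of each entry is an assumption
made for the example.

```r
# Hypothetical miniature of a cdfenv: probe set identifier -> row indices
cdf <- new.env(hash = TRUE)
assign("1000_at", cbind(pm = c(1, 3, 5), mm = c(2, 4, 6)), envir = cdf)
assign("1001_at", cbind(pm = c(7, 9), mm = c(8, 10)), envir = cdf)

# Simulated intensity matrix: 10 probes (rows) x 2 arrays (columns)
intensities <- matrix(seq(100, by = 10, length.out = 20), nrow = 10)

# The keys of the mapping are the probe set identifiers
ls(cdf)

# Look up one probe set and pull out its perfect match intensities
idx <- get("1000_at", envir = cdf)
pm.values <- intensities[idx[, "pm"], ]
pm.values
```

Keeping the mapping separate from the intensities means that the same
environment can serve every array of a given chip type.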
1.2.4 Exercises
1. Load the library affydata.
2. Load the dataset Dilution. This dataset is used for the exercises.
3. Use hist to plot the intensity distributions.
4. Process the AffyBatch using no background correction, the qspline
normalization method, the pmonly perfect match correction method and the
medianpolish summary method.
