You are on page 1of 5

Contents

Preface

List of Contributors

xi

1 The ALL Dataset


F. Hahne and R. Gentleman
1.1
Introduction . . . . . . . . . .
1.2
The ALL data . . . . . . . . .
1.3
Data subsetting . . . . . . . . .
1.4
Nonspecic ltering . . . . . .
1.5
BCR/ABL ALL1/AF4 subset

1
.
.
.
.
.

.
.
.
.
.

2 R and Bioconductor Introduction


R. Gentleman, F. Hahne, S. Falcon,
and M. Morgan
2.1
Finding help in R . . . . . . . . . .
2.2
Working with packages . . . . . .
2.3
Some basic R . . . . . . . . . . . .
2.4
Structures for genomic data . . . .
2.5
Graphics . . . . . . . . . . . . . . .
3 Processing Aymetrix Expression
R. Gentleman and W. Huber
3.1
The input data: CEL les . . . .
3.2
Quality assessment . . . . . . . .
3.3
Preprocessing . . . . . . . . . . .
3.4
Ranking and ltering probe sets
3.5
Advanced preprocessing . . . . .
4 Two-Color Arrays
Florian Hahne and Wolfgang
4.1
Introduction . . . . .
4.2
Data import . . . . .
4.3
Image plots . . . . . .

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

Data
.
.
.
.
.

1
1
2
3
4

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

5
7
8
11
20
25

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

25
28
32
33
40
47

Huber
. . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . .

47
48
50

viii

Contents

4.4
4.5

Normalization . . . . . . . . . . . . . . . . . . . . . . . . . .
Dierential expression . . . . . . . . . . . . . . . . . . . . .

5 Fold-Changes, Log-Ratios, Background Correction,


Shrinkage Estimation, and Variance Stabilization
W. Huber
5.1
Fold-changes and (log-)ratios . . . . . . . . . . . . . .
5.2
Background-correction and generalized logarithm . .
5.3
Calling VSN . . . . . . . . . . . . . . . . . . . . . . .
5.4
How does VSN work? . . . . . . . . . . . . . . . . . .
5.5
Robust tting and the most genes not dierentially
expressed assumption . . . . . . . . . . . . . . . . . .
5.6
Single-color normalization . . . . . . . . . . . . . . . .
5.7
The interpretation of glog-ratios . . . . . . . . . . . .
5.8
Reference normalization . . . . . . . . . . . . . . . . .
6 Easy Dierential Expression
F. Hahne and W. Huber
6.1
Example data . . . . . . . .
6.2
Nonspecic ltering . . . .
6.3
Dierential expression . . .
6.4
Multiple testing correction

50
57

63
.
.
.
.

.
.
.
.

.
.
.
.

63
65
70
72

.
.
.
.

.
.
.
.

.
.
.
.

74
78
79
81
83

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

7 Dierential Expression
W. Huber, D. Scholtens, F. Hahne,
and A. von Heydebreck
7.1
Motivation . . . . . . . . . . . . . . . . . . . . . . . .
7.2
Nonspecic ltering . . . . . . . . . . . . . . . . . . .
7.3
Dierential expression . . . . . . . . . . . . . . . . . .
7.4
Multiple testing . . . . . . . . . . . . . . . . . . . . .
7.5
Moderated test statistics and the limma package . .
7.6
Gene selection by Receiver Operator Characteristic
(ROC) . . . . . . . . . . . . . . . . . . . . . . . . . . .
7.7
When power increases . . . . . . . . . . . . . . . . . .
8 Annotation and Metadata
W. Huber and F. Hahne
8.1
Our data . . . . . . . . . . . . . . . . . . .
8.2
Multiple probe sets per gene . . . . . . . .
8.3
Categories and overrepresentation . . . . .
8.4
Working with GO . . . . . . . . . . . . . .
8.5
Other annotations available . . . . . . . . .
8.6
biomaRt . . . . . . . . . . . . . . . . . . .
8.7
Database versions of annotation packages .

.
.
.
.

.
.
.
.

.
.
.
.

83
84
85
87
89

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

89
90
92
94
95

. . .
. . .

99
101
103

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

103
106
107
109
112
113
115

Contents

9 Supervised Machine Learning


R. Gentleman, W. Huber, and V. J. Carey
9.1
Introduction . . . . . . . . . . . . . .
9.2
The example dataset . . . . . . . . . .
9.3
Feature selection and standardization
9.4
Selecting a distance . . . . . . . . . .
9.5
Machine learning . . . . . . . . . . . .
9.6
Cross-validation . . . . . . . . . . . .
9.7
Random forests . . . . . . . . . . . . .
9.8
Multigroup classication . . . . . . .
10 Unsupervised Machine Learning
R. Gentleman and V. J. Carey
10.1 Preliminaries . . . . . . . . . .
10.2 Distances . . . . . . . . . . . .
10.3 How many clusters? . . . . . .
10.4 Hierarchical clustering . . . . .
10.5 Partitioning methods . . . . .
10.6 Self-organizing maps . . . . . .
10.7 Hopach . . . . . . . . . . . . .
10.8 Silhouette plots . . . . . . . . .
10.9 Exploring transformations . .
10.10 Remarks . . . . . . . . . . . . .

ix

121
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

121
123
124
124
126
129
132
135
137

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

11 Using Graphs for Interactome Data


T. Chiang, S. Falcon, F. Hahne, and W. Huber
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . .
11.2 Exploring the protein interaction graph . . . . . . . . . .
11.3 The co-expression graph . . . . . . . . . . . . . . . . . . .
11.4 Testing the association between physical interaction and
coexpression . . . . . . . . . . . . . . . . . . . . . . . . .
11.5 Some harder problems . . . . . . . . . . . . . . . . . . . .
11.6 Reading PSI-25 XML les from IntAct
with the Rintact package . . . . . . . . . . . . . . . . .
12 Graph Layout
F. Hahne, W. Huber, and R. Gentleman
12.1 Introduction . . . . . . . . . . . . . . . .
12.2 Layout and rendering using Rgraphviz
12.3 Directed graphs . . . . . . . . . . . . . . .
12.4 Subgraphs . . . . . . . . . . . . . . . . . .
12.5 Tooltips and hyperlinks on graphs . . . .

.
.
.
.
.
.
.
.
.
.

137
139
142
144
146
148
151
152
154
157
159

.
.
.

159
160
162

.
.

164
165

165
173

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

173
175
180
185
187

Contents

13 Gene Set Enrichment Analysis


193
R. Gentleman, M. Morgan, and W. Huber
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 193
13.2 Data analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 196
13.3 Identifying and assessing the eects of overlapping gene
sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
14 Hypergeometric Testing Used for Gene
Analysis
S. Falcon and R. Gentleman
14.1 Introduction . . . . . . . . . . . . . . .
14.2 The basic problem . . . . . . . . . . . .
14.3 Preprocessing and inputs . . . . . . . .
14.4 Outputs and result summarization . . .
14.5 The conditional hypergeometric test . .
14.6 Other collections of gene sets . . . . . .

Set Enrichment
207
.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

15 Solutions to Exercises
2
R and Bioconductor Introduction . . . . . . . . . . . . .
3
Processing Aymetrix Expression Data . . . . . . . . . .
4
Two-Color Arrays . . . . . . . . . . . . . . . . . . . . . .
5
Fold-Changes, Log-Ratios, Background Correction,
Shrinkage Estimation, and Variance Stabilization . . . .
6
Easy Dierential Expression . . . . . . . . . . . . . . . .
7
Dierential Expression . . . . . . . . . . . . . . . . . . . .
8
Annotation and Metadata . . . . . . . . . . . . . . . . . .
9
Supervised Machine Learning . . . . . . . . . . . . . . . .
10
Unsupervised Machine Learning . . . . . . . . . . . . . .
11
Using Graphs for Interactome Data . . . . . . . . . . . .
12
Graph Layout . . . . . . . . . . . . . . . . . . . . . . . . .
13
Gene Set Enrichment Analysis . . . . . . . . . . . . . . .
14
Hypergeometric Testing Used for Gene Set Enrichment
Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . .

.
.
.
.
.
.

207
208
209
215
218
219

.
.
.

221
221
226
230

.
.
.
.
.
.
.
.
.

231
233
233
234
241
249
256
259
261

265

References

271

Index

277

http://www.springer.com/978-0-387-77239-4

You might also like