Robinson 2010
Gene expression
edgeR estimates the genewise dispersions by conditional maximum likelihood, conditioning on the total count for that gene (Smyth and Verbyla, 1996). An empirical Bayes procedure is used to shrink the dispersions towards a consensus value, effectively borrowing information between genes (Robinson and Smyth, 2007). Finally, differential expression is assessed for each gene using an exact test analogous to Fisher’s exact test, but adapted for overdispersed data (Robinson and Smyth, 2008).

3 FEATURES

The required inputs for edgeR are the table of counts and two vectors annotating the samples: the vector of the library sizes (i.e. total number of reads) and a factor specifying the experimental group or condition for each sample.

For users of limma, the edgeR package has a number of analogous functions. Once the data have been processed and the dispersion estimates are moderated, the topTags function can be used to tabulate the top differentially expressed genes (or tags or exons, etc.). Also, MA (log ratio versus abundance) plots can be created using the plotSmear function, allowing the same visualizations for DGE data as used for microarray data analysis (Fig. 1).

A number of features have been added to the edgeR package since the initial publications. The initial methodology worked only for a two-group comparison. The extension to estimating and moderating the dispersion for multiple groups is straightforward and has been

4 DISCUSSION

Funding: National Health and Medical Research Council Program (Grant 406657 to G.K.S.); NHMRC, Independent Research Institutes Infrastructure Support Scheme (Grant 361646); Victorian State Government OIS grant (awarded to the WEHI); a Melbourne International Research Scholarship (to M.D.R.); Belz, Harris and IBS Honours scholarships (to D.J.M.).

Conflict of Interest: none declared.

REFERENCES

Andersson,A.F. et al. (2008) Comparative analysis of human gut microbiota by bar-coded pyrosequencing. PLoS ONE, 3, e2836.
Gentleman,R.C. et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol., 5, R80.
Li,H. et al. (2008) Determination of tag density required for digital transcriptome analysis: application to an androgen-sensitive prostate cancer model. Proc. Natl Acad. Sci. USA, 105, 20179–20184.
Marioni,J.C. et al. (2008) RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res., 18, 1509–1517.
Robinson,M.D. and Smyth,G.K. (2007) Moderated statistical tests for assessing differences in tag abundance. Bioinformatics, 23, 2881–2887.
Robinson,M.D. and Smyth,G.K. (2008) Small sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics, 9, 321–332.
Smyth,G.K. (2004) Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol., 1, Art 3.
Smyth,G.K. and Verbyla,A.P. (1996) A conditional approach to residual maximum likelihood estimation in generalized linear models. J. R. Stat. Soc. B, 58, 565–572.
Wong,J.W.H. et al. (2008) Computational methods for the comparative quantification of proteins in label-free LCn-MS experiments. Brief. Bioinform., 9, 156–165.
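
The exact test for overdispersed counts described at the end of Section 2 can be illustrated with a minimal sketch. This is not the edgeR implementation: it is a simplified Python illustration that assumes equal library sizes across samples, treats the dispersion as known rather than estimated and moderated, and defines the P-value as the total probability of all outcomes no more likely than the one observed. The function names `nb_logpmf` and `nb_exact_test` are hypothetical. Under a negative binomial model with a common mean, the sum of counts in each group is again negative binomial, so conditioning on the overall total gives a discrete distribution over the possible splits between the two groups, in the same way that Fisher's exact test conditions on the table margins.

```python
import math


def nb_logpmf(x, mean, dispersion):
    """Log-probability of count x under a negative binomial distribution
    with variance = mean + dispersion * mean**2."""
    size = 1.0 / dispersion
    return (math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
            + size * math.log(size / (size + mean))
            + x * math.log(mean / (size + mean)))


def nb_exact_test(sum1, sum2, n1, n2, dispersion):
    """Two-group exact test conditional on the total count.

    sum1, sum2 : summed counts for one gene in groups 1 and 2
    n1, n2     : number of libraries in each group (equal sizes assumed)
    dispersion : negative binomial dispersion, assumed known here
    """
    total = sum1 + sum2
    if total == 0:
        return 1.0  # no reads, so no evidence of a difference
    mu = total / (n1 + n2)  # common per-library mean under the null
    # The sum of n NB(mu, dispersion) counts is NB(n * mu, dispersion / n).
    logp = [nb_logpmf(a, n1 * mu, dispersion / n1)
            + nb_logpmf(total - a, n2 * mu, dispersion / n2)
            for a in range(total + 1)]
    top = max(logp)
    weights = [math.exp(lp - top) for lp in logp]  # rescaled for stability
    observed = weights[sum1]
    # Add up every split that is no more likely than the observed one;
    # the small relative tolerance treats floating-point ties as equal.
    tail = sum(w for w in weights if w <= observed * (1 + 1e-9))
    return min(1.0, tail / sum(weights))


if __name__ == "__main__":
    print(nb_exact_test(30, 2, 1, 1, 0.1))   # strongly unbalanced counts
    print(nb_exact_test(18, 14, 1, 1, 0.1))  # nearly balanced counts
```

Increasing the dispersion argument widens the conditional distribution, so the same observed split becomes less significant. This is why the quality of the dispersion estimate matters for the test.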