5 views

Uploaded by hp_blacklight7402

Hoeschele I.-[Article] a Note on Joint Versus Gene-specific Mixed Model Analysis of Microarray Gene Expression Data (2005)

- Java Programs
- Econometrics
- 4. Searching for the Causal Structure of a VAR
- SSRN-id2357065
- Bio Info
- 513
- Forecasting Examples for Business and Economics Using The
- Tex Refcard Letter
- meca_2000
- Reference Card MATLAB
- 1054803
- Lab 3 Matlab Array & Matrix Operations
- Catalinprez
- The_effect
- a-study-of-english-reading-ability-based-on-multiple-linear-regression-analysis.pdf
- LMnotes_04
- Regulation and the Macroeconomy - Dawson 2002.pdf
- ch-3,4
- 04_03_Clements
- Volatility Transmission Between Exchange Rates and Stock Prices

You are on page 1of 4

183186

doi:10.1093/biostatistics/kxi001

of microarray gene expression data

INA HOESCHELE , HUA LI

Virginia Bioinformatics Institute and Department of Statistics, Virginia Tech,

Blacksburg, VA 24061-0477, USA

inah@vt.edu

S UMMARY

Currently, linear mixed model analyses of expression microarray experiments are performed either in a

gene-specific or global mode. The joint analysis provides more flexibility in terms of how parameters are

fitted and estimated and tends to be more powerful than the gene-specific analysis. Here we show how to

implement the gene-specific linear mixed model analysis as an exact algorithm for the joint linear mixed

model analysis. The gene-specific algorithm is exact, when the mixed model equations can be partitioned

into unrelated components: One for all global fixed and random effects and the others for the gene-specific

fixed and random effects for each gene separately. This unrelatedness holds under three conditions: (1) any

gene must have the same number of replicates or probes on all arrays, but these numbers can differ among

genes; (2) the residual variance of the (transformed) expression data must be homogeneous or constant

across genes (other variance components need not be homogeneous) and (3) the number of genes in the

experiment is large. When these conditions are violated, the gene-specific algorithm is expected to be

nearly exact.

Keywords: Differential gene expression; Microarrays; Mixed model analysis.

1. I NTRODUCTION

Microarray gene expression experiments produce expression profiles of thousands or ten thousands of

genes simultaneously. The focus of this paper is on the application of linear mixed model methodology

to the detection and estimation of differential gene expression or, more generally, to the identification of

which factors of interest influence the expression of which of the arrayed genes and to the estimation of

their effects. Linear mixed model analysis (LMMA) of microarray data is either gene-specific (Wolfinger

et al., 2001) or global (e.g. Kerr and Churchill, 2001), i.e. the analysis is performed separately for each

gene or jointly for all genes, respectively. Gene-specific analyses can have low power as noticed, e.g. by

Wu et al. (2003) and Pfister-Genskow et al. (2004). Reasons for lower power of gene-specific analyses,

relative to the joint analysis, include the difference in degrees of freedom, joint versus gene-specific

estimation of the error variance and other variance components, and the use of different contrasts.

Conditions for equivalence between gene-specific and joint analyses have been stated in previous

contributions for the case of fixed linear models (e.g. Kerr, 2003; Wu et al., 2003). Here we discuss the

use of the gene-specific analysis as an algorithm for the joint analysis under a mixed linear model.

To whom correspondence should be addressed.

c The Author 2005. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oupjournals.org.

184

I. H OESCHELE AND H. L I

2. M ETHODS

y = X1 1 + Z1 u1 + X2 2 + Z2 u2 + e,

(2.1)

where 1 and u1 contain the effects of the fixed (e.g. treatment) and random global factors, respectively, 2

and u2 contain the effects of the fixed and random gene-specific factors, respectively, and X1 , Z1 and X2 ,

Z2 are design matrices for the global and gene-specific factors, respectively. Gene-specific effects include

the gene main effects (fixed) and interactions between the global fixed and random factors with the gene

factor. It is usually appropriate to treat array effects as random, so e.g. let u1 = A be the vector of random

global array effects and let u2 = A G be the vector of random gene-specific array effects. For the mixed

model analysis, we need to specify the variancecovariance structure of the random factors. We typically

2

specify that E(A) = 0, Var(A) = G11 = I A2 , E(A G) = 0, E(e) = 0, Var(A G) = G22 = I AG

2

and Var(e) = R = Ie

, under the assumption that gene-specific

are constant across genes, or

g variances

g

2

2 , under the assumption that geneand Var(e) = R = i=1 Ini e(i)

Var(A G) = G22 = i=1 In a AG(i)

specific variances differ among genes, where n a is the number of arrays, n i is the number of observations

2 and 2 are the gene-specific array variance or variance

for gene i, A2 is the global array variance and AG

e

due to array-by-gene interaction and the residual variance, respectively. Inferences about the unknown

parameters (fixed effects and variance components) are obtained by utilizing the mixed model equations

(MME) (Goldberger, 1962; Henderson, 1963). The MME for the uncentered mixed model in (2.1) are

1

1

1

1

1

X

X

X1 R X1 X

X1 R y

1

1 R X2

1 R Z1

1 R Z2

X R1 X X R1 X

X R1 y

1

1

X

R

Z

X

R

Z

1

2

1

2

2

2 2

2

2

2

(2.2)

1

= 1 ,

R1 Z + G11 Z R1 Z + G12

1 X

Z1 R X1 Z

R

Z

R

y

Z

u

2

1

2

1

1

1

1

1

1

1

1

21 Z R1 Z + G22

1

Z

Z

u 2

2

2 R X1 Z2 R X2 Z2 R Z1 + G

2

2R y

where

1

G11

=

G21

G12

G22

G11

=

G21

G12

,

G22

where G12 = Cov(A, A G) = 0. Given known values of the variance components, -solutions

to the

are best linear unbiased predictions.

Wolfinger et al. (2001) proposed to implement the gene-specific analysis with a two-step procedure.

First, the data y are analyzed with a sub-model (normalization model) containing only global effects

(1 and u1 ), and then the predicted residuals from this model are analyzed with the second sub-model

(gene-model) containing only gene-specific effects (2 and u2 ). Fitting model (2.1), with centering of the

columns of X2 and Z2 (Rencher, 2000) or without 2 and u2 , implies a reparameterization from 1 to 1

and u1 to u1 , where an element in 1 or u1 now contains the average of the interactions of this factor

with the gene factor (G). We denote the centered matrices by X2 and Z2 . After centering, or equivalently, after fitting a mixed (normalization) model containing only global factors, the vector of global

array effects becomes A with elements Ak = Ak + (A G)k and with Var(A ) = G11 = I A2 and

Cov(A , A G) = G12 . The array variance component now is A2 = Var(Ak ) = Var(Ak + A G k ),

which in the simplest case of equal replicate or probe numbers across all genes simplifies to A2 =

2 , where n is the number of genes. Matrix G has non-zero elements between any A and

A2 + n1G AG

G

k

12

2 . The MME for the

(A G)kg for all g and a given k, which in the simplest case are equal to n1G AG

centered mixed model are obtained by replacing X2 and Z2 by X2 and Z2 and replacing G1 with

1 11

G12

G11 G12

G

1

.

G

=

=

G21 G22

G21 G22

185

It can be shown that there are non-zero elements in matrix G12 between Ak and (A G)kg for all

g, and that there are non-zero elements in matrix G22 between (A G)kg and (A G)kg for all g, g

with g = g . Therefore, the offdiagonal block in the coefficient matrix of the centered MME between

u1 = A and u2 = A G is no longer strictly zero, and the same holds for offdiagonal blocks between

any sub-vectors of u2 corresponding to two different genes. As a consequence, the global and the genespecific MME are not exactly unrelated so that the gene-specific analysis is no longer an exact algorithm

to perform the joint analysis. The degree of approximation approaches exactness when the number of

genes becomes large so that G approaches G. For the centered MME, all offdiagonal sub-matrices of

1

the coefficient matrix in (2.2) involving global and gene-specific design matrices (e.g. X

1 R X2 ) are

2

zero when R = Ie , but are not (exactly) zero for other R. Separating the MME (2.2) into n G + 1

(approximately) unrelated components greatly improves the efficiency of LMMA with joint estimation of

global and gene-specific variances.

3. R ESULTS

The gene-specific algorithm for the joint analysis is illustrated with a worked example using a small,

artificial data set and fixed and mixed models, which is presented as supplementary data available at

Biostatistics online. We also performed the joint analysis on a larger real data set and on data sets simulated with structure and parameters similar to those of the real data. The real data included 2640 cDNAs

printed in duplicate on 100 arrays, two cell populations and 10 biological replicates from each cell population [the analysis of this data set is described in detail by Pfister-Genskow et al. (2004)]. We analyzed

the real and simulated data with the ASReml software (Gilmour et al., 2002) and with our gene-specific

algorithm for the joint analysis and confirmed that the estimates of the variance components and the test

statistics were identical up to numerical accuracy [differences in the results between the joint analysis

and the original gene-specific analysis of Wolfinger et al. (2001) were reported in Pfister-Genskow et al.

(2004)].

4. D ISCUSSION

Joint analysis of the expression data on all genes in a microarray experiment has multiple advantages

over the common practice of analyzing the data on individual genes separately: (1) There is great flexibility on how the variancecovariance structure of the data is modeled, and different formulations can

be compared (homogeneous within-gene variance components, heterogeneous within-gene variance components estimated by shrinkage methods or by grouping of genes, etc.). (2) It allows us to evaluate the

significance of global treatment and technical factors and to evaluate gene-specific treatments contrasts

which include main effects (Black and Doerge, 2002). (3) As pointed out by Kerr (2003), joint analysis

also permits model evaluation and residual analysis. We note, however, that while for linear regression

models, there is well-established theory on residuals and influence, and outlier detection statistics or deletion diagnostics are commonly available in regression software packages, residual analysis and deletion

diagnostics are not yet performed routinely in mixed model analysis and are still underdeveloped for

mixed models despite recent progress (e.g. Haslett and Dillane, 2004).

We do not consider the joint and gene-specific analyses as alternative methods, but we view the genespecific analysis merely as an efficient algorithm for performing the joint analysis. We have shown that

the gene-specific algorithm provides (essentially) exact inferences for the joint LMMA under three conditions: (1) equal number of replicate spots or probes for any gene across all arrays (these numbers can

differ among genes), (2) homogeneous residual variance across genes (other variance components may be

heterogeneous) and (3) sufficiently large numbers of genes in the experiment (this condition is needed for

mixed but not for fixed models).

186

I. H OESCHELE AND H. L I

ACKNOWLEDGMENTS

This work was supported by NSF Plant Genome Cooperative Agreement DBI-0211863.

R EFERENCES

B LACK , M. A. AND D OERGE , R. W. (2002). Calculation of the minimum number of replicate spots required for

detection of significant gene expression fold change in microarray experiments. Bioinformatics 18, 16091616.

G ILMOUR , A. R., C ULLIS , B. R., W ELHAM , S. J. AND T HOMPSON , R. (2002). ASREML Reference Manual.

NSW, Australia: NSW Agriculture, Orange Agricultural Institute.

G OLDBERGER, A. S. (1962). Best linear unbiased prediction in the generalized linear regression model. Journal of

the American Statistical Association 57, 369375.

H ASLETT, J. AND D ILLANE , D. (2004). Application of delete = replace to deletion diagnostics for variance component estimation in the linear mixed model. Journal of the Royal Statistical Society, Series B 66, 131143.

H ENDERSON, C. R. (1963). Selection index and expected genetic advance. In Statistical Genetics and Plant Breeding

(NRC publication 982). Washington, DC: National Academy of Sciences, pp. 141163.

K ERR, M. K. (2003). Linear models for microarray data analysis: hidden similarities and differences. Journal of

Computational Biology 10, 891901.

K ERR , M. K.

183201.

AND

P FISTER -G ENSKOW, M., C HILDS , L., M YERS , C., L ACSON , J., B ETTHAUSER , J., G OULEKE , P., Z HENG , Y.,

L ENO , G., F ORSBERG , E., YANG , X., H OESCHELE , I. AND E ILERTSEN , K. J. (2004). Analysis of individual

pre-implantation cattle embryos using cDNA microarrays: comparison of NT and IVF produced embryos using

an interwoven loop design and linear mixed model analysis. Biology of Reproduction 72. In press.

R ENCHER, A. C. (2000) Linear Models in Statistics. New York: John Wiley & Sons.

W OLFINGER , R. D., G IBSON , G., W OLFINGER , E. D., B ENNETT, L., H AMADEH , H., B USHEL , P., A FSHARI , C.

AND PAULES , R. S. (2001). Assessing gene significance from cDNA microarray expression data via mixed

models. Journal of Computational Biology 8, 625637.

W U , H., K ERR , M. K., C UI , X. Q. AND C HURCHILL , G. A. (2003). MAANOVA: A Software Package for the Analysis of Spotted cDNA Microarray Experiments. The Analysis of Gene Expression Data: Methods and Software.

New York: Springer. http://www.jax.org/research/churchill/.

[Received February 18, 2004; first revision June 17, 2004; second revision September 14, 2004;

accepted for publication October 11, 2004]

- Java ProgramsUploaded byManoj Mahajan
- EconometricsUploaded byOladipupo Mayowa Paul
- 4. Searching for the Causal Structure of a VARUploaded byAltakeen
- SSRN-id2357065Uploaded bySok Heng
- Bio InfoUploaded byKeny Orlando M. Cano
- 513Uploaded bytsas9508
- Forecasting Examples for Business and Economics Using TheUploaded bynishu_amin
- Tex Refcard LetterUploaded byathanhcong
- meca_2000Uploaded bymariomato
- Reference Card MATLABUploaded byWesley Brown
- 1054803Uploaded byWidiarsih Widia
- Lab 3 Matlab Array & Matrix OperationsUploaded byMatt Imri
- CatalinprezUploaded byMuhammad Iqball
- The_effectUploaded byLisa Yusoff
- a-study-of-english-reading-ability-based-on-multiple-linear-regression-analysis.pdfUploaded byWardiman M Sakir
- LMnotes_04Uploaded byHendri Prabowo
- Regulation and the Macroeconomy - Dawson 2002.pdfUploaded byTangthietgiap
- ch-3,4Uploaded byabcdef
- 04_03_ClementsUploaded byDominik Stroukal
- Volatility Transmission Between Exchange Rates and Stock PricesUploaded byikutmilis
- US Federal Reserve: 200621papUploaded byThe Fed
- Thermodynamic Consistency of Data Bank FPE,110,1995,89-113Uploaded byBrando Martinez Hernandez
- Stability MainUploaded byEric Djemba Djemba
- Automatic Face Naming by Learning DiscriminativeUploaded byshalini
- Cp AssignmentUploaded byviki_05432
- Question a IreUploaded byYash Kapoor
- Research Report on TransportationUploaded byInnocent Saadi
- Matrix MultiplicationUploaded byNitant Panchal
- Fire Seat 2010 Pp 037 SunUploaded byMariano Serrano
- ConvDif2DUploaded byIsmaelGutierrez

- [Geraldine Van Bueren] Article 40 Child CriminalUploaded bydicoursfigure
- Elc Lit CategoriesUploaded byhp_blacklight7402
- MasterThesisAd_MachineLearning_AgroscopeUploaded byhp_blacklight7402
- Scribd DownloaderUploaded byhp_blacklight7402
- Google SpannerUploaded bySaroj Madhu
- Alt R., Vignes J.-10th GAMM - IMACS International Symposium on Scientific Computing, Computer Arithmetic, And Validated Numerics (2002)Uploaded byhp_blacklight7402
- Key For VirtualBox.txtUploaded byhp_blacklight7402
- (Journal of Strength & Conditioning Research) Schoenfeld BJ, Peterson MD, Ogborn D, Contreras B, Sonmez GT-[Article] Effects of Low- Versus High-Load Resistance Training on Muscle Strength and HypertrUploaded byhp_blacklight7402
- University of Kent Term Dates 2015-2016Uploaded byhp_blacklight7402
- PeruUploaded byhp_blacklight7402
- PortlanD Sacram Ento OaklanD San Francisco LosUploaded byhp_blacklight7402
- Uni Assist EnglischUploaded byhp_blacklight7402
- 5 James Maxwell PapersUploaded byhp_blacklight7402
- 12065masdfasdaUploaded byhp_blacklight7402
- Info Graph i e Investor DayUploaded byhp_blacklight7402
- Grounding and Shielding CompaudioUploaded byRadu Taşcă
- Theresa M. Vitolo, Aaron J. Sparks-Building a Paperless Service_ Making the Internship ConnectionUploaded byhp_blacklight7402
- [Mieke Verheyde; Geert Goedertier] CommentaryUploaded bydicoursfigure
- How to Beat the Computer at Chess [Gordon Smyth, 1995]Uploaded byalabi-alaba
- Errata to Nikravesh Computer-Aided Analysis of Mechanical SystemsUploaded bychanchai T
- Lucia M. a Blowing-up Branch of Solutions for a Mean Field EquationUploaded byhp_blacklight7402
- (Working Papers in Linguistics_Dept. of Linguistics. University of Washington) Noam Chomsky-Interview With Noam Chomsky-Dept. of Linguistics, University of Washington (1978)Uploaded byfreemgr
- Institute of Electrical and Electronics Engineers-2006 IEEE International Solid-state Circuits Conference Digest of Technical Papers-Ieee (2007)Uploaded byhp_blacklight7402
- Benjamin Nathaniel Bogue-Stammering_ Its Cause and Cure (Dodo Press) -Dodo Press (2009)Uploaded byhp_blacklight7402
- HelloUploaded byNgocEmEmky
- CPU dialogueUploaded byhp_blacklight7402
- Harm Rul Suf Dom7Uploaded byJuan María Jáuregui Navarro
- The Illuminati Papers by Robert Anton WilsonUploaded byMattia Strinati
- Estimating Eligibility and Participation for the WIC ProgramUploaded byhp_blacklight7402

- ASIC Interview Question & Answer_ ASIC VerificationUploaded byprodip7
- ANALYSIS OF FIBER REINFORCED PLASTIC NEEDLE GATE FOR K.T. WEIRSUploaded byIJIRAE- International Journal of Innovative Research in Advanced Engineering
- Report LanYardUploaded byAnisa Rifqi
- A Cyberpunk ManifestoUploaded byevanLe
- Mandala of Sacred ActivismUploaded bysbartone
- Cascading Style Sheets for Web DesignUploaded bymayomayomayo3068
- AES Discussion SheetUploaded byAjmal Salim
- Antisocial Behavior and HypnosisUploaded byMansonCaseFile
- Red_Hat_Enterprise_Linux-6-Logical_Volume_Manager_Administration-en-US.pdfUploaded bychakrikolla
- Plain Truth 1964 (Vol XXIX No 04) Apr_wUploaded byTheTruthRestored
- The Physical and Chemical Properties of Interstellar Dust and Dust in CometsUploaded byAzam-Savaşçı Anderson Mohammad
- Consumer Behavior TIVO 2Uploaded bysandeeprao83
- DBMSUploaded bykalyanram19858017
- c12tgUploaded byapi-285146233
- Sample Cv IdeaUploaded byAhmed Mamdouh
- 75126579-Lecture-6-The-Engineer-s-Transit-and-Theodolite.pdfUploaded byBryle James Nanglegan
- IdiomUploaded byBishwajeet Jha
- National Programme on School Standards and Evaluation_NEUPA_2015Uploaded byayushi bhatt
- Data Conversion MQUploaded bysuryo.gumilar
- Chapter 1.docxUploaded byAnonymous OeKGaIN
- 4L60-EUploaded bypavliano
- BCOM_Chp_14.pptUploaded byMuhaned Ramadan
- Kameleon K8 ManualUploaded byrglabbe
- Paper BontJ - Ports Seminar Delft 2010, Paper Jan de BontUploaded byAlexey Ozorishin
- Ortorectificacion De imagenes sentinelUploaded byIgor Davalos
- 978-1-63057-094-1-3Uploaded bygouravbhatia200189
- combination fertilizerUploaded bypakde jongko
- part 2 criteriaUploaded byapi-252776499
- Lewis Sykes - The Augmented Tonoscope - Literature ReviewUploaded byLewis Sykes
- PATH TO BLESSEDNESSUploaded bylawhater