
Update TRENDS in Genetics Vol.21 No.7 July 2005

Genetical genomics in humans and model organisms

Dirk-Jan de Koning and Chris S. Haley
The Roslin Institute, Roslin, Midlothian, UK, EH25 9PS

Genetical genomics has been proposed to map loci controlling gene-expression differences (eQTLs) that might underlie functional trait variation. We briefly review the studies in model species and conclude that, although they successfully demonstrate the utility of genetical genomics, they are too limited to unlock the full potential of this approach and some results should be interpreted with caution. We subsequently elaborate on two recent studies that use this approach in humans. The many differences between these studies complicate meaningful comparisons between them. A joint analysis of the two experiments offers some scope for more powerful genetical genomics.

Genetical genomics describes the combined study of gene expression and marker genotypes in a segregating population [1,2]. It aims to detect the genomic loci that control gene-expression differences; these loci are referred to as expression quantitative trait loci (eQTLs; see Glossary).
To date, most of these studies have used model species such as mice [3–5], maize [3], rats [6] and yeast [7,8]. The experimental designs include recombinant inbred lines (RI; in rodents) [4–6], F2 or F3 crosses (in mice and maize) [3] and haploid lines (in yeast) [7–9]. The common feature of these designs is that, compared with ‘traditional’ phenotype-based QTL experiments, the sizes of the experiments are modest to small. We have compared the statistical power to detect different QTL effects among the different eQTL studies to date and comment on potential shortcomings (Box 1). The limited size of experiments can be attributed to the expense of gene-expression analyses. However, this should encourage collaborative efforts to perform more powerful eQTL studies rather than multiple studies that each lack sufficient power.

Cis and trans eQTL
eQTL can be classified as cis or trans acting based on the location of the transcript compared with that of the eQTL influencing the expression of that transcript. There is variation between studies in exactly how cis and trans are defined, but generally the genome is divided into segments (bins; to allow for inherent inaccuracy in the mapping of eQTL) based on physical or mapping distance {e.g. 20 kb in yeast [7], 5 MB [4,5] or 2 cM (~3.6 MB) in mice [3] and 20 MB in rats [6]}. A QTL is cis acting if it is located in the same bin as the transcript it influences, otherwise it is termed trans acting.

Differences in microarray platform and their effect on eQTL studies
Differences in performance between microarray platforms have been discussed in detail elsewhere [10]. Because genetical genomics combines sequence polymorphisms with variation in expression levels, it is important to establish how robust the RNA measurement is against sequence variation [e.g. single nucleotide polymorphisms (SNPs)] in the transcript. The robustness of Affymetrix chips against spurious cis-effects resulting from SNPs in the transcripts has been evaluated by re-sequencing some of the genes with cis-effects in rats [6] and by using available SNP data in mice [5]. Both studies concluded that the effect of SNP variation on the detection of cis-acting eQTLs was limited. An alternative approach for Affymetrix chips would be to study probe–eQTL interactions for cis-acting eQTL because Affymetrix chips use multiple probes to interrogate each transcript (Ritsert Jansen, personal communication). Agilent 60-mer oligonucleotide arrays were shown to be robust against four SNPs or less in the probe region [11].

Major hubs of gene regulation: fact or artefact?
A common feature of eQTL studies is the detection of ‘hotspots’ or hubs of trans-acting eQTL: chromosomal regions that affect the expression of a much larger number

Glossary
Bonferroni correction: a statistical adjustment for multiple comparisons. The Bonferroni correction is simple: if a number (n) of outcomes are being tested instead of a single outcome, the desired threshold level (P) is divided by n.
False discovery rate: the proportion of false-positive test results among all significant tests (note that the FDR is conceptually different to the significance level).
Haploid line: a line that is derived by crossing two strains and subsequently manipulating the F1 gametes to develop into fully homozygous individuals.
Heritability: a statistic that estimates the proportion of variation in a trait that is attributable to genetic factors.
Phenotypic standard deviations: a statistic that describes the dispersion of data about the mean.
Quantitative trait locus: genetic loci or chromosomal regions that contribute to variability in complex quantitative traits, as identified by statistical analysis. Quantitative traits are typically affected by several genes and by the environment.
Recombinant inbred lines: a strain that is formed by crossing two strains, followed by 20 or more consecutive generations of brother–sister mating or selfing. The resulting lines are homozygous (and therefore fixed) at each locus, enabling repeated replicates of genetically homogeneous lines to be assayed.
Statistical power: a statistic that describes how effective a given experiment is to detect a certain effect. Statistical power is expressed as the proportion of tests that are expected to be significant given a certain experiment and a certain effect.

Corresponding author: de Koning, D.-J.
Available online 23 May 2005
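The bin-based definition of cis and trans eQTL given in the main text can be sketched in a few lines. This is a minimal illustration rather than the procedure of any of the cited studies: the `Locus` type, the function names and the use of fixed-width bins counted from the start of each chromosome are all assumptions, and the 5 MB default simply mirrors the bin size quoted for two of the mouse studies [4,5].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """A genomic location (hypothetical helper type for this sketch)."""
    chromosome: str
    position_mb: float  # physical position in megabases


def bin_index(position_mb: float, bin_size_mb: float) -> int:
    """Return the index of the fixed-width bin that contains a position."""
    return int(position_mb // bin_size_mb)


def classify_eqtl(transcript: Locus, eqtl: Locus, bin_size_mb: float = 5.0) -> str:
    """Label an eQTL 'cis' if it lies in the same bin as its transcript, else 'trans'."""
    same_chromosome = transcript.chromosome == eqtl.chromosome
    same_bin = (bin_index(transcript.position_mb, bin_size_mb)
                == bin_index(eqtl.position_mb, bin_size_mb))
    return "cis" if same_chromosome and same_bin else "trans"


# Illustrative positions only; they do not come from any of the reviewed data sets.
print(classify_eqtl(Locus("1", 12.0), Locus("1", 14.5)))   # same 5 MB bin
print(classify_eqtl(Locus("1", 12.0), Locus("1", 15.1)))   # neighbouring bin
print(classify_eqtl(Locus("1", 12.0), Locus("2", 12.0)))   # different chromosome
```

With fixed bins, two positions that are close together but fall on either side of a bin boundary are labelled trans, which is one reason why the choice of bin size affects the reported proportion of cis-acting eQTL.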

Box 1. The power of eQTL studies to date

Table I summarizes the statistical power to detect QTL for some eQTL studies to date and compares these with hypothetical F2 designs that are commonly encountered in QTL detection. For example, an eQTL with a heritability of 0.03 (i.e. the eQTL explains 3% of the variation in RNA abundance among the F2 mice) would be detected in 7% of the experiments performed with 111 F2 mice [3] and 16% of the experiments with 86 haploid yeast lines [8].
Although the experiment using 112 haploid yeast lines [9] is the most powerful of all the studies, most studies have limited power to detect any QTL with an effect <0.5 phenotypic standard deviations (SD; equivalent to a QTL heritability of 0.13). As a result, the studies fail to detect many loci with moderate effects on gene regulation and are also expected to miss some loci with major effects. The statistical threshold that we have used for the power calculations is reasonably stringent for a single trait, but fairly liberal overall, considering that eQTL studies commonly examine the expression levels of thousands of genes. This is a major issue in genetical genomics because it uses multiple testing in two dimensions: hundreds of markers are tested for their putative effect on >10 000 gene transcripts. Traditional approaches, such as the Bonferroni correction, that limit the discovery of spurious effects by increasing the stringency on the statistical significance threshold are demanding as the thresholds become prohibitive for the detection of all but the most extreme effects. Alternatives such as the false discovery rate have been proposed for genome scans and gene-expression studies [15], and an overview of multiple testing issues and alternatives in genetics was recently presented by Manly et al. [16].
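To make the two-dimensional multiple-testing problem concrete, the sketch below computes a Bonferroni threshold for a marker-by-transcript scan and contrasts it with a false discovery rate procedure. The counts of 500 markers and 10 000 transcripts are illustrative assumptions in the spirit of "hundreds of markers" and ">10 000 gene transcripts". The FDR function is the standard Benjamini–Hochberg step-up procedure, used here as a simple stand-in; it is not the q-value method of reference [15].

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold: the desired level divided by the number of tests."""
    return alpha / n_tests


def benjamini_hochberg(p_values: list[float], fdr: float) -> list[bool]:
    """Flag each p-value as significant or not under the Benjamini-Hochberg procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * fdr:
            cutoff_rank = rank
    significant = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[index] = True
    return significant


# Hypothetical scan dimensions (assumptions, not taken from any one study).
n_markers = 500
n_transcripts = 10_000
print(bonferroni_threshold(0.05, n_markers * n_transcripts))

# Toy p-values showing that the FDR approach can retain more tests than Bonferroni.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(toy_p, fdr=0.05))
print([p <= bonferroni_threshold(0.05, len(toy_p)) for p in toy_p])
```

Both procedures treat the tests as exchangeable; neither accounts for the correlation between linked markers or between co-expressed transcripts, which is part of the reason the choice of threshold in eQTL studies remains a judgment call.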

Table I. A comparison of statistical power to detect QTL in eQTL studies

Refs                                        Population                   N(a)  Statistical power for different QTL effects(b)
QTL effect (phenotypic SD)(c)                                                  0.25   0.40   0.5    0.6    0.75
QTL heritability in F2 (variance explained)(d)                                 0.03   0.08   0.13   0.18   0.28
Brem et al. [7]                             Haploid yeast                40    0.05   0.2    0.51   0.73   0.99
Yvert et al. [8]                            Haploid yeast                86    0.16   0.67   0.94   0.99   0.99
Brem and Kruglyak [9]                       Haploid yeast                112   0.25   0.84   0.99   0.99   0.99
Schadt et al. [3]                           F2 mice                      111   0.07   0.37   0.68   0.90   0.99
Schadt et al. [3]                           F3 maize                     76    0.04   0.19   0.41   0.67   0.94
Chesler et al. (mice); Bystrykh et al.      Recombinant inbred lines(e)  33    0.05   0.29   0.62   0.91   0.99
(mice); Hubner et al. (rats) [4–6]
Hypothetical                                F2                           200   0.21   0.77   0.96   0.99   0.99
Hypothetical                                F2                           400   0.60   0.99   0.99   0.99   0.99

(a) Number of individuals with expression data.
(b) The probability of detecting as significant a QTL using a point-wise significance threshold of P < 0.001, which corresponds to a LOD score of 3.0 for an F2 design (slightly more stringent than the proposed threshold for suggestive linkage but much less stringent than the threshold for significant linkage [17]). The power calculations account for different experimental designs but not for different genome length between species (the greater number of independent tests performed in a larger genome requires a more stringent significance threshold).
(c) Additive effect of the QTL (half of the difference between homozygotes) expressed in units of the phenotypic standard deviation.
(d) The proportion of the total variation in the population explained by the QTL, assuming an F2 population where the QTL allele frequencies are both 0.5. In an RI or haploid system, the heritability of the QTL is twice the magnitude in an F2.
(e) Assuming a repeatability of 0.50 for gene transcripts and three replicates for every recombinant inbred (RI) line.
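The two header rows of Table I and the LOD equivalence in its footnotes can be checked with a short calculation. The sketch assumes a purely additive QTL with allele frequencies of 0.5 and a phenotypic variance standardized to 1, so that the variance explained in an F2 is a²/2 for an additive effect a in SD units, and twice that in an RI or haploid design. The LOD conversion assumes that the likelihood-ratio statistic for an F2 follows a chi-square distribution with 2 degrees of freedom (additive and dominance effects), an assumption that is not spelled out in the table. The function names are introduced here for illustration.

```python
import math


def qtl_heritability_f2(additive_effect_sd: float) -> float:
    """Variance explained by an additive QTL in an F2 with allele frequencies 0.5: a^2 / 2."""
    return additive_effect_sd ** 2 / 2.0


def qtl_heritability_ri(additive_effect_sd: float) -> float:
    """Variance explained in an RI or haploid design: twice the F2 value."""
    return 2.0 * qtl_heritability_f2(additive_effect_sd)


def lod_to_pointwise_p(lod: float) -> float:
    """Point-wise P value for a LOD score, assuming a chi-square test with 2 df.

    The statistic is 2 * ln(10) * LOD, and the upper tail of a chi-square
    distribution with 2 df is exp(-x / 2), so the result equals 10 ** -LOD.
    """
    chi_square = 2.0 * math.log(10.0) * lod
    return math.exp(-chi_square / 2.0)


for effect in (0.25, 0.40, 0.5, 0.6, 0.75):
    print(f"a = {effect:.2f} SD: F2 h2 = {qtl_heritability_f2(effect):.3f}, "
          f"RI/haploid h2 = {qtl_heritability_ri(effect):.3f}")

print(f"LOD 3.0 -> P = {lod_to_pointwise_p(3.0):.4g}")
```

Under these assumptions the computed F2 heritabilities agree with the second header row of the table to within rounding.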

of genes than expected by chance. These major hubs of gene regulation are most prominent in yeast (eight) [7,8], followed by mice (approximately seven) [3–5]. Clustering of eQTL was not reported for maize [3]. The locations of the trans-acting eQTL show limited overlap among the three mouse eQTL studies [3–5], which could be due to tissue-specific trans regulation. Although the most significant eQTL are cis-acting, the detection of trans-acting regulatory hubs is plausible if cis-regulation provides more direct (i.e. less variable) genetic control than trans regulation, ensuring that cis-acting effects are larger and more consistent. Alternatively, it could be that the proportion of false positive eQTL is greater among trans-acting effects.
The strong clustering in ‘hubs’ of eQTLs reflects the highly correlated expression levels of many gene transcripts. This is illustrated by a recent simulation study using real expression data from human pedigrees with a simulated SNP map that was independent of the expression levels [12]. As a result, all eQTLs detected were by default false positives. The eQTL analyses showed strong clustering of (trans) eQTLs and the five most populated bins contained 20% of the significant, but spurious, eQTLs [12]. Thus, although both the high correlation of expression levels among gene transcripts and the detection of eQTL hotspots in experimental studies can be interpreted to support the hypothesis of coordinated trans-regulation of multiple genes, a major concern is whether the correlation could be due to some technical or environmental factors that are currently unaccounted for. For example, the clustering of eQTL for multiple traits could simply represent the clustering of spurious QTL for highly correlated traits (i.e. with so many traits we expect to see many false-positive QTL effects, and if traits are highly correlated, for whatever reason, these false-positive QTLs will often locate to the same region). Because of the limited understanding of genetic and physiological control of gene expression and the limited experimental sizes so far, any conclusions with regard to hotspots for gene regulation should be interpreted with caution.

eQTL studies in human cell lines
Although the genetic complexity of most eQTL studies is limited because of the use of inbred resources, two recent studies report eQTL in analyses of cell lines derived from human pedigrees [13,14]. These initial studies both used lymphoblastoid cell lines from the CEPH pedigrees but otherwise have differences at almost every level of execution (Table 1). Many of the differences between the two studies are not unique to genetical genomics: discrepancies

Table 1. A comparison between two eQTL analyses on human CEPH data(a)

CEPH families used
  Morley et al. [14]: 14 (eight in common)
  Monks et al. [13]: 15 (eight in common)
Gene expression: platform
  Morley et al. [14]: Affymetrix genome focus 25-mer oligonucleotide arrays
  Monks et al. [13]: Agilent 60-mer oligonucleotide array
Gene expression: genes on array
  Morley et al. [14]: ~8500
  Monks et al. [13]: 23 499
Gene expression: design and replicates
  Morley et al. [14]: Direct measurement with two array replicates per individual
  Monks et al. [13]: Reference design with at least two arrays per individual
Gene expression: criterion for selecting genes for eQTL analysis
  Morley et al. [14]: Greater variation between individuals than within
  Monks et al. [13]: Differentially expressed in at least half of the offspring
Gene expression: genes taken forward to eQTL analysis
  Morley et al. [14]: 3554
  Monks et al. [13]: 2430
Marker genotypes
  Morley et al. [14]: 2756 autosomal SNP markers from the SNP consortium database
  Monks et al. [13]: 346 autosomal markers, selected from CEPH genotype database
Data availability
  Morley et al. [14]: Genotypes available at http://www.ceph/fr/cephdb; expression data at http://www.ncbi.nlm.nih.gov/geo/ (GEO accession GSE1485)
  Monks et al. [13]: Genotypes available at http://www.ceph/fr/cephdb; expression data at http://www.ncbi.nlm.nih.gov/geo/ (GEO accession GSE1726)
eQTL analyses
  Morley et al. [14]: (i) Sib-pair analyses using S.A.G.E. for whole genome analysis; (ii) QTDT and association study for 17 genes with cis-acting eQTL
  Monks et al. [13]: Variance component analyses using SOLAR for both heritability of transcript level and
Test for hubs of gene regulation
  Morley et al. [14]: 5 MB genome bins, testing for deviation from Poisson distribution
  Monks et al. [13]: At 4 cM (~3.2 MB) intervals comparing number of hits with those obtained by
eQTL results
  Morley et al. [14]: 142 genes with at least one eQTL (P < 4.3 × 10⁻⁷); 984 genes with at least one eQTL (P < 3.7 × 10⁻⁵)
  Monks et al. [13]: 33 genes with at least one eQTL (P < 5.0 × 10⁻⁶); 50 genes with at least one eQTL (P < 5.0 × 10⁻⁵); 135 genes with at least one eQTL
Hubs of gene regulation
  Morley et al. [14]: Two hotspots on chromosomes 14 and 20 affecting seven and six genes, respectively (using P < 4.3 × 10⁻⁷) or 31 and 35 genes, respectively (using P = 3.7 × 10⁻⁵)(b)
  Monks et al. [13]: Six locations with five or six linkage hits on chromosome 6; according to the authors, these are attributable to allelic diversity and non-specificity of gene probes and were therefore dismissed
Other analyses
  Morley et al. [14]: Hierarchical clustering of genes within 5 MB window on chromosome 14; RT–PCR for one gene with a large cis effect
  Monks et al. [13]: Test for enrichment of certain annotations among differentially expressed genes; 574 genes with non-zero heritability, which were subsequently clustered using genetic or phenotypic correlations

(a) Abbreviations: GEO, gene expression omnibus.
(b) A different number of genes are affected by the eQTL, depending on the P value used.
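The hub test listed in Table 1 as "testing for deviation from Poisson distribution" can be illustrated with a simple upper-tail calculation. This is a sketch of the general idea only, not the published analysis: it assumes that eQTL hits fall independently and uniformly across equally sized bins, and the figure of 600 bins (roughly a 3000 MB genome divided into 5 MB bins) is an assumption made for the example, as are the function names.

```python
import math


def poisson_upper_tail(k: int, mean: float) -> float:
    """Return P(X >= k) for X following a Poisson distribution with the given mean."""
    below = sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - below)


def hub_p_value(hits_in_bin: int, total_hits: int, n_bins: int) -> float:
    """Probability of at least `hits_in_bin` hits in one bin if hits are spread at random."""
    expected_per_bin = total_hits / n_bins
    return poisson_upper_tail(hits_in_bin, expected_per_bin)


# Illustrative inputs: 142 hits in total, seven of them in a single bin, 600 bins assumed.
p = hub_p_value(hits_in_bin=7, total_hits=142, n_bins=600)
print(f"P(at least 7 hits in a given bin) = {p:.3g}")
```

The independence assumption is exactly what the discussion of spurious hotspots calls into question: when expression levels are highly correlated, false-positive hits cluster, so a small Poisson tail probability on its own does not establish a biological hub.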

between gene-expression platforms, different statistical methods and protocols are common obstacles when comparing different microarray studies. Although the studies overlap for about half (eight) of the CEPH families studied, they use different genetic marker sets and different methods for expression analysis and eQTL analysis. Furthermore, they use different criteria for including genes in their eQTL analysis and apply different thresholds for QTL detection (Table 1). The results between the two studies are also remarkably different: Morley et al. take ~42% of the genes (n = 3554) on their arrays forward to eQTL analysis, whereas Monks et al. use only ~10% (n = 2430; Table 1). At comparable significance levels (3.7 × 10⁻⁵ and 5.0 × 10⁻⁵, respectively), Morley et al. report eQTL for ~28% of the genes that were taken forward to eQTL analysis compared with ~2% for Monks et al. (Table 1). Figure 1 shows the theoretical power for detection of QTL for the two studies using the two methods of QTL analysis. The QTL methods are briefly explained in Box 2. For the sib-pair analyses, both studies had similar power. The power calculations confirm that variance component methods such as sequential oligogenic linkage analysis routines (SOLAR) are theoretically slightly more powerful than sib-pair analyses, because they use all of the genetic relationships within the pedigree. However, the power difference does not explain the marked difference in numbers of QTL detected by the two studies. The greater number of eQTLs for the Morley et al. study could be due to several factors including: (i) less technical noise in gene-expression measurements, resulting in a larger proportion of the variance attributable to the QTL effect; (ii) environmental conditions that promote greater genetically controlled variation in expression; or (iii) less robust gene-expression measurements or analyses, making the results more prone to bias and false positive results. Given the low power of both studies to detect eQTLs under the stringent thresholds that they apply, the results of Monks et al. are more consistent with prior expectation, unless eQTL effects are much stronger than those of phenotypic QTL. Although the low theoretical power does not explain why Morley et al. detect more QTL than Monks et al., it would explain differences in genes for which eQTL are detected, in addition to discrepancies in finding eQTL in different locations for a particular transcript. When both studies have limited power to detect a given QTL, they will each

[Figure 1 plot: ‘Power of eQTL studies in human pedigrees’. Vertical axis: power to detect eQTL. Horizontal axis: QTL heritability (0 to 0.5). Legend entries include VCA (Morley et al.) and VCA (Monks et al.).]

Figure 1. The statistical power to detect the eQTL of given heritability for the two studies using either a sib-pair analysis or a variance component analysis (VCA). Using sib-pair analyses (red), both studies had similar power; therefore, only a single line is shown. The statistical power is defined as the proportion of analyses in which a QTL with a given effect will be detected under a defined P value (in this case P < 0.0001, which is still less stringent than the proposed genome-wide threshold [17]). The power for the sib-pair analyses was assessed using the genetic power calculator [20]. The power for the VCA (pink and blue) was assessed using routines that were kindly provided by Xijiang Yu (University of Edinburgh) based on Williams and Blangero [21], using the CEPH pedigrees. For all power calculations, the background heritability was assumed to be 0.30. To restrict the pedigree from the original 210 members to the 167 that were used by Monks et al., 43 individuals were randomly deleted from the power calculations. For a brief explanation of QTL methods, see Box 2.

only detect a small proportion of actual eQTL and are hence unlikely to detect the same effects.
Both studies agree that the most significant QTL appear to be cis-acting, but the proportion of cis-acting eQTL is smaller in Morley et al. (~22%) than in Monks et al. (~40%) for the most stringent significance levels. However, although Morley et al. claim support for two trans-acting hubs of regulation on chromosomes 14 and 20, Monks et al. claim ‘lack of evidence for linkage hotspots’, although their permutations show that eQTL are significantly ‘unevenly distributed’. Monks et al. make their statement based on the eQTL with P < 0.000005, whereas Morley et al. use P < 0.000037 (7.4 times larger) to claim the larger hubs. Therefore, the difference in threshold, and the difference in genes that were analysed, could explain this discrepancy.
An interesting aspect of the Morley et al. article is the follow-up analyses on cis-acting QTL: they perform a within-family association test [quantitative transmission disequilibrium test (QTDT); Box 2] with additional SNP markers for 17 transcripts. Furthermore, they re-estimate the magnitude of these QTL effects by a regression analysis on the grandparent data, giving a more realistic estimate of the actual QTL effect. This provides a solution to the problem that when QTLs are initially detected in a study with low power, the effects of those that are detected can be grossly overestimated. This overestimation of QTL effects is apparent in the article by Monks et al., who report genes with two or three eQTLs and even one gene with 15 eQTLs. Subsequently, they claim that ‘all detectable QTL accounted for at least 50% of the trait variance with 75% of the QTL having heritabilities >0.76’. This illustrates the level at which QTL effects are overestimated: it is impossible to have 15 eQTL, each explaining 50% of the trait variance. This phenomenon is not unique to eQTL, but it illustrates the issue particularly well.
Morley et al. confirm one of their cis-acting eQTL by quantitative PCR, which would seem to allay concerns about SNP variation in the probe. However, only a single gene was confirmed; therefore, no general conclusion can be drawn from this result. Monks et al. discuss the potential problem of SNP variation within the probe sequence and subsequently question their own results for the human leukocyte antigen (HLA) area, which harbours substantial sequence variation.
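The overestimation of effects for QTL detected in low-powered studies can be reproduced with a small simulation. The sketch below is an illustration under simplifying assumptions and does not model either human study: genotypes are coded -1/+1 with equal probability (a haploid-like design), the residual SD is 1 and treated as known, and the threshold of z > 3.29 corresponds approximately to a two-sided P of 0.001. All parameter values and names are hypothetical.

```python
import math
import random
import statistics


def simulate_effect_estimates(true_effect, n_individuals, n_replicates,
                              z_threshold, seed=1):
    """Simulate repeated small QTL experiments and collect the effect estimates.

    Returns a pair (all_estimates, significant_estimates). With -1/+1 genotype
    coding and residual SD 1, the estimate has standard error 1 / sqrt(n).
    """
    rng = random.Random(seed)
    standard_error = 1.0 / math.sqrt(n_individuals)
    all_estimates = []
    significant_estimates = []
    for _ in range(n_replicates):
        genotypes = [rng.choice((-1.0, 1.0)) for _ in range(n_individuals)]
        phenotypes = [true_effect * g + rng.gauss(0.0, 1.0) for g in genotypes]
        # Least-squares slope through the origin (the simulated intercept is zero);
        # the sum of squared genotypes equals n for -1/+1 coding.
        estimate = sum(g * y for g, y in zip(genotypes, phenotypes)) / n_individuals
        all_estimates.append(estimate)
        if abs(estimate / standard_error) > z_threshold:
            significant_estimates.append(estimate)
    return all_estimates, significant_estimates


all_est, sig_est = simulate_effect_estimates(
    true_effect=0.2, n_individuals=100, n_replicates=2000, z_threshold=3.29)
print("true effect: 0.2")
print(f"mean estimate over all replicates: {statistics.mean(all_est):.3f}")
if sig_est:
    print(f"significant replicates: {len(sig_est)} of {len(all_est)}")
    print(f"mean estimate among significant replicates: {statistics.mean(sig_est):.3f}")
```

In this setting an estimate can only pass the threshold if its magnitude exceeds about 0.33, so every "detected" effect is larger than the true value of 0.2, even though the estimates are unbiased when averaged over all replicates. Re-estimating the effect in independent data, as done for the cis-acting QTL by Morley et al., avoids this selection effect.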

Box 2. QTL methods used in the eQTL analyses of human data

Sib-pair analysis
Morley et al. [14] applied a sib-pair analysis using the SIBPAL procedure from S.A.G.E. A sib-pair analysis determines evidence for linkage between a marker and a quantitative trait by regressing the phenotypic difference between sibs on the proportion of alleles that are shared identical by descent (IBD) between the sibs.

Variance component QTL analysis
Monks et al. [13] applied a variance component QTL analysis using SOLAR [18]. In a variance component QTL analysis, the proportion of phenotypic variation attributable to a QTL is estimated across a population using the IBD proportions between all related individuals for a putative QTL location.

Quantitative transmission disequilibrium test (QTDT)
Morley et al. [14] used a family-based association test to confirm some of the cis-acting eQTL. Transmission disequilibrium tests (TDT) were initially proposed for studying mendelian disorders and provide a combined test for linkage and association by comparing the transmitted and non-transmitted marker alleles from the parents with those of the affected offspring. The quantitative TDT (QTDT), used by Morley et al., extended this methodology to complex traits where direct classification of offspring is not possible [19].
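The sib-pair regression described in this box can be sketched as an ordinary least-squares fit. The version below follows the classical Haseman–Elston formulation, in which the squared trait difference between two sibs is regressed on their IBD proportion at the marker; this is an assumption about the form of the test and is not a reproduction of the SIBPAL implementation. The function names and the three toy sib pairs are invented for illustration.

```python
def ols_slope_intercept(x, y):
    """Ordinary least-squares slope and intercept for paired observations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def haseman_elston(sib_pairs):
    """Regress squared sib-pair trait differences on IBD sharing.

    `sib_pairs` is a list of (trait_sib1, trait_sib2, ibd_proportion) tuples.
    A negative slope means that sibs sharing more alleles IBD at the marker are
    more alike for the trait, which is the pattern expected under linkage.
    """
    ibd = [pair[2] for pair in sib_pairs]
    squared_difference = [(pair[0] - pair[1]) ** 2 for pair in sib_pairs]
    return ols_slope_intercept(ibd, squared_difference)


# Toy data: (expression level of sib 1, expression level of sib 2, IBD proportion).
toy_pairs = [(0.0, 2.0, 0.0), (1.0, 2.5, 0.5), (3.0, 2.0, 1.0)]
slope, intercept = haseman_elston(toy_pairs)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

A real analysis would use many sib pairs, estimate IBD proportions from marker data and test whether the slope is significantly below zero; only the regression step is shown here.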

Concluding remarks
Both articles present an interesting set of results but appear to share only a limited theoretical power to detect eQTL of small to moderate sizes. A first step to compare both studies would be to analyse each experiment with the methods that were applied in the other study (i.e. re-analyse the data from Morley et al. with SOLAR and the data from Monks et al. with a sib-pair analysis). Given that the pedigree details, genotype and gene-expression data for both studies are available online (Table 1), ongoing exploration of these data sets is expected to shed further light on the differences and similarities between the two studies.
eQTL studies have been successfully linked to variation in disease phenotype in mice [3] and rats [6]. Although the current examples of eQTL mapping in humans lack this important aspect (and motivation) of eQTL mapping, these authors might have paved the way for future eQTL studies that will address the complex nature of human disease.

Acknowledgements
We acknowledge financial support from the BBSRC. We are grateful to the two referees, and to John Gibson, Ritsert Jansen and Rob Williams for constructive comments on an earlier draft of this article. We also thank Ritsert Jansen and Rob Williams for sharing their manuscripts on BXD data.

References
1 Jansen, R.C. and Nap, J.P. (2001) Genetical genomics: the added value from segregation. Trends Genet. 17, 388–391
2 Jansen, R.C. (2003) Studying complex biological systems using multifactorial perturbation. Nat. Rev. Genet. 4, 145–151
3 Schadt, E.E. et al. (2003) Genetics of gene expression surveyed in maize, mouse and man. Nature 422, 297–302
4 Bystrykh, L. et al. (2005) Uncovering regulatory pathways that affect hematopoietic stem cell function using ‘genetical genomics’. Nat. Genet. 37, 225–232
5 Chesler, E.J. et al. (2005) Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function. Nat. Genet. 37, 233–242
6 Hubner, N. et al. (2005) Integrated transcriptional profiling and linkage analysis for identification of genes underlying disease. Nat. Genet. 37, 243–253
7 Brem, R.B. et al. (2002) Genetic dissection of transcriptional regulation in budding yeast. Science 296, 752–755
8 Yvert, G. et al. (2003) Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat. Genet. 35, 57–64
9 Brem, R.B. and Kruglyak, L. (2005) The landscape of genetic complexity across 5700 gene expression traits in yeast. Proc. Natl. Acad. Sci. U. S. A. 102, 1572–1577
10 Tan, P.K. et al. (2003) Evaluation of gene expression measurements from commercial microarray platforms. Nucleic Acids Res. 31, 5676–5684
11 Hughes, T.R. et al. (2001) Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat. Biotechnol. 19, 342–347
12 Perez-Enciso, M. (2004) In silico study of transcriptome genetic variation in outbred populations. Genetics 166, 547–554
13 Monks, S.A. et al. (2004) Genetic inheritance of gene expression in human cell lines. Am. J. Hum. Genet. 75, 1094–1105
14 Morley, M. et al. (2004) Genetic analysis of genome-wide variation in human gene expression. Nature 430, 743–747
15 Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. U. S. A. 100, 9440–9445
16 Manly, K.F. et al. (2004) Genomics, prior probability, and statistical tests of multiple hypotheses. Genome Res. 14, 997–1001
17 Lander, E. and Kruglyak, L. (1995) Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat. Genet. 11, 241–247
18 Almasy, L. and Blangero, J. (1998) Multipoint quantitative-trait linkage analysis in general pedigrees. Am. J. Hum. Genet. 62, 1198–1211
19 Abecasis, G.R. et al. (2000) A general test of association for quantitative traits in nuclear families. Am. J. Hum. Genet. 66, 279–292
20 Purcell, S. et al. (2003) Genetic power calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics 19, 149–150
21 Williams, J.T. and Blangero, J. (1999) Power of variance component linkage analysis to detect quantitative trait loci. Ann. Hum. Genet. 63, 545–563

0168-9525/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.tig.2005.05.004

Genome Analysis

A highly unexpected strong correlation between fixation probability of nonsynonymous mutations and mutation rate

Gerald J. Wyckoff1,4,*, Christine M. Malcom1,2,*, Eric J. Vallender1,3,* and Bruce T. Lahn1
1 Howard Hughes Medical Institute, Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
2 Department of Anthropology, University of Chicago, Chicago, IL 60637, USA
3 Committee on Genetics, University of Chicago, Chicago, IL 60637, USA
4 Department of Molecular Biology and Biochemistry, University of Missouri-Kansas City, Kansas City, MO 64108, USA

Corresponding author: Lahn, B.T.
* These authors contributed equally to this work.

Under prevailing theories, the nonsynonymous-to-synonymous substitution ratio (i.e. Ka/Ks), which measures the fixation probability of nonsynonymous mutations, is correlated with the strength of selection. In this article, we report that Ka/Ks is also strongly correlated with the mutation rate as measured by Ks, and that this correlation appears to have a similar magnitude as the correlation between Ka/Ks and selective strength. This finding cannot be reconciled