[Figure: (A) significance level (-log10) of each SNP by chromosome (1 to 22 and X), with labeled regions including IRGM, IBD5, NKX2-3, PTPN2, 3p21, and 10q21; (B) Q-Q plot of observed value versus expected chi-squared value, showing all SNPs and the confidence interval around the null distribution; (C) the IL23R region on chromosome 1, showing significance level (-log10) and recombination rate (cM/Mb) across the genes C1orf141, IL23R, IL12RB2, and SERBP1, with labels rs11209032 and Arg381Gln.]

the 500,000 SNPs tested across the genome. SNP locations reflect their positions across the 23 human chromosomes. SNPs with significance levels exceeding 10⁻⁵ (corresponding to 5 on the y axis) are colored red; the remaining SNPs are in blue. Ten regions with multiple significant SNPs are shown, labeled by their location or by the likely disease-related gene (e.g., IL23R on chromosome 1). (B) The fact that the SNPs in red are extreme outliers is made clear from a so-called Q-Q plot. A Q-Q plot is made as follows: The SNPs are ordered (from 1 to n) according to their observed P values; observed and expected P values are plotted for each SNP. Under the null distribution, the expected P value for the ith SNP is i/n. If there are no significant associations, the Q-Q plot will lie along the 45° line; the gray region corresponds to a 95% confidence region around this null expectation. Black points correspond to all 500,000 SNPs studied that passed strict quality control; they diverge strongly from the null expectation. Blue points reflect the P values that remain when the SNPs in the 10 most significant regions are removed; there is still some excess of significant P values, indicating the presence of additional loci of more modest effect. (C) Close-up of the region around the IL23R locus on chromosome 1. The first part shows the significance levels for SNPs in a region of ~400 kb, with colors as in (A). The highest significance level occurs at a SNP in the coding region of the IL23R gene (causing an Arg381 → Gln change). The light blue curve shows the inferred local rate of recombination across the region. There are two clear hotspots of recombination, with SNPs lying between these hotspots being strongly correlated in a few haplotypes. The second [...] independent, highly significant disease-associated alleles. The first site is the Arg381 → Gln polymorphism, which has a single disease-associated haplotype (shaded in blue) with frequency of 6.7%. The second site is in the intron between exons 7 and 8; it tags two disease-associated haplotypes with frequencies of 27.5% and 19.2%.
IL23R haplotypes and their frequencies, from panel (C):

Intronic (p = 1.0 × 10⁻¹⁵)     Arg381Gln (p = 6.6 × 10⁻¹⁹)
GGCTTACTGC   .433              GCTAAACGGGAGCCC   .308
AATGTACTTC   .275              TCTAGCTGGAGCCCA   .275
GGCTCCTCTT   .192              GCTAAATGGAGCTAC   .125
GGCTTATTGC   .050              GCTGGACAGAGCTCC   .117
                               GTTAGATGGAGCCCC   .075
                               TCGAGCTGAAACCCC   .067
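
The Q-Q construction described for panel (B), together with the 10⁻⁵ coloring threshold from panel (A), can be sketched in a few lines. This is a minimal illustration rather than the authors' analysis code. The function names `qq_points` and `exceeds_threshold` are hypothetical, and the P values are simulated. Plotting on the -log10 scale is an assumption borrowed from the y axis of panel (A); the figure itself labels the Q-Q axes in chi-squared values.

```python
import math
import random


def qq_points(p_values):
    """Return (expected, observed) pairs on the -log10 scale.

    SNPs are ranked 1..n by observed P value. Under the null
    distribution, the expected P value for the SNP of rank i is i/n.
    """
    n = len(p_values)
    ranked = sorted(p_values)  # smallest (most significant) P value first
    return [
        (-math.log10(i / n), -math.log10(p))
        for i, p in enumerate(ranked, start=1)
    ]


def exceeds_threshold(p, threshold=1e-5):
    """True if the P value is below the threshold (5 on the -log10 axis)."""
    return p < threshold


if __name__ == "__main__":
    random.seed(0)  # simulated data only, not the study's results
    # Null SNPs: P values uniform on (0, 1].
    p_values = [1.0 - random.random() for _ in range(10_000)]
    # A handful of hypothetical strongly associated SNPs.
    p_values += [1e-12] * 5

    points = qq_points(p_values)
    expected, observed = points[0]  # the most significant SNP
    print(f"top SNP: expected {expected:.2f}, observed {observed:.2f}")
    print("SNPs past threshold:", sum(exceeds_threshold(p) for p in p_values))
```

With no true associations the pairs fall close to the line observed = expected. The simulated associated SNPs sit far above it, which is the divergence the caption describes for the black points.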
outcome being that different diseases will each be characterized by a different balance of allele frequencies, interactions, and types. Although the proportion of genetic variance explained is certain to grow in the coming years, it is unlikely to approach 100% because of practical limitations, such as the difficulty of detecting common variants with extremely small effects, genes harboring rare variants at very low frequency, and complex interactions among genes and with the environment.

Disease Risk Versus Disease Mechanism

The primary value of genetic mapping is not risk prediction, but providing novel insights about mechanisms of disease. Knowledge of disease pathways (not limited to the causal genes and mutations) can suggest strategies for prevention, diagnosis, and therapy. From this perspective, the frequency of a genetic variant is not related to the magnitude of its effect, nor to the potential clinical value that may be obtained.

The classic example is Brown and Goldstein’s studies of FH, which affects ~0.2% of the population and accounts for a tiny fraction of the heritability of LDL and myocardial infarction. Studies of FH led