
QTL

22:48
22:48
QTL
QTL

QTL

QTL QTLs
22:48



MASMAI


22:48
QTL
QTL


QTL

QTLQTL
LD
22:48
QTL
LA
Linkage analysis considers only the linkage
disequilibrium that exists within families, which can
extend for tens of cM and is broken down by
recombination after only a few generations.
Typical designs are the backcross (BC) and the F2.


QTL
m d a r
Aa
QTLQq
QTL

$\Pr(Qq \mid Aa) = \dfrac{\Pr(QqAa)}{\Pr(Aa)}$
22:48

QTL

22:48

QTL
QTL
QTL


22:48

Lander and Botstein (1989)
QTL
QTL



22:48
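Haldane map function:
$r = \frac{1}{2}\left(1 - e^{-2M}\right)$
where $M$ is the map distance in Morgans ($1\ \text{M} = 100\ \text{cM}$).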
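A minimal sketch of Haldane's map function and its inverse, assuming distances are given in Morgans. The function names are illustrative only.

```python
import math


def haldane_r(map_distance_morgans: float) -> float:
    """Recombination fraction r for a map distance M in Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * map_distance_morgans))


def haldane_distance(r: float) -> float:
    """Inverse function: map distance in Morgans for 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * r)


# 10 cM corresponds to 0.10 M
r_10cm = haldane_r(0.10)
print(f"r at 10 cM: {r_10cm:.4f}")
```

For short distances r is close to M, and r approaches 0.5 as M grows.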
22:48

Q
T
L



22:48

QTL
QTL


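$y_{ij} = m_j + e_{ij}$
where $y_{ij}$ is the phenotype of individual $i$ with QTL genotype $j$, $m_j$ is the mean of genotype $j$ (written in terms of $m$, $a$ and $d$, e.g. $m + d$), and $e_{ij}$ is the residual.
$e_{ij} \sim N(0, \sigma^2)$
$y_{ij} \sim N(m_j, \sigma^2)$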
22:48


QTL
AB

j
Q
i
i
A
N
i
B
22:48
LRT
QTL
QTL
$LRT = -2\ln\left(L_{reduced} / L_{full}\right)$
$LOD = \log_{10}\left(L_{full} / L_{reduced}\right)$
22:48


QTL




$F = MSQ / RMS$
MSQ: QTL mean square
RMS: residual mean square

22:48
LSML
LS
ML

LS
SASML

22:48
F
QTLLS
ML





QTLLSML

22:48



QTL
QTL

F-ratioLRT
QTL
QTLQTL

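False positives under multiple testing: with $n$ independent tests, each at level $\alpha$, the experiment-wise error rate is
$\alpha_n = 1 - (1 - \alpha)^n$
Conversely, the per-test level that gives an experiment-wise rate $\alpha_n$ is
$\alpha = 1 - (1 - \alpha_n)^{1/n} \approx \alpha_n / n$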
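A small sketch of the multiple-testing arithmetic, assuming n independent tests. The function names and the example numbers are illustrative, not taken from the slides.

```python
def experimentwise_alpha(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive over n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def pointwise_alpha(alpha_exp: float, n_tests: int) -> float:
    """Per-test level that keeps the experiment-wise rate at alpha_exp."""
    return 1.0 - (1.0 - alpha_exp) ** (1.0 / n_tests)


n = 100
print(f"experiment-wise rate at 0.05 per test: {experimentwise_alpha(0.05, n):.3f}")
print(f"per-test level for 0.05 overall:       {pointwise_alpha(0.05, n):.6f}")
print(f"simple approximation alpha/n:          {0.05 / n:.6f}")
```

Markers on the same chromosome are correlated, so treating every test as independent is conservative.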
22:48
Permutation test



QTL
LRT
QTL
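A minimal sketch of a permutation test for a genome-wide threshold. It assumes markers coded 0/1 (as in a backcross) and uses the absolute difference between genotype-class means as a stand-in for the LRT; all function names are hypothetical.

```python
import random


def max_scan_statistic(phenotypes, marker_genotypes):
    """Largest absolute difference in class means over all markers."""
    best = 0.0
    for genotypes in marker_genotypes:
        g0 = [y for y, g in zip(phenotypes, genotypes) if g == 0]
        g1 = [y for y, g in zip(phenotypes, genotypes) if g == 1]
        if g0 and g1:
            best = max(best, abs(sum(g1) / len(g1) - sum(g0) / len(g0)))
    return best


def permutation_threshold(phenotypes, marker_genotypes,
                          n_perm=1000, alpha=0.05, seed=1):
    """Empirical (1 - alpha) quantile of the maximum statistic under H0."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_stats = []
    for _ in range(n_perm):
        # Shuffling phenotypes breaks the marker-trait association
        # while keeping the marker data (and their correlations) intact.
        rng.shuffle(shuffled)
        null_stats.append(max_scan_statistic(shuffled, marker_genotypes))
    null_stats.sort()
    index = min(n_perm - 1, int((1.0 - alpha) * n_perm))
    return null_stats[index]
```

Because the maximum over all markers is recorded in each permutation, the resulting threshold is genome-wide rather than point-wise.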
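FDR (false discovery rate)
Sort the p values of all marker intervals in ascending order, $P_{(1)} \le P_{(2)} \le \dots \le P_{(m)}$, and find the largest $j$ such that
$P_{(j)} \le \dfrac{j}{m}\,\alpha$   (1)
where $\alpha$ is the declared FDR (such as 0.05), $j$ is the largest rank that meets formula (1), and $m$ is the number of markers.
The tests ranked 1 to $j$ are declared significant.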
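The step-up rule in formula (1) can be sketched as follows. This is an illustrative implementation of the standard Benjamini-Hochberg procedure; the function name is hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of the tests declared significant at FDR alpha."""
    m = len(p_values)
    # Indices of the tests, ordered by ascending p value
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_j = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_j = rank  # keep the largest rank meeting formula (1)
    return sorted(order[:largest_j])


p = [0.039, 0.001, 0.216, 0.008]
print(benjamini_hochberg(p, alpha=0.05))
```

Note that every test ranked below the largest qualifying j is declared significant, even if its own p value missed its own threshold.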

22:48
LODLOD drop
support interval
QTLQTL

QTL
QTL
QTL

1

QTL

d
d
2
_
22:48

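- A drop of 3.84 in the LRT gives an approximate 95% confidence interval for the QTL position.
- A 1 LOD drop gives an approximate 97% confidence interval.
- A 2 LOD drop gives an approximate 99.8% confidence interval.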
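The coverage figures quoted for the drop method can be checked with a short sketch, assuming the drop in the LRT behaves like a chi-square variate with 1 degree of freedom (an approximation that is often optimistic in practice). Function names are illustrative.

```python
import math


def lod_to_lrt(lod: float) -> float:
    """LRT = 2 ln(10) * LOD, since LOD uses log10 and the LRT uses 2 ln."""
    return 2.0 * math.log(10.0) * lod


def chi2_1df_cdf(x: float) -> float:
    """CDF of a chi-square variate with 1 degree of freedom."""
    return math.erf(math.sqrt(x / 2.0))


for label, drop in (("LRT drop of 3.84", 3.84),
                    ("1 LOD drop", lod_to_lrt(1.0)),
                    ("2 LOD drop", lod_to_lrt(2.0))):
    print(f"{label}: nominal coverage {chi2_1df_cdf(drop):.3f}")
```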
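Bootstrap
1. Draw n individuals at random, with replacement, from the n observed individuals.
2. Map the QTL in the resampled data and record the estimated position.
3. Repeat steps 1 and 2 200 times.
4. Sort the estimated positions and discard the lowest 2.5% and the highest 2.5%.
5. The remaining range is the 95% confidence interval.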
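A sketch of a percentile bootstrap interval for a QTL position. It assumes the caller supplies `estimate_position`, a hypothetical function that runs the QTL scan on a list of individual records and returns the estimated position in cM.

```python
import random


def bootstrap_position_ci(records, estimate_position,
                          n_boot=200, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for an estimated position.

    records: one entry per individual (phenotype and genotypes, any form).
    estimate_position: user-supplied function, list of records -> position.
    """
    rng = random.Random(seed)
    n = len(records)
    positions = []
    for _ in range(n_boot):
        # n draws with replacement from the n observed individuals
        resample = [records[rng.randrange(n)] for _ in range(n)]
        positions.append(estimate_position(resample))
    positions.sort()
    tail = (1.0 - level) / 2.0
    lower = positions[int(tail * n_boot)]
    upper = positions[min(n_boot - 1, int((1.0 - tail) * n_boot))]
    return lower, upper
```

The estimator is passed in as an argument so that the resampling logic stays independent of the mapping method.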
22:48
QTL
22:48

QTL
Darvasi and
Soller (1997)95%
cM)





n
d a
22:48
Statistical power
22:48

QTL
QTLQTL

QTL
22:48







o
|
22:48
[Figure: sampling distributions P(T) of the test statistic T under H0 and HA; the critical value separates the areas α and β]
Statistical errors
             Rejection of H0            Nonrejection of H0
H0 true      Type I error at rate α     Nonsignificant result
HA true      Significant result,        Type II error at rate β
             POWER = (1 - β)
Impact of alpha
[Figure: P(T) against T, showing the critical value and the areas α and β]
Impact of effect size, N
[Figure: P(T) against T, showing the critical value and the areas α and β]
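How alpha, effect size and N move the power can be illustrated with a one-sided z test. This is a sketch under simplifying assumptions (known residual standard deviation, normal test statistic), not the test used for QTL detection itself; names are illustrative.

```python
from statistics import NormalDist


def power_one_sided_z(effect: float, sigma: float, n: int,
                      alpha: float = 0.05) -> float:
    """Power of a one-sided z test of H0: mean = 0 against mean = effect > 0."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1.0 - alpha)   # critical value under H0
    shift = effect * (n ** 0.5) / sigma          # mean of the statistic under HA
    beta = std_normal.cdf(critical - shift)      # type II error rate
    return 1.0 - beta


for n in (10, 25, 50):
    print(n, round(power_one_sided_z(0.5, 1.0, n), 3))
```

Raising alpha, the effect size or N each shifts more of the HA distribution beyond the critical value, so beta falls and power rises.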
22:48



QTL





22:48


QTL
t-F-
22:48
F
2


BC

$t = \dfrac{a - d}{\sqrt{4\sigma_e^2 / n_{BC}}}$
$n_{BC} = \dfrac{4\,t^2\,\sigma_e^2}{(a - d)^2}$
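A sketch of the backcross sample-size calculation, n = 4 t² σ_e² / contrast², where `contrast` stands for the difference between the two genotype means being tested and `t_value` for the required value of the test statistic. Both parameter names, and the example numbers, are assumptions for illustration.

```python
import math


def bc_sample_size(t_value: float, sigma_e: float, contrast: float) -> int:
    """Backcross sample size n = 4 t^2 sigma_e^2 / contrast^2, rounded up."""
    if contrast == 0:
        raise ValueError("contrast must be non-zero")
    return math.ceil(4.0 * t_value ** 2 * sigma_e ** 2 / contrast ** 2)


# Halving the contrast quadruples the required sample size.
print(bc_sample_size(2.0, 1.0, 1.0), bc_sample_size(2.0, 1.0, 0.5))
```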
22:48
QTL t
:

22:48
d a =
Sample size BC 672 128 42 11 6
22:48
BCF2
BCF2
BC F2
BCF2
2
e
o
22:48


0 > r

0 > r
22:48
QTL



22:48



QTL


22:48
QTL
22:48
Fine Mapping Strategies
Genomewide-based strategies:
- Large-scale BC, F2, half sibs, etc.
- Recombinant inbred lines (RIL)
- Advanced intercross lines (AIL)
Locus-based strategies:
- Selective phenotyping
- Recombinant progeny testing
- Interval-specific congenic strains (ISCS)
- Recombinant inbred segregation test (RIST)
Recombinant inbred lines (RIL)
F2
RIL
clonal Lines
RILF2

RILQTL
RIL
Advanced intercross lines (AIL)
AILF2
RIL
AILF2QTLQTL

AIL

AIL
F2
Advanced intercross lines (AIL)
Semi-random intercrossing: P, F1, F2, F3, ..., Ft
$CI = CI_{F2} / (t/2)$
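The narrowing of the confidence interval with generation number can be sketched directly from the formula CI = CI_F2 / (t/2). The function name and the example values are illustrative.

```python
def ail_confidence_interval(ci_f2_cm: float, generation_t: int) -> float:
    """Confidence interval (cM) in an AIL at generation Ft."""
    if generation_t < 2:
        raise ValueError("t must be at least 2 (t = 2 is the F2 itself)")
    return ci_f2_cm / (generation_t / 2.0)


# A 20 cM interval in the F2 shrinks as intercrossing continues.
for t in (2, 4, 10):
    print(t, ail_confidence_interval(20.0, t))
```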
22:48
AIL
AILF2

AILQTLQTL15cM
/ 2 t
Locus-based strategies:
Selective phenotyping (SPh)
Theoretical basis: only recombinants increase
mapping accuracy for a detected QTL.
Procedure: a large F2 or BC population is genotyped;
only individuals recombinant at a QTL-containing
interval are subsequently phenotyped.
- Requires only 2 generations.
- Requires very large samples.
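Picking the individuals to phenotype can be sketched as a filter on the flanking markers. The example assumes a backcross with genotypes coded "H" (heterozygous) or "A" (homozygous); marker names, codes and the function name are hypothetical.

```python
def recombinants_in_interval(genotypes, left_marker, right_marker):
    """IDs of backcross individuals whose flanking-marker genotypes differ.

    genotypes: dict of individual ID -> dict of marker name -> genotype code.
    A difference between the two flanking markers indicates an odd number
    of crossovers inside the interval.
    """
    selected = []
    for individual, markers in genotypes.items():
        left = markers.get(left_marker)
        right = markers.get(right_marker)
        if left is not None and right is not None and left != right:
            selected.append(individual)
    return selected


population = {
    "id1": {"M1": "H", "M2": "H"},
    "id2": {"M1": "H", "M2": "A"},   # recombinant
    "id3": {"M1": "A", "M2": "A"},
    "id4": {"M1": "A", "M2": "H"},   # recombinant
    "id5": {"M1": "H"},              # M2 missing, cannot be classified
}
print(recombinants_in_interval(population, "M1", "M2"))
```

Only the returned individuals would go on to phenotyping, which is why a very large starting population is needed.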
SPh - Experimental results
[Figure: LOD score (0 to 6) against map position (0 to 80 cM) for lesions density, comparing BC with SPh-BC. Paigen et al.]
Recombinant progeny testing
QTL
Males recombinant at an interval of interest are progeny
tested to check which QTL allele was retained.

- Requires only 3 generations. Efficient for dominant effects.
- Requires large samples.
Interval-specific congenic strains (ISCS)
QTL
ISCS are produced by a series of backcrosses and intercrosses.
- Requires very few individuals. Useful for further studies.
- Complicated and lengthy development process.
Recombinant inbred segregation test (RIST)
[Diagram: the RI line is crossed to each parent (P1 x RI and RI x P2), giving F1,1 and F1,2, which are advanced to F2,1 and F2,2]
Each selected RIL is backcrossed to each parent, and the
BC1 is then selfed and grown out for phenotyping and genotyping in
the QTL region. Because the QTL was previously mapped to
this region, the BC to one of the parents will segregate while
the other will not, indicating whether the gene controlling
the QTL is above or below the breakpoint. The overlapping
results of the various RILs will narrow the QTL interval.

- Requires only 2 generations. Few individuals required.
- Requires RILs with recombinations in the region of interest.
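How the overlapping RIST results narrow the interval can be sketched as an intersection of half-intervals. The labelling is an assumption made for the example: "above" means the QTL lies at a smaller map position than the breakpoint, "below" at a larger one.

```python
def narrow_qtl_interval(start_cm, end_cm, rist_results):
    """Intersect the initial QTL interval with the side indicated by each RIL.

    rist_results: iterable of (breakpoint_cm, side) pairs, side being
    "above" or "below" the recombination breakpoint of that RIL.
    """
    for breakpoint_cm, side in rist_results:
        if side == "above":
            end_cm = min(end_cm, breakpoint_cm)
        elif side == "below":
            start_cm = max(start_cm, breakpoint_cm)
        else:
            raise ValueError(f"unknown side: {side!r}")
    if start_cm > end_cm:
        raise ValueError("RIST results are mutually inconsistent")
    return start_cm, end_cm


# Three RILs with breakpoints at 30, 18 and 35 cM (made-up values)
print(narrow_qtl_interval(10.0, 40.0, [(30.0, "above"), (18.0, "below"), (35.0, "above")]))
```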
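RIST - Experimental results
[Figure: obesity QTL tested with the RI strain AKXL-16 and the parental strains C57L and AKR; markers D2MIT64 and D2MIT200; the two F2 populations (F2,1 and F2,2) gave P = 0.41 and P = 0.02. B. Taylor, A. Darvasi]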

QTL
22:48






QTL


22:48
QTL

QTL
QTL




22:48

22:48
F2

F1F2
F2
F2QTLQQQqqQqq




22:48

22:48

QTL



22:48






t

BC
22:48






BC
HS

QTL
t

22:48

HS





EM

22:48


QTLMm
22:48

ANOVA








22:48
Granddaughter design (GDD)
Weller et al. (1990)
QTL



QTL
GDD

GDDAI
Daughter yield deviations (DYD)
ANOVA
DYD
The non-centrality parameter (NCP) for the daughter design:


The NCP for the granddaughter design:


Once the NCP is calculated, power is derived as the
probability that a non-central variate exceeds the
threshold from the corresponding central distribution.
The GDD is generally much more powerful than a
daughter design.
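Turning an NCP into power can be sketched as below. The example assumes the test statistic is chi-square with 1 degree of freedom, which is an assumption of this sketch; the NCP values themselves would come from the design formulas. The function name is illustrative.

```python
from statistics import NormalDist


def power_from_ncp(ncp: float, alpha: float = 0.05) -> float:
    """P(non-central chi-square, 1 df, exceeds the central 1 df threshold)."""
    z = NormalDist()
    # Square root of the central chi-square threshold at level alpha
    threshold_root = z.inv_cdf(1.0 - alpha / 2.0)
    shift = ncp ** 0.5
    # A non-central chi-square (1 df) is the square of a N(sqrt(ncp), 1) variate
    return (1.0 - z.cdf(threshold_root - shift)) + z.cdf(-threshold_root - shift)


for ncp in (0.0, 5.0, 10.0):
    print(ncp, round(power_from_ncp(ncp), 3))
```

With an NCP of zero the power equals alpha, and it rises towards 1 as the NCP grows.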

22:48


QTL






22:48



QTL


QTL

QTL
22:48

An example of a linear mixed model for a single
QTL analysis is:

Variance components can be estimated using maximum
likelihood or restricted maximum likelihood (REML). The
log-likelihood function is:

The assumed mean and variance structure of the
observations:
Q is the IBD matrix:

The distribution of the test statistic is, asymptotically,
a mixture of zero (with probability ½) and a $\chi^2$ with 1
degree of freedom (also with probability ½).
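Under this mixture, the p value is half the usual chi-square tail probability. A minimal sketch, with an illustrative function name:

```python
import math


def mixture_p_value(lrt: float) -> float:
    """p value under a 50:50 mixture of a point mass at 0 and chi-square(1)."""
    if lrt <= 0.0:
        return 1.0
    chi2_1_tail = math.erfc(math.sqrt(lrt / 2.0))  # P(chi-square_1 > lrt)
    return 0.5 * chi2_1_tail


# The 5% threshold drops from 3.84 to about 2.71 under the mixture.
print(round(mixture_p_value(2.706), 4), round(mixture_p_value(3.84), 4))
```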
The advantage of this likelihood-based approach:
The full maximum likelihood approach
simultaneously estimates the IBD probabilities
and the variance components, in a combined
segregation analysis and linkage analysis
framework.
- distribution method
- expectation method
So why is QTL mapping in general pedigrees not
used more frequently, in particular in large, deep
pedigrees?
- IBD estimation in large pedigrees is difficult.
- (User-friendly) software for the variance component estimation part of the analysis is unavailable.
- Budgets are finite.
- DNA samples from most ancestors are unavailable.
IBD
Perfect marker
As in the case of sib pairs, IBD sharing at a
fully informative marker is straightforward to determine,
because we can simply count the number of
alleles that two relatives share by descent.
At a location linked to a perfect marker, IBD
probabilities can be calculated from the
observed IBD probability at the marker, the
average relationship between the individuals, and
the recombination rate between the marker and
the putative QTL position.
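For full sibs, the calculation at a linked position can be sketched with the standard regression of IBD sharing towards its expectation of 0.5, weighted by (1 - 2r)². The restriction to full sibs and the function name are assumptions of this example; other relationships need their own expressions.

```python
def expected_ibd_full_sibs(pi_marker: float, r: float) -> float:
    """Expected IBD proportion at a position linked to a perfect marker.

    pi_marker: proportion of alleles shared IBD at the marker (0, 0.5 or 1).
    r: recombination rate between the marker and the putative QTL position.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("r must lie in [0, 0.5]")
    average_sharing = 0.5  # expected IBD proportion for full sibs
    return average_sharing + (1.0 - 2.0 * r) ** 2 * (pi_marker - average_sharing)


for r in (0.0, 0.1, 0.5):
    print(r, expected_ibd_full_sibs(1.0, r))
```

At r = 0 the marker value is returned unchanged, and at r = 0.5 the marker carries no information, so the expectation falls back to the average relationship.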
The general case: missing data
and non-informative markers
The marker information in complex pedigrees is
often incomplete.
Unknown linkage phases, non-informative
markers and/or missing marker genotypes
complicate the calculation of Q.
The calculation methods for Q are:
- recursive algorithms
- correlation-based algorithms
- simulation-based algorithms
Implementation in Loki
The multiple-site segregation sampler in Loki is
a cleverly designed Gibbs sampler with batch
updating.

$P(S_{i1}, \ldots, S_{in} \mid S_{-i}, M)$ is the probability of the
segregation indicators across n loci at the ith
segregation, conditional on all other
segregation indicators ($S_{-i}$) and the observed marker
data ($M$).
A two-step strategy to sample
The first step involves moving through the
genome, calculating, locus by locus, cumulative
probabilities for $S_{ij}$.
The second step involves moving back down the
genome, sampling $S_{ij}$ from a univariate density
that is a function of the associated cumulative
probability, the previously sampled segregation
indicator ($S_{i,j+1}$) and the recombination rate
between loci j and j+1.
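The two passes can be illustrated with a generic forward-filter, backward-sample routine for one meiosis with binary segregation indicators. This is a sketch of the general idea, not Loki's actual code: the per-locus likelihoods are hypothetical inputs, and the "cumulative probabilities" are interpreted here as filtered probabilities given the data up to each locus.

```python
import random


def sample_segregation_indicators(likelihoods, recomb, rng=None):
    """Sample indicators S_1..S_n jointly along one meiosis.

    likelihoods: list of (L0, L1), likelihood of the marker data at each
        locus given indicator 0 or 1.
    recomb: recomb[j] is the recombination rate between loci j and j+1.
    """
    rng = rng or random.Random()
    n = len(likelihoods)
    if n == 0:
        return []
    # Step 1: move through the genome, accumulating P(S_j | data at loci 0..j)
    forward = []
    prior = (0.5, 0.5)
    for j in range(n):
        w0 = prior[0] * likelihoods[j][0]
        w1 = prior[1] * likelihoods[j][1]
        total = w0 + w1
        f = (w0 / total, w1 / total)
        forward.append(f)
        if j < n - 1:
            r = recomb[j]
            prior = (f[0] * (1 - r) + f[1] * r, f[0] * r + f[1] * (1 - r))
    # Step 2: move back down, sampling S_j given the sampled S_{j+1}
    sample = [0] * n
    sample[-1] = 1 if rng.random() < forward[-1][1] else 0
    for j in range(n - 2, -1, -1):
        r = recomb[j]
        nxt = sample[j + 1]
        w0 = forward[j][0] * ((1 - r) if nxt == 0 else r)
        w1 = forward[j][1] * ((1 - r) if nxt == 1 else r)
        sample[j] = 1 if rng.random() < w1 / (w0 + w1) else 0
    return sample
```

Sampling all loci of a meiosis in one block, rather than one locus at a time, is what lets the sampler move between configurations that tightly linked markers would otherwise lock in place.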
Introduction to Loki
Loki was originally designed for multipoint linkage analysis
in general pedigrees using MCMC methods.
It has since been modified for IBD probability
calculation.
The user supplies Loki with the pedigree structure, marker
genotypes, marker positions and the QTL positions for which
the IBD matrices are to be calculated.
Dependent chains of IBD probabilities are then obtained
for each QTL position.
Convergence is assessed by monitoring the IBD
probabilities over the iterations.
Once the probabilities stabilise, the sampler is deemed to
have reached convergence.
Variance component estimation
Once the IBD probabilities have been calculated, there
are two difficulties in estimating variance
components by ML (REML).
Firstly, the IBD matrix is a completely general
symmetric matrix and does not have an
obvious inverse.
Secondly, the IBD matrix is likely to be
singular.
Why are the IBD matrices often singular?
The reason is that two relatives can share 0
or 100% of their alleles IBD, which can cause a
dependency in the matrix of IBD probabilities.
The genotypes of the parents are M1M2 and M3M4.
If the progeny have genotypes M1M3 and M2M4 (a),
or M1M3 and M1M3 (b), then the resulting IBD
matrix for the two progeny is:
(a) $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$   (b) $\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$
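The dependency can be checked numerically for the two-progeny example. The matrices below assume the IBD proportion between the sibs is 0 in case (a) and 1 in case (b), with 1 on the diagonal; the helper name is illustrative.

```python
def det2(matrix):
    """Determinant of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = matrix
    return a * d - b * c


q_a = [[1.0, 0.0], [0.0, 1.0]]  # progeny M1M3 and M2M4: no alleles IBD
q_b = [[1.0, 1.0], [1.0, 1.0]]  # progeny M1M3 and M1M3: all alleles IBD

print(det2(q_a), det2(q_b))
```

In case (b) the two rows are identical, so the determinant is zero and the matrix has no inverse.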
If the maximisation algorithm is based upon the
complete matrix $V$ (or $V^{-1}$), then there should not
be a problem.
If the maximisation is based upon an algorithm that
requires $Q^{-1}$, then using genomic positions which
are slightly distant from the markers will give a
positive-definite $Q$.
Implementation example
Visscher et al. (1999) used the combination of an MCMC
sampling approach and REML variance component
estimation to map a QTL for bipolar disorder (manic
depression) in a human pedigree.
The pedigree size was 168, over 4 generations, and 143
individuals had a phenotypic score.
The incidence of major recurrent depression (unipolar
disorder) and bipolar disorder was 17/143 and 11/143.
A small segment of chromosome 4 was considered
because this region had previously shown linkage to
bipolar disorder using a parametric linkage analysis, and
11 microsatellite markers were scored spanning 26 cM.
IBD probabilities were estimated using Loki, using
10,000 samples.
REML was used to estimate 81 variance components,
with an algorithm based upon the complete (co)variance
matrix V, to avoid the problem of singular IBD matrices.
LD (
22:48
What is LD?
Linkage disequilibrium is a measure of
association between alleles at different loci.
Suppose we have two bi-allelic loci, A and B,
with allele frequencies $p_{A1}$ and $p_{A2}$, and $p_{B1}$ and
$p_{B2}$, respectively.

LE: $p_{A1B1} = p_{A1}\,p_{B1}$

LD: $p_{A1B1} \neq p_{A1}\,p_{B1}$
Measures of LD for bi-allelic markers
1. Falconer and Mackay (1996) and Lynch and Walsh
(1998), for bi-allelic loci:
$D = p_{A1B1} - p_{A1}\,p_{B1}$
2. Another measure of LD is:
$D' = D / D_{max}$
where $D_{max}$ is, when $D > 0$, the smaller of $p_{A1}p_{B2}$ and $p_{A2}p_{B1}$ and, when $D < 0$, the smaller of $p_{A1}p_{B1}$ and $p_{A2}p_{B2}$.
$D'$ ranges from -1 to +1, whereas $|D'|$
ranges from 0 to 1.
Whenever one of the four haplotype
frequencies is zero, $|D'| = 1$.
3. For bi-allelic markers, another useful measure
is (Hill and Robertson, 1968):
$r^2 = \dfrac{D^2}{p_{A1}\,p_{A2}\,p_{B1}\,p_{B2}}$
$N r^2$ is the $\chi^2$ test statistic for independence
as calculated from a 2x2 contingency table.
A statistical test of LD using the $r^2$ statistic is
therefore straightforward.
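The three bi-allelic measures can be computed together from the four haplotype frequencies. This is a sketch using the usual definitions D = p(A1B1) - p(A1)p(B1), D' = D / D_max and r² = D² / (p(A1)p(A2)p(B1)p(B2)); the function and key names are illustrative, and `n_haplotypes` is assumed to be the number of haplotypes sampled.

```python
def ld_measures(p_a1b1, p_a1b2, p_a2b1, p_a2b2, n_haplotypes=None):
    """Return D, D', r^2 and (optionally) N r^2 for two bi-allelic loci."""
    p_a1, p_a2 = p_a1b1 + p_a1b2, p_a2b1 + p_a2b2
    p_b1, p_b2 = p_a1b1 + p_a2b1, p_a1b2 + p_a2b2
    d = p_a1b1 - p_a1 * p_b1
    if d > 0:
        d_max = min(p_a1 * p_b2, p_a2 * p_b1)
    else:
        d_max = min(p_a1 * p_b1, p_a2 * p_b2)
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a1 * p_a2 * p_b1 * p_b2
    r2 = d * d / denom if denom > 0 else 0.0
    result = {"D": d, "D_prime": d_prime, "r2": r2}
    if n_haplotypes is not None:
        result["chi2"] = n_haplotypes * r2  # compare with chi-square, 1 df
    return result


print(ld_measures(0.4, 0.1, 0.1, 0.4, n_haplotypes=200))
```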
Measures of LD for multi-allelic markers
Hedrick, 1987:
$D' = \sum_{i=1}^{k} \sum_{j=1}^{l} p_{Ai}\,p_{Bj}\,|D'_{ij}|$, with $D'_{ij} = D_{ij} / D_{ij}^{max}$ and $D_{ij} = p_{AiBj} - p_{Ai}\,p_{Bj}$
- $k$ and $l$ are the numbers of alleles at loci A and B.
- $p_{Ai}$ and $p_{Bj}$ are the population frequencies of allele i at locus A and allele j at locus B.
- $|D'_{ij}|$ is the absolute value of the normalised measure.
- $p_{AiBj}$ is the estimated population frequency of the haplotype $A_iB_j$.
- $D_{ij}^{max}$ is the maximum amount of disequilibrium possible between allele i at locus A and allele j at locus B.
The corresponding multi-allelic measure of the squared
correlation is:


22:48
Linkage disequilibrium vs.
gametic phase disequilibrium
The term linkage disequilibrium appears to imply that the
loci have to be linked.
However, this is not the case, because an association
between alleles can exist even if the loci are unlinked, for example through:
- the mixing of two populations with unequal allele frequencies
- non-random mating
- the case of an F1 population
- selection
A better term for LD is gametic phase disequilibrium,
which is used in textbooks such as Falconer and Mackay
(1996) and Lynch and Walsh (1998).
D or $r^2$?
Hedrick (1987) stated that a good measure of
disequilibrium should have the following
properties:
- A simple biological interpretation.
- Statistical tests should be possible.
- Be directly related mathematically to evolutionary factors such as recombination, selection, genetic drift, gene flow, etc.
- Be standardised to allow comparisons across loci or populations.
Dynamics of LD
There are a number of evolutionary forces that
create LD, including mutation, admixture
(crossbreeding), genetic drift, inbreeding, founder
effects and selection.
The main force that destroys LD is recombination.
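The erosion of LD by recombination can be sketched with the textbook decay D_t = D_0 (1 - r)^t, which assumes random mating in a large population with no new LD being created by drift, selection or admixture. The function name and numbers are illustrative.

```python
def ld_after_generations(d0: float, r: float, t: int) -> float:
    """Expected disequilibrium after t generations of random mating."""
    return d0 * (1.0 - r) ** t


# Unlinked loci (r = 0.5) lose half of their LD each generation,
# whereas tightly linked loci (r = 0.01) keep most of it for a long time.
for r in (0.5, 0.1, 0.01):
    print(r, round(ld_after_generations(0.25, r, 10), 5))
```

This is why LD that is still present across a whole population after many generations points to close linkage.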
LD mapping
LD mapping requires a marker to be in LD
with a QTL across the entire population.
To be a property of the whole population,
the association must have persisted for a
considerable number of generations, so
the marker(s) and QTL must be
closely linked.
The difference between linkage and LD analysis:
- Linkage analysis uses LD within families, whereas LD analysis uses LD in the whole population.
- In linkage studies, information is observed on alleles shared by descent (IBD), whereas in LD mapping studies, in the absence of known pedigree information, we can only observe alleles shared by state (IBS).
- For linkage analysis we have observed recombination events and realised genomic relationships between individuals in the pedigree, whereas for LD analysis the recombinations occurred in the recent or distant past and we are trying to infer them from data.
Genome-wide association tests using
single marker regression
Suitable for a random-mating
population with no population structure.
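A single marker regression can be sketched as an ordinary least-squares fit of the phenotype on the number of copies of one marker allele (0, 1 or 2). The coding and the function name are assumptions of this example, and no correction for population structure is made.

```python
from statistics import mean


def single_marker_regression(phenotypes, allele_counts):
    """OLS of phenotype on allele count; returns (intercept, slope, t)."""
    n = len(phenotypes)
    x_bar, y_bar = mean(allele_counts), mean(phenotypes)
    sxx = sum((x - x_bar) ** 2 for x in allele_counts)
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(allele_counts, phenotypes))
    slope = sxy / sxx                      # estimated allele effect
    intercept = y_bar - slope * x_bar
    rss = sum((y - intercept - slope * x) ** 2
              for x, y in zip(allele_counts, phenotypes))
    se = (rss / (n - 2) / sxx) ** 0.5      # standard error of the slope
    t_stat = slope / se if se > 0 else float("inf")
    return intercept, slope, t_stat


x = [0, 1, 2, 0, 1, 2]
y = [1.0, 2.1, 2.9, 0.9, 2.0, 3.1]
print(single_marker_regression(y, x))
```

In a genome scan the same fit is repeated for every marker, so the resulting tests need a multiple-testing threshold.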
Single marker regression accounting for
population structure
Genome-wide association using haplotypes
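IBD LD mapping
$$
\begin{bmatrix} \hat{\mu} \\ \hat{u} \\ \hat{g} \end{bmatrix}
=
\begin{bmatrix}
1_n' 1_n & 1_n' Z & 1_n' W \\
Z' 1_n & Z'Z + A^{-1}\lambda_1 & Z'W \\
W' 1_n & W'Z & W'W + G^{-1}\lambda_2
\end{bmatrix}^{-1}
\begin{bmatrix} 1_n' y \\ Z' y \\ W' y \end{bmatrix}
$$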
Combined LD-LA mapping
Authors investigating the extent of LD in
both cattle and sheep were somewhat
surprised/alarmed to find not only was LD
highly variable across any particular
chromosome, but there was even
significant LD between markers which
were not even on the same chromosome!
Combining method
If the common ancestor occurs within the known
pedigree, then the IBD probability can be calculated
from the markers by linkage analysis (LA).
If the common ancestor is outside the known
pedigree, it is a source of LD. In this case the
probability that the QTL alleles are IBD is
calculated from the similarity between the
marker haplotypes, i.e., which marker alleles
the two haplotypes have in common.