Change in DNA Sequences
The evolutionary process is a series of gene substitutions in which new alleles, each arising as a mutation in a single individual, progressively increase in frequency and ultimately become fixed in the population.
We may look at the process from a
different point of view.
If we use a time scale in which one time unit is
larger than the time of fixation, then the DNA
sequence at any given locus will appear to
change with time.
actgggggtaaactatcggtatagatcataa
actgggggttaactatcggtatagatcataa
actgggggttaactatcggtatagatcataa
actgggggttaactatcggtatagatcataa
actgggggtgaactatcggtatagatcataa
actgggggtgaactatcggtacagatcataa
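As a small illustration, the sketch below compares each snapshot in the series above with the previous one and reports the sites that changed. Positions are 0-based, and the helper name `changed_sites` is ours, not part of the original material.

```python
# The six snapshots of the same locus shown above, in time order.
SNAPSHOTS = [
    "actgggggtaaactatcggtatagatcataa",
    "actgggggttaactatcggtatagatcataa",
    "actgggggttaactatcggtatagatcataa",
    "actgggggttaactatcggtatagatcataa",
    "actgggggtgaactatcggtatagatcataa",
    "actgggggtgaactatcggtacagatcataa",
]


def changed_sites(before: str, after: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, old base, new base) for every differing site."""
    if len(before) != len(after):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(before, after)) if a != b]


# Compare each snapshot with the one that precedes it.
for step, (old, new) in enumerate(zip(SNAPSHOTS, SNAPSHOTS[1:]), start=1):
    print(step, changed_sites(old, new))
```

Only three of the five intervals show a change, and site 9 changes twice (a to t, then t to g), which is why counting observed differences can underestimate the number of substitutions that actually occurred.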
To study the dynamics of
nucleotide substitution,
we must make several
assumptions regarding the
probability of substitution
of one nucleotide by
another.
Jukes & Cantor’s
one-parameter model
Assumption:
Substitutions occur with equal probabilities
among the four nucleotide types.
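Under this assumption, one time unit of the process can be written as a 4 × 4 matrix in which every off-diagonal entry is the same per-unit-time probability α (the symbol used in the derivation that follows) and each diagonal entry is 1 − 3α. The sketch below builds that matrix; the function name and the base order are illustrative assumptions.

```python
BASES = "ACGT"  # row/column order of the matrix (an arbitrary choice)


def jukes_cantor_step(alpha: float) -> list[list[float]]:
    """One-time-unit substitution matrix: entry [i][j] = Prob(BASES[i] -> BASES[j])."""
    if not 0 <= alpha <= 1 / 3:
        raise ValueError("alpha must lie in [0, 1/3] so that 1 - 3*alpha >= 0")
    # Every change to a different base has probability alpha; staying put has 1 - 3*alpha.
    return [
        [1 - 3 * alpha if i == j else alpha for j in range(4)]
        for i in range(4)
    ]


m = jukes_cantor_step(0.01)
print(m[BASES.index("A")])  # row for a site that currently holds A
```

Each row sums to 1, which is the matrix form of "the site holds exactly one nucleotide after each time unit".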
If the nucleotide residing at a
certain site in a DNA
sequence is A at time 0,
what is the probability, PA(t),
that this site will be occupied
by A at time t?
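Since we start with A, $P_{A(0)} = 1$. At time 1, the probability of still having A at this site is
$$P_{A(1)} = 1 - 3\alpha$$
where $\alpha$ is the probability per unit time that A is replaced by any one specific other nucleotide, so $3\alpha$ is the probability of A changing to T, C, or G, and $1 - 3\alpha$ is the probability that A has remained unchanged.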
To derive the probability of having A
at time 2, we consider two possible
scenarios:
1. The nucleotide has remained
unchanged from time 0 to time 2.
2. The nucleotide has changed to
T, C, or G at time 1, but has
subsequently reverted to A at
time 2.
$$P_{A(2)} = (1 - 3\alpha)P_{A(1)} + \alpha\left[1 - P_{A(1)}\right]$$
The following equation applies to any $t$ and any $t + 1$:
$$P_{A(t+1)} = (1 - 3\alpha)P_{A(t)} + \alpha\left[1 - P_{A(t)}\right]$$
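The recurrence can be iterated directly. The sketch below does so for an assumed value of α and checks the result against $P_{A(t)} = \frac{1}{4} + \left(P_{A(0)} - \frac{1}{4}\right)(1 - 4\alpha)^t$, the exact solution of this discrete recurrence (it follows from rewriting the recurrence as $P_{A(t+1)} - \frac{1}{4} = (1 - 4\alpha)\left(P_{A(t)} - \frac{1}{4}\right)$). That closed form and the function names are our additions, not part of the original slides.

```python
def iterate_recurrence(p0: float, alpha: float, t: int) -> float:
    """Apply P(t+1) = (1 - 3a) P(t) + a (1 - P(t)) t times, starting from p0."""
    p = p0
    for _ in range(t):
        p = (1 - 3 * alpha) * p + alpha * (1 - p)
    return p


def discrete_solution(p0: float, alpha: float, t: int) -> float:
    """Exact closed form of the same discrete recurrence."""
    return 0.25 + (p0 - 0.25) * (1 - 4 * alpha) ** t


alpha = 0.01  # assumed substitution probability per unit time
# Time 2: stayed A, (1 - 3a) * P(1), plus changed and reverted, a * (1 - P(1)).
p1 = 1 - 3 * alpha
print(iterate_recurrence(1.0, alpha, 2), (1 - 3 * alpha) * p1 + alpha * (1 - p1))
print(iterate_recurrence(1.0, alpha, 50), discrete_solution(1.0, alpha, 50))
```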
We can rewrite the equation in terms of the amount of change in $P_{A(t)}$ per unit time as:
$$\Delta P_{A(t)} = P_{A(t+1)} - P_{A(t)} = -3\alpha P_{A(t)} + \alpha\left[1 - P_{A(t)}\right] = -4\alpha P_{A(t)} + \alpha$$
We approximate the discrete-time process by a continuous-time model, by regarding $\Delta P_{A(t)}$ as the rate of change at time $t$:
$$\frac{dP_{A(t)}}{dt} = -4\alpha P_{A(t)} + \alpha$$
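A quick numerical check of the continuous approximation: the sketch below integrates $dP/dt = \alpha - 4\alpha P$ with small forward-Euler steps and compares the result with $P_{A(t)} = \frac{1}{4} + \left(P_{A(0)} - \frac{1}{4}\right)e^{-4\alpha t}$, the closed-form solution of this equation. The rate, time span, and step size are arbitrary assumptions.

```python
import math


def euler_integrate(p0: float, alpha: float, t_end: float, dt: float = 1e-3) -> float:
    """Integrate dP/dt = alpha - 4 * alpha * P from 0 to t_end with forward Euler."""
    p = p0
    for _ in range(round(t_end / dt)):
        p += dt * (alpha - 4 * alpha * p)
    return p


def closed_form(p0: float, alpha: float, t: float) -> float:
    """Analytical solution of the same differential equation."""
    return 0.25 + (p0 - 0.25) * math.exp(-4 * alpha * t)


alpha, t = 0.01, 30.0
print(euler_integrate(1.0, alpha, t), closed_form(1.0, alpha, t))
```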
The solution is:
$$P_{A(t)} = \frac{1}{4} + \left(P_{A(0)} - \frac{1}{4}\right)e^{-4\alpha t}$$
If we start with A, the probability that
the site has A at time 0 is 1. Thus, PA(0)
= 1, and consequently,
$$P_{A(t)} = \frac{1}{4} + \frac{3}{4}e^{-4\alpha t}$$
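$$P_{A(t)} = \frac{1}{4} + \left(P_{A(0)} - \frac{1}{4}\right)e^{-4\alpha t}$$
If $P_{A(0)} = 1$ (the site starts as A): $P_{A(t)} = \frac{1}{4} + \frac{3}{4}e^{-4\alpha t}$
If $P_{A(0)} = 0$ (the site starts as T, C, or G): $P_{A(t)} = \frac{1}{4} - \frac{1}{4}e^{-4\alpha t}$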
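The two special cases can be wrapped in small functions. The sketch below (function names are ours) checks two consequences of the formulas: the probability of still holding A plus the probabilities of holding each of the three other bases sums to 1 at every $t$ (by symmetry, the $P_{A(0)} = 0$ case also gives the probability of ending in any one specific other base), and both cases approach 1/4 as $t$ grows.

```python
import math


def p_same(alpha: float, t: float) -> float:
    """P_A(t) when the site starts as A (P_A(0) = 1)."""
    return 0.25 + 0.75 * math.exp(-4 * alpha * t)


def p_other(alpha: float, t: float) -> float:
    """P_A(t) when the site starts as another base (P_A(0) = 0); by symmetry,
    also the probability of ending in one specific other base when starting from A."""
    return 0.25 - 0.25 * math.exp(-4 * alpha * t)


alpha = 0.01  # assumed rate
for t in (0, 10, 100, 1000):
    total = p_same(alpha, t) + 3 * p_other(alpha, t)
    print(t, p_same(alpha, t), p_other(alpha, t), total)
```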
So far, we treated PA(t) as a probability.
NUMBER OF NUCLEOTIDE
SUBSTITUTIONS BETWEEN
TWO DNA SEQUENCES
After two nucleotide sequences diverge from
each other, each of them will start accumulating
nucleotide substitutions.
Number of substitutions
between two noncoding
(NOT protein coding)
sequences
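The one-parameter model:
$$p = \frac{3}{4}\left(1 - e^{-8\alpha t}\right)$$
where $p$ is the proportion of sites at which the two sequences differ and $t$ is the time since their divergence.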
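The formula can be checked by simulation. The sketch below evolves two copies of a random ancestral sequence independently for $t$ time units under the discrete Jukes-Cantor scheme (each site changes to each of the other three bases with probability α per unit time) and compares the observed proportion of differing sites with $p$. Sequence length, α, $t$, and the random seed are arbitrary assumptions; because the simulation is discrete in time, it agrees with the continuous formula only approximately.

```python
import math
import random

BASES = "ACGT"


def evolve(seq: list[str], alpha: float, steps: int, rng: random.Random) -> list[str]:
    """Apply `steps` time units of discrete Jukes-Cantor substitution to a copy of seq."""
    seq = list(seq)
    for _ in range(steps):
        for i, base in enumerate(seq):
            # The site changes with total probability 3 * alpha; the new base is
            # one of the other three, chosen uniformly (alpha each).
            if rng.random() < 3 * alpha:
                seq[i] = rng.choice([b for b in BASES if b != base])
    return seq


def observed_p(length: int, alpha: float, t: int, seed: int = 1) -> float:
    """Proportion of differing sites between two lineages split t units ago."""
    rng = random.Random(seed)
    ancestor = [rng.choice(BASES) for _ in range(length)]
    left = evolve(ancestor, alpha, t, rng)
    right = evolve(ancestor, alpha, t, rng)
    return sum(a != b for a, b in zip(left, right)) / length


def expected_p(alpha: float, t: float) -> float:
    return 0.75 * (1 - math.exp(-8 * alpha * t))


alpha, t = 0.01, 10
print(observed_p(20_000, alpha, t), expected_p(alpha, t))
```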
$t$ is usually not known and, thus, we cannot estimate $\alpha$. Instead, we compute $K$, the number of substitutions per site since the time of divergence between the two sequences.
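Because each of the two lineages accumulates on average $3\alpha t$ substitutions per site, $K = 2 \times 3\alpha t = 6\alpha t$. Solving the one-parameter formula for $8\alpha t$ gives $8\alpha t = -\ln\left(1 - \frac{4}{3}p\right)$, so $K = \frac{3}{4}(8\alpha t) = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$, which needs only the observed proportion $p$. The sketch below computes this estimate; the function name is ours.

```python
import math


def jc_distance(p: float) -> float:
    """Jukes-Cantor estimate of K (substitutions per site) from the proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 3/4); at p >= 3/4 the estimate is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)


# Round trip: generate p from assumed alpha and t, then recover K = 6 * alpha * t.
alpha, t = 0.01, 10
p = 0.75 * (1 - math.exp(-8 * alpha * t))
print(jc_distance(p), 6 * alpha * t)
```

Note that $K$ always exceeds $p$ once $p > 0$: the logarithm corrects for multiple substitutions at the same site, which a simple count of differences misses.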
Kimura’s
two-parameter
model
Assumptions:
• The rate of transitional substitution at each nucleotide site is α per unit time.
• The rate of each type of transversional substitution is β per unit time.
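Transitions are substitutions within the purines (A ↔ G) or within the pyrimidines (C ↔ T); all other substitutions are transversions. The helper below encodes that classification; it is a sketch whose name and interface are our own.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
VALID = PURINES | PYRIMIDINES


def substitution_type(a: str, b: str) -> str:
    """Classify a change a -> b as 'none', 'transition', or 'transversion'."""
    a, b = a.upper(), b.upper()
    if a not in VALID or b not in VALID:
        raise ValueError("expected single nucleotides A, C, G or T")
    if a == b:
        return "none"
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) means transition.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Every base has exactly one possible transition and two possible transversions.
for x in "ACGT":
    kinds = [substitution_type(x, y) for y in "ACGT" if y != x]
    print(x, kinds.count("transition"), kinds.count("transversion"))
```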
α ⁄ β ≈ 5 - 10
If the nucleotide residing at a
certain site in a DNA sequence is
A at time 0, what is the
probability, PA(t), that this site will
be occupied by A at time t?
After one time unit, the probability of A changing into G is α, the probability of A changing into C is β, and the probability of A changing into T is β. Thus, the probability of A remaining unchanged after one time unit is:
$$P_{AA(1)} = 1 - \alpha - 2\beta$$
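One time unit of Kimura's scheme can be written as a 4 × 4 matrix with α for the single transition out of each base, β for each of its two transversions, and 1 − α − 2β on the diagonal. The sketch below builds it; the base order, rates, and names are illustrative assumptions.

```python
BASES = "AGCT"  # purines first, then pyrimidines
TRANSITION_PARTNER = {"A": "G", "G": "A", "C": "T", "T": "C"}


def kimura_step(alpha: float, beta: float) -> list[list[float]]:
    """Entry [i][j] is the probability that BASES[i] becomes BASES[j] in one time unit."""
    if alpha < 0 or beta < 0 or alpha + 2 * beta > 1:
        raise ValueError("need alpha, beta >= 0 and alpha + 2*beta <= 1")
    matrix = []
    for x in BASES:
        row = []
        for y in BASES:
            if x == y:
                row.append(1 - alpha - 2 * beta)  # no change
            elif TRANSITION_PARTNER[x] == y:
                row.append(alpha)  # the one transition
            else:
                row.append(beta)  # each of the two transversions
        matrix.append(row)
    return matrix


m = kimura_step(alpha=0.02, beta=0.003)
print(m[0])  # row for A: probabilities of ending as A, G, C, T
```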
To derive the probability of having A
at time 2, we consider four possible
scenarios:
1. A remained unchanged at t = 1
and t = 2
2. A changed into G at t = 1 and
reverted by a transition to A at
t = 2
3. A changed into C at t = 1 and
reverted by a transversion to A at
t = 2
4. A changed into T at t = 1 and
reverted by a transversion to A at
t = 2
$$P_{AA(2)} = (1 - \alpha - 2\beta)P_{AA(1)} + \beta P_{TA(1)} + \beta P_{CA(1)} + \alpha P_{GA(1)}$$
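The four scenarios are exactly the four terms of a sum over the nucleotide present at time 1. The sketch below (assumed rates, our own names) evaluates the formula term by term and checks it against that sum over intermediate bases, i.e. the (A, A) entry of the one-step matrix multiplied by itself.

```python
BASES = "AGCT"
PARTNER = {"A": "G", "G": "A", "C": "T", "T": "C"}


def step_prob(x: str, y: str, alpha: float, beta: float) -> float:
    """One-step probability that base x becomes base y under Kimura's scheme."""
    if x == y:
        return 1 - alpha - 2 * beta
    return alpha if PARTNER[x] == y else beta


alpha, beta = 0.02, 0.003  # assumed rates

# Starting from A, the probabilities of holding A, T, C, G at time 1. By the
# symmetry of the scheme these equal P_AA(1), P_TA(1), P_CA(1), P_GA(1).
p_aa1 = step_prob("A", "A", alpha, beta)
p_ta1, p_ca1, p_ga1 = (step_prob("A", x, alpha, beta) for x in "TCG")
formula = (1 - alpha - 2 * beta) * p_aa1 + beta * p_ta1 + beta * p_ca1 + alpha * p_ga1

# Same quantity as a sum over every possible base at time 1.
two_step = sum(step_prob("A", x, alpha, beta) * step_prob(x, "A", alpha, beta) for x in BASES)
print(formula, two_step)
```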
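By extension, we obtain the following recurrence equation for the general case:
$$P_{AA(t+1)} = (1 - \alpha - 2\beta)P_{AA(t)} + \beta P_{TA(t)} + \beta P_{CA(t)} + \alpha P_{GA(t)}$$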
After rewriting this equation as the amount of change in $P_{AA(t)}$ per unit time, and after approximating the discrete-time model by the continuous-time model, we obtain the following differential equation:
$$\frac{dP_{AA(t)}}{dt} = -(\alpha + 2\beta)P_{AA(t)} + \beta P_{TA(t)} + \beta P_{CA(t)} + \alpha P_{GA(t)}$$
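This equation, together with the analogous ones for the other three bases, can be integrated numerically. The sketch below takes small forward-Euler steps for the probabilities that a site which starts as A holds A, G, C, or T, and compares the A component with $P_{AA(t)} = \frac{1}{4} + \frac{1}{4}e^{-4\beta t} + \frac{1}{2}e^{-2(\alpha+\beta)t}$. The rates, step size, and names are assumptions.

```python
import math

PARTNER = {"A": "G", "G": "A", "C": "T", "T": "C"}


def derivative(p: dict[str, float], alpha: float, beta: float) -> dict[str, float]:
    """Rate of change of each P_x: inflow from the other three bases minus outflow."""
    out = {}
    for x in p:
        # y -> x is a transition (rate alpha) if y is x's partner, else a transversion (beta).
        inflow = sum(p[y] * (alpha if PARTNER[y] == x else beta) for y in p if y != x)
        out[x] = inflow - (alpha + 2 * beta) * p[x]
    return out


def integrate(alpha: float, beta: float, t_end: float, dt: float = 1e-3) -> dict[str, float]:
    p = {"A": 1.0, "G": 0.0, "C": 0.0, "T": 0.0}  # the site starts as A
    for _ in range(round(t_end / dt)):
        d = derivative(p, alpha, beta)
        p = {x: p[x] + dt * d[x] for x in p}
    return p


def p_aa(alpha: float, beta: float, t: float) -> float:
    return 0.25 + 0.25 * math.exp(-4 * beta * t) + 0.5 * math.exp(-2 * (alpha + beta) * t)


alpha, beta, t = 0.02, 0.003, 20.0
numeric = integrate(alpha, beta, t)
print(numeric["A"], p_aa(alpha, beta, t))
```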
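Similarly, we can obtain equations for $P_{TA(t)}$, $P_{CA(t)}$, and $P_{GA(t)}$, and from this set of four equations, we arrive at the following solution:
$$P_{AA(t)} = \frac{1}{4} + \frac{1}{4}e^{-4\beta t} + \frac{1}{2}e^{-2(\alpha+\beta)t}$$
In the Jukes-Cantor model, which is the special case $\alpha = \beta$, this reduces to:
$$P_{A(t)} = \frac{1}{4} + \frac{3}{4}e^{-4\alpha t}$$
Because the substitution scheme is symmetric, the same expression holds for each of the four nucleotides. We denote by $X(t)$ the probability that the nucleotide at time $t$ is identical to the initial one:
$$X(t) = \frac{1}{4} + \frac{1}{4}e^{-4\beta t} + \frac{1}{2}e^{-2(\alpha+\beta)t}$$
At equilibrium ($t \to \infty$), the equation reduces to $X(\infty) = 1/4$. Thus, as in the case of Jukes and Cantor's model, the equilibrium frequencies of the four nucleotides are 1/4.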
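The sketch below evaluates $X(t)$ and checks two of its properties numerically: setting α = β gives the Jukes-Cantor expression $\frac{1}{4} + \frac{3}{4}e^{-4\alpha t}$, and $X(t)$ approaches 1/4 for large $t$. Parameter values and function names are arbitrary assumptions.

```python
import math


def x_kimura(alpha: float, beta: float, t: float) -> float:
    """X(t): probability that the nucleotide at time t equals the initial one."""
    return 0.25 + 0.25 * math.exp(-4 * beta * t) + 0.5 * math.exp(-2 * (alpha + beta) * t)


def p_jukes_cantor(alpha: float, t: float) -> float:
    """Jukes-Cantor probability that the initial nucleotide is still present."""
    return 0.25 + 0.75 * math.exp(-4 * alpha * t)


for t in (0.0, 5.0, 50.0):
    print(t, x_kimura(0.01, 0.01, t), p_jukes_cantor(0.01, t))
```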
Three probabilities
Y(t) = the probability that the initial nucleotide and the nucleotide at time t differ from each other by a transition:
$$Y(t) = \frac{1}{4} + \frac{1}{4}e^{-4\beta t} - \frac{1}{2}e^{-2(\alpha+\beta)t}$$
The probability Z(t) that the nucleotide at time t and the initial nucleotide differ by a specific type of transversion is given by
$$Z(t) = \frac{1}{4} - \frac{1}{4}e^{-4\beta t}$$
Each nucleotide is subject to two types of transversion, but only one type of transition. Therefore, the probability that the initial nucleotide and the nucleotide at time t differ by a transversion is twice the probability that they differ by a specific type of transversion, that is, 2Z(t).
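The three probabilities account for every outcome at a site: the nucleotide is unchanged, differs by the transition, or differs by one of the two transversions, so $X(t) + Y(t) + 2Z(t) = 1$. The sketch below checks this identity and reports the total transversion probability $2Z(t)$; names and rates are assumptions.

```python
import math


def xyz(alpha: float, beta: float, t: float) -> tuple[float, float, float]:
    """Return X(t) (identical), Y(t) (transition), Z(t) (one specific transversion)."""
    e_beta = math.exp(-4 * beta * t)
    e_ab = math.exp(-2 * (alpha + beta) * t)
    x = 0.25 + 0.25 * e_beta + 0.5 * e_ab
    y = 0.25 + 0.25 * e_beta - 0.5 * e_ab
    z = 0.25 - 0.25 * e_beta
    return x, y, z


alpha, beta = 0.02, 0.003  # assumed rates
for t in (1.0, 10.0, 100.0):
    x, y, z = xyz(alpha, beta, t)
    print(t, "transition:", y, "any transversion:", 2 * z, "total:", x + y + 2 * z)
```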
Number of substitutions
between two noncoding
(NOT protein coding)
sequences
The differences between two
sequences are classified into
transitions and transversions.
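A minimal sketch of that classification for two aligned sequences of equal length: it counts transitional and transversional differences and divides by the number of sites L, giving the proportions usually written P and Q. The example sequences are made up for illustration, and gaps or ambiguity codes are rejected rather than handled.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
VALID = PURINES | PYRIMIDINES


def transition_transversion_proportions(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (P, Q): proportions of sites differing by a transition and by a transversion."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty, aligned and of equal length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            raise ValueError("only A, C, G, T are supported (no gaps or ambiguity codes)")
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    length = len(seq1)
    return transitions / length, transversions / length


# Hypothetical 10-site example: one A->G transition and one C->A transversion.
print(transition_transversion_proportions("ACGTACGTAC", "GCGTAAGTAC"))  # (0.1, 0.1)
```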
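The sampling variance of K is:
$$V(K) = \frac{1}{L}\left[\left(\frac{1}{1-2P-Q}\right)^{2}P + \left(\frac{1}{2-4P-2Q} + \frac{1}{2-4Q}\right)^{2}Q - \left(\frac{P}{1-2P-Q} + \frac{Q}{2-4P-2Q} + \frac{Q}{2-4Q}\right)^{2}\right]$$
where P and Q are the proportions of sites that differ by a transition and by a transversion, respectively, and L is the number of sites compared.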
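The variance above belongs to Kimura's two-parameter estimate of K, $K = \frac{1}{2}\ln\frac{1}{1-2P-Q} + \frac{1}{4}\ln\frac{1}{1-2Q}$, the standard formula whose slide did not survive extraction here. The sketch below computes both quantities from P, Q, and L; the function name and the input values are hypothetical, chosen only to show the call.

```python
import math


def kimura_distance(p: float, q: float, length: int) -> tuple[float, float]:
    """Return (K, V(K)) under Kimura's two-parameter model.

    p, q: proportions of sites differing by a transition and by a transversion.
    length: number of sites compared (L).
    """
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("P and Q too large: the logarithms are undefined")
    k = 0.5 * math.log(1 / w1) + 0.25 * math.log(1 / w2)
    a = 1 / w1
    b = 0.5 * (1 / w1 + 1 / w2)  # equals 1/(2 - 4P - 2Q) + 1/(2 - 4Q)
    variance = (a * a * p + b * b * q - (a * p + b * q) ** 2) / length
    return k, variance


# Hypothetical input: 500 sites, 10% transitional and 5% transversional differences.
k, v = kimura_distance(0.10, 0.05, 500)
print(f"K = {k:.4f}, SE = {math.sqrt(v):.4f}")
```

When P = p/3 and Q = 2p/3 (the proportions expected if α = β), the estimate coincides with the Jukes-Cantor value $-\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$.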
Numerical example (2P-model)
There are substitution
schemes with more
than two parameters!
THANK YOU