
# Hardy-Weinberg

f^2(ab) = 4f(aa)f(bb)
Wright-Fisher
-species is monoecious
-generations are non-overlapping
-each generation is produced from prev by random mating
-no selection, mutation, or migration
pij = P(X(t+1) = j|X(t)=i) = (2N choose j) * (i/2N)^j * (1-(i/2N))^(2N-j)
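A minimal sketch of the transition probability above, assuming N diploid individuals (2N gene copies) and that state i counts copies of one allele. The function names are made up for illustration.

```python
from math import comb

def wf_transition(N, i, j):
    """P(X(t+1) = j | X(t) = i) for 2N gene copies: a Binomial(2N, i/2N) pmf."""
    p = i / (2 * N)  # current allele frequency
    return comb(2 * N, j) * p**j * (1 - p) ** (2 * N - j)

def wf_matrix(N):
    """Full (2N+1) x (2N+1) transition matrix; row i is the distribution of X(t+1)."""
    size = 2 * N + 1
    return [[wf_transition(N, i, j) for j in range(size)] for i in range(size)]
```

Each row sums to 1, and states 0 and 2N are absorbing (loss or fixation), which is a quick sanity check on the formula.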
difference equations
f(t+2) = a*f(t+1) + b*f(t)
r^2 = a*r + b
x(t) = A*a^t + z(t) // for some constant A
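A small sketch for the second-order equation f(t+2) = a*f(t+1) + b*f(t): solve r^2 = a*r + b and fit the two constants to f(0) and f(1). It assumes two distinct real roots, which the notes do not guarantee; the function name is hypothetical.

```python
import math

def solve_second_order(a, b, f0, f1):
    """Return a function t -> f(t), assuming r^2 = a*r + b has two distinct real roots."""
    disc = a * a + 4 * b
    if disc <= 0:
        raise ValueError("this sketch only handles two distinct real roots")
    r1 = (a + math.sqrt(disc)) / 2
    r2 = (a - math.sqrt(disc)) / 2
    # Fit f(t) = c1*r1^t + c2*r2^t to the initial values f(0) = f0, f(1) = f1.
    c2 = (f1 - f0 * r1) / (r2 - r1)
    c1 = f0 - c2
    return lambda t: c1 * r1**t + c2 * r2**t

# Example: a = b = 1 with f(0) = 0, f(1) = 1 is the Fibonacci recurrence.
fib = solve_second_order(1, 1, 0, 1)
```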
Selection
AA>Aa>aa:lim of fA(t) = 1
heterozygote advantage (WAa > WAA, Waa): p = (WAa-Waa)/(2*WAa - WAA - Waa), lim of fA(t) = p
homozygote advantage (WAa < WAA, Waa): calc p, then compare fA(0) to p: fA(0) < p -> lim = 0, fA(0) = p -> stays at p, fA(0) > p -> lim = 1
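A rough numerical check of the three selection cases. The update rule below is the standard one-locus viability-selection recursion, which is an assumption here because the notes only give the limits; the fitness values in the usage note are made up.

```python
def next_freq(f, w_AA, w_Aa, w_aa):
    """One generation of selection on the frequency f of allele A (assumed standard recursion)."""
    mean_w = f * f * w_AA + 2 * f * (1 - f) * w_Aa + (1 - f) ** 2 * w_aa
    return f * (f * w_AA + (1 - f) * w_Aa) / mean_w

def interior_equilibrium(w_AA, w_Aa, w_aa):
    """p = (WAa - Waa) / (2*WAa - WAA - Waa)."""
    return (w_Aa - w_aa) / (2 * w_Aa - w_AA - w_aa)

def iterate(f0, w_AA, w_Aa, w_aa, generations=2000):
    """Apply next_freq repeatedly to approximate lim fA(t)."""
    f = f0
    for _ in range(generations):
        f = next_freq(f, w_AA, w_Aa, w_aa)
    return f
```

For example, `iterate(0.1, 0.8, 1.0, 0.6)` should settle near `interior_equilibrium(0.8, 1.0, 0.6)`, while with fitnesses (1.0, 0.5, 1.0) the limit depends on which side of p the starting frequency lies.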
cobwebbing
stable point is intersection between y = phi(x) and y=x where the cobweb from nearby points circles towards it, not away
also, if x is a fixed point (phi(x) = x) and |phi'(x)| < 1 then x is stable; if |phi'(x)| > 1 it is unstable
you can actually circle around a point without spiraling inwards or outwards
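A sketch of cobwebbing as plain iteration plus the derivative test. The logistic map is a hypothetical example map (not from the notes), and the derivative is estimated numerically rather than computed exactly.

```python
def cobweb(phi, x0, steps):
    """Return the orbit x0, phi(x0), phi(phi(x0)), ... (the x-coordinates of the cobweb)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(phi(xs[-1]))
    return xs

def is_stable(phi, x_star, h=1e-6):
    """Check |phi'(x*)| < 1 using a central-difference estimate of the derivative."""
    slope = (phi(x_star + h) - phi(x_star - h)) / (2 * h)
    return abs(slope) < 1

# Hypothetical example: logistic map with r = 2.5, fixed points 0 and 1 - 1/r = 0.6.
def logistic(x, r=2.5):
    return r * x * (1 - x)
```

The neutral case in the last note line corresponds to |phi'(x)| = 1, for instance phi(x) = 1 - x, where the orbit just alternates between x0 and 1 - x0.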
markov chain
P(X(n) = j | X(n-m) = i) = the sum over all the possible paths from time n-m to time n to get from state i
to state j
prow/col = prob of going from row value to col value
p(0) = [P(X(0) = 0) ... P(X(0) = n)]
P(X(t) = j) = p(0) * A^t // mult p(0) down approp column (jth column) of A^t
E[X(t)] = p(0) * A^t * g // mult g values across row of A^t for each value/index in p(0)
"show E[h(X(t+1))|X(t) = i] = h(i)" - show A*h = h
irreducible = every state communicates with each other
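A sketch of the matrix computations above in plain Python: p(0) * A^t, p(0) * A^t * g, and the A*h = h check. The 3-state chain (a random walk on {0,1,2} absorbed at both ends) is a made-up example, and the helper names are hypothetical.

```python
def vec_mat(v, A):
    """Row vector times matrix: (v A)_j = sum_i v_i * A[i][j]."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]

def mat_vec(A, h):
    """Matrix times column vector: (A h)_i = sum_j A[i][j] * h_j."""
    return [sum(A[i][j] * h[j] for j in range(len(h))) for i in range(len(A))]

def distribution_at(p0, A, t):
    """p(t) = p(0) A^t, computed with t vector-matrix products."""
    p = list(p0)
    for _ in range(t):
        p = vec_mat(p, A)
    return p

def expectation_at(p0, A, t, g):
    """E[g(X(t))] = p(0) A^t g."""
    return sum(pj * gj for pj, gj in zip(distribution_at(p0, A, t), g))

def is_harmonic(A, h, tol=1e-12):
    """True if A h = h, i.e. E[h(X(t+1)) | X(t) = i] = h(i) for every state i."""
    return all(abs(a - b) <= tol for a, b in zip(mat_vec(A, h), h))

# Made-up example chain: from state 1 move to 0 or 2 with prob 1/2; 0 and 2 absorb.
A = [[1.0, 0.0, 0.0], [0.5, 0.0, 0.5], [0.0, 0.0, 1.0]]
```

Here h(i) = i satisfies A*h = h, so the expected state stays at its starting value for every t.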
stationary distro: sum(pi's) = 1 and pi's = pi's * A // pi0 = pi0 * A00 + ... + pin * An0
"if chain in state i at t, how long until state i again?" - 1/pi_i (expected return time, pi_i from stationary distro)
shotgun sequencing
Xi denotes leftmost point of frag i
expected coverage: E[C] = 1 - e^(-N * E[L] / g)
prob frag is isolated: (1-(2L/g))^(N-1) ≈ e^(-2NL/g) for large N
reasoning (a denotes position of X1): frag j does not overlap frag 1 iff Xj not in interval [a-L, a+L], which has
prob (1-2L/g). for this to occur for each of the other N-1 frags (independently), you get the equation
expected # of isolated frags: Ii = 1 if i is isolated -> sum from 1 to N of E[Ii] = N * e^(-2NL/g)
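The three shotgun formulas as small functions, assuming N fragments of fixed length L on a genome of length g. The function names and the `exact` flag are illustrative choices.

```python
import math

def expected_coverage(N, mean_L, g):
    """E[C] = 1 - e^(-N * E[L] / g)."""
    return 1 - math.exp(-N * mean_L / g)

def prob_isolated(N, L, g, exact=False):
    """Probability that one fragment overlaps none of the other N - 1 fragments."""
    if exact:
        return (1 - 2 * L / g) ** (N - 1)
    return math.exp(-2 * N * L / g)  # large-N approximation

def expected_isolated(N, L, g):
    """Sum of E[I_i] over the N fragments, using the exponential approximation."""
    return N * prob_isolated(N, L, g)
```

Comparing `prob_isolated(..., exact=True)` with the default shows how close the exponential approximation is for a given N, L and g.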

## poisson: Nt(k) = (np)^k * e^(-np) / k!

N(x) - N(y) has the same distro as N(x-y), for x > y
(1 - (x/n))^n -> e^-x as n -> infinity
enzyme: rate = product over the recognition site of (prob of each base)
E[length] = 1/rate
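A tiny example of the enzyme rate, assuming bases are independent and (in this illustration only) equally likely. The 6-base site and the function name are hypothetical.

```python
def site_rate(site, base_probs):
    """Rate of cut sites per base: product of the probabilities of the bases in the site."""
    rate = 1.0
    for base in site:
        rate *= base_probs[base]
    return rate

# Assumed uniform base composition, for illustration only.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
rate = site_rate("GAATTC", uniform)  # 0.25 ** 6
mean_fragment_length = 1 / rate      # E[length] = 1/rate
```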
double digest rate: prob seq1 finishes * rate1 + prob seq2 finishes * rate2
"what is prob there are exactly 8 cuts of a seg 10^4 long, and no more than 2 cuts happen in first 3000
bases?" - sum from j=0 to 2 of P(N(3000) = j, N(10,000) - N(3,000) = 8 - j)
= sum from j=0 to 2 of [(3000p)^j * e^(-3000p) / j! * (7000p)^(8-j) * e^(-7000p) / (8-j)!]
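The sum for the worked question can be evaluated directly, using independence of the counts on the two disjoint pieces. The cut rate p is left as a parameter because the notes do not fix it; the function names are hypothetical.

```python
import math

def poisson_pmf(k, mean):
    """P(N = k) for a Poisson count with the given mean."""
    return mean**k * math.exp(-mean) / math.factorial(k)

def prob_cuts(p, total=8, max_first=2, first_len=3000, seg_len=10_000):
    """P(exactly `total` cuts overall and at most `max_first` in the first piece).

    Assumes max_first <= total; counts on disjoint pieces are independent.
    """
    rest_len = seg_len - first_len
    return sum(
        poisson_pmf(j, first_len * p) * poisson_pmf(total - j, rest_len * p)
        for j in range(max_first + 1)
    )
```

A useful check: letting j run all the way to 8 (`max_first=8`) recovers the plain Poisson probability of 8 cuts in 10^4 bases.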
thinning/partial digest: if u is prob that an arrival is kept (cut happens w/ prob u) then the thinned process Mt has rate uv,
where v is rate of Nt
moran markov model
finite popu, overlapping generations, no selection mutation or migration
reproduction is nonsexual duplication
two are selected - one clones, other dies
prob is sum of possibilities: clones, dies, with/without mutation
forward algo
f0(0) = 1 and fk(0) = 0 for 1<=k<=N
set f0(i) = 0 for i >= 1
fk(i) = sum from j=0 to N of [fj(i-1) * pjk*ek(xi)]
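A sketch of the forward recursion as written above, with state 0 as the silent start state. It assumes there is no explicit end state, so the likelihood is taken as the sum of fk over the last column; the two-state coin model is made up for illustration.

```python
def forward(obs, p, e):
    """Likelihood of obs under an HMM.

    p[j][k]: transition prob from state j to k (state 0 is the silent start state).
    e[k][symbol]: emission prob of symbol in state k, for k >= 1.
    """
    n_states = len(p)                    # states 0..N
    f = [1.0] + [0.0] * (n_states - 1)   # f_0(0) = 1, f_k(0) = 0
    for x in obs:
        new = [0.0] * n_states           # f_0(i) = 0 for i >= 1
        for k in range(1, n_states):
            new[k] = e[k][x] * sum(f[j] * p[j][k] for j in range(n_states))
        f = new
    return sum(f)

# Hypothetical model: state 1 = fair coin, state 2 = coin biased towards heads.
P_EXAMPLE = [[0.0, 0.5, 0.5], [0.0, 0.9, 0.1], [0.0, 0.1, 0.9]]
E_EXAMPLE = [{}, {"H": 0.5, "T": 0.5}, {"H": 0.9, "T": 0.1}]
```

Since ek(xi) does not depend on j, it is pulled out of the sum here. For long sequences the products underflow, so a practical version would rescale each column or work with logs.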
viterbi algo // for estimating path in HMM given outputs
vk(1) = p0k * ek(x1) 1 <= k <= N
vk(i) = max { vj(i-1) * pjk * ek(xi); 1<=j<=N } = ek(xi) * max {vj(i-1)pjk; 1<=j<=N}
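A sketch of the Viterbi recursion with back-pointers for recovering the path. The traceback step and the two-state coin model are additions for illustration; the notes only give the recursion for vk(i).

```python
def viterbi(obs, p, e):
    """Most probable state path for a non-empty obs, and its probability.

    p[j][k]: transition prob from state j to k (state 0 is the start state).
    e[k][symbol]: emission prob of symbol in state k, for k >= 1.
    """
    n_states = len(p)
    # v_k(1) = p_0k * e_k(x_1)
    v = [0.0] + [p[0][k] * e[k][obs[0]] for k in range(1, n_states)]
    back = []  # back[i][k] = best predecessor of state k at step i + 2
    for x in obs[1:]:
        new, ptr = [0.0] * n_states, [0] * n_states
        for k in range(1, n_states):
            best_j = max(range(1, n_states), key=lambda j: v[j] * p[j][k])
            ptr[k] = best_j
            new[k] = e[k][x] * v[best_j] * p[best_j][k]
        back.append(ptr)
        v = new
    last = max(range(1, n_states), key=lambda k: v[k])
    path = [last]
    for ptr in reversed(back):  # follow back-pointers from the end
        path.append(ptr[path[-1]])
    path.reverse()
    return path, v[last]

# Hypothetical model: state 1 = fair coin, state 2 = coin biased towards heads.
P_EXAMPLE = [[0.0, 0.5, 0.5], [0.0, 0.9, 0.1], [0.0, 0.1, 0.9]]
E_EXAMPLE = [{}, {"H": 0.5, "T": 0.5}, {"H": 0.9, "T": 0.1}]
```

As with the forward algorithm, a practical version would use log probabilities (sums instead of products) to avoid underflow.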
pair HMM // for aligning two sequences
epsilon == prob from X (or Y) to M
d == prob from M to X (or Y)
n == prob from M (or X or Y) to end
5 states - start, stop -- X,M,Y
M emits a pair
X emits from x, but not y; Y emits from y, but not x
qxy = P(M emits x,y), qx = P(X emits x) = P(Y emits x)
vk(i,j) = max prob that path produces intermediate output {x1 ... xi, y1 ... yj } and that the last state
visited by the hidden chain is k
vx(0,j) = vM(0,j) = 0
vy(i, 0) = vM(i, 0) = 0
vM(i,j) = max { vM(i-1,j-1)*(1-2d-n), vx(i-1, j-1)*epsilon, vy(i-1, j-1) * epsilon} * qxiyj
vx(i,j) = max { vM(i-1, j)d, vx(i-1, j)*(1-epsilon-n)} * qxi
vy(i,j) = max {vM(i,j-1)d, vy(i,j-1)*(1-epsilon-n)} * qyj
profile HMM
vk(i) = max score over all admissible paths ending at (i,k)
vk(i) = max { vj(r) * s(edge from (r, j) to (i,k)); the edge from (r, j) to (i,k) is admissible}
to start algo: vk(0) must be specified for rows k that correspond to start, match or insert state
these are: vm0(0) = 1, vi0(0) = vi1(0) etc = 0, vm1(0) = vm2(0) etc = 0

## global alignment (Needleman-Wunsch)

F(i,j) = max { F(i-1, j-1) + s(xi, yj), F(i-1, j) + g, F(i, j-1) + g }
with affine: M(i,j) = max { M(i-1, j-1) + s(xi,yj), Ix(i-1, j-1) + s(xi, yj), Iy(i-1, j-1) + s(xi, yj) }
Ix(i,j) = max { M(i-1, j) + h + g, Ix(i-1, j) + g }
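A sketch of the linear-gap recurrence for F(i,j), score only (no traceback, no affine gaps). The boundary values F(i,0) = i*g and F(0,j) = j*g and the match/mismatch scoring are assumptions; g is taken as a negative number so that adding it penalizes a gap.

```python
def nw_score(x, y, match=1, mismatch=-1, g=-1):
    """Optimal global alignment score of x and y with linear gap score g (g < 0)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * g  # x[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * g  # y[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + g, F[i][j - 1] + g)
    return F[n][m]
```

The affine version would keep three such tables (M, Ix, Iy) and fill them in the same order.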
local
F(i,j) = max { F(i-1, j-1) + s(xi, yj), F(i-1, j) + g, F(i, j-1) + g, 0 }
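A sketch of the local recurrence: the same table as the global case, but with the extra 0 option and the answer taken as the largest cell anywhere. The scoring values are made up, and g is assumed negative.

```python
def sw_score(x, y, match=2, mismatch=-1, g=-2):
    """Best local alignment score of x and y with linear gap score g (g < 0)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(0, F[i - 1][j - 1] + s, F[i - 1][j] + g, F[i][j - 1] + g)
            best = max(best, F[i][j])
    return best
```

A traceback would start from the cell holding `best` and stop at the first cell equal to 0.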
gap penalties
linear: w(k) = gk
affine: w(k) = { h + gk if k >= 1, 0 if k = 0
overlap // trace back from high score on bottom or right frame and go back until top or left frame
F(i,j) = max { F(i-1, j-1) + s(xi, yj), F(i-1, j) + s(xi, -), F(i, j-1) + s(-, yj) }
repeated match
T == threshold: only matches scoring better than T are kept
F(0,j) = 0 for all j
F(i,0) = max {F(i-1, 0) , F(i-1, j) - T for 1 <= j <= m }
F(i,j) = max {F(i,0), F(i-1, j-1) + s(xi,yj), F(i-1,j) - d, F(i,j-1) - d }
fill in from top left to bottom right
add F(n+1, 0) and compute it using first recurrence equation, this is optimal score
BLOSUM
entry S(x,y) = 2*log2(pxy / (px*py))
Nx == # of amino acids of type x in data base
N == sum of all Nx for all x = total # of amino acids in data base
Nxy == # of pairs (x,y) that occur in alignments of protein domains with blocks
M == sum of all x [Nxx] + sum of all x,y such that x!=y [Nxy/2] = total # of aligned amino acid pairs
px = Nx/N
pxy = { Nxy/(2M) if x != y, Nxx/M if x = y
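A sketch that turns the counts above into scores. It assumes the pair counts are stored symmetrically (both (x,y) and (y,x) present with the same value) and that no count is zero; the two-letter alphabet and its counts are made up.

```python
import math

def blosum_scores(N_x, N_xy):
    """S(x,y) = 2*log2(pxy / (px*py)) from residue counts N_x and pair counts N_xy.

    N_xy is keyed by ordered pairs (x, y) and must be symmetric; all counts > 0.
    """
    N = sum(N_x.values())
    # M = sum of Nxx plus half of each off-diagonal entry (each unordered pair appears twice).
    M = sum(c if x == y else c / 2 for (x, y), c in N_xy.items())
    scores = {}
    for (x, y), c in N_xy.items():
        p_xy = c / M if x == y else c / (2 * M)
        p_x, p_y = N_x[x] / N, N_x[y] / N
        scores[(x, y)] = 2 * math.log2(p_xy / (p_x * p_y))
    return scores

# Made-up counts for a two-letter alphabet.
N_X = {"A": 14, "B": 6}
N_XY = {("A", "A"): 6, ("B", "B"): 2, ("A", "B"): 2, ("B", "A"): 2}
SCORES = blosum_scores(N_X, N_XY)
```

With these counts the diagonal entries come out positive and the off-diagonal entry negative, matching the usual reading: pairs seen more often than chance score above 0.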