Pair-count matrix C_ij (last rows; columns A I L S T V):
S   2   0   0   0
T   4   0   0   2   1
V   0   1   3   0   0   0
Total number of pairs: T = 3(10) = 30
Calculate the matrix for q_ij, where $q_{ij} = C_{ij} / T$, with C_ij given in the block and T = 30.

      A       I       L       S       T       V
A   11/30
I   0       0
L   0       3/30    3/30
S   2/30    0       0       0
T   4/30    0       0       2/30    1/30
V   0       1/30    3/30    0       0       0

The diagonal entries are q_AA = 11/30, q_II = 0/30, q_LL = 3/30, q_SS = 0/30, q_TT = 1/30 and q_VV = 0/30.

In decimal form:

      A       I       L       S       T       V
A   0.366
I   0       0
L   0       0.1     0.1
S   0.066   0       0       0
T   0.133   0       0       0.066   0.033
V   0       0.033   0.1     0       0       0
Calculate the matrix for P_i as P_A, P_I, P_L, P_S, P_T and P_V, using $q_{ij} = C_{ij} / T$:

$P_i = q_{ii} + \sum_{j \neq i} \frac{q_{ij}}{2}$

Here, for example, $P_A = q_{AA} + \sum_{X \neq A} \frac{q_{AX}}{2}$.

P_A = 11/30 + (2/30 + 4/30)/2 = 11/30 + 3/30 = 14/30 = 0.466
P_I = 0/30 + (3/30 + 1/30)/2 = 2/30 = 0.066
P_L = 3/30 + (3/30 + 3/30)/2 = 3/30 + 3/30 = 6/30 = 0.2
P_S = 0/30 + (2/30 + 2/30)/2 = 2/30 = 0.066
P_T = 1/30 + (4/30 + 2/30)/2 = 1/30 + 3/30 = 4/30 = 0.133
P_V = 0/30 + (1/30 + 3/30)/2 = 2/30 = 0.066
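The marginal probabilities can be cross-checked with a short script. This is a minimal sketch, assuming the pair counts read off the q_ij table (C_AA = 11, C_LI = 3, and so on) and T = 30; the names `COUNTS`, `q` and `marginal` are illustrative only.

```python
from fractions import Fraction

RESIDUES = "AILSTV"
T = 30  # total number of pairs in the block

# Non-zero pair counts C_ij, stored once per unordered pair (row, column).
COUNTS = {
    ("A", "A"): 11,
    ("L", "I"): 3, ("L", "L"): 3,
    ("S", "A"): 2,
    ("T", "A"): 4, ("T", "S"): 2, ("T", "T"): 1,
    ("V", "I"): 1, ("V", "L"): 3,
}


def q(i, j):
    """Observed pair frequency q_ij = C_ij / T (symmetric lookup)."""
    count = COUNTS.get((i, j), COUNTS.get((j, i), 0))
    return Fraction(count, T)


def marginal(i):
    """P_i = q_ii + (1/2) * sum of q_ij over all j != i."""
    off_diagonal = sum(q(i, j) for j in RESIDUES if j != i)
    return q(i, i) + off_diagonal / 2


P = {r: marginal(r) for r in RESIDUES}
for r in RESIDUES:
    print(r, P[r], round(float(P[r]), 3))
```

Exact fractions are used so that the six probabilities can be checked to sum to exactly 1.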
Calculate the matrix for e_ij: $e_{ij} = P_i^2$ for $i = j$ and $e_{ij} = 2 P_i P_j$ for $i \neq j$, with P_A = 14/30, P_I = 2/30, P_L = 6/30, P_S = 2/30, P_T = 4/30 and P_V = 2/30.

      A                 I                L                S                T                V
A   (14/30)^2
I   2(14/30)(2/30)    (2/30)^2
L   2(14/30)(6/30)    2(2/30)(6/30)    (6/30)^2
S   2(14/30)(2/30)    2(2/30)(2/30)    2(6/30)(2/30)    (2/30)^2
T   2(14/30)(4/30)    2(2/30)(4/30)    2(6/30)(4/30)    2(2/30)(4/30)    (4/30)^2
V   2(14/30)(2/30)    2(2/30)(2/30)    2(6/30)(2/30)    2(2/30)(2/30)    2(4/30)(2/30)    (2/30)^2
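The expected frequencies can be evaluated as follows. This is a minimal sketch, assuming the P_i values 14/30, 2/30, 6/30, 2/30, 4/30 and 2/30 computed earlier; the function name `expected` is illustrative.

```python
from fractions import Fraction

ORDER = "AILSTV"
P = {
    "A": Fraction(14, 30), "I": Fraction(2, 30), "L": Fraction(6, 30),
    "S": Fraction(2, 30), "T": Fraction(4, 30), "V": Fraction(2, 30),
}


def expected(i, j):
    """e_ij = P_i^2 when i == j, otherwise 2 * P_i * P_j."""
    return P[i] ** 2 if i == j else 2 * P[i] * P[j]


# Print the lower triangle, row by row, as decimals.
for a, i in enumerate(ORDER):
    row = [f"{float(expected(i, j)):.4f}" for j in ORDER[: a + 1]]
    print(i, " ".join(row))

# Sanity check: the lower triangle (diagonal included) sums to (sum of P_i)^2 = 1.
total = sum(expected(i, j) for a, i in enumerate(ORDER) for j in ORDER[: a + 1])
print("sum of e_ij =", total)
```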
Log-odds ratio: $S_{ij} = \log_2 \frac{q_{ij}}{e_{ij}}$. For the pair AA, with q_AA = 11/30 = 0.366 and e_AA = (14/30)^2:

$S_{AA} = \log_2 \frac{11/30}{(14/30)^2} = \log_2 1.6837 = 0.7516$

BLOSUM value for AA (the first diagonal element of the BLOSUM matrix)
= round(2(0.7516)) = 2
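The AA entry can be reproduced numerically. This is a minimal sketch, assuming q_AA = 11/30 and P_A = 14/30 from the earlier steps; `blosum_entry` is a hypothetical helper name and it requires q_ij > 0, because the logarithm of 0 is undefined.

```python
import math


def blosum_entry(q_ij, p_i, p_j, same_residue):
    """Return round(2 * log2(q_ij / e_ij)); q_ij must be positive."""
    e_ij = p_i * p_j if same_residue else 2 * p_i * p_j
    return round(2 * math.log2(q_ij / e_ij))


q_AA = 11 / 30  # observed frequency of the A-A pair
P_A = 14 / 30   # marginal probability of A

e_AA = P_A ** 2                 # expected frequency for i == j
s_AA = math.log2(q_AA / e_AA)   # log-odds ratio in bits
blosum_AA = blosum_entry(q_AA, P_A, P_A, same_residue=True)

print(f"ratio = {q_AA / e_AA:.4f}, S_AA = {s_AA:.4f}, BLOSUM(A, A) = {blosum_AA}")
```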
• Two sequences of similar or variable length
Seq 1 G T A C A T G
Seq 2 T A G A T G
[Dot plot grid: Seq 1 (G T A C A T G) along the columns, Seq 2 (T A G A T G) along the rows]
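A dot plot of this kind can be generated directly from the two sequences. This is a minimal sketch, assuming a dot is drawn wherever the residue in the column equals the residue in the row; the function name `dot_plot` and the `*` and `.` symbols are illustrative choices.

```python
def dot_plot(seq_top, seq_side):
    """Return one string per row: '*' where the residues match, '.' otherwise."""
    rows = []
    for b in seq_side:
        rows.append("".join("*" if a == b else "." for a in seq_top))
    return rows


seq1 = "GTACATG"  # written along the columns
seq2 = "TAGATG"   # written along the rows

plot = dot_plot(seq1, seq2)
print("  " + seq1)
for residue, row in zip(seq2, plot):
    print(residue + " " + row)
```

Runs of `*` along a diagonal correspond to stretches that the two sequences share.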
Example: Align two sequences globally
Seq 1 G A T T C T A T C T A A C T
Seq 2 G T T C T A T T C T A A C
[Dot plot grid: Seq 1 along the columns, Seq 2 along the rows]
Dot plots with thresholds
[Two dot plot grids of the same pair: Seq 1 (G A T T C T A T C T A A C T) along the columns, Seq 2 (G T T C T A T T C T A A C) along the rows]
Seq 1 G A T T C T A T - C T A A C T
Seq 2 G - T T C T A T T C T A A C -
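A thresholded dot plot can be sketched as below. This is a minimal sketch, assuming a sliding window compared position by position and a dot drawn only when the number of matches inside the window reaches the threshold; the values `window=3` and `threshold=3` are assumptions chosen for illustration, not values taken from the slides.

```python
def windowed_dot_plot(seq_top, seq_side, window=3, threshold=3):
    """Mark window start positions where at least `threshold` residues match."""
    n_rows = len(seq_side) - window + 1
    n_cols = len(seq_top) - window + 1
    plot = []
    for i in range(n_rows):
        row = ""
        for j in range(n_cols):
            matches = sum(
                seq_side[i + k] == seq_top[j + k] for k in range(window)
            )
            row += "*" if matches >= threshold else "."
        plot.append(row)
    return plot


seq1 = "GATTCTATCTAACT"  # along the columns
seq2 = "GTTCTATTCTAAC"   # along the rows

plot = windowed_dot_plot(seq1, seq2)
for row in plot:
    print(row)
```

A larger window or a higher threshold removes more isolated dots, at the risk of hiding short matching regions.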
Advantages
• Good for identifying long regions of strong similarity
• Easy to make and interpret
• Can be used for sequences of any length
Disadvantages
• Need to find the best window size
• Graphical representation doesn’t give information about
mutation
Needleman-Wunsch (global alignment)
We want to align two sequences x1, x2, ..., xn and y1, y2, ..., ym and
create an (n + 1) x (m + 1) matrix F where

$$F(i, j) = \max \begin{cases} F(i-1, j-1) + S_{ij} & \text{(match/mismatch in the diagonal)} \\ F(i-1, j) - d & \text{(gap in sequence 1)} \\ F(i, j-1) - d & \text{(gap in sequence 2)} \end{cases}$$

with $F(0, 0) = 0$, $F(i, 0) = -id$, $F(0, j) = -jd$, where d is the gap penalty.
This is the recursive relation in the dynamic programming algorithm.
In the tabular computation we start in cell (0, 0) and calculate one
row at a time. In each cell (i, j) we keep a pointer to the optimal
previous position, given the current one.
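The recurrence and the pointer bookkeeping can be sketched as follows. This is a minimal sketch, not a definitive implementation: the function name `nw_fill`, the pointer labels and the scores match = 1, mismatch = -1, d = 2 are assumptions chosen for illustration, and ties are resolved in favour of the diagonal.

```python
def nw_fill(x, y, match=1, mismatch=-1, d=2):
    """Fill F and a pointer table; x indexes the rows, y the columns."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    ptr = [[None] * (m + 1) for _ in range(n + 1)]

    # Boundary conditions: F(i, 0) = -i*d and F(0, j) = -j*d.
    for i in range(1, n + 1):
        F[i][0], ptr[i][0] = -i * d, "up"
    for j in range(1, m + 1):
        F[0][j], ptr[0][j] = -j * d, "left"

    # Row-by-row fill; each cell keeps a pointer to its best predecessor.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            candidates = [
                (F[i - 1][j - 1] + s, "diag"),
                (F[i - 1][j] - d, "up"),
                (F[i][j - 1] - d, "left"),
            ]
            F[i][j], ptr[i][j] = max(candidates, key=lambda c: c[0])
    return F, ptr


F, ptr = nw_fill("GCT", "AGCT")
for row in F:
    print(" ".join(f"{v:3d}" for v in row))
print("optimal global score:", F[-1][-1])
```

Storing the pointer at fill time means the traceback later only has to follow the labels from the last cell back to (0, 0).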
Example: Align two sequences globally GAATTCAGTTA and GGATCGA
Answer: Seq 1 G A A T T C A G T T A
Seq 2 G G A T C G A
1. Initialization: Create a matrix with M + 1 columns and N + 1 rows, where M is the length of Seq 1 and N is the length of Seq 2. The first row and the first column are filled with 0.
      G  A  A  T  T  C  A  G  T  T  A
   0  0  0  0  0  0  0  0  0  0  0  0
G  0
G  0
A  0
T  0
C  0
G  0
A  0
2. Matrix fill (Scoring)
Cells are indexed F(i, j), with i the row and j the column, for example F(0,0), F(0,1), F(1,0), F(1,1), F(1,2) and F(2,1).
      G  A  A  T  T  C  A  G  T  T  A
   0  0  0  0  0  0  0  0  0  0  0  0
G  0  1
G  0
A  0
T  0
C  0
G  0
A  0

$$F(i, j) = \max \begin{cases} F(i-1, j-1) + S_{ij} & \text{(match/mismatch in the diagonal)} \\ F(i-1, j) - d & \text{(gap in sequence 1)} \\ F(i, j-1) - d & \text{(gap in sequence 2)} \end{cases}$$

For position (1, 1): $S_{ij} = S_{1,1} = 1$ (since G is present in both sequences) and the gap penalty is d = 0. Thus
$F(1,1) = \max[F(0,0) + 1,\ F(0,1) - 0,\ F(1,0) - 0] = \max[0 + 1,\ 0 - 0,\ 0 - 0] = \max[1, 0, 0] = 1$.
1 is the largest value, so it is placed in position (1, 1).
2. Matrix fill (Scoring)

$$F(i, j) = \max \begin{cases} F(i-1, j-1) + S_{ij} & \text{(match/mismatch in the diagonal)} \\ F(i-1, j) - d & \text{(gap in sequence 1)} \\ F(i, j-1) - d & \text{(gap in sequence 2)} \end{cases}$$

      G  A  A  T  T  C  A  G  T  T  A
   0  0  0  0  0  0  0  0  0  0  0  0
G  0  1  1  1  1  1  1  1  1  1  1  1
G  0  1  1  1  1  1  1  1  2  2  2  2
A  0  1  2  2  2  2  2  2  2  2  2  3
T  0  1  2  2  3  3  3  3  3  3  3  3
C  0  1  2  2  3  3  4  4  4  4  4  4
G  0  1  2  2  3  3  4  4  5  5  5  5
A  0  1  2  3  3  3  4  5  5  5  5  6
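The filled matrix can be regenerated with a few lines of code. This is a minimal sketch, assuming the simple scheme of match = 1 and gap = 0 stated in the worked cell; the mismatch score of 0 is an assumption (the slides do not state it), and `fill_matrix` is an illustrative name. Scores are passed as signed values that are added to the neighbouring cell.

```python
def fill_matrix(seq_top, seq_side, match=1, mismatch=0, gap=0):
    """Fill the scoring matrix with the first row and column left at 0."""
    rows, cols = len(seq_side) + 1, len(seq_top) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_side[i - 1] == seq_top[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,   # diagonal: match or mismatch
                F[i - 1][j] + gap,     # from the cell above
                F[i][j - 1] + gap,     # from the cell to the left
            )
    return F


F = fill_matrix("GAATTCAGTTA", "GGATCGA")
for row in F:
    print(" ".join(str(v) for v in row))
print("bottom-right score:", F[-1][-1])
```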
3. Traceback (Alignment): It begins in the last position of the matrix (row N, column M),
i.e. the position that leads to the maximal score. Here it is 6.
      G  A  A  T  T  C  A  G  T  T  A
   0  0  0  0  0  0  0  0  0  0  0  0
G  0  1  1  1  1  1  1  1  1  1  1  1
G  0  1  1  1  1  1  1  1  2  2  2  2
A  0  1  2  2  2  2  2  2  2  2  2  3
T  0  1  2  2  3  3  3  3  3  3  3  3
C  0  1  2  2  3  3  4  4  4  4  4  4
G  0  1  2  2  3  3  4  4  5  5  5  5
A  0  1  2  3  3  3  4  5  5  5  5  6
Each step back through the matrix is one of three moves:
• A match or a substitution (diagonal move)
• Either a deletion in seq 1 or an insertion in seq 2
• Either an insertion in seq 1 or a deletion in seq 2
Seq 1 G A A T T C A G T T A
Seq 2 G G A ‐ T C ‐ G ‐ ‐ A
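The traceback can be sketched in code as follows. This is a minimal sketch under stated assumptions: match = 1, mismatch = 0 (not given in the slides) and gap = 0, with the diagonal move preferred when several moves are equally good. Because many paths tie under this scheme, the alignment it returns is one optimal alignment and need not be the one shown above; `global_align` is an illustrative name.

```python
def global_align(seq1, seq2, match=1, mismatch=0, gap=0):
    """Fill the matrix (seq1 on columns, seq2 on rows) and trace one path back."""
    n_rows, n_cols = len(seq2) + 1, len(seq1) + 1
    F = [[0] * n_cols for _ in range(n_rows)]
    for i in range(1, n_rows):
        F[i][0] = i * gap
    for j in range(1, n_cols):
        F[0][j] = j * gap
    for i in range(1, n_rows):
        for j in range(1, n_cols):
            s = match if seq2[i - 1] == seq1[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)

    # Traceback: start in the last cell and walk back to (0, 0).
    out1, out2 = [], []
    i, j = n_rows - 1, n_cols - 1
    while i > 0 or j > 0:
        diagonal = False
        if i > 0 and j > 0:
            s = match if seq2[i - 1] == seq1[j - 1] else mismatch
            diagonal = F[i][j] == F[i - 1][j - 1] + s
        if diagonal:                                   # match or substitution
            out1.append(seq1[j - 1])
            out2.append(seq2[i - 1])
            i, j = i - 1, j - 1
        elif j > 0 and F[i][j] == F[i][j - 1] + gap:   # gap in seq2
            out1.append(seq1[j - 1])
            out2.append("-")
            j -= 1
        else:                                          # gap in seq1
            out1.append("-")
            out2.append(seq2[i - 1])
            i -= 1
    return "".join(reversed(out1)), "".join(reversed(out2)), F[-1][-1]


aligned1, aligned2, score = global_align("GAATTCAGTTA", "GGATCGA")
print(aligned1)
print(aligned2)
print("score:", score)
```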
A different traceback path through the same matrix, again starting from the score of 6, gives a second alignment:
Seq 1 G ‐ A A T T C A G T T A
Seq 2 G G ‐ A ‐ T C ‐ G ‐ ‐ A
An Advanced Scoring Scheme
The sequences are now treated with an advanced scoring scheme: +2 for a match, -1 for a mismatch and -2 for a gap. As before, the length of Sequence 1 is M and the length of Sequence 2 is N.

$$F(i, j) = \max \begin{cases} F(i-1, j-1) + S_{ij} & \text{(match/mismatch in the diagonal)} \\ F(i-1, j) - d & \text{(gap in sequence 1)} \\ F(i, j-1) - d & \text{(gap in sequence 2)} \end{cases}$$

For position (1, 1): $S_{ij} = S_{1,1} = 2$ (since G is present in both sequences) and w (gap penalty) = -2, i.e. d = 2. Thus
$F(1,1) = \max[F(0,0) + 2,\ F(0,1) - 2,\ F(1,0) - 2] = \max[2, -2, -2] = 2$.
2 is the largest value, so it is placed in position (1, 1).
2. Matrix fill (Scoring)

$$F(i, j) = \max \begin{cases} F(i-1, j-1) + S_{ij} & \text{(match/mismatch in the diagonal)} \\ F(i-1, j) - d & \text{(gap in sequence 1)} \\ F(i, j-1) - d & \text{(gap in sequence 2)} \end{cases}$$

       G   A   A   T   T   C   A   G   T   T   A
   0   0   0   0   0   0   0   0   0   0   0   0
G  0   2   0  -1  -1  -1  -1  -1   2   0  -1  -1
G  0   2   1  -1  -2  -2  -2  -2   1   1  -1  -2
A  0   0   4   3   1  -1  -3   0  -1   0   0   1
T  0  -1   2   3   5   3   1  -1  -1   1   2   0
C  0  -1   0   1   3   4   5   3   1  -1   0   1
G  0   2   0  -1   1   2   3   4   5   3   1  -1
A  0   0   4   2   0   0   1   5   3   4   2   3
3. Traceback (Alignment): It begins in the last position of the matrix (row N, column M),
i.e. the position that leads to the maximal score. Here it is 3.
       G   A   A   T   T   C   A   G   T   T   A
   0   0   0   0   0   0   0   0   0   0   0   0
G  0   2   0  -1  -1  -1  -1  -1   2   0  -1  -1
G  0   2   1  -1  -2  -2  -2  -2   1   1  -1  -2
A  0   0   4   3   1  -1  -3   0  -1   0   0   1
T  0  -1   2   3   5   3   1  -1  -1   1   2   0
C  0  -1   0   1   3   4   5   3   1  -1   0   1
G  0   2   0  -1   1   2   3   4   5   3   1  -1
A  0   0   4   2   0   0   1   5   3   4   2   3
Each step back through the matrix is one of three moves:
• A match or a substitution (diagonal move)
• Either a deletion in seq 1 or an insertion in seq 2
• Either an insertion in seq 1 or a deletion in seq 2
Seq 1 G A A T T C A G T T A
Seq 2 G G A T ‐ C ‐ G ‐ ‐ A
A different traceback path through the same matrix, again starting from the score of 3, gives a second alignment:
Seq 1 G A A T T C A G T T A
Seq 2 G G A ‐ T C ‐ G ‐ ‐ A
Aligning of sequences by the Simple Scoring Matrix
Seq 1 G A A T T C A G T T A
Seq 2 G G A ‐ T C ‐ G ‐ ‐ A
Seq 1 G ‐ A A T T C A G T T A
Seq 2 G G ‐ A ‐ T C ‐ G ‐ ‐ A
Aligning of sequences by an Advanced Scoring Matrix
Seq 1 G A A T T C A G T T A
Seq 2 G G A T - C - G - - A
Seq 1 G A A T T C A G T T A
Seq 2 G G A - T C - G - - A
Test to make sure the alignment gives a valid score
Remembering that the scoring scheme is +2 for a match, -1 for a mismatch,
and -2 for a gap, the aligned sequences can be checked to make sure that they
result in a score of 3.
Aligning of sequences by the Simple Scoring Matrix
Seq 1   G   A   A   T   T   C   A   G   T   T   A
Seq 2   G   G   A   -   T   C   -   G   -   -   A
Score  +2  -1  +2  -2  +2  +2  -2  +2  -2  -2  +2
Total = 6(+2) + 1(-1) + 4(-2) = 12 - 1 - 8 = 3
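The same check can be done programmatically. This is a minimal sketch, assuming gaps are written as ASCII `-` and the scheme of +2 for a match, -1 for a mismatch and -2 for a gap; `score_alignment` is an illustrative name.

```python
def score_alignment(aligned1, aligned2, match=2, mismatch=-1, gap=-2):
    """Sum the column scores of two aligned sequences of equal length."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(aligned1, aligned2):
        if a == "-" or b == "-":
            total += gap        # one of the residues is aligned to a gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


score_a = score_alignment("GAATTCAGTTA", "GGA-TC-G--A")
score_b = score_alignment("GAATTCAGTTA", "GGAT-C-G--A")
print(score_a, score_b)
```

Both alignments listed for the advanced scoring matrix contain six matches, one mismatch and four gaps, so they receive the same total.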
Three steps in dynamic programming
1. Initialization
2. Matrix fill (Scoring)
3. Traceback (Alignment)
Example: Align two sequences globally, AGCT and GCT
Answer: Seq 1 A G C T
Seq 2 G C T
Rule:
1. Match = 1
2. Mismatch = -1
3. Gap = -2
4. Box beside (+ gap)
5. Box on top (+ gap)
6. Diagonal box (+ match/mismatch)
       A   G   C   T
   0  -2  -4  -6  -8
G -2  -1  -1  -3  -5
C -4  -3  -2   0  -2
T -6  -5  -4  -2   1
Diagonal examples: cell (G, A) = 0 + (-1) = -1; cell (C, A) = -2 + (-1) = -3.
Traceback from the bottom-right cell (score 1) gives the alignment:
Seq 1 A G C T
Seq 2 ‐ G C T
Smith-Waterman (Local Alignment)
      A  A  G  A
   0  0  0  0  0
T  0  0  0  0  0
T  0  0  0  0  0
A  0  1  1  0  1
A  0  1  2  0  1
G  0  0  0  3  1
Alignment
Seq 1 A A G
Seq 2 A A G
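The local alignment above can be reproduced with a short script. This is a minimal sketch, assuming match = 1, mismatch = -1 and gap = -2 (values inferred from the matrix, not stated on the slide) and the usual Smith-Waterman rules: no cell falls below 0, and the traceback starts at the highest-scoring cell and stops at the first 0. The name `smith_waterman` is illustrative.

```python
def smith_waterman(seq_top, seq_side, match=1, mismatch=-1, gap=-2):
    """Return the score matrix, the best local score and one local alignment."""
    rows, cols = len(seq_side) + 1, len(seq_top) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_side[i - 1] == seq_top[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback from the best cell until a cell with score 0 is reached.
    top, side = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if seq_side[i - 1] == seq_top[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(seq_top[j - 1])
            side.append(seq_side[i - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i][j - 1] + gap:
            top.append(seq_top[j - 1])
            side.append("-")
            j -= 1
        else:
            top.append("-")
            side.append(seq_side[i - 1])
            i -= 1
    return H, best, "".join(reversed(top)), "".join(reversed(side))


H, best, local1, local2 = smith_waterman("AAGA", "TTAAG")
for row in H:
    print(" ".join(str(v) for v in row))
print(best, local1, local2)
```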
Needleman-Wunsch   Smith-Waterman