BLOSUM

BLOcks Substitution Matrix

• The index denotes the level of clustering
• BLOSUM62 is built from blocks clustered at the 62% identity level
• Sequences that are identical in more than x% of their amino acid positions are eliminated to avoid bias towards over-represented proteins
• This can be done either by removing sequences from the block or by finding a cluster of similar sequences and treating it as a single sequence
• A matrix built from blocks with no more than x% similarity is called BLOSUMx (so 60% similarity gives BLOSUM60).
Steps for Building a BLOSUM Matrix
Seq 1 A A B C D - - - B B C D A
Seq 2 D A B C D - A - B B C B B
Seq 3 B B B C D B A - B C C A A
Seq 4 A A A C D C - D C B C D B
Seq 5 C C B A D B - D B B D C C
Seq 6 A A A C A - - - B B C C C
• Calculate the log odds ratio for each column of each block.
• This is done by counting the pairs of amino acids in each column of the multiple alignment.
e.g. for a column containing A A B A C A (the second column of the block above):
AA pairs = 6; AB pairs = 4; AC pairs = 4; BC pairs = 1;
BB pairs = 0; CC pairs = 0
• Hence the column contributes, over all pairs,
6 + 4 + 4 + 1 + 0 + 0 = 15 pairs
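The pair counting for a single column can be checked with a few lines of Python. This is only a sketch: the function name `column_pair_counts` is made up for illustration, and gap characters are simply ignored, which is an assumption the slides do not spell out.

```python
from collections import Counter
from itertools import combinations_with_replacement

def column_pair_counts(column):
    """Count residue pairs in one alignment column (hypothetical helper).

    Like pairs use n_i * (n_i - 1) / 2, unlike pairs use n_i * n_j.
    Gap characters '-' are skipped (an assumption, not stated on the slide).
    """
    n = Counter(r for r in column if r != "-")
    counts = {}
    for a, b in combinations_with_replacement(sorted(n), 2):
        counts[a + b] = n[a] * (n[a] - 1) // 2 if a == b else n[a] * n[b]
    return counts

pairs = column_pair_counts("AABACA")
print(pairs)                # per-pair counts for the column
print(sum(pairs.values()))  # total number of pairs in the column
```

For a column of six residues the total is always 6 x 5 / 2 = 15, whatever the composition.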
Generally speaking, for each pair of amino acids i and j and for each column k of each block we have:
For like comparisons (i = j): $C_{ii}^{(k)} = \binom{n_i}{2} = \frac{n_i (n_i - 1)}{2}$
For unlike comparisons (i ≠ j): $C_{ij}^{(k)} = n_i n_j$
where $n_i$ is the number of times residue i was observed in column k.
In the last stage the results are normalized according to the following definitions.
The counts are summed over all columns: $C_{ij} = \sum_k C_{ij}^{(k)}$
The pair frequencies are normalized so that their sum becomes 1:
$T = \sum_{i \ge j} C_{ij} = w \, \frac{n(n-1)}{2}$, where w = number of columns and n = number of sequences.
$q_{ij}$ is the observed probability that a pair of amino acids in the same column is i and j, and is given by
$q_{ij} = \frac{C_{ij}}{T}$
For example, with n = 6 sequences and w = 7 columns:
$q_{AB} = \frac{4 + 8 + 0 + 0 + 0 + 0 + 0}{7 \cdot \frac{(6)(5)}{2}} = \frac{12}{105}$
Calculating the denominator of the log odds ratio: the probability of occurrence of the i-th residue in an (i, j) pair is
$P_i = q_{ii} + \sum_{j \ne i} \frac{q_{ij}}{2}$
Assuming independence, the pairs should occur with the expected frequencies
$e_{ii} = P_i^2$ for i = j and $e_{ij} = 2 P_i P_j$ for i ≠ j
The log odds matrix is $S_{ij} = \log_2 \frac{q_{ij}}{e_{ij}}$. The final result is $2 S_{ij}$ rounded to the nearest integer.
This value is stored in the (i, j) entry of the BLOSUM matrix.
If the observed frequency of a pair of amino acids is equal to the expected frequency then $S_{ij} = 0$, if it is less than expected then $S_{ij} < 0$, and if it is more than expected then $S_{ij} > 0$.
Quantities to calculate: (1) $C_{ij}$, (2) T, (3) $q_{ij}$, (4) $P_i$, (5) $e_{ij}$, (6) log odds ratio $S_{AA}$
Ques: Find the BLOSUM value of AA for the given sequences
Seq 1 A A I
Seq 2 S A L
Seq 3 T A L
Seq 4 T A V
Seq 5 A A L
Ans: First calculate the $C_{ij}$.
List the different letters present in the sequences, i.e. A, S, T, I, L, V or, in alphabetical order, A, I, L, S, T, V.
The $C_{ij}$ values in matrix form (lower triangle):
     A    I    L    S    T    V
A   11
I    0    0
L    0    3    3
S    2    0    0    0
T    4    0    0    2    1
V    0    1    3    0    0    0
$T = \sum_{i \ge j} C_{ij} = w \, \frac{n(n-1)}{2} = 3 \cdot \frac{5(5-1)}{2} = 3 \cdot \frac{20}{2} = 3(10) = 30$
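The pair-count matrix and T for this three-column block can be reproduced programmatically. The sketch below assumes the block is given as a list of equal-length strings; the name `pair_counts` is illustrative only.

```python
from collections import Counter
from itertools import combinations_with_replacement

# The five sequences of the worked example, one string per sequence.
block = ["AAI", "SAL", "TAL", "TAV", "AAL"]

def pair_counts(block):
    """Sum the per-column pair counts C_ij over all columns of a block."""
    total = Counter()
    for column in zip(*block):          # iterate over alignment columns
        n = Counter(column)
        for a, b in combinations_with_replacement(sorted(n), 2):
            total[a, b] += n[a] * (n[a] - 1) // 2 if a == b else n[a] * n[b]
    return total

C = pair_counts(block)
T = sum(C.values())

w, n = len(block[0]), len(block)
print(C["A", "A"], T, w * n * (n - 1) // 2)
```

Keys are stored with the two residues in alphabetical order, so the A-T count is `C["A", "T"]`.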
Calculate the matrix for $q_{ij}$ using $q_{ij} = \frac{C_{ij}}{T}$, with $C_{ij}$ as given in the block and T = 30:
     A       I       L       S       T       V
A   11/30
I   0       0
L   0       3/30    3/30
S   2/30    0       0       0
T   4/30    0       0       2/30    1/30
V   0       1/30    3/30    0       0       0
The diagonal entries are $q_{AA} = \frac{11}{30}$, $q_{II} = \frac{0}{30}$, $q_{LL} = \frac{3}{30}$, $q_{SS} = \frac{0}{30}$, $q_{TT} = \frac{1}{30}$, $q_{VV} = \frac{0}{30}$.
In decimal form:
     A       I       L       S       T       V
A   0.366
I   0       0
L   0       0.1     0.1
S   0.066   0       0       0
T   0.133   0       0       0.066   0.033
V   0       0.033   0.1     0       0       0
Calculate $P_i$, i.e. $P_A$, $P_I$, $P_L$, $P_S$, $P_T$ and $P_V$, from the $q_{ij}$ values using
$P_i = q_{ii} + \sum_{j \ne i} \frac{q_{ij}}{2}$
$P_A = q_{AA} + \sum_{X \ne A} \frac{q_{AX}}{2} = \frac{11}{30} + \frac{1}{2}\left(\frac{2}{30} + \frac{4}{30}\right) = \frac{11}{30} + \frac{3}{30} = \frac{14}{30} = 0.466$
$P_I = \frac{0}{30} + \frac{1}{2}\left(\frac{3}{30} + \frac{1}{30}\right) = \frac{2}{30} = 0.066$
$P_L = \frac{3}{30} + \frac{1}{2}\left(\frac{3}{30} + \frac{3}{30}\right) = \frac{6}{30} = 0.2$
$P_S = \frac{0}{30} + \frac{1}{2}\left(\frac{2}{30} + \frac{2}{30}\right) = \frac{2}{30} = 0.066$
$P_T = \frac{1}{30} + \frac{1}{2}\left(\frac{4}{30} + \frac{2}{30}\right) = \frac{4}{30} = 0.133$
$P_V = \frac{0}{30} + \frac{1}{2}\left(\frac{1}{30} + \frac{3}{30}\right) = \frac{2}{30} = 0.066$
Calculate the matrix for $e_{ij}$, with $e_{ii} = P_i^2$ for i = j and $e_{ij} = 2 P_i P_j$ for i ≠ j, using
$P_A = \frac{14}{30}$, $P_I = \frac{2}{30}$, $P_L = \frac{6}{30}$, $P_S = \frac{2}{30}$, $P_T = \frac{4}{30}$, $P_V = \frac{2}{30}$
For example $e_{AA} = \left(\frac{14}{30}\right)^2 = \frac{196}{900}$ and $e_{IA} = 2 \cdot \frac{2}{30} \cdot \frac{14}{30} = \frac{56}{900}$. All entries below are in units of 1/900:
     A     I     L     S     T     V
A   196
I    56     4
L   168    24    36
S    56     8    24     4
T   112    16    48    16    16
V    56     8    24     8    16     4
Log odds ratio: $S_{ij} = \log_2 \frac{q_{ij}}{e_{ij}}$, so
$S_{AA} = \log_2 \frac{0.366}{\left(\frac{14}{30}\right)^2} = \log_2 1.6837 = 0.7516$
BLOSUM value for AA (the first diagonal element of the BLOSUM matrix)
= round (2(0.7516)) = 2
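The last step can be sketched as a small function. The name `blosum_entry` and its signature are assumptions made for this illustration; pairs with $q_{ij} = 0$ have no finite log odds and are not handled here.

```python
import math

def blosum_entry(q_ij, p_i, p_j, same):
    """Return (S_ij, rounded 2*S_ij) for one residue pair (hypothetical helper).

    e_ij is P_i**2 on the diagonal and 2*P_i*P_j off the diagonal.
    """
    e_ij = p_i * p_j if same else 2 * p_i * p_j
    s_ij = math.log2(q_ij / e_ij)
    return s_ij, round(2 * s_ij)

# Values from the worked example: q_AA = 11/30 and P_A = 14/30.
s_aa, blosum_aa = blosum_entry(11 / 30, 14 / 30, 14 / 30, same=True)
print(round(s_aa, 4), blosum_aa)
```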
Dot Plot
• Compares two sequences of similar or different length
• Write each letter of one sequence along a row
• Write each letter of the other sequence down a column
• Fill in every box where the letter of sequence 1 is the same as the letter of sequence 2
• Interpret the plot
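The procedure above amounts to an equality test for every row/column combination. A minimal text-mode sketch follows; the function names are illustrative, and `*` and `.` are an arbitrary choice of symbols for filled and empty boxes.

```python
def dot_plot(seq1, seq2):
    """Boolean grid: grid[i][j] is True where seq2[i] equals seq1[j]."""
    return [[a == b for a in seq1] for b in seq2]

def render(seq1, seq2):
    """Draw the grid with seq1 along the top row and seq2 down the first column."""
    lines = ["  " + " ".join(seq1)]
    for b, row in zip(seq2, dot_plot(seq1, seq2)):
        lines.append(b + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)

grid = dot_plot("GTACATG", "TAGATG")
print(render("GTACATG", "TAGATG"))
```

Runs of `*` along a diagonal indicate stretches where the two sequences agree.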

Example: dot plots for two pairs of sequences
Pair 1:
Seq 1 G T A C A T G
Seq 2 T A G A T G
Pair 2:
Seq 1 G A T T C T A T C T A A C T
Seq 2 G T T C T A T T C T A A C
[Figure: dot plot grids with Seq 1 written along the top row and Seq 2 down the first column]
Dot plots with thresholds
• If you colour in all cells with an identical letter, some dots may be due to chance similarities.
• Therefore, it is common to use a threshold to decide whether to plot a 'dot' in a cell.
• A window of a certain size (e.g. window size = 3) is moved along all possible diagonals, one by one.
• A score is calculated for each position of the window on a diagonal: the number of identical letters in the window.
• If the score is equal to or above the threshold (e.g. threshold = score of 2), all the cells in the window are coloured in.
• The values of the window size and threshold for the dot plot are chosen by trial and error.
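The window-and-threshold rule can be sketched as follows. The function name and default values are assumptions chosen to match the example figures in the bullets (window size 3, threshold 2); only complete windows are scored, which is one possible reading of the rule.

```python
def windowed_dot_plot(seq1, seq2, window=3, threshold=2):
    """Colour all cells of a diagonal window whose match count reaches the threshold."""
    rows, cols = len(seq2), len(seq1)
    grid = [[False] * cols for _ in range(rows)]
    for i in range(rows - window + 1):
        for j in range(cols - window + 1):
            # Score = number of identical letters along this diagonal window.
            score = sum(seq2[i + k] == seq1[j + k] for k in range(window))
            if score >= threshold:
                for k in range(window):
                    grid[i + k][j + k] = True
    return grid

filtered = windowed_dot_plot("GATTCTATCTAACT", "GTTCTATTCTAAC")
for letter, row in zip("GTTCTATTCTAAC", filtered):
    print(letter, " ".join("*" if hit else "." for hit in row))
```

Raising the threshold or widening the window removes more chance matches, at the cost of possibly hiding short genuine similarities.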
[Figure: dot plot grids of Seq 1 (G A T T C T A T C T A A C T, top row) against Seq 2 (G T T C T A T T C T A A C, first column)]
Seq 1 G A T T C T A T ‐ T C T A A C T
Seq 2 G ‐ T T C T A T C T C T A A C ‐
Advantages
• Good for identifying long regions of strong similarity
• Easy to make and interpret
• Can be used for sequences of any length
Disadvantages
• Need to find the best window size
• The graphical representation doesn't give information about mutations
Needleman-Wunsch (global alignment)
We want to align two sequences $x_1, x_2, \dots, x_n$ and $y_1, y_2, \dots, y_m$ and create an (n + 1) × (m + 1) matrix F where
$$F(i,j) = \max \begin{cases} F(i-1, j-1) + S_{ij} & \text{(match/mismatch in the diagonal)} \\ F(i-1, j) - d & \text{(gap in sequence 1)} \\ F(i, j-1) - d & \text{(gap in sequence 2)} \end{cases}$$
with $F(0,0) = 0$, $F(i,0) = -id$, $F(0,j) = -jd$, where d is the gap penalty.
This is the recurrence relation of the dynamic programming algorithm.
In the tabular computation we start in cell (0, 0) and calculate one row at a time. In each cell (i, j) we keep a pointer to the optimal previous position, given the current one.
Example: Align two sequences globally GAATTCAGTTA and GGATCGA
Answer: Seq 1 G A A T T C A G T T A
Seq 2 G G A T C G A
Length of Seq 1 is M = 11 and length of Seq 2 is N = 7

The simple scoring scheme is assumed, where
Sij = 1 if the residue at position i of Seq 1 is the same as the residue at position j of Seq 2 (match score)
Sij = 0 otherwise (mismatch score)
D = 0 (gap penalty)
The match matrix below is used in the next step:
   G A A T T C A G T T A
G  1 0 0 0 0 0 0 1 0 0 0
G  1 0 0 0 0 0 0 1 0 0 0
A  0 1 1 0 0 0 1 0 0 0 1
T  0 0 0 1 1 0 0 0 1 1 0
C  0 0 0 0 0 1 0 0 0 0 0
G  1 0 0 0 0 0 0 1 0 0 0
A  0 1 1 0 0 0 1 0 0 0 1
Three steps in dynamic programming
1. Initialization
2. Matrix fill (Scoring)
3. Traceback (Alignment)
1. Initialization: Create a matrix with M + 1 columns and N + 1 rows, with Seq 1 along the top and Seq 2 down the side. With a gap penalty of 0, the first row and the first column are filled with 0.
Seq 1 G A A T T C A G T T A
Seq 2 G G A T C G A
      G  A  A  T  T  C  A  G  T  T  A
   0  0  0  0  0  0  0  0  0  0  0  0
G  0
G  0
A  0
T  0
C  0
G  0
A  0
2. Matrix fill (Scoring): the cells are filled one row at a time (row index i, column index j), starting from F(1,1), using
$$F(i,j) = \max \begin{cases} F(i-1, j-1) + S_{ij} & \text{(match/mismatch in the diagonal)} \\ F(i-1, j) - d & \text{(gap in sequence 1)} \\ F(i, j-1) - d & \text{(gap in sequence 2)} \end{cases}$$
For position (1,1): $S_{1,1} = 1$ (since G is present in both sequences) and the gap penalty is d = 0. Thus
$F(1,1) = \max\{F(0,0) + 1,\ F(0,1) - 0,\ F(1,0) - 0\} = \max\{0 + 1,\ 0 - 0,\ 0 - 0\} = \max\{1, 0, 0\} = 1$
1 is the largest value, so it is placed in position (1,1).
      G  A  A  T  T  C  A  G  T  T  A
   0  0  0  0  0  0  0  0  0  0  0  0
G  0  1
G  0
A  0
T  0
C  0
G  0
A  0
2. Matrix fill (Scoring), completed: applying
$F(i,j) = \max\{F(i-1, j-1) + S_{ij},\ F(i-1, j) - d,\ F(i, j-1) - d\}$
to every cell gives
      G  A  A  T  T  C  A  G  T  T  A
   0  0  0  0  0  0  0  0  0  0  0  0
G  0  1  1  1  1  1  1  1  1  1  1  1
G  0  1  1  1  1  1  1  1  2  2  2  2
A  0  1  2  2  2  2  2  2  2  2  2  3
T  0  1  2  2  3  3  3  3  3  3  3  3
C  0  1  2  2  3  3  4  4  4  4  4  4
G  0  1  2  2  3  3  4  4  5  5  5  5
A  0  1  2  3  3  3  4  5  5  5  5  6
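The matrix fill for the simple scoring scheme can be reproduced with a short function. This is a sketch under the slide's conventions: Seq 1 runs along the columns, Seq 2 down the rows, and the first row and column are set to 0. The name `fill_matrix` is illustrative, and the gap value is passed as a signed score that is added.

```python
def fill_matrix(seq1, seq2, match=1, mismatch=0, gap=0):
    """Fill the scoring matrix with a zero first row and column, as on the slide."""
    F = [[0] * (len(seq1) + 1) for _ in range(len(seq2) + 1)]
    for i in range(1, len(seq2) + 1):
        for j in range(1, len(seq1) + 1):
            s = match if seq2[i - 1] == seq1[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # match/mismatch in the diagonal
                          F[i - 1][j] + gap,     # gap in sequence 1
                          F[i][j - 1] + gap)     # gap in sequence 2
    return F

F = fill_matrix("GAATTCAGTTA", "GGATCGA")
for label, row in zip(" GGATCGA", F):
    print(label, *row)
```

With match 1 and both mismatch and gap 0, the bottom-right value is simply the largest number of residues that can be matched in order.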
3. Traceback (Alignment): It begins at the (N, M) position of the matrix, the bottom-right cell, which holds the maximal score. Here it is 6. From each cell we move back to the neighbouring cell from which its value was obtained:
• A diagonal step: a match or a substitution
• A step in one of the two straight directions: either a deletion in Seq 1 or an insertion in Seq 2
• A step in the other straight direction: either an insertion in Seq 1 or a deletion in Seq 2
One traceback path gives the alignment
Seq 1 G A A T T C A G T T A
Seq 2 G G A - T C - G - - A
Another traceback path through the same matrix gives an alternative alignment with the same score:
Seq 1 G ‐ A A T T C A G T T A
Seq 2 G G ‐ A ‐ T C ‐ G ‐ ‐ A
An Advanced Scoring Scheme
The sequences are now treated with an advanced scoring scheme, where the length of Sequence 1 is M and the length of Sequence 2 is N.
1. Sij = 2 if the residue at position i of Seq 1 is the same as the residue at position j of Seq 2 (match score)
2. Sij = -1 otherwise (mismatch score)
3. W = -2 (gap penalty)

                        The Simple Scoring Scheme    An Advanced Scoring Scheme
Sij (match score)        1                            2
Sij (mismatch score)     0                            -1
D/W (gap penalty)        0                            -2
1. Initialization
2. Matrix fill (Scoring)
Example: Align two sequences globally GAATTCAGTTA and GGATCGA
Answer: Seq 1 G A A T T C A G T T A
Seq 2 G G A T C G A
1. Initialization: Create M + 1 columns and N + 1 rows
      G  A  A  T  T  C  A  G  T  T  A
   0  0  0  0  0  0  0  0  0  0  0  0
G  0
G  0
A  0
T  0
C  0
G  0
A  0
2. Matrix fill (Scoring): the cells are filled one row at a time (row index i, column index j) using
$$F(i,j) = \max \begin{cases} F(i-1, j-1) + S_{ij} & \text{(match/mismatch in the diagonal)} \\ F(i-1, j) - d & \text{(gap in sequence 1)} \\ F(i, j-1) - d & \text{(gap in sequence 2)} \end{cases}$$
For position (1,1): $S_{1,1} = 2$ (since G is present in both sequences) and the gap penalty is w = -2, i.e. d = 2. Thus
$F(1,1) = \max\{F(0,0) + 2,\ F(0,1) - 2,\ F(1,0) - 2\} = \max\{2, -2, -2\} = 2$
2 is the largest value, so it is placed in position (1,1). Filling every cell in the same way gives
        G   A   A   T   T   C   A   G   T   T   A
    0   0   0   0   0   0   0   0   0   0   0   0
G   0   2   0  -1  -1  -1  -1  -1   2   0  -1  -1
G   0   2   1  -1  -2  -2  -2  -2   1   1  -1  -2
A   0   0   4   3   1  -1  -3   0  -1   0   0   1
T   0  -1   2   3   5   3   1  -1  -1   1   2   0
C   0  -1   0   1   3   4   5   3   1  -1   0   1
G   0   2   0  -1   1   2   3   4   5   3   1  -1
A   0   0   4   2   0   0   1   5   3   4   2   3
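The advanced scheme changes only the three scoring parameters. The sketch below recomputes the matrix under the convention used in this example, in which the first row and column are kept at 0 even though the gap score is -2; the function name `fill_advanced` is hypothetical.

```python
def fill_advanced(seq1, seq2, match=2, mismatch=-1, gap=-2):
    """Matrix fill with a zero first row/column, as laid out in the example."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = F[i - 1][j - 1] + (match if seq2[i - 1] == seq1[j - 1] else mismatch)
            F[i][j] = max(diagonal, F[i - 1][j] + gap, F[i][j - 1] + gap)
    return F

adv = fill_advanced("GAATTCAGTTA", "GGATCGA")
for label, row in zip(" GGATCGA", adv):
    print(label, " ".join(f"{value:3d}" for value in row))
```

The bottom-right cell holds the score of the best alignment that the traceback then recovers.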
Traceback (Alignment): starting from the bottom-right cell (score 3), the traceback paths through the matrix give the alignments
Seq 1 G A A T T C A G T T A
Seq 2 G G A T - C - G - - A
and
Seq 1 G A A T T C A G T T A
Seq 2 G G A - T C - G - - A
Alignments obtained with the Simple Scoring Scheme
Seq 1 G A A T T C A G T T A
Seq 2 G G A - T C - G - - A

Seq 1 G - A A T T C A G T T A
Seq 2 G G - A - T C - G - - A
Alignments obtained with the Advanced Scoring Scheme
Seq 1 G A A T T C A G T T A
Seq 2 G G A T - C - G - - A

Seq 1 G A A T T C A G T T A
Seq 2 G G A - T C - G - - A
Test to make sure the alignment has a valid score
Remembering that the scoring scheme is +2 for a match, -1 for a mismatch and -2 for a gap, the aligned sequences can be tested to make sure that they result in a score of 3.
Seq 1   G  A  A  T  T  C  A  G  T  T  A
Seq 2   G  G  A  -  T  C  -  G  -  -  A
       +2 -1 +2 -2 +2 +2 -2 +2 -2 -2 +2
Score = 6(+2) + 1(-1) + 4(-2) = 12 - 1 - 8 = 3
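The same check can be done mechanically by scoring an alignment column by column. This is a sketch; `alignment_score` is a hypothetical helper that expects two gapped strings of equal length, with `-` marking a gap.

```python
def alignment_score(aligned1, aligned2, match=2, mismatch=-1, gap=-2):
    """Sum the column scores of a pairwise alignment written with '-' for gaps."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(aligned1, aligned2):
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total

score_a = alignment_score("GAATTCAGTTA", "GGA-TC-G--A")
score_b = alignment_score("GAATTCAGTTA", "GGAT-C-G--A")
print(score_a, score_b)
```

Both advanced-scheme alignments have six matches, one mismatch and four gaps, so they receive the same score.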
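Example: Align two sequences globally AGCT and GCT
Answer: Seq 1 A G C T
Seq 2 G C T
Rule:
1. Match = 1
2. Mismatch = -1
3. Gap = -2
4. Box beside: value + gap
5. Box on top: value + gap
6. Diagonal box: value + match/mismatch
1. Initialization: the first row and the first column are filled with multiples of the gap score.
        A    G    C    T
    0   -2   -4   -6   -8
G  -2
C  -4
T  -6
2. Matrix fill (Scoring): each cell takes the largest of the three candidate values. For example, the diagonal candidate for the cell in row G, column A is 0 + (-1) = -1, and for the cell in row C, column A it is -2 + (-1) = -3.
        A    G    C    T
    0   -2   -4   -6   -8
G  -2   -1   -1   -3   -5
C  -4   -3   -2    0   -2
T  -6   -5   -4   -2    1
3. Traceback (Alignment): a diagonal step is a match or a substitution; a step in one straight direction is either a deletion in Seq 1 or an insertion in Seq 2, and a step in the other is either an insertion in Seq 1 or a deletion in Seq 2.

Seq 1 A G C T
Seq 2 - G C T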
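All three steps for this small example fit in one function. The following is a sketch rather than a reference implementation: the name `needleman_wunsch` and its return format are assumptions, the gap score is passed as a signed value, and when several traceback paths tie the diagonal step is preferred.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Global alignment: returns (matrix, aligned seq1, aligned seq2)."""
    n, m = len(seq1), len(seq2)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):          # initialization of the first row
        F[0][j] = j * gap
    for i in range(1, m + 1):          # initialization of the first column
        F[i][0] = i * gap
    for i in range(1, m + 1):          # matrix fill
        for j in range(1, n + 1):
            s = match if seq2[i - 1] == seq1[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    out1, out2 = [], []                # traceback from the bottom-right cell
    i, j = m, n
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and seq2[i - 1] == seq1[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            out1.append(seq1[j - 1]); out2.append(seq2[i - 1]); i -= 1; j -= 1
        elif j > 0 and F[i][j] == F[i][j - 1] + gap:
            out1.append(seq1[j - 1]); out2.append("-"); j -= 1
        else:
            out1.append("-"); out2.append(seq2[i - 1]); i -= 1
    return F, "".join(reversed(out1)), "".join(reversed(out2))

matrix, top, bottom = needleman_wunsch("AGCT", "GCT")
print(matrix[-1][-1])
print(top)
print(bottom)
```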
Smith-Waterman (Local Alignment)
Sometimes we want to find the conserved region in a protein domain and not align the entire sequences.
This method is useful for comparing the following:
1. Protein sequences that share a common motif (conserved pattern) or domain (independently folded unit) but differ elsewhere.
2. Protein sequences against genomic DNA sequences (long stretches of uncharacterized sequences).
3. DNA sequences that share a similar motif but differ elsewhere.
4. Highly diverged sequences, for which the method is sensitive.
The three steps are:
1. Initialization: The first row and first column are initialized with 0's
2. Matrix fill (Scoring): Calculate the F(i, j) values
3. Traceback (Alignment): Start the traceback in the cell with the highest score and continue until a cell with score 0 is reached
Example: Align two sequences locally AAGA and TTAAG
Answer: Seq 1 A A G A
Seq 2 T T A A G
$$F(i,j) = \max \begin{cases} 0 \\ F(i-1, j-1) + S(x_i, y_j) \\ F(i-1, j) - d \\ F(i, j-1) - d \end{cases}$$
1. Initialization: Create M + 1 columns and N + 1 rows
      A  A  G  A
   0  0  0  0  0
T  0
T  0
A  0
A  0
G  0
2. Matrix fill (Scoring)
      A  A  G  A
   0  0  0  0  0
T  0  0  0  0  0
T  0  0  0  0  0
A  0  1  1  0  1
A  0  1  2  0  1
G  0  0  0  3  1
3. Traceback (Alignment): starting from the highest score, 3
Seq 1 A A G
Seq 2 A A G
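The local alignment example can be reproduced with the sketch below. The example does not state its scoring values; match = 1, mismatch = -1 and gap = -2 are assumed here because they reproduce the matrix shown, and the function name `smith_waterman` and its return format are illustrative.

```python
def smith_waterman(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Local alignment: returns (matrix, best score, aligned seq1, aligned seq2)."""
    n, m = len(seq1), len(seq2)
    F = [[0] * (n + 1) for _ in range(m + 1)]   # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if seq2[i - 1] == seq1[j - 1] else mismatch
            F[i][j] = max(0, F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)
    out1, out2 = [], []
    i, j = best_pos                              # traceback starts at the highest score
    while i > 0 and j > 0 and F[i][j] > 0:       # and stops at the first 0
        s = match if seq2[i - 1] == seq1[j - 1] else mismatch
        if F[i][j] == F[i - 1][j - 1] + s:
            out1.append(seq1[j - 1]); out2.append(seq2[i - 1]); i -= 1; j -= 1
        elif F[i][j] == F[i][j - 1] + gap:
            out1.append(seq1[j - 1]); out2.append("-"); j -= 1
        else:
            out1.append("-"); out2.append(seq2[i - 1]); i -= 1
    return F, best, "".join(reversed(out1)), "".join(reversed(out2))

local_matrix, best_score, local1, local2 = smith_waterman("AAGA", "TTAAG")
print(best_score, local1, local2)
```

The only differences from the global version are the extra 0 in the maximum, the zero borders, and the start and stop conditions of the traceback.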
    Needleman-Wunsch | Smith-Waterman
1   Global alignment | Local alignment
2   Residue alignment score may be positive or negative | Requires the alignment score for a pair of residues to be ≥ 0
3   No gap penalty required | Requires a gap penalty to work effectively
4   A negative score weight must be given to mismatches so that the score drops as more and more mismatches are added | No such score is assigned
5   Compares sequences and gives the best overall alignment | Finds regions of ungapped sequence with a high degree of similarity

BLOSUM - Dot Plot - Needleman & Wunch - Smith & Waterman Matrix Filling
