Motivation for and challenge of MSA
Sum of Pairs (SP) method
Progressive MSA: ClustalW algorithm
Baeyer-Villiger monooxygenases (BVMOs) - taken from Fraaije et al. (2002) FEBS Letters 518:43-47
Carrillo-Lipman Bound
[Figure: dynamic programming score matrices for m=2 and m=3 (residues A, T, G) illustrating the Carrillo-Lipman bound]
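The SP score of a column $m_i$ is the sum of the pairwise scores of its entries, and the score of the alignment $M$ is the sum over its columns:
$$SP(m_i) = \sum_{k<l} s(m_i^k, m_i^l), \qquad SP(M) = \sum_i SP(m_i)$$
- where $m_i^k$ is the $k$th entry in the $i$th column and $m_i^l$ is the $l$th entry in the $i$th column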
Alignment 1:
        m1  m2  m3  m4  m5  m6
         T   -   G   C   -   G
         -   A   G   C   T   G
         -   A   G   C   -   G
SP(mi)  -4  -3   3   3  -4   3     SP(M) = -2

Alignment 2:
        m1  m2  m3  m4  m5
         T   G   C   -   G
         A   G   C   T   G
         A   G   C   -   G
SP(mi)  -1   3   3  -4   3         SP(M) = 4
Using the simplified substitution matrix, the Sum of Pairs method ranks the second alignment higher (SP(M) = 4 versus SP(M) = -2).
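The two totals can be reproduced with a short sketch. The simplified substitution matrix itself is not shown in the text, so the values below (match +1, mismatch -1, residue-gap -2, gap-gap 0) are an assumption inferred from the column scores listed for the two alignments; the function names are illustrative only.

```python
from itertools import combinations

# Assumed simplified scoring, inferred from the column scores in the example.
MATCH, MISMATCH, GAP, GAP_GAP = 1, -1, -2, 0

def pair_score(x, y):
    """Score one pair of entries from the same alignment column."""
    if x == "-" and y == "-":
        return GAP_GAP
    if x == "-" or y == "-":
        return GAP
    return MATCH if x == y else MISMATCH

def column_scores(alignment):
    """SP score of every column: sum of pair scores over all pairs k < l."""
    return [
        sum(pair_score(x, y) for x, y in combinations(column, 2))
        for column in zip(*alignment)
    ]

def sp_score(alignment):
    """SP(M): the sum of the column scores."""
    return sum(column_scores(alignment))

alignment_1 = ["T-GC-G", "-AGCTG", "-AGC-G"]
alignment_2 = ["TGC-G", "AGCTG", "AGC-G"]

for name, aln in (("alignment 1", alignment_1), ("alignment 2", alignment_2)):
    print(name, column_scores(aln), "SP(M) =", sp_score(aln))
```

Under these assumed scores the sketch gives the same column scores and totals as the example, which suggests the inferred matrix is at least consistent with it.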
We wish to use the SP method to score the following alignment of four sequences (S1 to S4):
S1  AQPILLLV
S2  ALR-LL-
S3  AK-ILLL
S4  DPPVLILV
Use the BLOSUM62 scoring matrix for scoring matches/mismatches and a gap score of -2 [ s(x,-) = s(-,y) = -2 ]
What is the score for the first column if we change the first letter of the last sequence from D to A -- SP(A,A,A,A)?
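As a hedged worked example, the first-column score can be computed by enumerating the six pairs in a column of four residues. Only the BLOSUM62 entries needed here are included (A/A = 4, A/D = -2, D/D = 6); the helper names are illustrative, and the column contains no gaps, so the gap score is not used.

```python
from itertools import combinations

# Subset of BLOSUM62 sufficient for a column made of A (alanine) and D (aspartate).
BLOSUM62_SUBSET = {("A", "A"): 4, ("A", "D"): -2, ("D", "D"): 6}

def s(x, y):
    """Symmetric lookup of the substitution score for residues x and y."""
    return BLOSUM62_SUBSET.get((x, y), BLOSUM62_SUBSET.get((y, x)))

def sp_column(column):
    """Sum of pairwise scores over all pairs of entries in one column."""
    return sum(s(x, y) for x, y in combinations(column, 2))

print("SP(A,A,A,D) =", sp_column("AAAD"))  # original first column
print("SP(A,A,A,A) =", sp_column("AAAA"))  # after changing D to A
```

With four sequences there are 4(3)/2 = 6 pairs, so SP(A,A,A,A) = 6 x 4 = 24, while the original column scores 3 x 4 + 3 x (-2) = 6.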
CLUSTALW Method
CLUSTALW is a progressive method for multiple sequence alignment.
A progressive MSA method starts by doing pairwise alignments of all sequences to determine the most related sequences, and aligns these sequences.
The progressive MSA method then progressively adds less related sequences or groups of sequences to the initial alignment.
Progressive MSA is similar in concept to hierarchical clustering of microarray data.
CLUSTAL comes in three versions:
  CLUSTAL: gives equal weight to all sequences
  CLUSTALW: includes weights for sequences
  CLUSTALX: provides a GUI to CLUSTAL
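For a column of $N$ sequences, write $A_N$ for a column of $N$ alanines and $(A_{N-1}, D)$ for the same column with one alanine replaced by aspartate. The relative difference in SP score is
$$\frac{SP(A_N) - SP(A_{N-1}, D)}{SP(A_N)} = \frac{6(N-1)}{4N(N-1)/2} = \frac{3}{N}$$
We would expect the relative difference to increase as more sequences provide evidence for a conserved alanine residue, but $3/N$ decreases as $N$ grows.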
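The scaling with the number of sequences can be checked numerically. This is a minimal sketch that assumes the BLOSUM62 values s(A,A) = 4 and s(A,D) = -2 and counts pairs directly; the function names are hypothetical.

```python
from fractions import Fraction

S_AA, S_AD = 4, -2  # BLOSUM62 scores for the A/A and A/D pairs

def sp_all_alanine(n):
    """SP score of a column of n alanines: n(n-1)/2 A/A pairs."""
    return S_AA * n * (n - 1) // 2

def sp_one_aspartate(n):
    """SP score of a column of n-1 alanines and one aspartate."""
    aa_pairs = (n - 1) * (n - 2) // 2  # pairs among the alanines
    ad_pairs = n - 1                   # each alanine paired with the aspartate
    return S_AA * aa_pairs + S_AD * ad_pairs

def relative_difference(n):
    """(SP(A_N) - SP(A_{N-1}, D)) / SP(A_N) as an exact fraction."""
    full = sp_all_alanine(n)
    return Fraction(full - sp_one_aspartate(n), full)

for n in (4, 8, 16, 32):
    print(f"N = {n:2d}: relative difference = {relative_difference(n)}")
```

The printed fractions shrink as N grows, so a single mismatching residue is penalised relatively less the more sequences agree on the alanine, which is the weakness of SP scoring noted here.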
The SP method requires extensive search time: aligning N sequences of length L has a time complexity of $O(L^N 2^N N^2)$.
Even after truncating this search space with the Carrillo-Lipman Bound, the SP method remains too slow for many sequences or for long sequences.
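To give a feel for this growth, the sketch below evaluates the bound read as L^N x 2^N x N^2 (L^N cells in the dynamic programming lattice, up to 2^N predecessor moves per cell, and N^2 pair scores per move). The sequence length of 100 and the function name are arbitrary choices for illustration, and constant factors are ignored.

```python
def sp_dp_operations(length, n_seqs):
    """Rough operation count L^N * 2^N * N^2 for exact SP alignment."""
    return length**n_seqs * 2**n_seqs * n_seqs**2

# Illustrative sequence length; real proteins are often longer.
LENGTH = 100

for n in (2, 3, 5, 10):
    ops = sp_dp_operations(LENGTH, n)
    print(f"N = {n:2d}: about {ops:.1e} operations")
```

Each added sequence multiplies the count by roughly 2L, which is why exact SP alignment is only practical for a handful of sequences.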
seq(i)  NKL-EN
seq(j)  -MLNEN
CLUSTALW Step 2
Construct a similarity tree (guide tree).
The CLUSTALW package uses the distance matrix and a technique called the Neighbor Joining method to construct the similarity tree.
If two or more sequences share a branch, this may indicate an evolutionary relationship between the sequences.
The length of each branch indicates the degree of sequence divergence.
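Pairwise distance matrix (lower triangle):
                1    2    3    4    5    6    7
1 Hbb_Human     -
2 Hbb_Horse    .17   -
3 Hba_Human    .59  .60   -
4 Hba_Horse    .59  .59  .13   -
5 Myg_Phyca    .77  .77  .75  .75   -
6 Glb5_Petma   .81  .82  .73  .74  .80   -
7 Lgb2_Luplu   .87  .86  .86  .88  .93  .90   -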
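As an illustration of how Neighbor Joining uses these distances, the sketch below performs only the first join: it computes the standard criterion Q(i,j) = (n-2) d(i,j) - r_i - r_j, where r_i is the sum of distances from sequence i to all others, and picks the pair with the smallest Q. It is not the CLUSTALW implementation, and the row labels are an assumption: they follow the order in which the seven globin sequences are named in the alignment steps.

```python
from itertools import combinations

# Assumed row order for the lower-triangular distance values.
names = ["Hbb_Human", "Hbb_Horse", "Hba_Human", "Hba_Horse",
         "Myg_Phyca", "Glb5_Petma", "Lgb2_Luplu"]

# lower[i][j] is the distance between sequence i and sequence j, for j < i.
lower = [
    [],
    [0.17],
    [0.59, 0.60],
    [0.59, 0.59, 0.13],
    [0.77, 0.77, 0.75, 0.75],
    [0.81, 0.82, 0.73, 0.74, 0.80],
    [0.87, 0.86, 0.86, 0.88, 0.93, 0.90],
]

n = len(names)

def d(i, j):
    """Symmetric distance lookup from the lower triangle."""
    if i == j:
        return 0.0
    return lower[max(i, j)][min(i, j)]

# r_i: total distance from sequence i to every other sequence.
row_sums = [sum(d(i, k) for k in range(n)) for i in range(n)]

def q(i, j):
    """Neighbor Joining selection criterion; the smallest value is joined first."""
    return (n - 2) * d(i, j) - row_sums[i] - row_sums[j]

first_join = min(combinations(range(n), 2), key=lambda pair: q(*pair))
print("First pair joined:", names[first_join[0]], names[first_join[1]])
```

A full Neighbor Joining run would then replace the joined pair by a new node, recompute distances to it, and repeat until the tree is complete; that part is omitted here to keep the sketch short.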
CLUSTALW Step 3
Combine the alignments starting from the most closely related sequences/groups to the most distantly related sequences/groups by following the similarity/guide tree (from the tips to the root of the guide tree).
In the example we align the sequences in the following order:
(1) align Hbb_Human with Hbb_Horse (group 1)
(2) align Hba_Human with Hba_Horse (group 2)
(3) align group 1 with group 2 (group 3)
(4) align Myg_Phyca with group 3 (group 4)
(5) align Glb5_Petma with group 4 (group 5)
(6) align Lgb2_Luplu with group 5; the root of the tree has been reached
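The tip-to-root order can be sketched as a post-order walk over the guide tree. The nested-tuple encoding of the tree and the function name are assumptions made for illustration (branch lengths and sequence weights are left out), and no actual alignment is performed: the code only lists which groups would be aligned at each step.

```python
# Guide tree for the example, written as nested pairs (hypothetical encoding).
group_1 = ("Hbb_Human", "Hbb_Horse")
group_2 = ("Hba_Human", "Hba_Horse")
group_3 = (group_1, group_2)
group_4 = (group_3, "Myg_Phyca")
group_5 = (group_4, "Glb5_Petma")
guide_tree = (group_5, "Lgb2_Luplu")

def merge_order(node, steps):
    """Post-order walk: return the leaves under node and record each merge."""
    if isinstance(node, str):
        return [node]  # a tip of the tree: a single sequence
    left = merge_order(node[0], steps)
    right = merge_order(node[1], steps)
    steps.append((left, right))  # both children are done, so align them now
    return left + right

steps = []
merge_order(guide_tree, steps)

for number, (left, right) in enumerate(steps, start=1):
    print(f"({number}) align {' + '.join(left)} with {' + '.join(right)}")
```

Because a node is only merged after both of its children, the closest pairs are aligned first and the most divergent sequence joins last, matching the six steps listed above.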