Published by Aayudh Das
Multiple Sequence Alignment (MSA)

Published by: Aayudh Das on Sep 07, 2013
Copyright: Attribution Non-commercial


©Aayudh Das
A multiple sequence alignment is a collection of three or more protein (or nucleic acid) sequences that are partially or completely aligned.
Sum of Pair (SP) method-
Methods for applying multiple sequence alignment
Three important methods are in use, among them Hidden Markov Models (HMMs).
Profiles express the patterns inherent in a multiple sequence alignment of a set of homologous sequences. They have several applications:
1. They permit greater accuracy in alignments of distantly related sequences.
2. Sets of residues that are highly conserved are likely to be part of the active site, and give clues to function.
3. The conservation patterns facilitate identification of other homologous sequences.
4. Patterns from the sequences are useful in classifying subfamilies within a set of homologues.
5. Sets of residues that show little conservation, and are subject to insertion and deletion, are likely to be in surface loops. This information has been applied to vaccine design, because such regions are likely to elicit antibodies that will cross-react well with the native structure.
Working procedure-
The basic idea in using profile patterns to identify homologues is to match the query sequence from the database against the sequences in the alignment table, giving higher weight to positions that are conserved than to those that are variable. But one must not be too rigid about this, as in that case there is a chance of missing interesting distant relatives.
A quantitative measure of conservation-
For each position in the table of aligned sequences, take inventory of the distribution of amino acids.
It is evident that agreement at positions 26, 27 and 29 contributes a high score, while disagreement at these positions contributes a very low score.
For a moderately conserved position, we want a modest positive contribution to the score if the query sequence has one of the residues most common at that position (S, for example), and a smaller contribution if it has T or Y.
So the general idea is to score each residue from the query sequence based on the amino acid distribution at that position in the multiple sequence alignment table.
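The inventory step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `column_inventories` and the toy alignment are hypothetical and are not the alignment table discussed in these notes, and a gap character, if present, would simply be counted like any other symbol.

```python
from collections import Counter


def column_inventories(alignment):
    """Return one Counter per alignment column, mapping residue -> count."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # zip(*alignment) walks the table column by column.
    return [Counter(column) for column in zip(*alignment)]


# Hypothetical toy alignment, used for illustration only.
toy_alignment = [
    "VDFSAE",
    "VDFTAE",
    "VDFSGE",
    "IDFYAK",
]

inventories = column_inventories(toy_alignment)
for position, inventory in enumerate(inventories, start=1):
    print(position, dict(inventory))
```

A fully conserved column yields a single residue with a count equal to the number of sequences, while a variable column spreads its counts over several residues.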
A simple approach would be to use the inventories as scores directly.
The sequence VDFSAE would score 13+16+16+7+16+4=72
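The same arithmetic can be sketched in Python. The alignment below is a hypothetical toy table, not the one that produced the score of 72 above, so the numbers differ; the function name `inventory_score` is likewise an assumption made for illustration.

```python
from collections import Counter


def inventory_score(segment, inventories):
    """Sum, over positions, how often the segment's residue occurs in that column."""
    if len(segment) != len(inventories):
        raise ValueError("segment and profile must have the same length")
    # A Counter returns 0 for residues never seen in a column.
    return sum(inv[residue] for residue, inv in zip(segment, inventories))


# Hypothetical toy alignment, used for illustration only.
toy_alignment = ["VDFSAE", "VDFTAE", "VDFSGE", "IDFYAK"]
inventories = [Counter(column) for column in zip(*toy_alignment)]

score_common = inventory_score("VDFSAE", inventories)  # 3+4+4+2+3+3
score_rare = inventory_score("IDFYAK", inventories)    # 1+4+4+1+3+1
print(score_common, score_rare)
```

With this toy table, a sequence built from the most common residue in each column scores higher than one built from the rarer residues, which is the behaviour the direct-inventory approach is meant to give.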
           
Thus, for each query sequence, we have to test all possible alignments with the multiple alignment table, and take the largest total score.
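A minimal sketch of this search, assuming the simplest case in which the query is slid along the table without gaps. The function name `best_ungapped_score`, the toy alignment and the query are all hypothetical; a real profile search would also consider insertions and deletions.

```python
from collections import Counter


def best_ungapped_score(query, inventories):
    """Try every ungapped placement of the profile on the query.

    Returns (best_score, offset), where offset is the 0-based start of
    the best-scoring window in the query.
    """
    width = len(inventories)
    if len(query) < width:
        raise ValueError("query is shorter than the alignment table")
    best_score, best_offset = -1, None
    for offset in range(len(query) - width + 1):
        window = query[offset:offset + width]
        score = sum(inv[res] for res, inv in zip(window, inventories))
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


# Hypothetical toy alignment and query, used for illustration only.
toy_alignment = ["VDFSAE", "VDFTAE", "VDFSGE", "IDFYAK"]
inventories = [Counter(column) for column in zip(*toy_alignment)]

result = best_ungapped_score("MKVDFSAEGG", inventories)
print(result)
```

Here the best window starts at offset 2, where the query segment VDFSAE lines up with the table.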
It is obvious from these discussions that if the table contained a large and unbiased sample of sequences, then the inventory would provide the correct picture of the potential distribution of residues at each position.
With similar arguments we can say that if our sample were small, the pattern derived would be unlikely to reflect the complete repertoire.
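The effect of sample size can be illustrated with a small sketch. The sequences and the helper name `column_inventory` are hypothetical; the point is only that a residue absent from a small sample gets a count of zero, even though it occurs in the larger set.

```python
from collections import Counter


def column_inventory(alignment, position):
    """Count the residues found at one 0-based column of the alignment."""
    return Counter(row[position] for row in alignment)


# Hypothetical sequences: the "full" sample and a small subset of it.
full_sample = ["VDFSAE", "VDFTAE", "VDFSGE", "IDFYAK"]
small_sample = full_sample[:2]

position = 3  # 0-based index of the column being compared
full_inventory = column_inventory(full_sample, position)
small_inventory = column_inventory(small_sample, position)

# Residues seen in the full sample but missed by the small one.
missing = sorted(set(full_inventory) - set(small_inventory))
print(missing)
```

A query carrying one of the missed residues at this position would receive no credit from the small sample, which is how a small table can fail to recognise a genuine relative.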

