Lecture 7

Published by: Yogi Bhaskar on Oct 10, 2012
Copyright:Attribution Non-commercial

Lecture 7: Multiple Sequence Alignment (MSA)
§ Motivation for and challenge of MSA
§ Sum of Pairs (SP) method
§ Progressive MSA: ClustalW algorithm
Some of the notes are derived from slides by Dr. Donald R. Williamson at the University of Delaware.
Some slides are adapted from slides created by Dr. Keith Dunker.
What is Multiple Sequence Alignment?
§ Multiple sequence alignment is the alignment of N sequences (amino acids/nucleotides), where N > 2
§ Goal: to write each sequence along the others to express any similarity between the sequences
  • Each element of a sequence is placed alongside either a corresponding element in the other sequences or a gap character
§ Example: TGCG, AGCTG, and AGCG can be aligned as follows:
    T-GC-G
    -AGCTG
    -AGC-G
§ Problems:
  • How do we efficiently find this alignment?
  • Can we find a better alignment?
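The definition above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function name `is_valid_alignment` is hypothetical, and the rule that no column may consist only of gaps is an assumed (though common) convention that the slides do not state.

```python
def is_valid_alignment(sequences, rows, gap="-"):
    """Check that `rows` is a well-formed alignment of `sequences`."""
    # Every aligned row must have the same length (one character per column).
    if len({len(row) for row in rows}) != 1:
        return False
    # Removing the gap characters must recover each original sequence, in order.
    if [row.replace(gap, "") for row in rows] != list(sequences):
        return False
    # Assumed convention: no column is made up of gaps only.
    return all(any(ch != gap for ch in column) for column in zip(*rows))


sequences = ["TGCG", "AGCTG", "AGCG"]
rows = ["T-GC-G", "-AGCTG", "-AGC-G"]
print(is_valid_alignment(sequences, rows))  # prints True
```

Note that this only tests whether an arrangement is a legal alignment; it says nothing about whether it is a good one, which is what a scoring method is for.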
Motivation for Multiple Sequence Alignment
§ The addition of related sequences to a pair-wise alignment facilitates the identification of subsequences of high functional importance. Why?
§ Example: Aligning genomic sequence of related organisms to identify conserved non-coding regulatory sequence elements
    Species 1: -GGGACGATATGCAATATGAAATT-----------Gene
    Species 2: -GACGCGATATGCTCCGATTAAGT-----------Gene
    Conserved motif: CGATATGC
Not always so easy to pick out conserved motifs. How about this example?
    Species 1: --GGGACGATATGCAATATGAAATT----------Gene
    Species 2: --AGGACCTTATATATTAGCAATGT----------Gene
    Species 3: --TGGACGTTATCACAGTTTGTCCG----------Gene
    Conserved motif: GGACSWTAT (IUPAC codes: S = C or G, W = A or T)
Motivation for Multiple Sequence Alignment (continued)
§ Multiple sequence alignment can find biologically important sequence similarities that may be widely dispersed or hidden in the sequences
§ Multiple sequence alignment can provide information about the evolutionary history of the respective sequences
§ Multiple sequence alignment can give insight into the basis for sequence similarities between homologous sequences
 
Example of a Multiple Sequence Alignment (MSA)
[Figure: multiple sequence alignment of Baeyer-Villiger monooxygenases (BVMOs), taken from Fraaije et al. (2002), FEBS Letters 518:43-47]
Efficiently Computing a MSA: The Complexity Problem
§ Adding additional sequences results in an exponential increase in the number of computations required to find the optimal alignment
§ For m-wise comparisons, even the dynamic programming methods quickly break down
§ Example: number of comparisons made to align m protein sequences, each 300 amino acids in length
  • m = 2: 90,000 comparisons
  • m = 3: 2.7 x 10^7 comparisons
  • m = 4: ~ 8 x 10^9 comparisons
  • m = 5: ~ 2.4 x 10^12 comparisons
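The figures above follow from counting one comparison per cell of an m-dimensional dynamic programming lattice with 300 positions along each axis. The sketch below reproduces them under that assumption; the function name `dp_cells` is hypothetical, and the count ignores the extra per-cell work (each cell in an m-dimensional lattice has up to 2^m - 1 predecessor cells to consider).

```python
def dp_cells(num_sequences, length):
    """Number of lattice cells when aligning `num_sequences` sequences of equal `length`."""
    return length ** num_sequences


for m in range(2, 6):
    print(f"m = {m}: {dp_cells(m, 300):.1e} comparisons")
# m = 2: 9.0e+04 comparisons
# m = 3: 2.7e+07 comparisons
# m = 4: 8.1e+09 comparisons
# m = 5: 2.4e+12 comparisons
```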
A Solution: Dynamic Programming (DP) and the Carrillo-Lipman Bound
§ In the pair-wise Dynamic Programming sequence alignment method, the solution path usually fell within a small area around the diagonal in the sequence vs sequence matrix
§ If we extend this idea to MSA, we have a multi-dimensional figure (hypercube) instead of a plane (N x N) figure
§ The Carrillo-Lipman Bound is a procedure to provide a bound in the form of a polyhedron around the diagonal in the hypercube
§ This Bound limits the search space for finding the optimal MSA of a set of sequences, leading to a large increase in search efficiency
Method: use DP for MSA, but limit search space using the Carrillo-Lipman Bound
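The pair-wise intuition behind this idea can be illustrated with a banded alignment. The sketch below is not the Carrillo-Lipman Bound itself: it swaps in a simpler fixed-width band around the diagonal for two sequences only. The function name `banded_global_score`, the `band` parameter, and the match/mismatch/gap scores are assumptions made for illustration.

```python
def banded_global_score(a, b, band, match=1, mismatch=-1, gap=-2):
    """Global alignment score of a and b, visiting only cells with |i - j| <= band."""
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    if abs(n - m) > band:
        raise ValueError("band too narrow to reach the final cell")
    # score[i][j]: best score for aligning a[:i] with b[:j].
    # Cells outside the band are never visited and stay at -inf.
    score = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0
    for i in range(n + 1):
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if i == 0 and j == 0:
                continue
            best = neg_inf
            if i > 0 and j > 0:  # align a[i-1] with b[j-1]
                sub = match if a[i - 1] == b[j - 1] else mismatch
                best = max(best, score[i - 1][j - 1] + sub)
            if i > 0:  # a[i-1] against a gap
                best = max(best, score[i - 1][j] + gap)
            if j > 0:  # b[j-1] against a gap
                best = max(best, score[i][j - 1] + gap)
            score[i][j] = best
    return score[n][m]


# A band of width 1 already finds the same score as an unrestricted search here.
print(banded_global_score("TGCG", "AGCTG", band=1))  # prints 0
print(banded_global_score("TGCG", "AGCTG", band=5))  # prints 0
```

A narrow band visits far fewer cells but can miss the optimum if the best path strays from the diagonal. The Carrillo-Lipman Bound addresses this by deriving the region to search from the sequences themselves rather than fixing a width in advance.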
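Dynamic Programming with Carrillo-Lipman Bound
[Figure: two panels showing the Carrillo-Lipman Bound around the diagonal of the DP search space. Panel m = 2: a score matrix with axes labelled A T T G A and T G G. Panel m = 3: the same axes plus a third axis labelled A T C G.]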
 
Scoring MSAs: Sum of Pairs Method
§ To identify the optimal multiple sequence alignment, we need a scoring method
§ The Sum of Pairs (SP) scoring method is as follows:
Given:
(1) A set of N aligned sequences, each of length L, in the form of an L x N MSA alignment matrix M
(2) A substitution matrix (PAM or BLOSUM) that gives the score s(x,y) for aligning two characters x,y
Then the SP score SP(m_i) for the i-th column of M (denoted m_i) is calculated using the formula:
    $$SP(m_i) = \sum_{k < l} s(m_i^k, m_i^l)$$
- where m_i^k is the k-th entry in the i-th column and m_i^l is the l-th entry in the i-th column
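The column formula sums s over every unordered pair of entries in a column, which can be sketched as follows. The names `sp_column` and `toy_score` are hypothetical; `toy_score` is a stand-in for a real PAM or BLOSUM lookup and does not handle gap characters.

```python
from itertools import combinations


def sp_column(column, s):
    """Sum of Pairs score of one alignment column under the pair score function s."""
    # combinations(column, 2) yields each pair (k-th entry, l-th entry) with k < l once.
    return sum(s(x, y) for x, y in combinations(column, 2))


def toy_score(x, y):
    """Assumed toy substitution score: +1 for identical characters, -1 otherwise."""
    return 1 if x == y else -1


print(sp_column("GGA", toy_score))  # s(G,G) + s(G,A) + s(G,A) = 1 - 1 - 1, prints -1
```

A column of N entries contributes N(N-1)/2 pair scores, so scoring one column costs time quadratic in the number of sequences.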
Sum of Pairs Method: DNA Example
§ The SP score for the complete alignment M is the sum of the scores for each column (m_i) in the alignment:
    $$SP(M) = \sum_{i} SP(m_i)$$
§ Example: we wish to align the following three DNA sequences:
    S1 = TGCG
    S2 = AGCTG
    S3 = AGCG
§ We wish to use the SP method to score the following alignments of these three sequences:
    Alignment #1    Alignment #2
    T-GC-G          TGC-G
    -AGCTG          AGCTG
    -AGC-G          AGC-G
 
Sum of Pairs: DNA Example
§ We will use the following simplified DNA substitution matrix:
  • s(x,y) = 1: when x = y [match]
  • s(x,y) = -1: when x ≠ y [mismatch]
  • s(x,-) = -2: [gap]
  • s(-,y) = -2: [gap]
  • s(-,-) = 0: to prevent double counting of gaps
§ We will construct the following matrices M for each alignment:
    Alignment #1              Alignment #2
    m1  m2  m3  m4  m5  m6    m1  m2  m3  m4  m5
    T   -   G   C   -   G     T   G   C   -   G
    -   A   G   C   T   G     A   G   C   T   G
    -   A   G   C   -   G     A   G   C   -   G
Column m1 of Alignment #1:
    m1 = s(T,-) + s(T,-) + s(-,-)
    m1 = -2 + -2 + 0
    m1 = -4
Column m1 of Alignment #2:
    m1 = s(T,A) + s(T,A) + s(A,A)
    m1 = -1 + -1 + 1
    m1 = -1
Sum of Pairs: DNA Example
§ The SP score for each alignment is calculated by summing the individual scores for each column in the matrix
    Alignment #1              Alignment #2
    m1  m2  m3  m4  m5  m6    m1  m2  m3  m4  m5
    T   -   G   C   -   G     T   G   C   -   G
    -   A   G   C   T   G     A   G   C   T   G
    -   A   G   C   -   G     A   G   C   -   G
    -4  -3   3   3  -4   3    -1   3   3  -4   3
    SP(M) = -2                SP(M) = 4
§ Using the simplified substitution matrix, the Sum of Pairs method ranks the second alignment as the higher scoring alignment
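The worked example can be reproduced with a short script. This is a minimal sketch using the simplified substitution matrix from the example; the function names `s` and `column_scores` and the `GAP` constant are chosen here for illustration.

```python
from itertools import combinations

GAP = "-"


def s(x, y):
    """Simplified DNA substitution score from the example."""
    if x == GAP and y == GAP:
        return 0   # gap against gap: avoids double counting of gaps
    if x == GAP or y == GAP:
        return -2  # character against gap
    return 1 if x == y else -1  # match or mismatch


def column_scores(rows):
    """SP score of every column of an alignment given as equal-length strings."""
    return [sum(s(x, y) for x, y in combinations(column, 2)) for column in zip(*rows)]


alignment_1 = ["T-GC-G", "-AGCTG", "-AGC-G"]
alignment_2 = ["TGC-G", "AGCTG", "AGC-G"]

for name, rows in [("Alignment #1", alignment_1), ("Alignment #2", alignment_2)]:
    scores = column_scores(rows)
    print(name, scores, "SP(M) =", sum(scores))
# Alignment #1 [-4, -3, 3, 3, -4, 3] SP(M) = -2
# Alignment #2 [-1, 3, 3, -4, 3] SP(M) = 4
```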
