Bioinformatics 04

Bioinformatics
Lecture -04
Nadia Afrin Ritu

Lecturer, Computer Science and Engineering
Chapter 2: The Needleman and Wunsch Algorithm
The Needleman and Wunsch Algorithm
Global Alignment
✓Dynamic Programming
Problems Sub-problems Optimal solution
How does dynamic programming work?
A divide-and-conquer strategy:
• Break the problem into smaller subproblems.
• Solve the smaller problems optimally.
• Use the sub-problem solutions to construct an optimal solution for
the original problem.
Dynamic Programming vs exhaustive search
✓ In Exhaustive search, all possible alignments is generally not feasible. As the
lengths of the sequences grow, the number of possible alignments to search
quickly becomes intractable or impossible to compute in a reasonable amount of
time. We can overcome this problem using dynamic programming.
✓Dynamic programming is used for optimal alignment of two sequences.
✓ It finds the alignment in a more quantitative way by giving some scores for
matches and mismatches (Scoring matrices), rather than only applying dots. By
searching the highest scores in the matrix, alignment can be accurately obtained.
✓ The Dynamic Programming solves the original problem by dividing the problem
into smaller independent sub problems.
✓ Needleman-Wunsch and Smith-Waterman algorithms for sequence alignment are
defined by dynamic programming approach.
Scoring matrices
-In optimal alignment procedures, mostly Needleman-Wunsch and Smith-Waterman

algorithms use scoring system.
- For nucleotide sequence alignment, the scoring matrices used are relatively simpler
since the frequency of mutation for all the bases are equal.
- Positive or higher value is assigned for a match and a negative or a lower value is
assigned for mismatch. These assumption based scores can be used for scoring the
matrices.
Scoring matrices
Mainly used predefined matrices are PAM and BLOSUM.
• PAM Matrices: Margaret Dayhoff was the first one to develop the PAM matrix,
PAM stands for Point Accepted Mutations. PAM matrices are calculated by
observing the differences in closely related proteins. One PAM unit (PAM1)
specifies one accepted point mutation per 100 amino acid residues, i.e. 1% change
and 99% remains as such.
• BLOSUM: Blocks Substitution Matrix, developed by Henikoff and Henikoff in

1992, used conserved regions. These matrices are actual percentage identity
values. Simply to say, they depend on similarity. Blosum 62 means there is 62 %
similarity.
Scoring Scheme:
+1 for every match
-1 for mismatch
-10 for gap penalty
The Needleman-Wunsch algorithm consists of three steps:
1. Create 2D matrix
2. Trace Back
3. Final Alignment
© Bioinformatics| Nadia Afrin Ritu

Continued
© Bioinformatics| Nadia Afrin Ritu

Consider the simple example
• The alignment of two sequences SEND and AND. Find the best
alignment between two sequences using Needleman and Wunsch
Algorithm.
The score and traceback matrices
Initialization
Scoring
Continued
The Needleman-Wunsch progression
The value of C(2, 2)
Filling the score and traceback matrices
The progression is recursive
The final score and traceback matrices
The traceback
• Traceback = the process of deduction of the best alignment from the
traceback matrix.
• The traceback always begins with the last cell to be filled with the
score, i.e. the bottom right cell.
• One moves according to the traceback value written in the cell. There
are three possible moves: diagonally (toward the top-left corner of
the matrix), up, or left.
• The traceback is completed when the first, top-left cell of the matrix
is reached (”done” cell).
The traceback path
The traceback step-by-step (1)
• The first cell from the traceback path is ”diag” implying that the
corresponding letters are aligned:
D
D
• The second cell from the traceback path is also ”diag” implying that
the corresponding letters are aligned:
ND
ND
• The third cell from the traceback path is ”left” implying the gap in the
left sequence (i.e. we stay on the letter A from the left sequence):
END
-ND
• The fourth cell from the traceback path is also ”diag” implying that
the corresponding letters are aligned. We consider the letter A again,
this time it is aligned with S:
SEND
A-ND
Compare with the exhaustive search
• The best alignment via the Needleman-Wunsch algorithm:
SEND
A-ND
-The exhaustive search:
SEND
-AND score: +1
A-ND score: +3 ← the best
AN-D score: -3
AND- score: -8
Summary
• The alignment is over the entire length of two sequences: the
traceback starts from the lower right corner of the traceback matrix,
and completes in the upper left cell of the matrix.
• The Needleman-Wunsch algorithm works in the same way regardless
of the length or complexity of sequences, and guarantees to find the
best alignment.
• The Needleman-Wunsch algorithm is appropriate for finding the best
alignment of two sequences which are (i) of the similar length; (ii)
similar across their entire lengths.
Continued

Bioinformatics 04

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Bioinformatics 04

Uploaded by

Copyright:

Available Formats

Bioinformatics

Nadia Afrin Ritu

The Needleman and Wunsch Algorithm

-In optimal alignment procedures, mostly Needleman-Wunsch and Smith-Waterman

• BLOSUM: Blocks Substitution Matrix, developed by Henikoff and Henikoff in

The Needleman-Wunsch algorithm consists of three steps:

© Bioinformatics| Nadia Afrin Ritu

© Bioinformatics| Nadia Afrin Ritu

You might also like