There are three different types of sequence alignment:

Global alignment - for example the Needleman-Wunsch algorithm (dynamic programming),
used to align protein or nucleotide sequences
- attempts to align every residue in every sequence
- most useful when the sequences in the query set are similar and of roughly equal
size.
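
The dynamic-programming idea behind global alignment can be sketched as follows. This
is a minimal illustration, not a reference implementation: the function name and the
scores (match +1, mismatch -1, gap -2) are assumptions chosen for the example, and a
real tool would use a substitution matrix instead.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return (score, aligned_a, aligned_b).

    The scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Every residue of both sequences ends up in the result, either paired with a residue
or with a gap, which is what makes the alignment global.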

Local alignment - for example the Smith-Waterman algorithm
- aligns stretches that are shorter than the entire sequences; there may be more than
one such stretch
- suitable when comparing substantially different sequences, which may differ
significantly in length and share only short patches of similarity
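
A local alignment can be sketched with the same kind of table, with two changes: cell
scores are never allowed to drop below zero, and the traceback starts at the best cell
rather than the corner. The function name and scores (match +2, mismatch -1, gap -2)
are assumptions for illustration only.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one best local alignment.

    The scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                0,                                # start a fresh local alignment
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For two otherwise unrelated strings that share the patch ACGT, such as XXXACGTYYY and
ZZACGTZZ, only that patch is reported.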

Multiple sequence alignment:


- Simultaneous alignment of more than two sequences.
- Suitable when searching for subtle conserved sequence patterns in a protein
family, and when more than two sequences of the protein family are available.

The scoring scheme - a set of rules that assigns an alignment score (the goodness of
the alignment) to any given alignment of two sequences. But it does not tell us
how to find the best alignment!
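
As a minimal sketch, a scoring scheme can be written as a function that takes an
already-built alignment and returns a number. The function name and the values
(match +1, mismatch -1, gap -2) are assumptions for illustration; in practice the
match/mismatch part would come from a substitution matrix.

```python
def score_alignment(aligned_a, aligned_b, match=1, mismatch=-1, gap=-2):
    """Score a given pairwise alignment column by column.

    Both strings must have the same length; '-' marks a gap.
    The scoring values are illustrative assumptions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            total += gap        # insertion or deletion
        elif x == y:
            total += match      # identical residues
        else:
            total += mismatch   # substitution
    return total
```

The function only evaluates an alignment it is handed; finding the best-scoring one
is the job of an alignment algorithm.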

Evolutionary substitution matrices:


PAM ("point accepted mutation") family - PAM250, PAM120, etc.
BLOSUM ("blocks substitution matrix") family - BLOSUM62, BLOSUM50, etc.
The substitution scores of both PAM and BLOSUM matrices are derived from the
analysis of known alignments of related proteins.
The BLOSUM matrices are newer and are generally considered the better choice.

The differences between PAM and BLOSUM:


PAM: based on global alignments of closely related proteins.
BLOSUM: based on local alignments.

PAM: PAM1 is calculated from comparisons of sequences with no more than 15%
divergence, but corresponds to 99% sequence identity (1 accepted mutation per
100 residues).
BLOSUM: BLOSUM62 is calculated from comparisons of sequences with a pairwise
identity of no more than 62%.

PAM: other PAM matrices are extrapolated from PAM1.
BLOSUM: all matrices are based on observed alignments; they are not extrapolated
from comparisons of closely related proteins.

PAM: higher numbers in the naming scheme denote larger evolutionary distance.
BLOSUM: larger numbers in the naming scheme denote higher sequence similarity
and therefore smaller evolutionary distance.[19]

Gaps - a gap corresponds to an insertion or a deletion of a residue

Conventional wisdom dictates that the penalty for a gap must be several times
greater than the penalty for a mutation. That is because a gap (an extra or missing
residue)
- interrupts the entire polymer chain
- in DNA, shifts the reading frame

--------------------
BWT (Burrows-Wheeler transform) -> compression:
idea: compress RRRRBBBBTTT as 4R4B3T
BUT: there are not many clustered letters in the genome (AGCTAGCT)
=> here comes BWT
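
The run-length idea, and why it fails on unclustered input, can be sketched in a few
lines. The function name is an assumption for illustration.

```python
from itertools import groupby


def run_length_encode(text):
    """Encode each run of identical letters as <count><letter>."""
    return "".join(f"{len(list(group))}{letter}" for letter, group in groupby(text))


print(run_length_encode("RRRRBBBBTTT"))  # 4R4B3T
print(run_length_encode("AGCTAGCT"))     # 1A1G1C1T1A1G1C1T
```

With no repeated neighbours, AGCTAGCT doubles in length instead of shrinking, so the
letters have to be clustered first for this kind of compression to pay off.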

Algorithm:
Take the string and sort all circular shifts of it. For BANANA (or BANANA$ when
building suffix trees):
ABANAN
ANABAN
ANANAB
BANANA
NABANA
NANABA
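
The table above can be reproduced with a naive sketch that builds every circular
shift and sorts them. The function name is an assumption, and this approach needs
quadratic memory, so it is only meant for small examples. Note that Python counts
rows from 0, so the row holding the original string is reported as 3, not 4.

```python
def bwt(text):
    """Return (last_column, row_index) of the sorted circular shifts of text.

    row_index is 0-based and points at the row equal to the original string.
    """
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    last_column = "".join(row[-1] for row in rotations)
    return last_column, rotations.index(text)


print(bwt("BANANA"))  # ('NNBAAA', 3)
```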

- Notice that every row AND every column is a permutation of the string.
- We are interested in the last column (NNBAAA). Notice that it is clustered; that is
not a coincidence!
- We store this word and the position of the original string in the table (row 4).
- To decode, we build a new table: repeatedly prepend the word NNBAAA as a new
column and sort the rows:
First step -> write the word and sort
A
A
A
B
N
N
Second step -> prepend the word again
NA
NA
BA
AB
AN
AN

Third step -> sort; notice that we now have the first two columns of the initial table
AB
AN
AN
BA
NA
NA

- repeat until you get the initial table
- the original string is in the row whose number we stored above (4)
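
The decoding steps above can be sketched as a loop that prepends the stored word as a
new column and sorts, once per letter. The function names are assumptions for
illustration, and the row index is 0-based here, so row 4 of the notes is index 3.

```python
def rebuild_table(last_column, steps):
    """Apply 'prepend the last column, then sort' the given number of times."""
    table = [""] * len(last_column)
    for _ in range(steps):
        table = sorted(letter + row for letter, row in zip(last_column, table))
    return table


def inverse_bwt(last_column, row_index):
    """Recover the original string from the last column and its 0-based row."""
    return rebuild_table(last_column, len(last_column))[row_index]


print(rebuild_table("NNBAAA", 2))  # ['AB', 'AN', 'AN', 'BA', 'NA', 'NA']
print(inverse_bwt("NNBAAA", 3))    # BANANA
```

After two rounds the table matches the third step shown above, and after six rounds
it is the full sorted table of circular shifts.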
