Presentation for BLAST Algorithm

GROUP MEMBERS
NAME
A.K.M.ASADUZZAMAN
KOUSHIK ROY
MD.ZAHID HASAN
MD.ASIF-AL-FAHAD
BLAST
Introduction
Suppose you have obtained a DNA or protein sequence from an environmental sample, such as a lake, a pond, or a plant.
[Figure: cell samples -> sequencing process -> your sequence (KLMNTRARLIVHISGLTRK…)]
• Or you might get a DNA/protein sequence from a database such as NCBI, EMBL, or Swiss-Prot, or find an interesting gene or sequence in a journal article.
• In that case, you might want to know whether your sequence already exists in a database, or is similar to sequences in it, possibly narrowed down to the database of a particular organism.
[Figure: your sequence compared against a database. Is it already in here? Is something similar in here?]
• Why do you want to know that?

• Because sequence similarity lets you infer structural, functional, and evolutionary relationships for your query sequence.
Sequence Alignment
In bioinformatics, a sequence alignment is a way of arranging the
sequences of DNA, RNA, or protein to identify regions of similarity that
may be a consequence of functional, structural, or evolutionary
relationships between the sequences.
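Types of Alignment
1. Local alignment: aligns the best-matching subregions of the sequences.
2. Global alignment: aligns the sequences over their entire length.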
Your sequence (unknown): KLMNTRARLIVHISGLTRK
What is this sequence? Where does it come from?
Introducing BLAST (Basic Local Alignment Search Tool)
• BLAST compares a query sequence with a library or database of sequences.
• It uses a heuristic search algorithm based on statistical methods. The algorithm was developed by Stephen Altschul and his co-workers in 1990.
• BLAST programs were designed for fast database searching.
BLAST Algorithm
BLAST Algorithm (Protein)
Query sequence (length 11): Y A N C L E H K M G S
Word size W = 3
Slide a window of length W along the query, one position at a time:
YAN, ANC, NCL, CLE, LEH, EHK, HKM, KMG, MGS
This generates 11 - 3 + 1 = 9 words.
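The word-generation step can be sketched in a few lines of Python. This is a minimal illustration, not BLAST's actual implementation; the function name generate_words is a hypothetical helper introduced here.

```python
def generate_words(query, w=3):
    """Return every overlapping word of length w, in query order."""
    if w <= 0 or w > len(query):
        return []
    # A query of length n yields n - w + 1 windows.
    return [query[i:i + w] for i in range(len(query) - w + 1)]


query = "YANCLEHKMGS"          # query sequence from the slide
words = generate_words(query, w=3)
print(len(words), words)
```

For the slide's query this produces the 9 words listed above, with LEH as the fifth word.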
BLAST Algorithm Example
For each word (window W = 3), generate neighborhood words using the BLOSUM62 matrix with a score threshold of 11.
With 3 amino acids per word there are 20 x 20 x 20 = 20^3 = 8000 possible words. Each one is aligned with LEH and scored with BLOSUM62, and the list is then sorted by score. Entries shown on the slide:

Score  Word
17     LEH
13     LKH
12     CEH
11     QEH   <- score threshold (cut off here)
10     LMH
9      LFH
9      LER
9      DEH
...
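As a rough sketch of the neighborhood step, the snippet below scores all 8000 three-letter words against LEH and keeps those at or above the threshold. It assumes only the three BLOSUM62 rows needed for L, E and H, transcribed by hand for this example (verify them against a canonical copy of the matrix before reuse). The names word_score and neighborhood are hypothetical helpers.

```python
from itertools import product

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# BLOSUM62 rows for L, E and H only (column order = AMINO_ACIDS).
# Assumption: hand-transcribed for this example; check before reuse.
BLOSUM62_ROWS = {
    "L": [-1, -2, -3, -4, -1, -2, -3, -4, -3, 2, 4, -2, 2, 0, -3, -2, -1, -2, -1, 1],
    "E": [-1, 0, 0, 2, -4, 2, 5, -2, 0, -3, -3, 1, -2, -3, -1, 0, -1, -3, -2, -2],
    "H": [-2, 0, 1, -1, -3, 0, 0, -2, 8, -3, -3, -1, -2, -1, -2, -1, -2, -2, 2, -3],
}


def word_score(word, candidate):
    """Sum the substitution scores position by position.

    The residues of `word` must be L, E or H, because only those rows are stored.
    """
    return sum(BLOSUM62_ROWS[a][AMINO_ACIDS.index(b)]
               for a, b in zip(word, candidate))


def neighborhood(word, threshold):
    """Score all 20**len(word) candidates; keep those scoring >= threshold."""
    scored = []
    for letters in product(AMINO_ACIDS, repeat=len(word)):
        candidate = "".join(letters)
        score = word_score(word, candidate)
        if score >= threshold:
            scored.append((candidate, score))
    # Highest score first, as on the slide.
    return sorted(scored, key=lambda pair: -pair[1])


neighbors = dict(neighborhood("LEH", threshold=11))
for cand in ["LEH", "LKH", "CEH", "QEH", "LMH", "LFH", "LER", "DEH"]:
    print(cand, word_score("LEH", cand), cand in neighbors)
```

The full enumeration keeps more words than the four the slide shows above the cutoff, since the slide lists only a sample of the sorted list.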
Word list (sorted by score): LEH, LKH, CEH, QEH
The database sequences are scanned for exact matches of words from the word list. For example, the database sequence
DAPCQEHKRGWPNDC
contains the word QEH.
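The seed search can be illustrated with a plain scan over one database sequence. This is a simplified sketch: production BLAST uses an indexed lookup rather than a linear scan, and find_seeds is a hypothetical name used only here.

```python
def find_seeds(word_list, subject):
    """Return (word, subject_position) for each exact occurrence of a listed word."""
    w = len(word_list[0])
    wanted = set(word_list)
    return [(subject[i:i + w], i)
            for i in range(len(subject) - w + 1)
            if subject[i:i + w] in wanted]


word_list = ["LEH", "LKH", "CEH", "QEH"]   # words above the threshold on the slide
subject = "DAPCQEHKRGWPNDC"                # database sequence from the slide
seeds = find_seeds(word_list, subject)
print(seeds)
```

Positions are 0-based, so the single hit QEH is reported at index 4 of the database sequence.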
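For each exact word match, the alignment is extended in both directions to find high-scoring segments.
Query = Y A N C L E H K M G S
Extension to the right, maximum drop-off score X = 2

Pair               Q-Q  E-E  H-H  K-K  M-R  G-G  S-W
Score                5    5    8    5   -1    6   -3
Accumulated score    5   10   18   23   22   28   25

At M-R the accumulated score drops by 1 below the running maximum (1 <= X), so the extension continues. At S-W it drops by 3 below the maximum of 28 (3 > X), so the extension stops and the segment is trimmed back to the maximum, ending at G-G.
Note: the slide shows the matched word QEH in the query row, so the first pair is scored as Q-Q; the query residue at that position is L.
[Figure: accumulated score (y-axis, 0 to 30) plotted against the aligned pairs.]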
Extension to the left, maximum drop-off score X = 2

Pair               Y-D  A-A  N-P  C-C  Q-Q  E-E  H-H
Score               -3    4   -2    9    5    5    8
Accumulated score   26   29   25   27   18   13    8

The accumulated scores are read from right to left, starting at H-H. At N-P the score drops by 2 below the running maximum (2 <= X), so the extension continues. At Y-D it drops by 3 below the maximum of 29 (3 > X), so the extension stops and the segment is trimmed back to A-A.
[Figure: accumulated score (y-axis, 0 to 35) plotted against the aligned pairs.]
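The drop-off rule used in both extension steps can be sketched as follows. This is a minimal illustration that takes the per-pair scores from the slides as input rather than looking them up in BLOSUM62; xdrop_extend is a hypothetical name, and the seed pairs are included in each direction, as on the slides.

```python
def xdrop_extend(pair_scores, x):
    """Accumulate scores in extension order.

    Stop once the running total falls more than x below the best total seen.
    Returns (best_total, number_of_pairs_kept_after_trimming).
    """
    total = best = 0
    kept = 0
    for i, score in enumerate(pair_scores, start=1):
        total += score
        if total > best:
            best, kept = total, i
        elif best - total > x:
            break
    return best, kept


# Pair scores from the slides, listed in the direction of extension.
right = [5, 5, 8, 5, -1, 6, -3]   # Q-Q E-E H-H K-K M-R G-G S-W
left = [8, 5, 5, 9, -2, 4, -3]    # H-H E-E Q-Q C-C N-P A-A Y-D
seed_score = 5 + 5 + 8            # the seed is counted in both directions

best_right, kept_right = xdrop_extend(right, x=2)
best_left, kept_left = xdrop_extend(left, x=2)

# Combine both directions, subtracting the seed that was counted twice.
segment_score = best_left + best_right - seed_score
print(best_right, best_left, segment_score)
```

With X = 2 both directions keep six pairs (maxima 28 and 29), and the combined segment scores 29 + 28 - 18 = 39.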
Maximal Segment Pair (MSP)

Query     A   N   C   Q   E   H   K   M   G
Database  A   P   C   Q   E   H   K   R   G
Score     4  -2   9   5   5   8   5  -1   6

Pair score = 4 - 2 + 9 + 5 + 5 + 8 + 5 - 1 + 6 = 39

MSPs from other seeds are scored with the BLOSUM62 scoring matrix in the same way, and all MSPs are then sorted by alignment score:
55, 51, 45, 42, 39 (the MSP ANCQEHKMG / APCQEHKRG), 37, 35, 33
Each match has its own E-value.
Expect Value (E-Value)
• E-value: the number of MSPs with a similar or higher score that one can expect to see by chance alone when searching a database of a particular size.
• For example, if the E-value is 10 for a particular MSP with score S, about 10 MSPs with score >= S can be expected by chance alone (for any query sequence). Our MSP is therefore most likely not a significant match.
• If the E-value is very small, e.g. 10^-4 (very high score S), it is very unlikely that an MSP with score >= S would occur by chance. Our MSP is then a significant match, and the sequences are likely homologous.
E-Value Calculation
• First: calculate the bit score

$$S' = \frac{\lambda S - \ln K}{\ln 2}$$

S = score of the alignment (raw score)
S' = bit score
• λ and K: values that depend on the scoring scheme and on the sequence composition of the database.
[ln is the natural logarithm (log base e)]
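As a small worked sketch, the bit score can be computed with the standard normalization S' = (λS - ln K) / ln 2. The λ and K values below are placeholders chosen for illustration (an assumption, not taken from the slides); the real values should be read from the BLAST report for the scoring scheme in use.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw alignment score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Placeholder statistical parameters (assumption, for illustration only).
EXAMPLE_LAMBDA = 0.318
EXAMPLE_K = 0.134

raw = 39  # raw score of the MSP from the worked example
print(round(bit_score(raw, EXAMPLE_LAMBDA, EXAMPLE_K), 2))
```

Because the bit score folds λ and K into the score, bit scores from searches with different scoring schemes can be compared directly.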
• The lower the E-value, the better.
• An E-value cutoff can be used to limit the number of hits shown on the results page.
E-Value Calculation
• Second: calculate the E-value

$$E = m \, n \, 2^{-S'}$$

• S' = bit score
• m = query length
• n = length of the database
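The second step can be sketched in the same way, using E = m · n · 2^(-S'). The database length and bit score in the example call are hypothetical numbers chosen only to show the call; they do not come from the slides.

```python
def e_value(bit_score, m, n):
    """Expected number of chance hits: E = m * n * 2 ** (-S')."""
    return m * n * 2.0 ** (-bit_score)


m = 11            # query length from the worked example
n = 1_000_000     # hypothetical database length (assumption)
s_prime = 20.8    # hypothetical bit score (assumption)

print(e_value(s_prime, m, n))
# Each additional bit halves E; doubling the database size doubles it.
print(e_value(s_prime + 1, m, n), e_value(s_prime, m, 2 * n))
```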
E-Value Interpretation
• E-values of 10^-4 and lower indicate significant homology.
• E-values between 10^-4 and 10^-2 should be checked (similar domains, possibly non-homologous).
• E-values between 10^-2 and 1 do not indicate good homology.
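These rules of thumb could be encoded as a small lookup, sketched below. The handling of boundary values and of E-values above 1 is an assumption, since the slide does not specify them; interpret_e_value is a hypothetical name.

```python
def interpret_e_value(e):
    """Map an E-value to the slide's rule-of-thumb categories."""
    if e <= 1e-4:
        return "significant homology"
    if e <= 1e-2:
        return "check manually (similar domains, possibly non-homologous)"
    if e <= 1:
        return "no good homology"
    # Assumption: the slide gives no category for E-values above 1.
    return "outside the ranges given on the slide"


for e in (1e-30, 5e-3, 0.5, 10):
    print(e, "->", interpret_e_value(e))
```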
Gapped BLAST
• The Gapped BLAST algorithm allows gaps to be introduced into the alignments, so similar regions are not broken into several segments.
• This reflects biological relationships much better.
• It results in different parameter values (λ, K) when calculating the E-value.
BLAST programs
Name     Description
Blastp   Amino acid query sequence against a protein database
Blastn   Nucleotide query sequence against a nucleotide sequence database
Blastx   Nucleotide query sequence translated in all reading frames against a protein database
Tblastn  Protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
Tblastx  Six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database
BLAST programs
Name Common Word Size
Blastp 3
Blastn 11
Blastx 3
Tblastn 3
Tblastx 3
BLAST Suggestions
• Where possible, use the translated (protein) sequence.
• Split a large query sequence (> 1000 nucleotides for DNA, > 200 residues for protein) into smaller ones.
• If the query has low-complexity regions or repeated segments, remove them and repeat the search. Example: IVLKVALRPVLRPVLRPVWQARNGS
• Repeated segments can prevent the program from finding the 'real' significant matches in a database.
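To make the repeated-segment advice concrete, here is a toy tandem-repeat finder based on a regular-expression backreference. It is only an illustration of spotting a repeat such as the one in the example sequence, not the low-complexity filtering that BLAST tools apply. The unit-length bounds (2 to 10) and the name find_tandem_repeats are assumptions made for this sketch.

```python
import re

# A unit of 2-10 residues followed immediately by one or more copies of itself.
TANDEM = re.compile(r"(.{2,10}?)\1+")


def find_tandem_repeats(seq):
    """Return (start, unit, copies) for each tandem repeat found, left to right."""
    return [(m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
            for m in TANDEM.finditer(seq)]


seq = "IVLKVALRPVLRPVLRPVWQARNGS"   # example sequence from the slide
repeats = find_tandem_repeats(seq)
print(repeats)

# Remove the repeated stretch before searching again, as the slide suggests.
start, unit, copies = repeats[0]
trimmed = seq[:start] + seq[start + len(unit) * copies:]
print(trimmed)
```

On the slide's example this reports the unit LRPV repeated three times starting at index 6 (0-based).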
