
BASIC LOCAL ALIGNMENT SEARCH TOOL (BLAST)

The BLAST program was developed by Stephen Altschul and colleagues at NCBI in 1990 and has since
become one of the most popular programs for sequence analysis. BLAST uses heuristics to align
a query sequence with all sequences in a database. The objective is to find high-scoring
ungapped segments among related sequences. The steps are illustrated in the figure. BLAST is a
family of programs that includes BLASTN, BLASTP, BLASTX, TBLASTN, and TBLASTX.
BLAST performs sequence alignment through the following steps.
Step I (seeding): A list of words is created from the query sequence. Each word is typically three
residues long for protein sequences and eleven residues long for DNA sequences.
Step II: The sequence database is searched for occurrences of these words, identifying the database
sequences that contain matching words.
Step III (scoring): Each word match is scored with a given substitution matrix. A word is
considered a match only if its score is above a threshold.
Step IV: Each matching word is extended in both directions into a pairwise alignment, and the
alignment score is accumulated using the same substitution matrix. The extension continues until
mismatches cause the score to drop by more than a set amount below the best score reached so far
(the drop threshold is twenty-two for proteins and twenty for DNA).
Step V: The resulting contiguous aligned segment pair, without gaps, is called a high-scoring
segment pair (HSP). A statistical analysis of each alignment produces its E-value, and the
highest-scoring HSPs are presented in the final report.
Figure: Illustration of the BLAST procedure using a hypothetical query sequence matching with
a hypothetical database sequence. The alignment scoring is based on the BLOSUM62 matrix.
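The five steps can be sketched in Python for one query and one database sequence. This is a minimal, simplified sketch, not NCBI's implementation, and it rests on several assumptions: only exact word matches are used as seeds (BLAST itself also accepts similar "neighborhood" words that score above the threshold); residue pairs are scored with a toy matrix that keeps the BLOSUM62 identity scores but applies an assumed flat penalty of −2 to every mismatch, instead of the full BLOSUM62 matrix; and extension in each direction stops once the running score falls more than x_drop below the best score seen so far, with x_drop = 22 borrowed from the protein drop threshold in Step IV. The function names and the default word threshold t = 11 are hypothetical, and Step V here only ranks HSPs without computing E-values.

```python
# Simplified BLAST-style seed-and-extend sketch (hypothetical, not NCBI's code).

# BLOSUM62 identity (diagonal) scores for the 20 standard amino acids.
BLOSUM62_DIAGONAL = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9, "Q": 5, "E": 5, "G": 6, "H": 8, "I": 4,
    "L": 4, "K": 5, "M": 5, "F": 6, "P": 7, "S": 4, "T": 5, "W": 11, "Y": 7, "V": 4,
}
MISMATCH = -2  # assumed flat penalty; real BLOSUM62 mismatch scores vary by pair


def score_pair(a: str, b: str) -> int:
    """Score one aligned residue pair with the toy matrix above."""
    return BLOSUM62_DIAGONAL[a] if a == b else MISMATCH


def query_words(query: str, w: int) -> dict:
    """Step I: map every length-w word of the query to its start positions."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    return index


def extend_one_way(pairs, x_drop: int):
    """Accumulate scores along residue pairs; stop once the running score
    falls more than x_drop below the best seen. Returns (best gain, length)."""
    run = best = best_len = 0
    for length, (a, b) in enumerate(pairs, start=1):
        run += score_pair(a, b)
        if run > best:
            best, best_len = run, length
        elif best - run > x_drop:
            break
    return best, best_len


def extend_seed(query, subject, i, j, w, x_drop):
    """Step IV: ungapped extension of the seed query[i:i+w] / subject[j:j+w]."""
    seed = sum(score_pair(a, b) for a, b in zip(query[i:i + w], subject[j:j + w]))
    right, r_len = extend_one_way(zip(query[i + w:], subject[j + w:]), x_drop)
    left, l_len = extend_one_way(zip(reversed(query[:i]), reversed(subject[:j])), x_drop)
    # HSP as (query start, query end, subject start, subject end, score); ends exclusive.
    return (i - l_len, i + w + r_len, j - l_len, j + w + r_len, seed + left + right)


def blast_like_search(query, subject, w=3, t=11, x_drop=22):
    """Steps I-V for one subject sequence; returns HSPs, highest score first."""
    index = query_words(query, w)                             # Step I
    hsps = set()  # identical HSPs reached from different seeds are stored once
    for j in range(len(subject) - w + 1):
        word = subject[j:j + w]
        for i in index.get(word, []):                         # Step II: exact word hit
            word_score = sum(score_pair(c, c) for c in word)  # Step III
            if word_score < t:                                # keep words scoring >= t
                continue
            hsps.add(extend_seed(query, subject, i, j, w, x_drop))
    # Step V (ranking only; no E-values are computed in this sketch)
    return sorted(hsps, key=lambda hsp: hsp[-1], reverse=True)


print(blast_like_search("AWCYP", "GGAWCYPGG"))  # prints [(0, 5, 2, 7, 38)]
```

For the hypothetical query AWCYP and subject GGAWCYPGG, the three seed words AWC, WCY, and CYP all extend to the same HSP, which covers the whole query with a score of 4 + 11 + 9 + 7 + 7 = 38.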
Output of BLAST: The BLAST output provides a list of pairwise sequence matches ranked by
statistical significance. The significance scores help to distinguish evolutionarily related
sequences from unrelated ones. In BLAST searches, this statistical indicator is known as the
E-value (expectation value). It is the number of alignments with at least the observed score that
would be expected to arise by random chance in a search of the database; for small values it is
close to the probability that the alignment is due to chance. Because BLAST compares a query
sequence against all database sequences, the E-value is determined by the following formula:
E = m × n × P, where m is the total number of residues in a database, n is the number of residues
in the query sequence, and P is the probability that an HSP alignment is a result of random
chance.
The E-value indicates how likely it is that a given sequence match arose purely by chance. The
lower the E-value, the less likely it is that the database match is a result of random chance, and
therefore the more significant the match.
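The formula E = m × n × P is plain arithmetic, but a short Python sketch makes its dependence on database size concrete. The function name and all numbers (a 250-residue query, databases of one and ten million residues, P = 10^−12) are hypothetical, chosen only for illustration.

```python
def e_value(m: int, n: int, p: float) -> float:
    """E = m * n * P, as defined in the text.

    m: total number of residues in the database
    n: number of residues in the query sequence
    p: probability that an HSP alignment is a result of random chance
    """
    return m * n * p


# Hypothetical numbers, for illustration only.
query_len = 250
db_residues = 1_000_000
p_chance = 1e-12

e_small_db = e_value(db_residues, query_len, p_chance)       # about 2.5e-4
e_large_db = e_value(10 * db_residues, query_len, p_chance)  # about 2.5e-3
print(f"E (1 million residues):  {e_small_db:.2e}")
print(f"E (10 million residues): {e_large_db:.2e}")
```

With P held fixed, the tenfold larger database gives a tenfold larger E-value, so the same alignment is less significant when it is found in a bigger database.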
A bit score is another prominent statistical indicator reported alongside the E-value in BLAST
output. The bit score measures sequence similarity independently of query sequence length and
database size, and it is obtained by normalizing the raw pairwise alignment score. The bit score
(S') is determined by the following formula:
S' = (λ × S − ln K) / ln 2, where λ is the Gumbel distribution constant, S is the raw alignment
score, and K is a constant associated with the scoring matrix used. The bit score (S') is therefore
linearly related to the raw alignment score (S). Expressed in bits, the E-value becomes
E = m × n × 2^(−S'), so each additional bit halves the E-value: the higher the bit score, the more
significant the match.
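The bit-score formula can be checked numerically with a minimal Python sketch. The constants λ = 0.267 and K = 0.041, the raw scores, and the database and query sizes are assumed values for illustration only (real λ and K depend on the scoring matrix and gap costs), and the function names are hypothetical. The sketch also relies on the standard Karlin-Altschul relation E = K × m × n × e^(−λS), which is not stated above; combined with the bit-score formula it gives E = m × n × 2^(−S'), i.e. P = 2^(−S') in E = m × n × P.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value_from_bits(m: int, n: int, bits: float) -> float:
    """E = m * n * 2**(-S'): the chance probability P written via the bit score."""
    return m * n * 2.0 ** (-bits)


# Assumed constants and sizes, for illustration only.
LAMBDA, K = 0.267, 0.041
M_DB, N_QUERY = 1_000_000, 250

for raw in (38, 100):
    bits = bit_score(raw, LAMBDA, K)
    e = e_value_from_bits(M_DB, N_QUERY, bits)
    print(f"raw {raw:>3} -> {bits:5.1f} bits, E = {e:.2e}")
```

Because S' grows linearly with S, the higher raw score yields more bits and, through the 2^(−S') factor, a much smaller E-value.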