
Bioinformatics

Dr Mohamed Abdelmoteleb
Lecturer of Microbiology - Bioinformatics
BLAST (The Basic Local Alignment Search Tool)

BLAST algorithm: Karlin-Altschul algorithm


Find a common character pattern (a short "word") shared by the two sequences, use it as a core (seed), and extend the alignment in both directions from that core.
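The seed-and-extend idea can be sketched in a few lines. This is a simplified illustration, not BLAST's actual implementation: it assumes exact-match seeds of length w = 3, a plain match/mismatch score instead of a substitution matrix, and an ungapped extension that stops when the running score drops more than `x_drop` below the best score seen. All function names and parameter values are hypothetical.

```python
def find_seeds(query: str, subject: str, w: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) for every length-w word the two sequences share."""
    index: dict[str, list[int]] = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query: str, subject: str, qi: int, sj: int, w: int = 3,
                match: int = 1, mismatch: int = -1, x_drop: int = 2):
    """Extend a seed without gaps in both directions, stopping when the running
    score falls more than x_drop below the best score seen (X-drop rule).

    Returns (score, q_start, q_end, s_start, s_end) with exclusive end positions.
    """
    def pair(a: str, b: str) -> int:
        return match if a == b else mismatch

    # Score of the core (seed) itself.
    score = sum(pair(query[qi + k], subject[sj + k]) for k in range(w))

    # Extend to the right of the seed.
    run = best_right = right_len = 0
    k = 0
    while qi + w + k < len(query) and sj + w + k < len(subject):
        run += pair(query[qi + w + k], subject[sj + w + k])
        k += 1
        if run > best_right:
            best_right, right_len = run, k
        elif best_right - run > x_drop:
            break

    # Extend to the left of the seed.
    run = best_left = left_len = 0
    k = 1
    while qi - k >= 0 and sj - k >= 0:
        run += pair(query[qi - k], subject[sj - k])
        if run > best_left:
            best_left, left_len = run, k
        elif best_left - run > x_drop:
            break
        k += 1

    return (score + best_right + best_left,
            qi - left_len, qi + w + right_len,
            sj - left_len, sj + w + right_len)


query, subject = "ACGTTGCAAT", "GGACGTTGCTTT"
hits = [extend_seed(query, subject, qi, sj) for qi, sj in find_seeds(query, subject)]
best = max(hits)
print(best, query[best[1]:best[2]], subject[best[3]:best[4]])  # (7, 0, 7, 2, 9) ACGTTGC ACGTTGC
```

Here every shared word (ACG, CGT, GTT, TTG, TGC) grows into the same local alignment, ACGTTGC. Real BLAST additionally uses word-score thresholds, scoring matrices and gapped extension.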
BLAST Programs
How to select a program:
– What type of query sequence you have (nucleotide or protein)
– What type of database you want to search against (nucleotide or protein)

Start with                                    Compare against                                       Use
Nucleotide sequence                           Nucleotide sequence database                          blastn
Protein sequence                              Protein sequence database                             blastp
Nucleotide sequence (6-frame translations)    Protein sequence database                             blastx
Protein sequence                              Nucleotide sequence database (6-frame translations)   tblastn
Nucleotide sequence (6-frame translations)    Nucleotide sequence database (6-frame translations)   tblastx
https://blast.ncbi.nlm.nih.gov/Blast.cgi
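The selection rule in the table can be written as a small lookup. This is a sketch only; the function name and the `translate_both` flag are hypothetical and not part of any BLAST interface.

```python
def choose_blast_program(query_type: str, database_type: str,
                         translate_both: bool = False) -> str:
    """Return the BLAST program for a query/database pair.

    query_type and database_type are "nucleotide" or "protein".
    translate_both (a hypothetical flag for this sketch) selects tblastx
    instead of blastn when both sides are nucleotide.
    """
    combo = (query_type.lower(), database_type.lower())
    if combo == ("nucleotide", "nucleotide"):
        # tblastx compares 6-frame translations of both query and database
        return "tblastx" if translate_both else "blastn"
    table = {
        ("protein", "protein"): "blastp",
        ("nucleotide", "protein"): "blastx",    # query translated in 6 frames
        ("protein", "nucleotide"): "tblastn",   # database translated in 6 frames
    }
    if combo not in table:
        raise ValueError(f"unsupported combination: {combo}")
    return table[combo]


print(choose_blast_program("protein", "nucleotide"))  # tblastn
```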
Important definitions

The RAW score (S) is the sum of the match/mismatch scores minus the gap penalties:
– the match or mismatch scores between nucleic acid bases OR amino acid residues (from BLOSUM or PAM scoring matrices)
– minus the gap opening penalty × the number of indels (insertions/deletions)
– minus the gap extension penalty × the total length of the indels.
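Following the definition above, the raw score of a given gapped alignment can be computed column by column. This sketch assumes a simple nucleotide match/mismatch scheme with illustrative values (match +2, mismatch −3, gap open 5, gap extend 2) in place of a full BLOSUM or PAM matrix; `raw_score` is a hypothetical helper, and '-' marks a gap.

```python
def raw_score(aligned_a: str, aligned_b: str, match: int = 2, mismatch: int = -3,
              gap_open: int = 5, gap_extend: int = 2) -> int:
    """Raw score of a gapped alignment; '-' marks a gap position.

    score = sum of match/mismatch scores
            - gap_open   * (number of indels)
            - gap_extend * (total length of indels)
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    substitution = 0
    n_indels = 0
    indel_length = 0
    previous_gap = None  # which sequence had a gap in the previous column: "a", "b" or None
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            indel_length += 1
            if side != previous_gap:  # a new indel starts in this column
                n_indels += 1
            previous_gap = side
        else:
            substitution += match if x == y else mismatch
            previous_gap = None
    return substitution - gap_open * n_indels - gap_extend * indel_length


# 7 matches, 1 mismatch, one indel of length 2: 7*2 - 3 - 5*1 - 2*2
print(raw_score("ACGT--ACGT", "ACGTTTACCT"))  # 2
```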

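The BIT score (S') is a normalized version of the raw score, expressed in bits (base-2 logarithm), so that scores obtained with different scoring systems can be compared:

S' = (λ × S - ln K) / ln 2

Parameters λ and K depend on the substitution matrix and the gap penalties (Karlin-Altschul statistics).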
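The bit-score formula, together with the Karlin-Altschul expectation E = K·m·n·e^(−λS) (equivalently E = m·n·2^(−S')), can be checked numerically. The λ and K values below are an assumption: they are the values commonly reported for BLOSUM62 with gap costs 11/1, and other matrices or gap penalties need other values. The sketch also ignores the effective-length corrections that real BLAST applies to m and n; the raw score and lengths are hypothetical.

```python
import math

# Assumption: lambda and K commonly reported for BLOSUM62 with gap open 11 /
# gap extend 1. Other matrices or gap costs need other values.
LAMBDA = 0.267
K = 0.041


def bit_score(raw: float, lam: float = LAMBDA, k: float = K) -> float:
    """Normalized bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


def e_value_from_raw(raw: float, m: int, n: int, lam: float = LAMBDA, k: float = K) -> float:
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw)


def e_value_from_bits(bits: float, m: int, n: int) -> float:
    """The same E-value written with the bit score: E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)


raw = 100                     # hypothetical raw score
n, m = 300, 100_000_000       # hypothetical query length and database length
bits = bit_score(raw)
print(f"bit score = {bits:.1f}")
print(f"E = {e_value_from_raw(raw, m, n):.3g} = {e_value_from_bits(bits, m, n):.3g}")
```

Both E-value expressions agree, which is why BLAST can report significance from the bit score and the search-space size alone.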
Important definitions

P-value: the probability of obtaining, by chance, a score x at least equal to S:

P-val (S) = P(x ≥ S)
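The meaning of P(x ≥ S) can be illustrated empirically: score the real pair, then score many shuffled versions of one sequence and count how often chance alone reaches S. Note that this is a swapped-in technique for illustration (a permutation test with a simple ungapped match/mismatch score); BLAST itself computes P-values analytically with Karlin-Altschul statistics. All names, sequences and parameters are hypothetical.

```python
import random


def best_ungapped_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Best ungapped local alignment score over all diagonals (Kadane-style scan)."""
    best = 0
    for offset in range(-(len(a) - 1), len(b)):
        run = 0
        for i in range(len(a)):
            j = i + offset
            if 0 <= j < len(b):
                run = max(0, run + (match if a[i] == b[j] else mismatch))
                best = max(best, run)
    return best


def empirical_p_value(a: str, b: str, trials: int = 1000, seed: int = 0) -> tuple[int, float]:
    """Estimate P(x >= S) by rescoring `a` against shuffled copies of `b`.

    Shuffling keeps the residue composition of `b` but destroys its order,
    so shuffled scores approximate scores obtained "by chance".
    """
    rng = random.Random(seed)
    observed = best_ungapped_score(a, b)
    letters = list(b)
    at_least_as_good = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if best_ungapped_score(a, "".join(letters)) >= observed:
            at_least_as_good += 1
    # +1 in numerator and denominator avoids reporting a P-value of exactly 0
    return observed, (at_least_as_good + 1) / (trials + 1)


s, p = empirical_p_value("MKTAYIAKQRQISFVKSHFS", "MKTAYIAKQRQISFVKSHFS", trials=200)
print(s, p)
```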
Important definitions

E-value (Expectation value): a correction of the P-value for multiple testing.
In the context of database searches, E value is the
number of distinct alignments, with a score equivalent to
or better than S, that are expected to occur in a
database search by chance.
The lower the E value, the more significant the score is.

E-val (S) = P-val (S) * N

where N is the size of the search space (N = n * m, where n is the length of the query sequence and m is the total length of the database).
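A worked example of E-val (S) = P-val (S) * N with hypothetical numbers (a 300-residue query, a database of 10^9 residues, and a P-value of 10^-15) shows how a tiny P-value can still give a non-negligible E-value once the search space is large. The helper names are hypothetical.

```python
def search_space(query_length: int, database_length: int) -> int:
    """N = n * m (query length times total database length)."""
    return query_length * database_length


def e_value(p_value: float, query_length: int, database_length: int) -> float:
    """E-val(S) = P-val(S) * N, as defined above."""
    return p_value * search_space(query_length, database_length)


n, m = 300, 10**9          # hypothetical query and database lengths
p = 1e-15                  # hypothetical P-value
print(search_space(n, m))  # 300000000000
print(e_value(p, n, m))    # approximately 3e-4
```

With the same P-value, a database ten times larger gives a ten times larger E-value, so the same alignment looks less significant as the database grows.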
Important definitions
Max score = highest alignment score (bit score) between the query sequence and a database sequence segment.

Total score = sum of the alignment scores of all segments from the same database sequence that match the query sequence (calculated over all segments). This score differs from the max score if several parts of the database sequence match different parts of the query sequence.

Query coverage = percent of the query length that is included in the aligned segments.
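The three definitions can be computed from the list of aligned segments found for one database sequence. This sketch assumes 1-based inclusive query coordinates and counts overlapping query positions only once for coverage; NCBI's exact bookkeeping may differ. The `Segment` class and the example hits are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One aligned segment between the query and a single database sequence.
    Query coordinates are 1-based and inclusive (an assumption of this sketch)."""
    q_start: int
    q_end: int
    bit_score: float


def max_score(segments: list[Segment]) -> float:
    return max(s.bit_score for s in segments)


def total_score(segments: list[Segment]) -> float:
    return sum(s.bit_score for s in segments)


def query_coverage(segments: list[Segment], query_length: int) -> float:
    """Percent of query positions covered by at least one segment (overlaps counted once)."""
    covered: set[int] = set()
    for s in segments:
        lo, hi = sorted((s.q_start, s.q_end))
        covered.update(range(lo, hi + 1))
    return 100.0 * len(covered) / query_length


# Hypothetical hits of one database sequence against a 400-residue query
hits = [Segment(1, 150, 120.5), Segment(201, 300, 80.0), Segment(140, 180, 30.0)]
print(max_score(hits), total_score(hits), query_coverage(hits, 400))  # 120.5 230.5 70.0
```

Because three segments match, the total score (230.5) exceeds the max score (120.5), and the overlapping segments 1-150 and 140-180 together cover only 180 query positions.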
Percent Sequence Identity:
Percentage of aligned positions with identical bases or amino acids in a pairwise sequence alignment.
Percent Sequence Similarity:
Percentage of aligned positions with identical or similar amino acids. Similar positions are amino acid changes that tend to preserve the physico-chemical properties of the original residue:
– Polar to polar
• aspartate → glutamate
– Nonpolar to nonpolar
• leucine → valine
– Similar sized residues
• glycine → alanine
Classification of Amino acids

• Acidic amino acid residues:
– aspartic acid (D)
– glutamic acid (E)
• Basic (high pH) amino acid residues:
– arginine (R)
– lysine (K)
– to a lesser extent histidine (H)
• Other polar (hydrophilic):
– asparagine (N)
– glutamine (Q)
– serine (S)
– threonine (T)
• Aliphatic (oily, long chain) residues:
– leucine (L)
– isoleucine (I)
– valine (V)
– methionine (M)
• Aromatic residues:
– tryptophan (the largest residue, W)
– phenylalanine (F)
– tyrosine (Y)
• Small side chain:
– glycine (the smallest residue, G)
– alanine (A)
• Disulphide bridge forming:
– cysteine (C), see also selenocysteine
• Alpha-helix breaker, rigid structure:
– proline (P)
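Using the classes above as a crude stand-in for a substitution matrix, percent identity and percent similarity of two aligned protein sequences can be sketched as follows. This is a hypothetical simplification: BLAST itself counts "Positives" as positions with a positive substitution-matrix score (e.g. BLOSUM62), not class membership. Gap columns are assumed to count as neither identical nor similar; the function names and sequences are illustrative.

```python
# Hypothetical simplification: residues in the same class (following the
# classification above) are treated as "similar". Real BLAST instead counts
# positions with a positive substitution-matrix score (reported as "Positives").
SIMILARITY_CLASSES = [
    set("DE"),    # acidic
    set("RKH"),   # basic
    set("NQST"),  # other polar
    set("LIVM"),  # aliphatic
    set("WFY"),   # aromatic
    set("GA"),    # small side chain
]


def is_similar(x: str, y: str) -> bool:
    return x == y or any(x in c and y in c for c in SIMILARITY_CLASSES)


def identity_and_similarity(a: str, b: str) -> tuple[float, float]:
    """Percent identity and percent similarity over all alignment columns.

    Gap columns ('-') count as neither identical nor similar.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(a, b))
    identical = sum(x == y and x != "-" for x, y in pairs)
    similar = sum(x != "-" and y != "-" and is_similar(x, y) for x, y in pairs)
    return 100.0 * identical / len(pairs), 100.0 * similar / len(pairs)


print(identity_and_similarity("DLKGSF", "ELKASY"))  # (50.0, 100.0)
```

D/E, G/A and F/Y are changes within a class, so they lower identity but not similarity.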
Scoring matrices FYI

• Amino acid substitution matrices


– PAM
– BLOSUM

• DNA substitution matrices


– As a rule, DNA is much less conserved than protein sequences
– It is therefore less effective to compare coding regions at the nucleotide level
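One reason coding DNA looks less conserved than the protein it encodes is synonymous codon changes. The two hypothetical coding sequences below differ at 4 of 9 nucleotides, yet translate to the same peptide. The codon table is the standard genetic code (NCBI translation table 1); the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are listed in the
# order TTT, TTC, TTA, TTG, TCT, ... (bases T, C, A, G; last position varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a DNA sequence in reading frame 1 ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two equal-length, ungapped sequences."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical coding sequences that differ only by synonymous codon changes
dna_1 = "CTGAAAGGT"
dna_2 = "TTAAAGGGC"
print(translate(dna_1), translate(dna_2))                    # LKG LKG
print(round(percent_identity(dna_1, dna_2), 1))              # 55.6
print(percent_identity(translate(dna_1), translate(dna_2)))  # 100.0
```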

FYI: For your information


PAM FYI

Point accepted mutation


PAM matrices are amino acid
substitution matrices that encode the expected
evolutionary change at the amino acid level.

A PAM matrix is a matrix in which each row and each column represents one of the twenty standard amino acids.
For any specific pair (Ai, Aj) of amino acids, the PAM matrix reflects the frequency at which Ai is expected to be replaced by Aj in two sequences that are n PAM units diverged.
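The "n PAM units" idea can be sketched numerically: the mutation probability matrix for n units is obtained by multiplying the 1-unit (PAM1) matrix by itself n times. The 3-letter alphabet and its probabilities below are hypothetical toy values, not Dayhoff's data; real PAM scoring matrices are then derived from such probabilities as log-odds scores.

```python
def mat_mult(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def pam_n(pam1: list[list[float]], n: int) -> list[list[float]]:
    """Mutation probabilities after n PAM units: the PAM1 matrix raised to the n-th power."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mult(result, pam1)
    return result


# Hypothetical PAM1-style matrix for a toy 3-letter alphabet (A, B, C):
# row i gives the probability that residue i is replaced by residue j after
# 1 PAM unit (1 accepted point mutation per 100 residues, so 0.99 stays unchanged).
PAM1_TOY = [
    [0.990, 0.006, 0.004],
    [0.005, 0.990, 0.005],
    [0.004, 0.006, 0.990],
]

pam250 = pam_n(PAM1_TOY, 250)
for row in pam250:
    print([round(x, 3) for x in row])
```

The diagonal shrinks as n grows: after 250 units many positions have changed, which is why higher-numbered PAM matrices suit more distantly related sequences.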
[Figure: PAM matrix]


BLOSUM

Blocks Substitution Matrix


– Scores derived from observations of the
frequencies of substitutions in blocks of
local alignments in related proteins
– The number in the matrix name indicates evolutionary distance (a higher number means the matrix was built from more similar sequences)
– BLOSUM62 was created using sequences
sharing no more than 62% identity
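The way BLOSUM scores are derived from observed substitution frequencies can be sketched with toy counts. This follows the log-odds recipe used for BLOSUM (observed pair frequency q_ij against expected frequency e_ij, scaled to half-bit units and rounded), but the three-residue alphabet and the pair counts are hypothetical, and the clustering step that defines the "62%" threshold is omitted.

```python
import math
from collections import defaultdict


def blosum_scores(pair_counts: dict[tuple[str, str], int]) -> dict[tuple[str, str], int]:
    """Half-bit log-odds scores s_ij = round(2 * log2(q_ij / e_ij)) from aligned pair counts."""
    total = sum(pair_counts.values())
    # Observed frequency q_ij of each unordered residue pair
    q: dict[tuple[str, str], float] = {}
    for (a, b), c in pair_counts.items():
        key = tuple(sorted((a, b)))
        q[key] = q.get(key, 0.0) + c / total
    # Background frequency p_i: identical pairs count fully, mixed pairs half to each residue
    p: dict[str, float] = defaultdict(float)
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2
    scores = {}
    for (a, b), f in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        s = round(2 * math.log2(f / expected))
        scores[(a, b)] = scores[(b, a)] = s
    return scores


# Hypothetical counts of aligned residue pairs collected from conserved blocks
counts = {("L", "L"): 40, ("I", "I"): 30, ("W", "W"): 10,
          ("L", "I"): 15, ("L", "W"): 3, ("I", "W"): 2}
scores = blosum_scores(counts)
print(scores[("W", "W")], scores[("L", "L")], scores[("L", "I")])
```

As in the real matrix, conserving a rare residue (W) scores higher than conserving a common one (L), and rarely observed substitutions score negatively.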



The BLOSUM62 scoring matrix: a brief summary of a large part of protein biochemistry

[Figure: BLOSUM62 matrix with residue groups highlighted: -OH/-SH, small aliphatic, acidic pH, basic pH, large aliphatic, aromatic]
