
Bioinformatics

Lecture 5: Calculating Identities, Similarity and Gap Scores
Calculating Identities, Similarity &
Gap Scores:
How the computer does it behind the scenes
Recall:
Sequence identity
• Exactly the same nucleotide/amino acid in the same position

Sequence similarity
• Substitutions with similar chemical properties

Sequence homology
• General term that indicates evolutionary relatedness among
sequences
• Sequences are homologous if they are derived from a
common ancestral sequence.
Pairwise alignment of retinol-binding protein (RBP)
and β-lactoglobulin:
Example of an alignment with internal and terminal gaps

1 MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEG 50 RBP
. ||| | . |. . . | : .||||.:| :
1 ...MKCLLLALALTCGAQALIVT..QTMKGLDIQKVAGTWYSLAMAASD. 44 lactoglobulin

51 LFLQDNIVAEFSVDETGQMSATAKGRVR.LLNNWD..VCADMVGTFTDTE 97 RBP
: | | | | :: | .| . || |: || |.
45 ISLLDAQSAPLRV.YVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTK 93 lactoglobulin

98 DPAKFKMKYWGVASFLQKGNDDHWIVDTDYDTYAV...........QYSC 136 RBP
|| ||. | :.|||| | . .|
94 IPAVFKIDALNENKVL........VLDTDYKKYLLFCMENSAEPEQSLAC 135 lactoglobulin

137 RLLNLDGTCADSYSFVFSRDPNGLPPEAQKIVRQRQ.EELCLARQYRLIV 185 RBP
. | | | : || . | || |
136 QCLVRTPEVDDEALEKFDKALKALPMHIRLSFNPTQLEEQCHI....... 178 lactoglobulin
• Pairwise alignment of human RBP and bovine
β-lactoglobulin.
• Note that the alignment is global (i.e., the
entire lengths of both proteins are compared).
• Identity between the two sequences is
indicated with bars ( | ).
• There are five different kinds of dots in this
alignment:
• (1) Paired dots ( : ) between aligned residues
indicate similarity (e.g., on the top line R and K
are joined by two dots and share similar
physicochemical properties) (arrow 1).
• (2) Single dots between aligned residues
(arrow 2) also indicate similarity, but less than
for paired dots.
• (3, 4) The alignment contains both internal
gaps, indicated by dots in place of alphabetic
characters along the sequence (arrow 3), and
gaps at the amino and carboxy termini of
β-lactoglobulin (arrow 4).
• (5) A dot is placed above the sequences to
mark every 10 residues (arrow 5).
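The distinction between internal and terminal gaps can be made concrete with a minimal Python sketch. The function name `count_gaps` and the short example row are hypothetical, and the sketch assumes gaps are written as dots, as in the alignment above.

```python
def count_gaps(row, gap="."):
    """Return (leading, internal, trailing) gap counts for one aligned row."""
    # Gap characters before the first residue are an amino-terminal gap.
    without_leading = row.lstrip(gap)
    leading = len(row) - len(without_leading)
    # Gap characters after the last residue are a carboxy-terminal gap.
    core = without_leading.rstrip(gap)
    trailing = len(without_leading) - len(core)
    # Whatever gap characters remain sit between residues: internal gaps.
    internal = core.count(gap)
    return leading, internal, trailing


# Hypothetical aligned row, not taken from the figure.
example_row = "...MKCLL..QTMK."
print(count_gaps(example_row))  # (3, 2, 1)
```

The counts are gap positions (individual dots), not gap openings; a run of several dots is still one gap event.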
• Notice that along the top row the residues
GTWY are all identical between the two
proteins.
• We can count the number of identical
residues; in this case, the two proteins share
23% identity (43 residues/185 aligned
residues).

• Identity is the extent to which two amino acid
(or nucleotide) sequences are invariant.
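As a rough illustration of how identity is counted, the sketch below compares two aligned rows column by column. The function name `percent_identity` is hypothetical; the two strings are the first 50 columns of the alignment above. Dividing by the number of aligned columns (gap columns included) is an assumption, chosen because the lecture divides by the number of aligned residues.

```python
def percent_identity(seq_a, seq_b, gap="."):
    """Count identical, non-gap columns in two aligned rows of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned rows must have the same length")
    identical = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != gap)
    return identical, len(seq_a), 100.0 * identical / len(seq_a)


# First row of the RBP / beta-lactoglobulin alignment shown above.
rbp = "MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEG"
lac = "...MKCLLLALALTCGAQALIVT..QTMKGLDIQKVAGTWYSLAMAASD."

identical, columns, pct = percent_identity(rbp, lac)
print(identical, columns, round(pct))  # 11 50 22

# Whole alignment, using the counts quoted in the lecture.
print(round(100 * 43 / 185))  # 23
```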
• Some of the aligned residues are similar but not
identical.
• They are related to each other because they
share similar biochemical properties.
• Similar pairs of residues are structurally or
functionally related.
• For example, on the first row of the alignment we
can find arginine and lysine (R and K, connected
by two dots, : ); we can also see an aspartate
and a glutamate residue that are aligned.
• These are conservative substitutions. Amino
acids with similar properties include:
• the basic amino acids (K, R, H),
• the acidic amino acids (D, E),
• the hydroxylated amino acids (S, T),
• and the hydrophobic amino acids (W, F, Y, L, I, V,
M, A).
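One way to see how a program might apply these groups is the sketch below. The table `SIMILARITY_GROUPS` and the function `classify_pair` are hypothetical names. The groups are exactly the four listed above; real alignment programs typically rely on a scoring matrix instead of fixed groups.

```python
# The four groups listed above. A residue outside them (e.g., G, C, P)
# never counts as similar to a different residue in this sketch.
# Gap characters are not handled here.
SIMILARITY_GROUPS = [
    set("KRH"),       # basic
    set("DE"),        # acidic
    set("ST"),        # hydroxylated
    set("WFYLIVMA"),  # hydrophobic
]


def classify_pair(a, b):
    """Label an aligned residue pair as identical, similar, or different."""
    if a == b:
        return "identical"
    if any(a in group and b in group for group in SIMILARITY_GROUPS):
        return "similar"
    return "different"


print(classify_pair("R", "K"))  # similar
print(classify_pair("D", "E"))  # similar
print(classify_pair("G", "G"))  # identical
print(classify_pair("K", "M"))  # different
```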
• The percent similarity of two protein
sequences counts both identical and similar
matches, as a fraction of the aligned positions.
• In the top row of Figure 3.5 (the alignment
above), there are 50 aligned amino acid residues,
of which 11 are identical and 3 are similar.
• The percent identity is 22% (11/50) and the
percent similarity is 28% (14/50).
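These counts can be checked directly from the match line of the first row. In this minimal sketch the variable names are hypothetical. The match line is copied from the alignment as it appears in these notes; its spacing does not matter because only the symbols are counted.

```python
# Match line of the first alignment row: '|' = identical, ':' = similar.
match_line = ". ||| | . |. . . | : .||||.:| :"
aligned_columns = 50  # length of the first row, as stated in the lecture

identical = match_line.count("|")
similar = match_line.count(":")

percent_identity = 100 * identical / aligned_columns
percent_similarity = 100 * (identical + similar) / aligned_columns

print(identical, similar)                    # 11 3
print(percent_identity, percent_similarity)  # 22.0 28.0
```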
• It is more useful to consider the identity shared
by two protein sequences than the similarity,
since a similarity measure may be based on any
of a variety of definitions of how related
(similar) two amino acid residues are to each
other.
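To illustrate how much the similarity figure depends on the definition, the sketch below recounts the first row of the alignment. It uses only the four residue groups listed earlier in this lecture instead of the scoring behind the figure's dots. Names such as `GROUPS` and `count_similar` are hypothetical, and the group-based rule is an assumption made for illustration, not the method used to draw the figure.

```python
# Basic, acidic, hydroxylated, and hydrophobic groups from the lecture.
GROUPS = [set("KRH"), set("DE"), set("ST"), set("WFYLIVMA")]

# First row of the RBP / beta-lactoglobulin alignment.
rbp = "MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEG"
lac = "...MKCLLLALALTCGAQALIVT..QTMKGLDIQKVAGTWYSLAMAASD."


def count_similar(seq_a, seq_b):
    """Count non-identical aligned pairs whose residues share a group."""
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and any(a in g and b in g for g in GROUPS)
    )


identical = sum(1 for a, b in zip(rbp, lac) if a == b)
similar = count_similar(rbp, lac)

print(identical, similar)                      # 11 10
print(100 * (identical + similar) / len(rbp))  # 42.0
```

Under this looser rule the first row would score 42% similarity instead of 28%, while the identity stays at 22%.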
• Pairwise alignment is the process of lining up
two sequences to achieve maximal levels of
identity (and maximal levels of conservation in
the case of amino acid alignments).
• The purpose of a pairwise alignment is to
assess the degree of similarity and the
possibility of homology between two
molecules.
• We may say that two proteins share 22% amino
acid identity or (as in the alignment above) that
they share 28% similarity.
• If the amount of sequence identity is
significant, then the two sequences are
probably homologous.
• It is not correct to say that two proteins share a
certain percent homology; they are either
homologous or not.
• The strongest evidence for determining whether
two proteins are homologous comes from
structural studies in combination with
evolutionary analyses.
Terms of Sequence Comparison
Sequence identity
• Exactly the same nucleotide/amino acid in the same position

Sequence similarity
• Substitutions with similar chemical properties

Sequence homology
• General term that indicates evolutionary relatedness among sequences
• Sequences are homologous if they are derived from a common ancestral
sequence.
Conservation
• Changes at a specific position of an amino acid (or, less commonly,
DNA) sequence that preserve the physico-chemical properties of the
original residue.
Homework: From Second Edition

• Letters between the aligned sequences indicate matches (identity).

• A + sign indicates similarity.

Calculate percent identity and percent similarity.

How many gaps are there?
Calculate the percentage of gaps.
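For alignments in this letters-and-plus-signs layout, the quantities asked for can be tallied as in the sketch below. The function name `summarize_alignment`, the use of `-` as the gap character, and the three example strings are hypothetical and are not the homework alignment. Counting every gap character as one gap position and dividing by the full alignment length are also assumptions.

```python
def summarize_alignment(query, midline, subject, gap="-"):
    """Tally identities, similar-or-identical positions, and gaps."""
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("all three lines must have the same length")
    length = len(query)
    identities = sum(1 for c in midline if c.isalpha())  # letters = identity
    similar = midline.count("+")                         # '+' = similarity
    # A column is a gap position if either sequence has the gap character.
    gaps = sum(1 for q, s in zip(query, subject) if gap in (q, s))
    return {
        "identity_pct": 100 * identities / length,
        "similarity_pct": 100 * (identities + similar) / length,
        "gap_count": gaps,
        "gap_pct": 100 * gaps / length,
    }


# Hypothetical 10-column alignment, for illustration only.
query = "MKV-LAGTWF"
midline = "MK+ L+GT F"
subject = "MKIALSGT-F"
print(summarize_alignment(query, midline, subject))
```

For this toy input the sketch reports 60% identity, 80% similarity, and 2 gap positions (20%).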
End here
