
Scoring Matrices

How did they get the values for PAM-1?

Look at 71 groups of protein sequences where the proteins in each group are at least 85% similar (Why these groups?)
Compute the relative mutability of each amino acid (its probability of change)
From relative mutability, compute the mutability probability for each amino acid pair X,Y (the probability that X will change to Y over a certain evolutionary time)
Normalize the mutability probability for each pair to a value between 0 and 1

Computing Relative Mutability: A Measure of the Likelihood that an Amino Acid Will Mutate

For each amino acid:
changes (p) = number of times the amino acid changed into something else
exposure to mutation = (percentage occurrence of the amino acid in the group of sequences being analyzed, written as a fraction) * (frequency of all amino acid changes in the group, based on the phylogenetic tree)
relative mutability = (changes / exposure to mutation) / 100

Computing Mutability Probability Between Amino Acid Pairs

For each pair of amino acids X and Y:
r = relative mutability of X
c = number of times X becomes Y or vice versa
p = number of changes involving X
mutability probability of X to Y = (r * c) / p

Computing Relative Mutability of A:

changes = # times A changes into something else = 4
% occurrence of A in group = 10 / 63 = 0.159
frequency of all amino acid changes in group = 6 * 2 = 12
(Note: Count changes backwards and forwards.)
exposure to mutation = (% occurrence of A in group) * (frequency of all amino acid changes in group) = 12 * 0.159
relative mutability = (changes / exposure to mutation) / 100 = (4 / (12 * 0.159)) / 100 = 2.09 / 100 = 0.0209

The division by 100 is what gives us PAM-1, where we are modeling 1 substitution per 100 residues.
Example from Fundamental Concepts of Bioinformatics by Krane
and Raymer.
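
The arithmetic for the relative mutability of A can be checked with a minimal Python sketch. The function name and parameter names are illustrative assumptions, not part of the source; the input values are the ones from the example.

```python
def relative_mutability(changes, occurrence_fraction, total_changes):
    """Relative mutability = (changes / exposure to mutation) / 100.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Exposure to mutation: how often the amino acid occurs, times the
    # number of amino acid changes observed in the whole group.
    exposure = occurrence_fraction * total_changes
    return (changes / exposure) / 100


# Values from the example: A changes 4 times, A makes up 0.159 of the
# residues, and there are 6 * 2 = 12 changes counted in both directions.
m_A = relative_mutability(changes=4, occurrence_fraction=0.159, total_changes=12)
print(f"{m_A:.4f}")
```

The printed value is close to 0.021; the example reports it as 0.0209, so expect small differences from rounding at intermediate steps.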

Computing Mutability Probability that A will change to G:

r = relative mutability of A = 0.0209
c = number of times A becomes G or vice versa = 3
p = number of changes involving A = 4
mutability probability of A to G = (r * c) / p = (0.0209 * 3) / 4 = 0.0156
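
As a rough check of the A to G calculation, the pair formula (r * c) / p can be written as a small Python function. The function name is a hypothetical choice for illustration; r, c, and p follow the definitions given for amino acid pairs.

```python
def mutability_probability(r, c, p):
    """Mutability probability of X to Y = (r * c) / p.

    r: relative mutability of X
    c: number of times X becomes Y or vice versa
    p: number of changes involving X
    (Hypothetical helper, named here for illustration.)
    """
    if p == 0:
        raise ValueError("X must be involved in at least one change")
    return (r * c) / p


# A to G, using the values from the example.
p_AG = mutability_probability(r=0.0209, c=3, p=4)
print(f"{p_AG:.4f}")
```

The result agrees with the 0.0156 reported in the example to within rounding in the last digit.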

Normalizing Mutability Probability, X to Y

For each Y among all amino acids, compute the mutability probability of X to Y as described above
Get a total of these 20 probabilities
Divide each of them by a normalizing factor such that the probability that X will NOT change is 99% and the sum of the probabilities that it will change to any other amino acid is 1%
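
The normalization step can be sketched in Python as follows. This is one plausible reading of the step, not a confirmed implementation: it assumes the raw X to Y probabilities are held in a dictionary, rescales the entries for Y different from X so they sum to 1%, and sets the X to X entry to 99%. The function name and the toy input values are invented for illustration.

```python
def normalize_row(raw, x, change_total=0.01):
    """Rescale the raw X->Y probabilities so they sum to change_total.

    raw: dict mapping target amino acid Y to the raw probability of X->Y
    x:   the source amino acid; its own entry becomes 1 - change_total
    (Hypothetical helper; the data layout is an assumption.)
    """
    off_diagonal = {y: v for y, v in raw.items() if y != x}
    total = sum(off_diagonal.values())
    if total == 0:
        raise ValueError("no observed changes to normalize")
    factor = total / change_total  # the normalizing factor to divide by
    row = {y: v / factor for y, v in off_diagonal.items()}
    row[x] = 1 - change_total  # probability that X does NOT change
    return row


# Toy input with only two targets; these raw values are made up.
row_A = normalize_row({"G": 0.015, "S": 0.005}, x="A")
print(row_A)
```

In a full matrix the same rescaling would be applied to each row in turn, with all other amino acids present as targets.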

Converting Mutability Probabilities to Log Odds Score for X to Y

Compute the relative frequency of change for X to Y as follows:
Get the X to Y mutability probability
Divide by the % frequency of X in the sequence data
Convert to log base 10, multiply by 10

In our example, we get log10(0.0156 / 0.1587) = log10(0.098)
To compute log10(0.098), solve for x:
10^x = 0.098
x = -1.01
Multiplying by 10 gives a log odds score of about -10.1

Compute log odds score for Y to X
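
The log odds conversion can be reproduced with Python's standard `math.log10`. This is a minimal sketch: the function name and the `scale` parameter are assumptions introduced for illustration, and the inputs are the A to G values from the example.

```python
import math


def log_odds(p_xy, freq_x, scale=10):
    """Log odds score: scale * log10(mutability probability / frequency of X).

    Hypothetical helper; scale=10 reflects the "multiply by 10" step.
    """
    return scale * math.log10(p_xy / freq_x)


ratio = 0.0156 / 0.1587           # relative frequency of change, about 0.098
x = math.log10(ratio)             # the x that solves 10**x = ratio
score = log_odds(0.0156, 0.1587)  # x multiplied by 10

print(f"ratio={ratio:.3f}  log10={x:.2f}  score={score:.1f}")
```

The same function would be called with the Y to X probability and the frequency of Y to get the score in the other direction.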

Usefulness of Log Odds Scores


A score of 0 indicates that the change
from one amino acid to another is what is
expected by chance
A negative score means that the change occurs
less often than expected by chance
A positive score means that the change is
more than expected by chance
Because the scores are in log form, they
can be added, which corresponds to multiplying
the underlying probabilities (e.g., the chance
that X will change to Y and then Y to Z)
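
Adding log scores is equivalent to multiplying the underlying ratios, since log10(a * b) = log10(a) + log10(b). The short Python check below illustrates this; the two odds ratios are made-up values, not taken from any PAM matrix.

```python
import math

# Hypothetical odds ratios for X->Y and Y->Z (illustrative values only).
odds_xy = 0.098
odds_yz = 2.5

# Add the two log odds scores (scaled by 10, as in the conversion step).
summed = 10 * math.log10(odds_xy) + 10 * math.log10(odds_yz)

# Multiply the ratios first, then convert the product to a log odds score.
combined = 10 * math.log10(odds_xy * odds_yz)

print(math.isclose(summed, combined))
```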

Disadvantages of PAM Matrices

A phylogenetic tree must be constructed first, implying some circularity in the analysis
The original PAM-1 matrix was based on a limited number of families, not necessarily representative of all protein families
The Markov model does not take into account that multi-step mutations should be treated differently from single-step ones