
Protein Identification

In order to identify and characterize proteins, we use a variety of laboratory techniques.
These techniques rely on properties that all proteins have but that differ from one protein to another:
Size
Charge
Sequence
Shape
The resulting data types are then used in a variety of bioinformatics pipelines.
Protein Separation
 

One of the simplest forms of characterization begins with separating the proteins in a mixture
E.g. a cell lysate: all the proteins in a cell can be precipitated together
Next we have to separate them from each other so that we can begin to identify and characterize them
In gel separation, proteins are placed at one end of a semi-permeable matrix, typically a gel, and made to migrate to the other end by applying an electric field (voltage)
Because the gel is porous, smaller proteins travel faster, and therefore farther, down the gel, separating the proteins by size
Charge as well as size can affect how far a protein travels
Charge comes from the amino acid R groups, as do many other properties used to separate and characterize proteins, as you will see later
PAGE
 

PAGE stands for Polyacrylamide Gel Electrophoresis


Can be used to separate proteins OR DNA.
Can separate native (non-denatured) or denatured proteins
Proteins are typically denatured using SDS (sodium dodecyl sulfate). SDS unfolds the proteins and coats them with negative charge roughly in proportion to their length. This removes conformation differences and evens out intrinsic charge differences, so separation depends essentially on size.
SDS-PAGE
 

A very common method for separating proteins by electrophoresis uses a discontinuous polyacrylamide gel as a support medium and sodium dodecyl sulfate (SDS) to denature the proteins. The method is called sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). SDS removes the protein's higher-order structure and turns it into an unstructured molecule whose mobility depends only on its length and mass-to-charge ratio. SDS-PAGE therefore separates molecules based on differences in their molecular weight.
In a typical PAGE gel, there are many lanes (like swim lanes in a pool).
One lane is reserved for the marker, a calibrated mixture of proteins of known sizes that serves as a size reference
The other lanes contain various samples
A stain or dye must be used to visualize the location of the protein bands
In the “bad” old days, radioisotopes were used instead
SDS-PAGE (cont'd)
 

Sample preparation begins by treating the protein samples with SDS
SDS disrupts the non-covalent interactions that hold a protein's higher-order structure together. Disulfide bonds are usually broken by also adding a reducing agent such as β-mercaptoethanol or DTT.
The resulting proteins are all essentially unfolded chains with a roughly uniform charge-to-mass ratio
Now let's review a schematic representation of the SDS-PAGE procedure as shown below.

Samples are then loaded onto the gel and voltage is applied
The negatively charged proteins move away from the negatively charged cathode (–) and towards the positively charged anode (+)
Now let's review a schematic representation of SDS-PAGE.

Important Note:
After running for the required length of time, the gel is visualized. In the example, the entire gel has been stained with Coomassie blue, which binds the proteins. Let's review the visualization depicting the same below.
Gels can be run in one dimension (separating from top to bottom)
Gels can optionally be run in two dimensions for better separation. The proteins are first separated in one direction, then the gel is rotated 90 degrees and they are separated again by a second property. Commonly the first dimension separates by isoelectric point and the second by size (SDS-PAGE).
When you see that a protein is 65 kDa, for example, it means that its molecular mass is 65 kilodaltons, and it will “run” on the gel to the same distance as the 65 kDa marker band
Now let's review a schematic of an SDS-PAGE gel as shown below.
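Sizes are usually read off the marker lane with a standard curve. Over the useful range of the gel, log10 of molecular mass is roughly linear in relative mobility (Rf), which is a band's migration distance divided by the distance to the dye front. The sketch below fits that line to the marker bands by least squares and then interpolates an unknown band. It is a minimal illustration only: the function names and the marker sizes and distances are hypothetical, not taken from any particular gel or commercial ladder.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_kda(marker_kda, marker_mm, dye_front_mm, band_mm):
    """Estimate a band's size (kDa) from a log10(kDa) vs. Rf standard curve."""
    rf = [d / dye_front_mm for d in marker_mm]           # relative mobility of each marker band
    log_mw = [math.log10(k) for k in marker_kda]         # log10 of the known marker sizes
    slope, intercept = fit_line(rf, log_mw)
    return 10 ** (slope * (band_mm / dye_front_mm) + intercept)

# Hypothetical marker lane: sizes in kDa and migration distances in mm (dye front at 60 mm).
markers_kda = [250, 130, 100, 70, 55, 35, 25, 15]
markers_mm = [5, 12, 16, 22, 27, 36, 43, 52]
print(round(estimate_kda(markers_kda, markers_mm, 60, 24)))
```

The log-linear relationship breaks down near the top and bottom of the gel. Only bands that fall between marker bands should be estimated this way.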
Gel Blotting
 

The contents of a gel (in this case the proteins) can be transferred to a nitrocellulose membrane for further analysis
This simply requires placing the gel and the membrane in contact under pressure and allowing the contents of the gel to wick into the membrane
This can be sped up by applying an electric field similar to the one used for the gel separation (electroblotting)
Now let's review a schematic of the Protein Identification & Separation process.
Protein Blots
 

Once the proteins are transferred to the nitrocellulose, additional analysis can be done to see what will bind to the protein bands
For a protein blot this is typically called a Western, Southwestern, or Eastern blot
First the blot is blocked: a blocking agent is bound to the rest of the nitrocellulose so that the “probe” cannot stick to the membrane itself
Now the blot can be probed. A probe is the factor you want to test for binding to your protein, e.g. another protein, an antibody, or even a specific DNA fragment if you are looking for a protein that binds a promoter, for example. For binding assays that depend on the protein's native structure, you need to run your PAGE as a native gel rather than an SDS-denatured gel. If the probe “sticks” to (lights up) a particular band, you know at the very least the size of the protein that binds your probe. You can even band-purify that protein from another PAGE gel to isolate it. Now let's review a schematic of the Protein Blots.
ELISA
 

ELISA stands for Enzyme-linked Immunosorbent Assay
Similar to a Western blot in that it relies on antibodies
Used to detect either a specific protein OR a specific antibody
Presence of the protein or antibody in question produces a color change, catalyzed by the enzyme used in the assay
Useful for detecting pathogens
Now let's review a schematic about the different types of ELISA.

Chromatography

Similar to PAGE, chromatography allows proteins to be separated based on their properties
Invented in 1900 by Mikhail Semjonowitsch Tswett for the separation of leaf pigments
Proteins interact with a solid support (the stationary phase) and are then eluted from the column using methods that separate them based on their properties
You can run several “columns” in series to separate by multiple properties
Types of Chromatography
Now let's review the different types of chromatography.

In the next section we will discuss each of these types in detail.


Basic Chromatography

Thin-layer Chromatography
Analyte mixture (pink spots) is spotted at the bottom of a TLC plate and the plate is placed into a solvent (cyan). The solvent travels up the TLC plate via capillary action. Analytes travel with the mobile phase and separate (red vs. blue spots) based on their interaction with the stationary and mobile phases. Analytes can then be identified based on the distance traveled in a given solvent and period of time. Now let's review a schematic representation of Thin-layer Chromatography below.
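The distance an analyte travels is normally reported as its retention factor, Rf. Rf is the distance moved by the spot divided by the distance moved by the solvent front, so it always falls between 0 and 1. Rf depends on the solvent, the plate and the temperature, so comparisons are only meaningful against references run under the same conditions. The sketch below assumes hypothetical reference values and an arbitrary matching tolerance. The function names are illustrative.

```python
def retention_factor(spot_distance_mm, solvent_front_mm):
    """Rf = distance moved by the analyte / distance moved by the solvent front."""
    if solvent_front_mm <= 0:
        raise ValueError("solvent front distance must be positive")
    if not 0 <= spot_distance_mm <= solvent_front_mm:
        raise ValueError("a spot cannot travel past the solvent front")
    return spot_distance_mm / solvent_front_mm

def match_reference(rf, references, tolerance=0.05):
    """Names of reference compounds (same solvent and plate) whose Rf lies within tolerance."""
    return [name for name, ref_rf in references.items() if abs(ref_rf - rf) <= tolerance]

# Hypothetical reference Rf values measured in the same solvent system.
references = {"pigment A": 0.52, "pigment B": 0.80}
rf = retention_factor(30, 60)
print(rf, match_reference(rf, references))  # 0.5 ['pigment A']
```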
Column Chromatography
Protein I has a weak affinity for the stationary phase and passes
through the column rapidly. Protein III has high affinity for the
stationary phase and moves through the column slowly; it has a
high retention time. These two proteins can thus be separated based
on their different retention times. Now let's review a schematic
representation of the Column Chromatography below.
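Retention time can be connected to affinity with the standard chromatography relation t_R = t_0 × (1 + k). Here t_0 is the void (dead) time, which an analyte that does not interact with the stationary phase would take. The retention factor k is the ratio of the time an analyte spends in the stationary phase to the time it spends in the mobile phase. The sketch below uses hypothetical k values chosen to echo the example above, with protein I binding weakly and protein III binding strongly. The function names are illustrative.

```python
def retention_time(void_time_min, retention_factor):
    """t_R = t_0 * (1 + k); k = 0 means no interaction with the stationary phase."""
    return void_time_min * (1 + retention_factor)

def elution_order(proteins, void_time_min):
    """Sort proteins by predicted retention time, earliest first."""
    times = {name: retention_time(void_time_min, k) for name, k in proteins.items()}
    return sorted(times.items(), key=lambda item: item[1])

# Hypothetical retention factors: higher affinity for the stationary phase means larger k.
proteins = {"Protein III": 6.0, "Protein I": 0.5, "Protein II": 2.0}
for name, t in elution_order(proteins, void_time_min=4.0):
    print(f"{name}: {t:.1f} min")
# prints Protein I: 6.0 min, then Protein II: 12.0 min, then Protein III: 28.0 min
```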
Ion-Exchange Chromatography

In ion-exchange chromatography, proteins are separated by their net electrical charge, which depends on the amino acid R groups as well as the pH of the medium
If the pH is lower than the isoelectric point (pI) of the protein, the protein will have a net + charge
If the pH is higher than the isoelectric point of the protein, the protein will have a net – charge
The medium in the column (or thin layer) carries charged groups that bind proteins of the opposite charge. A cation exchanger (negatively charged groups) binds positively charged proteins, and an anion exchanger (positively charged groups) binds negatively charged proteins.
Ion-exchange chromatography is primarily used for protein purification
Now let's review a schematic of the Ion-Exchange Chromatography as
shown below. 
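The pH versus pI rule above can be checked numerically. The Henderson-Hasselbalch equation gives the fraction of each ionizable group that is charged at a given pH. Summing those fractions gives the net charge, and the pI is the pH at which that sum is zero. The sketch below assumes one approximate set of pKa values. Different tools and textbooks use slightly different values, and real proteins shift them further. The function names are illustrative.

```python
# Approximate pKa values (an assumption; published sets differ slightly).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}           # side chains that carry + charge when protonated
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}  # side chains that carry - charge when deprotonated
N_TERM_PKA, C_TERM_PKA = 8.6, 3.6

def net_charge(sequence, ph):
    """Net charge of a peptide at the given pH (Henderson-Hasselbalch for each group)."""
    positive = [N_TERM_PKA] + [POSITIVE_PKA[aa] for aa in sequence if aa in POSITIVE_PKA]
    negative = [C_TERM_PKA] + [NEGATIVE_PKA[aa] for aa in sequence if aa in NEGATIVE_PKA]
    charge = sum(1 / (1 + 10 ** (ph - pka)) for pka in positive)   # fraction protonated
    charge -= sum(1 / (1 + 10 ** (pka - ph)) for pka in negative)  # fraction deprotonated
    return charge

def isoelectric_point(sequence, tolerance=1e-4):
    """pH at which net charge is zero, found by bisection (charge falls as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2

peptide = "ACDEKRH"  # hypothetical peptide
pi = isoelectric_point(peptide)
print(round(pi, 2), net_charge(peptide, pi - 1) > 0, net_charge(peptide, pi + 1) < 0)
```

In a buffer below its pI this peptide is net positive and would be expected to bind a cation exchanger. Above its pI it is net negative and would bind an anion exchanger.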
Gel Filtration Chromatography

Separates proteins, peptides, and oligonucleotides on the basis of size
Molecules move through a bed of porous beads, diffusing into the beads to greater or lesser degrees
Smaller molecules diffuse further into the pores of the beads and therefore move through the bed more slowly, while larger molecules enter less or not at all and thus move through the bed more quickly
Both molecular weight and three-dimensional shape contribute to the degree of retention
May be used for analysis of molecular size, for separation of the components in a mixture, or for salt removal or buffer exchange in a preparation of macromolecules
Like SDS-PAGE it separates by size, but it is run in a column, usually with native proteins, and the largest molecules come out first

Important Note:
The matrix contains pores of various sizes. Small proteins get “lost” in the pores and take longer to be eluted out the bottom, while large proteins are excluded from the pores and pass around the beads. Now let's review a schematic representation of the Gel Filtration matrix used during chromatography.
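Gel filtration results are commonly expressed as a partition coefficient, Kav = (Ve − V0) / (Vt − V0). Here Ve is the elution volume, V0 is the void volume (where fully excluded molecules elute) and Vt is the total bed volume. Kav is roughly linear in log10 of molecular mass for globular proteins, so a column can be calibrated with standards. The sketch below assumes hypothetical calibration data and a purely linear relationship. The function names are illustrative.

```python
import math

def kav(elution_ml, void_ml, total_ml):
    """0 = fully excluded (elutes at the void volume), 1 = fully included in the pores."""
    return (elution_ml - void_ml) / (total_ml - void_ml)

def estimate_kda_from_elution(standards, elution_ml, void_ml, total_ml):
    """Fit Kav = slope * log10(kDa) + intercept on (kDa, Ve) standards, then invert it."""
    xs = [math.log10(kda) for kda, _ in standards]
    ys = [kav(ve, void_ml, total_ml) for _, ve in standards]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return 10 ** ((kav(elution_ml, void_ml, total_ml) - intercept) / slope)

# Hypothetical column: V0 = 8 mL, Vt = 24 mL; standards as (kDa, elution volume in mL).
standards = [(670, 11.0), (158, 13.5), (44, 15.8), (10, 18.4)]
print(round(estimate_kda_from_elution(standards, 14.5, 8.0, 24.0)))
```

Because larger proteins have smaller Kav, they have smaller elution volumes and come off the column first.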

Hydrophobic Interaction Chromatography



In hydrophobic interaction chromatography, proteins are separated based on how many exposed hydrophobic R groups they have
Proteins are dissolved in a high-salt solution, which strengthens hydrophobic interactions, and passed over the column
They are eluted with a gradient of decreasing salt until even the most hydrophobic proteins are eluted
Now let's review a schematic of Hydrophobic Interaction Chromatography.

Affinity Chromatography

In affinity chromatography, proteins are separated very specifically based on their binding partners
Highly selective
High resolution
The most common form has an antibody bound to the matrix, to which only one protein binds
All other proteins get washed through
Once all the other proteins have been washed out, the column is washed with an agent that makes the antibody release the bound protein

Mass Spectrometry

Mass spectrometry (MS) is an analytical technique that ionizes chemical species and sorts the ions based on their mass-to-charge ratio. In simpler terms, a mass spectrum measures the masses within a sample. It is used in many different fields and is applied to pure samples as well as complex mixtures. Besides separating proteins by size and other properties (e.g. hydrophobicity, charge, binding characteristics), we can also identify proteins using this method.
Can be used on purified or non-purified protein mixes!
Identifies proteins based on the masses of their fragments
Now let's review the schematic representations for Mass Spectrometry (MS) as shown below.
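The mass-to-charge ratio of a peptide ion is easy to calculate. The peptide's monoisotopic mass M is the sum of its residue masses plus one water molecule for the two termini. An ion carrying z extra protons appears at m/z = (M + z × 1.00728) / z. The sketch below uses standard monoisotopic residue masses in daltons. The function names are illustrative.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once for the free N- and C-termini
PROTON = 1.00728   # mass of the charge-carrying proton

def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER

def mz(sequence, charge):
    """m/z of the peptide carrying `charge` extra protons."""
    return (peptide_mass(sequence) + charge * PROTON) / charge

print(round(peptide_mass("GG"), 5), round(mz("GG", 1), 5))  # 132.05348 133.06076
```

Higher charge states of the same peptide show up at proportionally lower m/z. This is why a spectrum lists m/z values rather than masses directly.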

Types of Mass Spectrometry

The different types of mass spectrometry are distinguished by the type of mass analyzer used, and are listed below.
MALDI-TOF Mass Spectrometry
In recent years, Matrix-Assisted Laser Desorption/Ionization Time-Of-Flight mass spectrometry (MALDI-TOF MS) has emerged as a promising tool for microbial identification and diagnosis. In MALDI-TOF MS, microbes are identified using either intact cells or cell extracts. Let's review some features of MALDI-TOF MS.
All protein mass spectrometry involves ionization of the proteins
In MALDI-TOF this is accomplished by embedding the proteins in a solid matrix and exciting them with a laser to ionize them
For protein identification, the proteins are first broken into smaller peptide fragments by proteolysis (e.g. with trypsin)
This can be done to a single protein or to a mixture of many different proteins!
Now let’s review a schematic of the MALDI-TOF Mass Spectrometry. 
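The proteolysis step above can be simulated. A common simplified rule for trypsin is that it cuts after lysine (K) or arginine (R), but not when the next residue is proline (P). The sketch below applies that rule to produce the peptides you would expect from complete digestion. Real digests also contain missed cleavages, which this sketch ignores. The function name and example sequence are illustrative.

```python
def tryptic_digest(sequence):
    """Cut after K or R unless followed by P (simplified trypsin rule, no missed cleavages)."""
    sequence = sequence.upper()
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # trailing peptide without a C-terminal K/R
        peptides.append(sequence[start:])
    return peptides

print(tryptic_digest("MKWVTFISLLRPGAK"))  # ['MK', 'WVTFISLLRPGAK']
```

Applying the same function to every protein in a database gives the theoretical peptide list that a peptide mass fingerprint is compared against.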

MALDI-TOF Mass Spectrometry (cont'd)


Once the proteins are digested into smaller fragments and ionized, they are exposed to an electric field which literally “shoots” the ions away from the matrix, towards a detector
This is where the TOF part comes in: Time of Flight
All ions travel the same distance, but lighter ions (lower m/z) fly faster and reach the detector first
Heavier ions (higher m/z) fly more slowly and arrive later
The detector records how many “hits” arrive at each mass-to-charge value
The result is a list of peptide masses and their intensities, a “fingerprint” of the sample. The “hard” part of MALDI-TOF is turning these mass numbers back into actual proteins.
This is called Peptide Mass Fingerprinting
Computationally intensive
The observed peptide masses are compared with the theoretical peptide masses of database proteins digested in silico, to infer which full-size proteins were in the original mix
MASCOT: typically uses SwissProt as its reference protein database
Now let's review a schematic of Peptide Mass Fingerprinting as shown below.
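At its core, the matching step counts how many observed peaks agree, within a mass tolerance, with each candidate protein's theoretical peptide masses. The sketch below assumes those theoretical masses have already been computed (e.g. by an in-silico digest) and simply ranks candidates by the number of matching peaks. Real search engines such as Mascot use probability-based scores rather than raw counts. The protein names, masses and tolerance here are hypothetical.

```python
from bisect import bisect_left

def count_matches(observed, theoretical, tolerance_da=0.2):
    """Number of observed masses lying within tolerance of at least one theoretical mass."""
    theo = sorted(theoretical)
    hits = 0
    for mass in observed:
        i = bisect_left(theo, mass - tolerance_da)  # first theoretical mass >= mass - tolerance
        if i < len(theo) and theo[i] <= mass + tolerance_da:
            hits += 1
    return hits

def rank_candidates(observed, database, tolerance_da=0.2):
    """Sort candidate proteins by matched-peak count, best first."""
    scores = {name: count_matches(observed, masses, tolerance_da) for name, masses in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical theoretical peptide masses (Da) for two database proteins.
database = {"P1": [500.3, 800.4, 1200.6], "P2": [500.3, 950.5]}
observed_peaks = [500.25, 800.5, 1200.7, 700.0]
print(rank_candidates(observed_peaks, database))  # [('P1', 3), ('P2', 1)]
```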
Introduction
Similar to the use of mass spectrometry to identify proteins, a single purified protein can be fragmented and sequenced using mass spectrometry
Proteolytically digest the protein into overlapping fragments
Perform mass spectrometry on the fragments to measure their masses
Computationally assemble the fragments, starting with the simplest case
Protein sequencing begins by taking the mass of each single amino acid ion and looking for it in the mass spectrum plot
Now let's review the Protein Sequencing Table as shown below.
Protein Sequencing
Now work your way up to two-amino-acid fragments
Repeat this with progressively longer fragments until enough overlap is identified to reconstruct the entire peptide sequence
Now let's review the matrix constructed using this sequencing technique as shown below.
Important Note:
Like peptide mass fingerprinting, this is very computationally intensive!
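A simplified way to picture working up from single residues to longer fragments is to read a mass "ladder". Suppose you have fragments that all share one end, sorted by mass. Each step between neighbors should then equal the mass of one residue. The sketch below assumes an idealized, noise-free ladder of neutral prefix masses. Real de novo sequencing must also handle several ion types, missing peaks and noise. Leucine and isoleucine have identical masses and cannot be told apart this way. The function name and tolerance are illustrative.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def read_ladder(ladder_masses, tolerance=0.02):
    """Infer residues from consecutive mass differences; ambiguous steps list all candidates."""
    masses = sorted(ladder_masses)
    residues = []
    for lighter, heavier in zip(masses, masses[1:]):
        delta = heavier - lighter
        candidates = sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tolerance)
        residues.append("/".join(candidates) if candidates else "?")
    return residues

# Build an idealized ladder for the hypothetical peptide GASL and read it back.
ladder = [0.0]
for aa in "GASL":
    ladder.append(ladder[-1] + RESIDUE_MASS[aa])
print(read_ladder(ladder))  # ['G', 'A', 'S', 'I/L']
```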
Working with Protein Sequencing

Protein sequence data can be analyzed in a variety of ways:


Single sequence analysis:  look at a single protein sequence:  e.g.
Hydrophobicity plots
Motif Scan
BLAST
Multiple sequence analysis:  compare multiple protein sequences: e.g.
Sequence Alignment
Domain Discovery
Phylogenetic Analysis
Hydrophobicity Plots
A hydrophobicity plot is a quantitative analysis of the amino acids in a protein with regard to their degree of hydrophobicity
Can be used to characterize or identify possible structural domains of the protein
Can give information on the cellular location of the protein (e.g. is it a transmembrane protein?)
Construction of a hydrophobicity plot:
Done computationally
Plot the position of each amino acid in the sequence on the X axis
Plot the known hydrophobicity of these amino acids on the Y axis, usually averaged over a sliding window of neighboring residues
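The construction above can be sketched in a few lines using the Kyte-Doolittle hydropathy values. For each window of consecutive residues, average the scale values and assign the result to the residue at the window's center. The resulting (position, score) pairs are the points of the plot. The default window of 9 is an arbitrary choice here. The function name is illustrative.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=9):
    """(1-based center position, mean hydropathy) for every full window along the sequence."""
    if window % 2 == 0 or window > len(sequence):
        raise ValueError("window must be odd and no longer than the sequence")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    half = window // 2
    profile = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        profile.append((start + half + 1, mean))  # x = residue at the window center
    return profile

for position, score in hydropathy_profile("MKKLLLIVALLAGKDE", window=5):
    print(position, round(score, 2))
```

The x values from `hydropathy_profile` go on the X axis and the mean scores on the Y axis. Peaks above zero mark hydrophobic stretches.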
Hydrophobicity Plots (cont'd)

The following is a Kyte-Doolittle plot, where the y axis is hydrophobicity. Alternatively, hydrophilicity can be plotted, in which case the plot is referred to as a Hopp-Woods hydrophilicity plot.
Let's review a table of amino acid hydropathy scores derived from the literature.
Hydrophobicity Plots (cont'd)
Analyzing the shape of the hydropathy plot:
A stretch of about 20 amino acids with positive hydrophobicity suggests an alpha-helix spanning a lipid bilayer
Amino acids with low hydropathy are likely in contact with an aqueous medium (water), and therefore likely to reside on the outer surface of the protein.
Now let us review the following Hydropathy Index as shown below.
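The "stretch of about 20 hydrophobic residues" rule can be turned into a simple scan. Slide a window along the sequence and report the regions where the average Kyte-Doolittle score exceeds a threshold. The window of 19 residues and threshold of 1.6 used below are the values commonly quoted from Kyte and Doolittle's original work. Treat them as tunable assumptions. Dedicated transmembrane predictors are considerably more sophisticated. The function name and test sequence are illustrative.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def predict_tm_segments(sequence, window=19, threshold=1.6):
    """1-based inclusive spans covered by windows whose mean hydropathy exceeds threshold."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    segments = []  # each entry is [start, end) in 0-based coordinates
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window > threshold:
            end = start + window
            if segments and start <= segments[-1][1]:
                segments[-1][1] = end  # overlaps the previous hydrophobic window: extend it
            else:
                segments.append([start, end])
    return [(start + 1, end) for start, end in segments]

# Hypothetical sequence: 20 leucines flanked by charged lysines.
print(predict_tm_segments("K" * 10 + "L" * 20 + "K" * 10))  # [(6, 35)]
```

The reported span is wider than the leucine stretch itself because whole windows are marked. Real analyses usually look at the profile's peak rather than trusting the span edges.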

Motif Scan
A motif scan simply searches a protein sequence for matches to any known protein motifs in the databases
A protein motif is defined as a “super secondary structure”: not necessarily indicative of function, but suggestive
Examples: helix-turn-helix, omega loops, etc.
A motif is a pattern which may contain some variation
Now let's review the helix-turn-helix protein super secondary structure and sequence.

Important Note:
Sequence logo of the “canonical” helix-turn-helix protein sequence. Note that while some of the positions tend to be mostly conserved (e.g. position 5 is almost always an A), other positions may have multiple possible residues and still support the helix-turn-helix structural motif.
Motif Scan (cont'd)
So a motif scan is essentially a string search for a match within a string of contiguous amino acids
Since a motif can have significant variability (as seen in the sequence logo for the helix-turn-helix motif), the search must be flexible in finding matches
There are many existing tools to find sequence motifs
 ScanProsite
 EMBOSS
 Motifscan
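A flexible string search like this is naturally expressed with regular expressions. The sketch below translates a basic subset of PROSITE pattern syntax into a Python regex and scans a sequence with it. The subset covers `x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repeats, and `<` / `>` for sequence ends. It ignores rarer features, and unlike real scanners it reports only non-overlapping matches. The function names and the example pattern are illustrative, not taken from the PROSITE database.

```python
import re

def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern such as 'C-x(2,4)-[ST]-{P}' into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        anchor_start, anchor_end = element.startswith("<"), element.endswith(">")
        element = element.strip("<>")
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", element)  # e.g. x(2,4)
        if m:
            element = m.group(1)
            repeat = "{%s}" % m.group(2) if m.group(3) is None else "{%s,%s}" % (m.group(2), m.group(3))
        if element == "x":
            core = "."                           # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"    # any residue except those listed
        else:
            core = element                       # single residue or [..] class, already valid regex
        parts.append(("^" if anchor_start else "") + core + repeat + ("$" if anchor_end else ""))
    return "".join(parts)

def scan(sequence, pattern):
    """1-based start positions and matched substrings (non-overlapping)."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]

print(prosite_to_regex("<M-{P}-x(2,3)-[ST]."))  # ^M[^P].{2,3}[ST]
print(scan("ACAACGGCTTC", "C-x(2)-C"))          # [(2, 'CAAC'), (8, 'CTTC')]
```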
BLAST
We discussed BLAST in the previous MicroMasters course DNA Sequences: Alignments & Analysis. BLAST is perhaps the most famous, and most commonly used, algorithm in bioinformatics
Basic Local Alignment Search Tool
Allows comparison of two or more sequences to discover sequence similarity
Designed by Stephen Altschul and colleagues at the NIH
Uses a heuristic approach to solve a basic pattern-matching problem
The heuristic approach makes it feasible to submit a single query and compare it against the entire GenBank database
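Much of BLAST's speed comes from its first step. It finds short "words" that the query shares with a database sequence, and only regions around these seed hits are extended into alignments. The sketch below shows only exact-word seeding with a word length of 3. Real protein BLAST also accepts "neighborhood" words that score above a threshold with the substitution matrix, and then extends and scores the hits. The function names are illustrative.

```python
from collections import defaultdict

def word_index(sequence, w):
    """Map every length-w word in the sequence to its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(sequence) - w + 1):
        index[sequence[i:i + w]].append(i)
    return index

def seed_hits(query, target, w=3):
    """(query_pos, target_pos) pairs where both sequences share an exact length-w word."""
    index = word_index(target, w)
    hits = []
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], []):
            hits.append((i, j))
    return hits

print(seed_hits("MKVLA", "AAMKVQ"))  # [(0, 2)]
```

Hits that fall on the same diagonal (the same value of target_pos minus query_pos) point to a region where the two sequences line up. Only those regions need the more expensive alignment step.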
BLAST Algorithms
There are actually 5 different “flavors” of BLAST, each designed for a specific query/target combination:
Nucleotide-Nucleotide BLAST (blastn): given a DNA query, this program returns the most similar DNA sequences from the DNA database that the user specifies.
Protein-Protein BLAST (blastp): given a protein query, this program returns the most similar protein sequences from the chosen protein database.
Nucleotide 6-Frame Translation-Protein (blastx): this program compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database.
Nucleotide 6-Frame Translation-Nucleotide 6-Frame Translation (tblastx): this program is the slowest of the BLAST family. It translates the query nucleotide sequence in all six possible frames and compares it against the six-frame translations of a nucleotide sequence database. The purpose of tblastx is to find very distant relationships between nucleotide sequences.
Protein-Nucleotide 6-Frame Translation (tblastn): this program compares a protein query against all six possible reading frames of a nucleotide sequence database.
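The "six-frame translation" used by blastx, tblastn and tblastx means translating a DNA sequence starting at offsets 0, 1 and 2 on the given strand and again on its reverse complement. The sketch below does this with the standard genetic code. Frame labels +1 to +3 and −1 to −3 follow a common convention, which is an assumption here. Stop codons appear as `*` and unrecognized codons as `X`. The function names are illustrative.

```python
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
# Standard genetic code, listed in TCAG codon order (TTT, TTC, TTA, ... GGG).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

def translate(dna):
    """Translate complete codons; codons containing unknown bases become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def reverse_complement(dna):
    return dna[::-1].translate(str.maketrans("ACGT", "TGCA"))

def six_frame_translation(dna):
    """Protein translations of frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    dna = dna.upper().replace("U", "T")
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames

print(six_frame_translation("ATGGCCTAA"))
# {'+1': 'MA*', '-1': 'LGH', '+2': 'WP', '-2': '*A', '+3': 'GL', '-3': 'RP'}
```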
Substitution Matrices
When comparing sequences, a perfect match is not always possible, nor expected
BLAST (and other string-matching algorithms) use a scoring system
Matches are worth one value
Mismatches are worth a different value
Indels are worth a third value
Matches, Mismatches & Indels

Let’s take the following string comparison as an example:


While not a perfect match, this is actually a pretty good match:
6 matches
2 mismatches
1 indel (insertion or deletion event)
Scoring
Since we will have a combination of matches, mismatches, and indels, we need a way of scoring an alignment so that we can choose the best one
Scoring a match is easy: in our example, a match is given a score of “+4”
An indel (also called a gap) may be given a “gap penalty” with a fairly large negative value, e.g. “-8”
Substitutions are more interesting for protein alignments: we need to take into account the properties of the amino acids and construct a matrix which gives a different score for each possible substitution
Blosum Matrix
The Blocks Substitution Matrix (BLOSUM) is a table that provides a pairwise comparison of each possible amino acid pair
For each pair, a value is given representing the score that should be assigned to an alignment containing that particular substitution
These values were derived empirically!
Henikoff and Henikoff (1992) scanned the BLOCKS database, looking at very conserved regions of protein families (regions without gaps), and counted the frequency of substitutions for each amino acid pair seen
One of the resulting tables is BLOSUM62, built from blocks clustered at 62% sequence identity:
Scoring Example Review
Let's review our example sequence:

Now let's do the math:
6 matches at +4 each = 24 pts
2 mismatches: TRP to ALA (-3), and PHE to VAL (-1) = -4
1 indel (insertion or deletion event) at -8 = -8
So our final score using BLOSUM is 24 - 4 - 8 = 12.
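The arithmetic above can be written as a small scoring function. The aligned sequences from the original figure are not reproduced here. The two strings below are hypothetical, constructed to have the same composition: 6 matches, a W/A and an F/V substitution, and one gap. The sketch uses the flat +4 match score and −8 gap penalty from the text, plus only the two BLOSUM62 entries the example needs. A real tool would load the full matrix, where match scores also vary by amino acid.

```python
MATCH_SCORE = 4    # flat match score, as in the text's simplified example
GAP_PENALTY = -8   # score for each position aligned to a gap ("-")
# Only the two BLOSUM62 entries this example needs; frozenset keys make lookups order-free.
SUBSTITUTION = {frozenset("WA"): -3, frozenset("FV"): -1}

def score_alignment(top, bottom):
    """Sum match, substitution and gap scores over two equal-length aligned strings."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += GAP_PENALTY
        elif a == b:
            score += MATCH_SCORE
        else:
            score += SUBSTITUTION[frozenset((a, b))]
    return score

print(score_alignment("GKWLFSTEH", "GKALVST-H"))  # 12  (6*4 - 3 - 1 - 8)
```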
PAM
Point accepted mutation (PAM) is the replacement of a single amino acid in the primary structure of a protein with another single amino acid.
PAM matrices are an alternative to BLOSUM
In a PAM matrix, each column and each row represents one of the standard twenty amino acids.
PAM is also derived from real-world data (each entry represents the likelihood of the amino acid in the row being replaced by the one in the column), but using a different methodology (evolutionary relationships and global rather than local alignments)
In practice both will yield similar results
Scoring Matrix
What BLAST and similar algorithms do is construct a scoring matrix
Query and target sequences are placed along the top and side
The score for each cell is filled in
The optimal path (best score) is chosen: this is the best alignment
For a small or simple alignment the scoring matrix is easy to construct and read
To find the best alignment, you start at the bottom right and work your way towards the top left
The idea is to find the “optimum path”: the path that gives you the highest score
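Filling in the matrix and then tracing back from the bottom-right corner is the dynamic-programming method of Needleman and Wunsch (global alignment). BLAST itself avoids filling a full matrix against the whole database and applies this kind of extension only around its seed hits. The sketch below uses simple hypothetical scores (match +4, mismatch −2, gap −8) rather than a full substitution matrix. When several paths tie, it prefers the diagonal step. The function name is illustrative.

```python
def needleman_wunsch(a, b, match=4, mismatch=-2, gap=-8):
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap              # leading gaps in b
    for j in range(1, cols):
        score[0][j] = j * gap              # leading gaps in a

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # diagonal: align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,             # up: gap in b
                              score[i][j - 1] + gap)             # left: gap in a
    # Traceback: start at the bottom-right cell and walk back to the top-left.
    top, bottom, i, j = [], [], len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))

print(needleman_wunsch("ACGT", "AGT"))  # (4, 'ACGT', 'A-GT')
```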
Scoring Matrix (cont'd)
In this example, which is both a longer sequence and more complex, there are a variety of paths that could be taken
The BLAST algorithm will evaluate these and choose the path with the best score
Finally the results are presented.
Sample BLAST Results
BLAST calculates the alignment score for each High-scoring Segment Pair (HSP) and presents the matches in order from best to worst
The score and other data are included in the results
Now let's review a sample BLAST Results table as shown below.
Sample BLAST Results (cont'd)
Now let's review the description of the various parameters that are listed in the BLAST results table as shown below.
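Two of the values in a BLAST results table, the bit score and the E-value, come from the raw alignment score through Karlin-Altschul statistics. The bit score is S' = (λS − ln K) / ln 2. The E-value, the number of alignments at least this good expected by chance, is E = K·m·n·e^(−λS) = m·n·2^(−S'), where m and n are the query and database lengths. The sketch below assumes λ = 0.267 and K = 0.041, values often quoted for BLOSUM62 with gap costs 11/1. BLAST prints the parameters it actually used with each search, and it also corrects m and n for edge effects, which this sketch omits. The function names are illustrative.

```python
import math

# Assumed Karlin-Altschul parameters (often quoted for BLOSUM62, gap open 11 / extend 1).
LAMBDA = 0.267
K = 0.041

def bit_score(raw_score, lam=LAMBDA, k=K):
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def e_value(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """Expected chance alignments scoring >= S: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)

# Hypothetical search: a 300-residue query against a database of 10^8 residues.
for raw in (40, 60, 100):
    print(raw, round(bit_score(raw), 1), f"{e_value(raw, 300, 1e8):.2e}")
```

A higher raw score gives a higher bit score and an exponentially smaller E-value. This is why the results are listed with the smallest E-values first.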
