Protein Structure Determination

As we have seen, protein structure and shape are critical to protein function
Proteins are dynamic and tend to flex, also known as 'breathing' (meaning that the shape constantly changes in small ways in vivo)
We can still make “best guesses” as to actual structure using:
X-ray diffraction
Electron microscopy
Nuclear magnetic resonance
X-Ray Diffraction

X-Ray Diffraction (also known as X-Ray Crystallography, or XRC) involves passing X-ray beams through crystallized proteins
The protein crystal acts as a grid and diffracts the beam
The resulting diffraction pattern is captured (either on photographic film or on a solid-state detector)
From the diffraction pattern the positions of atoms can be determined
Resolution is fairly high: typically around 3 angstroms or better
Crystallizing Proteins

A very pure sample of the protein is required for X-Ray diffraction
Crystallization procedures differ from protein to protein, and must be optimized to ensure that the protein retains as much of its in vivo shape as possible
Protein crystallization is almost always done in solution
This is a two-step process:
Nucleation of a very small sample, containing perhaps only 100 molecules
Crystal growth
Both steps must be done very slowly!
In general the idea is to create conditions which force the protein out of solution (precipitation) without losing its form
Crystallizing Proteins (cont'd)

While this sounds simple, it is not:
Crystallizing proteins is a bit of an art form
The sample needs to be extremely pure
It is difficult to grow good crystals that will provide clean diffraction patterns
A crystallized protein may not be truly representative of the actual protein structure in vivo
Interpreting the diffraction patterns provides an electron density map, from which the structure must be extrapolated, which is also a bit of an art form
Dynein X-Ray Diffraction

From the 6 different views (two each from the X, Y, and Z axes) of crystallized dynein (the motor protein), the structure can be analyzed from the X-ray images shown below.
Dynein X-Ray Diffraction (cont'd)

The actual process is mathematically intense and requires Fourier transforms to calculate electron positions
From electron positions, atoms and then amino acids are extrapolated
Now let’s review the Orthogonal Surface views as shown below.
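The Fourier step can be illustrated with a toy one-dimensional example. This is only a sketch under strong assumptions, not a crystallography pipeline: the three "atoms", the Gaussian peak shape, and the helper names dft and toy_density are all made up for illustration. It also keeps the phases of the structure factors, whereas a real experiment records only intensities and the phases must be estimated separately.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive discrete Fourier transform (O(N^2)); fine for a toy example."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0j
        for j, v in enumerate(values):
            total += v * cmath.exp(sign * 2j * math.pi * j * k / n)
        out.append(total / n if inverse else total)
    return out

def toy_density(n_points=64, centers=(0.25, 0.5, 0.75), width=0.03):
    """Hypothetical 1D 'electron density': one Gaussian peak per atom."""
    xs = [i / n_points for i in range(n_points)]
    rho = [sum(math.exp(-0.5 * ((x - c) / width) ** 2) for c in centers)
           for x in xs]
    return xs, rho

xs, rho_true = toy_density()

# "Diffraction": each structure factor has an amplitude and a phase.
structure_factors = dft(rho_true)
amplitudes = [abs(f) for f in structure_factors]
phases = [cmath.phase(f) for f in structure_factors]

# Fourier synthesis: rebuild the density from amplitudes and phases.
rebuilt = dft([a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases)],
              inverse=True)
rho_back = [z.real for z in rebuilt]

# Atom positions are read off as local maxima of the recovered density.
n = len(xs)
atom_positions = [
    xs[i] for i in range(n)
    if rho_back[i] > 0.5
    and rho_back[i] > rho_back[i - 1]
    and rho_back[i] > rho_back[(i + 1) % n]
]
print(atom_positions)
```

In a real structure the same idea runs in three dimensions over many thousands of reflections, which is where the computational cost comes from.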
Electron Microscopy (EM)

Another approach that attempts to define protein structure is Electron Microscopy (EM).
Like X-Ray Crystallography, it requires a crystallized protein
It inherits the same problems, e.g. crystallization is a difficult process and the crystal may not accurately represent the actual structure of the protein
It is useful for examining membrane proteins, which do form two-dimensional crystals within the membrane itself, in which case the protein is probably closer to its native form than it would be in a pure crystallized form
Membrane proteins all have the same orientation relative to one another due to their placement in the membrane
Resolution is limited to 10-20 angstroms
Cryo-Electron Microscopy

Cryo-Electron Microscopy is an alternative approach.
It freezes proteins in their native state!
So the structures will be more representative of the actual protein structure found in the cell
Several membrane proteins have been visualized by Cryo-EM
Note that the structure is less detailed than with X-Ray Diffraction (as shown below)

Nuclear Magnetic Resonance (NMR)

Nuclear Magnetic Resonance (NMR) is used as an imaging technique for medical diagnostic scans on humans (where it is known as magnetic resonance imaging, or MRI)
NMR relies on the fact that atomic nuclei with an odd number of nucleons (protons plus neutrons) have an angular momentum, or “spin”
Examples are isotopes of H, C, P, and N (e.g. 1H, 13C, 31P, and 15N)
This spin creates a tiny magnetic field, which is measurable with the appropriate equipment (e.g., an NMR spectrometer)!
We can take advantage of this to detect atoms within proteins as well
The magnetic field generated by the atoms is partially shielded by the electrons surrounding the atoms, which can give us clues as to the kind of bonds those atoms are participating in
This means that we can elucidate secondary and tertiary structure of proteins by looking at these bonds!
NMR protein analysis does NOT require the proteins to be crystallized; they can be in solution, so are more likely to be in their “native” shape
Limitations: NMR is very expensive and requires a large amount of sample (several milligrams)
Nuclear Magnetic Resonance (NMR) (cont'd)

NMR spectroscopy will generate a plot for each type of atom that is being looked for; in the example plot below only the H atoms are shown
The peaks are converted into a plot (see right) that shows where the atoms are located in the sample, which is again very computationally intensive
Now let's review a typical Hydrogen Spectrum of a Protein as shown below.
Pure Computational Folding

In addition to these three techniques we have purely computational methods for predicting a protein structure
The advantage of these is that they require no sample, no crystallization, and no lab work
The (obvious) disadvantage is that these methods are entirely predictive, and while they are based on known and tested algorithms, and adjusted based on “known” structures from XRC, EM, and NMR, they are definitely not 100% accurate
But in theory, since a protein’s amino acid sequence already contains all of the information necessary to fold in vivo, there is no reason to assume that we cannot ultimately devise software to predict these structures
This is known as the Anfinsen hypothesis, named after Dr. Christian Anfinsen, a Nobel Prize laureate molecular biologist who formulated the idea
Pure Computational Folding (cont'd)

Given the advantages, computational folding is a very active and important field
CASP (Critical Assessment of Structure Prediction) is an online community-driven experiment that comes together every two years to evaluate progress in these algorithms
CAMEO3D (Continuous Automated Model Evaluation) similarly compares methods
Both of these rely on comparison to the “known” structures to help improve algorithms
Some of the rules of folding are very simple and straightforward:
Protein folding reduces free energy
The native structure will be the one with the lowest free energy
There will be exceptions to this based on local optima
There will be exceptions to this based on the involvement of other factors (e.g. chaperones)
We can also take advantage of known motifs and domains: sequences which we know form specific structures.
Free Energy of Protein Folding

A local optimum is a solution that appears optimal given the neighboring conditions, but may not be globally optimal when all possible solutions are taken into account
The local minimum/kinetic trap shown below is optimal if you look at only that portion of the free energy graph, but clearly there are other optima
Not all optima are equal! It is important to differentiate between the local and the global conditions
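The difference between a local and a global optimum can be shown with a small numerical sketch. The one-dimensional "energy landscape" below is a made-up polynomial with two wells, not a real force field, and plain gradient descent stands in for any search that only ever moves downhill. Started on the right it gets stuck in the shallower well (a kinetic trap); started on the left it finds the deeper one.

```python
def energy(x):
    """Hypothetical 1D 'free energy landscape' with two wells."""
    return x**4 - 3 * x**2 + x

def gradient(x):
    """Derivative of the toy energy function."""
    return 4 * x**3 - 6 * x + 1

def descend(x, step=0.01, iterations=5000):
    """Plain gradient descent: always moves downhill, so it cannot leave a well."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x

trapped = descend(2.0)    # starts on the right-hand side of the landscape
native = descend(-2.0)    # starts on the left-hand side of the landscape

print(f"from x=2.0:  x={trapped:.3f}, energy={energy(trapped):.3f}")
print(f"from x=-2.0: x={native:.3f}, energy={energy(native):.3f}")
```

Both end points have zero gradient, so each looks optimal from nearby; only comparing their energies reveals which one is the global minimum.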

Rosetta3

One of the best protein structure prediction systems is Rosetta
Rosetta does “ab initio” (“from the beginning”) folding, meaning it is based on “first principles” and physics
It takes the protein sequence and looks for known motifs and domains, based on a fragment library it has
Each of these (e.g. an alpha helix motif) is assumed to fold in a consistent manner
Where no known motif or domain is available, free energy calculations are performed to make a “best guess”
Energy methods are established for each possible type of pairing of residues, as well as for short-range and long-range stretches of amino acids
These values are context dependent
Once secondary structure is predicted, tertiary folding is computed similarly
The structure which conforms to the best combination of “known” motifs and low free energy is assumed to be the most accurate prediction
Important Note:
You can access the Rosetta prediction system by clicking the Rosetta link.
Folding Prediction Workflow

As you might expect, errors in the initial folding (e.g. secondary structure) are only compounded when computing tertiary structure
No physics model currently takes into account the presence of chaperones, which we know are involved in the folding of many proteins
Now let's review a schematic of the Folding Prediction Workflow.
Distributed Computing

Given the complexity of the problem of protein folding (e.g. the number of possible conformations and solving the free energy equations for each), this is one application that has found a fantastic use for distributed computing
Projects such as Rosetta@home and Folding@home are distributed computing projects
A “master” computer breaks the job(s) up into smaller pieces
The smaller pieces are sent to volunteers (people like you and me) and are analyzed on their home computers
Results are compiled and assembled back at the “master” computer
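The master/volunteer pattern described above can be sketched in a few lines. This is an illustration only, not how Rosetta@home or Folding@home are implemented: toy_energy is a hypothetical stand-in for an expensive calculation, the conformations are just pairs of angles, and local threads play the role of volunteers' home computers.

```python
from concurrent.futures import ThreadPoolExecutor

def toy_energy(conformation):
    """Hypothetical stand-in for an expensive free energy calculation."""
    return sum((angle - 60) ** 2 for angle in conformation)

def split_into_work_units(items, unit_size):
    """The 'master' breaks the job into smaller pieces."""
    return [items[i:i + unit_size] for i in range(0, len(items), unit_size)]

def volunteer(work_unit):
    """A 'volunteer' scores its piece and reports only its best result."""
    return min((toy_energy(c), c) for c in work_unit)

# Hypothetical search space: every pair of angles on a 30-degree grid.
conformations = [(a, b) for a in range(0, 360, 30) for b in range(0, 360, 30)]
work_units = split_into_work_units(conformations, unit_size=16)

# Threads stand in for the volunteers' home computers.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_results = list(pool.map(volunteer, work_units))

# The 'master' assembles the partial results into one answer.
best_energy, best_conformation = min(partial_results)
print(best_energy, best_conformation)
```

The approach works because each work unit can be scored independently, so the pieces never need to communicate with one another, only with the master.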
Distributed Computing (cont'd)

These distributed projects have to date answered questions about proteins involved in:
Parkinson’s Disease
Huntington’s Disease
Alzheimer’s Disease
Osteogenesis Imperfecta
If you want to get involved:
Click the Foldingathome link to access the home page for Folding@Home where you can download the client.
Click the Rosettaathome link to access the home page for Rosetta@Home where you can download the client.
Protein Function Prediction

Introduction

Just as we can use the basic rules of physics (and the known rules of protein folding) to help us computationally predict protein folding, we can predict protein function based on known and ab initio methods
Predictions of function will be based on many different criteria:
DNA/protein sequence
Expression profiles
Protein domain information
Protein-protein interaction information
Why Computational Prediction?

Protein folding is important in terms of function: many diseases are the result of misfolding
Determining protein structure in the lab via the methods we discussed previously (e.g. crystallography) is very time and money intensive
Using software to predict folding may be inaccurate and computationally intense, but it is orders of magnitude faster than determining protein structure using laboratory techniques!
We face a similar situation with prediction of protein function!
DNA sequencing has progressed so rapidly that we can now generate genomic sequence data far faster than we can analyze it in the lab
Many proteins in sequenced genomes are “known” only through computational analysis, e.g. they are similar to known genes, or are “most likely” expressed genes/proteins according to bioinformatics algorithms
Protein Function Prediction (cont'd)

Sequence homology is a good predictor of function
Proteins with similar sequences usually have similar functions
There are always exceptions: for evolutionary reasons (since new proteins usually evolve from existing proteins) even very similar proteins may have very different functions
For example: Gal1 and Gal3
Gal1 and Gal3 are similar in sequence as well as structure
An example of duplication and subsequent divergence
They share 73% sequence identity (exact match), and 92% similarity (amino acids with similar characteristics)
Yet Gal1 is a galactokinase and Gal3 is a transcriptional inducer
So sequence homology alone is not an ideal predictor of function
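The difference between identity and similarity can be made concrete with a short sketch. The two sequences below are invented for illustration (they are not Gal1 or Gal3), the sequences are assumed to be already aligned, and the grouping of amino acids by side-chain chemistry is a deliberately coarse assumption; real tools use substitution matrices such as BLOSUM instead.

```python
# Hypothetical coarse grouping of amino acids by side-chain chemistry.
SIMILARITY_GROUPS = ["AVLIMC", "FWY", "ST", "NQ", "DE", "KRH", "G", "P"]
GROUP_OF = {aa: idx
            for idx, group in enumerate(SIMILARITY_GROUPS)
            for aa in group}

def identity_and_similarity(seq_a, seq_b):
    """Percent identity and similarity for two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore alignment columns that contain a gap in either sequence.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    identical = sum(a == b for a, b in pairs)
    # Identical residues are in the same group, so they also count as similar.
    similar = sum(GROUP_OF[a] == GROUP_OF[b] for a, b in pairs)
    return 100 * identical / len(pairs), 100 * similar / len(pairs)

# Made-up aligned sequences: K/R and V/I are conservative swaps, E/Q is not
# (under this particular grouping).
identity, similarity = identity_and_similarity("MKVLDETSGA", "MRILDQTSGA")
print(f"identity {identity:.0f}%, similarity {similarity:.0f}%")
```

Similarity is always at least as high as identity, because every exact match also counts as a similar pair.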
Protein Function Prediction (cont'd)

Genes in the same pathway are often transcriptionally linked
Control of expression is often linked so that all of the required genes are turned on/off in sync
Genes in the same pathway are often also found near each other
This is another way to control expression
In prokaryotes this is called an operon and expression is very rigidly linked
In eukaryotes this can result in the co-availability of the genes in packaging (e.g. histone binding)
The end result is that you can look at the expression patterns of proteins and “guess” that those with similar patterns must be similar in function
This is less accurate than sequence homology
But when added to all of the other information it can contribute to the full body of evidence
dcGO Database

A more informative guess can be made from examination of protein domains
The PFAM (Protein Families) database stores known motifs and domains, and can be used to relate sequence to function
The dcGO database stores information on function and sequential domains
dcGO is a database of domain super-families (supra-domains)
dcGO looks at combinations of domains in sequence (2 or more)
It is built on a knowledge of existing proteins: proteins with well-characterized functions (determined in the wet lab) are used to construct a database of known supra-domains, and proteins of unknown function are then analyzed in terms of their domains and domain order
Since domains/motifs often have a known function, this works fairly well with both PFAM and dcGO comparisons
Protein-Protein Interaction Data & Binding

One simple way to determine what a protein does is to isolate it in conjunction with its binding partners
This can be done in a variety of ways, e.g. immunoprecipitation, affinity chromatography, or even simple precipitation and western blotting
The first step in a protein-protein interaction assay is to chemically cross-link proteins that are already bound
This keeps bound proteins from coming unbound during subsequent analysis
Crosslinking forms strong covalent OR ionic bonds between the two proteins
BS3 (bis(sulfosuccinimidyl)suberate) is a common cross-linking reagent
Knowing what a protein binds to can then tell us what its function might be, e.g. if it binds DNA it might be involved in regulation of expression, or if it binds a protein it may modify that protein
Knowing what domains it has can then further expand that knowledge
Protein Docking Prediction

Protein docking is an extension of protein binding
In docking studies, we look not only at what a protein binds, but in what orientation
Recall the “lock and key” analogy of enzyme activity: with docking we specify not only which key fits which lock, but how to insert the key into that lock

Important Note:
Notice how the ligand must be in exactly the right conformation in order to fit into the target (top).
Similarly when binding anything, whether DNA, another protein, or a small molecule, the exact orientation is important.
Energetics of Protein Docking

In general, the protein and ligand will bind in an energetically favorable (low free energy) fashion
There are always exceptions to this rule, but in general by looking at the protein-ligand system and keeping free energy transformations in mind we can begin to calculate the “best” conformation
Since the sequence of the amino acids determines the structure/shape as well as binding characteristics, knowing the exact binding orientation helps us understand why a specific mutation may lead to a change in function
We can also take advantage of this when engineering proteins to design a binding (docking!) site more accurately
Approaches to Docking Prediction

Shape Complementarity: this approach describes the protein and its binding partner (ligand) in terms of features:
Each has a solvent-accessible surface area
Each has surface characteristics, e.g. hydrophobicity, charge, etc.
Searching for two compatible surfaces, one on each partner, takes all of this into account
Often this is limited on the protein side to only looking at the protein’s active site, which makes the analysis go much faster
Approaches to Docking Prediction (cont'd)

Simulation
In simulation the protein and ligand are both simulated in 3D space, some distance apart, and the ligand is pushed at the protein’s active site in all possible conformations
This also takes into account all of the factors that shape complementarity does, e.g. hydrophobicity
Free energy is calculated at each possible step
The lowest free energy of binding is the “winner”
Very computationally intensive but far more representative of “real world” docking
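The "try every pose, keep the lowest energy" idea can be sketched with a toy two-dimensional system. Everything here is an assumption made for illustration: the two-atom receptor site, the two-atom rigid ligand, the pose grid, and the scoring function (a charge-charge term plus a steep clash penalty) are invented and are far simpler than the free energy functions real docking software uses. The ligand is also treated as rigid, so flexibility is ignored.

```python
import math

# Hypothetical 2D "binding site": (x, y, charge) for each receptor atom.
RECEPTOR = [(-1.0, 0.0, -1.0), (1.0, 0.0, +1.0)]
# Hypothetical rigid ligand, centred on the origin before placement.
LIGAND = [(-1.0, 0.0, -1.0), (1.0, 0.0, +1.0)]

def place(ligand, angle_deg, offset_y):
    """Rotate the ligand about its centre, then shift it above the site."""
    t = math.radians(angle_deg)
    return [(x * math.cos(t) - y * math.sin(t),
             x * math.sin(t) + y * math.cos(t) + offset_y,
             q) for x, y, q in ligand]

def score(receptor, ligand):
    """Toy energy: charge-charge term plus a steep penalty for clashes."""
    total = 0.0
    for rx, ry, rq in receptor:
        for lx, ly, lq in ligand:
            d = math.hypot(rx - lx, ry - ly)
            total += rq * lq / d        # opposite charges lower the score
            total += (1.0 / d) ** 12    # overlapping atoms raise it sharply
    return total

# Exhaustive search: every rotation (15-degree steps) at every height
# from 1.0 to 3.0 above the site (0.1 steps).
poses = [(angle, offset / 10)
         for angle in range(0, 360, 15)
         for offset in range(10, 31)]
best_angle, best_offset = min(
    poses, key=lambda pose: score(RECEPTOR, place(LIGAND, *pose)))
print(best_angle, best_offset)
```

The winning pose is the one that flips the ligand so each of its charges sits opposite an unlike charge on the receptor, close enough to attract but not so close that the clash penalty dominates. Even this tiny search evaluates several hundred poses, which hints at why realistic simulations are so expensive.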
Basic Docking Algorithm

Both techniques have similar characteristics
You must know the structure of the protein and ligand in advance (obviously!)
You limit the search space to include only the protein’s active site
You must take into account the flexibility of both protein and ligand (e.g. how much “give” is there for an induced fit)
Score possible fits for free energy of the protein-ligand system
Scoring Docking

It is not sufficient to simply score the final “fit”
Rotational/conformational changes of the protein and ligand must be taken into account!
Even if a particular docking conformation is “ideal” in terms of free energy, if an intermediate (either the change in shape of protein or ligand or the path required to get there) costs too much energetically, it is unlikely to be the actual fit
Types of Scoring Functions
There are four different types of scoring function used to score the final docking. Let's review them.
Alternative Splicing Prediction

There is one last area of computational prediction we will address: that of alternative splicing
Recall that eukaryotic mRNA is spliced on its way to the ribosome to remove the introns
Splicing occurs at very specific sites, the exon/intron boundaries, and is controlled by signal sequences
Splicing is performed by a complex of proteins and small RNAs called the spliceosome
RNA Splicing

In molecular biology, splicing is the editing of the nascent precursor messenger RNA (pre-mRNA) transcript into a mature messenger RNA (mRNA)
The end result of splicing is that introns are removed and exons are joined together (ligated)
For nuclear-encoded genes, splicing takes place within the nucleus either during or immediately after transcription
For those eukaryotic genes that contain introns, splicing is usually required in order to create an mRNA molecule that can be translated into protein
For many eukaryotic introns, splicing is carried out in a series of reactions which are catalyzed by the spliceosome, a complex of small nuclear ribonucleoproteins (snRNPs)
Self-splicing introns, or ribozymes capable of catalyzing their own excision from their parent RNA molecule, also exist
The spliceosome recognizes a very specific set of signals at both the 5’ and 3’ ends of the intron
Now let's review a schematic of RNA Splicing as shown below.
Splicing Signal

The splicing signal is highly conserved:
3' splice sites: CAG|G where the “cut” is at |
5' splice sites: MAG|GTRAGT where M is A or C and R is A or G
Now let's review a schematic of the Splicing signal.
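The two consensus patterns above translate directly into regular expressions. The sketch below is a minimal illustration, not a splice-site predictor: the function name find_splice_sites and the test sequence are made up, and it works on a DNA-style alphabet (T rather than U) to match the notation used above.

```python
import re

# IUPAC codes from the consensus: M is A or C, R is A or G.
# Lookaheads are used so that overlapping candidate sites are all reported.
FIVE_PRIME = re.compile(r"(?=[AC]AGGT[AG]AGT)")   # MAG|GTRAGT
THREE_PRIME = re.compile(r"(?=CAGG)")             # CAG|G

def find_splice_sites(sequence):
    """Return 0-based cut positions (index of the base just after '|')."""
    seq = sequence.upper()
    donors = [m.start() + 3 for m in FIVE_PRIME.finditer(seq)]
    acceptors = [m.start() + 3 for m in THREE_PRIME.finditer(seq)]
    return donors, acceptors

# Made-up sequence: exon, then an intron (GTAAGT ... CAG), then exon.
toy = "ATGCAG" + "GTAAGTTTTTCTTTTTCAG" + "GATCC"
donors, acceptors = find_splice_sites(toy)
print("5' candidates:", donors)
print("3' candidates:", acceptors)
```

Note that the short 3' pattern also matches at the 5' site in this example, because the exon there happens to end in CAG and the intron starts with G. Matching the consensus alone therefore over-predicts, which is one reason additional signals are needed.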
Alternative Splicing

In alternative splicing, there are several different ways that a pre-mRNA can be spliced, resulting in different proteins. Now let’s review a schematic on Alternative Splicing as shown below.
Alternative Splicing Signals

Splicing thus involves both cis and trans acting factors:
cis (Latin for “same side”): the splice signals are on the mRNA, hence on the same molecule that the splicing occurs on
trans (Latin for “different side”): the spliceosome complex itself is separate from the mRNA, hence it is a trans-acting factor
Alternate splice sites must also have the same cis data (the consensus sequence), so there must be alternative trans factors that are involved in alternative splicing
But there are also splicing silencers: sequences that can occur in the intron itself or in any of the exons nearby, and allow the trans factors (splicing repressor proteins) to bind
Alternative Splicing Signals (cont'd)

So predicting splicing can be done in two steps:
First identify all possible splice sites
Then determine which, if any, splice silencer sequences exist nearby
In addition to splice silencers there are splice enhancers
These are also cis elements: sequence elements that can lie in the intron or in the exons
They must also be included in any splicing prediction
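The second step, checking each candidate site for nearby silencers and enhancers, might look something like the sketch below. It assumes the candidate sites have already been found in step one. The motifs, the 10-base window, the rule that a silencer overrides an enhancer, and the function names are all placeholders chosen for illustration, not curated regulatory elements or an established algorithm.

```python
def motif_positions(sequence, motif):
    """All 0-based start positions of motif in sequence (overlaps allowed)."""
    return [i for i in range(len(sequence) - len(motif) + 1)
            if sequence.startswith(motif, i)]

def has_motif_near(sequence, site, motifs, window):
    """True if any of the motifs starts within `window` bases of the site."""
    return any(abs(pos - site) <= window
               for motif in motifs
               for pos in motif_positions(sequence, motif))

def classify_sites(sequence, candidate_sites, silencers, enhancers, window=10):
    """Label each candidate splice site by the cis elements found nearby."""
    labels = {}
    for site in candidate_sites:
        if has_motif_near(sequence, site, silencers, window):
            labels[site] = "silenced"    # assumption: silencers take priority
        elif has_motif_near(sequence, site, enhancers, window):
            labels[site] = "enhanced"
        else:
            labels[site] = "default"
    return labels

# Made-up sequence with one placeholder silencer and one placeholder enhancer.
PLACEHOLDER_SILENCERS = ["TAGGGA"]
PLACEHOLDER_ENHANCERS = ["GAAGAA"]
sequence = "A" * 20 + "TAGGGA" + "C" * 40 + "GAAGAA" + "T" * 20
labels = classify_sites(sequence, [25, 45, 70],
                        PLACEHOLDER_SILENCERS, PLACEHOLDER_ENHANCERS)
print(labels)
```

A real predictor would weigh many such elements against each other rather than applying a single yes/no rule.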
Important Note:
For more information on splicing, click the Splicing link.
