
Tertiary Structure Prediction Methods

Any given protein sequence
→ Compare the sequence with proteins of solved structure
→ Sequence identity > 35%: Homology Modeling
   Sequence identity < 35%: Fold Recognition or ab initio Folding
→ Structure selection
→ Structure refinement
→ Final Structure
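The branching step of the flowchart can be sketched as a simple threshold test. This is a minimal illustration, assuming the identity to the closest solved structure has already been computed; the function name `suggest_method` is hypothetical, and treating exactly 35% as "below" is an arbitrary choice, since the flowchart does not say.

```python
def suggest_method(identity_percent: float) -> str:
    """Choose a prediction route from the % sequence identity to the closest solved structure.

    Follows the 35% cut-off in the flowchart; exactly 35% is treated as "below"
    here, which is an arbitrary choice.
    """
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_percent > 35.0:
        return "homology modeling"
    return "fold recognition or ab initio folding"


print(suggest_method(48.0))  # homology modeling
```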
Why Homology Modelling?
X-ray Diffraction
– Only a small number of proteins can be made to form crystals.
– A crystal is not the protein’s native environment.
– Very time-consuming.

NMR Distance Measurement
– This method generally looks at isolated proteins rather than protein complexes.
– Very time-consuming.
Homology Modeling:
Principles, tools and techniques
• Advances in molecular biology allow rapid identification, isolation and sequencing of genes.
• Problem: obtaining the 3D structure of a protein experimentally is time-consuming.
• An alternative strategy in structural biology is to develop models of a protein when constraints from X-ray diffraction or NMR are not yet available.
• Homology modeling is a method that can be applied to generate reasonable models of protein structure.
Database approach to homology modelling

• As of June 2000, 12,500 protein structures had been deposited in the Protein Data Bank (PDB), and the SwissProt protein sequence database contained 86,500 sequence entries.
• This is roughly a 1:7 ratio; relatively few structures are known.
• The number of sequences will increase much faster than the number of structures because of advances in sequencing.
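The 1:7 figure is simple arithmetic on the June 2000 counts quoted above; as a quick check:

```python
structures = 12_500  # PDB entries, June 2000 (figure quoted in the text)
sequences = 86_500   # SwissProt entries, June 2000 (figure quoted in the text)

ratio = sequences / structures
print(f"about 1 structure per {ratio:.2f} sequences")  # about 1 structure per 6.92 sequences
```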
Sequence similarity methods

• These methods can be very accurate if there is > 50% sequence similarity.
• They are rarely accurate if the sequence similarity is < 30%.
• They use the same kinds of methods as sequence alignment, such as the dynamic programming algorithm, hidden Markov models, and clustering algorithms.
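As an illustration of the dynamic programming approach, the sketch below is a minimal Needleman-Wunsch global alignment. The scores (match +1, mismatch -1, linear gap -2) are toy assumptions; practical tools use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def pair(i: int, j: int) -> int:
        # Score for aligning a[i-1] with b[j-1] (1-based DP indices).
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```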
What is Homology Modeling?
• Predicts the three-dimensional structure of a given protein sequence
(TARGET) based on an alignment to one or more known protein structures
(TEMPLATES)
• If similarity between the TARGET sequence and the TEMPLATE sequence
is detected, structural similarity can be assumed.
• In general, 30% sequence identity is required for generating useful models.
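Percent identity can be defined in several ways. The sketch below assumes one common choice: identical residues divided by the number of aligned columns where neither sequence has a gap. Other tools divide by the length of the shorter sequence or of the whole alignment, so their numbers are not always comparable. The function name is hypothetical.

```python
def percent_identity(target_aln: str, template_aln: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(t, s) for t, s in zip(target_aln, template_aln) if t != "-" and s != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for t, s in pairs if t.upper() == s.upper())
    return 100.0 * identical / len(pairs)


print(percent_identity("AC-EF", "ACDKF"))  # 75.0
```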
Structural Prediction by Homology Modeling
Structural Databases
– Search tools: SeqFold, Profiles-3D, PSI-BLAST, BLAST & FASTA, fold-recognition methods (FUGUE)
Reference Proteins → Cα Matrix Matching → Conserved Regions
Protein Sequence + Conserved Regions → Sequence Alignment → Coordinate Assignment → Predicted Conserved Regions
→ Loop Searching/Generation (MODELLER) → Initial Model
→ Sidechain Rotamers and/or MM/MD → Refined Model
Structure Analysis: WHAT IF, PROCHECK, PROSA II, ...
How good can homology modeling be?
• 60-100% sequence identity: comparable to a medium-resolution NMR structure; useful for studying substrate specificity
• 30-60% sequence identity: useful for molecular replacement in crystallography and for supporting site-directed mutagenesis through visualization
• <30% sequence identity: serious errors
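The identity bands above can be encoded as a small lookup table. A sketch, assuming each band includes its lower boundary (the list does not say which band exactly 30% or 60% belongs to); the names `BANDS` and `expected_model_quality` are hypothetical.

```python
from bisect import bisect_right

# (lower bound of the identity band in %, expected quality/use from the list above)
BANDS = [
    (0.0, "serious errors"),
    (30.0, "molecular replacement; site-directed mutagenesis"),
    (60.0, "comparable to medium-resolution NMR; substrate specificity"),
]
LOWER_BOUNDS = [low for low, _ in BANDS]


def expected_model_quality(identity_percent: float) -> str:
    """Look up the identity band that contains identity_percent."""
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    # bisect_right puts a value equal to a bound into the higher band (30 -> 30-60%).
    return BANDS[bisect_right(LOWER_BOUNDS, identity_percent) - 1][1]


print(expected_model_quality(45.0))  # molecular replacement; site-directed mutagenesis
```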


Significance of Protein Structure

What does a structure offer in the way of biological knowledge?
• Location of mutants and conserved residues
• Ligand and functional sites
• Clefts/cavities
• Evolutionary relationships
• Mechanisms
The importance of the sequence alignment
• The quality of the sequence alignment is of crucial importance.
• Misplaced gaps, representing insertions or deletions, will cause residues to be misplaced in space.
• Careful inspection and manual adjustment of the automatic alignment may improve the quality of the model.
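To see why a misplaced gap matters, the hypothetical helper below maps each target residue to the template residue whose coordinates it would inherit. Two alignments of the same made-up sequences, differing only in where a two-residue gap is placed, send some target residues to different template positions.

```python
def residue_mapping(target_aln: str, template_aln: str) -> dict[int, int]:
    """Map 0-based target residue indices to the template residues aligned with them."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    ti = si = 0  # running residue counters for target and template
    for t, s in zip(target_aln, template_aln):
        if t != "-" and s != "-":
            mapping[ti] = si  # target residue ti would copy coordinates of template residue si
        if t != "-":
            ti += 1
        if s != "-":
            si += 1
    return mapping


template = "ACDEFGHIK"
good = residue_mapping("ACD--GHIK", template)
shifted = residue_mapping("A--CDGHIK", template)
print([i for i in good if good[i] != shifted[i]])  # [1, 2]
```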
Programs for Model Protein Construction
• MODELLER 4.0
– guitar.rockefeller.edu/modeller/modeller.html
• SWISS-MODEL Server
– www.expasy.ch/swissmod/SWISS-MODEL.html
• SCWRL (SideChain placement With Rotamer Library)
– www.fccc.edu/research/labs/dunbrack/scwrl/
Protein Structural Databases
• Templates can be found by using the TARGET sequence as a query in a FASTA or BLAST search; useful resources include:
– PDB (http://www.rcsb.org/pdb)
– MODELLER (http://guitar.rockefeller.edu/modeller/modeller.html)
– ModBase (http://pipe.rockefeller.edu/modbase/general-info.html)
– 3DCrunch (http://www.expasy.ch/swissmod/SM_3DCrunch.html)
Gaining confidence in template searching
• Once a suitable template is found, it is a good idea to do a literature search (PubMed) on the relevant fold to determine what biological role(s) it plays.
• Does this match the biological/biochemical function that you expect?
Other factors to consider in selecting templates
• Template environment
– pH
– Ligands present?
• Resolution of the templates
• Family of proteins
– Phylogenetic tree construction can help find the
subfamily closest to the target sequence
• Multiple templates?
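Once candidate templates have been found, part of the selection can be automated. The sketch below ranks hypothetical hits by sequence identity and breaks ties by resolution; the `TemplateHit` fields and template names are assumptions, and the other factors listed above (environment, ligands, subfamily) still need manual judgement.

```python
from dataclasses import dataclass


@dataclass
class TemplateHit:
    name: str          # hypothetical template identifier
    identity: float    # % sequence identity to the target
    resolution: float  # X-ray resolution in angstroms (lower is better)


def rank_templates(hits: list[TemplateHit]) -> list[TemplateHit]:
    """Highest identity first; among equal identities, prefer the better-resolved structure."""
    return sorted(hits, key=lambda h: (-h.identity, h.resolution))


hits = [
    TemplateHit("templA", 45.0, 2.5),
    TemplateHit("templB", 45.0, 1.8),
    TemplateHit("templC", 38.0, 1.2),
]
print([h.name for h in rank_templates(hits)])  # ['templB', 'templA', 'templC']
```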
Target-Template Alignment

• No current comparative modeling method can recover from an incorrect alignment.
• Use multiple sequence alignments as an initial guide.
• Consider slightly different alternative alignments in areas of uncertainty, and build multiple models.
• Sequence-structure alignment programs
– Try to place gaps in variable regions/loops
• Note: the sequence in a sequence database and the sequence in the actual PDB entry are not always identical.
Target-Multiple Template Alignment
• Alignment is prepared by superimposing all template
structures
• Add target sequence to this alignment
• Compare with multiple sequence alignment and adjust
Adjusting the alignment
• Use tools such as Joy (www-cryst.bioc.cam.ac.uk/~joy/) to view the secondary structure along the alignment, and use this information as a criterion for adjustments
• Avoid gaps in secondary structure elements

0 * 240 * 260 * 280 *


1ad3 : LKPSEVSGHMADLLATLIPQY-M---DQNLYLVVKGGVPETTELLK--ERFDHIMYTGSTAVGKIVMAAAAK- : 200
1cw3 : MKVAEQTPLTALYVANLIKEAGF---PPGVVNIVPGFGPTAGAAIASHEDVDKVAFTGSTEIGRVIQVAAGSS : 254
1ad3_4 : LKPSEVSGHMADLLATLIPQY-M---DQNLYLVVKGGVPETTELL--KERFDHIMYTGSTAVGKIV-MAAAAK : 200
1cw3_4 : MKVAEQTPLTALYVANLIKEAGF---PPGVVNIVPGFGPTAGAAIASHEDVDKVAFTGSTEIGRVIQVAAGSS : 254
1ad3_5 : LKPSEVSGHMADLLATLIPQY-M---DQNLYLVVKGGVPETTELLKER--FDHIMYTGSTAVGKIV-MAAAAK : 200
1cw3_5 : MKVAEQTPLTALYVANLIKEAGF---PPGVVNIVPGFGPTAGAAIASHEDVDKVAFTGSTEIGRVIQVAAGSS : 254
1ad3_6 : LKPSEVSGHMADLLATLIPQY-M---DQNLYLVVKGGVPETTELLKER--FDHIMYTGSTAVGKIV-MAAAAK : 200
1cw3_6 : MKVAEQTPLTALYVANLIKEAGF---PPGVVNIVPGFGPTAGAAIASHEDVDKVAFTGSTEIGRVIQVAAGSS : 254
1ad3_ce : LKPSEVSGHMADLLATLIPQYM----DQNLYLVVKGGV-PETTELLKE-RFDHIMYTGSTAVGKIVMAAAA-K : 200
1cw3_ce : MKVAEQT---PLTALYVANLIKEAGFPPGVVNIVPGFGPTAGAAIASHEDVDKVAFTGSTEIGRVIQVAAGSS : 254
6K E 3 a a 6i 6 6V G p 6 D 6 5TGST 6G466 AA
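The rule "avoid gaps in secondary structure elements" can be checked automatically once the template's secondary structure is known. A minimal sketch, assuming a three-state (H/E/C) assignment for each template residue, for example reduced from DSSP output; the function name and the toy sequences are hypothetical.

```python
HELIX_OR_STRAND = ("H", "E")


def gaps_in_sse(target_aln: str, template_aln: str, template_ss: str) -> list[int]:
    """Return alignment columns (0-based) where a gap falls inside a template helix or strand.

    template_ss gives one three-state letter (H, E or C) per template residue, without gaps.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    if len(template_ss) != len(template_aln.replace("-", "")):
        raise ValueError("need one secondary-structure letter per template residue")

    # Secondary structure of the template residue in each column (None where the template has a gap).
    col_ss, k = [], 0
    for ch in template_aln:
        if ch == "-":
            col_ss.append(None)
        else:
            col_ss.append(template_ss[k])
            k += 1

    flagged = []
    for col, (t, s) in enumerate(zip(target_aln, template_aln)):
        if t == "-" and col_ss[col] in HELIX_OR_STRAND:
            # Deletion in the target that removes part of a template helix/strand.
            flagged.append(col)
        elif s == "-":
            # Insertion in the target: flag it when the nearest template residues on
            # both sides are both helix or both strand.
            left = next((col_ss[i] for i in range(col - 1, -1, -1) if col_ss[i]), None)
            right = next((col_ss[i] for i in range(col + 1, len(col_ss)) if col_ss[i]), None)
            if left == right and left in HELIX_OR_STRAND:
                flagged.append(col)
    return flagged


print(gaps_in_sse("ACD--GHIK", "ACDEFGHIK", "CCHHHHHCC"))  # [3, 4]
```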
Secondary Structure Prediction
• The PredictProtein server
– http://www.embl-heidelberg.de/predictprotein/
• Secondary structure prediction algorithms can help decide whether helices should be shortened or extended in areas of poor sequence identity.
• PHD program
Constructing Multi-domain protein models

• Building a multi-domain protein using templates corresponding to the individual domains:
  proteinA  aaaaaaaaaaaaa---------------
  proteinB  -------------bbbbbbbbbbbbbbb
  Target    aaaaaaaaaaaaabbbbbbbbbbbbbbb
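In a target-anchored alignment like the one above, each target position can be assigned to the domain template that covers it. A sketch with a hypothetical helper `domain_segments`, assuming every template row has the same length as the (ungapped) target sequence.

```python
from itertools import groupby


def domain_segments(target_len: int, template_rows: dict[str, str]):
    """Split target positions into runs covered by the same template(s).

    template_rows maps a template name to its row in a target-anchored alignment:
    same length as the target, '-' where that template has no residue.
    Returns a list of (covering_templates, first_position, last_position), 0-based.
    """
    for name, row in template_rows.items():
        if len(row) != target_len:
            raise ValueError(f"row for {name} does not match the target length")
    covering = [
        tuple(name for name, row in template_rows.items() if row[i] != "-")
        for i in range(target_len)
    ]
    segments, start = [], 0
    for names, run in groupby(covering):
        length = len(list(run))
        segments.append((names, start, start + length - 1))
        start += length
    return segments


rows = {"proteinA": "a" * 13 + "-" * 15, "proteinB": "-" * 13 + "b" * 15}
print(domain_segments(28, rows))
# [(('proteinA',), 0, 12), (('proteinB',), 13, 27)]
```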
Multiple model approach
• Reminder: consider the effects of different substitution matrices, different gap penalties, and different algorithms (Vogt et al., J. Mol. Biol., 1995, 249:816-831).
• Construct multiple models.
• Use structural analysis programs to determine the best model.

Jaroszewski, Pawlowski and Godzik, J. Mol. Model., 1998, 4:294-309.
Venclovas, Ginalski and Fidelis, Proteins, 1999, Suppl. 3:73-80.
Model Building
• Rigid-Body Assembly
– Assembles a model from a small number of rigid bodies obtained from aligned protein structures
– Implemented in COMPOSER
• Segment Matching
• Satisfaction of Spatial Restraints
– MODELLER
– guitar.rockefeller.edu/modeller/modeller.html
Modeller
• The main input is a set of restraints on the spatial structure of the amino acid sequence and any ligands to be modeled.
• The output is a 3D structure that satisfies these restraints as well as possible.
• Restraints can be obtained automatically from related protein structures (homology modeling), as well as from NMR data, secondary structure packing, and other experimental data.
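The idea behind satisfaction of spatial restraints can be illustrated with a toy problem: move a few points so that their pairwise distances match target values, by minimizing a sum of squared violations with plain gradient descent. This is only a sketch of the principle under simplified assumptions (distance restraints only, fixed step size, hypothetical function names); MODELLER's own restraints and optimizer are far more elaborate.

```python
import math


def restraint_energy(coords, restraints):
    """Sum of squared violations of (i, j, target_distance) restraints."""
    return sum((math.dist(coords[i], coords[j]) - d0) ** 2 for i, j, d0 in restraints)


def satisfy_distance_restraints(coords, restraints, step=0.05, iterations=2000):
    """Minimize restraint_energy by plain gradient descent; returns new coordinates."""
    coords = [list(p) for p in coords]
    for _ in range(iterations):
        grad = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, d0 in restraints:
            diff = [a - b for a, b in zip(coords[i], coords[j])]
            d = math.dist(coords[i], coords[j]) or 1e-9  # guard against division by zero
            # d/dx_i of (d - d0)^2 is 2 (d - d0) (x_i - x_j) / d; x_j gets the opposite sign.
            factor = 2.0 * (d - d0) / d
            for k in range(3):
                grad[i][k] += factor * diff[k]
                grad[j][k] -= factor * diff[k]
        for point, g in zip(coords, grad):
            for k in range(3):
                point[k] -= step * g[k]
    return coords


# Three pseudo-atoms that should form a triangle with sides 1.0, 1.0 and 1.5.
start = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
restraints = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.5)]
model = satisfy_distance_restraints(start, restraints)
print(round(restraint_energy(model, restraints), 6))
```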
What are the restraints?
Distances, angles, dihedral angles, pairs of dihedral angles, and some other spatial features defined by atoms or pseudo-atoms.
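Distances and dihedral angles are both computed directly from atomic coordinates. The sketch below computes a dihedral angle from four points and the wrapped deviation from a target value, which is the quantity a dihedral restraint would penalize; the helper names are hypothetical.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (range -180 to 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, [_dot(b0, b1) * c for c in b1])
    w = _sub(b2, [_dot(b2, b1) * c for c in b1])
    cos_part = _dot(v, w)
    sin_part = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(sin_part, cos_part))


def dihedral_deviation(angle, target):
    """Signed difference angle - target, wrapped into [-180, 180)."""
    return (angle - target + 180.0) % 360.0 - 180.0


print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # 90.0
```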
Sidechain Conformation
• Protein sidechains play a key role in molecular
recognition and packing of hydrophobic cores of
globular proteins
• Protein sidechain conformations tend to exist in a
limited number of canonical shapes, usually called
rotamers
• Rotamer libraries can be constructed where only
3-50 conformations are taken into account for
each side chain
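For chi1, the canonical staggered positions lie near +60°, 180° and -60°. The sketch below snaps an observed chi1 angle to the nearest of these, using one common p/t/m naming (plus, trans, minus); it is a hypothetical illustration, not an actual rotamer library.

```python
CHI1_ROTAMERS = {"p": 60.0, "t": 180.0, "m": -60.0}  # plus, trans, minus


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def chi1_rotamer(chi1: float) -> str:
    """Label a chi1 angle (degrees) with the nearest canonical staggered rotamer."""
    return min(CHI1_ROTAMERS, key=lambda name: angular_distance(chi1, CHI1_ROTAMERS[name]))


print(chi1_rotamer(-175.0))  # t  (-175° is only 5° from 180°)
```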
Sidechains on surface of protein

• Exposed sidechains on the surface can be highly flexible, without a single dominant conformation.
• So if these solvent-exposed sidechains do not form binding interactions with other molecules or take part in, say, a catalytic reaction, their accuracy may not be crucial; also look at the B-factors.
• Sidechains can be refined with molecular mechanics minimization
– Sampling?
– Scoring?
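The "sampling" and "scoring" questions can be made concrete with a toy exhaustive search: enumerate every combination of rotamers (sampling) and sum self and pairwise energies (scoring). The energies below are made-up numbers, and the function name is hypothetical; exhaustive enumeration grows exponentially with the number of residues, so real side-chain placement programs rely on much smarter search.

```python
from itertools import product


def best_rotamer_combination(self_energy, pair_energy):
    """Exhaustive rotamer search.

    self_energy[r][k]    energy of residue r in rotamer k (e.g. against the fixed backbone)
    pair_energy[(r, s)]  dict mapping (rotamer of r, rotamer of s) to an interaction
                         energy; missing entries count as 0
    """
    n = len(self_energy)
    best_combo, best_score = None, float("inf")
    # Sampling: every combination of one rotamer per residue.
    for combo in product(*(range(len(options)) for options in self_energy)):
        # Scoring: self terms plus pairwise terms.
        score = sum(self_energy[r][combo[r]] for r in range(n))
        for (r, s), table in pair_energy.items():
            score += table.get((combo[r], combo[s]), 0.0)
        if score < best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score


# Two hypothetical residues with 3 and 2 rotamers; rotamers (0, 1) clash.
self_energy = [[0.0, 1.0, 2.0], [0.5, 0.0]]
pair_energy = {(0, 1): {(0, 1): 5.0}}
print(best_rotamer_combination(self_energy, pair_energy))  # ((0, 0), 0.5)
```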
Errors in Homology Modeling
a) Side chain packing
b) Distortions and shifts
c) Regions without a template
d) Misalignments
e) Incorrect template

Marti-Renom et al., Annu. Rev. Biophys. Biomol. Struct., 2000, 29:291-325.
Detection of Errors
• The first check should be a stereochemical check of the modeled structure (PROCHECK, WHATCHECK, DISTAN), which will show deviations from normal bond lengths, dihedral angles, etc.
• Visualization: follow the backbone trace first, then move out to the Cα-Cβ orientation.
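Programs such as PROCHECK check many stereochemical features. As a much cruder sketch of the same idea, the hypothetical function below checks only one: consecutive Cα atoms joined by a trans peptide bond sit about 3.8 Å apart. The 0.3 Å tolerance is an assumption, and a cis peptide (about 2.9 Å) would also be flagged.

```python
import math


def check_ca_spacing(ca_coords, expected=3.8, tolerance=0.3):
    """Return (i, i+1, distance) for consecutive Cα pairs outside expected ± tolerance (Å)."""
    flagged = []
    for i in range(len(ca_coords) - 1):
        d = math.dist(ca_coords[i], ca_coords[i + 1])
        if abs(d - expected) > tolerance:
            flagged.append((i, i + 1, round(d, 2)))
    return flagged


# Hypothetical Cα trace with one stretched virtual bond between residues 2 and 3.
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (12.9, 0.0, 0.0)]
print(check_ca_spacing(trace))  # [(2, 3, 5.3)]
```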
PROCHECK

http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
