
Structural and Functional Bioinformatics

Lecture 4
• Steps in Homology Modeling

• Step 1: Template Selection


• Step 2: Sequence Alignment
• Step 3: Model Building
Step 3: Model Building

Break protein folds into conserved core, loops, and side chains:
• Superimpose the template structures and generate the backbone
• Generate canonical loops (data based)
• Generate side chains based on known preferences
• Build the remaining loops ab initio (energy based)
• Optimize the overall model (energy minimization)
Expectations of comparative modeling

• Easy (100% to 40% sequence identity): strong sequence similarity, strong structure similarity, obvious function analogy
• Difficult (40% to 25% sequence identity): twilight zone of sequence similarity, increasing structure divergence, function diversification
• Fold prediction (below 25% sequence identity): no apparent sequence similarity, extreme function divergence
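The identity bands above can be turned into a small helper. The following is a minimal sketch, assuming that identity is computed over aligned columns where neither sequence has a gap (other denominators are also in use) and that the function names are illustrative only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def modeling_regime(identity: float) -> str:
    """Map percent identity to the three bands listed on the slide."""
    if identity >= 40.0:
        return "easy"
    if identity >= 25.0:
        return "twilight zone"
    return "fold prediction"


if __name__ == "__main__":
    ident = percent_identity("ACDE-F", "ACDQWF")
    print(ident, modeling_regime(ident))
```

The boundaries at 40% and 25% are taken from the slide; they are rules of thumb rather than sharp limits.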
Modeling Servers
SwissModel http://swissmodel.expasy.org/SWISS-MODEL.html
Modeller http://salilab.org
Geno3D http://geno3d-pbil.ibcp.fr
ESyPred http://www.fundp.ac.be/sciences/biologie/urbm/bioinfo/esypred/
3D-jigsaw http://www.bmm.icnet.uk/servers/3djigsaw/
CPHmodels http://www.cbs.dtu.dk/services/CPHmodels/

Core

• satisfaction of spatial restraints


- Threading

• satisfaction of spatial restraints


- distance geometry

Emphasis on alignments; the model is then built by Modeller.



• All have similar accuracy.
• Most important are a good alignment and good template selection!
• Modeling should allow flexibility and automation.
• Multiple templates increase your odds of getting a good model.
PROTEIN THREADING

• Protein threading, also known as fold recognition, is a method of protein modeling used for proteins that have the same fold as proteins of known structure but have no homologous proteins of known structure.
PRINCIPLE OF THREADING
• The prediction is made by "threading" each amino acid in the
target sequence to a position in the template structure, and
evaluating how well the target fits the template.

• After the best-fit template is selected, the structural model of the sequence is built based on the alignment with the chosen template.

• Protein threading is based on two basic observations: that the number of different folds in nature is fairly small (approximately 1300); and that 90% of the new structures submitted to the PDB in the past three years have similar structural folds to ones already in the PDB.
Classification of protein structure

• The Structural Classification of Proteins (SCOP) database provides a detailed and comprehensive description of the structural and evolutionary relationships between proteins of known structure.

• Proteins are classified to reflect both structural and evolutionary relatedness. Many levels exist in the hierarchy, but the principal levels are family, superfamily and fold, as described below.
1. Family (clear evolutionary relationship)

• Proteins clustered together into families are clearly evolutionarily related. Generally, this means that pairwise residue identities between the proteins are 30% or greater.
• However, in some cases similar functions and structures provide definitive evidence of common descent in the absence of high sequence identity; for example, many globins form a family though some members have sequence identities of only 15%.
2. Superfamily (probable common evolutionary
origin)

• Proteins that have low sequence identities, but whose structural and functional features suggest that a common evolutionary origin is probable, are placed together in superfamilies.
• For example, actin, the ATPase domain of the heat shock protein, and hexokinase together form a superfamily.
3. Fold (major structural similarity)

• Proteins are defined as having a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections.

• Proteins placed together in the same fold category may not have a common evolutionary origin: the structural similarities could arise just from the physics and chemistry of proteins favoring certain packing arrangements and chain topologies.
Method of Protein Threading

• A general paradigm of protein threading consists of the following four steps.
The construction of a structure template database
• Select protein structures from databases such as PDB, FSSP, SCOP, or CATH as structural templates, after removing protein structures with high sequence similarity to one another.
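The redundancy-removal step can be sketched as a greedy filter. This is a minimal illustration, not the procedure of any particular threading program: the 30% cutoff, the template names, and the toy `naive_identity` function (ungapped, position-by-position comparison) are all assumptions; a real pipeline would use a proper pairwise alignment.

```python
from typing import Callable, Dict, List


def naive_identity(a: str, b: str) -> float:
    """Toy stand-in: percent of identical positions, ungapped, over the shorter sequence."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return 100.0 * sum(x == y for x, y in zip(a, b)) / n


def select_nonredundant(
    sequences: Dict[str, str],
    max_identity: float = 30.0,  # assumed cutoff, not taken from the slides
    identity_fn: Callable[[str, str], float] = naive_identity,
) -> List[str]:
    """Keep a template only if it is dissimilar to every template already kept."""
    kept: List[str] = []
    for name, seq in sequences.items():  # dicts preserve insertion order
        if all(identity_fn(seq, sequences[k]) <= max_identity for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    library = {
        "tmplA": "ACDEFGHIKL",
        "tmplB": "ACDEFGHIKV",  # near-duplicate of tmplA
        "tmplC": "WWWWWYYYYY",
    }
    print(select_nonredundant(library))
```

The result depends on input order, because the first member of each similar group is the one retained.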
The design of the scoring function
• Design a good scoring function to measure the fitness between target sequences and templates based on the knowledge of the known relationships between the structures and the sequences.

• The quality of the energy function is closely related to the prediction accuracy, especially the alignment accuracy.
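One common way to derive such a knowledge-based score is a log-odds table built from observed residue/environment counts in known structures. The sketch below assumes a two-state environment label ("buried" or "exposed"), a simple pseudocount, and the convention that higher scores are more favorable (an energy would be the negative); none of these details come from the slides.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (residue type, structural environment)


def environment_log_odds(observations: List[Pair], pseudocount: float = 1.0) -> Dict[Pair, float]:
    """Return ln( P(residue, env) / (P(residue) * P(env)) ) for every combination."""
    residues = sorted({r for r, _ in observations})
    environments = sorted({e for _, e in observations})
    counts = Counter(observations)
    total = len(observations) + pseudocount * len(residues) * len(environments)

    # Smoothed joint probabilities, then marginals derived from the same table.
    joint = {(r, e): (counts[(r, e)] + pseudocount) / total
             for r in residues for e in environments}
    p_res = {r: sum(joint[(r, e)] for e in environments) for r in residues}
    p_env = {e: sum(joint[(r, e)] for r in residues) for e in environments}

    return {(r, e): math.log(joint[(r, e)] / (p_res[r] * p_env[e]))
            for r in residues for e in environments}


if __name__ == "__main__":
    # Made-up counts for illustration: leucine mostly buried, lysine mostly exposed.
    data = ([("L", "buried")] * 8 + [("L", "exposed")] * 2
            + [("K", "buried")] * 2 + [("K", "exposed")] * 8)
    for key, value in environment_log_odds(data).items():
        print(key, round(value, 3))
```

Pairwise contact potentials can be derived in the same spirit from counts of residue pairs in contact, at the cost of making the alignment step harder.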
Threading alignment

• Align the target sequence with each of the structure templates by optimizing the designed scoring function.

• When the scoring function takes the pairwise contact potential into account, this step is one of the major tasks of all threading-based structure prediction programs; otherwise, a dynamic programming algorithm can fulfill it.
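For the case without pairwise terms, the alignment reduces to a standard global dynamic programming problem. The sketch below is a toy instance under stated assumptions: each template position carries only a "buried" or "exposed" label, the position score uses the Kyte-Doolittle hydropathy scale as a stand-in for a real threading potential, and the linear gap penalty of -2.0 is arbitrary.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values, used here only as a toy position score.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def position_score(residue: str, environment: str) -> float:
    """Reward hydrophobic residues in buried positions and polar ones in exposed positions."""
    h = HYDROPATHY.get(residue, 0.0)
    return h if environment == "buried" else -h


def thread(sequence: str, environments: List[str], gap: float = -2.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Globally align a sequence to template positions; return score and aligned index pairs."""
    n, m = len(sequence), len(environments)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + position_score(sequence[i - 1], environments[j - 1]),
                dp[i - 1][j] + gap,   # residue i left unaligned
                dp[i][j - 1] + gap,   # template position j skipped
            )
    # Trace back from the corner to recover which residue sits at which position.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + position_score(sequence[i - 1], environments[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


if __name__ == "__main__":
    print(thread("LKL", ["buried", "exposed", "buried"]))
```

Once pairwise contact terms are added, the score of one position depends on which residues occupy other positions, so this simple recurrence no longer applies.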
Threading prediction
• Select the threading alignment that is statistically most
probable as the threading prediction.

• Then construct a structure model for the target by placing the backbone atoms of the target sequence at their aligned backbone positions of the selected structural template.
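Choosing the "statistically most probable" alignment requires putting raw scores on a common scale. A minimal sketch follows, assuming a z-score of each template's raw score against the whole template library; actual programs may estimate significance differently (for example against shuffled sequences), and the template names here are hypothetical.

```python
import statistics
from typing import Dict, Tuple


def best_template_by_zscore(raw_scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the template whose raw score stands out most from the library, with its z-score."""
    values = list(raw_scores.values())
    if len(values) < 2:
        raise ValueError("need at least two templates to compute a z-score")
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("all templates score identically")
    z = {name: (score - mean) / sd for name, score in raw_scores.items()}
    best = max(z, key=z.get)
    return best, z[best]


if __name__ == "__main__":
    print(best_template_by_zscore({"tmplA": 10.0, "tmplB": 2.0, "tmplC": 0.0}))
```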
Comparison with homology modeling
• Homology modeling and protein threading are both template-based methods and there is no rigorous boundary between them in terms of prediction techniques. The difference lies in the targets they address.

• Homology modeling is for targets that have homologous proteins of known structure (usually in the same family), while protein threading is for targets for which only fold-level homology can be found. In other words, homology modeling is for "easier" targets and protein threading is for "harder" targets.

• Homology modeling treats the template in an alignment as a sequence, and only sequence homology is used for prediction. Protein threading treats the template in an alignment as a structure, and both sequence and structure information extracted from the alignment are used for prediction.

• When there is no significant homology found, protein threading can make a prediction based on the structure information. That also explains why protein threading may be more effective than homology modeling in many cases.

• In practice, when the sequence identity in a sequence-sequence alignment is low (i.e., <25%), homology modeling may not produce a reliable prediction.

• In this case, if a distant homology is found for the target, protein threading can generate a good prediction.

• Threading differs from the homology modeling method of structure prediction in that it is used for proteins that do not have homologous protein structures deposited in the Protein Data Bank (PDB), whereas homology modeling is used for those proteins that do.

• Threading works by using statistical knowledge of the relationship between the structures deposited in the PDB and the sequence of the protein which one wishes to model.
Evaluating the Model Errors
• Errors in side chain packing
• Template distortions because of crystal packing forces
• Loop generation
• Misalignments
• Incorrect templates

The quality of a model can be assessed using different tools and servers such as the Ramachandran plot, PROCHECK, VERIFY 3D, and ERRAT.
Evaluation Servers

COLORADO3D http://genesilico.pl/
PROCHECK http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
VERIFY3D http://fold.doe-mbi.ucla.edu/
PROSAII http://www.came.sbg.ac.at/
WHATCHECK http://swift.cmbi.kun.nl/WIWWWI/modcheck.html
ERRAT http://servicesn.mbi.ucla.edu/ERRAT/
Ramachandran Plot
• The Ramachandran plot is a protein structure validation tool for checking the detailed residue-by-residue stereochemical quality of a protein structure, based on the backbone dihedral angles phi and psi.
• A good homology model should have >90% of its residues in the favored regions. The Ramachandran plot for each protein model was constructed using the PROCHECK web server.
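The angles plotted in a Ramachandran plot are ordinary four-atom dihedrals. The following is a minimal sketch of how phi and psi could be computed from backbone atom coordinates, written in plain Python; the function names are illustrative, and classifying an angle pair into favored or disallowed regions (as PROCHECK does) is not attempted here.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    axis = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    d0 = _dot(b0, axis)
    d2 = _dot(b2, axis)
    v = (b0[0] - d0 * axis[0], b0[1] - d0 * axis[1], b0[2] - d0 * axis[2])
    w = (b2[0] - d2 * axis[0], b2[1] - d2 * axis[1], b2[2] - d2 * axis[2])
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev: Vec, n: Vec, ca: Vec, c: Vec, n_next: Vec) -> Tuple[float, float]:
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for residue i."""
    return dihedral_degrees(c_prev, n, ca, c), dihedral_degrees(n, ca, c, n_next)


if __name__ == "__main__":
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```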
PROCHECK
• PROCHECK (Laskowski et al., 1993) is used to estimate
the stereo-chemical quality of a model.

• Overall, the PROCHECK program checks covalent geometry, planarity, dihedral angles, chirality, non-bonded interactions, main-chain hydrogen bonds, disulphide bonds, and other stereochemical parameters, and provides a residue-by-residue analysis.
VERIFY 3D
• VERIFY 3D (Eisenberg et al., 1997) uses energetic and
empirical methods to produce averaged data points for each
residue to evaluate the quality of protein structures.

• Using this scoring function, if more than 80% of the residues have a score of >0.2, the protein structure is considered to be of high quality.
ERRAT
• ERRAT (Colovos and Yeates, 1993) reports a so-called “overall quality factor” for non-bonded atomic interactions, and higher scores mean higher quality.

• The normally accepted range is >50 for a high quality model.
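The acceptance thresholds quoted on these slides (>90% Ramachandran favored, >80% of residues with VERIFY 3D score >0.2, ERRAT >50) can be collected into one check. This is a minimal sketch assuming the three values have already been obtained from the respective servers; the function and key names are illustrative, and no server output is parsed here.

```python
from typing import Dict, Sequence


def quality_checks(
    rama_favored_pct: float,
    verify3d_scores: Sequence[float],
    errat_quality_factor: float,
) -> Dict[str, bool]:
    """Apply the slide's rule-of-thumb thresholds to one model."""
    if verify3d_scores:
        pct_good = 100.0 * sum(s > 0.2 for s in verify3d_scores) / len(verify3d_scores)
    else:
        pct_good = 0.0
    return {
        "ramachandran": rama_favored_pct > 90.0,
        "verify3d": pct_good > 80.0,
        "errat": errat_quality_factor > 50.0,
    }


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    print(quality_checks(92.5, [0.3] * 9 + [0.1], 75.0))
```

A model that fails one check is not necessarily wrong; the per-residue output of each tool usually shows where the problem region lies.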
