Lecture 4

Structural and Functional Bioinformatics
• Steps in Homology Modeling
• Step 1: Template Selection
• Step 2: Sequence Alignment
• Step 3: Model Building
Step 3: Model Building
Break protein folds into conserved core, loops, and side chains
• Overlap template structures and generate backbone
• Generation of canonical loops (data based)
• Side chain generation based on known preferences
• Ab initio loop building (energy based)
• Overall model optimization (energy minimization)
Expectations of comparative modeling
• Easy (100-40% sequence identity): strong sequence similarity, strong structure similarity, obvious function analogy
• Difficult (40-25% sequence identity): twilight-zone sequence similarity, increasing structure divergence, function diversification
• Fold prediction (below 25% sequence identity): no apparent sequence similarity, extreme function divergence
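The three ranges above can be turned into a small lookup. The sketch below is illustrative only: the function names are hypothetical, identity is computed over aligned columns where neither sequence has a gap (one of several possible definitions), and assigning exactly 40% to "easy" and exactly 25% to "difficult" is an assumption, since the slide does not say which side the boundaries fall on.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def modeling_zone(identity):
    """Map a percent identity to the difficulty classes listed on the slide."""
    if identity >= 40.0:   # assumption: the 40% boundary counts as "easy"
        return "easy"
    if identity >= 25.0:   # assumption: the 25% boundary counts as "difficult"
        return "difficult (twilight zone)"
    return "fold prediction"


if __name__ == "__main__":
    pid = percent_identity("ACDE-G", "ACDQFG")
    print(pid, modeling_zone(pid))
```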
Modeling Servers
SwissModel http://swissmodel.expasy.org/SWISS-MODEL.html
Modeller http://salilab.org
Geno3D http://geno3d-pbil.ibcp.fr
ESyPred http://www.fundp.ac.be/sciences/biologie/urbm/bioinfo/esypred/
3D-jigsaw http://www.bmm.icnet.uk/servers/3djigsaw/
CPHmodels http://www.cbs.dtu.dk/services/CPHmodels/
• Satisfaction of spatial restraints
- Threading
- Distance geometry
• Emphasis on alignments, with the model then built by Modeller.
• All have similar accuracy.
• Most important is good alignment and good target selection!
• Modeling should allow flexibility & automation
• Multiple templates increase your odds of getting a good model
PROTEIN THREADING
• Protein threading, also known as fold recognition, is a protein modeling method used for proteins that have the same fold as proteins of known structure but have no homologous proteins of known structure.
PRINCIPLE OF THREADING
• The prediction is made by "threading" each amino acid in the target sequence to a position in the template structure, and evaluating how well the target fits the template.
• After the best-fit template is selected, the structural model of the sequence is built based on the alignment with the chosen template.
• Protein threading is based on two basic observations: that the number of different folds in nature is fairly small (approximately 1300); and that 90% of the new structures submitted to the PDB in the past three years have similar structural folds to ones already in the PDB.
Classification of protein structure
• The Structural Classification of Proteins (SCOP) database provides a detailed and comprehensive description of the structural and evolutionary relationships of proteins of known structure.
• Proteins are classified to reflect both structural and evolutionary relatedness. Many levels exist in the hierarchy, but the principal levels are family, superfamily and fold, as described below.
1. Family (clear evolutionary relationship)
• Proteins clustered together into families are clearly evolutionarily related. Generally, this means that pairwise residue identities between the proteins are 30% and greater.
• However, in some cases similar functions and structures provide definitive evidence of common descent in the absence of high sequence identity; for example, many globins form a family though some members have sequence identities of only 15%.
2. Superfamily (probable common evolutionary origin)
• Proteins that have low sequence identities, but whose structural and functional features suggest that a common evolutionary origin is probable, are placed together in superfamilies.
• For example, actin, the ATPase domain of the heat shock protein, and hexokinase together form a superfamily.
3. Fold (major structural similarity)
• Proteins are defined as having a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections.
• Proteins placed together in the same fold category may not have a common evolutionary origin: the structural similarities could arise just from the physics and chemistry of proteins favoring certain packing arrangements and chain topologies.
Method of Protein Threading
• A general paradigm of protein threading consists of the following four steps.
The construction of a structure template database
• Select protein structures from databases such as PDB, FSSP, SCOP, or CATH as structural templates, after removing protein structures with high sequence similarity to one another.
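Removing highly similar structures from the template set can be sketched as a greedy filter. This is a toy illustration, not the procedure of any particular threading program: the function names are hypothetical, the 30% cutoff is an assumed value, and identity is computed position by position without alignment, whereas a real pipeline would use proper pairwise alignments or a clustering tool.

```python
def ungapped_identity(seq_a, seq_b):
    """Toy identity: position-by-position matches over the shorter sequence, in percent."""
    n = min(len(seq_a), len(seq_b))
    if n == 0:
        return 0.0
    return 100.0 * sum(a == b for a, b in zip(seq_a, seq_b)) / n


def nonredundant_templates(templates, max_identity=30.0):
    """Greedy filter: keep a template only if it is dissimilar to every template kept so far.

    templates: dict mapping a template name to its sequence (insertion order decides priority).
    max_identity: assumed cutoff in percent; not a value given in the lecture.
    """
    kept = []
    for name, seq in templates.items():
        if all(ungapped_identity(seq, templates[k]) <= max_identity for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    library = {
        "t1": "ACDEFGHIKL",
        "t2": "ACDEFGHIKV",  # 90% identical to t1, so it is dropped
        "t3": "WWYYPPNNQQ",
    }
    print(nonredundant_templates(library))
```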
The design of the scoring function
• Design a good scoring function to measure the fit between target sequences and templates, based on the known relationships between structures and sequences.
• The quality of the scoring (energy) function is closely related to the prediction accuracy, especially the alignment accuracy.
Threading alignment
• Align the target sequence with each of the structure templates by optimizing the designed scoring function.
• This step is one of the major computational tasks for threading-based structure prediction programs whose scoring function takes the pairwise contact potential into account; when no pairwise terms are used, a dynamic programming algorithm can solve it.
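For the simpler case without pairwise contact terms, the alignment can be sketched with standard global dynamic programming. Everything specific here is an assumption made for illustration: the two environment classes ('B' buried, 'E' exposed), the toy fitness score, the linear gap penalty, and the function names are not taken from any real threading program.

```python
HYDROPHOBIC = set("AVILMFWC")


def fitness(residue, environment):
    """Toy singleton score: hydrophobic residues fit buried ('B') positions,
    all other residues fit exposed ('E') positions."""
    return 1.0 if (environment == "B") == (residue in HYDROPHOBIC) else -1.0


def thread(sequence, environments, gap=-2.0):
    """Globally align a target sequence to template positions by dynamic programming.

    Returns (score, pairs) where pairs lists (sequence_index, template_index).
    """
    n, m = len(sequence), len(environments)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + fitness(sequence[i - 1], environments[j - 1]),
                score[i - 1][j] + gap,   # target residue left unaligned
                score[i][j - 1] + gap,   # template position skipped
            )
    # Trace back from the corner to recover which residue sits at which position.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + fitness(sequence[i - 1], environments[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


if __name__ == "__main__":
    print(thread("VKLE", "BEBE"))
```

Once the score of one position depends on which residues occupy its spatial neighbours, this recurrence no longer applies directly, which is why the pairwise case is the harder one.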
Threading prediction
• Select the threading alignment that is statistically most probable as the threading prediction.
• Then construct a structure model for the target by placing the backbone atoms of the target sequence at their aligned backbone positions of the selected structural template.
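The model-construction step can be sketched as a coordinate transfer along the chosen alignment. This is a simplified illustration: it copies one coordinate per residue (for example the C-alpha atom) rather than the full backbone, and the function names and data layout are hypothetical.

```python
def place_backbone(target_length, aligned_pairs, template_coords):
    """Copy template coordinates onto aligned target residues.

    aligned_pairs: list of (target_index, template_index) from the threading alignment.
    template_coords: list of (x, y, z) tuples, one per template residue.
    Unaligned target residues are left as None; they would need loop building.
    """
    model = [None] * target_length
    for target_idx, template_idx in aligned_pairs:
        model[target_idx] = template_coords[template_idx]
    return model


def unmodeled_positions(model):
    """Indices of target residues that received no coordinates."""
    return [i for i, xyz in enumerate(model) if xyz is None]


if __name__ == "__main__":
    template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    model = place_backbone(4, [(0, 0), (2, 1), (3, 2)], template)
    print(model, unmodeled_positions(model))
```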
Comparison with homology modeling
• Homology modeling and protein threading are both template-based methods, and there is no rigorous boundary between them in terms of prediction techniques. They differ in the targets they are applied to.
• Homology modeling is for targets that have homologous proteins of known structure (usually from the same family), while protein threading is for targets for which only fold-level homology is found. In other words, homology modeling is for "easier" targets and protein threading is for "harder" targets.
• Homology modeling treats the template in an alignment as a sequence, and only sequence homology is used for prediction. Protein threading treats the template in an alignment as a structure, and both sequence and structure information extracted from the alignment are used for prediction.
• When there is no significant homology found, protein threading can make a prediction based on the structure information. That also explains why protein threading may be more effective than homology modeling in many cases.
• In practice, when the sequence identity in a sequence-sequence alignment is low (i.e. <25%), homology modeling may not produce a significant prediction.
• In this case, if distant homology is found for the target, protein threading can still generate a good prediction.
• Threading differs from homology modeling in that it is used for proteins that have no homologous protein structures deposited in the Protein Data Bank (PDB), whereas homology modeling is used for proteins that do.
• Threading works by using statistical knowledge of the relationship between the structures deposited in the PDB and the sequence of the protein to be modeled.
Evaluating the Model Errors
• Errors in side chain packing
• Template distortions because of crystal packing forces
• Loop generation
• Misalignments
• Incorrect templates
The quality of a model can be assessed using different tools and servers such as the Ramachandran plot, PROCHECK, VERIFY 3D, and ERRAT.
Evaluation Servers
COLORADO3D http://genesilico.pl/
PROCHECK http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
VERIFY3D http://fold.doe-mbi.ucla.edu/
PROSAII http://www.came.sbg.ac.at/
WHATCHECK http://swift.cmbi.kun.nl/WIWWWI/modcheck.html
ERRAT http://servicesn.mbi.ucla.edu/ERRAT/
Ramachandran Plot
• The Ramachandran plot is a protein structure validation tool for checking the detailed residue-by-residue stereochemical quality of a protein structure, based on the backbone dihedral angles phi and psi.
• A good homology model should have >90% of its residues in the favorable regions. A Ramachandran plot can be generated for each protein model using the PROCHECK web server.
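The quantities behind the plot can be sketched in a few lines: a dihedral angle from four atom positions, and the fraction of residues falling in a favorable region. The dihedral calculation is standard geometry. The rectangular "favorable" boxes below are a crude assumption made for illustration; they are not the empirical region definitions that PROCHECK uses, and the function names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees for four points.

    phi uses C(i-1), N(i), CA(i), C(i); psi uses N(i), CA(i), C(i), N(i+1).
    """
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def crude_favorable(phi, psi):
    """Assumed rectangular approximations of the helical and sheet regions."""
    alpha = -160 <= phi <= -20 and -120 <= psi <= 50
    beta = -180 <= phi <= -45 and (psi >= 90 or psi <= -150)
    return alpha or beta


def fraction_favorable(angles, is_favorable=crude_favorable):
    """Fraction of (phi, psi) pairs accepted by the region predicate."""
    return sum(1 for phi, psi in angles if is_favorable(phi, psi)) / len(angles)


if __name__ == "__main__":
    angles = [(-60, -45), (-120, 130), (60, 60), (-65, -40)]
    frac = fraction_favorable(angles)
    print(frac, frac > 0.90)  # compare against the >90% guideline from the slide
```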
PROCHECK
• PROCHECK (Laskowski et al., 1993) is used to estimate the stereochemical quality of a model.
• Overall, the PROCHECK program checks covalent geometry, planarity, dihedral angles, chirality, non-bonded interactions, main-chain hydrogen bonds, disulphide bonds, and other stereochemical parameters, and provides a residue-by-residue analysis.
VERIFY 3D
• VERIFY 3D (Eisenberg et al., 1997) uses energetic and empirical methods to produce an averaged score for each residue, which is used to evaluate the quality of protein structures.
• With this scoring function, a protein structure is considered to be of high quality if more than 80% of its residues have a score of >0.2.
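The 80% rule can be applied to a list of per-residue scores as follows. This is a minimal sketch: the function name and the example scores are made up, and the code only applies the threshold stated above to scores obtained elsewhere; it does not compute VERIFY 3D scores.

```python
def verify3d_summary(scores, cutoff=0.2, required_fraction=0.80):
    """Return (fraction of residues scoring above cutoff, whether the model passes).

    scores: per-residue averaged scores, assumed to come from a VERIFY 3D run.
    cutoff and required_fraction follow the >0.2 and >80% rule quoted on the slide.
    """
    if not scores:
        raise ValueError("no scores given")
    fraction = sum(1 for s in scores if s > cutoff) / len(scores)
    return fraction, fraction > required_fraction


if __name__ == "__main__":
    example = [0.5, 0.3, 0.25, 0.1, 0.6, 0.4, 0.35, 0.45, 0.22, 0.7]  # made-up values
    print(verify3d_summary(example))
```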
ERRAT
• ERRAT (Colovos and Yeates, 1993) reports a so-called “overall quality factor” for non-bonded atomic interactions; higher scores mean higher quality.
• The normally accepted range for a high quality model is >50.
