Lecture 3
For 3D structure prediction there are two basic
approaches:
(1) compare the protein with proteins of known
structure, or
(2) predict the structure from the sequence alone,
using physical laws and empirical knowledge.
Within approach (1), alignment methods such as BLAST are used (1.1),
and various threading methods are used (1.2).
03/08/20 09:08
Bioinformatics how to
…
use publicly available free tools to predict protein structure by
comparative modeling
Proteins are 3D objects with
complex shapes
No assumptions, this
is nature telling us
how it is
GNAAAAKKGSEQESVKEFLAKAKEDFLKKWENPA
QNTAHLDQFERIKTLGTGSFGRVMLVKHKETGNH
FAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPF
LVKLEYSFKDNSNLYMVMEYVPGGEMFSHLRRIG
RFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPE
NLLIDQQGYIQVTDFGFAKRVKGRTWTLCGTPEY
LAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPF
FADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNL
LQVDLTKRFGNLKDGVNDIKNHKWFATTDWIAIY
QRKVEAPFIPKFKGPGDTSNFDDYEEEEIRVSIN
EKCGKEFSEF
prediction
How can we make such
assumptions?
• Statistical reliability of the prediction
• E-value - the number of hits one can "expect" to see just
by chance when searching a database of a particular size
(closer to zero the better)
• Z-score – score expressed as a distance from the mean
calculated in standard deviations (the bigger the better)
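As a minimal sketch of the Z-score definition above, the snippet below expresses a score as its distance from the mean of a background distribution, measured in standard deviations. The function name `z_score` and the background scores are illustrative assumptions, not values from a real database search.

```python
from statistics import mean, pstdev

def z_score(score, background_scores):
    """Distance of `score` from the background mean, in standard deviations."""
    mu = mean(background_scores)
    sigma = pstdev(background_scores)  # population standard deviation
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    return (score - mu) / sigma

# Hypothetical scores from aligning the query to unrelated sequences.
background = [20, 22, 18, 21, 19]

# A hit scoring 30 lies about 7 standard deviations above the background mean.
print(round(z_score(30, background), 2))  # 7.07
```

A hit whose score equals the background mean gets a Z-score of 0, so larger values indicate hits that are less likely to be chance matches.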
Similar, but not homologous
99 IRLKSYCNDQSTGDIKVIGGDDLSTLTGKNVLIVEDIIDTGKTMQTLLSLVRQY.NPKMVKVASLLVKRTPRSVGY 173
: ||. ||| || |. || | : | | | | || | || |:| | ||.| |
214 VPLKTDANDQ.IGDSLY....SAMTVDDFGVLAVRVVNDHNPTKVT..SKVRIYMKPKHVRV...WCPRPPRAVPY 279
Different, but homologous
• Histone H5 and transcription factor E2F4, identity 7%, similar fold, similar
function (DNA binding)
• PTYSEMIAAAIRAEKSRGGSSRQSIQKYIKSHYKVGHNADLQIKLSIRRLLAAGVLKQTKGVGASGSFRL
| | | | |
• GLLTTKFVSLLQEAKD-GVLDLKLAADTLA------VRQKRRIYDITNVLEGIGLIEKKS----KNSIQW
COMPARATIVE or HOMOLOGY MODELING
The aim is to build a 3-D model for a protein of unknown structure
(target) on the basis of sequence similarity to proteins of known
structure (templates).
Steps in Homology Modeling
• Identify template(s)
• Initial alignment
• Improve alignment
• Backbone generation
• Loop modelling
• Side chains
• Refinement
• Validation
Template Search
BLAST http://www.ncbi.nlm.nih.gov/BLAST/
FastA http://www.ebi.ac.uk/fasta33/
SSM http://www.ebi.ac.uk/msd-srv/ssm/
PredictProtein http://www.predictprotein.org/
123D; SARF2; PDP http://123d.ncifcrf.gov/
GenTHREADER http://bioinf.cs.ucl.ac.uk/psipred/
UCLA-DOE http://fold.doe-mbi.ucla.edu/
→ BLAST
Important: the alignment is the most crucial step in the process.
Homology modeling cannot recover from a bad alignment.
Sequence Alignment
EMBOSS http://www.ebi.ac.uk/emboss/align/
Tcoffee http://www.igs.cnrs-mrs.fr/Tcoffee
ClustalW http://www.ebi.ac.uk/clustalw/
BCM http://searchlauncher.bcm.tmc.edu/multi-align/
POA http://www.bioinformatics.ucla.edu/poa/
STAMP http://www.ks.uiuc.edu/Research/vmd/
SwissModel http://www.expasy.org/spdbv/
EMBOSS is used to compare two sequences. When you want an alignment that
covers the whole length of both sequences, use needle. When you are trying to
find the best region of similarity between two sequences, use water.
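To illustrate the difference between a whole-length (global) and a best-region (local) alignment, here is a minimal sketch of the two dynamic-programming recurrences that needle and water are based on (Needleman-Wunsch and Smith-Waterman). The scoring values (match +1, mismatch -1, gap -1), the linear gap penalty, and the function names are simplifying assumptions; the EMBOSS programs themselves use a substitution matrix and separate gap-opening and gap-extension penalties.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best score of an alignment covering the full length of both sequences."""
    prev = [j * gap for j in range(len(b) + 1)]  # first row: leading gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap]                         # first column: leading gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best score over all pairs of subsequences; cells never drop below 0."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best

# A short motif inside a longer sequence: the global alignment pays for
# four end gaps, the local alignment only reports the matching region.
print(global_score("SWL", "GGSWLGG"))  # -1
print(local_score("SWL", "GGSWLGG"))   # 3
```

The only difference between the two recurrences is the floor at zero and where the final score is read from: the last cell for the global case, the best cell anywhere for the local case.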
Tcoffee is more accurate than ClustalW for sequences with less than
30% identity, but it is slower.
Deep View
Swiss-PdbViewer
by
Nicolas Guex, Alexandre Diemand , Manuel C. , &
Torsten Schwede
Threading
A.A. Substitution Matrix
A C D E F G H I K L M N P Q R S T V W Y
A 5 -2 0 1 -2 0 0 -1 0 -1 0 0 1 0 -1 1 0 0 -2 -2
C -2 8 -2 -3 -3 -2 0 -2 -3 -3 0 -2 -3 -3 -2 -1 -1 -2 -1 -2
D 0 -2 5 2 -2 0 1 -3 0 -2 -1 2 0 1 -2 0 0 -2 -3 -2
E 1 -3 2 5 -3 0 -1 -2 1 -2 -2 1 1 2 0 1 1 -1 -2 -1
F -2 -3 -2 -3 6 -3 1 0 -3 2 2 -3 -2 -3 -2 -1 -2 0 3 3
G 0 -2 0 0 -3 5 -1 -2 0 -2 -2 0 0 -1 0 0 -1 -1 -2 -3
H 0 0 1 -1 1 -1 5 -1 1 -1 0 1 0 1 2 0 1 -1 0 1
I -1 -2 -3 -2 0 -2 -1 5 -2 2 2 -2 -2 -3 -2 -1 0 2 9 0
K 0 -3 0 1 -3 0 1 -2 5 -1 -2 1 0 1 2 0 0 -1 -2 -2
L -1 -3 -2 -2 2 -2 -1 2 -1 5 3 -2 -2 0 -1 -1 0 2 0 0
M 0 0 -1 -2 2 -2 0 2 -2 3 5 -1 -2 0 -2 -1 0 1 -2 -1
N 0 -2 2 1 -3 0 1 -2 1 -2 -1 5 -2 1 0 2 0 -2 -3 -1
P 1 -3 0 1 -2 0 0 -2 0 -2 -2 -2 8 0 0 0 0 -1 -3 -3
Q 0 -3 1 2 -3 -1 1 -3 1 0 0 1 0 5 2 1 0 -1 -1 -2
R -1 -2 -2 0 -2 0 2 -2 2 -1 -2 0 0 2 5 1 0 -1 0 -1
S 1 -1 0 1 -1 0 0 -1 0 -1 -1 2 0 1 1 5 2 -1 0 0
T 0 -1 0 1 -2 -1 1 0 0 0 0 0 0 0 0 2 5 0 -1 -2
V 0 -2 -2 -1 0 -1 -1 2 -1 2 1 -2 -1 -1 -1 -1 0 5 -1 0
W -2 -1 -3 -2 3 -2 0 9 -2 0 -2 -3 -3 -1 0 0 -1 -1 6 3
Y -2 -2 -2 -1 3 -3 1 0 -2 0 -1 -1 -3 -2 -1 0 -2 0 3 6
Examples: F↔F 6, F↔Y 3, F↔K -3
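The three example lookups can be sketched as a simple index into one row of the substitution matrix. Only row F is copied here to keep the example short; the names `RESIDUES`, `ROW_F` and `score_against_f` are illustrative.

```python
# Column order of the substitution matrix above.
RESIDUES = "ACDEFGHIKLMNPQRSTVWY"

# Row F of the substitution matrix, copied from the table above.
ROW_F = [-2, -3, -2, -3, 6, -3, 1, 0, -3, 2, 2, -3, -2, -3, -2, -1, -2, 0, 3, 3]

def score_against_f(residue):
    """Substitution score for replacing F with `residue`."""
    return ROW_F[RESIDUES.index(residue)]

print(score_against_f("F"))  # 6  (identical residue)
print(score_against_f("Y"))  # 3  (similar aromatic residue)
print(score_against_f("K"))  # -3 (dissimilar residue)
```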
Alignment Matrix
Sequence A: VATTPDKSWLTV
Sequence B: ASTPERASWLGTA

    V   A   T   T   P   D   K   S   W   L   T   V
A   0   5   0   0   1   0   0   1  -2  -1   0   0
S  -1   1   2   2   0   0   0   5   0  -1   2  -1
T   0   0   5   5   0   0   0   2  -1   0   5   0
P  -1   1   0   0   8   0   0   0  -3  -2   0  -1
E  -2   1   1   1   1   2   1   1  -2  -2   1  -2
R  -1  -1   0   0   0  -2   2   1   0  -1   0  -1
A   0   5   0   0   1   0   0   1  -2  -1   0   0
S  -1   1   2   2   0   0   0   5   0  -1   2  -1
W  -1  -2  -1  -1  -3  -3  -2   0   6   0  -1  -1
L   2  -1   0   0  -2  -2  -1  -1   0   5   0   2
G  -1   0  -1  -1   0   0   0   0  -2  -2  -1  -1
T   0   0   5   5   0   0   0   2  -1   0   5   0
A   0   5   0   0   1   0   0   1  -2  -1   0   0
Alignment 1:
VATTPDK-SWLTV-
 |*||** |||
-ASTPERASWLGTA
score 39

Alignment 2:
VATTPDK-SWL-TV
 |*||** ||| |*
-ASTPERASWLGTA
score 45
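The two scores can be reproduced by summing the substitution score of every aligned residue pair. The totals of 39 and 45 only come out if gap positions contribute nothing, which is an inference from the numbers rather than something the slide states; the `gap_penalty` parameter and the function names below are illustrative.

```python
# Substitution scores for the residue pairs occurring in the two alignments,
# copied from the substitution matrix above. Keys are stored in sorted order.
PAIR_SCORES = {
    ("A", "A"): 5, ("S", "T"): 2, ("T", "T"): 5, ("P", "P"): 8,
    ("D", "E"): 2, ("K", "R"): 2, ("S", "S"): 5, ("W", "W"): 6,
    ("L", "L"): 5, ("G", "T"): -1, ("T", "V"): 0, ("A", "V"): 0,
}

def pair_score(x, y):
    """Look up a residue pair regardless of the order of its members."""
    return PAIR_SCORES[tuple(sorted((x, y)))]

def alignment_score(top, bottom, gap_penalty=0):
    """Sum pair scores column by column; '-' marks a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total -= gap_penalty  # assumed to be 0 in the slide's example
        else:
            total += pair_score(x, y)
    return total

print(alignment_score("VATTPDK-SWLTV-", "-ASTPERASWLGTA"))  # 39
print(alignment_score("VATTPDK-SWL-TV", "-ASTPERASWLGTA"))  # 45
```

Shifting the last gap by one position pairs T with T (5) instead of T with G (-1), which accounts for the six-point difference between the two alignments.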
Multiple Sequence Alignment