Molecular Modelling
Uğur Sezerman
Biological Sciences and Bioengineering Program
Sabancı University, Istanbul
Motivation
Introduction to Protein Structure
(Branden and Tooze)
Secondary Structure
Prediction
AGVGTVPMTAYGNDIQYYGQVT…
A-VGIVPM-AYGQDIQY-GQVT…
AG-GIIP--AYGNELQ--GQVT…
AGVCTVPMTA---ELQYYG--T…
AGVGTVPMTAYGNDIQYYGQVT…
----hhhHHHHHHhhh--eeEE…
Chou-Fasman Parameters
Name           Abbr.  P(a)  P(b)  P(turn)  f(i)   f(i+1)  f(i+2)  f(i+3)
Alanine        A      142    83    66      0.060  0.076   0.035   0.058
Arginine       R       98    93    95      0.070  0.106   0.099   0.085
Aspartic Acid  D      101    54   146      0.147  0.110   0.179   0.081
Asparagine     N       67    89   156      0.161  0.083   0.191   0.091
Cysteine       C       70   119   119      0.149  0.050   0.117   0.128
Glutamic Acid  E      151    37    74      0.056  0.060   0.077   0.064
Glutamine      Q      111   110    98      0.074  0.098   0.037   0.098
Glycine        G       57    75   156      0.102  0.085   0.190   0.152
Histidine      H      100    87    95      0.140  0.047   0.093   0.054
Isoleucine     I      108   160    47      0.043  0.034   0.013   0.056
Leucine        L      121   130    59      0.061  0.025   0.036   0.070
Lysine         K      114    74   101      0.055  0.115   0.072   0.095
Methionine     M      145   105    60      0.068  0.082   0.014   0.055
Phenylalanine  F      113   138    60      0.059  0.041   0.065   0.065
Proline        P       57    55   152      0.102  0.301   0.034   0.068
Serine         S       77    75   143      0.120  0.139   0.125   0.106
Threonine      T       83   119    96      0.086  0.108   0.065   0.079
Tryptophan     W      108   137    96      0.077  0.013   0.064   0.167
Tyrosine       Y       69   147   114      0.082  0.065   0.114   0.125
Valine         V      106   170    50      0.062  0.048   0.028   0.053
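
As an illustration of how the P(a) and P(b) columns of the table above can be used, here is a minimal sketch that averages the propensities over a short window and labels the window. The window width of 4, the threshold of 100, and the function names are assumptions made for this example; the full Chou-Fasman procedure also uses nucleation and extension rules and the turn frequencies f(i)..f(i+3), which are not implemented here.

```python
# (P(a), P(b)) per residue, copied from the Chou-Fasman table above.
PROPENSITY = {
    "A": (142, 83), "R": (98, 93), "D": (101, 54), "N": (67, 89),
    "C": (70, 119), "E": (151, 37), "Q": (111, 110), "G": (57, 75),
    "H": (100, 87), "I": (108, 160), "L": (121, 130), "K": (114, 74),
    "M": (145, 105), "F": (113, 138), "P": (57, 55), "S": (77, 75),
    "T": (83, 119), "W": (108, 137), "Y": (69, 147), "V": (106, 170),
}


def window_means(seq, start, width=4):
    """Return (mean P(a), mean P(b)) over seq[start:start + width]."""
    window = seq[start:start + width]
    mean_a = sum(PROPENSITY[r][0] for r in window) / len(window)
    mean_b = sum(PROPENSITY[r][1] for r in window) / len(window)
    return mean_a, mean_b


def classify_window(seq, start, width=4, threshold=100):
    """Simplified rule (an assumption): 'H' if helix propensity dominates,
    'E' if strand propensity dominates, '-' otherwise."""
    mean_a, mean_b = window_means(seq, start, width)
    if mean_a > threshold and mean_a >= mean_b:
        return "H"
    if mean_b > threshold and mean_b > mean_a:
        return "E"
    return "-"


if __name__ == "__main__":
    seq = "AGVGTVPMTAYGNDIQYYGQVT"
    print("".join(classify_window(seq, i) for i in range(len(seq) - 3)))
```

Each output character describes one window of four residues, so the printed string is three characters shorter than the sequence.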
Computational Approaches
Ab initio methods
Threading
Comparative Modelling
Fragment Assembly
Ab-initio protein structure prediction as
an optimization problem
Chen Keasar
conformation
BGU
A dream function
☺ Has a clear minimum in the native structure.
☺ Has a clear path towards the minimum.
☺ Global optimization algorithm should find the
native structure.
An approximate function
☺ Easier to design and compute.
☹ The native structure is not always the global minimum.
☹ Global optimization methods do not converge. Many
alternative models (decoys) should be generated.
☹ There is no clear way of choosing among them.
Decoy set
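
The idea of generating many decoys with an approximate function can be sketched on a toy problem. The one-dimensional `energy` function below is purely hypothetical (it stands in for a rugged energy landscape), and the random-restart descent is one simple choice among many, not the method used in the slides.

```python
import math
import random


def energy(x):
    """Hypothetical rugged 1-D energy: a shallow bowl plus ripples."""
    return 0.1 * x * x + math.sin(3.0 * x)


def local_minimize(x, step=0.01, max_iter=10000):
    """Greedy descent: move by `step` while a neighbour has lower energy."""
    for _ in range(max_iter):
        current = energy(x)
        if energy(x - step) < current:
            x -= step
        elif energy(x + step) < current:
            x += step
        else:
            break
    return x


def generate_decoys(n_starts, seed=0, tolerance=0.1):
    """Run descent from random starts; keep distinct local minima (decoys)."""
    rng = random.Random(seed)
    decoys = []
    for _ in range(n_starts):
        x = local_minimize(rng.uniform(-5.0, 5.0))
        if all(abs(x - d) > tolerance for d in decoys):
            decoys.append(x)
    return sorted(decoys, key=energy)


if __name__ == "__main__":
    for d in generate_decoys(50):
        print(f"x = {d:6.2f}   energy = {energy(d):6.3f}")
```

Every start ends in some local minimum, so the run yields a set of candidate models rather than a single answer; ranking them by the same approximate energy is exactly the step the slide flags as unreliable.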
Fold Optimization
Sometimes:
Penalize for buried polar or surface
hydrophobic residues
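
A penalty of this kind can be sketched as a simple count. The residue groupings, the unit weight, and the boolean burial flags below are assumptions for illustration; a real fold optimization would derive burial from the model's solvent accessibility.

```python
# Assumed groupings; other hydrophobicity scales classify some residues differently.
HYDROPHOBIC = set("AVLIMFWC")
POLAR = set("DEKRNQSTHY")


def burial_penalty(sequence, buried, weight=1.0):
    """Add `weight` for each buried polar or surface hydrophobic residue.

    sequence: one-letter residue codes
    buried:   one boolean per residue, True if the residue is in the core
    """
    penalty = 0.0
    for residue, is_buried in zip(sequence, buried):
        if is_buried and residue in POLAR:
            penalty += weight          # polar residue hidden in the core
        elif not is_buried and residue in HYDROPHOBIC:
            penalty += weight          # hydrophobic residue exposed to solvent
    return penalty


if __name__ == "__main__":
    # L buried (fine), K exposed (fine), V exposed (penalty), D buried (penalty)
    print(burial_penalty("LKVD", [True, False, False, True]))
```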
Learning from Lattice
Models
[Figure: levels of model representation: atom, extended atom, half a residue, residue.]
CATH website
www.cathdb.info
Genetic Algorithm used as
a search tool
We search for the minima of our fitness function, which is composed of
profile and contact energy terms.
Value encoding is used: parents are represented as strings of positions.
The population size is 50.
A sample parent (string of positions) is shown below:
1 2 3 4 5 10 11 12 13 14 23 24 25 26 27 28 29 30 31 32 55 56 57 58
A branch-and-bound algorithm is used to produce random initial
parents.
Mutation:
The mutation operator shifts a structure's position to the right or left
by some number of units.
Crossover:
Two-point crossover is applied, in which selected suitable structures are
exchanged between two parents.
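
The two operators can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a parent is stored as a list of `(start, end)` segments of consecutive positions, an operator's result is kept only if the segments stay ordered and non-overlapping, and the names `is_valid`, `shift_mutation`, and `two_point_crossover` are hypothetical.

```python
import random


def is_valid(parent, length):
    """Segments must be ordered, non-overlapping, and inside 1..length."""
    prev_end = 0
    for start, end in parent:
        if start <= prev_end or end < start or end > length:
            return False
        prev_end = end
    return True


def shift_mutation(parent, length, rng, max_shift=3):
    """Shift one randomly chosen segment left or right; keep it only if valid."""
    idx = rng.randrange(len(parent))
    shift = rng.choice([s for s in range(-max_shift, max_shift + 1) if s != 0])
    start, end = parent[idx]
    child = list(parent)
    child[idx] = (start + shift, end + shift)
    return child if is_valid(child, length) else list(parent)


def two_point_crossover(p1, p2, length, rng):
    """Exchange the segments between two cut points; keep only valid children."""
    a, b = sorted(rng.sample(range(len(p1) + 1), 2))
    c1 = p1[:a] + p2[a:b] + p1[b:]
    c2 = p2[:a] + p1[a:b] + p2[b:]
    if is_valid(c1, length) and is_valid(c2, length):
        return c1, c2
    return list(p1), list(p2)


if __name__ == "__main__":
    rng = random.Random(0)
    p1 = [(1, 5), (10, 14), (23, 32), (55, 58)]
    p2 = [(2, 6), (12, 16), (25, 34), (50, 53)]
    print(shift_mutation(p1, 60, rng))
    print(two_point_crossover(p1, p2, 60, rng))
```

Rejecting invalid children is one reading of "selected suitable structures"; a repair step would be an equally reasonable choice.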
Our Aim
A C D E F G H I K L M N P Q R S T V W Y -
-0.33 -0.67 0.68 0.01 -1.33 0.01 0.34 -1 0.01 -1.33 -0.67 2.34 -0.67 0.01 -0.33 0.34 0.01 -1 -1.33 -0.67 4.01
0.34 -2 -0.33 -1 -3.33 -0.67 -1.33 -3 -0.33 -3.33 -2.33 0.01 2.68 -0.33 -1.67 3.01 1.01 -2.33 -4 -2.33 0.01
-1 -3 2.01 6.01 -3 -3 0.01 -4 1.01 -3 -2 0.01 -1 2.01 0.01 -1 -1 -3 -3 -2 0.01
-1 -3 0.01 2.68 -3.67 -2.33 0.01 -3.33 4.34 -3 -2 0.01 -1 2.01 2.01 -0.33 -1 -3 -3 -2 0.01
-1.33 -2 -4 -3.67 0.34 -4 -3.67 4.01 -3 3.01 2.34 -3.33 -3.33 -2.67 -3.67 -3 -1 3.01 -2.67 -1 0.01
-2 -2 -4 -3 1.01 -4 -3 2.01 -3 5.01 3.01 -4 -4 -2 -3 -3 -1 1.01 -2 -1 0.01
-2 -2 -4 -3 1.01 -4 -3 2.01 -3 5.01 3.01 -4 -4 -2 -3 -3 -1 1.01 -2 -1 0.01
0.01 -2 -0.67 -0.67 -3 -1 -0.67 -3.33 1.01 -3 -2 0.34 -1.67 0.34 1.68 3.01 1.01 -2.33 -3.67 -1.67 0.01
-3 -5 -5 -3 1.01 -3 -3 -3 -3 -2 -1 -4 -4 -1 -3 -4 -3 -3 15.01 2.01 0.01
1.68 -1 -3.33 -2.33 -1.67 -2.67 -3.33 2.34 -2.33 0.01 0.34 -2.33 -2.33 -2.33 -2.67 -1 0.01 3.34 -3 -1.33 0.01
-1.67 -3.33 -0.67 0.01 -3.33 -2 0.34 -3.67 2.01 -3.33 -2 1.68 -2.67 0.68 4.34 -0.33 -0.67 -3 -3.33 -1.33 0.01
-1.67 -2.67 -1.67 0.34 0.01 -2.67 0.34 -2 0.01 -1 0.01 -1.33 -2 3.34 -0.33 -1 -1.33 -2.33 -0.33 0.68 0.01
-0.33 -1.67 -0.67 -0.67 -2 -1.33 2.34 -2.67 -0.33 -2.33 -1.33 0.68 -1.33 0.01 -0.67 2.01 1.68 -2 -3.33 -0.67 0.01
-0.67 -1 -1.67 -1.33 -0.33 -2 -1.67 0.34 -1.33 1.34 0.68 -1.33 -1.67 -1 -1.33 -0.33 1.34 0.34 -1.67 -1 2.01
-1 -2.33 0.01 2.01 -2 -2 0.01 -2.67 1.34 -2 -1.33 -0.33 -1.33 1.01 2.34 -0.67 -0.67 -2 -2 -1 2.01
-0.33 -0.67 0.68 0.01 -1.33 0.01 0.34 -1 0.01 -1.33 -0.67 2.34 -0.67 0.01 -0.33 0.34 0.01 -1 -1.33 -0.67 4.01
Positions
Profile scores
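
Reading the matrix above with one row per position and one column per residue type (an assumption based on the "Positions" and "Profile scores" labels), the profile term for a placed stretch of residues can be sketched as a sum of lookups. Only the A, C, and D columns of the first three rows are copied here; `profile_score` and its `offset` parameter are hypothetical names.

```python
# Excerpt of the profile above: first three rows, columns A, C, and D only.
PROFILE_EXCERPT = [
    {"A": -0.33, "C": -0.67, "D": 0.68},
    {"A": 0.34, "C": -2.0, "D": -0.33},
    {"A": -1.0, "C": -3.0, "D": 2.01},
]


def profile_score(residues, profile, offset=0):
    """Sum profile[offset + k][residue] for residues placed at consecutive positions."""
    return sum(profile[offset + k][res] for k, res in enumerate(residues))


if __name__ == "__main__":
    print(round(profile_score("DAD", PROFILE_EXCERPT), 2))
    print(round(profile_score("AC", PROFILE_EXCERPT, offset=1), 2))
```

How this term is signed and weighted against the contact energy in the fitness function is not stated in the slides, so it is left out of the sketch.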
Contact Potential Energy
Eight helices from the following sequences are selected; each
sequence is threaded onto the others, and the shifts from the
real structures are shown below.
Target Sequences
1noa T T T T T T 1 T T T T T T T -1 T T T TTTTTT5TT T T T T -2 -2 T T T
Template sequences
Conclusion for fitting to a
given fold
[Figure: threading energy values (y-axis, -3000 to -1000) versus protein ID (x-axis, 0 to 600). Labelled points: 1ubi, 1e0q, 1f9j, and other members of 1ubi's family.]
All Beta
1acx Threading Results
[Figure: threading energy values (y-axis, -3000 to -1000) versus protein ID (x-axis, 0 to 700). Labelled points: 1acx, 1klo, 1zfo, 1c01.]
All Alpha
1bhd Threading Result
[Figure: threading energy values (y-axis, -3000 to -1000) versus protein ID (x-axis, 0 to 700). Labelled points: 1bhd, 1hg6, 1qld, 2pcf, 1dfu.]
CONCLUSION
o Find template
o Generate model:
- add loops
- add sidechains
o Refine model
Prediction of Protein
Structures
[Figure: alignment paths labelled pc, pL > 0.9 × pc, and pR > 0.9 × pc; values shown: 50, 47.]
Recursive Smith-Waterman Local
Alignment Algorithm with Affine
Gap Penalty
Build 3 matrices:
A for the matches;
B for the gaps on template;
C for gaps on target.

$$A(i,j) = \max_{X \in \{A,B,C\}} \{ X(i-1,j-1) + S(i,j) \}$$
$$B(i,j) = \max \{ A(i-1,j) + \mathrm{go} + \mathrm{ge},\; B(i-1,j) + \mathrm{ge},\; C(i-1,j) + \mathrm{go} + \mathrm{ge} \}$$
$$C(i,j) = \max \{ A(i,j-1) + \mathrm{go} + \mathrm{ge},\; B(i,j-1) + \mathrm{go} + \mathrm{ge},\; C(i,j-1) + \mathrm{ge} \}$$

• S(i,j): pairwise similarity score
• go: gap opening penalty
• ge: gap extension penalty

Tracing back: include the paths that have score > 0.9 × Max
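
The three recurrences can be sketched directly in code. Several details are assumptions, since the slides do not state them: A is floored at 0 (the usual local-alignment rule), borders are initialised to 0 for A and minus infinity for B and C, `go` and `ge` are negative numbers that are added, and a simple identity score stands in for S(i,j). The function names are hypothetical, and the traceback is reduced to listing the cells whose score exceeds 0.9 × Max.

```python
NEG_INF = float("-inf")


def local_affine(x, y, score, go=-1.0, ge=-1.0):
    """Fill A (match), B (gap, step in i), C (gap, step in j); return (best, A)."""
    n, m = len(x), len(y)
    A = [[0.0] * (m + 1) for _ in range(n + 1)]
    B = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    C = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = score(x[i - 1], y[j - 1])
            diag = max(A[i - 1][j - 1], B[i - 1][j - 1], C[i - 1][j - 1])
            A[i][j] = max(0.0, diag + s)  # floor at 0: assumed local rule
            B[i][j] = max(A[i - 1][j] + go + ge,
                          B[i - 1][j] + ge,
                          C[i - 1][j] + go + ge)
            C[i][j] = max(A[i][j - 1] + go + ge,
                          B[i][j - 1] + go + ge,
                          C[i][j - 1] + ge)
            best = max(best, A[i][j])
    return best, A


def near_optimal_ends(A, best, fraction=0.9):
    """Cells of A scoring above fraction * best: candidate traceback starts."""
    return [(i, j) for i, row in enumerate(A)
            for j, value in enumerate(row) if value > fraction * best]


def identity(a, b):
    """Toy similarity: +2 for a match, -1 for a mismatch (an assumption)."""
    return 2.0 if a == b else -1.0


if __name__ == "__main__":
    best, A = local_affine("AAAATTTT", "AAAACTTTT", identity)
    print(best, near_optimal_ends(A, best))
```

In the second sequence the extra C is skipped through matrix C at a cost of go + ge, after which the alignment returns to A on the next diagonal step.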
$$S(i,j) = \mathrm{sc} \times \mathrm{SSS}(i,j) + \mathrm{ac} \times \mathrm{SS}(i,j) + \mathrm{tc} \times \mathrm{TS}(i,j)$$

...ALVKLI...
...A-IEII...
...AL-KLI...

$$S(j) = -\sum_{i=1} P(i,j) \times \log P(i,j)$$
$$C(j) = \frac{1}{1 + S(j)}$$

S(j): entropy at position j of the template
P: family profile matrix
C(j): conservation score at position j of the template
T(i) = 1 if i = T; else 0 (Turn Prediction Servers)
P(., j): turn profile of the target at position j
tc: turn similarity coefficient
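
The entropy and conservation score defined above can be computed per alignment column, as in this minimal sketch. It assumes the natural logarithm, counts the gap character as one more symbol, and estimates P from raw column frequencies; the slides do not specify any of these choices, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_frequencies(column):
    """Relative frequency of each symbol (gap included) in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return [count / total for count in counts.values()]


def entropy(column):
    """Shannon entropy of the column, natural log (an assumption)."""
    return -sum(p * math.log(p) for p in column_frequencies(column))


def conservation(column):
    """Conservation score 1 / (1 + entropy): 1.0 for a fully conserved column."""
    return 1.0 / (1.0 + entropy(column))


if __name__ == "__main__":
    family = ["ALVKLI", "A-IEII", "AL-KLI"]   # the three aligned fragments above
    for j, column in enumerate(zip(*family)):
        print(j, "".join(column), round(conservation(column), 3))
```

A column such as AAA has zero entropy and a conservation of 1, while a mixed column such as LIL scores lower.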
Gap Penalties
...L...
...-...
$$\mathrm{go} = -\frac{2}{3} \times \mathrm{go}$$
$$\mathrm{ge} = -\frac{2}{3} \times \mathrm{ge}$$
Tural Aksel
Bora Uyar
Eylül Harputlugil