
Statistical Mechanical Models for Understanding Protein Folding

Summer Project by Ronak M Soni Under the guidance of Dr. Purusattam Ray, IMSc

What are proteins?


A protein is a special type of heteropolymer. Its speciality is that it was chosen by natural selection for its biological function (more on this later). It is made of amino acid residues.

Structure of a protein

Protein Structures (contd.)


The particular sequence of residues is called the primary structure. The regular shapes which sections of the protein collapse into are called the secondary structure. The irregular shape of the whole protein when you smooth over the secondary structure is called the tertiary structure.

Why are proteins important?


The information in DNA is transcribed into an mRNA. The mRNA goes to a ribosome, where it is translated into a protein. And then the proteins all rose up as one and spake unto the cells: let there be height!

These are all proteins!


a) ATP synthase, which produces ATP molecules.
b) RNA polymerase, which transcribes DNA into RNA.
c) GroEL-GroES complex, which helps in protein folding.
d) Ribosome, which translates mRNAs into proteins.

What is protein folding?


Going back to the structure of a protein, you can see that the protein can exist in a variety of configurations at each α-carbon, called conformers. Only a small subset of conformers, usually the most compact ones, are biologically active. Proteins have been naturally selected to be good at folding from an extended random coil to a compact native state.

Why is protein folding important?

These would be completely useless if they weren't folded in this particular way.

The Statistical Mechanical Models: Introduction


Is it possible to predict secondary and tertiary structures from the primary structure? Models range from prediction by comparison with known structures to just guessing energy functions and coding them to see what happens. The models I'll discuss are closer to the latter end of the spectrum.

Statistical Mechanics Basics


Ensemble average = time average (ergodic hypothesis). Even in non-ergodic systems, the two are correlated. At temperature T, the probability of finding a state at energy E is

$$P(E) \propto e^{-E/kT}$$
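As a small illustration of the Boltzmann factor above, the sketch below turns a list of state energies into normalised probabilities. The function name `boltzmann_probabilities`, the example energies and the choice of units with kT = 1 are all assumptions made for the illustration, not something taken from the models discussed later.

```python
import math

def boltzmann_probabilities(energies, kT=1.0):
    """Return normalised Boltzmann probabilities for a list of state energies."""
    # Shift by the minimum energy before exponentiating for numerical stability;
    # the shift cancels out in the normalisation.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)  # partition function, up to the constant shift
    return [w / z for w in weights]

# Hypothetical three-state system with energies 0, 1 and 2 (in units of kT).
probs = boltzmann_probabilities([0.0, 1.0, 2.0], kT=1.0)
```

Lower-energy states come out more probable, and the ratio of two probabilities depends only on the energy difference between the states.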

Protein Conformation as SAW


The conformation of any molecule with n atoms connected in a line can be thought of as a random walk that doesn't intersect itself, or a self-avoiding walk (SAW). The random walk is one of the basic problems in statistical mechanics, so all the models I'll talk about come from this angle.
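To make the idea of a SAW concrete, here is a minimal sketch that counts all self-avoiding walks of a given length by brute force. It assumes a 2D square lattice, which is a simplification chosen for the example (a real chain lives in 3D and off-lattice); the function name `count_saws` is hypothetical.

```python
def count_saws(n_steps):
    """Count self-avoiding walks of n_steps on the 2D square lattice from the origin."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    def extend(pos, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for dx, dy in moves:
            nxt = (pos[0] + dx, pos[1] + dy)
            if nxt not in visited:  # self-avoidance: never revisit a site
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)  # backtrack
        return total

    return extend((0, 0), {(0, 0)}, n_steps)

print([count_saws(n) for n in range(1, 6)])  # [4, 12, 36, 100, 284]
```

The count grows roughly exponentially with length, which is why exhaustive enumeration is only practical for short chains.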

How the models work


Each model looks at a protein conformation as a SAW in which not all paths are equally favoured. Each path has an energy attached to it, and the probability that this path is taken is scaled by that energy as

$$P(E) \propto e^{-E/kT}$$

Assumptions made in the models about protein folding


1. The native state is thermodynamically the most stable (lowest conformational energy).
2. The action of water molecules is so frequent and so random that it can be modelled as a uniform force.
3. Given a primary structure, it is always possible to predict the secondary and tertiary structures.
4. Folding is the same in the cell and in the test tube.

Why do proteins fold?


Energy considerations:
1. Hydrogen bonding of polar bonds in the backbone as well as the side chains.
2. Sheltering of hydrophobic side chains from water.
The N-H and C=O bonds are stabilised by hydrogen bonding.

The Models
Conditioned Self-Avoiding Walk (CSAW) Model, proposed by K. Huang and J. Lei.
Interacting Growth Walk (IGW) Model, proposed by S. L. Narasimhan et al.
Pruned-Enriched Rosenbluth Method (PERM), proposed by P. Grassberger.

CSAW
Think of folding as a progression from a SAW to a completely constricted walk. Steps:
1. Take an ensemble of SAWs (every atom is a turning point).
2. Choose a Cα and rotate every member of the ensemble at that atom by an arbitrary amount.
3. If the energy of the new conformation is lower, accept it with a probability of 1. Else, accept it with probability exp(-ΔE/kT), where ΔE is the increase in energy.
4. Repeat till the result is aesthetically pleasing.
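The acceptance rule in step 3 is the standard Metropolis criterion. A minimal sketch of just that rule follows, reading the energy in the exponent as the energy change of the trial move; the function name `metropolis_accept` and its parameters are hypothetical, and the rotation move itself is not shown.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng=random):
    """Return True if a trial move with energy change delta_e is accepted."""
    if delta_e <= 0:
        # Moves that lower (or keep) the energy are always accepted.
        return True
    # Uphill moves are accepted with probability exp(-delta_e / kT).
    return rng.random() < math.exp(-delta_e / kT)

# Usage: estimate the acceptance rate of an uphill move of size 1 kT.
rng = random.Random(0)
trials = 20000
rate = sum(metropolis_accept(1.0, 1.0, rng) for _ in range(trials)) / trials
```

Allowing uphill moves with a finite probability is what lets the chain escape local energy minima instead of getting stuck in the first compact shape it finds.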

Energy function of CSAW


The energy used by Huang and Lei, who proposed CSAW, was

$$E = -g_1 K_1 - g_2 K_2$$

$K_1$ is the number of protein atoms surrounding hydrophobic residues. $K_2$ is the number of hydrogen bonds (which also includes open backbone molecules). $g_1$ and $g_2$ are constants.

Pros and Cons of Using CSAW


Takes a lot of processing power, because every atom is modelled. Gives a very detailed picture, but that also makes it hard to extract simple properties of the native state which you may be interested in. It also folds to conformations that are kinetically more accessible, thereby reducing the number of assumptions made.

IGW
Grow a protein molecule on a lattice, giving lower-energy walks a higher probability. Each residue is treated as a point particle of either H (hydrophobic) or P (hydrophilic) type. The energy comes from non-bonded nearest-neighbour interactions:

$$(\epsilon_{HH}, \epsilon_{HP}, \epsilon_{PP}) = (-1, 0, 0)$$
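The contact energy can be sketched as follows, assuming a 2D square lattice and the usual HP convention that each non-bonded H-H nearest-neighbour contact contributes -1 while H-P and P-P contacts contribute nothing. The function name `hp_energy` and the example sequence are hypothetical.

```python
def hp_energy(sequence, coords):
    """Energy of an HP chain on the square lattice.

    sequence: string of 'H' and 'P', one letter per residue.
    coords:   list of (x, y) lattice sites, one per residue, in chain order.
    """
    index_at = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            # Skip empty sites and chain neighbours (bonded pairs).
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy

# A four-residue chain folded into a square: the two ends touch.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_energy("HPPH", square))  # -1
```

Stretching the same chain out in a straight line gives an energy of 0, so the folded square is favoured.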

IGW (contd.)
Obviously, you can't just choose the lowest-energy walks at every step, because sometimes frustration may be desirable for an overall lower energy. But it's too processor-intensive to find every n-length chain and then choose the lowest-energy one. So, do it after every k steps, and find the optimum k.

PERM: Simple Sampling


Simple sampling (SS) is another growth algorithm on a lattice, where you take every member of your ensemble and add a new monomer at each lattice point around the end of the chain. You shouldn't avoid occupied lattice points, as the probabilities wouldn't be right then. So, you discard every chain with a self-intersection. But the number of chains to discard increases exponentially with the number of monomers.
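The attrition problem can be seen in a few lines. The sketch below is a random-sampling variant of simple sampling, assuming a 2D square lattice: each chain takes one uniformly random step at a time (occupied sites included) and is discarded as soon as it intersects itself. The function names are hypothetical.

```python
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def grow_simple(n_steps, rng):
    """Grow one random walk; return its sites, or None if it self-intersects."""
    pos = (0, 0)
    visited = {pos}
    path = [pos]
    for _ in range(n_steps):
        dx, dy = rng.choice(MOVES)  # occupied sites are NOT avoided
        pos = (pos[0] + dx, pos[1] + dy)
        if pos in visited:
            return None  # discard the whole chain
        visited.add(pos)
        path.append(pos)
    return path

def survival_fraction(n_steps, n_samples, rng):
    """Fraction of attempted chains that survive to n_steps."""
    kept = sum(grow_simple(n_steps, rng) is not None for _ in range(n_samples))
    return kept / n_samples

rng = random.Random(0)
f2 = survival_fraction(2, 20000, rng)
f5 = survival_fraction(5, 20000, rng)
f10 = survival_fraction(10, 20000, rng)
```

Because every surviving chain was generated with the same probability, no reweighting is needed, but the surviving fraction falls quickly as the chains get longer.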

PERM: Rosenbluth-Rosenbluth Method


The Rosenbluth-Rosenbluth method tries to solve this problem by avoiding occupied lattice points and giving each chain a weight that corrects for the resulting bias. The weights can sometimes go so high or so low that any prediction of a measurement becomes useless.
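A minimal sketch of Rosenbluth-Rosenbluth growth, assuming a 2D square lattice and no interaction energy: at each step the chain picks uniformly among the free neighbouring sites, and its weight is multiplied by the number of free sites. With this convention the average weight estimates the number of SAWs of that length. The function names are hypothetical.

```python
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def rosenbluth_weight(n_steps, rng):
    """Grow one chain avoiding occupied sites; return its Rosenbluth weight."""
    pos = (0, 0)
    visited = {pos}
    weight = 1.0
    for _ in range(n_steps):
        free = [(pos[0] + dx, pos[1] + dy) for dx, dy in MOVES
                if (pos[0] + dx, pos[1] + dy) not in visited]
        if not free:
            return 0.0  # trapped chain: contributes zero weight
        weight *= len(free)  # corrects for the bias of avoiding occupied sites
        pos = rng.choice(free)
        visited.add(pos)
    return weight

def estimate_saw_count(n_steps, n_samples, rng):
    """Average weight over many chains; estimates the number of SAWs."""
    return sum(rosenbluth_weight(n_steps, rng) for _ in range(n_samples)) / n_samples

rng = random.Random(0)
estimate = estimate_saw_count(4, 20000, rng)  # exact count for 4 steps is 100
```

For long chains the weights are products of many factors, so their spread becomes very wide and a handful of chains dominate any average.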

PERM: Pruned-Enriched Rosenbluth Method


The Pruned-Enriched Rosenbluth Method counters this by taking chains with weights above an upper cutoff, doubling their number and halving their weights (enrichment). It also halves the number of chains with weights below a lower cutoff and doubles the weights of the remaining ones (pruning).
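The pruning and enrichment step can be sketched on its own, applied to a population of weighted chains. This is an illustration of the rule as described here, not Grassberger's implementation; the function name `prune_enrich`, the cutoffs `w_low` and `w_high`, and the choice to prune each light chain independently with probability 1/2 are assumptions.

```python
import random

def prune_enrich(population, w_low, w_high, rng):
    """Apply one pruning/enrichment pass.

    population: list of (chain, weight) pairs.
    Returns a new list of (chain, weight) pairs.
    """
    new_population = []
    for chain, weight in population:
        if weight > w_high:
            # Enrichment: two copies, each carrying half the weight.
            new_population.append((list(chain), weight / 2))
            new_population.append((list(chain), weight / 2))
        elif weight < w_low:
            # Pruning: keep with probability 1/2 and double the survivor's weight.
            if rng.random() < 0.5:
                new_population.append((chain, 2 * weight))
        else:
            new_population.append((chain, weight))
    return new_population

rng = random.Random(1)
population = [(["a"], 10.0), (["b"], 1.0)]
result = prune_enrich(population, 0.5, 5.0, rng)
```

Enrichment conserves the total weight exactly and pruning conserves it on average, so weighted averages stay unbiased while the spread of the weights is kept under control.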

Results of these models


CSAW can very accurately predict the relations between radius of gyration and number of residues at various stages in the process. IGW is very good at predicting the native state energy. The math associated with PERM is very complex, and the researchers have found various results with corrections.

References
Echenique, P. Introduction to protein folding for physicists, arXiv:0705.1845v1 [physics.bio-ph].
Cooper, A. Thermodynamics of Protein Folding and Stability, Protein: A Comprehensive Treatise, Volume 2, pp. 217-270 (1999).
Lei, J. and Huang, K. CSAW: A Dynamical Model of Protein Folding, arXiv:cond-mat/0601244v1 [cond-mat.stat-mech].
Lei, J. and Huang, K. Elastic energy of proteins and the stages of protein folding, Europhys. Lett., 88 (2009), 68004.
Lei, J. and Huang, K. Protein Folding: A Perspective From Statistical Physics, arXiv:1002.5013v1 [cond-mat.stat-mech], 26 Feb 2010.
Tcherkasskaya, O. and Uversky, V. N. 2001. Denatured collapsed states in protein folding: Example of apomyoglobin. Proteins: Structure, Function, and Genetics, 44, 244-254.
Arteca, G. 1996. Different molecular size scaling regimes for inner and outer regions of proteins. Phys. Rev. E, 54, 3044-3047.
Hong, L. and Lei, J. 2009. Scaling law for the radius of gyration of proteins and its dependence on hydrophobicity. J. Polymer Sci. B: Polymer Phys., 47, 207-214.
Narasimhan, S. L. et al. Protein folding simulations with Interacting Growth Walk model, arXiv:cond-mat/0112021v3 [cond-mat.stat-mech].
Narasimhan, S. L. et al. A new Monte Carlo algorithm for growing compact Self Avoiding Walks, arXiv:cond-mat/0108097v4 [cond-mat.stat-mech].
Grassberger, P. Pruned-enriched Rosenbluth Method: Simulations of polymers of chain length up to 1 000 000, Physical Review E, 56 no. 3, 3682-3693.
