
ARTICLE IN PRESS

Physica A 384 (2007) 122–127


www.elsevier.com/locate/physa

Minireview: The compact phase in polymers and proteins


F. Seno, A. Trovato
CNISM, Unità di Padova, Dipartimento di Fisica ‘G. Galilei’, Università di Padova, Via Marzolo 8, 35131 Padova, Italy
Available online 29 April 2007

Abstract

Proteins are linear molecules. However, the simple model of a polymer viewed as spheres tethered together does not
account for many of the observed characteristics of protein structures. Here we review some recent works tackling this
problem. In particular, we show that there is growing evidence that the compact structures of folded proteins are
selected in their gross topological features on the basis of geometry and symmetry rather than of sequence
considerations. These structures are poised at the edge of compaction, which accounts for their flexibility. Different
aspects of protein behavior can be rationalized by studying how the energy landscape of a single chain in the
marginally compact phase can be modified.
© 2007 Elsevier B.V. All rights reserved.

PACS: 87.14.Ee; 82.35.Lr; 87.15.Aa

Keywords: Polymer; Proteins; Compact phase; Hydrogen bonds

1. Introduction

Proteins are linear flexible heteropolymers, made up of 20 different amino-acid species. Most natural
proteins in solution have roughly spherical compact shapes, and thus are usually referred to as globular
proteins. The fundamental fact about globular protein sequences is their ability to attain a native three-
dimensional folded conformation in physiological conditions [1]. Correct and reproducible folding into the
native state, uniquely determined by the primary sequence, is essential for biological function.
Protein folding is an even more remarkable process from a physical perspective. The large flexibility of the
main backbone chain gives proteins a huge conformational space: the number of accessible conformations
increases exponentially with chain length and becomes astronomically large for chains of actual protein size.
As first pointed out by Levinthal [2], it is thus quite puzzling that a protein is always able to fold into the same
native state in very short times, ranging from $10^{-3}$ to $10^{1}$ s, without being trapped in an endless search.
Large globular proteins do indeed need the aid of molecular chaperones in order to fold correctly ‘‘in vivo’’
[3], but the reversible folding/unfolding of small proteins ‘‘in vitro’’ is an equilibrium thermodynamic process
[4], taking place in solution without the help of any cell machinery. The central questions in the field have thus
been: Which kind of energy landscape could allow a given amino-acid sequence to resolve
Levinthal’s paradox? Which sequences would then display such an energy landscape?

Corresponding author. Tel.: +39 8277159; fax: +39 8277102.
E-mail addresses: seno@pd.infn.it (F. Seno), trovato@pd.infn.it (A. Trovato).

0378-4371/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.physa.2007.04.075

2. Protein sequences are special

The basic observation that the folding of globular proteins is predominantly driven by the aversion to water
of some amino-acid side chains, leading to the formation of a hydrophobic core in the compact folded state
[5], prompted theorists to tackle the problem using well known concepts from polymer physics. Long linear
homopolymers under bad solvent conditions undergo a second order phase transition, the θ-transition, from a
high temperature swollen phase to a low temperature collapsed phase [6]. The collapsed phase is favored in
energy but has a lower entropy than the swollen phase. Nevertheless, even in the collapsed phase there is still a
huge number of degenerate compact conformations, separated by high barriers in a very rough energy
landscape [7,8].
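The collapse can be illustrated with a standard lattice toy model that is not taken from the works reviewed here: the interacting self-avoiding walk on the square lattice, a homopolymer in which every non-bonded nearest-neighbour contact lowers the energy by eps. The sketch below enumerates all walks of a short chain exactly and computes the Boltzmann-averaged number of contacts, which grows as the temperature is lowered. For such short chains this is a smooth crossover; the sharp θ-transition only appears in the long-chain limit. The function names, eps and the choice k_B = 1 are illustrative assumptions.

```python
import math
from collections import Counter

# Nearest-neighbour moves on the square lattice.
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def contact_histogram(n_steps):
    """Map number of non-bonded contacts -> number of self-avoiding walks."""
    hist = Counter()
    path = [(0, 0)]
    occupied = {(0, 0): 0}  # lattice site -> monomer index

    def grow():
        if len(path) == n_steps + 1:
            contacts = 0
            for i, (x, y) in enumerate(path):
                for dx, dy in STEPS:
                    j = occupied.get((x + dx, y + dy))
                    # j > i + 1 skips bonded neighbours and counts each pair once
                    if j is not None and j > i + 1:
                        contacts += 1
            hist[contacts] += 1
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            site = (x + dx, y + dy)
            if site not in occupied:  # self-avoidance
                occupied[site] = len(path)
                path.append(site)
                grow()
                path.pop()
                del occupied[site]

    grow()
    return hist


def mean_contacts(hist, temperature, eps=1.0):
    """Boltzmann average of the contact number; each contact has energy -eps (k_B = 1)."""
    weights = {m: g * math.exp(eps * m / temperature) for m, g in hist.items()}
    z = sum(weights.values())
    return sum(m * w for m, w in weights.items()) / z


if __name__ == "__main__":
    hist = contact_histogram(10)
    for t in (5.0, 1.0, 0.5, 0.25):
        print(f"T = {t:4.2f}   <contacts> = {mean_contacts(hist, t):.3f}")
```

The walk is stored both as a list (order) and as a dictionary (constant-time occupancy and index lookup), which keeps the exhaustive enumeration fast enough for chains of about ten steps.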
Moreover, heterogeneity induces frustration in the system, increasing the roughness of the landscape
according to the usual paradigm of disordered systems such as spin glasses. Levinthal’s puzzle is then
solved ‘‘only’’ by a very few ‘good folder’ sequences displaying a large energy gap between their native ground
states and other competing low energy compact conformations. These sequences, selected by a ‘minimal
frustration principle’ [9], present a smooth funneled free energy landscape that guides correct folding to the
native state.
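The role of the energy gap can be made concrete with a toy spectrum, a hypothetical illustration rather than a model from the cited works: a single native state plus many competing compact conformations lying a fixed gap above it. At fixed temperature the equilibrium population of the native state is 1/(1 + N exp(-gap/T)) for N competitors, so it stays tiny for a small gap and approaches one only once the gap clearly exceeds roughly T ln N.

```python
import math


def native_population(native_energy, competitor_energies, temperature=1.0):
    """Equilibrium probability of the native state (k_B = 1).

    Energies are measured relative to the native state, so the Boltzmann
    weights stay finite even for large gaps."""
    z_rest = sum(math.exp(-(e - native_energy) / temperature)
                 for e in competitor_energies)
    return 1.0 / (1.0 + z_rest)


if __name__ == "__main__":
    # Hypothetical spectrum: one native state plus 1000 competing compact
    # conformations, all sitting the same gap above it.
    for gap in (1.0, 5.0, 10.0):
        p = native_population(0.0, [gap] * 1000)
        print(f"gap = {gap:4.1f}   p_native = {p:.3f}")
```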
Almost all of this theoretical framework was worked out by means of simplified lattice models employing
suitably coarse-grained amino-acid alphabets. The notion that general universal properties should not depend
on the small scale details of the system under consideration is central in statistical mechanics, and here it
receives further confirmation from the success of the ‘energy landscape perspective’ in providing a
qualitative description of protein folding properties [10].
In practice, however, simplified models fail to describe quantitative features correctly (for instance in
structure prediction), unless knowledge of the native structure is assumed ‘a priori’, as in the so-called
Gō-like models [11], or unless evolutionary information from different homologous sequences is used for
prediction purposes. Indeed, employing homologous sequences turns out to be crucial for correctly
capturing local conformational properties [12–14].
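A Gō-like energy can be sketched in a few lines, assuming its simplest form: the energy counts only the contacts present in a given native structure (residue pairs closer than a cutoff and at least min_separation apart along the chain), each rewarded with -eps, while non-native contacts are neutral. The cutoff, the separation, eps and the function names are hypothetical; actual Gō-like models [11] and their descendants use a variety of pair potentials.

```python
import math


def contact_map(coords, cutoff, min_separation=3):
    """Pairs (i, j) with j - i >= min_separation whose distance is below cutoff."""
    return {
        (i, j)
        for i in range(len(coords))
        for j in range(i + min_separation, len(coords))
        if math.dist(coords[i], coords[j]) < cutoff
    }


def go_energy(coords, native_contacts, cutoff, eps=1.0, min_separation=3):
    """Go-like energy: -eps per native contact formed; non-native contacts are neutral."""
    formed = contact_map(coords, cutoff, min_separation)
    return -eps * len(formed & native_contacts)


if __name__ == "__main__":
    # A toy "native" hairpin on the square lattice (coordinates in lattice units).
    native = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0),
              (3, 1, 0), (2, 1, 0), (1, 1, 0), (0, 1, 0)]
    native_contacts = contact_map(native, cutoff=1.2)
    unfolded = [(i, 0, 0) for i in range(8)]
    print(go_energy(native, native_contacts, 1.2), go_energy(unfolded, native_contacts, 1.2))
```

Because the native contact set is an input, the sketch makes explicit why such models presuppose the answer: the funnel is built around a structure that must be known in advance.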

3. Protein structures are special

The question of how a given sequence can produce a funneled free energy landscape calls for an analysis
done sequence by sequence. Upon shifting the perspective towards structure [15], it becomes apparent that protein
native conformations display even more remarkable properties than the sequences folding onto them.
Almost all native states exhibit extremely regular geometric patterns known as secondary structures [16],
consisting of locally ordered motifs, mainly α-helices and extended β-sheets. These were first predicted by
Pauling and Corey simply by looking for regular structures stabilized by intra-backbone hydrogen bonds
[17,18]. Local steric effects [19] also favor either the helical or the extended conformation of the polypeptide
chain. Interestingly, hydrogen bonding and steric effects are unrelated mechanisms, and yet they lead to the
same consequences.
The compact tertiary arrangement of secondary structures determines the unique three-dimensional native
conformation of a biologically active protein. The geometrical disposition (‘‘topology’’) of the different
secondary structure elements in such arrangement and the way they are interconnected by loops
(‘‘architecture’’) are used to cluster native state structures into folds [20].
A few facts are quite remarkable: (a) the total number of distinct folds is only of the order of a few
thousand [21]; (b) longer globular proteins form domains which fold autonomously [22]; (c) the fold topology
of single domains is conserved by natural evolution [23], and the duplication and recombination of single
domains is thought to be a major determinant of the evolution of multi-domain proteins in higher organisms
[24]; (d) many proteins share the same native state fold [25], and the mutation of one amino acid into another
only rarely leads to radical changes in the native state structure; and (e) multiple protein functionalities can
arise within the context of a single fold [26].

All these facts suggest that simplified protein models should be able to account for the existence of a limited
number of available protein-like structures, prior to any consideration of a specific amino-acid sequence.
A phase of physical matter should exist, housing protein native folds as the preferred ground state structures
exhibiting local secondary motifs.

4. Hydrogen bonds and tubes: marginally compact phase

A proper modeling of secondary structure elements should in principle account for their main stabilizing
interaction, that is, intra-backbone hydrogen bonds (note that this feature does not depend on sequence).
Strikingly enough, explicit hydrogen bond modeling, even at a coarse-grained level, is not needed in
order to account for the emergence of local secondary motifs.
In fact, exact enumerations of short maximally compact homopolymers in lattice models show that a
significant amount of secondary structure (as properly defined in the lattice geometry) is formed simply
as a result of compactness [27]. This result was very recently confirmed by Monte Carlo simulations of longer
chains [28].
Nevertheless, the same property does not hold in off-lattice simulations [29,30]. In this case explicit
hydrogen bond modeling is a necessary ingredient to obtain ground state structures rich in helical content
[31,32]. This implies that the local order imposed by the lattice geometry is not an irrelevant detail if one is
interested in secondary structure formation.
Alternatively, helices and planar hairpins and sheets can be obtained as ground states of short chains in an
off-lattice polymer model without explicit hydrogen bonding, provided the correct symmetry pertaining to a
chain molecule is captured [33].
The usual polymer modeling by tethering hard spheres along the chain [34] does not comply with the basic notion
that a chain molecule is inherently anisotropic because of the special direction defined by the tangent to the
chain. The correct symmetry is captured by describing the chain as a self-avoiding tube of non-zero thickness.
It is possible to define an energy function based on the coordinates of the tube centerline by employing a
suitable three-body potential that enforces the thick tube constraint [35].
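In its simplest discrete form, the three-body construction of Ref. [35] can be sketched as follows; the details are an assumption, not the exact potential of the cited works. For every triple of centerline points, compute the radius of the circle through them; the discrete tube has a thickness equal to the smallest such radius, and conformations whose thickness falls below a prescribed value delta get infinite energy. The hard-wall form and the function names are hypothetical, and checking all triples costs O(N^3), so this is meant only for short chains.

```python
import itertools
import math


def circumradius(p, q, r):
    """Radius of the circle through three points in 3D (inf if they are collinear)."""
    u = [b - a for a, b in zip(p, q)]  # q - p
    v = [b - a for a, b in zip(p, r)]  # r - p
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    twice_area = math.hypot(*cross)  # |u x v| = 2 * triangle area
    if twice_area == 0.0:
        return math.inf
    # R = abc / (4 * area) = abc / (2 * |u x v|)
    return math.dist(p, q) * math.dist(q, r) * math.dist(p, r) / (2.0 * twice_area)


def tube_thickness(points):
    """Smallest circumradius over all triples of centerline points (O(N^3))."""
    return min(circumradius(*t) for t in itertools.combinations(points, 3))


def energy_with_tube_constraint(points, delta, other_energy=0.0):
    """Infinite energy if the discrete tube of thickness delta self-intersects
    or bends too sharply; otherwise return the remaining energy terms."""
    return other_energy if tube_thickness(points) >= delta else math.inf
```

The same circumradius penalizes both sharp local bends (three consecutive points) and close non-local approaches (points far apart along the chain), which is why a single three-body term is enough to encode the tube.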
The optimal local packing of a self-avoiding tube is achieved by a special helical shape, defined by a precise
ratio between the pitch and the radius, c = 2.512 [36]. This value is strikingly similar to the one found in
α-helices in globular proteins [36].
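The quoted ratio can be checked numerically under one assumption, borrowed from the ideal-rope characterization of optimal helices: at the optimal shape the tube is limited simultaneously by its local radius of curvature and by the contact between successive turns, so the local radius of curvature equals half the closest turn-to-turn distance. The sketch below solves that condition by bisection for a helix of unit radius; the search windows are hand-picked for ratios between 2.0 and 2.7 and are not part of the original analysis.

```python
import math


def local_radius_of_curvature(pitch, radius=1.0):
    """Radius of curvature of the helix (R cos t, R sin t, P t / (2 pi))."""
    return radius + pitch ** 2 / (4.0 * math.pi ** 2 * radius)


def half_turn_to_turn_distance(pitch, radius=1.0):
    """Half the distance of closest approach between successive turns."""
    b = pitch / (2.0 * math.pi)

    def dist_sq(s):
        # squared distance between the points at t = 0 and t = 2 pi + s
        return 2.0 * radius ** 2 * (1.0 - math.cos(s)) + (b * (2.0 * math.pi + s)) ** 2

    # Golden-section search for the local minimum next to one full turn;
    # the window was chosen by hand for pitch/radius ratios between 2.0 and 2.7.
    lo, hi = -1.8, -0.3
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(100):
        m1, m2 = hi - g * (hi - lo), lo + g * (hi - lo)
        if dist_sq(m1) < dist_sq(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * math.sqrt(dist_sq(0.5 * (lo + hi)))


def optimal_pitch_to_radius(lo=2.0, hi=2.7, tol=1e-10):
    """Bisect for the ratio at which both thickness constraints coincide (radius = 1)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if local_radius_of_curvature(mid) > half_turn_to_turn_distance(mid):
            lo = mid  # successive turns still too close: increase the pitch
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(f"pitch / radius = {optimal_pitch_to_radius():.3f}")
```

Run as a script, it should print a value close to 2.512. Below that ratio the turn-to-turn contact limits the tube thickness; above it, the local curvature does.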
If compactness is induced by an attractive interaction between pairs of points on the tube centerline, the ground
state conformations of short tubes depend on the ratio between the tube thickness and the range of the
attractive interaction. Between a maximally compact regime (for small thickness) and a swollen regime (for large
thickness), both characterized by a large degeneracy, an intermediate ‘‘marginally compact’’
regime opens up when the thickness is tuned to match the interaction range. There the degeneracy is greatly
reduced, and the ground state conformations at the edge of compactness are helices and planar hairpins and sheets [37,38].
This mechanism might be responsible for a substantial reduction of the conformational search to be
performed by a protein in the folding process. It does not depend on the amino-acid sequence and it is
consistent with the general tendency of a polypeptide chain to populate secondary structure conformers due to
local steric effects, as identified in the Ramachandran plot of residues in loops, i.e., in protein portions which
do not participate in secondary structures [39].
Upon changing the symmetry of the system from a chain of spheres to a thick tube, secondary motifs
emerge neatly in the absence of hydrogen bonding, although only locally or for very short tubes. Secondary
structures seem to be a robust although local property of the marginally compact phase of generic thick
polymers [33]. Interestingly, synthetic oligomers have been shown to fold into helices without the presence of
hydrogen bonds [40]. Polypeptide chains, however, need hydrogen bonds to form stable secondary structure
elements [41].
For longer tubes the marginally compact phase becomes an anisotropic globular phase displaying nematic
liquid-crystalline ordering, as detected by both analytic mean field computations and Monte Carlo
simulations, in which different tube portions tend to align with each other as a result of overall compaction
[33,42]. Similar anisotropic globular phases have been observed, induced by local stiffness [43–45], by interacting
dipoles attached to the chain [46,47], or by the anisotropy of the globule surface energy [48].
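Nematic ordering of this kind is commonly quantified by the largest eigenvalue S of the order tensor Q built from local chain directions: S is close to 1 for aligned segments and close to 0 for an isotropic arrangement. The sketch below uses the bond vectors of a discretized centerline as the local tube direction (an assumption; the cited works may define the order parameter differently) and computes the eigenvalue in closed form, so only the standard library is needed. Since Q is quadratic in the bond vector, t and -t contribute equally, as required for a head-tail symmetric nematic.

```python
import math


def largest_eigenvalue_sym3(a):
    """Largest eigenvalue of a symmetric 3x3 matrix (closed-form trigonometric solution)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    if p1 == 0.0:  # already diagonal
        return max(a[0][0], a[1][1], a[2][2])
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp rounding errors
    return q + 2.0 * p * math.cos(math.acos(r) / 3.0)


def nematic_order(points):
    """S = largest eigenvalue of Q = <(3 t t - I) / 2>, t = unit bond vectors."""
    tangents = []
    for p, q in zip(points, points[1:]):
        d = [y - x for x, y in zip(p, q)]
        norm = math.hypot(*d)
        tangents.append([c / norm for c in d])
    n = len(tangents)
    q_tensor = [[sum(1.5 * t[i] * t[j] for t in tangents) / n - (0.5 if i == j else 0.0)
                 for j in range(3)] for i in range(3)]
    return largest_eigenvalue_sym3(q_tensor)
```

For a compact globule one would typically evaluate this on sub-portions of the chain rather than globally, since differently oriented nematic domains can cancel in a single global average.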

5. Pre-sculpted energy landscape

The results discussed in the previous section suggest that a minimal coarse-grained model describing the
common sequence-independent properties of a polypeptide chain ought to incorporate at least the following
ingredients: tube symmetry, hydrogen bonds, an effective hydrophobic pairwise attraction, and a local
conformational bias towards protein-like conformers. The last two properties do depend on the amino-acid sequence,
and should therefore be introduced in the modeling of a homo-polypeptide chain as common to the whole chain.
Hydrogen bonds were modeled at the Cα level by using geometrical constraints derived from a statistical
analysis of protein native structures in the Protein Data Bank (PDB) [49]. Notably, the employed hydrogen
bond rules are not very different from those used in an independent approach [50]. Local conformational bias
was introduced in the simplest way as a bending rigidity penalizing tight bond angles (occurring in α-helices)
with respect to extended bond angles (typical of β-strands) [42].
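A bending rigidity of this kind can be sketched on a Cα trace as below. The virtual bond angle at residue i is the angle between the bonds to residues i-1 and i+1; it is close to 90° in α-helices and closer to 120° in β-strands. The penalty e_r for angles tighter than a threshold theta_c, and both numerical values, are hypothetical choices for illustration, not the parameters of Ref. [42].

```python
import math


def bond_angles(coords):
    """Virtual bond angle (degrees) at every interior C-alpha of the trace."""
    angles = []
    for a, b, c in zip(coords, coords[1:], coords[2:]):
        u = [x - y for x, y in zip(a, b)]  # bond b -> a
        v = [x - y for x, y in zip(c, b)]  # bond b -> c
        cos_t = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, cos_t)))))
    return angles


def bending_energy(coords, e_r=0.3, theta_c=105.0):
    """Add a penalty e_r for every bond angle tighter than theta_c.

    Both e_r and theta_c are hypothetical illustration values; raising e_r
    shifts the balance from helix-like towards strand-like local geometry."""
    return e_r * sum(1 for theta in bond_angles(coords) if theta < theta_c)
```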
Such a simplified model was indeed shown to exhibit remarkable protein-like features [42,49]. With the
tube thickness and the hydrogen bond energy scale fixed, a marginally compact phase opens up for intermediate
hydrophobicity values, characterized by a marked reduction in the degeneracy of ground state conformations.
The corresponding low energy conformations are nicely protein-like, since they display a high secondary
structure content. This was observed for chains of up to 48 residues, for which beautiful tertiary arrangements
reminiscent of some actual fold architectures and topologies were found [51]. The choice of the bending
rigidity can bias the low energy conformations of the model from being rich in α-helices to being rich in
β-structure, but an appropriate choice exists for which mixed α/β-structures are local minima in the energy
landscape. Note that cooperative hydrogen bonding and a different energy scale favoring local over non-local
hydrogen bond formation are crucial within this approach.
The model explicitly showed for the first time that steric effects (as captured by the tube symmetry),
hydrogen bond geometry (captured at the Cα level), hydrophobicity, and local conformational bias (modeled
in this case via bending rigidity) are sufficient to predict the existence of a limited set of ground state
structures, without the need to consider any specific amino-acid sequence.
This pre-determined menu of putative folds can be used as a fixed backdrop by evolution in the selection of
natural protein sequences. The role of the sequence is to choose one of the available folds, implying the notion
of negative design against competing folds [42]. Within the proposed pre-sculpted energy landscape, the
selection of ‘‘good folder’’ sequences looks much easier than in the original sequence-based funnel perspective.
Funnels corresponding to each of the putative folds are already pre-sculpted in the free energy landscape of a
homo-polypeptide chain. In other words, much of the entropy reduction needed to overcome the costly
conformational search is provided by just the physico-chemical properties common to all polypeptide chains in
the marginally compact phase. More freedom is thus left to nature in order to evolve protein sequences which at
the same time are good folders and fulfill other conditions needed to accomplish their biological function, such
as the existence of active catalytic and binding sites or the fruitful interaction with other biomolecules [42].
An obvious counterpart of the pre-sculpted energy landscape idea was realized as early as 1992 [52]: any
native structure achieving the minimum energy for a given polypeptide sequence corresponds to a local
minimum in the energy landscape of poly(L-alanine), with very close structural agreement at the level of the
backbone and Cβ atoms [52]. The role of chain stiffness and overall compactness in reproducing the bond vector
correlation function of protein native structures had also already been investigated [53].
Much along the same ideas, it was recently shown that randomly generated compact homo-polypeptide
conformations all have similar folds in the PDB, and conversely all compact single-domain folds in the PDB
have structural analogues among random compact conformations [54]. Even more strikingly, the presence
of active-site-like geometries also seems to be a consequence of the packing of compact secondary structure
elements [54]. However, a local bias towards the assigned secondary structure was employed in that study,
and the length and location of each secondary element were randomly selected based on statistics from the PDB
[54]. It remains to be seen whether such statistics can themselves be reproduced by means of reduced models
without introducing any ‘ad hoc’ bias from the PDB.
All these results strongly support the hypothesis that the protein energy landscape does not depend, in its
gross features, on the details of side chain packing, but is rather shaped by properties common to all proteins in
the marginally compact phase of a generic polypeptide chain [42].

6. Perturbing the landscape

The reduced coarse-grained model introduced in Ref. [49] is characterized by low-energy protein-like
conformations when poised in the marginally compact region of the phase diagram. It is extremely useful as a
framework where seemingly distinct aspects of protein behavior can be reconciled [55].
In the context of the design of natural protein sequences, it is possible to show that a crude scheme with
just two kinds of amino acids, which takes into account the hydrophobic (H) and polar (P) character of
the amino acids [56,57], is sufficient to carry out a successful design of sequences with a variety of target
structures [55,58].
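The classic lattice version of the two-letter scheme [56] gives a compact way to see what "design" means: enumerate every conformation of a short chain on the square lattice, score each by its number of non-bonded H-H contacts, and call a sequence designable if its ground state is unique up to lattice symmetry. This sketch is an illustration only; the studies cited above use the off-lattice tube model rather than this lattice, and the function names and the uniqueness criterion are assumptions.

```python
import itertools

# Nearest-neighbour moves on the square lattice.
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def conformations(n):
    """Self-avoiding walks of n >= 2 monomers, one per lattice-symmetry class:
    the first bond points along +x and the first turn goes to +y."""
    walks = []
    path = [(0, 0), (1, 0)]
    occupied = set(path)

    def grow(turned):
        if len(path) == n:
            walks.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            if not turned and dy == -1:
                continue  # removes mirror images
            site = (x + dx, y + dy)
            if site not in occupied:
                occupied.add(site)
                path.append(site)
                grow(turned or dy != 0)
                path.pop()
                occupied.remove(site)

    grow(False)
    return walks


def hp_energy(seq, walk):
    """-1 for every non-bonded nearest-neighbour H-H contact."""
    index = {site: i for i, site in enumerate(walk)}
    energy = 0
    for i, (x, y) in enumerate(walk):
        if seq[i] != "H":
            continue
        for dx, dy in STEPS:
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and seq[j] == "H":
                energy -= 1
    return energy


def ground_states(seq):
    walks = conformations(len(seq))
    energies = [hp_energy(seq, w) for w in walks]
    e_min = min(energies)
    return e_min, [w for w, e in zip(walks, energies) if e == e_min]


def is_designable(seq):
    """True if the sequence has a unique ground-state conformation."""
    return len(ground_states(seq)[1]) == 1


if __name__ == "__main__":
    seqs = ["".join(s) for s in itertools.product("HP", repeat=8)]
    good = [s for s in seqs if is_designable(s)]
    print(len(good), "of", len(seqs), "sequences of length 8 have a unique ground state")
```

Removing rotated and mirrored copies matters here: without it every non-straight ground state would appear eight times, and no sequence could pass the uniqueness test.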
Intrinsically unstructured proteins [59] are biological molecules which under physiological conditions do not
exhibit extensive structural order in solution, but adopt relatively rigid conformations in the presence of
natural ligands, thus undergoing a loss of conformational entropy upon binding [60]. This process was
successfully modeled in its basic features by considering the interaction of a homo-polypeptide chain with
suitable target geometries which mimic molecular recognition mechanisms in the crudest way. The way the
pre-sculpted energy landscape of an isolated homo-polypeptide chain is affected by the presence of the target
geometry dictates the structure of the resulting conformation upon binding [55].
In the context of protein aggregation, the same ingredients that determine the predetermined set of
globular protein native folds drive, in the case of multiple chains, their aggregation into extensive β-structures,
either sandwiches of β-sheets or β-helices, irrespective of their conformation when isolated [42,55]. Such
structures might constitute the building blocks of cross-β amyloid fibrils, protein aggregates involved in
several neurodegenerative human diseases, in agreement with the experimental observation that such
structures can be formed ‘in vitro’ by many different proteins [61,62].
Repeat proteins contain domains composed of repeating homologous structural units (repeats),
tightly stacked together to form a hydrophobic core common to the different repeats. A hydrophobic-polar
sequence composed of regularly repeated patterns yields as a ground state a β-helical structure remarkably
similar to a known architecture in the PDB [63].
All the examples briefly discussed in this section show how the energy landscape of an isolated
homo-polypeptide chain in the marginally compact phase can be affected by the introduction of a sequence,
by the presence of multiple chains, by the presence of a binding substrate, or by a combination of these.
Future developments should include a more detailed treatment of local sequence-dependent structural
bias [64,65].

Acknowledgments

We are grateful to J.R. Banavar, T.X. Hoang, and A. Maritan for several insightful discussions. FS thanks
H. Orland for bringing Ref. [52] to his attention.

References

[1] T.E. Creighton, Proteins: Structures and Molecular Properties, W.H. Freeman, New York, 1993.
[2] C. Levinthal, J. Chim. Phys. 65 (1968) 44.
[3] A.L. Horwich, Proc. Natl. Acad. Sci. USA 96 (1999) 11033.
[4] C.B. Anfinsen, Science 181 (1973) 223.
[5] W. Kauzmann, Adv. Protein Chem. 14 (1959) 1.
[6] P.G. de Gennes, Scaling Concepts in Polymer Physics, Cornell University Press, Ithaca, New York, 1979.
[7] R. Du, A.Y. Grosberg, T. Tanaka, M. Rubinstein, Phys. Rev. Lett. 84 (2000) 2417.
[8] V.G. Rostiashvili, G. Migliorini, T.A. Vilgis, Phys. Rev. E 64 (2001) 051112.
[9] J. Bryngelson, P.G. Wolynes, Proc. Natl. Acad. Sci. USA 84 (1987) 7524.
[10] V.S. Pande, A.Yu. Grosberg, T. Tanaka, Rev. Mod. Phys. 72 (2000) 259.
[11] H. Abe, N. Go, Biopolymers 20 (1981) 1013.
[12] R. Bonneau, D. Baker, Annu. Rev. Biophys. Biomol. Struct. 30 (2001) 173.
[13] G. Chikenji, Y. Fujitsuka, S. Takada, Proc. Natl. Acad. Sci. USA 103 (2006) 3141.
[14] S.Y. Kim, W. Lee, J. Lee, J. Chem. Phys. 125 (2006) 194908.
[15] G.D. Rose, P.J. Fleming, J.R. Banavar, A. Maritan, Proc. Natl. Acad. Sci. USA 103 (2006) 16623.

[16] M. Levitt, C. Chothia, Nature 261 (1976) 552.


[17] L. Pauling, R.B. Corey, H.R. Branson, Proc. Natl. Acad. Sci. USA 37 (1951) 205.
[18] L. Pauling, R.B. Corey, Proc. Natl. Acad. Sci. USA 37 (1951) 729.
[19] G.N. Ramachandran, V. Sasisekharan, Adv. Protein Chem. 23 (1968) 283.
[20] C.A. Orengo, J.M. Thornton, Annu. Rev. Biochem. 74 (2005) 867.
[21] C. Chothia, Nature 357 (1992) 543.
[22] P.L. Privalov, Adv. Protein Chem. 35 (1982) 1.
[23] C.P. Ponting, R.R. Russell, Annu. Rev. Biophys. Biomol. Struct. 31 (2002) 45.
[24] C. Chothia, J. Gough, C. Vogel, S.A. Teichmann, Science 300 (2003) 1701.
[25] B.W. Matthews, Annu. Rev. Biochem. 62 (1993) 139.
[26] L. Holm, C. Sander, Proteins 28 (1997) 78.
[27] H.S. Chan, K.A. Dill, Proc. Natl. Acad. Sci. USA 87 (1990) 6388.
[28] R. Oberdorf, A. Ferguson, J.L. Jacobsen, J. Kondev, Phys. Rev. E 74 (2006) 051801.
[29] N.D. Socci, W.S. Bialek, J.N. Onuchic, Phys. Rev. E 49 (1994) 3440.
[30] L.M. Gregoret, F.E. Cohen, J. Mol. Biol. 219 (1991) 109.
[31] J.P. Kemp, Z.Y. Chen, Phys. Rev. Lett. 81 (1998) 3880.
[32] A. Trovato, J. Ferkinghoff-Borg, M.H. Jensen, Phys. Rev. E 67 (2003) 021805.
[33] D. Marenduzzo, A. Flammini, A. Trovato, J.R. Banavar, A. Maritan, J. Polym. Sci. Pol. Phys. 43 (2005) 650.
[34] M. Doi, S.F. Edwards, The Theory of Polymer Dynamics, Clarendon Press, New York, 1993.
[35] O. Gonzalez, J.H. Maddocks, Proc. Natl. Acad. Sci. USA 96 (1999) 4769.
[36] A. Maritan, C. Micheletti, A. Trovato, J.R. Banavar, Nature 406 (2000) 287.
[37] J.R. Banavar, A. Maritan, C. Micheletti, A. Trovato, Proteins 47 (2002) 315.
[38] J.R. Banavar, A. Flammini, D. Marenduzzo, A. Maritan, A. Trovato, J. Polym. Sci. Pol. Phys. 43 (2005) 650.
[39] N.C. Fitzkee, P.J. Fleming, G.D. Rose, Proteins 58 (2005) 852.
[40] D.J. Hill, M.J. Mio, R.B. Prince, T.S. Hughes, J.S. Moore, Chem. Rev. 101 (2001) 3893.
[41] J.R. Banavar, et al., Phys. Rev. E 73 (2006) 031921.
[42] J.R. Banavar, T.X. Hoang, A. Maritan, F. Seno, A. Trovato, Phys. Rev. E 70 (2004) 041905.
[43] S. Doniach, T. Garel, H. Orland, J. Chem. Phys. 105 (1996) 1601.
[44] U. Bastolla, P. Grassberger, J. Stat. Phys. 89 (1997) 1061.
[45] S. Lise, A. Maritan, A. Pelizzola, Phys. Rev. E 58 (1998) R5241.
[46] E. Pitard, T. Garel, H. Orland, J. Phys. I 7 (1997) 1201.
[47] J. Borg, M.H. Jensen, K. Sneppen, G. Tiana, Phys. Rev. Lett. 86 (2001) 1031.
[48] C. Novak, V.G. Rostiashvili, T.A. Vilgis, Europhys. Lett. 74 (2006) 76.
[49] T.X. Hoang, A. Trovato, F. Seno, J.R. Banavar, A. Maritan, Proc. Natl. Acad. Sci. USA 101 (2004) 7960.
[50] A. Kolinski, Acta Biochim. Pol. 51 (2004) 349.
[51] T.X. Hoang, A. Trovato, F. Seno, J.R. Banavar, A. Maritan, J. Phys.: Condens. Matter 18 (2006) S297.
[52] T. Head-Gordon, F.H. Stillinger, M.H. Wright, D.M. Gay, Proc. Natl. Acad. Sci. USA 89 (1992) 11513.
[53] M.H. Hao, S. Rackovsky, A. Liwo, R.M. Pincus, H.A. Scheraga, Proc. Natl. Acad. Sci. USA 89 (1992) 6614.
[54] Y. Zhang, I.A. Hubner, A.K. Arakaki, E. Shakhnovich, J. Skolnick, Proc. Natl. Acad. Sci. USA 103 (2006) 2605.
[55] T.X. Hoang, L. Marsella, A. Trovato, F. Seno, J.R. Banavar, A. Maritan, Proc. Natl. Acad. Sci. USA 103 (2006) 6883.
[56] K.F. Lau, K.A. Dill, Macromolecules 22 (1989) 3986.
[57] S. Kamtekar, J.M. Schiffer, H.J. Xiong, J.M. Babik, M.H. Hecht, Science 262 (1993) 1680.
[58] T.X. Hoang, A. Trovato, F. Seno, J.R. Banavar, A. Maritan, Biophys. Chem. 115 (2005) 289.
[59] P.E. Wright, H.J. Dyson, J. Mol. Biol. 293 (1999) 321.
[60] H.J. Dyson, P.E. Wright, Curr. Opin. Struct. Biol. 12 (2002) 54.
[61] F. Chiti, C.M. Dobson, Annu. Rev. Biochem. 75 (2006) 333.
[62] A. Trovato, F. Chiti, A. Maritan, F. Seno, PLoS Comput. Biol. 2 (2006) 1608.
[63] A. Trovato, T.X. Hoang, J.R. Banavar, A. Maritan, F. Seno, J. Phys.: Condens. Matter 17 (2005) S1515.
[64] S.C.E. Tosatto, J. Comput. Biol. 12 (2005) 1316.
[65] T. Hamelryck, J.T. Kent, A. Krogh, PLoS Comput. Biol. 2 (2006) 1121.
