Evolutionary aspects of protein structure and folding
Edward N Trifonov* and Igor N Berezovsky†
The traditional reconstruction of molecular events of the past based on sequence conservation becomes very vague beyond one to two billion years ago. There are certain molecular features, however, such as polymer flexibility and loop closure, that are conserved merely because of their physical nature. This allows one to penetrate the earliest stages of protein evolution.
Addresses
*Genome Diversity Center, Institute of Evolution, University of Haifa, Haifa 31905, Israel; e-mail: trifonov@research.haifa.ac.il
†Department of Structural Biology, The Weizmann Institute of Science, POB 26, Rehovot 76100, Israel; e-mail: Igor.Berezovsky@weizmann.ac.il
Correspondence: Edward N Trifonov
Current Opinion in Structural Biology 2003, 13:110–114
This review comes from a themed issue on Folding and binding
Edited by Jane Clarke and Gideon Schreiber
0959-440X/03/$ – see front matter
© 2003 Elsevier Science Ltd. All rights reserved.
DOI 10.1016/S0959-440X(03)00005-8
Introduction
There are many important aspects of protein evolution and folding [1–4,5], each of them deserving a thorough review [11,12,15,16]. In this paper, we focus on the earliest stages of protein evolution [17] and their impact on the structure and folding of contemporary proteins. Biological systems are believed to have evolved en route from simple to complex, from small to large, guided by a multitude of laws of Nature. As both nucleic acids and proteins are polymers, they obey the laws of polymer physics. This generally neglected and recently revived association has challenged the very basics of our understanding of protein structure and evolution.

From the perspective of polymer statistics, every polymer chain may occasionally return to itself. That is, some points of the free chain trajectory may come within a short reach of one another, forming a closed loop. The closed loops have a typical size characteristic of a given type of polymer. The more flexible the chain, the smaller the loops. Accordingly, the polypeptide chains of globular proteins contain numerous closed loops, with a contour length of 10–50 residues, depending on whether the loops are structured or unstructured. The majority of the loops comprise 25–35 amino acid residues. The same laws of chain statistics apply to DNA molecules; the optimal DNA loop (ring) has a contour length of 300–600 base pairs. The DNA molecules are substantially more rigid than polypeptides, which explains the larger contour length of the DNA loops (rings). These rings would encode a protein comprising 100–200 amino acid residues, which is typical of modern protein folds.

In this review, we focus on the implications of the above polymer-statistical considerations for protein structure, evolution and folding.
Length increments in protein evolution
An evolving protein chain may grow by increments of one, several or many residues inserted into the chain or added to its ends [16]. There are many molecular mechanisms by which these changes can be brought about. We believe that proteins (and their respective genes) have passed through several evolutionary stages, each with its own characteristic increment. The existence or nonexistence of such characteristic size increments had, unfortunately, never been a focus of studies on protein evolution, which consequently was not considered to be a process with distinct steps or stages. However, one candidate unit size increment was detected as early as 1929 by Svedberg in his ultracentrifugation experiments. He wrote: "The proteins ... can, with regard to molecular weight, be divided into four subgroups .... The molecular masses characteristic of the three higher sub-groups are - as a first approximation - derived from molecular mass of the first sub-group by multiplying by the integers two, three, ..." [18]. The first estimate of this size increment, also by Svedberg, was about 160 amino acid residues. This is within range of recent estimates of protein domain sizes [19–21], 100–200 residues, irrespective of the type of fold (domain). Apparently, the observation of Svedberg reflects one of the latest stages of protein evolution (see below): the formation of multidomain protein structures. Another distinct size increment range, 25–35 residues, the contour length of the closed loops in proteins, as mentioned above, was first detected only very recently [22,23–26]. The structural significance and evolutionary implications of these two scale units of protein size are discussed below.
Flexibility of polymer chains and loop closure
A freely suspended fluctuating polymer chain may adopt numerous conformations in space. Its path may change direction at any point, so that, after some number of monomer steps, the flexible chain loses its original orientation. The flexibility is measured by the so-called persistence length (a), such that the average direction cosine drops by a factor of e after passing this length [27]. For example, according to experimental estimates by Flory [28], for mixed unstructured polypeptide chains a ≈ 4–5 monomer units (amino acid residues), whereas for more rigid molecules of double-helical DNA a ≈ 150 monomer units (base pairs) [29]. A related measure of the polymer flexibility is the contour length of the closed loops occasionally formed by the free polymer chain. According to theoretical estimates [30], the optimal loop size is proportional to the persistence length and is about 3.5a. The advantage of this particular measure of polymer flexibility is that it links flexibility with a distinct and observable structural element of the polymer (a closed loop of certain frequent size), whereas the persistence length has no direct structural connotation. The 3.5a loops in mixed unstructured polypeptide chains have a contour length of 10–25 amino acid residues, which, after correction for the presence of rigid α helices, increases to 20–50 residues [22]. An estimate of the minimal contour length for polypeptide chain closed loops (13 amino acid residues) was made by de Gennes [31] in calculations motivated by Monod, who expressed interest in those loops that form the active centers of proteins. The optimal closed loop contour length for DNA is about 525 base pairs (3.5 × 150). Experimentally, this value was obtained by DNA circularization measurements in 1981 [32], whereas the 3.5a loops in proteins have been discovered only very recently [22]. Remarkably, both DNA and proteins, as different as they are, possess the property of loop closure, with inevitable and rather straightforward structural and functional consequences.
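The figures quoted in this section rest on two relations: the average direction cosine falls by a factor of e over one persistence length, and the optimal loop is about 3.5 persistence lengths long. The following is a minimal Python sketch of both. It assumes the standard exponential decay of the direction cosine along the chain, which the definition implies but the review does not write out, and the function names are illustrative only.

```python
import math

LOOP_FACTOR = 3.5  # optimal loop size in units of persistence length, as quoted from [30]


def mean_direction_cosine(contour_length, persistence_length):
    """Average cosine of the angle between chain directions separated by
    contour_length, assuming exponential decay (an assumption of this sketch)."""
    return math.exp(-contour_length / persistence_length)


def optimal_loop_size(persistence_length):
    """Optimal closed-loop contour length, in the same monomer units as the input."""
    return LOOP_FACTOR * persistence_length


# Persistence lengths quoted in the text: 4-5 residues and 150 base pairs.
for label, a in [("polypeptide (a = 4)", 4.0),
                 ("polypeptide (a = 5)", 5.0),
                 ("DNA (a = 150)", 150.0)]:
    print(label, "optimal loop:", optimal_loop_size(a),
          "cosine after one persistence length:", mean_direction_cosine(a, a))
```

For DNA the sketch reproduces the 525 base pair figure given above; for polypeptides the two point estimates fall inside the 10–25 residue range quoted for unstructured chains.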
Closed loops in globular proteins
Analysis of the closed loops in crystallized protein structures reveals that the contour length of 20–50 residues is dominant, with the majority at 25–30 residues [22]. Independent studies by Shakhnovich and co-workers [7], and Lamarine et al. [33] confirm this specific size range. It also turns out that these loops are formed one after another all the way along the protein chain and appear as linear nonoverlapping combinations [22]. In Figure 1, the trefoil fold (PDB code 1i1b) is displayed in a disassembled form, such that all the closed loops of which it is formed are shown separately. All other major folds can be represented in this way [22,23]. Could this striking arrangement be a reflection of possible selection pressures to maintain the optimal loop fold structure? Evidence of this has been revealed by protein sequence analysis [24,25]. Hydrophobic amino acids in prokaryotic protein sequences show a distinct tendency to follow one another at the same separation of about 25–30 residues. This evidence in favor of specific selection suggests that globular proteins are linear assemblies of descendants of hypothetical ancestral loop elements.

In the early evolution of proteins, when polypeptide chains became sufficiently long, the closed loops must have been formed with the immediate advantages of conformational certainty and stability. Such stable individual loop-like molecules of 20–35 residues do occur naturally [34] and can be readily designed [34,35]. The stability of the early loops would have been enforced by selection for tighter end-to-end contacts, primarily van der Waals interactions [36]. This is reflected in the higher frequency of bulky [37] and hydrophobic residues [24,33] at the loop ends. Electrostatic contacts would be frequently disrupted by competing interactions with water molecules [38]. The closed loop and the end-to-end van der Waals contacts make a loop-n-lock element [36]. Figure 2 illustrates the loop-n-lock structure of the TIM-barrel fold.

Guided by the above evolutionary scenario, one may attempt to find traces of the sequence segments corresponding to the hypothetical ancestral loop-like elements. Indeed, several such sequence prototypes of 25–35 residues have recently been derived from complete bacterial proteomes [26]. Each of them represents a large group of related sequence fragments, which may belong to rather different protein families. Moreover, most such fragments detected in the sequences of the protein crystal database (PDB) appear as closed loops. The biological and evolutionary relevance of the units of 25–30 amino acid residues finds some support in independent studies. This size range is indicated in the log-log plot of the length distribution of insertions and deletions in proteins [39]. Analysis of the positions of hot spots for the recombinatorial swapping of protein sequence segments suggests a size of 20–30 amino acid residues for 'schemas', apparent protein building blocks [40]. Ancestral exons [41], which correlate with centripetal modules [42], have also been estimated to be 15–30 residues long [41,42]. The geometrical technique of Voronoi tessellation [43] reveals that, in three dimensions, the number of neighbors as a function of sequence distance between them reaches a maximum at about 27 amino acid residues. Remarkably, the very different approaches described above point to the same unit size.

The discovery of universal closed loops in proteins and subsequent developments rapidly unfold like flowering buds, which is also an appealing image for globular proteins themselves (see Figures 1 and 2), as seen from this novel viewpoint [44].

Figure 1
Representation of the trefoil fold (PDB code 1i1b) in a disassembled form, such that all the closed loops of which it is formed are shown separately. Most of the polypeptide chain is involved in the formation of closed loops of between 15 and 37 residues, which are connected by short linkers of 2–12 residues (gray). The coordinates (residue numbers) of the closed loops are: 16–30, red; 42–62, green; 70–99, blue; 101–115, orange; and 120–136, magenta. The loop ends were determined on the basis of close Cα–Cα contacts, as in [22]. Reproduced with permission from [44].
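The sequence evidence cited in this section, hydrophobic residues recurring at a separation of about 25–30 residues [24,25], can be pictured as a peak in a positional autocorrelation profile. The Python sketch below is a toy illustration under stated assumptions, not the analysis used in the cited studies: the hydrophobic alphabet, the normalization by the expected pair count, and the function name are all choices made here for illustration.

```python
# Assumed hydrophobic alphabet; the review does not specify one.
HYDROPHOBIC = set("AVLIMFWY")


def hydrophobic_autocorrelation(sequence, max_separation=60):
    """Map each separation k to the ratio of observed to expected pairs of
    hydrophobic residues at positions (i, i + k) in one sequence."""
    flags = [1 if aa in HYDROPHOBIC else 0 for aa in sequence.upper()]
    n = len(flags)
    density = sum(flags) / n if n else 0.0
    profile = {}
    for k in range(1, max_separation + 1):
        pairs = n - k
        if pairs <= 0 or density == 0.0:
            break
        observed = sum(flags[i] * flags[i + k] for i in range(pairs))
        expected = pairs * density * density  # random placement at the same density
        profile[k] = observed / expected
    return profile


# Toy sequence with one hydrophobic residue every 25 positions.
toy = ("L" + "G" * 24) * 8
profile = hydrophobic_autocorrelation(toy, max_separation=40)
peak = max(profile, key=profile.get)
print("strongest separation in the toy sequence:", peak)
```

On real proteomes the signal would be a modest enrichment spread over a range of separations rather than the single sharp peak of this artificial sequence.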
Folds
In the chain return scenario for DNA, loops of optimal size are formed by covalent closure. The advantages of circular DNA molecules compared to linear ones are stability, protection of the gene ends, and continuity of replication and transcription processes. DNA circularization [32] revealed the optimal ring closure size as a broad maximum in the circularization efficiency at 300–600 base pairs. The same maximum is also seen in recombination experiments with bacterial insertion sequences [45]. This size of the presumed early genes at the ring closure stage of evolution also scales the respective proteins to 100–200 amino acid residues. Remarkably, size-wise, this perfectly fits protein folds (domains) [19–21].

Another reason for protein domains being this size is the possible consequence of the optimal surface/volume ratio of hydrophilic and hydrophobic residues in globular proteins [46]. In landmark work [46], the most basic principles of globular protein structure were formulated for the first time: the "minimum condition" for the stable globular protein was introduced, namely that the hydrophobic nucleus should be covered by the hydrophilic envelope; van der Waals interactions are the major forces for globular protein formation; and the "sharply limited size" of the stable globular protein is predicted and estimated (about 130 residues) on the basis of the hydrophobic/hydrophilic balance.

Irrespective of the reasons for the domain size optimum, it, apparently, could not increase further. Thus, larger proteins can only be formed by connecting domains of optimal size in one polypeptide chain, originally via recombinatorial fusion of the DNA rings [47] and later by excision/reinsertion events [16].
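The correspondence between ring size and domain size in this section is plain codon arithmetic. A minimal Python sketch follows, assuming (hypothetically) that the whole ring is coding and ignoring stop codons and any noncoding stretches; the function name is illustrative.

```python
BASES_PER_CODON = 3  # triplet genetic code


def encoded_residues(ring_size_bp):
    """Upper bound on the number of residues a DNA ring could encode,
    assuming every base pair belongs to a single reading frame."""
    return ring_size_bp // BASES_PER_CODON


# Ring sizes quoted in the text: the 300-600 bp optimum and the 525 bp estimate.
for bp in (300, 525, 600):
    print(bp, "bp ->", encoded_residues(bp), "residues")
```

Under these assumptions the 300–600 base pair optimum maps onto the 100–200 residue domain range stated above.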
Folding
There are several possible mechanisms of protein folding currently under theoretical and experimental assessment, notably framework and nucleation-condensation mechanisms, and possibly a unified scenario. "Whatever the distinctions of names, stable tertiary and secondary interactions must form concurrently" [48]. What is the role of the loop-like elements in the folding process? A possible connection could be their role as elementary folding units. The notion of independently folding elementary units is an important concept for the protein folding problem [49–52]. They may or may not remain intact once formed, and their fate during protein folding may be rather diverse [1,2,12,48]. With their distinct features, namely the size range and chain return property [22,44], the closed loops may well serve in that capacity [24,53], offering an elementary act of folding. The loop fold nature of the protein as a concept is only entering the field of protein folding. The implications of the concept for protein structure and folding, as well as the role the closed loops may play in folding, remain to be elucidated experimentally.
Conclusions
It would be only fair to give physics its decisive role in establishing basic structural elements of proteins: closed loops and folds. The process of protein evolution, in its
Figure 2
Loop-n-lock structure of the TIM-barrel fold (PDB code 4tim). The polypeptide chain comprises eight closed loops (loop-n-lock elements). The locks closing the loops form a donut-like core of the protein (yellow space-filled residues). The coordinates (residue numbers) of the closed loops (counterclockwise, starting from right middle) are: 9–40, red; 41–62, green; 62–90, orange; 95–126, blue; 128–166, red; 165–209, cyan; 209–231, magenta; and 230–249, orange. The loop ends were determined on the basis of close Cα–Cα contacts [22].
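The figure captions state that loop ends were determined from close Cα–Cα contacts. As a rough illustration of that idea, the Python sketch below scans a list of Cα coordinates for residue pairs that are close in space but separated along the chain, then keeps a non-overlapping set, tightest contacts first. It is a sketch under loud assumptions, not the procedure of [22]: the 10 Å contact cutoff, the 15–50 residue length window, and the greedy selection rule are all hypothetical choices made here.

```python
import math


def find_closed_loops(ca_coords, max_contact=10.0, min_len=15, max_len=50):
    """Return sorted, non-overlapping (start, end) index pairs whose C-alpha
    atoms lie within max_contact of each other. All thresholds are assumed."""
    n = len(ca_coords)
    candidates = []
    for i in range(n):
        # Loop contour length is j - i + 1, restricted to [min_len, max_len].
        for j in range(i + min_len - 1, min(n, i + max_len)):
            d = math.dist(ca_coords[i], ca_coords[j])
            if d <= max_contact:
                candidates.append((d, i, j))
    candidates.sort()  # tightest end-to-end contacts are considered first
    chosen = []
    for _, i, j in candidates:
        if all(j < start or i > end for start, end in chosen):
            chosen.append((i, j))
    return sorted(chosen)


def toy_ring(n_residues=30, step=3.8):
    """Toy chain: residues placed evenly on a circle so that its ends meet."""
    radius = n_residues * step / (2 * math.pi)
    return [(radius * math.cos(2 * math.pi * k / n_residues),
             radius * math.sin(2 * math.pi * k / n_residues),
             0.0) for k in range(n_residues)]


print(find_closed_loops(toy_ring()))
```

For a real structure the coordinate list would come from the Cα atoms of a PDB entry, and the thresholds would need to be tuned against the published loop boundaries.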