Evolutionary aspects of protein structure and folding
Edward N Trifonov and Igor N Berezovsky
The traditional reconstruction of molecular events of the past based on sequence conservation becomes very vague beyond one to two billion years ago. There are certain molecular features, however, such as polymer flexibility and loop closure, that are conserved merely because of their physical nature. This allows one to penetrate the earliest stages of protein evolution.
Genome Diversity Center, Institute of Evolution, University of Haifa, Haifa 31905, Israel; e-mail: email@example.com
Department of Structural Biology, The Weizmann Institute of Science, POB 26, Rehovot 76100, Israel; e-mail: Igor.Berezovsky@weizmann.ac.il
Correspondence: Edward N Trifonov
Current Opinion in Structural Biology :110–114
This review comes from a themed issue on Folding and binding
Edited by Jane Clarke and Gideon Schreiber
0959-440X/03/$ – see front matter
© 2003 Elsevier Science Ltd. All rights reserved.
There are many important aspects of protein evolution and folding [1–4,5], each of them deserving a thorough review [11,12,15,16]. In this paper, we focus on the earliest stages of protein evolution [17] and their impact on the structure and folding of contemporary proteins. Biological systems are believed to have evolved from simple to complex, from small to large, guided by a multitude of laws of Nature. As both nucleic acids and proteins are polymers, they obey the laws of polymer physics. This generally neglected and recently revived association has challenged the very basics of our understanding of protein structure and evolution.
From the perspective of polymer statistics, every polymer chain may occasionally return to itself. That is, some points of the free chain trajectory may come within a short reach of one another, forming a closed loop. The closed loops have a typical size characteristic of a given type of polymer. The more flexible the chain, the smaller the loops. Accordingly, the polypeptide chains of globular proteins contain numerous closed loops, with a contour length of 10–50 residues, depending on whether the loops are structured or unstructured. The majority of the loops comprise 25–35 amino acid residues. The same laws of chain statistics apply to DNA molecules; the optimal DNA loop (ring) has a contour length of 300–600 base pairs. DNA molecules are substantially more rigid than polypeptides, which explains the larger contour length of the DNA loops (rings). These rings would encode a protein comprising 100–200 amino acid residues, which is typical of modern protein folds.
In this review, we focus on the implications of the above polymer-statistical considerations for protein structure, evolution and folding.
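The correspondence noted above between DNA ring size (300–600 base pairs) and protein size (100–200 residues) is simple codon arithmetic: three base pairs encode one residue. A minimal Python sketch, assuming a fully coding ring with no stop codon or noncoding positions; the function name `ring_to_residues` is hypothetical and used for illustration only:

```python
def ring_to_residues(base_pairs: int) -> int:
    """Number of amino acid residues encoded by a DNA ring of the given size.

    Assumption: every base pair is coding, at 3 bp per codon.
    """
    return base_pairs // 3


# Limits of the optimal DNA ring size quoted in the text.
for bp in (300, 600):
    print(f"{bp} bp -> {ring_to_residues(bp)} residues")
```

With the two limits of the optimal ring size, this gives 100 and 200 residues, the range quoted for modern protein folds.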
Length increments in protein evolution
An evolving protein chain may grow by increments of one, several or many residues inserted into the chain or added to its ends. There are many molecular mechanisms by which these changes can be brought about. We believe that proteins (and their respective genes) have passed through several evolutionary stages, each with its own characteristic increment. The existence or nonexistence of such characteristic size increments had, unfortunately, never been a focus of studies on protein evolution, which consequently was not considered to be a process with distinct steps or stages. However, one candidate unit size increment was detected as early as 1929 by Svedberg in his ultracentrifugation experiments. He wrote: "The proteins . . . can, with regard to molecular weight, be divided into four subgroups . . . . The molecular masses characteristic of the three higher sub-groups are - as a first approximation - derived from molecular mass of the first sub-group by multiplying by the integers two, three, . . .". The first estimate of this size increment, also by Svedberg, was about 160 amino acid residues. This is within the range of recent estimates of protein domain sizes [19–21], 100–200 residues, irrespective of the type of fold (domain). Apparently, the observation of Svedberg reflects one of the latest stages of protein evolution (see below): the formation of multidomain protein structures. Another distinct size increment range, 25–35 residues, the contour length of the closed loops in proteins, as mentioned above, was first detected only very recently [22,23–26]. The structural significance and evolutionary implications of these two scale units of protein size are discussed below.
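Svedberg's observation amounts to a ladder of protein sizes that are integer multiples of one unit. The following Python sketch tabulates that ladder for the 160-residue estimate quoted above. The conversion to mass uses an average residue mass of about 110 Da, which is a common rule of thumb and our assumption, not a value taken from the text; the names `UNIT_RESIDUES`, `AVG_RESIDUE_MASS_DA` and `size_ladder` are likewise illustrative.

```python
UNIT_RESIDUES = 160          # Svedberg's first estimate, as quoted in the text
AVG_RESIDUE_MASS_DA = 110.0  # assumed rule-of-thumb average residue mass


def size_ladder(n_units: int) -> list[tuple[int, int, float]]:
    """Return (multiple, residues, approximate mass in kDa) for 1..n_units."""
    ladder = []
    for k in range(1, n_units + 1):
        residues = k * UNIT_RESIDUES
        mass_kda = residues * AVG_RESIDUE_MASS_DA / 1000.0
        ladder.append((k, residues, mass_kda))
    return ladder


# Four sub-groups, following the quotation from Svedberg.
for k, residues, mass_kda in size_ladder(4):
    print(f"{k} x unit: {residues} residues, ~{mass_kda:.1f} kDa")
```

Under these assumptions, a single unit corresponds to roughly 17.6 kDa, and the higher sub-groups to two, three and four times that value.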
Flexibility of polymer chains and loop closure
A freely suspended fluctuating polymer chain may adopt numerous conformations in space. Its path may change direction at any point, so that, after some number of monomer steps, the flexible chain loses its original orientation. The flexibility is measured by the so-called persistence length, such that the average direction cosine drops by a factor of e after passing this length. For example, according to experimental estimates by Flory, for mixed unstructured polypeptide chains