J. Mol. Biol. (2002) 315, 479-484
Why Are Proteins So Robust To Site Mutations?
Darin M. Taverna1 and Richard A. Goldstein2*
Biophysics Research Division and Department of Chemistry, University of Michigan, Ann Arbor, MI 48109-1055, USA
There have been repeated observations that proteins are surprisingly robust to site mutations, enduring significant numbers of substitutions with little change in structure, stability, or function. These results are almost paradoxical in light of what is known about random heteropolymers and the sensitivity of their properties to seemingly trivial mutations. To address this discrepancy, the preservation of biological protein properties in the presence of mutation has been interpreted as indicating that such properties are not under selective pressure. Such results also lead to the prediction that de novo protein design should be relatively easy, in contrast to what is observed. Here, we use a computational model with lattice proteins to demonstrate how this robustness can result from population dynamics during the evolutionary process. As a result, sequence plasticity may be a characteristic of evolutionarily derived proteins and not necessarily a property of designed proteins. This suggests that the robustness must be re-interpreted in evolutionary terms, and has consequences for our understanding of both in vivo and in vitro protein evolution.
© 2002 Academic Press
Keywords: site substitution; mutagenesis; molecular evolution; protein stability; protein folding
Present address: D. Taverna, Protein Pathways, Inc., 1145 Gayley Ave., Suite 304, Los Angeles, CA 90024, USA. E-mail address of the corresponding author: email@example.com

There has been much interest in probing the relationship between a protein's sequence and its resulting structural, thermodynamic, and functional properties. It is hoped that insights resulting from these pursuits will lead to the ability to predict protein properties based on sequence information, as well as how these properties could be altered by changes in the sequence. Such insights are also crucial in developing the ability to design proteins with prescribed or altered structures, stabilities, and functionalities.

One of the major methods of investigating the relationship of protein sequence to the corresponding properties is to alter naturally occurring proteins through site mutagenesis. Often the substitution is chosen so as to modify a specific interaction, although more exhaustive and random substitutions have been studied. One of the surprising results of such studies is the robustness of protein properties to mutations. Although most site substitutions are destabilizing, many result in essentially unchanged stabilities, and a significant fraction of mutations actually result in increased stability over the wild type (e.g. see Reddy et al.¹). The conclusion drawn from these studies is that there is an inherent robustness to the mapping of sequence to structure, and that sequence space consists of large regions of possible sequences corresponding to proteins with essentially equivalent properties. The general level of sequence plasticity has also led researchers to conclude that the robust properties must not be under active selection during evolution.²

This plasticity provides optimism for de novo protein design, in that it indicates that there are large numbers of amino acid sequences consistent with a given stable structure; the ability of a protein to fold despite changed interactions means that the interactions do not have to be formulated precisely in advance. Protein design may correspond to finding a needle in a haystack, but at least it is a quite sizable needle. In contrast, de novo engineering of well-packed proteins has proven "surprisingly difficult".³

Why does the sequence plasticity observed in site-directed mutagenesis not translate into ease in protein engineering? Perhaps we are interpreting the results of these mutagenesis experiments in the wrong context. Proteins are the result of a long
evolutionary process, involving the dynamics of a population cluster, or pseudospecies, in sequence space. During the past few years we (among others) have looked at how population-based evolutionary dynamics can alter the structures, functions, and thermodynamics of proteins and other biological macromolecules.⁴⁻¹⁰ Here, we use a simple computational model to demonstrate how population dynamics can explain why proteins are so robust to changes in sequence.

We describe simulations of the evolution of lattice proteins, using two different models. In one model, a single sequence performs a random walk in the space of all sufficiently stable sequences. In the second model, we represent the evolution of a population of lattice proteins as they undergo random mutagenesis, reproduction, and death. Proteins that emerge during the population simulations have a robustness to sequence change similar to that found in biological proteins. In contrast, proteins derived from the single-sequence random walk, even with the same structure and stability, are extremely fragile to sequence changes. The evolution of robust sequences proceeds without any explicit evolutionary pressure for robustness, but rather results directly from the population dynamics. In contrast, de novo sequence design must deal with the more random set of possible sequences. These results suggest that the observed sequence plasticity of biological proteins may occur because proteins have evolved to be robust to these specific experiments. If so, we may need to revise the conclusions based on these observations.
We use a computational model of proteins consisting of 25 residues confined to a maximally compact, two-dimensional lattice. In addition to studying the properties of protein sequences chosen at random, we implement two different dynamic models of sequence change. In the first model, we allow a single sequence to perform a random walk in sequence space among all viable sequences. Sequences are considered viable if the native state of the protein remains fixed and if it remains sufficiently stable, that is, with a free energy of folding, ΔG_folding, smaller than some fixed parameter, ΔG_crit. In the second model, we consider a population of such proteins where the sequences undergo random mutagenesis, death, and random reproduction, again with the constraint that all proteins must fold into a constant native state and remain sufficiently stable in order to survive to the next generation. Simulations are performed for both models five times for each of five values of ΔG_crit (0.0, -0.5, -1.0, -1.5, -2.0). The robustness of the resulting sequences to mutation is monitored by recording the change in stability caused by a mutation (ΔΔG_mut) as a function of the protein's stability prior to the mutation (ΔG_wt).
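As a rough illustration of the lattice model just described, the sketch below enumerates the maximally compact conformations of a 25-residue chain on a 5 × 5 lattice (one per rotation/reflection class), lists the non-bonded contacts of each, and evaluates the free energy of folding of the lowest-energy conformation relative to all the others at kT = 0.6, as specified in the Methods. It is a minimal sketch and not the authors' code: the function names are ours, and the contact-energy table `GAMMA` is a random placeholder standing in for the Miyazawa-Jernigan energies, which are not reproduced here.

```python
import math
import random

def enumerate_conformations(n=5):
    """Compact self-avoiding walks on an n x n lattice, one per rotation/reflection class."""
    m = n - 1
    # The eight rotations and reflections of the square lattice.
    symmetries = [
        lambda x, y: (x, y), lambda x, y: (m - x, y),
        lambda x, y: (x, m - y), lambda x, y: (m - x, m - y),
        lambda x, y: (y, x), lambda x, y: (m - y, x),
        lambda x, y: (y, m - x), lambda x, y: (m - y, m - x),
    ]
    unique, path, visited = set(), [], set()

    def extend(site):
        path.append(site)
        visited.add(site)
        if len(path) == n * n:
            # Keep one canonical representative of each symmetry class.
            unique.add(min(tuple(s(x, y) for x, y in path) for s in symmetries))
        else:
            x, y = site
            for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= nxt[0] < n and 0 <= nxt[1] < n and nxt not in visited:
                    extend(nxt)
        path.pop()
        visited.remove(site)

    # Every walk can be rotated/reflected so that it starts in one octant of the square.
    half = (n + 1) // 2
    for start in [(x, y) for x in range(half) for y in range(x, half)]:
        extend(start)
    return sorted(unique)

def contacts(conf):
    """Index pairs (i, j) in contact: lattice neighbours that are not chain neighbours."""
    index = {site: i for i, site in enumerate(conf)}
    pairs = []
    for i, (x, y) in enumerate(conf):
        for nb in ((x + 1, y), (x, y + 1)):  # look right and up so each pair appears once
            j = index.get(nb)
            if j is not None and abs(i - j) > 1:
                pairs.append((min(i, j), max(i, j)))
    return pairs

def folding_free_energy(energies, kT=0.6):
    """E_f + kT ln(Z - exp(-E_f/kT)), rewritten relative to E_f for numerical stability."""
    e_f = min(energies)
    native = energies.index(e_f)
    z_unfolded = sum(math.exp(-(e - e_f) / kT)
                     for k, e in enumerate(energies) if k != native)
    return kT * math.log(z_unfolded)

CONFORMATIONS = enumerate_conformations(5)
CONTACT_LISTS = [contacts(c) for c in CONFORMATIONS]

# PLACEHOLDER contact energies: random symmetric values, NOT the Miyazawa-Jernigan table.
rng = random.Random(0)
GAMMA = [[0.0] * 20 for _ in range(20)]
for a in range(20):
    for b in range(a, 20):
        GAMMA[a][b] = GAMMA[b][a] = rng.uniform(-1.0, 0.0)

def stability(seq, kT=0.6):
    """Return (native conformation index, dG_folding) for a sequence of residue codes 0..19."""
    energies = [sum(GAMMA[seq[i]][seq[j]] for i, j in cl) for cl in CONTACT_LISTS]
    return energies.index(min(energies)), folding_free_energy(energies, kT)

sequence = [rng.randrange(20) for _ in range(25)]
native_index, dG = stability(sequence)
```

With a real contact-energy table substituted for `GAMMA`, a sequence would count as viable when `native_index` is unchanged and `dG` is below the chosen ΔG_crit.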
In Figure 1, we compare the destabilizing probability for a mutation (the probability that ΔΔG_mut > 0) made in a sequence chosen at random with the destabilizing probability of a mutation in a sequence resulting from population evolution (with ΔG_crit = 0.0) as a function of ΔG_wt. As can be seen, the sequences resulting from the population evolution have a much smaller probability of having a destabilizing mutation compared to random sequences with identical initial stabilities. Figure 2 shows the resulting distributions of ΔΔG_mut for the two different types of evolutionary simulations for three different values of ΔG_crit. In the case of sequences performing a random walk, most of the mutations lead to reduced stabilities close to the average of the random sequence distribution; only 0.04 % to 0.4 % of mutations result in increased stability (depending upon the value of ΔG_crit). Conversely, in the case of the population trials, there is an appreciable probability (18 % to 28 %) of a mutation being stabilizing, with the most likely change in stability near zero, especially for highly stable proteins with large negative values of ΔG_crit. Because the number of mutations made to a sequence in the population simulations follows a Poisson distribution, multiple mutations of a particular sequence are progressively less likely than single mutations. Figure 3 presents the average probability that a multiple mutation will be destabilizing compared with single mutations, under the constraints of a specified ΔG_crit. Note that even significant sequence changes (three out of 25 residues) in the proteins resulting from population evolution have a non-negligible chance of resulting in increased stabilization.
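The quantity plotted in Figure 1 can be tabulated from simulation output along the following lines. This is only a sketch of the bookkeeping: the function name, the record format (pairs of ΔG_wt and ΔΔG_mut), and the bin width of 0.5 are our assumptions and are not taken from the original analysis.

```python
import math
from collections import defaultdict

def destabilizing_probability(records, bin_width=0.5):
    """Estimate P(ddG_mut > 0) in bins of dG_wt.

    records: iterable of (dG_wt, ddG_mut) pairs, one per recorded mutation.
    Returns {lower edge of dG_wt bin: fraction of mutations with ddG_mut > 0}.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [destabilizing, total]
    for dg_wt, ddg_mut in records:
        edge = math.floor(dg_wt / bin_width) * bin_width
        counts[edge][0] += ddg_mut > 0
        counts[edge][1] += 1
    return {edge: d / n for edge, (d, n) in sorted(counts.items())}
```

Applying the same function to mutations of random sequences and to mutations of evolved sequences would give the two curves that Figure 1 compares.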
Figure 1. Probability of a destabilizing mutation (P(ΔΔG_mut > 0)) for sequences resulting from population evolution with ΔG_crit = 0 (continuous line), compared with random sequences (broken line), as a function of the original stability of the unmutated protein, ΔG_wt. The destabilization probability for stable random sequences (with ΔG_wt < 0) is close to unity.

Figure 2. Density distribution of ΔΔG_mut from model proteins undergoing population (continuous line) and single sequence evolution (broken line), for various values of ΔG_crit.

Figure 3. Probability of destabilizing mutation from model proteins undergoing population evolution, according to the number of point mutations, as a function of ΔG_crit (thin continuous line). The average rate for all mutations (thick continuous line) and the high rate of destabilizing mutations for single sequence evolution (broken line) are included for comparison.

It is not surprising that most mutations in biological proteins result in decreased protein stability. If we consider the fact that most random sequences of amino acids do not have a stable folded state, then any mutation in one of the few viable sequences with a stable ground state would most likely move the stability in the direction of the more random sequences; that is, towards being less stable. The surprise is rather that experimental mutations have a significant probability of resulting in unchanged or increased stability. The exact percentages of mutations that are stabilizing vary according to the protein and the nature of the substitutions, ranging from approximately 8 % in mutations of barnase¹¹ and staphylococcal nuclease¹²⁻¹⁴ designed to eliminate specific interactions, to 17 % in interior locations of myoglobin,¹⁵ 20 % of non-Ala locations in Arc repressor,¹⁶ and 29 % for two specific solvent-exposed locations in phage T4 lysozyme.¹⁷ A more comprehensive set of 356 site-directed mutations compiled from the literature by Reddy and co-workers showed that 25 % of the mutations increased protein stability.¹ While the specifics vary, these results are sufficiently consistent to conclude that robustness would seem to be a general characteristic of systems that have come into being through the Darwinian evolution of populations. The range of these experimental results is close to the 18 % to 28 % (depending upon ΔG_crit) we observed for our lattice proteins evolving through population dynamics, but far from the 0.04 % to 0.4 % observed for random-walk sequences with comparable ΔG_crit.

Note that this robustness occurs in the absence of any selective pressure towards robustness. The various sequences in the model with ΔG_folding < ΔG_crit have equal fitness and equal probability of contributing to the next generation. Robustness towards mutations is just one of a number of properties that emerge from neutral evolution in sequence space, as has been emphasized by a number of authors.⁴,⁵,⁷,¹⁰,¹⁸⁻²⁴
So how can evolution, where robustness is not an explicit selection criterion, result in such unexpectedly robust proteins? Insight into this phenomenon comes from the pioneering work by Eigen.²⁵ In analytical studies of RNA evolution, he found that evolution selected for a network of genotypes, which he called a quasispecies. The relevant fitness of the quasispecies is a function of the fitness of all of the genotypes, so the population of any one genotype would be enhanced by being surrounded by fit neighbors. This effect depends on the possibility of back-mutations: if one genotype contributes to a neighboring genotype in one generation, there is a probability that the neighboring genotype will return the favor in a future generation. Through this mechanism, population dynamics result in an evolutionary selection of genotypes biased by the fitness of their neighbors; that is, by their robustness to mutations. In the sense of fitness landscapes, nature may choose broad fitness plateaus of well-connected neighbors even in the presence of higher, yet poorly connected fitness peaks. This evolutionary heritage is encoded in the genotype, resulting in a sequence plasticity that distinguishes these sequences from random sequences chosen to have the same phenotype. Bornberg-Bauer & Chan, for instance, found that evolutionary dynamics would result in a bias in the population towards "prototype" sequences with the maximum number of "neutral neighbors".⁶ The work described here concentrates on protein stability, but the same should be true of any protein property that is important for survival of the organism. This evolutionary trend towards robustness may be a general characteristic of biological systems.²⁶,²⁷

There are a number of important consequences of this effect. Firstly, the lessons of sequence plasticity in biological proteins may be inapplicable to artificially designed proteins. It may be necessary to
have a de novo sequence exquisitely designed to have properties similar to biological proteins. This also suggests that taking advantage of the observed robustness by modifying existing proteins may be a more effective route. Alternatively, in vitro evolution studies may provide proteins with the same degree of sequence plasticity as natural proteins. More optimistically, proteins may have compromised possible interactions and properties in developing this robustness, which suggests that more effective, if less robust, proteins may be available.

In addition, these results suggest that the observed sequence plasticity may have non-obvious consequences for our understanding of proteins and their evolution. For instance, Baker and co-workers observed that sequence changes in the IgG binding domain of protein L often resulted in proteins that folded faster than the wild-type protein, and concluded that this indicates that the folding rate is not under strong selective pressure.² The model presented here results in the opposite conclusion, that properties of the protein under stronger selective pressure are more likely to be "buffered" and thus robust to mutations. In other words, robustness to site mutations would paradoxically be an indication of stronger selective pressure on these characteristics.

Finally, we note that there is growing interest in the relationship between robustness and evolvability; that is, between the ability to buffer genotypic variations and the ability of an organism to adapt to new situations and environments.²⁸ If the two are related, the tendency of population dynamics to increase sequence plasticity might have had significant impact on the evolutionary process, including the development of new functionalities of existent proteins.
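The quasispecies argument can be made concrete on a toy neutral network, in the spirit of the analysis by van Nimwegen et al.²⁶ The sketch below is an idealization and not part of the simulations reported here: the seven-node graph `NETWORK` is hypothetical, all nodes are taken to be equally fit, mutants that leave the network are assumed to die, and the population is treated as infinite. Under these assumptions a single sequence that rejects non-viable mutations visits all viable genotypes equally often, so its expected number of viable neighbors is the plain average degree, whereas a population concentrates on well-connected genotypes and its average number of viable neighbors approaches the largest eigenvalue of the adjacency matrix.

```python
def mean_degree_uniform(adj):
    """Average number of viable neighbours when every viable genotype is equally likely."""
    return sum(len(neighbours) for neighbours in adj.values()) / len(adj)

def population_distribution(adj, iterations=2000):
    """Steady-state genotype frequencies of an idealized infinite population.

    Each generation a genotype keeps its own offspring and receives mutants from
    its viable neighbours; frequencies are then renormalized. Iterating (I + A)
    converges to the principal eigenvector of the adjacency matrix A.
    """
    nodes = sorted(adj)
    p = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iterations):
        q = {v: p[v] + sum(p[u] for u in adj[v]) for v in nodes}
        total = sum(q.values())
        p = {v: q[v] / total for v in nodes}
    return p

def mean_degree_population(adj):
    """Average number of viable neighbours, weighted by steady-state frequency."""
    p = population_distribution(adj)
    return sum(p[v] * len(adj[v]) for v in adj)

# Hypothetical neutral network: a well-connected core (0-3) with a sparse tail (4-6).
NETWORK = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3],
    3: [0, 1, 2, 4], 4: [3, 5], 5: [4, 6], 6: [5],
}

random_walk_neighbours = mean_degree_uniform(NETWORK)
population_neighbours = mean_degree_population(NETWORK)
```

On this toy network the population-weighted value exceeds the uniform average, which is the qualitative difference between the two models: no genotype is fitter than another, yet the population ends up enriched in genotypes whose neighbors are also viable.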
We consider a highly simplified representation of evolving proteins. Our model proteins consist of chains of n = 25 monomers, confined to a 5 × 5 two-dimensional, maximally compact square lattice with each monomer located at one lattice point. This provides us with 1081 possible conformations represented by the 1081 self-avoiding walks on this lattice, neglecting structures related by rotation, reflection, or inversion. The non-compact states were neglected in order to allow for a reasonable number of stable sequences. Alternatively, we would expect the non-compact states to be negligible as long as the contact energies were sufficiently attractive. The fact that most protein structures are reasonably compact makes this assumption not too unreasonable. There are important differences between the two-dimensional and three-dimensional models, especially in folding simulations where the two-dimensional conformation space may not be ergodic.²⁹,³⁰ While these limitations are critical in folding simulations, we are more interested in the mapping of sequence to structure rather than how the sequence folds to that given structure; the thermodynamic properties described below involve sums over states and should be less affected by the dimensionality of the model. We use the two-dimensional model to provide a more realistic ratio of buried to exposed sites.

We assume that the energy of any sequence in conformation k is given by a simple contact energy of the form:

$$E_k = \sum_{i<j} \gamma(e_i, e_j)\, U^k_{ij} \qquad (1)$$

Here, U^k_ij is equal to 1 if residues i and j are not covalently connected but are on adjacent lattice sites in conformation k (and 0 otherwise), and γ(e_i, e_j) is the contact energy between amino acid e_i at location i and e_j at location j in the sequence. We use the contact energies derived by Miyazawa & Jernigan based on a statistical analysis of the database of known proteins that implicitly includes the effect of interactions of the protein with the solvent.³¹ In our simplified proteins, there are 132 pairs of residues that can possibly come into contact, with 16 of these contacts present in any given compact structure. Using equation (1) we can calculate the energy of a given protein sequence in all 1081 possible conformations. We make the assumption that the thermodynamic hypothesis is obeyed and that the lowest-energy structure is the native state;³² the other 1080 possible structures represent the ensemble of unfolded states.

Not all possible protein sequences are viable. In general, a protein must fulfil a number of conditions relating to stability, functionality, and foldability. Here, we concentrate on stability. For each sequence, we calculate the free energy of folding:

$$\Delta G_{\mathrm{folding}} = E_f + kT \ln\left[ Z - \exp\left(-E_f / kT\right) \right] \qquad (2)$$
where Z is the partition function and E_f is the energy of the native state. (For the Miyazawa-Jernigan potential, we use kT = 0.6.) We consider a sequence as representing a viable protein as long as its ΔG_folding is less than some specified ΔG_crit.

We implement two different dynamic models of sequence change. In the first model, we choose a sequence at random and make point mutations until we arrive at a sufficiently stable protein sequence. Starting with this initial stable sequence, residue positions are randomly mutated with the number of mutations chosen from a Poisson distribution with an average of 0.002 mutations per amino acid residue per generation. With this low mutation rate, multiple mutations are rare (the ratio of single mutants to multiple mutants is 200). We calculate the stability of the new sequence; if ΔG_folding is larger than ΔG_crit or the structure has changed, the mutation is rejected and the original sequence retained. Generations where no mutations occurred are not counted. This allows the single sequence to diffuse randomly over the range of acceptable sequences, analogous to random-walk models in which a particle has average zero velocity when a boundary is encountered. This is done five times for each of five values of ΔG_crit (0.0, -0.5, -1.0, -1.5, -2.0). Sequences that arise during these runs are probed for robustness to mutations. We make mutations in the sequence with a Poisson distribution with mean 0.002 mutation per amino acid residue, maintaining a constant rare rate of multiple mutations. We then calculate the probability that a mutation results in a given change in stability (ΔΔG_mut) as a function of the stability prior to the mutation (ΔG_wt).

For the second model, we simulate the effect of population dynamics using an evolutionary scheme described elsewhere.⁸ We construct a population of N = 3000 identical viable sequences. For each generation, each residue in the protein population has a probability of 0.002 to be mutated to another random residue; both the population size and mutation rate were chosen to be comparable to previous analytical models of evolution processes.³³⁻³⁵ The stability of each protein in the population is then calculated. We use truncation selection where the N′ sequences having ΔG_folding < ΔG_crit and a conserved native state structure are considered viable and capable of reproducing; the rest are removed from the population. The next generation of N sequences is chosen from the N′ surviving sequences randomly with replacement, representing the stochastic process of reproduction. The population is first allowed to pre-equilibrate for 30,000 generations. The evolutionary simulations are then continued for an additional 30,000 generations. In the subsequent 30,000 generations, we monitor the stability of the sequences (ΔG_wt) as well as the changes in stability that occur with mutations (ΔΔG_mut). We perform these calculations five times for the same ΔG_crit constraints used for the single-sequence trials.
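The two update rules described above can be sketched as follows. This is a schematic under stated assumptions rather than the code used for the simulations: the function names are ours, mutations are drawn independently per residue with probability 0.002 (which closely approximates the Poisson sampling at this low rate), and `toy_is_viable` is a hypothetical stand-in for the real test that the native structure is unchanged and ΔG_folding < ΔG_crit.

```python
import random

N_RESIDUE_TYPES = 20
MUTATION_RATE = 0.002  # per residue per generation, as in the text

def mutate(seq, rng, rate=MUTATION_RATE):
    """Change each residue, independently with probability `rate`, to a different residue."""
    out = list(seq)
    for i, old in enumerate(out):
        if rng.random() < rate:
            out[i] = rng.choice([a for a in range(N_RESIDUE_TYPES) if a != old])
    return tuple(out)

def random_walk_step(seq, is_viable, rng, rate=MUTATION_RATE):
    """Model 1: accept the mutant if it is viable, otherwise retain the original sequence."""
    mutant = mutate(seq, rng, rate)
    return mutant if is_viable(mutant) else seq

def population_step(population, is_viable, rng, rate=MUTATION_RATE):
    """Model 2: mutate, remove non-viable sequences, resample N with replacement."""
    mutants = [mutate(seq, rng, rate) for seq in population]
    survivors = [seq for seq in mutants if is_viable(seq)]  # truncation selection
    if not survivors:
        raise RuntimeError("no viable sequences left in the population")
    return [rng.choice(survivors) for _ in population]

def toy_is_viable(seq):
    """HYPOTHETICAL viability test: at most five residues differ from an all-zero reference."""
    return sum(1 for a in seq if a != 0) <= 5
```

A run of the population model would start from N = 3000 copies of one viable sequence and call `population_step` once per generation; the single-sequence model calls `random_walk_step` repeatedly on one sequence and, following the text, counts only generations in which a mutation was proposed.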
We thank Lee Altenberg, Nicolas Buchler, Matthew Dimmic, Walter Fontana, Luca Peliti, Kevin Plaxco, David Pollock, and Peter Wolynes for insights and helpful comments, and Matthew Dimmic, Bin Qian, and Todd Raeker for computational assistance. Financial support was provided by NIH grant numbers LM05770 and GM08270, and NSF shared equipment grant number BIR9512955.

1. Reddy, B. V. B., Datta, S. & Tiwari, S. (1998). Use of propensities of amino acids to the local structural environment to understand effect of substitution mutations on protein stability. Protein Eng. 11, 1137-1145.
2. Kim, D. E., Gu, H. & Baker, D. (1998). The sequences of small proteins are not extensively optimized for rapid folding by natural selection. Proc. Natl Acad. Sci. USA, 95, 4982-4986.
3. DeGrado, W. F., Summa, C. M., Pavone, V., Nastri, F. & Lombardi, A. (1999). De novo design and structural characterization of proteins and metalloproteins. Annu. Rev. Biochem. 68, 779-819.
4. Fontana, W. & Schuster, P. (1998). Continuity in evolution: on the nature of transitions. Science, 280, 1451-1455.
5. Bastolla, U., Roman, H. E. & Vendruscolo, M. (1999). Neutral evolution of model proteins: diffusion in sequence space and overdispersion. J. Theor. Biol. 200, 49-64.
6. Bornberg-Bauer, E. & Chan, H. S. (1999). Modeling evolutionary landscapes: mutational stability, topology, and superfunnels in sequence space. Proc. Natl Acad. Sci. USA, 96, 10689-10694.
7. Ancel, L. W. & Fontana, W. (2000). Plasticity, evolvability and modularity in RNA. J. Expt. Zool. 288, 242-283.
8. Taverna, D. & Goldstein, R. A. (2000). The distribution of structures in evolving protein populations. Biopolymers, 53, 1-8.
9. Williams, P. D., Pollock, D. D. & Goldstein, R. A. (2001). Evolution of functionality in lattice proteins. J. Mol. Graph. Mod. 19, 150-156.
10. Taverna, D. & Goldstein, R. A. (2001). Why are proteins marginally stable? Proteins: Struct. Funct. Genet. In the press.
11. Serrano, L. J. T., Kellis, J., Cann, P., Matouschek, A. & Fersht, A. R. (1992). The folding of an enzyme II: substructure of barnase and the contribution of different interactions to protein stability. J. Mol. Biol. 224, 783-804.
12. Shortle, D., Stites, W. E. & Meeker, A. K. (1990). Contributions of the large hydrophobic amino acids to the stability of staphylococcal nuclease. Biochemistry, 29, 8033-8041.
13. Green, S. M., Meeker, A. K. & Shortle, D. (1992). Contributions of the polar, uncharged amino acids to the stability of staphylococcal nuclease: evidence for mutational effects on the free energy of the denatured state. Biochemistry, 31, 5717-5728.
14. Meeker, A. K., Garcia-Moreno, B. & Shortle, D. (1996). Contributions of the ionizable amino acids to the stability of staphylococcal nuclease. Biochemistry, 35, 6443-6449.
15. Lin, L., Pinker, R. J. & Kallenbach, N. R. (1993). α-Helix stability and the native state of myoglobin. Biochemistry, 32, 12638-12643.
16. Milla, M. E., Brown, B. M. & Sauer, R. T. (1994). Protein stability effects of a complete set of alanine substitutions in arc repressor. Nature Struct. Biol. 1, 518-523.
17. Blaber, M., Zhang, X. J., Lindstrom, J. D., Pepiot, S. D., Baase, W. A. & Matthews, B. W. (1994). Determination of alpha-helix propensity within the context of a folded protein. Sites 44 and 131 in bacteriophage T4 lysozyme. J. Mol. Biol. 235, 600-624.
18. Lipman, D. J. & Wilbur, W. J. (1991). Modelling neutral and selective evolution of protein folding. Proc. Roy. Soc. London, 245, 7-11.
19. Schuster, P., Fontana, W., Stadler, P. F. & Hofacker, I. L. (1994). From sequences to shapes and back: a case study in RNA secondary structures. Proc. Roy. Soc. ser. B. 255, 279-284.
20. Bornberg-Bauer, E. (1997). How are model protein structures distributed in sequence space? Biophys. J. 73, 2393-2403.
21. Babajide, A., Hofacker, I. L., Sippl, M. J. & Stadler, P. F. (1997). Neutral networks in protein space: a computational study based on knowledge-based potentials of mean force. Fold. Des. 2, 261-269.
22. Bourdeau, V., Ferbeyre, G., Pageau, M., Paquin, B. & Cedergren, R. (1999). The distribution of RNA motifs in natural sequences. Nucl. Acids Res. 27, 4457-4467.
23. Forst, C. V. (2000). Molecular evolution of catalysis. J. Theor. Biol. 205, 409-431.
24. Reidys, C., Forst, C. V. & Schuster, P. (2001). Replication and mutation on neutral networks. Bull. Math. Biol. 63, 57-94.
25. Eigen, M. (1971). Selforganization of matter and the evolution of biological macromolecules. Naturwissenschaften, 10, 465-523.
26. van Nimwegen, E., Crutchfield, J. P. & Huynen, M. (1999). Neutral evolution of mutational robustness. Proc. Natl Acad. Sci. USA, 96, 9716-9720.
27. Wilke, C. O., Wang, J. L., Ofria, C., Lenski, R. E. & Adami, C. (2001). Evolution of digital organisms at high mutation rates leads to survival of the flattest. Nature, 412, 331-333.
28. Kirschner, M. & Gerhart, J. (1998). Evolvability. Proc. Natl Acad. Sci. USA, 95, 8420-8427.
29. Abkevich, A. I., Gutin, A. M. & Shakhnovich, E. I. (1995). Impact of local and non-local interactions on thermodynamics and kinetics of protein folding. J. Mol. Biol. 252, 460-471.
30. Pande, V. S., Grosberg, A. Y. & Tanaka, T. (1997). Statistical mechanics of simple models of protein folding and design. Biophys. J. 73, 3192-3210.
31. Miyazawa, S. & Jernigan, R. L. (1985). Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules, 18, 534-552.
32. Govindarajan, S. & Goldstein, R. A. (1998). On the thermodynamic hypothesis of protein folding. Proc. Natl Acad. Sci. USA, 95, 5545-5549.
33. Kimura, M. (1979). The neutral theory of molecular evolution. Sci. Am. 241, 98-126.
34. Ohta, T. (1987). Simulating evolution by gene duplication. Genetics, 115, 207-213.
35. Ohta, T. (1988). Multigene and supergene families. Oxford Surv. Evol. Biol. 5, 41-65.
Edited by J. Thornton (Received 6 August 2001; received in revised form 22 October 2001; accepted 23 October 2001)