“to place before mankind the common sense of the subject,
in terms so plain and firm as to command their assent”
Thomas Jefferson
Letter to Henry Lee, 1825
Structure in Protein Chemistry
Second Edition
Jack Kyte
Emeritus Professor of Chemistry
University of California at San Diego
Vice President: Denise Schanck
Senior Editor: Robert L. Rogers
Associate Editor: Summers Scholl
Senior Publisher UK: Jackie Harbor
Production Editor: Simon Hill
Copyeditor: Heather Whirlow Cammarn
Cover Designer: Aktiv
Typesetter: Phoenix Photosetting
Printer: RR Donnelley
© 2007 by Garland Science, a member of the Taylor & Francis Group, LLC
This book contains information obtained from authentic and highly regarded
sources. Reprinted material is quoted with permission, and sources are indicated. A
wide variety of references are listed. Reasonable efforts have been made to publish
reliable data and information, but the author and the publisher cannot assume
responsibility for the validity of all materials or for the consequences of their use.
Published in 2007 by Garland Science, a member of the Taylor & Francis Group, LLC,
270 Madison Avenue, New York, NY 10016, USA and
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN, UK.
Table of Contents
Preface
Stereo Drawings
NEWT
ExPASy
1. Purification
    Partition into Stationary Phases and Chromatography
    Assay
    Purification of a Protein
    Molecular Charge
    Electrophoresis
    Criteria of Purity
    Heterogeneity
    Crystallization
2. Electronic Structure
    π and σ
    Acids and Bases
    Tautomers
    Amino Acids
3. Sequences of Polymers
    Sequencing of Polypeptides
    Cloning, Sequencing, Expressing, and Mutating of Deoxyribonucleic Acids
    Posttranslational Modification
    Oligosaccharides of Glycoproteins
7. Evolution
    Molecular Phylogeny from Amino Acid Sequence
    Molecular Phylogeny from Tertiary Structure
    Domains
    Molecular Taxonomy
9. Symmetry
    Rotational and Screw Axes of Symmetry
    Space Groups
    Oligomeric Proteins
    Isometric Oligomeric Proteins
    Helical Polymeric Proteins
    Heterologous Oligomeric Proteins
Index
Preface
Structure in Protein Chemistry is designed for a senior undergraduate or graduate course covering the structures of proteins and biophysical chemistry. The course created by this textbook is intended to bridge the gap between the research literature and the courses in introductory chemistry and biochemistry that the student has already taken. There are suggested readings at the end of each section. In these selected publications, the concepts just discussed in that section are applied in an experimental setting. There are also more than 4800 citations within the text itself that should direct the student to the scientific literature. The format of the book is intended to resemble that of a biochemical journal to ease the transition. At the completion of the course, the student should be equipped to take charge of his own education by critically reading the biochemical literature on his own. To do this he must be able to understand the experiments performed and be able to reach the same conclusions as do the authors of each publication or to realize that the authors are mistaken in their interpretations. It is my intention to develop in the student the ability to draw his own conclusions from only the experimental results. To this end, there are problems after most of the sections to reinforce the concepts that have just been presented in the text. These problems are usually based on actual experimental results, which are to be evaluated by the student, ideally in the absence of assistance or misdirection from the authors of the publications from which the results were taken.
Refined crystallographic molecular models provide most of our knowledge of the structures of proteins. Their importance and validity are self-evident, and they provide the foundation on which almost all of the other experimental observations in the field must rest. They also create, in the imagination of the chemist, a reliable abstract image of what the structure of a protein consists: its atomic details, its folded polypeptide backbone, its α helices and β structure, the packing of its secondary structure, its globular or elongated shape, its irregular surface, its hydration, and the symmetric arrangement of its subunits. This abstract image of a molecule of generic protein is synthesized by her imagination from all of the particular crystallographic molecular models she has viewed. Its fully developed mental existence permits her to understand the molecular basis of all of the other physical and chemical observations that are made of proteins and thus what these observations actually mean. The abstract image also permits her to understand more clearly the evolution of proteins, the folding of proteins, and the assembly of oligomeric and polymeric proteins. Consequently, crystallographic molecular models of proteins must be discussed as soon as possible and as comprehensibly as possible in any successful presentation of the biophysical chemistry of proteins.
Structure in Protein Chemistry begins with descriptions of how proteins are purified to provide the student with an understanding of where the proteins themselves and their crystals come from. To permit him to recognize intimately the polypeptide that folds to produce the crystallographic map of electron density, the electronic and atomic details of its covalent bonds are then described, and the methods for elucidating its sequence of amino acids and defining its posttranslational modifications are explained. A comprehensive presentation of the methods of crystallography, which permits the student to understand critically its strengths and weaknesses, and a thermodynamic discussion of the properties of noncovalent forces (ionic interactions, hydrogen bonding, and the hydrophobic effect) as they are expressed in aqueous solution are a prelude to an exhaustive description of the atomic details of the structures of proteins as observed in crystallographic molecular models. The resulting understanding of their molecular structures at the atomic level and the noncovalent forces that produce those structures forms the basis for discussions of the evolution of proteins, of the symmetry of the oligomeric and polymeric associations that produce them, and of the chemical, mathematical, and physical basis of the techniques used to study their structures such as image reconstruction, nuclear magnetic resonance spectroscopy, proton exchange, optical spectroscopy, electrophoresis, covalent cross-linking, chemical modification, immunochemistry, hydrodynamics, and the scattering of light, X-radiation, and neutrons. The application of these procedures to the study of the folding of polypeptides and the assembly of oligomers and helical polymers is then described. Finally, biological membranes and the structures of their proteins are discussed.
To present a comprehensive view of the biophysical chemistry of proteins, this text combines concepts of bonding and chemical reactivity, descriptions of macromolecular structure, principles of thermodynamics, and
explanations of biophysical methods and their results. The concepts of bonding and chemical reactivity are presented in standard structural drawings of individual molecules or chemical reactions in which electronic and mechanistic aspects are emphasized as they are in courses in organic chemistry. The descriptions of macromolecular structure are illustrated with stereo images of crystallographic molecular models that are drawn by the author so that details appropriate to the particular points made in the text are emphasized by choosing the appropriate views of the structures. The principles of chemical thermodynamics are applied in relationships among the equilibrium constants and fundamental state functions. The explanations of biophysical methods rely on the mathematical equations defining the physical properties being measured. The results of the experiments themselves are found in graphs and tables derived from the experimental literature. It is this combination of chemical drawings, stereo images, mathematical equations, graphs, and tables that makes this book both unique and comprehensive. It also places severe demands on the student. She must have a firm background in physics, mathematics, analytical chemistry, organic chemistry, and physical chemistry to understand the material. In the broadest sense the intention of the course is to educate protein chemists. A protein chemist should be able to evaluate critically the results of any of the methods applied to the study of proteins.
The foregoing describes both the First Edition and the Second Edition of Structure in Protein Chemistry, but the Second Edition is a major revision of the first. All of the sections in each of the chapters in the Second Edition of Structure in Protein Chemistry have been updated extensively to include the relevant observations and new discoveries in the field that have been made since the First Edition was written.
The significant progress that has been made since that time has required that some sections of the book be completely rewritten. For example, because of the explosion of knowledge in the area of protein folding, the section on the kinetics of folding has been completely redone. Likewise, there has been a dramatic increase in the number of crystallographic molecular models of oligomeric proteins so that examples are now available of all of the point groups for the symmetric assembly of asymmetric objects. As a result, because oligomeric proteins and isometric oligomeric proteins can now be discussed more systematically, the sections covering their structures have also been completely reorganized and rewritten, and stereo drawings of crystallographic molecular models of proteins representing each point group are included.
Completely new sections have also been added to the book. A new section on the structural details of the interactions between proteins and nucleic acids has been added, in part to recognize the significant progress that has been made in this area. The explosion of new crystallographic molecular models over the last two decades has included many with heterologous oligomeric associations where few were available at the time that the First Edition was being written. Consequently, a new section discussing oligomeric proteins that are constructed heterologously has been added. As part of this section, the major classes of these proteins are discussed, including proteins involved in cellular control, motility, the cytoskeleton, the extracellular matrix, cellular adhesion, and cell-cell interactions. There is also a completely new section on the roles of metallic cations in the structures of proteins.
There are other instances in which major advances have led to extensive additions to the text. Descriptions and drawings of the crystallographic molecular models of representatives of the various classes of integral membrane-bound proteins, which were mostly unavailable for the First Edition, have been added. There is now a comprehensive description of mass spectrometry and its application to the direct sequencing of proteins, the elucidation of the structures of posttranslational modifications, and the determination of the molar masses of proteins. The section on sequencing and modifying DNA has been extensively expanded to include developments in this rapidly advancing area. The number of posttranslational modifications included in the section covering this topic has been significantly increased, a reflection of the new discoveries in this area. In particular, the recently elucidated role of inteins in the posttranslational rearrangements of the polypeptide backbone is described. There is a new discussion of the results of crystallographic molecular models of atomic resolution (Bragg spacing less than 0.1 nm) because many of these have also become available since the First Edition was written. The section on hydrogen bonding in proteins has been significantly improved by including the results of double mutant cycles, a procedure that has been developed since the First Edition was written. How the most widely used algorithms for searching data banks of amino acid sequences work is described. There is a new, detailed discussion on how an icosahedral assembly is expanded by incorporating segments of a hexagonal array, which is the strategy that viruses have used to increase the size of their coats. The use of physical measurements of a protein in solution to adjust its crystallographic molecular model, also a new development, is now discussed in the context of comprehensive descriptions of the techniques that are used to make these adjustments. For example, scattering curves from solutions of a protein are now used to adjust its crystallographic molecular model to the structure that it assumes when it is in solution.
In several instances, descriptions of procedures have been made more comprehensive to improve the student’s understanding. The section on nuclear magnetic resonance has been significantly updated to describe the improvements that have been made in this
field since the First Edition, but the physical basis and the techniques of nuclear magnetic resonance spectroscopy itself are now more comprehensively discussed so that a more complete understanding of the method is gained. The limited description of electron paramagnetic resonance spectroscopy in the First Edition has been expanded to create a new section in which examples of its recent use are presented. The use of image reconstruction and cryo-electron microscopy to produce structures of helical polymeric proteins and membrane-bound proteins is more comprehensively discussed than it was in the First Edition.
All of these changes together have created a text that is not only an update but also a significant expansion of the First Edition.
It is a pleasure to thank everyone who has helped me in the preparation of this book. First and foremost I thank my wife Francey. She has entered into the computer in the proper places and the proper order all of the almost impossible to follow changes and insertions that were haphazardly written in pencil and red pen over the typescript of the First Edition or written out in my hand as inserts on sheets of scrap paper, while at the same time correcting my spelling, grammar, and punctuation. Without her assistance, it would have taken me at least an additional year to finish the job less successfully. Daniel Louvard was kind enough to provide me with an office at the Institut Curie and access to its library in the years 1995–1996 and 2000–2001 so that I could pursue the project while away from La Jolla. I would also like to thank Heather Whirlow Cammarn, my copyeditor, who converted the manuscript into the style of the American Chemical Society and tied up all of the many loose ends with acumen. I would again like to thank all of the reviewers of the First Edition because much of their assistance has been carried into the Second Edition. Russell Doolittle and Harvey Itano read large portions of the manuscript of the First Edition and provided excellent suggestions. Individual sections of the manuscript of the First Edition were reviewed critically by Frank Huennekens, Bruno Zimm, Charles Perrin, Steven Clarke, Ajit Varki, David Matthews, John Edsall, Cyrus Chothia, Arthur Lesk, David DeRosier, Nigel Unwin, Stephen Harrison, Fred Hartman, John Simon, George Fortes, Rachel Klevit, Ken Dill, Robert Baldwin, Howard Shachman, Dennis Haydon, and Guido Guidotti. I would like to thank all of the reviewers of the Second Edition. Individual sections of the manuscript of the Second Edition were reviewed by Larry Cummings, Martin Webb, Iain Nicholl, Jeffrey Carbeck, Lloyd Waxman, Partho Ghosh, Charles Perrin, Kenneth Walsh, Tama Hasson, Steven Clarke, Ajit Varki, Brian Matthews, the late Carl-Ivar Brändén, Dave Matthews, Ken Dill, V. Adrian Parsegian, Michael Page, Malcolm MacArthur, Patrick Argos, Stephen Harrison, William Trogler, Russell Doolittle, Henryk Eisenberg, Pierre Goloubinoff, Robert Fletterick, Georg Schulz, Michael Rossmann, Ron Milligan, Fred Hartman, Donald Engelman, David Johnson, Walter Englander, C. Nick Pace, Franz Schmid, Arshad Desai, Stephen White, and Douglas Rees. Each of them provided detailed criticism, many helpful comments, and reassurance.
Jack Kyte
Stereo Drawings
Almost all of the stereo drawings of crystallographic molecular models included in Structure in Protein Chemistry were produced by the program Molscript created by Per J. Kraulis. If you have the time and enjoy working on a computer, you should learn how to use the program, which is described at http://www.avatar.se/molscript/doc/molscript.html. It is now standard practice to publish drawings of crystallographic molecular models in this format. To appreciate the results of crystallographic studies, one must be able to view these images. Although a few individuals can view them effortlessly by crossing their eyes, the rest of us need a stereo viewer. The stereo viewer that I use and have recommended for my students is the PEAK™ Pocket Stereo Viewer with 2× magnification (124 mm legs). Suppliers of this viewer can be found using Google. It has been my experience that a student who has never viewed a stereo drawing before will usually complain that although everyone else can learn to use one of these viewers, he cannot. It is also my experience that everyone learns to use one. When I have put a question on an examination such as Problem 4.5, where one is asked to write down the sequence of the protein by examining a drawing of a crystallographic molecular model that she has never seen before, everyone in the class gets at least 90% of the sequence correct, which would have been impossible unless everyone was able to see the image in stereo. It is essential that anyone interested in the structures of proteins learn to view drawings of crystallographic molecular models in stereo.
The drawings in this text have been placed vertically rather than in their usual horizontal orientation and each has been placed on the outside edge of a page. This has been done to allow each image to be spread as flat as possible for the best viewing.
NEWT
There are hundreds of proteins discussed in the text of Structure in Protein Chemistry. Each of these proteins is present in many different species of organisms, but usually the details that are being discussed are specific enough that the protein from only one of these many species is described, even though what is described would fit the protein from any one of these species. Furthermore, a particular protein from a particular species is always used in a particular experiment. The names of both the protein being discussed and the species from which it was derived are usually stated in the text. It turns out that protein chemists, because they realize that the same protein from different species of organisms is basically the same, don’t really care from what species of organism the protein comes, and a remarkably large collection of species of organisms is used as sources for proteins. Usually, in a particular investigation, one particular species is chosen as a source for a particular protein for a particular reason known only to the investigator. The practical result of these diverse choices is that the names of hundreds of species of organisms are used in this book. I have chosen to name each of them with the usual Latin names of their genus and species, without explaining what the species are, because I wanted to make the point that it doesn’t matter where a protein comes from. The names Escherichia coli and Saccharomyces cerevisiae and the adjectives murine, equine, bovine, canine, and human are probably already familiar to you, but very few of the other names will be. Even though there is no need to know, if you would like to know to what the name of a genus and species refers, go to http://www.ebi.ac.uk/newt/display, and enter the name of the species.
ExPASy
Hundreds of thousands of proteins from hundreds of different species of organisms have been sequenced. The sequences of their amino acids are tabulated in large data banks. The most easily used of the data banks is the Swiss-Prot/TrEMBL at http://www.expasy.org/. You should become familiar with this site on the web, not only for the sequences it makes available but also for the free programs that are available at the site to analyze those sequences.
The Protein Data Bank at http://betastaging.rcsb.org/pbd/Welcome.do contains the atomic coordinates of most of the crystallographic molecular models that have been constructed. At the moment there are 36,000 separate molecular models entered in the data bank. You should look at some of the lists of the full coordinates to get a feeling for what such a file contains. Enter the name of a protein for which there is a stereo drawing in the text of this book, click on the name of one of the molecular models that are then listed, choose “Download Files”, and then choose “PDB File”. The atoms are listed by the name of the amino acid, the position of that amino acid in the sequence of the protein, and their locations within that amino acid by using the abbreviations given in Figure 4.14. The list is that of the x, y, and z coordinates of each atom in Ångstroms. Each file constitutes the raw data on which the Molscript program operates.
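To get a feeling for what such a file contains, it can help to read the atom list programmatically. The following is a minimal sketch in Python, assuming the fixed-column layout of ATOM records in the standard PDB file format. The names `Atom`, `parse_atoms`, and `SAMPLE_PDB` are hypothetical, and the two sample records carry invented coordinates for illustration only; they are not taken from any real entry.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    residue_name: str    # name of the amino acid, e.g. "MET"
    residue_number: int  # position of the amino acid in the sequence
    atom_name: str       # location of the atom within the amino acid, e.g. "CA"
    x: float             # coordinates in angstroms
    y: float
    z: float


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Collect the ATOM records of a PDB file (HETATM and other records are skipped)."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # The PDB format is fixed-width, so fields are taken by column, not by splitting.
        atoms.append(Atom(
            residue_name=line[17:20].strip(),
            residue_number=int(line[22:26]),
            atom_name=line[12:16].strip(),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms


# Two illustrative records with made-up coordinates (not from a real model).
SAMPLE_PDB = """\
ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N
ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C
"""

if __name__ == "__main__":
    for atom in parse_atoms(SAMPLE_PDB):
        print(atom)
```

Slicing by column rather than splitting on whitespace is deliberate: in real files adjacent numeric fields can run together without an intervening space, so a whitespace split can misread them.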
Chapter 1
Purification
The living world that teems around us, the world of species, individual organisms, organs, tissues, and cells, can be viewed as the manifestation of a vast fluid array of protein molecules, each appearing and disappearing in the proper place at the proper time. This array of protein molecules is the outcome of a long history. Each protein within the array is itself the product of evolution by natural selection, which has had more than two billion years and much of the surface of the earth to explore, by random, irrational trial and error, strategies with which to accomplish the function of that protein. There are several consequences of this fact. First, chemical principles in addition to those of which we are aware have been discovered and exploited. Second, completely different chemical mechanisms often have been applied haphazardly to achieve similar purposes. Third, there are puzzling features that are inefficient, useless, or meaningless. Fourth, the result of this process does not resemble anything the human mind would have designed, even if it were aware of all of the available chemical strategies. One consequence of these facts is that argument by exclusion is useless because it cannot be assumed that the mechanism by which a biological problem was solved is only one or more of the mechanisms of which we can conceive.
One fruitful approach in our attempt to understand life has been to study, individually or in small groups, the proteins that produce it to gain insight into the role of each one in the overall scheme. An argument could be made that a cell does seem to be no more than the sum of its parts and that a significant understanding of how it accomplishes its purpose can be gained by studying those parts individually. Because the proteins are the parts of a cell that perform almost all of the chemical and structural transformations that occur within it, they have attracted the most attention.
The most dynamic region in a living organism is the cytoplasm of the cells or cell from which it is made. About 20–30% of the total mass of cytoplasm is protein dissolved in a solution, the solvent of which is water. The cytoplasm is enclosed within a thin, fragile, continuous membrane. About 60–80% of the dry weight of this membrane is protein dissolved in a solution, the solvent of which is lipid. This membrane is surrounded and supported by a tough protective integument of polysaccharide; polysaccharide and protein; or polysaccharide, lipid, and protein. Organelles, enclosed within their own membranes, are often scattered through the cytoplasm. In a eukaryotic cell the largest of these is the nucleus, containing most of the nucleic acid in the cell.
The strategy that has been applied most frequently to the study of proteins is to identify a particular biological feature of a living organism and then purify the protein or proteins responsible for it. Typically, when a complex, beautiful, intricately organized biological specimen, such as a tissue or a suspension of cells, is submitted to the first step in any purification procedure, it is immediately sundered beyond recognition and becomes a nondescript jumble of its organelles and broken fragments of its membranes and their integuments suspended in an aqueous solution of proteins, nucleic acids, metabolites, and salts. This event is referred to as homogenization. It is usually accompanied by the dilution of the proteins in the initial specimen by addition of a buffered aqueous solution. Following the homogenization, insoluble fragments are removed by centrifugation to produce a clear solution, the protein concentration of which is 1–10%. This solution contains most of the proteins that were once the living cytoplasm of the specimen. It is from this solution that particular proteins can be isolated. The purification of a protein is the separation of that protein from all of the others in a homogenate. A particular protein must be purified before its molecular structure can be studied.
Usually, the only interest that one has in a particular protein arises from its participation in some process of biological importance. It might be an enzyme responsible for catalyzing a particular reaction; it might be a structural protein creating the macroscopic shape of the cell; it might be a protein that binds a hormone or neurotransmitter; or it might be a protein that binds to DNA and controls its transcription. To distinguish one protein from the others in a complex mixture, an assay for the protein of interest, based on its particular function, is required.
The most widely used procedure for purifying proteins is chromatography. This technique separates molecules of protein by differences in the rate at which they move along a cylinder of a porous solid phase as a liquid phase percolates through it. If the solid phase is properly chosen, each protein travels through the cylinder at a different rate and each emerges in the solution coming out of the cylinder at a different time. In this way, one can be separated from the others. In order to distinguish the
protein of interest from the others as they emerge from the chromatographic column, the assay for that protein is used. As the protein becomes purified, the preparation displays greater and greater activity in the specific assay for a given amount of total protein.
Once the protein has been purified, analytical methods must be used to demonstrate that only one protein is the major component in the final preparation and that this protein is responsible for the biological function of interest. The analytical procedure most suited to this demonstration is electrophoresis. Electrophoresis separates proteins by both their charge and their shape, and if used with discontinuous stable boundaries, electrophoresis can have high resolution.
Once a protein of known function has been purified to homogeneity, it can be crystallized. As in organic chemistry, crystallization is a way of harvesting a particular substance in a highly purified form. Ideally, every protein that was purified would be crystallized and stored in this form, as are organic molecules. In this form, each suspension of crystals would represent a pure chemical compound. In practice, because crystals are often difficult to make and yields in crystallizations are poor, purified proteins are usually left in solution or precipitated for storage. It is these solutions, precipitates, or suspensions of crystals that are the raw material for studies of the structures and functions of the proteins they contain. The purpose of this chapter is to describe how a particular protein is purified from a complex mixture of proteins such as the homogenate of either a tissue or a suspension of cells.
Adsorption to stationary phases and chromatography are the bases for both the purification of proteins and many of the assays used to identify particular proteins, so these processes will be considered first.
Partition into Stationary Phases and Chromatography
The goal of any procedure used to purify a particular substance from a complex mixture is to separate that substance from all of the other components in the mixture. When adsorption or chromatography is used for this purpose, differences in the preferences of solutes in a solution for another phase are exploited. The simplest example of such a strategy is an affinity adsorbent. Suppose a small molecule that could be tightly bound by only one particular protein in a solution was covalently attached to a solid surface. By binding them specifically, this affinity adsorbent would collect molecules of only that one protein on its surface. The rest of the molecules of protein in the solution could be washed away, and the molecules of the desired protein could then be released. Unfortunately, such highly specific adsorbents are not usually available, so small differences in affinity among proteins or among other molecules in a solution for a separate phase are amplified by the process of chromatography.
When a chemical substance A, which will be referred to as the solute, is added to a vessel containing two immiscible phases and the system is allowed to come to equilibrium, the solute A will distribute between the two phases in a characteristic manner. The solute can be an inorganic ion, a small organic molecule, a protein, a nucleic acid, a polysaccharide, or any other similar substance. The two phases can be, for example, two immiscible liquids, a liquid and a solid, or a gas and a liquid; the only requirement is that those two phases be brought into sufficient contact to permit the distribution of solute A between them to reach equilibrium and that they then be separated in some way that does not redistribute the solute. The simplest examples are a two-phase, solvent–solvent extraction or the suspension of some finely divided solid in a liquid followed by its removal from the liquid by filtration.
After the equilibration and separation of the two phases, the moles of solute A in each of them can be determined. In the cases that are generally encountered, at least one of the phases is a fluid that can be freed entirely of the other phase. This fluid will be arbitrarily called the mobile phase. In the special case when a protein is solute A, the mobile phase is invariably an aqueous solution of moderate ionic strength buffered at a specific pH. In any situation, however, the molar concentration of solute A in the mobile phase can be readily measured. The second phase, arbitrarily referred to as the stationary phase, can be an immiscible liquid, a solid, or a solid in which a liquid is entrapped. Because of the peculiarities of this stationary phase, the best way to express the concentration of solute A that has become physically associated with the stationary phase, [A]′_S, is in moles (liter of bed)⁻¹, where the volume of the bed is the volume filled by the stationary phase when it has settled.*
Three general types of behavior¹ have been observed in such a partition (Figure 1–1). The simplest behavior, type A, occurs when the concentration of the solute A in the stationary phase increases in direct proportion to its concentration in the mobile phase. This type of behavior is encountered in solvent–solvent extractions or in chromatography by molecular exclusion. In the latter example, it results from the fact that the stationary phase is nothing more than trapped, and thereby immobilized, mobile phase. Behavior of type B (Figure 1–1) is encountered when the stationary phase saturates with solute A. It results from the presence of only a finite number of sites on the stationary phase that are all equivalent in their individual affinities for solute A
* Concentrations in moles per liter of bed are indicated by primed notation. Concentrations in moles per liter of stationary phase or moles per liter of mobile phase are in the usual unprimed notation.
stationary phase in such a way that the contact between the two phases is maximized and equilibration of the solute between them is encouraged. Examples are paper chromatography, in which the liquid mobile phase moves down the paper while flowing among the cellulose fibers that form the stationary phase; thin-layer chromatography, in which the liquid mobile phase creeps up a thin layer of the solid, dry stationary phase drawn by the capillary force arising from its movement between finely divided particles; column chromatography, in which the fluid mobile phase percolates through a finely divided, solid stationary phase compacted in a cylinder; and gas–liquid chromatography, a type of column chromatography in which a gas containing the solutes is passed through a finely divided solid phase coated with a liquid of low volatility. All of these are examples of zonal chromatography.

Zonal chromatography is chromatography in which the mixture of solutes to be separated is introduced in a thin zone at one end of the bed of stationary phase and the mobile phase is then set in motion. The molecules of solute in the mixture meander through the system, drawn forward by the movement of the mobile phase but retarded by the stationary phase in which each spends a certain fraction of its time. The fraction of the time each solute spends in the stationary phase is determined by its affinity for the stationary phase, and this is determined by its bulk distribution behavior (Figure 1–1). Since the molecules of each solute spend a different fraction of their time in the immobility of the stationary phase, each solute moves through the system at a different rate and the components of the mixture are isolated one from the other into separate zones, which are also referred to as peaks or bands. The separated solutes are collected either by dividing the stationary phase itself and extracting them, as in paper chromatography, thin-layer chromatography, or countercurrent distribution chromatography, or by continuously collecting the mobile phase as it emerges at the opposite end of the bed of the chromatographic system, as in column chromatography or gas–liquid chromatography. Any visual display of the distribution over the field of the chromatographic system of one or more of the substances being separated is referred to as a chromatogram.

The important properties of the chromatogram are the relative mobilities of the solutes, the widths of the peaks of the concentrations of the solutes at their half heights, and the resolution of those peaks one from the other. The relative mobility, $R_{f,A}$, of a particular solute A is either (1) the distance that the peak of its distribution has traveled through the system divided by the distance traveled by the mobile phase or (2) the total volume of the mobile phase in the bed of the system, referred to as the void volume, $V_0$, divided by the total volume that has passed through the system before the peak of the distribution of solute A emerges, referred to as its elution volume, $V_{e,A}$. Definitions 1 and 2 are two different ways to define the same parameter. The width of the distribution of solute A at half height, $w_{1/2,A}$, is the width, in units of distance for definition 1 or volume for definition 2, between the two points at which the concentration of solute A is half its maximum concentration at the peak. The resolution, $R_{AB}$, between two solutes is a measure of the completeness with which they are separated, a property that increases as the difference in their relative mobility increases and decreases as their widths at half height increase. The larger the differences in the various $R_{f,i}$ and the smaller the various $w_{1/2,i}$, the more successful will be the separation of the different solutes i. Expressions for $R_{f,A}$, $w_{1/2,A}$, and $R_{AB}$ as functions of parameters that can be manipulated are of value in the understanding and design of chromatographic separations.

There are two approaches to describing the phenomenon of chromatography in theoretical terms.⁴ It can be treated as the continuous process that it is, and differential equations can be formulated to describe the differential changes in solute positions and concentrations with time. These differential equations, however, do not have simple solutions, nor do they lead to an intuitive understanding of the process. The alternative approach is based on the concept of the theoretical plate, which was developed originally to describe the separation performed by a fractional distillation column.⁵,⁶ Although this is a discontinuous model for a continuous process, the treatment is formulated in terms of an easily understood mechanism and does provide, in at least one case, that of countercurrent distribution chromatography, an exact solution to the problem. Martin and Synge⁷ were the first to apply this model to the process of chromatography.

Suppose that a chromatographic separation always operates at concentrations of solute A such that the amount associated with the stationary phase and the mobile phase in the chromatographic system is a linear function of its concentration in the mobile phase (behavior of type A, Figure 1–1). If so, at equilibrium

$$[A]'_S = \alpha'_A\,[A]'_M \tag{1–1}$$

where $[A]'_M$ is the concentration of solute A in the mobile phase in units of moles (liter of bed)⁻¹, where the volume of the bed, $V_T$, is the volume filled by the stationary and mobile phases together as they are packed into the chromatographic system; and where $\alpha'_A$ is a partition coefficient. The units for $[A]'_S$ are, as defined earlier, moles (liter of bed)⁻¹.

The bed of the chromatographic system is formally divided into a series of equivalent theoretical plates. A set of theoretical plates is a set of contiguous compartments of equal volume formed by a set of evenly spaced planes passing through the bed normal to the direction in which the mobile phase flows. The height equivalent to a theoretical plate, h, is the distance the mobile phase must
move, at the rate of normal flow, past the stationary phase until the concentration of solute in the fluid emerging from the theoretical plate is equal to the concentration the solute would have had if the fluid entering the theoretical plate had come into equilibrium with the stationary phase that fills the theoretical plate. For example, if the fluid entering the upstream boundary of the theoretical plate had a concentration of solute A equal to $[A]'_{M,ent}$ and the stationary phase already had solute A immobilized within it at a concentration of $[A]'_{S,im}$, the formal downstream boundary of the theoretical plate would occur at the point where the concentration of the solute in the mobile phase, $[A]'_{M,lv}$, had reached a value

$$[A]'_{M,lv} = \frac{[A]'_{M,ent} + [A]'_{S,im}}{1 + \alpha'_A} \tag{1–2}$$

where all concentrations are expressed in moles (liter of bed)⁻¹.

With this definition, the continuous process of zonal chromatography is equivalent to the following discontinuous sequence of events. A number of moles of solute A equal to $m_{TOT,A}$ is added to the first theoretical plate and allowed to come to equilibrium between the […]

$$d_A = \frac{nh}{1 + \alpha'_A} \tag{1–4}$$

while the width of its distribution at half height will be

$$w_{1/2,A} = \frac{h\sqrt{n\,8(\ln 2)\,\alpha'_A}}{1 + \alpha'_A} \tag{1–5}$$

In thin-layer chromatography and paper chromatography, the flow of mobile phase up the thin layer or down the paper is stopped before the downstream boundary of the mobile phase reaches the end of the stationary phase. The number of steps n that have occurred is defined by the fact that the boundary has moved a distance nh from the origin at which the solutes were applied. If the relative mobility of solute A is defined as the distance solute A has moved, $d_A$, divided by the distance the boundary has moved, $d_M$, then

$$R_{f,A} = \frac{nh}{nh\,(1 + \alpha'_A)} = \frac{1}{1 + \alpha'_A} \tag{1–6}$$

By combining Equations 1–3, 1–5, and 1–6

$$w_{1/2,A} = \sqrt{d_M h}\,\sqrt{8(\ln 2)\,R_{f,A}\,(1 - R_{f,A})} \tag{1–7}$$

[…]

$$R_{AB} = \sqrt{\frac{d_M}{h}}\;\frac{2\,\lvert \alpha'_A - \alpha'_B \rvert}{\sqrt{8(\ln 2)}\,\left[\sqrt{\alpha'_A}\,(1 + \alpha'_B) + \sqrt{\alpha'_B}\,(1 + \alpha'_A)\right]} \tag{1–9}$$

Because $\alpha'_A$ and $\alpha'_B$ are fixed properties of the stationary phase, the solvent, and the solutes, this equation demonstrates that resolution is increased either by decreasing the height of a theoretical plate or by running the chromatography over a greater distance.

In column chromatography the effluent emerging from the end of the column is collected and the concentration of solute A in this effluent is monitored as a function of the total volume that has emerged since the chromatogram was begun. If the column of stationary phase contains p theoretical plates, the effluent collected and monitored is, by definition, the mobile phase entering plate p + 1. As mobile phase emerges from the end of the system, the concentration of solute A that it contains increases, reaches a maximum, and then declines. This results from the approach of the peak of the distribution of solute A to plate p + 1, its arrival at plate p + 1, and its passage beyond plate p + 1. The volume at which the maximum passes through plate p + 1 is the elution volume of solute A, $V_{e,A}$. It corresponds to the volume of mobile phase that must pass through the system to bring the maximum of the distribution of solute A into plate p + 1. Because it takes p steps for a volume equal to the void volume $V_0$ to emerge from the column but the peak of the distribution of solute A will have entered only theoretical plate $p(1 + \alpha'_A)^{-1}$ after p steps, the peak of the distribution of solute A will enter plate p + 1 only after a volume equal to $V_0(1 + \alpha'_A)$ has passed through the system. It follows that

$$V_{e,A} = V_0\,(1 + \alpha'_A) \tag{1–10}$$

and

$$R_{f,A} \equiv \frac{V_0}{V_{e,A}} = \frac{1}{1 + \alpha'_A} \tag{1–11}$$

This is the fundamental equation governing column chromatography. It connects the volume at which the solute A emerges from the end of the chromatographic column with its bulk partition coefficient for the material composing the stationary phase. The relationship between the relative mobility $R_{f,A}$ and the partition coefficient $\alpha'_A$ is identical to that governing thin-layer chromatography and paper chromatography (Equation 1–6). This is reassuring because it is reasonable that the same process occurs in all types of chromatography. The validity of this equation was verified experimentally by Martin and Synge.⁷

The width of the peak of concentration at half height, in units of eluted volume, can be shown to be a function of the number of theoretical plates:⁴,⁷,¹⁰,¹¹

$$w_{1/2,A} = V_{e,A}\,\sqrt{\frac{8(\ln 2)}{p}} \tag{1–12}$$

If the resolution between the distribution of solute A and the distribution of solute B is defined as

$$R_{AB} \equiv \frac{2\,(V_{e,A} - V_{e,B})}{w_{1/2,A} + w_{1/2,B}} \tag{1–13}$$

then

$$R_{AB} = \sqrt{\frac{p}{8(\ln 2)}}\;\frac{2\,(\alpha'_A - \alpha'_B)}{2 + \alpha'_A + \alpha'_B} \tag{1–14}$$

If the solvent, ionic strength, temperature, and pH of the mobile phase and the volume and chemical structure of the stationary phase remain the same so that the values of $\alpha'$ are unchanged, the resolution of the separation can be improved by increasing the number of theoretical plates, p, that the column contains. The most obvious way to accomplish this is to increase the length of the chromatographic column, but this can become both cumbersome and expensive.

Because the height of a theoretical plate, h, is defined as the distance of passage required for equilibrium to be reached, h decreases and p increases as the flow rate of the chromatographic column is decreased, at least until diffusion between the plates becomes a significant factor. In most cases, however, diffusion is severely hindered by the structure of the stationary phase itself and almost never becomes important, and the slower the flow, the better the resolution. This is particularly important in the chromatography of proteins, especially when they are unfolded, because their slow rates of diffusion significantly decrease rates of equilibration with the stationary phase.

The height of the theoretical plate decreases as the diameter of the particles in a solid stationary phase decreases,⁷ and it is advantageous to use particles of solid phase that are as small as possible. The small size of the particles increases the surface area available for equilibration and decreases the distances over which the solute molecules must diffuse. The realization of this fact has led to the recent development of the high-pressure liquid chromatography foreseen by Martin and Synge.⁷ In such systems, the high pressure is inconsequential to the process of separation but is required to force the liquid mobile phase through the small, finely divided solid particles of the stationary phase at a realistic rate. The particles themselves are spherical in shape and of uniform diameter to promote uniform flow of as rapid a rate as possible over the bed. Because the smaller particles of the solid phase decrease the height of a theoretical plate, more theoretical plates can exist in a given length of bed. This advantage can be exploited either to increase the resolution or to decrease the length of the chromatographic column or both.
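The relationships for column chromatography in Equations 1–10 through 1–14 can be checked numerically. The following is only a minimal sketch: the function names and the numerical values (a void volume of 10 mL, partition coefficients of 3.0 and 2.5, and 1000 or 4000 theoretical plates) are assumptions chosen for illustration, and Equation 1–14 is taken in the form obtained by substituting Equations 1–10 and 1–12 into Equation 1–13.

```python
import math

# sqrt(8 ln 2) converts a standard deviation into a width at half height.
HALF_HEIGHT_FACTOR = math.sqrt(8.0 * math.log(2.0))


def elution_volume(v0, alpha):
    """Equation 1-10: V_e = V_0 (1 + alpha')."""
    return v0 * (1.0 + alpha)


def relative_mobility(alpha):
    """Equation 1-11: R_f = V_0 / V_e = 1 / (1 + alpha')."""
    return 1.0 / (1.0 + alpha)


def half_height_width(v0, alpha, p):
    """Equation 1-12: w_1/2 = V_e sqrt(8 ln 2 / p)."""
    return elution_volume(v0, alpha) * HALF_HEIGHT_FACTOR / math.sqrt(p)


def resolution(v0, alpha_a, alpha_b, p):
    """Equation 1-13 evaluated directly from elution volumes and widths."""
    separation = elution_volume(v0, alpha_a) - elution_volume(v0, alpha_b)
    widths = half_height_width(v0, alpha_a, p) + half_height_width(v0, alpha_b, p)
    return 2.0 * separation / widths


def resolution_closed_form(alpha_a, alpha_b, p):
    """Equation 1-14: the void volume cancels, leaving only alpha' and p."""
    return (math.sqrt(p) / HALF_HEIGHT_FACTOR) * 2.0 * (alpha_a - alpha_b) / (
        2.0 + alpha_a + alpha_b
    )


# Hypothetical column: all numbers below are illustrative assumptions.
V0 = 10.0                    # void volume, mL
ALPHA_A, ALPHA_B = 3.0, 2.5  # partition coefficients of solutes A and B

for plates in (1000, 4000):
    print(
        plates,
        elution_volume(V0, ALPHA_A),
        relative_mobility(ALPHA_A),
        half_height_width(V0, ALPHA_A, plates),
        resolution(V0, ALPHA_A, ALPHA_B, plates),
        resolution_closed_form(ALPHA_A, ALPHA_B, plates),
    )
```

Because the resolution depends on the square root of p, quadrupling the number of theoretical plates in this sketch only doubles the resolution, which is why lengthening the column quickly becomes cumbersome.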
Because high pressures are used to increase the rate of flow through a shorter column, the major advantage of high-pressure liquid chromatography is the speed with which the chromatograms can be run. For example, when peptides are separated on chromatography by cation exchange with sulfonated polystyrene,¹² at low pressure (<500 psi), the chromatography takes about 25 h; when peptides are separated by reverse-phase adsorption chromatography,¹³ at high pressure (>1000 psi), the chromatography takes only 1 h, even though the resolution in each case is about the same. With reverse-phase adsorption chromatography, the solvents used are also more transparent to ultraviolet light, so peptides can be followed simply by their absorbance with a continuous-flow spectrophotometer.

Improvements in the size, uniformity, and rigidity of the particles of the stationary phase have permitted similar increases in the rate at which chromatography of proteins can be performed. These developments are referred to commercially as fast protein liquid chromatography. In both high-pressure liquid chromatography and fast protein liquid chromatography, the principles remain the same as before, often the solid phases remain the same as before, and the technological improvements of the original techniques are based on previously noted predictions of the original theory.

The discontinuous model presented here for chromatography has been developed for regions of the partition curves (Figure 1–1) where solute A distributes with a constant partition coefficient, $\alpha'_A$. It turns out that the most usual deviation from such ideal behavior is for the stationary phase to display saturation (curves B and C, Figure 1–1). The more prominent this behavior becomes, the poorer the resolution of the chromatogram becomes.⁴ As a rule, uniform stationary phases of high capacity, by promoting the linearity of the partition function, provide the highest resolution.

The fact that, unless the number of theoretical plates is increased, peak height decreases in almost inverse proportion to $\alpha'_A$ (Equations 1–11 and 1–12) precludes the use of conditions where the solute has a high affinity for the stationary phase. Usually, conditions such as solvent, temperature, ionic strength, and pH of the mobile phase and the chemical structure of the stationary phase are manipulated to bring the values of $\alpha'_A$ for the solutes to be separated into a useful range, usually between 1 and 10. A variation in one of these properties of the mobile phase, however, can also be incorporated into the chromatography itself.

To this point, only isocratic zonal chromatography has been described. Isocratic zonal chromatography is chromatography in which the mobile phase introduced continuously into the chromatographic system remains of constant composition. It is possible, however, to vary continuously and monotonically the composition of the mobile phase entering a column. This systematic variation produces a gradient of one or more properties of the mobile phase. For example, the ionic strength of the entering mobile phase can be increased continuously over time so that it is a linear function of the volume introduced into the system. Mechanical devices are available to produce linear gradients or gradients that are exponential or logarithmic or some other function of the volume by mixing two or more solutions that differ in the property to be varied. When a gradient of pH is required, the situation becomes somewhat more complicated because the pH of a solution is usually controlled with a buffer. Not only is the pH a logarithmic function of the concentrations of the conjugate acid and base of the buffer, but changing the concentrations of conjugate acid and base often affects the ionic strength. There is no requirement, however, that the gradient be some particular function of a particular property; the only requirement is that the property be varied continuously and monotonically.

The method of gradient chromatography is an important tool because it permits the partition coefficient of solute A, $\alpha'_A$, to be decreased during the chromatographic run. This often is essential because if the partition coefficient for a particular solute is too large, it emerges from the system with such a large elution volume, $V_{e,A}$, that the width of its band is unacceptably large. To produce satisfactory chromatography, the partition coefficient must be less than 10 in most situations, but frequently the values of the partition coefficient of solutes in a complicated mixture can spread over a large range for one particular mobile phase of constant composition. By using a gradient formulated so that all of the partition coefficients for the solutes decrease continuously, even those solutes with the highest affinity for the stationary phase eventually have low enough partition coefficients to emerge from the system within a reasonable time. Usually, a gradient of ionic strength, cosolvent, or pH is employed. It is constructed in such a way that the chosen property continuously changes in a direction that will cause the solutes to have smaller and smaller affinities for the stationary phase and elute earlier than they would under isocratic conditions. For example, if a solute is being adsorbed to a nonpolar stationary phase, a gradient that increases in the concentration of a miscible nonpolar solvent in water is used to decrease gradually the affinity of the solute for the stationary phase.

The stationary phase in a chromatographic system is the chromatographic medium. The solid matrix composing a chromatographic medium is almost always a polymer. Both natural polymers, for example, cellulose, and unnatural polymers, for example, polymers of polystyrene cross-linked with divinylbenzene, are used. The basic polymer is often cross-linked appropriately to increase its rigidity and manufactured in the form of small spherical beads of uniform size to improve flow rates. For chromatography of small molecules such as metabolites or peptides, beads of polystyrene or silica gel are used; for chromatography of proteins, cellulose or
beads of dextran, allyldextran, polyethers, agarose, or polymethacrylate are used.* Each of these intentionally inert polymeric matrices is then modified chemically. The type of modification performed determines the molecular property used by the chromatographic system to separate the solutes.

Media for chromatography by adsorption are solid phases with which the solutes physically associate by noncovalent forces. Certain amorphous or heterogeneous solids such as hydroxylapatite and silica gel have long been used as chromatographic media for chromatography by adsorption. Amorphous hydroxylapatite has been used extensively in protein purification. Unfortunately, it is prone to significant irreversible adsorption,² and it is heterogeneous and saturates readily, which causes it to have nonlinear distribution behavior. All of these properties limit its resolution. Although it separates nonpolar solutes successfully, silica gel has the unfortunate property of strongly adsorbing hydrogen-bonding solutes, which precludes its use with most biological substances. Reverse-phase chromatographic media, however, have found wide use in protein chemistry in the separation of small molecules such as peptides and metabolites. Such media are composed of spherical beads of silica gel that have been heavily alkylated with hydrocarbons of uniform length, for example, octadecyl or octyl groups. This blocks the sites of hydrogen bonding and creates an apolar surface on the beads that adsorbs apolar functional groups on the otherwise polar solutes. Such a chromatographic medium, however, shows little affinity for completely polar solutes unless a significant portion of the silica gel has lost its apolar coating.

Beaded, cross-linked dextran, agarose, or polymethacrylate are covalently modified to produce chromatographic media for the chromatography of proteins by adsorption. The functional groups that are attached to these hydrophilic matrices during the covalent modification are hydrophobic groups such as phenyl, methyl, butyl, propyl, or tert-butyl groups. These hydrophobic groups associate directly with hydrophobic groups on the surface of a molecule of protein that are the side chains of the amino acids valine, leucine, isoleucine, and phenylalanine.

In media for chromatography by adsorption, the affinity of the molecules of the solute for the stationary phase arises from their direct physical attachment to the molecular surface of the stationary phase. These transient associations are noncovalent in nature and can be considered as hydrophobic contacts or hydrogen bonding, designations that imply direct molecular contact between solid phase and solute. It is this molecular contact that distinguishes chromatography by adsorption from chromatography by ion exchange.

Media for chromatography by ion exchange are solids formed from all of the usual neutral polymers to which charged organic functional groups have been covalently attached (Figure 1–2). Anion-exchange media, or basic media, are solid phases to which functional groups of positive charge at neutral pH have been covalently attached, and cation-exchange media, or acidic media, are solid phases to which functional groups of negative charge at neutral pH have been attached. A distinction can be made between weakly basic or acidic and strongly basic or acidic ion-exchange media based on whether the fixed charges can or cannot be neutralized, respectively, by variation of the pH within the ranges normally employed for chromatography. This is an important distinction because the density of charge, and hence the capacity of the medium, can be changed by changing the pH when weakly basic or weakly acidic ion-exchange media are used but not when strongly basic or strongly acidic ion-exchange media are used. Examples of weakly basic functional groups are tertiary amines such as those on [2-(diethylamino)ethyl]cellulose (DEAE-cellulose); examples of strongly basic functional groups are quaternary ammonium cations such as those on N,N-diethyl-N-(2-hydroxypropyl)ammonioethyl agarose (QAE agarose) or trimethylammonioethyl polymethacrylate; examples of weakly acidic functional groups are carboxylates, such as those on carboxymethyl cellulose, or phosphates, such as those on phosphocellulose; and examples of strongly acidic functional groups are sulfonates, such as those on sulfonated polystyrene (Figure 1–2) or sulfonated polymethacrylate.

The fixed charges on the stationary phase are responsible for the tendency of ionic solutes of an opposite charge to associate with it. A counterion is a mobile ion that is dissolved in the surrounding solution and has a charge opposite in sign to the fixed charges on the stationary phase; a co-ion is a mobile ion that is dissolved in the surrounding solution and has a charge of like sign to the fixed charges on the stationary phase. Solutes containing simple univalent ionic functional groups do not form physical contacts with the isolated univalent fixed charges of opposite sign that are attached to the stationary phase when chromatography by ion exchange is performed in aqueous solution. Rather, such charged solutes (for example, nucleotides, amino acids, or proteins) can be considered to be trapped as mobile counterions surrounding the covalently fixed charges in an ionic double layer.¹⁴ The two layers in an ionic double layer are a layer of covalently fixed charges on the surface of the polymer forming the stationary phase and a layer of solution, adjacent to that surface, that is enriched in counterions and depleted of co-ions. The molecular surface of the layer of fixed charge is considered to be the boundary between the layers of the double layer.

* The commercial forms of these beaded, cross-linked polymers each have their own uninformative names, but it is possible to learn their compositions if one is perseverant. Although one or the other chromatographic medium may have the same composition, each manufacturer claims unique benefits for his product.
[Figure 1–2: structures of polystyrene, sulfonated polystyrene, cellulose, phosphocellulose, carboxymethyl cellulose, 2-(diethylamino)ethyl cellulose (DEAE cellulose), and N,N-diethyl-N-(2-hydroxypropyl)aminoethyl cellulose (QAE cellulose).]
The enrichment of counterions, which in this case are the solutes being separated, in the layer of solution results from the requirement for maintaining electroneutrality. The layer of solution contains solutes of both net positive and net negative charge but has an excess of solutes of net charge opposite to the charge of the functional groups in the layer of covalently fixed charges and is depleted in solutes of the same charge as those functional groups. The layer of covalently fixed charges is usually considered to be localized in a geometric surface representing the molecular surface of the polymer, and the layer of solution enriched in the respective counterions is considered to have the properties of a space charge extending into the surrounding solvent.¹⁴

The reason that the diffuse space charge extends a significant distance into the solution beyond this boundary is that the positive and negative charges in the solution are on mobile, dissolved cations and anions, and the enthalpic tendency of the counterions to gather at the charged surface of the boundary and the tendency of the co-ions to avoid the charged surface of the boundary is counterbalanced by the entropic tendency for each of them to diffuse randomly throughout the surrounding solution. Because the imbalance in charge that defines the ionic double layer falls off exponentially, the layer of solution in which the imbalance in charge occurs theoretically has no outer boundary. It is, however, arbitrarily assigned a thickness that is approximately that distance, from the surface of fixed charges, at which the space charge has decreased by a factor of exp(−1). Under the normal conditions of chromatography, the thickness of the layer of solution in the double layer would be less than 10 nm.¹⁴ It can be assumed that the boundary that separates the stationary phase from the mobile phase during the chromatography, namely the outside surface of the bead, lies at a much greater distance than this from the molecular surface of the charged strands of polymer within the bead because flow occurs around beads of dimensions at least a thousand times larger. Therefore, the entire ionic double layer must be within the chromatographic stationary phase.

If this assumption is made, the distribution of counterions between the stationary phase and the mobile phase becomes formally equivalent to the distribution of permeant counterions across a permeable membrane when a charged, impermeant macromolecule is present on only one side of the membrane. In the case of chromatography by ion exchange, the charged polymer of the bead is formally equivalent to the trapped, charged macromolecule. If this is the case, the sum of the fixed charges and the dissolved mobile charges of the same sign in the stationary phase must equal the sum of the dissolved mobile charges of the opposite sign in the stationary phase. It follows that the concentration within the stationary phase of any solute of charge opposite to the fixed charges must always be greater than its concentration within the mobile phase, and it is this bias that can produce significant values of $\alpha'_i$. This bias can be treated by the Donnan formalism.¹⁵

Consider the situation of an anion-exchange medium of univalent fixed positive charges, N⁺ [an example would be [N,N-diethyl-N-(2-hydroxypropyl)aminoethyl] cellulose, Figure 1–2], and a univalent anionic solute, A⁻ (an example would be AMP⁻), in the presence of a dissolved univalent salt, K⁺Cl⁻, referred to as the electrolyte. Assume that the original stationary phase was the chloride salt of N⁺ and that the solute before it was added to the stationary phase was the potassium salt of A⁻. All concentrations are expressed in terms of moles (liter of phase)⁻¹, hence the unprimed values. From the requirement for electroneutrality

$$[\mathrm{K}^+]_S + [\mathrm{N}^+]_S = [\mathrm{Cl}^-]_S + [\mathrm{A}^-]_S \tag{1–15}$$

$$[\mathrm{K}^+]_M = [\mathrm{Cl}^-]_M + [\mathrm{A}^-]_M \tag{1–16}$$

where the subscripts refer to the stationary and mobile phases. Since the electrolytes are at equilibrium within the theoretical plate

$$[\mathrm{K}^+]_M\,[\mathrm{Cl}^-]_M = [\mathrm{K}^+]_S\,[\mathrm{Cl}^-]_S \tag{1–17}$$

$$[\mathrm{K}^+]_M\,[\mathrm{A}^-]_M = [\mathrm{K}^+]_S\,[\mathrm{A}^-]_S \tag{1–18}$$

In the particular circumstance where the concentration of solute A⁻ is significantly less than the concentration of Cl⁻ so that [A⁻] becomes negligible in both Equations 1–15 and 1–16 and the concentration of fixed charges in the stationary phase, $[\mathrm{N}^+]_S$, is so large that $[\mathrm{K}^+]_S$ in Equation 1–15 becomes negligible, then

$$[\mathrm{A}^-]_S \qquad [\mathrm{N}^+]_S$$

concentration are moles (liter of stationary phase)⁻¹ and moles (liter of mobile phase)⁻¹.

Equation 1–19 predicts that the partition coefficient, $\alpha_{A^-}$, for solute A should be inversely proportional to the concentration of K⁺ in the mobile phase. Because the internal volumes of the stationary phases in chromatography by ion exchange are fairly small and the capacities of most media are large even in terms of equivalents (liter of bed)⁻¹, the situation in which $[\mathrm{K}^+]_S = [\mathrm{N}^+]_S$ is probably rarely approached, and Equation 1–19 should govern most concentrations of salt employed. The effect of adding a univalent salt to the mobile phase is to decrease the value of the partition coefficient for the anion A⁻ between the cationic stationary phase and the aqueous mobile phase. In this way, the value of $\alpha_{A^-}$ can be adjusted by varying the concentration of electrolyte to optimize an isocratic separation, or a gradient of the electrolyte can be used to vary $\alpha_{A^-}$ continuously. If the concentration of electrolyte is low, $\alpha_{A^-}$ will be large and the mobility of A⁻ will be negligible. Therefore, a charged solute can be gathered tightly at the origin of the chromatographic system from a large volume of a dilute solution at low ionic strength, and chromatography can then be initiated by increasing the concentration of the electrolyte.

In a weakly basic or acidic ion-exchange medium, the titration of the charges that occurs upon adding acid or base, respectively, occurs over a broad range of pH because of electrostatic repulsion among the fixed cations or anions. This permits the density of charge on the medium ($[\mathrm{N}^+]_S$ or $[\mathrm{O}^-]_S$) to be continuously decreased by incorporating a gradient of pH into the entering mobile phase. For example, if the stationary phase has fixed, protonated tertiary ammonium cations, a gradient of increasing pH would decrease $[\mathrm{R_3NH^+}]_S$ as it progresses. The decrease in the density of charge ($[\mathrm{N}^+]_S$ or $[\mathrm{O}^-]_S$) produces a decrease in $\alpha_{A^-}$ or $\alpha_{A^+}$ (Equation 1–19), causing the solutes to emerge sooner than they would under isocratic conditions. When the solutes themselves are weak acids or bases, however, their ionization may also vary as the gradient of pH progresses, but in the opposite sense to the stationary phase; their effective charge will be increasing as the gradient progresses.

There is no question that Equation 1–19, although intuitively informative, does not describe real ion-exchange processes. At face value it predicts that $\alpha_{A^-}$ should be a function only of the charge density on the stationary phase and the concentration of electrolyte, and this is often not the case. Even simple solutes upon ion exchange display affinities for the supporting polymeric matrix or the functional groups on the fixed charges that sometimes differ greatly from this expecta-
a A– ∫ @ (1–19) tion. The reason for these deviations is almost certainly
[ A– ]M [ K + ]M due to the fact that solutes, brought to high concentra-
tion within the double layer by ion exchange, adsorb
where aA- is defined somewhat differently from the par- physically to these constituents, and as a result chro-
tition coefficient described so far. Instead of units of con- matography by adsorption is superimposed upon the
centration in moles (liter of bed)-1, the units of basic process of chromatography by ion exchange. The
clearest example of this is found in the separation of amino acids on sulfonated polystyrene (Figure 1–3).16,17 Even though the solutes in the series alanine, valine, leucine, and phenylalanine have almost identical acid dissociation constants, and hence ionic charge, they are cleanly separated. There is little doubt that the separation observed in this series is due to chromatography by adsorption performed by the styrene–divinylbenzene copolymer of the matrix.16 An ion-exchange medium can also participate in adsorbing simple cations or anions by chelation, such as occurs in the binding of alkali metal cations to polygalacturonic acid.18

Figure 1–3: Separation of amino acids on chromatography by cation exchange.16 A mixture of amino acids in the ratios typical of those found in a protein was submitted to chromatography on a column (0.90 cm × 100 cm) of sulfonated polystyrene (Figure 1–2) in the sodium form. The values of the pH and temperatures of the buffered mobile phases are noted below the horizontal axes, which register the volume of the mobile phase that has passed through the column (in centimeters³) since initiation of the chromatography. Changes from one mobile phase to the next were made discontinuously at the times noted. Individual fractions of the effluent emerging from the bottom of the column were collected and assayed for their concentration of amino acid (millimolar). The relative mobility, $R_f$, of each amino acid in the initial isocratic separation at pH 3.41 would be the void volume of the column divided by the volume at which its peak of concentration emerged from the column. The width at half height, $w_{1/2}$, of each peak is its width in milliliters at a level of concentration half that of the concentration at its peak. Reprinted with permission from ref 16. Copyright 1951 Journal of Biological Chemistry.

A molecule of protein is a macromolecular polyelectrolyte, the effective charge of which is a function of pH and varies over a wide range. If the pH is changed, the partition coefficients of proteins upon ion exchange vary, and gradients of pH as well as gradients of ionic strength are used in their chromatography. In the case of polyelectrolytes of this type, interactions between the solute and the stationary phase may also lead to direct adsorption. Although simple univalent ions when they are at normal concentrations almost certainly do not physically associate with each other in aqueous solution, polyelectrolytes of opposite charge, such as proteins and ion-exchange media, sometimes do. This results from a cooperative association of the opposite charges on the two polymers that arises from the fact that the charges on the ion-exchange medium are covalently fixed and those of the opposite sign on the protein are also covalently fixed. It is always possible that there is a population of sites on the ion-exchange medium where the distribution of charge complements the distribution of charge on the protein, a possibility that will produce physical adsorption. This, however, is probably a rare phenomenon; most of the time the molecules of protein are simply trapped inside the ion-exchange medium as mobile counterions in the ionic double layer.

Media for chromatography by molecular exclusion (a method also called size exclusion, gel filtration, and gel permeation) separate molecules on the basis of differences in their size and shape. The beaded solids used as stationary phases are tangled webs of hydrophilic, linear polymers (dextran, agarose, polyacrylamide, polyether, or polymethacrylate) cross-linked among themselves randomly along their length. These matrices can be produced in two ways. First, polysaccharides such as agarose and dextran spontaneously imbibe water and swell when the dry solid is exposed to an aqueous solution. The degree to which the linear polymers are cross-linked among themselves determines how much water they will imbibe at saturation. This is designated as their water regain, $W_r$, in milliliters (gram of polysaccharide)⁻¹. This in turn determines the fraction of the volume of the stationary phase occupied by solid polymer, $f_{poly}$:

$$f_{poly} = \frac{V_{poly}}{V_{H_2O} + V_{poly}} = \frac{\bar{v}_{poly}}{W_r + \bar{v}_{poly}} \tag{1–20}$$

where $V_{poly}$ is the volume occupied by polysaccharide, $V_{H_2O}$ is the volume occupied by water, and $\bar{v}_{poly}$ is the partial specific volume of the polysaccharide in milliliters gram⁻¹. Second, polyacrylamide does not swell readily but can be polymerized from acrylamide monomers and a small amount of the cross-linker N,N′-methylenebis(acrylamide), both dissolved at a certain concentration in an aqueous solution. This produces a rigid gel that can be fragmented. The majority of
the volume inside the beads of any of these stationary phases for chromatography by molecular exclusion is occupied by water. When water is within the tangled web of the bead, however, it is no longer mobile but stationary. The mobile phase percolates around the beads and flow occurs only in the interstices among the beads. The void volume, $V_0$, is the volume of this space outside of the beads.

The larger the molecule of solute, the less of the open space inside the beads of the stationary phase is available to it. If solute A is too large, it cannot enter the beads at all, and its peak emerges from the system at the void volume, $V_0$. Therefore, the elution position on the chromatogram of the completely excluded molecules marks the position of $V_0$. A small molecule (in theory, water itself or something equivalent to it) can enter the entire open space in each bead, and its elution position marks $V_i$, the included volume. Unlike with most other chromatographic separations, there is an end to a molecular exclusion chromatogram because no solute can see a larger volume than $V_i$. The only useful separation that occurs in such a system is of those solutes that emerge between $V_0$ and $V_i$, because all solutes larger than a certain size travel together at $V_0$ and all solutes smaller than a certain size travel together at $V_i$ (Figure 1–4). Between $V_0$ and $V_i$ on the chromatogram, the larger solutes are the first to emerge.

Because the fluid contained within the beads is identical in composition to the mobile phase percolating around the beads and because a polymer that theoretically has no affinity for the solutes being separated has been chosen, the partition coefficient for solute A, $\alpha'_A$, between stationary and mobile phases is the ratio of the volume within the stationary phase that solute A can enter, which is its elution volume minus the void volume, divided by the volume of the mobile phase within the bed, which is, by definition, the void volume $V_0$. Parameters other than the partition coefficient, however, are usually used to define the behavior of a solute on chromatography by molecular exclusion. If $V_T$ is the total volume of the bed of the chromatographic system, then the volume of the bed occupied by the stationary phase is $V_T - V_0$. The fraction of the volume of the stationary phase that is available to solute A is designated $K_{av,A}$:

$$K_{av,A} \equiv \frac{V_{e,A} - V_0}{V_T - V_0} = \frac{\alpha'_A}{\dfrac{V_T}{V_0} - 1} \tag{1–21}$$

Another parameter is often used to describe the elution during chromatography by molecular exclusion. This is the fraction of the volume within the stationary phase available to a small reference solute, solute R, that is also available to solute A, and it is designated $K_{D,A}$ so that

$$K_{D,A} \equiv \frac{V_{e,A} - V_0}{V_{e,R} - V_0} \tag{1–22}$$

where $V_{e,R}$ is the volume at which solute R elutes. If the reference solute were able to enter the entire aqueous phase within the stationary phase, $V_i$, then $V_{e,R}$ would be equal to $V_i$ and $K_{D,A}$ would equal $K_{av,A}(1 - f_{poly})^{-1}$. The difficulty with this definition is that it depends on the identity of the reference solute.
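The relations among the partition coefficient, $K_{av,A}$, $K_{D,A}$, and $f_{poly}$ can be checked numerically. The following is only a minimal sketch in Python: the function names and every number in it (bed volumes, water regain, partial specific volume) are hypothetical values chosen for illustration, not data from the text, and the reference solute is assumed to sample all of the water inside the beads.

```python
def fraction_polymer(water_regain, v_bar_poly):
    """Equation 1-20: fraction of the bead volume occupied by polymer."""
    return v_bar_poly / (water_regain + v_bar_poly)


def partition_coefficient(v_e, v_0):
    """alpha'_A: volume the solute can enter divided by the void volume."""
    return (v_e - v_0) / v_0


def k_av(v_e, v_0, v_t):
    """Equation 1-21: fraction of the stationary phase available to the solute."""
    return (v_e - v_0) / (v_t - v_0)


def k_d(v_e, v_0, v_e_ref):
    """Equation 1-22: availability relative to a small reference solute R."""
    return (v_e - v_0) / (v_e_ref - v_0)


# Hypothetical column, volumes in mL; none of these numbers come from the text.
V_0, V_T, V_E_A = 40.0, 100.0, 64.0
F_POLY = fraction_polymer(water_regain=9.0, v_bar_poly=0.6)

# Assumption: the reference solute enters all of the water within the beads,
# so it elutes at the void volume plus the aqueous part of the stationary phase.
V_E_R = V_0 + (V_T - V_0) * (1.0 - F_POLY)

print("alpha'_A              =", partition_coefficient(V_E_A, V_0))
print("K_av,A                =", k_av(V_E_A, V_0, V_T))
print("K_D,A                 =", k_d(V_E_A, V_0, V_E_R))
print("K_av,A / (1 - f_poly) =", k_av(V_E_A, V_0, V_T) / (1.0 - F_POLY))
```

Under these assumptions the last two printed values should agree, which is the relation between $K_{D,A}$ and $K_{av,A}$ stated in the text; with a reference solute that is partly excluded they would not.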
[Figure: chromatogram plotting [solute] against elution volume, with the positions $V_0$, $V_{e,A}$, $V_{e,B}$, and $V_i$ marked.]

Suggested Reading
Moore, S., & Stein, W.H. (1951) Chromatography of amino acids on sulfonated polystyrene resins, J. Biol. Chem. 192, 663–681.

Problem 1–1: Assume that the total volume of mobile phase in the column described in Figure 1–3 is 45 cm³.
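Looking back at the Donnan treatment of ion exchange in this section, the quality of the approximation in Equation 1–19 can be examined numerically. The sketch below is not part of the original derivation: it rests on the rearrangement (an assumption worth verifying by hand) that substituting Equations 1–16, 1–17, and 1–18 into Equation 1–15 gives $[\mathrm{K^+}]_S^2 + [\mathrm{N^+}]_S[\mathrm{K^+}]_S - [\mathrm{K^+}]_M^2 = 0$, from which the exact ratio $[\mathrm{A^-}]_S/[\mathrm{A^-}]_M = [\mathrm{K^+}]_M/[\mathrm{K^+}]_S$ follows. The function names and concentrations are hypothetical.

```python
import math


def alpha_exact(n_s, k_m):
    """Ratio [A-]_S/[A-]_M from Equations 1-15 to 1-18 without approximation.

    n_s: concentration of fixed charges in the stationary phase (M)
    k_m: concentration of K+ in the mobile phase (M)
    Uses the positive root of [K+]_S^2 + n_s*[K+]_S - k_m^2 = 0,
    written in a form that avoids subtracting nearly equal numbers.
    """
    return (n_s + math.sqrt(n_s**2 + 4.0 * k_m**2)) / (2.0 * k_m)


def alpha_approx(n_s, k_m):
    """Equation 1-19: valid when [K+]_S is negligible beside [N+]_S."""
    return n_s / k_m


# Hypothetical medium with 1.0 M fixed charge; salt concentrations in M.
for k_m in (0.01, 0.05, 0.2, 1.0):
    exact = alpha_exact(1.0, k_m)
    approx = alpha_approx(1.0, k_m)
    print(f"[K+]_M = {k_m:5.2f} M  exact = {exact:8.3f}  Eq 1-19 = {approx:8.3f}")
```

In this sketch the two values stay close while the salt concentration is well below the concentration of fixed charge and separate as the two become comparable, which is the regime the text describes as rarely approached.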
this coenzymatic requirement should have been expected because the enzyme catalyzes a cleavage immediately adjacent to an acyl carbon, but such arguments are usually after the fact. Often the requirement for a coenzyme is not obvious and is both difficult and frustrating to discover.

It also often happens that, as with a coenzyme, a metallic cation, such as Mg²⁺, Ca²⁺, Zn²⁺, Cu²⁺, Fe²⁺, or K⁺, is required by a protein to perform its function and must be added to the assay. Although there are often obvious choices, such as Mg²⁺ for enzymes having phosphoesters as reactants, the requirement for a particular metal is often unpredicted.

The most unambiguous assay of the reaction catalyzed by an enzyme is one in which the reactants and products are chromatographically separated after the reaction and the quantities of each are determined. The introduction of rapid, automated, high-pressure liquid chromatographic systems with associated monitoring systems of high sensitivity has made this approach convenient and efficient. If radioactive reactants are available that can be turned into radioactive products, reactants and products from a large number of assays can be separated in arrays of simple, inexpensive chromatographic systems and their respective quantities can be determined by scintillation counting.

Examples of assays in which reactants and products are chromatographically separated have been used for the purifications of the proteins geranyltranstransferase, cyclosporin synthase, methylamine–glutamate N-methyltransferase, and lysine N-methyltransferase. Geranyltranstransferase catalyzes the reaction

geranyl diphosphate + isopentenyl diphosphate ⇌ (E,E)-farnesyl diphosphate + pyrophosphate (1–25)

A sample of protein to be assayed for this enzymatic activity can be mixed with geranyl diphosphate and [1-14C]isopentenyl diphosphate and incubated for a set time. The reaction can then be terminated by adding alkaline phosphatase to hydrolyze rapidly the various diphosphates. After extraction, the resulting [1-14C]farnesol and [1-14C]isopentenol in each sample can be separated on small plates by thin-layer chromatography and separately quantified.24 That the product was entirely the expected (E,E) isomer of [1-14C]farnesol was demonstrated by gas–liquid chromatography.

Cyclosporin synthase catalyzes the reaction

L-glycine + 4 L-leucine + 2 L-valine + L-alanine + D-alanine + (2S,3R,4R,6E)-2-amino-3-hydroxy-4-methyl-6-octenoic acid + L-2-aminobutanoic acid + 11 MgATP + 7 S-adenosylmethionine ⇌ cyclosporin + 11 MgADP + 11 HOPO₃²⁻ + 7 S-adenosylhomocysteine (1–26)

If the reaction is run with S-adenosyl[methyl-14C]methionine, the [14C]cyclosporin produced can be isolated, after extraction, by thin-layer chromatography.25

Methylamine–glutamate N-methyltransferase catalyzes the reaction

L-glutamate + [14C]methylamine ⇌ ammonia + N-[14C]methyl-L-glutamate (1–27)

The [14C]methylammonium cation and the [14C]methyl-L-glutamate can be separated by isocratic chromatography by cation exchange.26 L-Lysine, Nε-methyl-L-lysine, and Nε,Nε-dimethyl-L-lysine are converted in the presence of S-adenosyl[methyl-3H]methionine into mixtures of Nε-[3H]methyl-L-lysine, Nε,Nε-[3H]dimethyl-L-lysine, and Nε,Nε,Nε-[3H]trimethyl-L-lysine by lysine N-methyltransferase. After removal of unreacted S-adenosyl[methyl-3H]methionine with activated charcoal, the three radioactive products can be separated by thin-layer chromatography and quantified individually.27

As in the previous example, where unreacted S-adenosyl[methyl-3H]methionine was removed by adsorption to activated charcoal, the chemical transformation performed by an enzyme often produces a product that can be exclusively transferred to a separable phase. For example, tryptophan-tRNA ligase catalyzes the reaction

MgATP + L-[14C]tryptophan + tRNATrp ⇌ AMP + Mg-pyrophosphate + L-[14C]tryptophan-tRNATrp (1–28)

The L-[14C]tryptophan-tRNATrp can be isolated from the assay solution as a precipitate, free of L-[14C]tryptophan, by treatment with acid and filtration through filters of glass fiber.28 The [14C]CO2 released from L-[1-14C]glutamate by glutamate decarboxylase29 or from 4-hydroxyphenyl[1-14C]pyruvate by 4-hydroxyphenylpyruvate dioxygenase30 can be released as a gas from the assay solutions by treatment with acid and collected in a separate well containing a strong base. The enzyme encoded by the murG gene of Escherichia coli catalyzes the addition of the N-acetylglucosamine from UDP-N-acetylglucosamine to the 4′ position of the muramoyl group in 1′-O-β-[3(R)-3,7-dimethylhept-6-enyl]-1′-diphospho-2′-N-acetylmuramoyl-L-alanyl-D-γ-glutamyl-6-carboxy-L-lysyl-D-alanyl-D-alanine. A derivative of the heptenyldiphospho-N-acetylmuramoyl pentapeptide to which a molecule of biotin has been covalently attached can be used in an assay for this enzyme31 along with UDP-N-[14C]acetylglucosamine. The resulting biotinylated β(1,4)-N-[14C]acetylglucosaminylheptenyldiphospho-N-acetylmuramoyl pentapeptide can be separated cleanly and quantitatively from the remaining UDP-N-[14C]acetylglucosamine by adsorbing it to a solid phase on which has been attached covalently the protein
avidin, which binds the biotin in the product with high affinity.

A special case of assays that depend on transferring a product or a reactant to a separate phase are those used to monitor the binding of a small molecule to a protein. Certain proteins, known loosely as receptors, often do not catalyze a chemical reaction but respond to specific small molecules, referred to as agonists, by binding them and then undergoing a change in structure. Receptors are assayed by their ability to bind either these agonists or similar molecules that also bind but do not elicit the response, referred to as antagonists. In such binding assays, the receptor and a suitable radioactive agonist or antagonist are mixed together, the binding is allowed to come to equilibrium, and the receptor–agonist or receptor–antagonist complex is separated from unbound agonist or antagonist, respectively. Because receptors are usually proteins dissolved in membranes, the separation of bound from unbound ligand often takes advantage of the large size of the fragments of membrane produced by homogenization, which can be separated from the rest of the solution by filtration or centrifugation. After the separation, the amount of bound radioactivity is then determined by scintillation counting.

Chemically stable agonists or antagonists of high affinity for a receptor are required to ensure that the binding is at saturation so that all receptors are counted and to prevent dissociation of receptor and agonist or receptor and antagonist during the separation of bound and free radioactivity. These reagents are often produced by the synthesis of analogues of the natural compounds. For example, [3H]dihydroalprenolol is a radioactive synthetic compound that binds tightly (dissociation constant = 2 nM)32 to the β-adrenergic receptor, which physiologically responds to epinephrine. Its binding has been used as an assay during the purification of this receptor.33 Often a synthetic compound the binding of which to a receptor is strong has been obtained during a search for pharmaceutically useful agents. An example of this kind of product is prazosin, which was developed as a drug specific for α₁-adrenergic receptors and the binding of which (dissociation constant = 1 nM) could be used as an assay during the purification of the α₁-adrenergic receptor.34 Often the naturally occurring agonist has an affinity great enough that it can be used in an assay during the purification of the receptor. For this purpose, it is synthesized in a radioactive form. Examples would be the use of the binding of 125I-epidermal growth factor35 (dissociation constant = 20 nM) and the binding of [1,2-3H2]progesterone36 (dissociation constant = 1 nM) as assays for their receptors.

In all binding assays for receptors, the difficulty is to separate the complex between the receptor and the agonist from the unbound agonist without losing the bound agonist through the dissociation of the complex. It is often possible to sediment the complex in a preparative ultracentrifuge.37 This strategy is particularly useful for weakly bound agonists that dissociate rapidly after the unbound agonist is removed, because during sedimentation the concentration of unbound agonist does not change so the amount of bound agonist does not either. The small amount of unbound agonist in the pellet can be estimated and a correction made to obtain an accurate measurement of the bound agonist. With agonists and antagonists that bind tightly, the complex can be separated rapidly with little loss of bound radioactivity on rapid chromatography by molecular exclusion on small, disposable columns.38

Binding assays have also been developed for proteins that associate with specific nucleotide sequences in DNA,39 such as promoters or other regulatory elements. A short fragment of DNA labeled with [32P]phosphate at one end and containing the sequence of interest is used as a reagent. When such a fragment is digested with deoxyribonuclease I and the products are then separated by electrophoresis, a characteristic pattern of shorter segments of DNA of various lengths is obtained as a result of random cleavage by the nuclease of the phosphodiesters along the double-stranded DNA. The presence of a protein that binds specifically to a particular nucleotide sequence in a short fragment of end-labeled DNA results in prevention of cleavage of the DNA by the nuclease at that site. The fragments resulting from cleavages in this region disappear from the display, and this footprint demonstrates that the DNA-binding protein is present. Such an assay can be used to determine the relative concentration of the DNA-binding protein by examining the patterns produced as a series of dilutions is performed in the solution of the protein added to the end-labeled DNA.

An enzyme that catalyzes a physical or chemical transformation of DNA can often be assayed by separating the product of the transformation from the reactant by electrophoresis. Deoxyribonucleic acid primase/helicase from T7 bacteriophage catalyzes the unwinding of double-stranded DNA. Double-stranded DNA, one of the strands of which has been labeled with [32P]phosphate at its 5′ end, is mixed with a sample of protein to be assayed for this activity, and after a few seconds the reaction is quenched with dodecyl sulfate. The 32P-labeled single-stranded DNA produced by the unwinding can be separated from the 32P-labeled double-stranded DNA by electrophoresis.40

Up to this point, with the exception of that for fumarate hydratase, the assays described have been discontinuous ones. The reaction is allowed to proceed for a certain interval, it is quenched in some way, and the amount of product formed is then measured, usually by dissecting the final, quenched solution. Because less manipulation is required and because the result is immediate, continuous assays in which the product of the live enzyme is monitored as it is formed are more convenient. As in the assay for fumarate hydratase, the continuous change in absorbance of a reactant or
product is often followed. The reaction catalyzed by 2-methyleneglutarate mutase

octanoyl-SCoA ⇌ trans-2-octenoyl-SCoA + 2H⁺ + 2e⁻ (1–31)
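For the receptor-binding assays described earlier in this section, the requirement that binding be "at saturation" can be made concrete with the simple binding isotherm. The sketch below assumes a single class of independent sites and a free ligand concentration that is not significantly depleted by binding; both are assumptions of this illustration rather than statements from the text. The dissociation constants are the ones quoted above, and the function names are hypothetical.

```python
def fraction_occupied(free_ligand_nm, kd_nm):
    """Fraction of receptor sites occupied at equilibrium: L / (L + Kd)."""
    return free_ligand_nm / (free_ligand_nm + kd_nm)


def ligand_for_occupancy(target_fraction, kd_nm):
    """Free ligand concentration (nM) needed to reach a given occupancy."""
    return kd_nm * target_fraction / (1.0 - target_fraction)


# Dissociation constants (nM) quoted in the text.
KD_NM = {
    "[3H]dihydroalprenolol": 2.0,
    "prazosin": 1.0,
    "125I-epidermal growth factor": 20.0,
    "[1,2-3H2]progesterone": 1.0,
}

for ligand, kd in KD_NM.items():
    needed = ligand_for_occupancy(0.95, kd)
    print(f"{ligand}: about {needed:.0f} nM free ligand for 95% occupancy")
```

Because the occupancy depends only on the ratio of free ligand to dissociation constant, a ligand of higher affinity reaches near-saturation at a proportionally lower concentration, which also keeps the unbound radioactivity that must be separated away small.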
catalyzed by myosin subfragment 1.46 Both the reactant 2-amino-6-mercapto-7-methylpurine ribonucleoside and the enzyme purine-nucleoside phosphorylase are added to the solution in addition to MgATP and the ATPase. The inorganic phosphate produced is immediately and continuously used by the phosphorylase to cleave the purine ribonucleoside to ribose-1-phosphate and 2-amino-6-mercapto-7-methylpurine that, unlike the ribonucleoside, absorbs strongly at 360 nm (Δε360 = 11,000 M⁻¹ cm⁻¹). This coupled continuous assay is useful for monitoring any one of the many enzymes that have inorganic phosphate as one of their products.

Of all of the changes of absorbance that are employed in continuous enzymatic assays, none is more heavily used than the decrease in A340 of dihydronicotinamide adenine dinucleotide (NADH; ε340 = 6220 M⁻¹ cm⁻¹)47 or its phosphate (NADPH; ε340 = 6100 M⁻¹ cm⁻¹), when it is oxidized to nicotinamide adenine dinucleotide (NAD⁺) or to its phosphate (NADP⁺), respectively, or the increase in A340 that occurs in the reverse reaction. There is a large class of enzymes, known as dehydrogenases, that use the oxidation–reduction pairs of either NAD⁺ and NADH or NADP⁺ and NADPH, and they can be assayed directly and continuously. For example, 3-hydroxyacyl-CoA dehydrogenase catalyzes the reaction

S-acetoacetylpantetheine + NADH ⇌ (S)-S-(3-hydroxybutyryl)pantetheine + NAD⁺ (1–34)

NADH + oxaloacetate ⇌ NAD⁺ + (S)-malate (1–37)

Phosphomevalonate kinase catalyzes the reaction

MgATP + (R)-5-phosphomevalonate ⇌ MgADP + (R)-5-diphosphomevalonate (1–38)

The MgADP can be monitored continuously as it is produced by adding phosphoenolpyruvate, NADH, and an excess of both pyruvate kinase and L-lactate dehydrogenase:51

MgADP + phosphoenolpyruvate ⇌ MgATP + pyruvate (1–39)

pyruvate + NADH ⇌ (S)-lactate + NAD⁺ (1–40)

This coupled assay is widely used for enzymes that produce MgADP.

The 3-(imidazol-4-yl)-2-oxopropyl phosphate produced by imidazoleglycerol-phosphate dehydratase

D-erythro-1-(imidazol-4-yl)glycerol 3-phosphate ⇌ 3-(imidazol-4-yl)-2-oxopropyl phosphate + H2O (1–41)
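For the continuous assays that follow NADH at 340 nm, the slope of absorbance against time converts to a rate through the Beer–Lambert relation, $A = \varepsilon c l$. The following is a minimal sketch: the extinction coefficient is the value quoted in the text for NADH, while the function names, the 1-cm path length, and the example slope are assumptions made for illustration.

```python
EPSILON_NADH_340 = 6220.0  # M^-1 cm^-1, value quoted in the text for NADH


def rate_molar_per_min(slope_a340_per_min, epsilon=EPSILON_NADH_340, path_cm=1.0):
    """Rate of change of NADH concentration (M min^-1) from dA340/dt.

    The magnitude of the slope is used, so the same function serves assays
    that consume NADH (A340 falls) and assays that produce it (A340 rises).
    """
    return abs(slope_a340_per_min) / (epsilon * path_cm)


def rate_nmol_per_min_per_ml(slope_a340_per_min, epsilon=EPSILON_NADH_340, path_cm=1.0):
    """Same rate expressed as nmol min^-1 (mL of assay)^-1."""
    molar_per_min = rate_molar_per_min(slope_a340_per_min, epsilon, path_cm)
    # 1 M = 1e9 nmol per liter = 1e6 nmol per mL
    return molar_per_min * 1.0e6


# Hypothetical reading: A340 falls by 0.0622 per minute in a 1-cm cuvette.
example_slope = -0.0622
print(rate_nmol_per_min_per_ml(example_slope), "nmol min^-1 mL^-1")
```

In a coupled assay such as the one with pyruvate kinase and L-lactate dehydrogenase, this figure equals the rate of the enzyme of interest only if the coupling enzymes are in sufficient excess that they do not limit the observed rate.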
This coupled assay takes advantage of the fact that the equilibrium of the malate dehydrogenase reaction (Equation 1–37) lies in the direction of NAD⁺ and (S)-malate so that if NAD⁺, malate, and malate dehydrogenase are mixed together, little oxaloacetate and NADH are formed. With this in mind, it can be seen that if (S)-malate, NAD⁺, and excesses of citrate (si) synthase and malate dehydrogenase are present during the progress of the reaction catalyzed by hydroxymethylglutaryl-CoA lyase, the conversion of the acetyl-SCoA into citrate by citrate (si) synthase

acetyl-SCoA + H2O + oxaloacetate ⇌ citrate + HSCoA (1–46)

consumes oxaloacetate and pulls the unfavorable equilibrium of the malate dehydrogenase reaction in the direction of NADH production, and hence an increase in the A340 of the solution is observed.

The two or more enzymatic steps in a coupled assay are sometimes disconnected rather than allowed to proceed simultaneously. An example would be an assay54 for ribose-phosphate diphosphokinase:

MgATP + D-ribose 5-phosphate ⇌ MgAMP + 5-phospho-α-D-ribose 1-diphosphate (1–47)

The reaction is quenched by boiling, and the amount of 5-phospho-α-D-ribose 1-diphosphate that has accumulated is determined by adding orotate, orotate phosphoribosyltransferase, and orotidine-5′-phosphate decarboxylase:

5-phospho-α-D-ribose 1-diphosphate + orotate ⇌ orotidine 5′-phosphate + pyrophosphate (1–48)

orotidine 5′-phosphate ⇌ UMP + CO2 (1–49)

The decrease in A295 due to the loss of orotate is proportional to the 5-phospho-α-D-ribose 1-diphosphate originally present in the quenched samples. The decarboxylation has been incorporated in the assay to draw the reactions to completion.

Colorimetric assays are assays in which a reagent is added that reacts chemically rather than enzymatically with a product of the enzymatic reaction being monitored to produce a change in absorbance, often observed visually as a dramatic change in the color of the solution. Monophenol monooxygenase catalyzes the reaction

When 3-methyl-2-benzothiazolinonehydrazone has been added to the solution of the assay,21 the L-dopaquinone reacts rapidly and quantitatively with it to produce a dark pink color (ε = 29,000 M⁻¹ cm⁻¹), the appearance of which can be monitored continuously. As is medium-chain acyl-CoA dehydrogenase (Equation 1–31), (S)-pantolactone dehydrogenase, which catalyzes the oxidation

(S)-pantolactone ⇌ 2-dehydropantolactone + 2H⁺ + 2e⁻ (1–51)

is a member of a large class of enzymes that catalyze oxidation–reduction reactions and then transfer the electrons involved either to or from small proteins or natural compounds the role of which is to receive or provide electrons. These natural donors or acceptors can often be replaced by synthetic donors or acceptors. (S)-Pantolactone dehydrogenase accepts phenazine methosulfate as an oxidant in place of the acceptor it uses naturally, and reduced phenazine methosulfate readily reduces nitrotetrazolium blue. When both of these compounds are present in the assay, the appearance of diformazan, which is the product of the reduction of nitrotetrazolium blue, can be followed by its strong absorbance (ε570 = 40,200 M⁻¹ cm⁻¹).55 It is possible to monitor the production of coenzyme A by citrate (si) synthase (Equation 1–46) continuously56 by the addition of 5,5′-dithiobis(2-nitrobenzoate). This reagent reacts with the thiol of the coenzyme A as it is formed to release the bright yellow 2-nitro-5-thiolatobenzoate dianion. This assay is useful for monitoring any enzyme that produces coenzyme A.

The colorimetric assays described so far are continuous assays in which the chemistry of the colorimetric reagent is compatible with the aqueous solution and neutral pH required to avoid denaturation and inactivation of the protein being assayed. This is usually not the case with colorimetric reagents. In the instances in which it is not, the assay must be quenched after a convenient interval before the colorimetry is performed. 2-Hydroxy-6-ketonona-2,4-diene-1,9-dioic acid 5,6-hydrolase produces succinate as one of its two products. The production of succinate is coupled in the assay22 to the reaction catalyzed by succinate–CoA ligase (ADP-forming):

MgATP + succinate + coenzyme A ⇌ MgADP + succinyl-SCoA + HOPO₃²⁻ (1–52)

After 15 min, the solution is heated to 100 °C to quench the enzymatic reaction and the inorganic phosphate is
assayed by its reaction with Malachite green in the presence of citrate, which produces strong absorbance at 600 nm. Phosphate produced during an enzymatic reaction can also be determined colorimetrically by the addition of ammonium molybdate in dilute sulfuric acid and a strong reductant, which together produce a blue color proportional in magnitude to the phosphate present.57 Glutamine–pyruvate transaminase will also catalyze the reaction

L-glutamine + glyoxylate ⇌ 2-oxoglutarate + glycine (1–53)

The glycine produced and the L-glutamine remaining will react with o-phthalaldehyde and a thiol, after the enzymatic conversion has been terminated, to produce complexes that absorb in the near ultraviolet.58 The glycine complex, however, absorbs at a higher wavelength (λmax = 330 nm). Galactonate dehydratase catalyzes the reaction

D-galactonate ⇌ 2-oxo-3-deoxy-D-galactonate + H2O (1–54)

After the reaction is quenched, the ketonic product is reacted with semicarbazide59 to produce a semicarbazone that absorbs at 250 nm.59 Selenocysteine lyase catalyzes the reaction

selenocysteine + 2RSH ⇌ L-alanine + H2Se + RSSR (1–55)

where RSH is a mercaptan such as 2-mercaptoethanol. After the enzymatic reaction is stopped, the H2Se can be assayed colorimetrically by its reaction with lead acetate, a reaction that yields a yellow color.60

Biological assays are assays in which the ability to evoke a complex biological response by samples added to cells or whole organisms is determined. For example, the assay for a protein referred to as the Hurler corrective factor measures the ability of this protein to prevent the accumulation of sulfated mucopolysaccharide in lysosomes of intact cells. It is this accumulation that causes Hurler’s syndrome. Samples are added to a series of petri dishes on which fibroblasts from a patient with Hurler’s syndrome have been grown and [35S]SO4 is added. After several days, the accumulation of 35S-sulfated mucopolysaccharide is assessed by washing the cells and submitting them to scintillation counting.61 In this particular assay, the decrease in accumulation of radioactivity was not directly proportional to the amount of sample added, and this problem was overcome by constructing a dose–response curve.

A biological assay was also used for the maturation-promoting factor, which is a protein involved in controlling the cell cycle.62 Samples containing this protein could be assayed for its activity by injecting sequentially diluted aliquots into individual oocytes from the frog Xenopus laevis and scoring the cells for the disappearance of germinal vesicles.63 With the use of this assay, the protein could be followed during a purification procedure63 and the remarkable fluctuation of its concentration during the cell cycle could be documented.62

The success of a particular assay is usually judged on the bases of its accuracy, sensitivity, and selectivity. For following the distribution of a protein during its purification, the accuracy of an assay is not critical (all that is needed is a way to decide whether or not it is present in a particular fraction), but for kinetic studies of the reaction catalyzed by an enzyme, accuracy is often critical.64 If only small amounts of a protein are present, the sensitivity of an assay is also often critical.65 It is usually to increase the sensitivity of an assay that radioactive reactants are used so that the small amounts of product produced or ligand bound can be identified. Fluorescence is often used for the same purpose. For example, continuous assays monitoring the absorbance of NADH can detect its production at 10 nmol min⁻¹ mL⁻¹ but those monitoring its fluorescence can detect its production at 0.1 nmol min⁻¹ mL⁻¹. When following the increase in a particular product produced from a particular reactant or the binding of a particular ligand, an assay is usually selective for a particular protein, but sufficient selectivity is often difficult to achieve. It was only when agonists and antagonists of high affinity and high selectivity were synthesized that the various receptors for epinephrine could be separately identified and purified. A suspension of cellular membranes displays a rather high level of adenosine triphosphatase activity arising from a number of different proteins. It was only when that portion of this activity for which sodium/potassium-exchanging ATPase was responsible could be clearly distinguished66 that it became possible to purify the enzyme.67

Suggested Reading
Winder, A.I. & Harris, H. (1991) New assays for the tyrosine hydroxylase and Dopa oxidase activities of tyrosinase, Eur. J. Biochem. 198, 317–326.
McClure, W.R. (1969) A kinetic analysis of coupled enzyme assays, Biochemistry 8, 2782–2786.

Problem 1–3: Design a coupled assay, based on the release of [14C]CO2, for the enzyme cis-aconitase, which catalyzes the reaction

citrate ⇌ isocitrate

Problem 1–4: Design a coupled assay based on the reduction of NAD⁺ for the enzyme fumarate hydratase.

Problem 1–5: Design a coupled assay for phosphofructokinase, the enzyme that catalyzes the reaction
20 Purification
purification step | total protein (mg) | total activity (µmol min⁻¹) | specific activity (µmol min⁻¹ mg⁻¹) | yield of activity (%) | enrichment (x-fold)
a Beaded, cross-linked agarose (Figure 1–7) to which phenyl groups are attached in ether linkage.
b Figure 1–3.
c Beaded, cross-linked dextran for chromatography by molecular exclusion.
d (CH3)3N+CH2– groups covalently linked to a beaded hydrophilic polyether.
e Beaded, cross-linked agarose for chromatography by molecular exclusion.
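The columns of a purification table such as Table 1–1 are related by simple arithmetic: total activity and total protein come from the volume of the pool, specific activity is their ratio, and yield and enrichment are taken relative to the first pool. The following is a minimal sketch of that bookkeeping. The class name, the function name, and the numbers for the two pools are hypothetical and are chosen only to make the arithmetic easy to follow; they are not the values of Table 1–1.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """One pool of fractions after a purification step (hypothetical container)."""
    name: str
    volume_ml: float            # total volume of the pool, mL
    activity_per_ml: float      # enzymatic activity, µmol min^-1 mL^-1
    protein_mg_per_ml: float    # protein concentration, mg mL^-1


def purification_table(pools):
    """Compute the usual columns of a purification table.

    The first pool is treated as the starting material (for example, the
    homogenate), so yield and enrichment are expressed relative to it.
    """
    first = pools[0]
    initial_total_activity = first.volume_ml * first.activity_per_ml
    initial_specific = first.activity_per_ml / first.protein_mg_per_ml

    rows = []
    for pool in pools:
        total_activity = pool.volume_ml * pool.activity_per_ml
        total_protein = pool.volume_ml * pool.protein_mg_per_ml
        specific = pool.activity_per_ml / pool.protein_mg_per_ml
        rows.append({
            "step": pool.name,
            "total_protein_mg": total_protein,
            "total_activity": total_activity,          # µmol min^-1
            "specific_activity": specific,             # µmol min^-1 mg^-1
            "yield_percent": 100.0 * total_activity / initial_total_activity,
            "enrichment_fold": specific / initial_specific,
        })
    return rows


# Hypothetical numbers, not taken from Table 1-1.
pools = [
    Pool("homogenate", volume_ml=1000.0, activity_per_ml=2.0, protein_mg_per_ml=20.0),
    Pool("ammonium sulfate", volume_ml=100.0, activity_per_ml=16.0, protein_mg_per_ml=40.0),
]
rows = purification_table(pools)

for row in rows:
    print(row["step"], round(row["yield_percent"], 1), round(row["enrichment_fold"], 2))
```

Because specific activity is a ratio of two concentrations in the same pool, it does not depend on the volume of the pool, whereas the yield does.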
being assayed. For an enzymatic assay, this value is the number of micromoles of reactant that would be converted to product every minute if one milliliter of the solution had been added to the assay. The total activity present after any step in the purification is the activity milliliter⁻¹ multiplied by the total number of milliliters in the pool of fractions. The yield of activity is the percentage of the initial total activity remaining after each step. Although the yield of activity usually decreases as the purification proceeds, sometimes it increases, for example if an inhibitor of the activity is removed during a step.72

The concentration of protein, in units of milligrams milliliter⁻¹, in the pool of fractions is also assayed. The most accurate method for making this determination73 is quantitative amino acid analysis (Figure 1–3), but this procedure is too tedious and time-consuming for routine assays. The Biuret colorimetric assay74 is the most accurate rapid method, but its low sensitivity often requires that an unreasonable portion of a precious sample be sacrificed. The Lowry75 colorimetric method, because of its sensitivity, is the most widely used method for determining the concentration of protein in a sample, but it suffers from the drawbacks that many solutes other than protein also produce color and that different proteins give different yields of color. For example, it was shown that the concentration of protein in samples of purified hydrogenase I from Clostridium pasteurianum, which had been accurately quantified by quantitative amino acid analysis, was overestimated by the Lowry procedure by a factor of 1.37 ± 0.03.76 The least quantitative but most convenient and rapid methods for assessing the concentration of protein are the colorimetric method of Bradford77 and the absorbance of the solution at 280 nm. The specific activity of a pool of fractions from a step in the procedure for purifying the protein is the amount of biological activity displayed by a milligram of the proteins in that solution, that is, the activity milliliter⁻¹ divided by the amount of protein milliliter⁻¹. For an enzyme, the units of specific activity are (micromoles of reactant converted) minute⁻¹ (milligram of protein)⁻¹. The enrichment in the protein of interest during a particular series of steps is the increase in its specific activity relative to its initial specific activity in the homogenate.

There is a conventional order in which the various steps of the purification are carried out. This order is usually determined by the amount of material a certain procedure can accommodate, because the amounts that must be processed, if the samples have been concentrated after each step, always decrease as the purification proceeds because of the decrease in the total amount of protein. Precipitations can be carried out on large volumes and are usually the first step in a purification. If appropriate, selective adsorption is used in the next step because it is an efficient method for handling large samples and the media are usually inexpensive. Chromatography by ion exchange is usually used before chromatography by adsorption because the media used for the former are usually less expensive and have higher capacity. Chromatography by molecular exclusion is usually used as a late step because it is most successful when the samples, and hence the amount of protein, are as small as possible.

The purification of aryl-acylamidase

N-acetyl-o-toluidine ⇌ acetate + o-toluidine    (1–56)

from Nocardia globerula (Table 1–1) illustrates this systematic strategy. In each step of the purification the specific enzymatic activity increases as extraneous proteins are separated from the desired protein, and the yield of enzymatic activity after each step is high. Nevertheless, because there are so many steps, the overall yield is only 8%, but an 8% yield is high for the purification of a protein. In this example, chromatography by molecular exclusion on Superose is used in the last step when total amounts of protein are small so that the samples can be concentrated to the small volumes required by this procedure. Chromatography by anion exchange (DEAE-Sephacel), however, can be used early in the purification because large volumes at low concentration of electrolyte can be passed through the ion-exchange medium to concentrate the protein on the top of the column. The chromatography itself is then initiated by increasing the concentration of electrolyte.

The precipitation of proteins from an aqueous solution that is effected by the addition of a high concentration of another solute has a long history. Originally, such precipitations were observed upon the addition of certain salts to solutions of proteins. This observation led to the terms salting out, to describe a precipitation caused by a salt, and salting in, to describe the dissolution of a precipitate caused by a salt. For example, sulfate ion salts out, and thiocyanate ion and guanidinium ion salt in. A systematic study of the effect of salts on the solubility of proteins led to the Hofmeister series,78-80 an ordering of various ions on the basis of their ability to salt out or salt in.* Similar effects, however, are observed with nonionic solutes as well, somewhat confounding the words chosen. Urea salts in, and poly(ethylene glycol) salts out.

It has been shown that these capacities of solutes, both ionic and nonionic, to affect the solubility of a protein can be ascribed to differences in preferential solvation.83-85 The preferential solvation of a particular protein by a particular solute can be defined by the equation

$$\text{preferential solvation} \equiv \frac{1}{g_s}\left(\frac{\partial m_s}{\partial m_p}\right)_{T,\,\mu_{\mathrm{H_2O}},\,\mu_s} \qquad (1\text{–}57)$$

where m_s is the grams of that solute in the solution for every gram of water, m_p is grams of that protein in the solution for every gram of water, g_s is the concentration of the solute in the solution in grams milliliter⁻¹, T is the temperature, and μ_H2O and μ_s indicate that both the chemical potential of the water and the chemical potential of the solute must remain constant as the grams of protein, ∂m_p, change. Solutes that display negative values of preferential solvation salt out and solutes that display positive values of preferential solvation salt in, and the magnitude of their values of preferential solvation correlates with the potency of their ability to salt out or salt in.

A negative value for preferential solvation, indicating salting out, states that grams of solute, ∂m_s, must be removed from the solution whenever grams of anhydrous protein, ∂m_p, are added to maintain constant chemical potential. The usual reason83 given for the observation of negative preferential solvation is that, in an aqueous solution, the layer of water surrounding the protein has properties distinct from those of the rest of the water in the solution and a salting-out solute is preferentially excluded from that layer of solvation. The reason grams of solute must be removed to maintain a constant chemical potential is that water is removed from the bulk solution to form this layer of hydration and solute must be removed from the overall solution to keep its concentration the same in the bulk solution surrounding the hydrated protein.

A positive value for the preferential solvation of a particular solute states that the grams of that solute in the solution must be increased when the grams of protein are increased in order to maintain constant chemical potential. Therefore, the solute prefers to interact with the protein rather than with water; for example, it has a higher solubility in the layer of water around the protein or it simply binds to the protein. Positive preferential solvations mean that the protein becomes more soluble as the solute is added to the solution. Such salting-in is displayed by urea, potassium thiocyanate, and guanidinium chloride. At concentrations of 1 M, the value of the preferential solvation of bovine serum albumin by potassium thiocyanate83 is +0.07 mL g⁻¹ and that of bovine serum albumin by guanidinium chloride84 is +0.26 mL g⁻¹. The ability of urea to increase the solubility of proteins is frequently used during their purification. For example, the proteins that form intermediate filaments, which are naturally occurring, insoluble polymeric aggregates of protein, are purified from the solution that is obtained by dissolving the filaments in 7 M urea.86 The advantage of using urea is that because it is a neutral molecule, it has no effect on chromatography by ion exchange.

It is for precipitation, however, that preferential solvation is usually exploited during purification of a protein. Assume that a solution is at saturation in the concentration of a particular protein; in other words, the chemical potential of that protein in the saturated solution is equal to the chemical potential of that protein in its precipitate. If a solute with negative value of preferential solvation is added to the saturated solution of protein, some of the protein must precipitate to maintain a constant chemical potential. In reality, what happens is that as more and more of the solute is added, the chemical potential of the protein decreases until it equals that of its precipitate and then it begins to precipitate. The more negative the value of preferential solvation for the solute being added, the more rapidly does the concentration of protein reach and then surpass saturation.
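The definition of preferential solvation can be turned into a small calculation. Reading Equation 1–57 as preferential solvation = (1/g_s)(∂m_s/∂m_p), a reading that is consistent with the stated units of mL g⁻¹, the grams of solute that must be added for each gram of protein is the preferential solvation multiplied by g_s. The sketch below applies this to the two salting-in values quoted in the text for bovine serum albumin at 1 M solute. The molar masses are assumptions that do not appear in the text, the function names are hypothetical, and treating a 1 M solution as (molar mass/1000) g mL⁻¹ is an approximation.

```python
def grams_per_ml(molarity, molar_mass_g_per_mol):
    """Approximate solute concentration g_s (g mL^-1) of a solution of given molarity."""
    return molarity * molar_mass_g_per_mol / 1000.0


def solute_change_per_gram_protein(pref_solvation_ml_per_g, g_s):
    """Rearranged Equation 1-57: (dm_s/dm_p) = preferential solvation * g_s.

    A positive result means solute must be added as protein is added
    (salting in); a negative result would mean it must be removed (salting out).
    """
    return pref_solvation_ml_per_g * g_s


# Molar masses (g mol^-1) are assumptions, not values given in the text.
MOLAR_MASS = {
    "potassium thiocyanate": 97.18,
    "guanidinium chloride": 95.53,
}

# Preferential solvation of bovine serum albumin at 1 M solute (mL g^-1), from the text.
PREF_SOLVATION_1M = {
    "potassium thiocyanate": 0.07,
    "guanidinium chloride": 0.26,
}

changes = {
    name: solute_change_per_gram_protein(
        PREF_SOLVATION_1M[name], grams_per_ml(1.0, MOLAR_MASS[name])
    )
    for name in MOLAR_MASS
}

for name, value in changes.items():
    print(f"{name}: {value:.4f} g solute per g protein")
```

Under these assumptions both solutes require less than 0.03 g of added solute for each gram of protein, which gives a sense of the scale of the effect that the sign of the preferential solvation describes.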
* The effects of salts on many properties of proteins, such as their enzymatic activity81 and their specific associations with each other,82 are often governed by the Hofmeister series.80

At 1 M concentration, the value for the preferential solvation of bovine serum albumin by sodium sulfate83 is –0.52 mL g⁻¹. As a comparison, the preferential solvation of bovine serum albumin by NaCl, a salt that shows weak salting-out, is –0.26 mL g⁻¹ at a concentration of 1 M. Although sodium sulfate has been used to precipitate proteins during purifications, ammonium sulfate is preferred because it is more soluble than sodium sulfate and it is also lethal to fungi or bacteria that would otherwise be happy to use the precipitated protein as a source of food. A protein as a precipitate in a concentrated solution of ammonium sulfate at 4 °C is usually stable for decades. Traditionally, the concentration of ammonium sulfate used to precipitate a protein is expressed as the percentage that the final concentration in the solution is of the concentration of ammonium sulfate at saturation (0.52 g mL⁻¹ at 4 °C).

Ammonium sulfate at high concentrations causes most proteins to precipitate from solution. In the example of aryl-acylamidase (Table 1–1), the enzyme was precipitated between 25% and 60% ammonium sulfate. No purification was observed in this instance; the step was used to concentrate the protein and rapidly remove it from all of the other metabolites in the clarified homogenate. Usually, however, an attempt is made to obtain some purification. Each protein precipitates in a given range of ammonium sulfate concentration. Extraneous proteins that precipitate at lower concentrations can be removed first, and then the protein being purified can be precipitated by raising the concentration of ammonium sulfate and thus be separated from proteins that remain soluble at the higher concentration. For example, formate–tetrahydrofolate ligase was purified 10-fold by bringing the solution of ammonium sulfate to 50% of saturation to precipitate other proteins, then increasing the concentration of ammonium sulfate to 70% of saturation to precipitate the ligase while leaving yet other proteins in the supernatant.68 Purification by ammonium sulfate precipitation is usually not so large as in this example, but the procedure is a mild one, usually of high yield. Precipitation with ammonium sulfate can be used to concentrate rapidly and gently a solution of protein between later steps in a purification (Table 1–1).

Poly(ethylene glycol) has also been used to precipitate proteins selectively and reversibly. It is easy to imagine why a large hydrophilic polymer such as poly(ethylene glycol) would be excluded from the layer of water surrounding a protein and thus have a negative value for preferential solvation. Tryptophan 5-monooxygenase can be purified 5-fold after precipitation with poly(ethylene glycol) and redissolution in aqueous buffer.87 Trimethylamine oxide, a naturally occurring solute in the serum of fish,88 is also able to precipitate proteins.89

Several other types of precipitation are used during the purification of a protein. At the pH at which a given protein bears no net charge, known as its isoelectric pH, it is least soluble in water. If the pH is adjusted to this value and the salts in the solution are removed by dialysis, the protein will often precipitate, while other proteins, which have different isoelectric points, do not. Such an isoelectric precipitation has been used in the purification of aspartate carbamoyltransferase90 and in the purification of fibrinogen.91 One traditional method of concentrating protein and removing it from other molecules in a homogenate is to precipitate it by adding acetone.92 The resulting dry acetone powder can be extracted with a buffered aqueous solution, and if one is lucky, the protein of interest will dissolve. Because their DNA is not contained in nuclei, when bacterial cells are fragmented by homogenization, the DNA is released as an intractable, gelatinous mass. Before the solution can be processed further, the DNA must be precipitated with streptomycin sulfate93 or the DNA must be hydrolyzed to small fragments that are not gelatinous by adding nucleases to the solution.

Isoelectric precipitation and precipitations with poly(ethylene glycol) and ammonium sulfate are reversible, and the protein is readily redissolved by decreasing the concentration of precipitant or changing the pH. In contrast, precipitation by acid or heat is usually not reversible. In these situations advantage is taken of the ability of the protein of interest to remain in solution while other proteins precipitate irreversibly. An example of the use of precipitation with heat occurs in the purification of 6-phosphofructokinase, and during this step a 2.5-fold increase in specific activity was recorded.94 These techniques are quite harsh and can lead to degradation of the protein being purified by endopeptidases or to chemical alterations such as deamidation of glutamine and asparagine side chains95 even though little loss of enzymatic activity is recorded.

Proteins are separated chromatographically by exploiting differences among them in particular properties. Different proteins have different sizes and shapes and can be separated on chromatography by molecular exclusion. Different proteins also have different charges at a given pH and can be separated on chromatography by ion exchange. In the case of chromatography by ion exchange, a pH is chosen at which the protein to be purified has a net charge opposite to the fixed charge on the chromatographic medium so that it will participate in ion exchange with the stationary phase as the chromatography progresses. The elution of bound protein is usually performed with a gradient of increasing concentration of a simple monovalent salt such as KCl. If a gradient of pH is used, the change in pH is usually in the direction that would decrease the magnitude of the net charge on the protein. Because the value of α′ is changing continuously, the use of a gradient always produces chromatographic separations of much lower resolution than those performed isocratically without a gradient. The advantage of a gradient, however, is that it bypasses the problem of finding conditions of pH and ionic strength at which the value of α′ for the protein being purified is in a usable range. Because molecules of protein are multivalent ions, their values of α′ change rapidly as the ionic strength is varied, and such a search is often tedious and fruitless.

Glyceraldehyde-3-phosphate dehydrogenase (GDH), phosphoglycerate mutase (PGM), and phosphoglycerate kinase (PGK) in the ammonium sulfate precipitate from a clarified homogenate could be separated on chromatography by molecular exclusion (Figure 1–5A).96 Each of the three enzymes migrates with a characteristic elution volume, Ve, and the glyceraldehyde-3-phosphate dehydrogenase is cleanly separated from the other two enzymes by molecular exclusion chromatography on a column of Sephadex G-150. The fractions containing the activities of phosphoglycerate mutase and phosphoglycerate kinase were combined and submitted directly to ion-exchange chromatography on DEAE-cellulose developed with a gradient of sodium chloride (Figure 1–5B). In this step the phosphoglycerate mutase was cleanly separated from the phosphoglycerate kinase. These examples illustrate the use of column chromatography, monitored by enzymatic assay, to separate proteins.

[Figure 1–5, plots not recovered. Panels A and B show absorbance at 280 nm and enzymatic activity (µmol min⁻¹ mL⁻¹); the horizontal axis of panel A is volume through column (L). Peaks are labeled GDH, PGM, and PGK in panel A and PGM and PGK in panel B.]

Figure 1–5: Chromatography by molecular exclusion (A) and chromatography by anion exchange (B) of proteins in a homogenate from the bacterium E. coli.96 The clarified homogenate was submitted to precipitation with ammonium sulfate (30–45%). The precipitate (7.2 g of protein) was redissolved in a minimum volume (120 mL) of aqueous buffer and submitted to zonal chromatography on a column (10 cm × 120 cm) of cross-linked dextran (Sephadex G-150). (A) Fractions were assayed for protein (absorbance at 280 nm) and enzymatic activity (micromoles minute⁻¹ milliliter⁻¹) of glyceraldehyde-3-phosphate dehydrogenase (GDH), phosphoglycerate mutase (PGM), and phosphoglycerate kinase (PGK), respectively. The proteins contained in the fractions from 4.9 to 5.8 L in the chromatogram in panel A were combined and submitted to chromatography by anion exchange. (B) The ionic strength of the buffer used for the chromatography by molecular exclusion was low enough that the sample (900 mL) could be passed directly through the column (2.2 cm × 25 cm) of diethylaminoethyl- (DEAE-) cellulose while the proteins gathered at the top of the medium for ion exchange. Chromatography was then initiated with a gradient of NaCl (0–0.15 M in the same buffer at pH 8). Fractions were again assayed for protein and enzymatic activity. Reprinted with permission from ref 96. Copyright 1971 Journal of Biological Chemistry.

An example of the use of a sequence of steps of column chromatography to purify a particular protein is found in the purification of α-ketoisocaproate oxygenase from rat liver (Figure 1–6).97 Aside from an initial ammonium sulfate precipitation, only three consecutive steps, chromatography by ion exchange (Figure 1–6A), chromatography by adsorption (Figure 1–6B), and chromatography by molecular exclusion (Figure 1–6C), were necessary to purify the enzyme to homogeneity.

Because the resolution of chromatography by ion exchange run with a gradient and the resolution of chromatography by molecular exclusion are not great (Figures 1–5 and 1–6), the increase in specific activity seen in each of the chromatographic steps is usually around 5-fold. Extreme examples of purification, such as the 100-fold purification of 3-deoxy-7-phosphoheptulonate synthase on phosphocellulose98 or the 100-fold purification of methylcrotonyl-CoA carboxylase on DEAE-cellulose,99 are rare. For reasons that are not obvious, however, it has recently been discovered that the magnitude of the purification on chromatography by adsorption is often significantly greater than that observed on chromatography by ion exchange or molecular exclusion.

Traditionally, hydroxylapatite, because of its physical properties, has been used mainly for selective adsorption of proteins, but recently much more effective, beaded forms of hydroxylapatite that can be used for chromatography by adsorption have become available. Nitric-oxide reductase could be purified 100-fold by chromatography on one of these media,100 and acetyl-CoA hydrolase, 60-fold.101 The media most widely used, however, for chromatography by adsorption (Table 1–2) are produced by synthetically coupling defined organic functional groups or molecules or chelated metal ions105 to beaded hydrophilic matrices, usually cross-linked agarose or polymethacrylate. Although the intention in the syntheses in which organic molecules are covalently attached to the polymer has often been to produce a chromatographic medium with a specific affinity for one particular protein or class of proteins, most of these products have turned out to be simple adsorption media with useful and unexpected affinities for proteins in general.106 Ironically, this makes them more valuable than they were originally intended to be.

Successful purification of a minor component from a complex mixture requires that the set of distribution coefficients, α′i, for the components present assume a new and randomly permuted sequence of magnitudes as each new chromatographic medium is used. If it were possible to do so, a series of chromatographic steps
[Figure 1–6: Column chromatography of α-ketoisocaproate oxygenase. Plots and the remainder of the caption not recovered; surviving axis labels are absorbance at 280 nm and protein (mg mL⁻¹).]

Table 1–2
coproporphyrinogen oxidase102 | step 1 | Cibacron blue | increasing [sodium cholate]c | 80
 | step 2 | phenyl groupb | increasing [Tween 80]c | 2.5
isocitrate dehydrogenase (NADP+)19 | step 1 | reactive red | increasing [NaCl] | 20
 | step 2 | reactive red | increasing [NADP]d | 15
 | step 3 | phenyl groupb | decreasing [(NH4)2SO4] | 2
formate–tetrahydrofolate ligase68 | | Matrex green | increasing [KCl] | 5
glutamyl-tRNA reductase103 | | phenyl groupb | decreasing [KCl] | 9
aminodeoxychorismate lyase104 | | reactive yellow | increasing pH | 260
a In all cases cited, cross-linked agarose (Figure 1–7) was used as the polymeric support to which the organic molecules were covalently attached.
b In ether linkage to agarose.
c These solutes are detergents.
d Affinity elution.
e Enrichment during each step.
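Because the enrichments in Table 1–2 are reported for each step separately (footnote e), the enrichment contributed by a sequence of these steps is the product of the individual values. The sketch below works this out for the two multistep entries. It assumes that the listed steps were run consecutively on the same preparation and it ignores any other steps in the full procedures; the dictionary and function names are hypothetical.

```python
import math

# Stepwise enrichments (x-fold) for the multistep entries of Table 1-2.
STEPWISE_ENRICHMENT = {
    "coproporphyrinogen oxidase": [80, 2.5],
    "isocitrate dehydrogenase (NADP+)": [20, 15, 2],
}


def combined_enrichment(stepwise):
    """Multiply stepwise enrichments to obtain the enrichment over all listed steps."""
    return math.prod(stepwise)


combined = {
    protein: combined_enrichment(steps)
    for protein, steps in STEPWISE_ENRICHMENT.items()
}

for protein, fold in combined.items():
    print(f"{protein}: {fold:g}-fold over the listed adsorption steps")
```

Under these assumptions the two adsorption steps for coproporphyrinogen oxidase combine to 200-fold and the three steps for isocitrate dehydrogenase to 600-fold.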
ever, that do not rely on chromatography and that often produce even greater degrees of purification. They are based on the selective elution from or selective adsorption to a stationary phase and can be referred to as affinity elution or affinity adsorption, respectively.

When a protein is purified by affinity elution, it is first adsorbed to a stationary phase, such as a chromatographic medium; and after all unadsorbed proteins have been washed away, a compound that binds with high specificity to the protein of interest and leads to its elution is added (as for example in the second step of the purification of isocitrate dehydrogenase, Table 1–2). The presence of this compound can sometimes cause only that protein to which it binds to elute from the stationary phase. For example, when (carboxymethyl)cellulose is added to a crude, clarified homogenate from liver at pH 6, all of the fructose 1,6-bisphosphatase is adsorbed along with many other proteins. When the (carboxymethyl)cellulose is collected, washed well with 5 mM sodium malonate, pH 6, and then rinsed with 0.06 mM fructose 1,6-bisphosphate in 5 mM sodium malonate, pH 6, only the fructose 1,6-bisphosphatase elutes in the rinse. In one step the enzyme can be purified 400-fold, to homogeneity.107 Transketolase, after initial purification by DEAE-cellulose from homogenates of human leukocytes, will adsorb tightly to the top of a small column (16 mL) of (carboxymethyl)cellulose when a dilute solution (90 mL) of the protein dissolved at low ionic strength is passed over the column. After the column has been washed extensively, the transketolase is eluted with buffer to which xylulose 5-phosphate (0.2 mM) and ribose 5-phosphate (0.3 mM) have been added. The transketolase is purified 40-fold to homogeneity.108 Protein kinase N, bound to a methylenesulfonate cation-exchange medium, can be eluted specifically with ATP (0.1 mM) for a purification of 2500-fold.109

Although it requires much more effort, affinity adsorption is more widely used than affinity elution and has been successful in a number of instances. The basic idea in affinity adsorption is to synthesize a stationary phase to which has been covalently attached a chemical compound that binds specifically and with high affinity to the protein being purified. The compound synthetically attached to the stationary phase is usually an analog or a derivative of a reactant or product in the reaction catalyzed by an enzyme, an inhibitor of the enzyme, an allosteric activator of the enzyme, or an agonist or antagonist of a receptor. This compound, when covalently attached to the stationary phase, is referred to as an immobilized ligand for the protein. Cross-linked agarose110,111 is the stationary phase to which the immobilized ligand is usually attached.

One of the original examples of this technique112 can serve to illustrate the strategy. Micrococcal nuclease is an enzyme from Staphylococcus aureus that can hydrolyze the phosphodiesters of either single-stranded RNA or double-stranded DNA to produce as its final products 3′-phosphomononucleotides or dinucleotides. Thymidine 3′,5′-bisphosphate is a specific inhibitor of the nuclease that binds to it tightly. A p-aminophenyl derivative of this inhibitor was synthesized and attached covalently to agarose through its aniline nitrogen to produce a stationary phase displaying the thymidine 3′,5′-bisphosphate (Figure 1–7).112 When a crude supernatant containing micrococcal nuclease was passed over this affinity medium, none of the nuclease emerged but almost all of the protein did. The nuclease could then be eluted nonspecifically with dilute acetic acid in greater than 90% yield. It was completely purified in this one step.

Figure 1–7: Synthetic strategy used to couple deoxythymidine 3′,5′-bisphosphate covalently to agarose activated with cyanogen bromide (BrCN).112 The cyanylation occurs randomly on the agarose.

Since this early report, the technical aspects of affinity adsorption have been exhaustively explored. The main difficulty to which many of these investigations have been directed is positioning the ligand far enough from the polymeric matrix of the agarose to minimize steric hindrance so that it can interact effectively with the protein.113,114 This problem may explain many of the failed attempts to use the technique of affinity adsorption. Several long, hydrophilic connecting links, usually referred to as spacers, that serve the purpose of the p-aminophenyl in the original example (Figure 1–7) have been developed to solve this problem. Often a long hydrophilic spacer is created during the set of reactions used to attach the ligand to the solid phase (Figure 1–8).34 Many different strategies for attaching ligands of various structures to the stationary phase have been developed.

The cases in which affinity adsorption has been successful in the purification of proteins provide a provocative collection of examples (Table 1–3). Because purifications of 100-fold in one step are not unusual, this approach has obvious advantages over the traditional strategy that combines chromatography by ion exchange, chromatography by molecular exclusion (Table 1–1), and chromatography by adsorption (Table 1–2), where several steps are required to achieve the same degree of purification. Affinity adsorption, however, often requires a greater investment than assembling a sequence of simple chromatographic steps and has a higher risk of failure. Often the affinity adsorbent produces only a modest purification of 10-fold or less under conditions that suggest that the process occurring is either nonspecific ion exchange139 or simple adsorption140 or affinity elution from a nonspecific stationary phase.141 Often the desired protein adsorbs so tightly to the affinity medium that it can be eluted only in low yield.142

The central, defining feature of affinity adsorption is the design of the stationary phase, but the conditions used for elution of the bound protein are also characteristic. Often they are merely the application of a mobile phase of extreme pH or ionic strength such as in the original example of micrococcal nuclease. The ideal approach, however, is to combine affinity adsorption with affinity elution to gain an advantage in each of the
Figure 1–8: Use of a hydrophilic spacer to connect a specific ligand to a polymeric support.34 N,N-Di-(3-aminopropyl)amine was attached to agarose by activating the polysaccharide with cyanogen bromide (Figure 1–7). 1-(4-Amino-6,7-dimethoxy-2-quinazolinyl)piperazine, which is a portion of prazosin, a specific antagonist for α1-adrenergic receptors, was succinylated and then attached to the aliphatic amine by activation of the resulting carboxylic acid with 1-[(N,N-dimethylamino)propyl]-3-ethylcarbodiimide. This produced a spacer of 14 atoms connecting an oxygen of the polysaccharide with the nitrogen of the ligand. The spacer is hydrophilic by virtue of the O-alkyl-N-alkyl urea, the amine, and the two N-alkyl amides. This affinity medium was used to purify α1-adrenergic receptor.

Figure 1–9: Affinity adsorption and affinity elution used in combination to purify 5-formyltetrahydrofolate cyclo-ligase.127 (A) A crude extract (7.3 g of protein in 2 L) from the bacterium Lactobacillus casei was passed over a column (4 cm × 18 cm) of agarose to which 5-formyltetrahydropteroylglutamate had been attached. After the affinity adsorbent had been washed with 2 L of buffer until no more protein emerged, the bound enzyme was eluted with a solution of 5-formyltetrahydrofolate, a reactant in the enzymatic reaction. (B) A purified fraction (0.7 mg of protein in 40 mL from a later step in the procedure) was passed over a column (2 cm × 13 cm) of agarose to which ATP had been covalently attached. After the affinity adsorbent had been washed with 100 mL of buffer, the bound enzyme was eluted with a solution of ATP, another reactant in the enzymatic reaction. Protein concentration (milligrams milliliter⁻¹) and enzymatic activity (nanomoles minute⁻¹ milliliter⁻¹) were measured for each fraction collected from each column. Reprinted with permission from ref 127. Copyright 1984 Journal of Biological Chemistry.
Table 1–3: Examples of the Use of Affinity Adsorption in the Purification of Proteins
Protein | Ligand | Point of attachment | Eluent | Purification (-fold)
adenylate cyclase133 | succinylated deacetylforskolin | succinyl carboxylate | forskolin | 2000
α subunit of GTP-binding regulatory protein134 | βγ subunits of the complete protein | thiols of cysteines | AlF₄⁻ |
myristoylated alanine-rich c-kinase substrate135 | calmodulin | lysylamines | NaCl, EGTA | 100
[heparan sulfate]-glucosamine N-sulfotransferase136 | adenosine 3′,5′-bisphosphate | adenosine N6 | adenosine 3′,5′-bisphosphate | 40
malate dehydrogenase (oxaloacetate-decarboxylating) (NADP⁺)137 | adenosine 2′,5′-bisphosphate | adenosine N6 | NADP⁺ | 50
binding protein for complement component C3138 | complement component C3 | thiol of a cysteine | 20% ethanol |
containing this specific sequence. An extract of nuclei from HeLa cells was purified on chromatography by molecular exclusion, chromatography by adsorption on heparin bound to agarose, chromatography by cation exchange on sulfated dextran, and affinity adsorption on agarose to which the specific DNA was attached. In the last step, the protein was eluted with a high concentration (0.5 M) of KCl. The first three steps produced 100-fold purification with a 20% yield, and the last step alone produced a further 100-fold purification with a 50% yield.
The inhibition of the DNA polymerase from herpes simplex virus by the antiviral agent 9-[O-(2-hydroxyethyl)hydroxymethyl]guanosine (acyclovir) results from the formation of a tight complex between the polymerase, a duplex of DNA containing a template and a primer into which 9-[O-(2-hydroxyethyl)hydroxymethyl]guanosine has been incorporated at the 3′ end of the primer of DNA as it is being elongated, and the triphosphate of the next nucleotide encoded by the template.

were combined, activity returned. The single proteins in each of these two fractions were then purified separately, in each case by use of assays supplemented with the other two necessary components. In the end, the three distinct proteins that together perform the reaction were each purified to homogeneity.148 Only when all three are mixed together is enzymatic activity observed.
The goal of purification is to obtain the protein of interest isolated from all of the other proteins that were originally in the homogenate derived from the biological specimen. That this has been achieved is often suggested by the coelution of the protein present and the biological or enzymatic activity in the last chromatographic step of the purification (Figures 1–6 and 1–10).149 This is only an indication of purity, and the absolute purity of the final preparation must always be demonstrated independently by electrophoresis.
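The figures quoted for the purification from HeLa nuclei (100-fold at 20% yield, then a further 100-fold at 50% yield) can be combined into an overall result. The following is a minimal sketch, assuming that the purification factors and the fractional yields of successive steps simply multiply; the function name `overall_purification` is hypothetical.

```python
from math import prod


def overall_purification(steps):
    """Combine (fold_purification, fractional_yield) pairs for successive steps.

    Assumes that both quantities multiply from step to step.
    """
    total_fold = prod(fold for fold, _ in steps)
    total_yield = prod(fraction for _, fraction in steps)
    return total_fold, total_yield


# The two stages quoted in the text: the first three steps together,
# then the affinity adsorption on DNA attached to agarose.
hela_steps = [(100.0, 0.20), (100.0, 0.50)]
total_fold, total_yield = overall_purification(hela_steps)

print(f"overall purification: {total_fold:.0f}-fold")
print(f"overall yield: {total_yield:.0%}")
```

Under this assumption the four steps together give a purification of about 10,000-fold with about one-tenth of the original activity recovered.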
ant with its concentration. The preferential solvation of bovine serum albumin by lactose85 is –0.35 mL g⁻¹. What molar concentration of lactose should have an effect on the solubility of bovine serum albumin equal to a 1 M concentration of Na₂SO₄? What is the percent saturation of a 1 M solution of (NH₄)₂SO₄? Why isn’t lactose used to precipitate protein?

Problem 1–9: Calculate the number of theoretical plates in the column used for the separation displayed in Figure 1–5A from the width of the peak of phosphoglycerate mutase. Use the number of theoretical plates to calculate the width the peak of glyceraldehyde-3-phosphate dehydrogenase should have. Why might its peak be wider than the width calculated?

Problem 1–10: Calculate the number of theoretical plates in the column used in Figure 1–6C.

Problem 1–11: The table below describes the purification of glutamyl-tRNA reductase. Calculate, in the proper units, the total enzymatic activity, the yield, the total protein, the specific activity, and the cumulative enrichment at each step.

Problem 1–12: Alprenolol (Al) binds tightly and specifically to β-adrenergic receptor (βAR), which is a protein in the plasma membranes of certain animal cells. The dissociation constant for this binding is the equilibrium constant defined by the equation

$$K_\mathrm{d} = \frac{[\mathrm{Al}][\beta\mathrm{AR}]}{[\mathrm{Al}\cdot\beta\mathrm{AR}]}$$

Alprenolol was covalently attached to agarose to produce an affinity adsorbent for the purification of β-adrenergic receptor. The final concentration of the alprenolol covalently bound to the solid phase, $[\mathrm{Al_B}]'_\mathrm{TOT}$, was 2 mM in units of millimoles (liter of bed)⁻¹. All molar concentrations designated with primes are in moles (liter of bed)⁻¹. Assume that the dissociation constant between covalently bound alprenolol and β-adrenergic receptor is the same as that for unbound alprenolol (8 nM).
Consider what happens when a solution containing β-adrenergic receptor is added to a chromatographic column containing the affinity adsorbent. If, as is reasonable, $[\beta\mathrm{AR}]' \ll [\mathrm{Al_B}]'_\mathrm{TOT}$, where $[\mathrm{Al_B}]'_\mathrm{TOT}$ is the molar concentration of covalently bound alprenolol (2 mM), then $[\mathrm{Al_B}]'_\mathrm{TOT} = [\mathrm{Al_B}]'$, the molar concentration of covalently attached alprenolol to which β-adrenergic receptor is not bound; and from the equation for $K_\mathrm{d}$

$$\frac{[\mathrm{Al_B}]'_\mathrm{TOT}}{K_\mathrm{d}} = \frac{[\mathrm{Al_B}\cdot\beta\mathrm{AR}]'}{[\beta\mathrm{AR}]'} \cong \alpha'_{\beta\mathrm{AR}}$$

where $\alpha'$ is the partition coefficient for β-adrenergic receptor between the mobile phase, βAR, and its complex with alprenolol covalently bound to the stationary phase, Al_B · βAR.

(A) If the chromatographic column has a volume of mobile phase, $V_0$, of 2.0 mL, calculate the elution volume, $V_{\mathrm{e},\beta\mathrm{AR}}$, of β-adrenergic receptor.

One way to decrease the elution volume of β-adrenergic receptor would be to add free alprenolol to the mobile phase at a particular molar concentration $[\mathrm{Al_M}]'$ in moles (liter of bed)⁻¹. Again, if $[\beta\mathrm{AR}]' \ll [\mathrm{Al_B}]'_\mathrm{TOT}$, then

purification step | volume of final pool (mL) | enzymatic activity (mmol min⁻¹ mL⁻¹) | protein concentration (mg mL⁻¹)
a Figure 1–3. b Agarose (Figure 1–7) to which phenyl groups are attached through ether linkage. c Agarose to which Cibacron blue has been covalently attached. d Beaded hydrophilic polyether resin to which methylsulfonate groups are covalently attached. e Interfering enzymatic activities prohibited assay.
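The quantities requested in Problem 1–11 follow mechanically from the three measured columns of such a table. The sketch below shows the bookkeeping on invented numbers; the step names and every value in `HYPOTHETICAL_STEPS` are assumptions made only for illustration and are not the data for glutamyl-tRNA reductase, and the function name `purification_summary` is likewise hypothetical.

```python
def purification_summary(steps):
    """Tabulate totals, specific activity, yield, and cumulative enrichment.

    Each step is (name, volume_mL, activity_per_mL, protein_mg_per_mL).
    Yield and enrichment are expressed relative to the first step.
    """
    rows = []
    for name, volume, activity_conc, protein_conc in steps:
        total_activity = volume * activity_conc      # activity units
        total_protein = volume * protein_conc        # milligrams
        rows.append({
            "step": name,
            "total_activity": total_activity,
            "total_protein_mg": total_protein,
            "specific_activity": total_activity / total_protein,
        })
    first = rows[0]
    for row in rows:
        row["yield"] = row["total_activity"] / first["total_activity"]
        row["enrichment"] = row["specific_activity"] / first["specific_activity"]
    return rows


# Invented values, NOT taken from the problem.
HYPOTHETICAL_STEPS = [
    ("crude extract", 1000.0, 2.0, 10.0),
    ("ion exchange", 100.0, 15.0, 5.0),
    ("affinity adsorption", 10.0, 100.0, 0.5),
]

for row in purification_summary(HYPOTHETICAL_STEPS):
    print(f"{row['step']:20s} yield {row['yield']:.2f}  "
          f"specific activity {row['specific_activity']:.3g}  "
          f"enrichment {row['enrichment']:.3g}")
```

Specific activity is total activity divided by total protein, so it has the units of the activity assay per milligram; yield and enrichment are dimensionless ratios to the first row.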
of water ($K_\mathrm{w}$ = [H⁺][OH⁻]) and the definition of pH ([H⁺] = 10^(–pH)) to give

$$\bar{Z}_{\mathrm{H,isoionic},i} = \frac{K_\mathrm{w} - 10^{-2\,\mathrm{pH}_{\mathrm{isoionic},i}}}{[\mathrm{protein}\ i]\;10^{-\mathrm{pH}_{\mathrm{isoionic},i}}} \tag{1–59}$$

Equation 1–59 can be used to calculate the mean net proton charge number on the protein i at its isoionic point, and this provides a measurement of the absolute mean net proton charge number on the protein i at one pH in the absence of electrolytes. It is this direct measurement of the mean number of charges on the protein at a given pH in the absence of electrolyte that is usually used to anchor the titration curve of a protein (Figure 1–11).151
The point of zero net proton charge is the pH at which the mean net proton charge number on protein i is zero. The isoionic point, $\mathrm{pH}_{\mathrm{isoionic},i}$, is formally distinguished from the point of zero net proton charge because at the isoionic point the protein does bear a mean net proton charge. It is clear from Equation 1–58, however, that if [protein i] is significant and $\mathrm{pH}_{\mathrm{isoionic},i}$ is between pH 5 and 9, there is little difference between the isoionic point and the point of zero net proton charge. This is not the case, however, for acidic or basic proteins.
It is its point of zero net proton charge that is routinely estimated from the sequence of a protein. If it is assumed that the protein bears no unknown tightly bound ions or coenzymes and has no unknown posttranslational modifications and if it is assumed that each side chain of each type of amino acid has its ideal, unperturbed value of pKa (Table 2–2), then it is possible to estimate the point of zero net proton charge of the protein from its composition of amino acids and any known tightly bound ions, coenzymes, and posttranslational modifications (Problem 1–15). Such estimates of points of zero net proton charge are commonly performed by simple algorithms available at data banks on the internet. Such calculations are usually rather inaccurate estimates of the actual points of zero net proton charge because the values for the pKa of the amino acids are seldom the same in the native protein as their ideal, unperturbed values, which are accurate estimates only when the amino acid is in an unfolded polypeptide and does not have an immediate neighbor with an ionized side chain. For example, Glutamate 89 of β-lactoglobulin is buried within the protein at low pH and does not titrate with the rest of the glutamates but becomes exposed during a change that occurs in the structure of the protein above pH 7 and titrates as the structural change progresses.152 In addition, there often are unknown tightly bound ions or posttranslational modifications. Finally, the point of zero net proton charge is often between pH 6 and 8, where small shifts in the titration curve lead to large changes in the point of zero net proton charge (Figure 1–11). The result of one of these algorithmic estimates of the point of zero net proton charge is usually referred to, erroneously, as the isoelectric point of the protein.
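The kind of simple algorithm referred to above can be sketched in a few lines. What follows is only an illustration under stated assumptions: the values of pKa for the termini (3.3 and 8.0) are those quoted in Problem 1–15, but the values for the side chains are assumed, typical textbook numbers and are not taken from Table 2–2; the composition is invented; and the function names are hypothetical. Each group is treated as an independent acid with an unperturbed pKa, which is exactly the simplification that makes such estimates inaccurate.

```python
# pKa values: termini as quoted in Problem 1-15; side chains are ASSUMED
# typical values, not the entries of Table 2-2.
ANIONIC_PKA = {"C-terminus": 3.3, "Asp": 3.9, "Glu": 4.3, "Cys": 8.3, "Tyr": 10.1}
CATIONIC_PKA = {"His": 6.5, "N-terminus": 8.0, "Lys": 10.5, "Arg": 12.5}


def mean_net_proton_charge(composition, ph):
    """Sum the fractional charges of all titratable groups at the given pH."""
    charge = 0.0
    for group, count in composition.items():
        if group in ANIONIC_PKA:
            # Fraction present as the anionic conjugate base.
            charge -= count / (1.0 + 10.0 ** (ANIONIC_PKA[group] - ph))
        elif group in CATIONIC_PKA:
            # Fraction present as the cationic conjugate acid.
            charge += count / (1.0 + 10.0 ** (ph - CATIONIC_PKA[group]))
        else:
            raise KeyError(f"no pKa assigned to {group!r}")
    return charge


def point_of_zero_net_proton_charge(composition, low=0.0, high=14.0, tolerance=1e-6):
    """Locate by bisection the pH at which the mean net proton charge is zero."""
    if (mean_net_proton_charge(composition, low) <= 0.0
            or mean_net_proton_charge(composition, high) >= 0.0):
        raise ValueError("charge does not change sign between the pH limits")
    while high - low > tolerance:
        middle = 0.5 * (low + high)
        if mean_net_proton_charge(composition, middle) > 0.0:
            low = middle
        else:
            high = middle
    return 0.5 * (low + high)


# Invented composition of a small polypeptide, for illustration only.
HYPOTHETICAL_COMPOSITION = {
    "N-terminus": 1, "C-terminus": 1,
    "Asp": 5, "Glu": 5, "His": 2, "Lys": 6, "Arg": 4,
}

pzc = point_of_zero_net_proton_charge(HYPOTHETICAL_COMPOSITION)
print(f"estimated point of zero net proton charge: pH {pzc:.2f}")
```

Bisection suffices because, in this model, the mean net proton charge decreases monotonically as the pH increases. Evaluating `mean_net_proton_charge` over a range of pH gives the corresponding idealized titration curve.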
Figure 1–11: Net mean proton charge number on ribonuclease as a function of pH. Solutions of ribonuclease at ionic strengths 0.01 M, 0.03 M, and 0.15 M, produced with KCl, were titrated with either KOH or HCl.151 The changes in pH as a function of the equivalents of acid or base added (mole of protein)⁻¹ were recorded. The isoionic point was determined by passing a solution of the protein over a mixed-bed medium for ion exchange to remove all electrolytes except the protein, H⁺, and OH⁻. The point of zero net proton charge was then calculated with Equation 1–59. The absolute mean net proton charge number, $\bar{Z}_\mathrm{H,RNase}$, is presented as a function of pH; the isoionic point and the point of zero net proton charge are marked. Reprinted with permission from ref 151. Copyright 1956 American Chemical Society.

The isoelectric point of protein i, $\mathrm{p}I_i$, is the pH at which, under a given set of conditions, the mean net […] from the mean net molecular charge number on protein i, $\bar{Z}_i$, because proteins have a tendency to bind weakly the ions of electrolytes in the solution, even ones as simple as halides153 and alkali metal ions.154 This binding occurs even at the point of zero net proton charge and is reflected as a decrease or increase in $\mathrm{pH}_{\mathrm{isoionic},i}$ as a neutral salt is added to an isoionic solution.150 For example, if protein i in an isoionic solution binds more of the anions than the cations of a neutral salt that has been added, the increase in its negative charge will indirectly cause it to take up more protons, increasing $\mathrm{pH}_{\mathrm{isoionic},i}$. The reverse effect on the isoionic point is observed when the cations are preferentially bound.
This binding of small simple ions, such as halides and alkali metal cations, to proteins results from chelation. Two or more fixed charges or dipoles on the protein, of opposite sign to the bound ion, have to be properly oriented to perform such chelation. Consequently, the number of each type of ion bound at the isoionic point is a unique and unpredictable property of each protein. In
deoxyhemoglobin, a site at which chloride binds to the protein has been identified, and it sits between two functional groups, an ammonium cation of the amino terminus and a guanidinium cation of an arginine, that both bear a positive charge and chelate the chloride.155 In tryptophanase, a site at which potassium ion binds to the protein is formed from the oxygen of a carboxylate and three acyl oxygens from the backbone of the polypeptide that together chelate the ion.156 In plasminogen activator inhibitor 1, a site at which a chloride ion binds is surrounded by two ammonium cations of two lysines and two NH groups of two amides from the backbone of the polypeptide that all chelate the ion.157 In exotoxin A from Pseudomonas aeruginosa, a site at which a chloride ion binds is formed from two guanidinium cations of two arginines, and a site at which a sodium ion binds is formed by two acyl oxygens from the polypeptide backbone.158
Although there is no relation between the number of ions bound and the charge on the protein at a particular pH, proteins with high densities of negative charge seem to bind cations more readily than those with low densities of negative charge.154 This tendency presumably results from the increase in the probability of proper juxtaposition for chelation with the increase in the density of negative charge. As the pH is lowered from the point of zero net proton charge, the density of positive charge on a protein increases only marginally; rather, the density of negative charge decreases as carboxylates are neutralized. It has been observed that the number of bound anions increases as the pH is lowered,153 which results from the decrease in electrostatic repulsion, due to these carboxylates, that at neutral pH inhibits the chelation of dissolved anions by the fixed positive charges on the protein. For reasons that are not well understood but may include the differences in ionic radii, proteins seem to bind halides more readily than they do alkali metal ions.
The mean net molecular charge number, $\bar{Z}_i$, on protein i in a solution containing simple neutral salts such as (NH₄)₂SO₄, NaCl, or KCl is the sum of the mean net proton charge number and the net charge number contributed by these loosely bound ions:

$$\bar{Z}_i = \bar{Z}_{\mathrm{H},i} + \sum_{j=1}^{m} \bar{n}_j z_j \tag{1–60}$$

where $\bar{n}_j$ is the mean number of ions of species j and charge number $z_j$ bound by the protein. It is this net charge on protein i that determines its behavior on chromatography by ion exchange or electrophoresis. In turn, electrophoresis is the usual method for determining the isoelectric point of a protein.

Problem 1–13: The commercially available anion-exchange medium DEAE-Bio-Gel A is a beaded polymer formed from the naturally occurring polysaccharide agarose (Figure 1–7) to which are attached 2-(diethylamino)ethyl groups (Figure 1–2). In a column poured with DEAE-Bio-Gel A, the concentration of covalently attached tertiary ammonium cations is 20 mmol (L of bed)⁻¹. The bed of such an ion-exchange resin can be divided theoretically into two compartments that can be referred to as the stationary compartment and the mobile compartment. The stationary compartment, which is the volume within the beads, surrounds the covalently attached tertiary ammonium cations and includes enough of the surrounding volume that the compartment is electroneutral. The mobile compartment, which is the volume surrounding the beads, is the remainder of the volume that is accessible to the protein being submitted to the chromatography.
An isoionic solution of bovine serum albumin at 50 mg mL⁻¹ has a pH of 5.48. This solution is adjusted to the desired pH with KOH to produce a potassium salt of bovine serum albumin, KₙBSA. Samples of this polyanionic form of bovine serum albumin are submitted to chromatography by anion exchange on a column 4.5 cm in diameter and 40 cm in length of DEAE-Bio-Gel A in the chloride form. The solution within the DEAE-Bio-Gel A itself has been adjusted with HCl to the same pH as that of the solution of protein and equilibrated with an unbuffered solution of KCl. No buffer has to be used because the bovine serum albumin and the diethylaminoethyl groups on the agarose provide adequate buffering.
The movement of the bovine serum albumin through the chromatographic system will be determined by its partition coefficient between the stationary compartment and the mobile compartment

$$\alpha'_\mathrm{BSA} = \frac{[\mathrm{BSA}^{n-}]'_\mathrm{S}}{[\mathrm{BSA}^{n-}]'_\mathrm{M}}$$

where the superscript n– refers to the mean net molecular charge number on the bovine serum albumin at the chosen pH, and as in Equation 1–1, the primes on the concentrations indicate that they are in units of moles (liter of bed)⁻¹. The free energies of transfer of the ions between the stationary compartment and the mobile compartment, however, are governed by the actual molar concentrations of the bovine serum albumin in the two compartments (indicated by the unprimed brackets as usual) according to the partition coefficient
$$\alpha_\mathrm{BSA} = \alpha'_\mathrm{BSA} \left( \frac{1 - f_\mathrm{S}}{f_\mathrm{S}} \right)$$

where $f_\mathrm{S}$ is the fraction of the total accessible volume of the bed, $V_\mathrm{T}$, that is the volume of the stationary compartment, $V_\mathrm{S}$. Note that by definition the sum of $V_\mathrm{S}$ and the volume of the mobile compartment, $V_\mathrm{M}$, is $V_\mathrm{T}$.

The ideal distribution of bovine serum albumin between the mobile and stationary compartments in the DEAE-Bio-Gel A is governed by equations equivalent to Equations 1–15 to 1–18 that describe the conservation of charge in the two compartments and the equivalence of the ideal activities of the various dissolved salts in the system.

(B) Write four equations equivalent to Equations 1–15 to 1–18 for the special case of bovine serum albumin on DEAE-Bio-Gel A. Use the explicit abbreviations K⁺, Cl⁻, BSAⁿ⁻, and DEAE⁺. Remember that for the potassium salt of a multivalent anion, KₙA, where the charge number on anion A is n–, the ideal activity of the salt in a solution is

$$a_{\mathrm{K}_n\mathrm{A}} = [\mathrm{K}^+]^n\,[\mathrm{A}^{n-}]$$

In this equation, n does not have to be an integer.

(C) Unlike the derivation in the book, assume that only the concentration of bovine serum albumin, not $[\mathrm{K}^+]_\mathrm{S}$, is negligible and show that

$$[\mathrm{K}^+]_\mathrm{S} = \frac{\sqrt{[\mathrm{DEAE}^+]_\mathrm{S}^{2} + 4[\mathrm{K}^+]_\mathrm{M}^{2}} - [\mathrm{DEAE}^+]_\mathrm{S}}{2}$$

that

$$\alpha_\mathrm{BSA} = \left( \frac{\sqrt{[\mathrm{DEAE}^+]_\mathrm{S}^{2} + 4[\mathrm{K}^+]_\mathrm{M}^{2}} + [\mathrm{DEAE}^+]_\mathrm{S}}{2[\mathrm{K}^+]_\mathrm{M}} \right)^{n}$$

that

$$\alpha'_\mathrm{BSA} = \left( \frac{f_\mathrm{S}}{1 - f_\mathrm{S}} \right) \left( \frac{\sqrt{{[\mathrm{DEAE}^+]'_\mathrm{S}}^{2} + 4 f_\mathrm{S}^{2} [\mathrm{K}^+]_\mathrm{M}^{2}} + [\mathrm{DEAE}^+]'_\mathrm{S}}{2 f_\mathrm{S} [\mathrm{K}^+]_\mathrm{M}} \right)^{n}$$

that

$$\frac{(\alpha_\mathrm{BSA})^{2/n} - 1}{(\alpha_\mathrm{BSA})^{1/n}} = \frac{[\mathrm{DEAE}^+]_\mathrm{S}}{[\mathrm{K}^+]_\mathrm{M}}$$

$$\frac{\left( \alpha'_\mathrm{BSA}\,\dfrac{1 - f_\mathrm{S}}{f_\mathrm{S}} \right)^{2/n} - 1}{\left( \alpha'_\mathrm{BSA}\,\dfrac{1 - f_\mathrm{S}}{f_\mathrm{S}} \right)^{1/n}} = \frac{[\mathrm{DEAE}^+]'_\mathrm{S}}{f_\mathrm{S} [\mathrm{K}^+]_\mathrm{M}}$$

where $[\mathrm{DEAE}^+]'_\mathrm{S}$ is the concentration of covalently attached tertiary ammonium cations: 20 mmol (L of bed)⁻¹.

(D) The titration curve of bovine serum albumin159 is such that the value of the partial derivative $(\partial \bar{Z}_\mathrm{H,BSA}/\partial \mathrm{pH})_{T,I_c}$ has a constant value over the region from pH 5.5 to 7.0 of –5.9. Assume that, under the conditions of the experiment, bovine serum albumin does not bind either K⁺ or Cl⁻. What is the mean net molecular charge number on the bovine serum albumin at pH 6.00, and what is the mean net molecular charge number on the bovine serum albumin at pH 7.00?

(E) Before solutions containing the potassium salts of bovine serum albumin are run, a sample of the isoionic solution of bovine serum albumin at 50 mg mL⁻¹ is adjusted to pH 5.0 with HCl and run on the column of DEAE-Bio-Gel A equilibrated at pH 5.0 and eluted with 0.04 M KCl. The elution volume of the bovine serum albumin in this run is 537 mL. What parameter of the ion-exchange column is measured by this experiment?

(F) A sample of the isoionic solution of bovine serum albumin at 50 mg mL⁻¹ is adjusted to pH 6.00 with KOH and run on the column of DEAE-Bio-Gel A equilibrated at pH 6.00 and eluted with 85 mM KCl. The elution volume of the bovine serum albumin on this run is 3.38 L. What is the value of $\alpha_\mathrm{BSA}$ under these conditions?

(G) Show that the value of $f_\mathrm{S}$ for the DEAE-Bio-Gel A in the column is 0.060.

(H) A sample of the isoionic solution of bovine serum albumin at 50 mg mL⁻¹ is adjusted to pH 7.00 with KOH and run on the column of DEAE-Bio-Gel A equilibrated at pH 7.00. What concentration of KCl must be used to have the elution volume of the bovine serum albumin be 3.00 L?

(I) Explain which of the assumptions, either implicit or explicit, relied upon in the preceding development are most certainly oversimplifications and explain why each of them is an oversimplification.

Problem 1–14: At a protein concentration of 3 × 10⁻⁴ M, the isoionic pH of ribonuclease151 is 9.60. Calculate $\bar{Z}_\mathrm{H,isoionic,RNase}$.
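Equation 1–59 is straightforward to evaluate numerically. The following is a minimal sketch using hypothetical values rather than those of Problem 1–14; it assumes an ion product of water of 1.0 × 10⁻¹⁴, which is appropriate only near 25 °C, and the function name is an invention for this illustration.

```python
K_W = 1.0e-14  # ASSUMED ion product of water (near 25 degrees C)


def isoionic_proton_charge(ph_isoionic, protein_molar, k_w=K_W):
    """Mean net proton charge number at the isoionic point (Equation 1-59)."""
    hydrogen_ion = 10.0 ** (-ph_isoionic)
    return (k_w - hydrogen_ion ** 2) / (protein_molar * hydrogen_ion)


# Hypothetical acidic protein: isoionic at pH 4.5 at a concentration of 0.1 mM.
z_acidic = isoionic_proton_charge(4.5, 1.0e-4)
print(f"mean net proton charge number at pH 4.5: {z_acidic:.3f}")

# At pH 7 the concentrations of H+ and OH- are equal and the charge vanishes.
z_neutral = isoionic_proton_charge(7.0, 1.0e-4)
print(f"mean net proton charge number at pH 7.0: {z_neutral:.3f}")
```

The sign of the result follows directly from electroneutrality: below pH 7 the excess of H⁺ over OH⁻ in the isoionic solution must be balanced by a net negative charge on the protein, and above pH 7 the reverse holds. The magnitude grows as the concentration of protein falls, which is why the isoionic point and the point of zero net proton charge coincide closely only when [protein i] is significant.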
Problem 1–15: Assume that the side chains of the acidic and basic amino acids in a native properly folded protein all have the same values for their acid dissociation constants that they do in the unfolded polypeptide (Table 2–2). Let f be the fraction of a particular acidic or basic amino acid that is ionized at a given pH.

(A) Show that for a particular type of amino acid the conjugate base of which is anionic, such as aspartate, glutamate, cysteine, tyrosine, or a carboxy terminus,

$$f_\mathrm{anionic} = \frac{1}{1 + 10^{(\mathrm{p}K_\mathrm{a} - \mathrm{pH})}}$$

where the pKa is the one found in Table 2–2. Show that for a particular type of amino acid the conjugate acid of which is cationic, such as histidine, lysine, arginine, or an amino terminus,

$$f_\mathrm{cationic} = \frac{1}{1 + 10^{(\mathrm{pH} - \mathrm{p}K_\mathrm{a})}}$$

where the pKa is the one found in Table 2–2 for that amino acid.

In a molecule of fructose-bisphosphate aldolase from rabbit skeletal muscle, there are four identical polypeptides, each containing one amino terminus, 14 aspartates, 24 glutamates, 11 histidines, eight cysteines, 12 tyrosines, 26 lysines, 15 arginines, and one carboxy terminus. There are no bound coenzymes or posttranslational modifications. The pKa of a carboxy terminus is 3.3, and that of an amino terminus is 8.0.

(B) Calculate the mean net proton charge number on a molecule of fructose-bisphosphate aldolase at pH 8 and at pH 9. (If you are adept at using a computer, go to part E first).

(C) Estimate the point of zero net proton charge for fructose-bisphosphate aldolase.

(D) What is the value of the point of zero net proton charge for fructose-bisphosphate aldolase according to the experiments in Figure 1–16?

(E) Write a program or program a spreadsheet to calculate the mean net proton charge number on fructose-bisphosphate aldolase at any pH.

(F) Use the program to draw a titration curve of fructose-bisphosphate aldolase.

Problem 1–16: A particular protein is modified within the cells where it is normally located by the covalent attachment of inorganic phosphate in the form of phosphate esters. Anywhere between zero and seven phosphates can be attached to the protein under normal circumstances. The isoelectric points of these eight different forms of the protein are 4.32, 4.29, 4.26, 4.23, 4.20, 4.17, 4.14, and 4.11, respectively. At these values of pH, each phosphate ester would have a charge number of –1.00 so an additional equivalent of negative charge is added to the protein when an additional phosphate is added.

(A) Explain why the isoelectric point of the protein decreases as each phosphate is added.

(B) What amino acid side chains are titrating in this range of pH? (See Table 2–2).

(C) Assume that the aspartates and glutamates of the protein have the same pKa (4.2). The decrease in the mean net proton charge number on a protein as the pH is lowered, if only the glutamates and aspartates are titrating, should be

$$\Delta \bar{Z}_\mathrm{H} = n_\mathrm{E+D} \left[ \frac{1}{1 + 10^{(4.2 - \mathrm{pH_f})}} - \frac{1}{1 + 10^{(4.2 - \mathrm{pH_i})}} \right]$$

where $n_\mathrm{E+D}$ is the total number of glutamates plus aspartates in the protein, pH_f is the final pH, and pH_i is the initial pH. What is the total number of glutamates plus aspartates in the protein?

Electrophoresis

When a molecule of protein i at a given pH in an aqueous solution of electrolytes is placed in an electric field, it will experience a force, $F_\mathrm{el}$, in the direction x such that

$$F_\mathrm{el} = Q_i E_x = e\,\bar{Z}_i E_x \tag{1–61}$$

where $Q_i$ is the mean charge on protein i (coulombs), $e$ is the elementary charge (1.602 × 10⁻¹⁹ C), $\bar{Z}_i$ is the mean net molecular charge number of the molecule of protein i under these circumstances, and $E_x$ is the electrical field (volts centimeter⁻¹) or gradient of the electrical potential ($\partial V/\partial x$) in the x direction. The units of force (grams centimeter second⁻²) follow from the fact that one volt is one joule coulomb⁻¹ (10⁷ gram centimeter² second⁻² coulomb⁻¹). Electrophoresis is usually run in an apparatus designed so that ($\partial V/\partial y$) and ($\partial V/\partial z$) are zero, and the force $F_\mathrm{el}$ will cause the molecule of protein i to move only in the x direction.
For the moment, it will be assumed that only the molecule of protein i and its physically bound ions move. As the molecule of protein i moves, a frictional force, $F_\mathrm{fric}$, exerted by the surrounding stationary liquid is experienced by the molecule. The frictional force is proportional to the velocity of movement of the molecule

$$F_\mathrm{fric} = -f_i \left( \frac{\partial x_i}{\partial t} \right)_E \tag{1–62}$$
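Equations 1–60 and 1–61 can be chained to show how the units work out. The sketch below is illustrative only: the mean net proton charge number, the numbers of bound ions, and the strength of the field are all hypothetical, and the function names are inventions for this example. The factor of 10⁷ converts joules to gram centimeter² second⁻², as stated in the text.

```python
ELEMENTARY_CHARGE_C = 1.602e-19  # coulombs, as given in the text
ERG_PER_JOULE = 1.0e7            # 1 J = 10^7 g cm^2 s^-2


def mean_net_molecular_charge(proton_charge_number, bound_ions):
    """Equation 1-60: add the charge of loosely bound ions.

    bound_ions is a list of (mean_number_bound, ion_charge_number) pairs.
    """
    return proton_charge_number + sum(n * z for n, z in bound_ions)


def electrical_force(charge_number, field_volt_per_cm):
    """Equation 1-61: force in g cm s^-2 on a molecule in the field."""
    joule_per_cm = ELEMENTARY_CHARGE_C * charge_number * field_volt_per_cm
    return joule_per_cm * ERG_PER_JOULE


# Hypothetical protein: proton charge number of -8 that also binds, on
# average, three chloride ions and one potassium ion.
z_total = mean_net_molecular_charge(-8.0, [(3.0, -1), (1.0, +1)])
force = electrical_force(z_total, field_volt_per_cm=10.0)

print(f"mean net molecular charge number: {z_total:+.1f}")
print(f"force in a field of 10 V/cm: {force:.3e} g cm s^-2")
```

The negative sign of the force indicates only that an anionic protein is driven in the direction opposite to that of the field.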
one of its physical properties.
At this point a digression is necessary to explain the frictional coefficient before continuing with a discussion of electrophoresis. The most direct way to determine the frictional coefficient of a molecule of protein is from its diffusion coefficient, D. The diffusion coefficient is a measure of the net tendency of any population of identical molecules to spread from a region of high concentration to a region of low concentration; the driving force behind this movement is not a function of any intrinsic feature of the individual molecules such as their charge number or their mass. The diffusion coefficient $D_i$ (centimeters² second⁻¹) of any substance i in solution is defined by Fick’s law

$$J_{x,i} = -D_i \left( \frac{\partial c_i}{\partial x} \right)_t \tag{1–63}$$

where $J_{x,i}$ is the flux (moles centimeter⁻² second⁻¹) of substance i through a planar surface of unit area, $c_i$ is the concentration (moles centimeter⁻³) of the substance i at any point, and x is the distance (centimeters) along an axis normal to the planar surface. The greater $(\partial c_i/\partial x)_t$, the greater the diffusive force, and the greater the net flux. The diffusion coefficient of substance i, $D_i$, is simply the constant of this proportionality. It can be shown that

[…]

$$D = \frac{1}{4\pi t} \left( \frac{A}{H} \right)^{2} \tag{1–65}$$

where A is the area (concentration) of the curve of $(\partial c_i/\partial x)_t$ against x and H is its maximum height (concentration centimeter⁻¹). At the present time, however, the diffusion coefficients of proteins are usually measured by dynamic light scattering161,162 or by pulsed field gradient nuclear magnetic resonance.163,164

Figure 1–12: Measurement of a diffusion coefficient.160 (A) Spreading of a boundary of concentration at the interface formed between two solutions, one containing the solute and the other not containing the solute. A solution containing the solute is brought in contact with a solution otherwise identical, but lacking the solute, to form an interface at the origin of the horizontal axis. At the initial time the function of the concentration (c) is discontinuous at the interface at the origin of the horizontal axis, but as time progresses (t1 and t2) the solute diffuses in the direction x normal to the interface into the vacant solution and a gradient of concentration develops. (B) The first derivative of the function of concentration with respect to distance in the direction x [$(\partial c/\partial x)_t$] at any instant is a Gaussian function (curves labelled t1 and t2), the width of which increases and the height of which decreases with time, t. Reprinted with permission from ref 160. Copyright 1961 John Wiley.

The frictional coefficients of spheres or ellipsoids of revolution can be calculated. For a sphere

$$f = 6\pi\eta r \tag{1–66}$$

[…]

tein in the electric field. When the electric field is turned on, a steady state is rapidly reached in which $F_\mathrm{el} = -F_\mathrm{fric}$ and which is characterized by a constant terminal velocity $(\partial x_i/\partial t)_E$ of the molecules of protein i in the direction of the electric field. At steady state, because $F_\mathrm{el} = -F_\mathrm{fric}$

$$\left( \frac{\partial x_i}{\partial t} \right)_E = \frac{e\,\bar{Z}_i E_x}{f_i} \tag{1–68}$$
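Equations 1–66 and 1–68 together give an order-of-magnitude estimate of the terminal velocity. The following sketch treats the protein as a sphere and uses assumed inputs throughout: a viscosity of 0.010 g cm⁻¹ s⁻¹ (roughly that of water near 20 °C), a radius of 3 nm, a charge number of +10, and a field of 10 V cm⁻¹. None of these values comes from the text, and the function names are hypothetical.

```python
import math

ELEMENTARY_CHARGE_C = 1.602e-19  # coulombs, as given in the text
ERG_PER_JOULE = 1.0e7            # 1 J = 10^7 g cm^2 s^-2


def stokes_frictional_coefficient(viscosity, radius_cm):
    """Equation 1-66: f = 6 * pi * eta * r, in g s^-1 (viscosity in g cm^-1 s^-1)."""
    return 6.0 * math.pi * viscosity * radius_cm


def terminal_velocity(charge_number, field_volt_per_cm, frictional_coefficient):
    """Equation 1-68: steady-state velocity in cm s^-1."""
    force = ELEMENTARY_CHARGE_C * charge_number * field_volt_per_cm * ERG_PER_JOULE
    return force / frictional_coefficient


# ASSUMED illustrative inputs, not values from the text.
f_sphere = stokes_frictional_coefficient(viscosity=0.010, radius_cm=3.0e-7)
velocity = terminal_velocity(charge_number=10.0, field_volt_per_cm=10.0,
                             frictional_coefficient=f_sphere)

print(f"frictional coefficient: {f_sphere:.3e} g s^-1")
print(f"terminal velocity: {velocity:.3e} cm s^-1")
```

This estimate neglects the ions surrounding the protein, so it should be read as an upper bound of the kind that the later corrections for ionic strength (Equations 1–78 and 1–79) reduce.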
field, is uniform in its dimensions and in its specific conductance so that $E_x$ is constant over its length. The free electrophoretic mobility, $u^\circ_i$ (centimeters² volt⁻¹ second⁻¹) of protein i is defined as

$$u^\circ_i \equiv \frac{(\partial x_i/\partial t)_E}{(\partial V/\partial x)_t} = \frac{d_i\,l}{V t} \tag{1–69}$$

[Figure 1–13, vertical axis: $u^\circ_\mathrm{ovalb} \times 10^{5}$ (cm² V⁻¹ s⁻¹)]
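The second equality in Equation 1–69 turns directly into a calculation from experimental readings. In the sketch below the meanings of the symbols are assumptions, because their definitions are not reproduced here: $d_i$ is taken to be the distance migrated by the protein in time t, and V the potential difference applied across a length l. The numerical values and the function name are hypothetical.

```python
def free_electrophoretic_mobility(distance_cm, length_cm, potential_volt, time_s):
    """Equation 1-69 (second form): u = d * l / (V * t), in cm^2 V^-1 s^-1.

    ASSUMED meanings: distance migrated d in time t, with potential
    difference V applied across a length l.
    """
    velocity = distance_cm / time_s          # cm s^-1
    field = potential_volt / length_cm       # V cm^-1
    return velocity / field


# Hypothetical run: 1.8 cm migrated in one hour, 100 V across 10 cm.
mobility = free_electrophoretic_mobility(1.8, 10.0, 100.0, 3600.0)
print(f"mobility: {mobility:.2e} cm^2 V^-1 s^-1")
```

Dividing the velocity by the field is what makes the mobility a property of the protein and the solution rather than of the particular voltage chosen for the run.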
in the electrophoretic mobility of the protein as the ionic strength increases.*
On the basis of these assumptions, an equation has been derived165,167-169 to describe the electrophoretic mobility of protein i if its shape is approximately that of a sphere:

$$u^\circ_i = \frac{e\,\bar{Z}_i}{f_i} \left( \frac{1 + \kappa a_j}{1 + \kappa a_j + \kappa a_i} \right) f(\kappa a_i) \tag{1–78}$$

where $f(\kappa a_i)$, Henry’s function, is a function in $\kappa a_i$ for which there is no exact expression168 but which can be expressed graphically (Figure 1–14).160 The value of this function varies between 1.0 and 1.5. It can be seen that when $\kappa a_j < 1$, as is usually the case for a solution of protein,

$$u^\circ_i \cong \frac{e\,\bar{Z}_i}{f_i} \left( \frac{1}{1 + \kappa a_i} \right) f(\kappa a_i) \tag{1–79}$$

This equation predicts that the electrophoretic mobility will decrease as the ionic strength increases (Figure 1–13) because κ increases as the ionic strength increases (Equation 1–77).
The points in Figure 1–13 are the observed electrophoretic mobilities of the protein ovalbumin at various ionic strengths as measured by Tiselius and Svensson.165 The top line is their calculation of the mobilities with Equation 1–70 by use of independent measurements of $\bar{Z}_\mathrm{ovalbumin}$ and $f_\mathrm{ovalbumin}$. The lower line is their calculation of the mobilities with Equation 1–79. The agreement between calculated values and observed values is surprisingly satisfactory. As the authors point out, the calculated value from Equation 1–70, in the absence of electrolyte, comes close to the extrapolated value of the actual mobilities.
According to Equation 1–78, at a constant ionic strength, the electrophoretic mobility of protein i should be directly proportional to $\bar{Z}_i$, and this proportionality is reflected in the direct proportionality that obtains between $\bar{Z}_{\mathrm{H},i}$ and $u^\circ_i$ (Figure 1–15)170 as $\bar{Z}_{\mathrm{H},i}$ is varied by varying the pH at a constant ionic strength.170 In fact, it is possible to
(cm 2 V –1 s –1 )
+12 +6 B C
0.4 D
A
Z H,trypsin
+6 +3
I c "2
1
_
uªtrypsin ¥ 10 5
0.2
0 0
-6 -3
2 6 10 4 6 8
pH pI
Figure 1–15: Comparison of the electrophoretic mobilities of trypsin (centimeter2 volt-1 second-1) at 0 °C (u°trypsin; points) with the acid–base titration curve of trypsin determined at 20 °C (Z̄H,trypsin, continuous curve).170 The respective scales on the two vertical axes, those for electrophoretic mobility and mean net proton charge number, respectively, both with respect to pH, were adjusted to produce maximum coincidence. The value for Z̄H,trypsin = 0 was arbitrarily set to coincide with the isoelectric point. The coincidence displayed is in shape rather than absolute value or excursion. The different symbols denote the different buffers used to maintain the pH during electrophoresis: (×) Na+, H+, Cl-; (△) Na+, H+, acetate-, Cl-; (□) Ca2+, H+, barbiturate-, Cl-; (○) Mg2+, H+, barbiturate-, Cl-; (▽) Ca2+, H+, glycinate-, Cl-; (◇) Ca2+, H+, NH3, Cl-. The ionic strength was maintained at 0.13 M. Reprinted with permission from ref 170. Copyright 1952 Academic Press.

Figure 1–16: Variations in the electrophoretic isoelectric points (pI) of a protein as a function of the square root of the ionic strength (Ic^(1/2)).174 Line A, ovalbumin in acetate; line B, fructose-bisphosphate aldolase in phosphate; line C, fructose-bisphosphate aldolase in acetate; line D, carboxyhemoglobin in phosphate. Reprinted with permission from ref 174. Copyright 1949 American Chemical Society.

from the distortion of the outer layer of the double layer caused by its movement in a direction opposite to that of the protein and its inability to dissolve behind it and re-form around it instantaneously.
At its isoelectric point, pIi, the electrophoretic mobility of protein i becomes zero (Equation 1–78), and this fact permits the isoelectric point of a protein to be measured by electrophoresis.173 Electrophoretic mobilities are measured at values of pH greater than and less than pIi, and the pH of zero mobility is determined by interpolation (Figure 1–15).
The effect of ionic strength on the isoelectric point of a protein in the absence of actual binding of the ions in the electrolyte to the protein has been calculated to be smaller than the experimental error in measurement.150 Nevertheless, significant variations in isoelectric point with ionic strength are generally observed (Figure 1–16),174 and these depend on the particular neutral salt chosen to adjust the ionic strength. The explanation for this behavior can only be the preferential binding of particular ions (in Figure 1–16, always that of the anions) in the chosen electrolyte. The net binding of ions can be calculated from the observed changes in the isoelectric point because, from Equation 1–60, when Z̄i = 0

$$\bar{Z}_{\mathrm{H},i} = -\sum_{j=1}^{m} n_j z_j \qquad \text{(1–80)}$$

and Z̄H,i is available from titration data (Figure 1–11).
To this point, only the free electrophoretic mobility of a protein, ui°, has been discussed. The free electrophoretic mobility is the electrophoretic mobility displayed by a protein in free solution. This property of the protein is measured by moving boundary electrophoresis175 in an apparatus developed by Tiselius.176 This technique has been supplanted by electrophoresis in continuous gels of cross-linked polyacrylamide. A gel of cross-linked polyacrylamide is a hydrated plastic cast in a mold from a solution of acrylamide and the cross-linker N,N′-methylenebis(acrylamide) along with a buffer and other salts. The total concentration of acrylamide and N,N′-methylenebis(acrylamide) in the final gel can be varied from 3% to 20%.
It has been demonstrated experimentally by Morris177 that the relative electrophoretic mobilities of proteins in polyacrylamide gels vary regularly with the concentration of acrylamide used to cast the gel (Figure 1–17)177 and

$$u_i = u_i^{\circ} \exp(-K_{\mathrm{r},i} T_{\mathrm{a}}) \qquad \text{(1–81)}$$

where ui is the electrophoretic mobility of protein i observed on a gel cast from a solution whose total concentration of acrylamide, in percent, was Ta and Kr,i is a retardation coefficient unique to protein i. Such behavior was first noted by Ferguson178 on gels cast from starch in which the same equation applies (Equation 1–81), but the concentration is Ts, the concentration of the starch.178
According to Equation 1–81, ui° should be the free electrophoretic mobility of protein i, and this has been shown to be the case.177 It follows that

$$u_i \cong \frac{e\,\bar{Z}_i}{f_i}\,\frac{f(\kappa a_i)}{1 + \kappa a_i}\,\exp(-K_{\mathrm{r},i} T_{\mathrm{a}}) \qquad \text{(1–82)}$$

Examination of this relationship reveals that the electrophoretic mobilities of the proteins in a complex mixture upon a gel of polyacrylamide are directly proportional to their respective charges, which are determined by complex functions of pH (Figure 1–11); are complex functions of their respective frictional coefficients, which are determined by their sizes and shapes; and are exponentially proportional to the product of a constant, which is unique for each, and the concentration of acrylamide. At a given pH, ionic strength, and concentration of polyacrylamide, each of the proteins in this mixture will have a characteristic electrophoretic mobility (Figure 1–17) and they can be separated one from the other. In this way, electrophoresis can provide a catalogue of the number of proteins present in the mixture and the relative amounts of each.

* The electrophoresis of proteins unfolded in solutions of dodecyl sulfate is quite different and will be discussed later.

Figure 1–18: Disc electrophoresis.182 At the start (left) the proteins in the original sample (black rectangle) are in a large volume and at a low pH (pHL1). They are compressed to a small volume, or disc, as they move through the stacking gel by being trapped in the stable boundary between the upper solution (pHU, buffer) and the solution of the original sample and the spacer (pHL1). (Middle) Upon fusion of this descending boundary and the stable ascending boundary between the solution originally in the running gel (pHL2) and the solution originally in the stacking gel (pHL1), the pH at the boundary increases and the new more rapidly moving boundary outstrips the proteins and deposits a newly created solution of higher pH (pHnew) behind it as it moves ahead of the separating proteins (right). The proteins also escape the first boundary because, at about the same time as the jump in pH at the fusion of the descending and ascending boundaries, they encounter the running gel, which has a higher percentage of acrylamide and which decreases their mobility. Reprinted with permission from ref 182. Copyright 1964 New York Academy of Sciences.
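
As a rough illustration of Equation 1–81, the following Python sketch computes mobilities on gels cast from different concentrations of acrylamide and recovers ui° and Kr,i from the logarithmic form of the equation, ln ui = ln ui° – Kr,i Ta, by fitting a least-squares line (a Ferguson plot). The two proteins, their free mobilities, and their retardation coefficients are hypothetical values chosen only for illustration; mobilities are treated as magnitudes so that logarithms can be taken, and the function names are not from the text.

```python
import math


def gel_mobility(u_free, k_r, t_a):
    """Equation 1-81: mobility on a gel cast from t_a percent acrylamide."""
    return u_free * math.exp(-k_r * t_a)


def ferguson_fit(t_values, u_values):
    """Fit ln(u) = ln(u_free) - k_r * t_a by least squares.

    Returns (u_free, k_r). Mobilities must be positive magnitudes.
    """
    n = len(t_values)
    log_u = [math.log(u) for u in u_values]
    t_mean = sum(t_values) / n
    y_mean = sum(log_u) / n
    s_tt = sum((t - t_mean) ** 2 for t in t_values)
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in zip(t_values, log_u))
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean
    return math.exp(intercept), -slope


# Hypothetical proteins (assumed values, not data from the text):
# (free mobility magnitude in cm^2 V^-1 s^-1, retardation coefficient in %^-1).
PROTEINS = {
    "small, low charge": (2.0e-5, 0.05),
    "large, high charge": (4.0e-5, 0.20),
}

if __name__ == "__main__":
    for t_a in (3.0, 10.0):
        ranked = sorted(
            PROTEINS,
            key=lambda name: gel_mobility(*PROTEINS[name], t_a),
            reverse=True,
        )
        print(f"{t_a:4.1f}% acrylamide, fastest first: {ranked}")
```

With these assumed values the order of migration reverses between the 3% and the 10% gel, because the protein with the larger retardation coefficient is slowed more steeply as Ta increases.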
The first of these boundaries is used to trap the proteins and sweep them into an extremely narrow band prior to the electrophoretic separation. This process has been called stacking. It significantly improves the resolution of the subsequent separation by shrinking the original sample to a hairline so that all of the molecules of protein begin the electrophoretic separation at nearly the same point. The stacking occurs because the proteins are initially placed as a sample that is sandwiched between an upper solution and a lower solution. The upper solution is simply poured on top of the sample, but the lower solution is in a polyacrylamide gel, so that convective turbulence does not disrupt the stable moving boundaries, but it is a gel of a low concentration of polyacrylamide, so that the mobilities of the proteins are as high as possible. This gel of high porosity is the stacking gel. The solution in which the protein is dissolved has the same composition as the lower solution.
The upper and lower solutions are prepared so their respective ionic compositions will form a stable moving boundary of a particular type. Although systems for cationic proteins are also available, to describe this boundary, it will be assumed that the direction of electrophoretic movement of both the proteins and this first stable moving boundary is downward and a pH has been chosen such that the proteins are all anionic. In this case, both the upper solution and the lower solution above and below the boundary, respectively, are prepared from salts of the same cationic weak acid [for example, tris(hydroxymethyl)methylammonium ion]. An anion (for example, glycinate ion) the mobility of which is less than the mobilities of all the proteins is used to make the upper solution, and an anion (for example, chloride ion) the mobility of which is greater than the mobilities of all the proteins is used to make the lower solution. The stable descending boundary formed is one between these two anions. If a molecule of one of the proteins finds itself in the lower solution, it is surrounded by anions that are moving faster than it is, and it is overtaken by the boundary. If a molecule of one of the proteins finds itself in the upper solution, it is surrounded by anions that are moving more slowly than it is, and it outstrips them and returns to the boundary. The result of these events is that the proteins all gather within the descending boundary itself, which remains extremely sharp and stable if the upper and lower solutions have the proper ionic compositions.184
The stacking process is able to compress the proteins to thin lamella, but in order for electrophoretic separation to occur, they must be released from the boundary after they have been stacked. This can be done if the upper solution of this initial descending boundary has been made with an anion, a, that is slower than the protein only because it is the conjugate anionic base (for example, glycinate ion) of a weak neutral acid (glycine, pKa = 9.6) and the pH of the upper solution has been chosen to be significantly lower than the pKa of that weak acid. Under these conditions, the anion a is slow because only a fraction of the weak acid is anionic at any instant. The acid–base equilibrium has the effect of decreasing the mobility of the anion a from its value in the absence of its conjugate acid to a lower value, and

$$u_{\mathrm{a}} = u_{\mathrm{a}}^{\circ} f_{\mathrm{a}} \qquad \text{(1–83)}$$

where ua is the mobility of the upper anion at the actual ratio of conjugate base to acid in the solution, ua° is its mobility in the absence of its conjugate acid, and fa is the fraction of the total weak acid that is ionized at the ratio chosen. In such a situation, the proteins can be released from the descending boundary by abruptly increasing the pH, and hence the value of fa, so that ua becomes greater than the mobilities of the proteins, and the new stable, but now rapidly descending, boundary that results drops the proteins behind at the origin of the electrophoretic separation.
The abrupt increase in the pH of the upper solution of the descending boundary can be accomplished by the arrival of a stable ascending boundary between two concentrations of the same cationic buffer. Behind this ascending boundary is a solution of the same cationic weak acid as used for the upper and lower solutions of the initial descending boundary [for example, tris(hydroxymethyl)methylammonium ion] but at a higher concentration and a higher pH than that of the solution behind the initial descending boundary. This ascending boundary between different concentrations of the same cationic buffer has been constructed so that the anion in both its upper and lower solutions (for example, chloride) is the same. Because its upper solution is by definition the lower solution of the initial descending boundary, this anion is already the fast anion of that boundary. The cationic weak acid of the upper solution of the ascending boundary is by definition the cationic weak acid in the two solutions used to make the initial descending boundary. The concentration and pH of the cationic buffer in the lower solution of the ascending boundary, however, is chosen to be high enough to adjust the final pH behind the new descending boundary to a value high enough to release the proteins from the initial descending boundary. If the release is unsuccessful, or only partially successful, the proteins, or some of the proteins, remain trapped in the new descending boundary and are never separated. These trapped unseparated proteins form an extremely sharp but uninformative and deceptive band at the bottom of the final electrophoretogram.185
The release of the proteins from the initial descending boundary in which they were trapped and stacked can be accomplished even more effectively by using a stable ascending boundary behind which is a solution of a cationic weak acid of a higher pKa than the cationic weak acid used as the counterion in the initial descending boundary.186 This ascending boundary has been constructed so that the anion in both its upper and lower solutions is the same. Because its upper solution is the lower solution of the initial descending boundary, this anion is the fast anion of the upper boundary (for example, chloride ion). The cationic weak acid of the upper solution of the ascending boundary must be the cationic weak acid of the two solutions used to make the initial descending boundary (for example, pyridinium ion; pKa = 5.14). The cation of the lower solution of the ascending boundary is chosen to be the cationic conjugate acid [for example, tris(hydroxymethyl)methylammonium ion; pKa = 8.10] of a neutral base strong enough to adjust the final pH behind the new descending boundary to a value higher than the pKa of the neutral conjugate acid of the anion a (for example, 4-morpholineethanesulfonate ion; pKa = 6.15) and release the proteins from the initial descending boundary. The difficulty with this strategy is that the pH of the upper solution of the initial descending boundary is often so low that the proteins are no longer anionic and move upward instead of downward. But it is effective with complexes of protein and dodecyl sulfate because they are anionic at all reasonable values of pH.
To ensure that as many proteins as possible are released from the initial descending boundary, shortly after the fusion of the ascending boundary and the initial descending boundary, the descending band of proteins in the stacking gel encounters a much higher concentration of polyacrylamide, the running gel, which decreases the mobilities of all of the proteins by virtue of the relationship in Equation 1–81. This frictional deceleration of the proteins increases the probability that all of their mobilities will be less than that of the now accelerated anion of the upper solution so that they can escape from the new descending boundary.
The polyacrylamide gel is poured in two stages (Figure 1–18): the running gel, the polyacrylamide concentration of which is high and upon which the separation will occur, and the stacking gel, the polyacrylamide concentration of which is as low as possible to keep the mobilities of the proteins as high as possible and in which the stacking will occur.
Three stable moving boundaries must be constructed (Figure 1–18). At the start of the electrophoresis, the initial descending boundary between the slow anion and the fast anion that will compress the proteins is the boundary between the upper electrode solution (pHU) and the solution in the sample and the stacking gel (pHL1). At the start of the electrophoresis, the ascending boundary between the two concentrations of the cationic conjugate acid of the weak base or between the cationic conjugate acids of the weaker base and the stronger base that will deliver the pH jump is the boundary between the solutions in the running gel (pHL2) and the stacking gel (pHL1). The third stable moving boundary is the new descending boundary that deposits behind it the solution in which the proteins are actually separated (pHnew). It forms upon the fusion of the other two. As the initial descending boundary moves through the stacking gel, it must maintain a constant pH and ionic strength behind it (pHU) to maintain the low and constant mobility of the slow anion in the upper solution. As the new descending boundary moves, it must deposit behind itself a solution of constant pH and ionic composition (pHnew) to form a uniform electrophoretic field upon which the proteins can be separated. The pH and ionic composition of the solution that is deposited behind the new descending boundary is different from the pH and ionic composition of any of the solutions initially present, but the cation in this newly created solution is the weak cationic acid of the original lower phase of the ascending boundary and the anion in this solution is the now accelerated anion of the original upper phase of the initial descending boundary. The constant pH deposited behind this new descending boundary is established by the weak cationic acid found on both sides of the boundary and its conjugate base and the now accelerated slow anion found in the upper solution of the boundary, which is a weak anionic base, and its conjugate acid. All four of these species together buffer the deposited solution and determine both the ionic strength and the value of the deposited pH and hence the pH of the actual electrophoresis.
The equations that govern the creation of a stable moving boundary and the ability of that boundary to deposit a solution of uniform pH and ionic composition were derived by Ornstein182 from the regulating functions described by Kohlrausch.184 On the basis of these equations, Jovin187 has developed a more elaborate theoretical description of discontinuous electrophoresis, and he and his colleagues have provided the necessary recipes for a large number of discontinuous systems.188

Suggested Reading

Tiselius, A., & Svensson, H. (1940) The influence of electrolyte concentration on the electrophoretic mobility of egg albumin, Trans. Faraday Soc. 36, 16–22.
Carbeck, J.D., & Negin, R.S. (2001) Measuring the size and charge of proteins using protein charge ladders, capillary electrophoresis, and electrokinetic models of colloids, J. Am. Chem. Soc. 123, 1252–1253.

Problem 1–17: The uptake of protons by 1 mol of horse carboxyhemoglobin in the range of pH 6–8 is about 9 mol of protons for each drop of 1 unit in pH.189 Use this value to estimate the moles of phosphate bound by a mole of horse carboxyhemoglobin at its isoelectric point at the phosphate concentration of the last point in curve D of Figure 1–16 ([phosphate] = 0.12 M). Assume that no cations other than protons are binding to the protein under these conditions.

Problem 1–18: Use interpolated values for the free electrophoretic mobility of ovalbumin (Figure 1–13) at ionic strengths of 0.0025, 0.01, and 0.16 M to calculate the charge number on the protein during the electrophoresis. The pH for the measurements was 7.1, and the temperature was 294 K. The viscosity of water at 294 K is 1.0 mPa s. The diffusion coefficient of ovalbumin at 294 K is 4.2 × 10-7 cm2 s-1.

Problem 1–19: The isoelectric point of normal hemoglobin, hemoglobin A, is 6.87, and that of sickle hemoglobin, hemoglobin S, is 7.09 when electrophoresis is carried out under the same conditions.173 In the vicinity of the isoelectric point, the charge number on either of these hemoglobins changes by about 13 equiv for every mole of protein for every change of 1 unit in pH. At the same pH, anywhere between their two respective isoelectric points, what is the difference in charge number between hemoglobin A and hemoglobin S?

Problem 1–20: The frictional coefficient of trypsin at 10 °C is 5.5 × 10-8 g s-1. Assume the molecule to be a sphere and calculate its free electrophoretic mobility at 10 °C and at pH 6 and Ic = 0.13 M by using the results of the acid–base titration in Figure 1–15, which are for 20 °C, and Equation 1–79. Assume that Z̄trypsin = Z̄H,trypsin and that Z̄H,trypsin at pH 6 is the same at 10 °C as at 20 °C.

Problem 1–21: The frictional coefficient of ribonuclease at 25 °C is 2.6 × 10-8 g s-1. Assume the molecule to be a sphere and calculate its free electrophoretic mobility at pH 6 and [KCl] = 0.15 M by using the results presented in Figure 1–11 and Equation 1–79 with the assumption that Z̄RNase = Z̄H,RNase. In a field of 20 V cm-1, how far would ribonuclease travel in 3 h if it had this mobility?

radius, Z̄ is the mean net charge number on the protein at pH 7, (∂Z̄H,i/∂pH)Ic is the change in mean net proton charge number with pH, Kr is the retardation coefficient for polyacrylamide, and u° is the free electrophoretic mobility for a temperature of 25 °C, an ionic strength of 0.1 M, and a pH of 7.0.
(A) Assume that, at constant ionic strength, (∂Z̄H,i/∂pH)Ic is equivalent to (∂Z̄i/∂pH)Ic for each of the five proteins, and calculate the electrophoretic mobilities of these five proteins, at 25 °C and an ionic strength of 0.1 M, under each of the following conditions: (1) pH 7.0 on 5% polyacrylamide; (2) pH 7.0 on 10% polyacrylamide; (3) pH 5.0 on 5% polyacrylamide; (4) pH 5.0 on 10% polyacrylamide.
(B) What is the order of the migration of these five proteins under each of the four conditions?
(C) What will happen to protein E at pH 7.0 that would not happen at pH 5.0 if a mixture of the proteins is run on vertical polyacrylamide gels with the cathode at the bottom and the anode at the top?
(D) Assume that Z̄i does not change as ionic strength changes and calculate the mobilities of the five proteins at an ionic strength of 0.2 M at pH 5 and at 25 °C on 5% polyacrylamide. How does the increase in ionic strength affect the mobilities?

protein | a (nm) | Z̄ (pH 7) | (∂Z̄H/∂pH) | Kr (%-1) | u° (cm2 …)
Figure 1–19: Disc electrophoresis on gels of polyacrylamide of native proteins from successive steps in the purification of [acyl-carrier-protein] S-malonyltransferase from E. coli.190 Electrophoresis was performed on polyacrylamide gels cast from 15% solutions of acrylamide in a discontinuous system of tris(hydroxymethyl)methylamine and glycylglycine. The different gels represent samples from successive steps in a complete purification of the enzyme, seen in its final purified state on gel F. The gels were stained for protein with Coomassie brilliant blue. Reprinted with permission from ref 190. Copyright 1973 Journal of Biological Chemistry.

Figure 1–20: Electrophoresis of purified porcine phosphomevalonate kinase (20 mg) on a gel cast from a 10% solution of acrylamide.51 Following the electrophoresis, the cylindrical gel was divided in half longitudinally. One half was cut into slices laterally, and the slices were assayed individually for enzymatic activity (A). The other half was stained for protein and then scanned for the resulting absorbance (B). The inset in panel B is a photograph of the stained gel. Reprinted with permission from ref 51. Copyright 1980 American Chemical Society.
[Figure 1–20 axis labels: Enzymatic activity; Relative absorbance at 600 nm; Migration; Dye]

Figure 1–21: Staining a polyacrylamide gel for enzymatic activity.19 Two samples of purified isocitrate dehydrogenase (NADP+) from the final step on phenyl agarose (Table 1–2) were submitted to electrophoresis on separate lanes of a thin slab of polyacrylamide. After the electrophoresis, the lanes were cut from the slab. One of the lanes (lower) was stained for protein with Coomassie brilliant blue. The other lane (upper) was placed in a solution of isocitrate and NADP+. The intrinsic fluorescence of the NADPH produced by the enzyme was observed by illuminating the gel with ultraviolet light. Reprinted with permission from ref 19. Copyright 1992 Blackwell Publishing.

components can be resolved. Also, by running polyacrylamide gels loaded with a series of protein concentrations, the number and relative amounts of any minor impurities can be quantified.193 The polyacrylamide gels should also be stained with two distinct dyes, for example Coomassie brilliant blue and silver oxide,194,195 because some proteins do not stain so strongly as others with a particular dye.
The single component observed upon electrophoresis of a sample from the final step of the purification must be shown to be the protein actually responsible for the biological function being purified. Either the polyacrylamide gel is sliced and the assay is performed on each slice (Figure 1–20),51,97,131,196 or the intact polyacrylamide gel is stained for enzymatic activity (Figure 1–21).19 The latter is accomplished by placing the gel in a solution that promotes the incorporation of radioactivity197 or that gives a fluorescent product or a colored product from the enzymatic reaction. For example, by adding lead acetate, the SeH2 produced in a polyacrylamide gel from the action of selenocysteine lyase can be made to form a yellow band where the enzyme is located.60 The most widely used stain for enzymatic activity is based on the ability of NADH to reduce p-nitrotetrazolium blue to give a blue color.198,199 It is obvious that through coupled assays this reaction can be used to visualize a large array of different enzymatic activities. At times, the protein being purified is itself colored, by virtue of a bound chromophore, such as the coenzyme B12 associated with D-lysine 5,6-aminomutase,200 and the coelectrophoresis of the purified protein and that color can be observed directly.
Several artifacts can produce misleading results on electrophoresis. For example, aggregation of individual molecules of the same protein can occur201,202 during
either the purification or the stacking process, and this produces an array of complexes, each with a different frictional coefficient and retardation coefficient. The neutral amides of glutamines and asparagines on the protein can hydrolyze randomly and in low yield during a harsh purification to produce anionic carboxylates, and this modification leads to variations in Z̄i that produce multiple components from the same protein. Because these or other similar modifications are integral processes, the components that result from them are usually evenly spaced upon the electrophoretogram,172 and the nature of the artifact can be recognized by this pattern.201,203,204 Each component, however, should be biologically active if the protein is pure.203,204
Although the coelectrophoresis of the purified protein and the biological activity is the most convincing criterion of purity, occasionally the electrophoresis itself destroys the activity.205 For this reason, or simply for personal satisfaction, other criteria of purity are often used. Immunoglobulins raised against the purified enzyme should behave on immunodiffusion and immunoelectrophoresis as expected of immunoglobulins directed against a single antigen. It is also encouraging when these immunoglobulins are able to precipitate all of the protein and all of the biological activity19,206 but not essential, because some immunoglobulins are ineffective at immunoprecipitation. Activity and protein should comigrate on chromatography (Figures 1–6 and 1–10)207 or cosediment upon gradients of sucrose.208 Even more convincing is the observation that the single band of protein observed upon electrophoresis of samples from fractions collected from the final chromatographic step increases in intensity and then decreases in intensity in concert with the increase and decrease of enzymatic activity, respectively, across the peak.23,104
The grams of protein for every mole of binding site is between 15,000 and 100,000 g mol-1 for most proteins. The concentration of protein (milligrams milliliter-1) and the concentration of binding sites (moles liter-1) for a ligand, such as an agonist or antagonist, known to be specific for a desired protein, such as the respective receptor, can be determined on samples from the same solution. If the ratio of these two quantities lies within the expected range and if only one protein can be discerned on electrophoresis, these observations are taken to be convincing criteria of purity, especially if the value of grams mole-1 agrees with the measured molar mass of the protomer of the protein that has been purified. For example, purified histidinol-phosphate transaminase binds 1 mol of pyridoxal phosphate for every 37,000 g of protein,192 purified methylmalonyl-CoA mutase contains 1 mol of adenosylcobalamine for every 73,000 g of protein,209 and purified α1-adrenergic receptor binds 1 mol of [3H]prazosin for every 69,000 g of protein.34
Isoelectric focusing is a method for assessing purity that is based on electrophoresis. A gel of polyacrylamide is cast from a solution containing a mixture of polyelectrolytes known as ampholytes. The isoelectric points of the ampholytes in the mixture vary over a continuous range of pH values. Upon application of an electric field, this mixture forms a stable gradient of pH in the gel. Each protein migrates through this gradient until it reaches a pH equal to its isoelectric point where it can no longer move, and the proteins in a mixture are spread upon the field in order of their respective isoelectric points. It is a technique that is less flexible than disc electrophoresis because it separates molecules on the basis of only one property rather than three. It also seems to be more sensitive to minor heterogeneities of charge than is electrophoresis. Because, however, isoelectric focusing detects heterogeneity of charge more successfully than electrophoresis, it is an even more stringent test of the homogeneity of a protein.210 The coisoelectrofocusing of protein and biological activity,48,206,211-213 is an additional criterion of purity independent from the observation of coelectrophoresis. Isoelectric focusing has been combined with electrophoresis to resolve complex mixtures of proteins in two dimensions.214 When the clarified homogenate produced from the cytoplasm of the bacterium E. coli was submitted to such a procedure, more than 1000 different proteins were represented upon the field (Figure 1–22).214 This display indicates the complexity of the mixture of proteins in a cell. From such a mixture, a single protein with a single biological activity is purified.

Suggested Reading

Muro-Pastor, M. I., & Florencio, F. J. (1992) Purification and properties of NADP-isocitrate dehydrogenase from the unicellular cyanobacterium Synechocystis sp. PCC 6803, Eur. J. Biochem. 203, 99–105.

Heterogeneity

Often heterogeneity in a preparation of a purified protein, observed as several different proteins capable of being separated, is detected by electrophoresis or isoelectric focusing even though all of the various components are biologically active; often heterogeneity is discovered in later experiments. This heterogeneity may have a biological origin, for example, because of varying levels of glycosylation or phosphorylation, and the various forms of the protein producing this heterogeneity may coexist in the tissue prior to homogenization, but usually the heterogeneity arises during the purification itself. Such heterogeneity is produced by processes that are minimized by avoiding extremes of pH through the use of well-buffered solutions, by working at low temperatures (0–5 °C), and by performing the purification in as short a period of time as possible.
That it is the purification itself producing the heterogeneity often becomes apparent when a new, more rapid, less debilitating method of purification is devised
for a certain protein, and the heterogeneity noted previously, the subject of many publications, simply disappears. When fructose-bisphosphatase was purified by a shorter method,215 the previously studied requirement of the enzyme for alkaline conditions was no longer manifest. When aconitate hydratase was purified by a more rapid procedure,216 it was isolated with its iron still attached. When glyceraldehyde-3-phosphate dehydrogenase from yeast was purified rapidly by affinity chromatography,217 the heterogeneous behavior in its binding of ligands218-220 was no longer observed.
One of the most publicized causes of heterogeneity or artifactual alteration of a protein during its purification is digestion by peptidases.146,221 Proteins the biological role of which is to degrade other proteins are known as peptidases. With the exception of a few peptidases that are located in the cytoplasm such as the calpains, which can be inactivated by chelating any free calcium, most of the peptidases capable of degrading the normal, native proteins in a cell are present in inactive forms or are segregated from the cytoplasm of the cell in which they are located or in which they were produced. This segregation is accomplished by enclosing the peptidases in tight, membrane-sealed packages, the lysosomes, or excreting them into the extracellular surroundings. Upon homogenization, the natural boundaries between the cytoplasm and the cellular compartments containing these peptidases are destroyed, and artifactual digestion of the proteins being purified can commence.

Figure 1–22: Separation of proteins from the cytoplasm of the bacterium E. coli by electrophoresis in two dimensions.214 A sample (10 mg of protein) from a homogenate of the bacteria, grown in the presence of [14C]amino acids, was submitted to isoelectric focusing (pH 3–10), under conditions where the proteins were unfolded (9 M urea), on a cylindrical (0.25 cm × 13 cm) gel of polyacrylamide. After the unfolded proteins had reached their respective isoelectric points, the gel was removed from its tube, soaked in a solution of sodium dodecyl sulfate (SDS) to coat the unfolded polypeptides with this anionic detergent, and the cylinder was laid across the top of a flat slab (14 cm × 16 cm × 0.3 mm). The unfolded polypeptides separated by isoelectric focusing (IF) in the first dimension were then separated by electrophoresis (SDS) in the second dimension. [14C]Polypeptides were located by placing the slab on photographic film and exposing the film for a long enough time that the radioactive disintegrations in each spot of protein produced the dark spots seen in the figure. Reprinted with permission from ref 214. Copyright 1975 Journal of Biological Chemistry.
30. Roche, P.A., Moorehead, T.J., & Hamilton, G.A. (1982) Arch. Biochem. Biophys. 216, 62–73.
31. Ha, S., Chang, E., Lo, M.C., Men, H., Park, P., Ge, M., & Walker, S. (1999) J. Am. Chem. Soc. 121, 8415–8426.
32. Caron, M.G., & Lefkowitz, R.J. (1976) J. Biol. Chem. 251, 2374–2384.
33. Shorr, R.G., Strohsacker, M.W., Lavin, T.N., Lefkowitz, R.J., & Caron, M.G. (1982) J. Biol. Chem. 257, 12341–12350.
34. Graham, R.M., Hess, H.J., & Homcy, C.J. (1982) J. Biol. Chem. 257, 15174–15181.
35. Cohen, S., Ushiro, H., Stoscheck, C., & Chinkers, M. (1982) J. Biol. Chem. 257, 1523–1531.
36. Kuhn, R.W., Schrader, W.T., Smith, R.G., & O’Malley, B.W. (1975) J. Biol. Chem. 250, 4220–4228.
37. Sherrill, J.M., & Kyte, J. (1996) Biochemistry 35, 5705–5718.
38. Penefsky, H.S. (1977) J. Biol. Chem. 252, 2891–2899.
39. Briggs, M.R., Kadonaga, J.T., Bell, S.P., & Tjian, R. (1986) Science 234, 47–52.
40. Hacker, K.J., & Johnson, K.A. (1997) Biochemistry 36, 14080–14087.
41. Michel, C., Hartrampf, G., & Buckel, W. (1989) Eur. J. Biochem. 184, 103–107.
42. Bull, C., & Ballou, D.P. (1981) J. Biol. Chem. 256, 12673–12680.
43. Lau, S.M., Brantley, R.K., & Thorpe, C. (1989) Biochemistry 28, 8255–8262.
44. Labourdenne, S., Brass, O., Ivanova, M., Cagna, A., & Verger, R. (1997) Biochemistry 36, 3423–3429.
45. Walde, P., & Luisi, P.L. (1989) Biochemistry 28, 3353–3360.
46. Webb, M.R. (1992) Proc. Natl. Acad. Sci. U.S.A. 89, 4884–4887.
47. Horecker, B.L., & Kornberg, A. (1948) J. Biol. Chem. 175, 385–390.
48. Noyes, B.E., & Bradshaw, R.A. (1973) J. Biol. Chem. 248, 3052–3059.
49. Duggleby, R.G., & Dennis, D.T. (1974) J. Biol. Chem. 249, 162–166.
50. McClure, W.R., Lardy, H.A., & Kneifel, H.P. (1971) J. Biol. Chem. 246, 3569–3578.
51. Bazaes, S., Beytia, E., Jabalquinto, A.M., Solis de Ovando, F., Gomez, I., & Eyzaguirre, J. (1980) Biochemistry 19, 2300–2304.
52. Parker, A.R., Moore, J.A., Schwab, J.M., & Davisson, V.J. (1995) J. Am. Chem. Soc. 117, 10605–10613.
53. Kramer, P.R., & Miziorko, H.M. (1980) J. Biol. Chem. 255, 11023–11028.
54. Switzer, R.L. (1969) J. Biol. Chem. 244, 2854–2863.
55. Kataoka, M., Shimizu, S., & Yamada, H. (1992) Eur. J. Biochem. 204, 799–806.
56. Moriyama, T., & Srere, P.A. (1971) J. Biol. Chem. 246, 3217–3223.
57. Leloir, L.F., & Cardini, C.E. (1957) Methods Enzymol. 3, 843–844.
58. Cooper, J.L., & Meister, A. (1972) Biochemistry 11, 661–671.
59. Donald, A., Sibley, D., Lyons, D.E., & Dahms, A.S. (1979) J. Biol. Chem. 254, 2132–2137.
60. Esaki, N., Nakamura, T., Tanaka, H., & Soda, K. (1982) J. Biol. Chem. 257, 4386–4391.
61. Barton, R.W., & Neufeld, E.F. (1971) J. Biol. Chem. 246, 7773–7779.
62. Gerhart, J., Wu, M., & Kirschner, M. (1984) J. Cell Biol. 98, 1247–1255.
63. Wu, M., & Gerhart, J.C. (1980) Dev. Biol. 79, 465–477.
64. Canals, F. (1992) Biochemistry 31, 4493–4501.
65. Gilbert, W., & Mueller-Hill, B. (1966) Proc. Natl. Acad. Sci. U.S.A. 56, 1891–1898.
66. Skou, J.C. (1964) in Progress in Biophysics and Molecular Biology (Butler, J.A.V., & Huxley, H.E., Eds.) Vol. 14, pp 131–166, Pergamon, New York.
67. Kyte, J. (1971) J. Biol. Chem. 246, 4157–4165.
68. Nour, J.M., & Rabinowitz, J.C. (1991) J. Biol. Chem. 266, 18363–18369.
69. Ryu, S., & Tjian, R. (1999) Proc. Natl. Acad. Sci. U.S.A. 96, 7137–7142.
70. Trower, M.K., Buckland, R.M., & Griffin, M. (1989) Eur. J. Biochem. 181, 199–206.
71. Yoshioka, H., Nagasawa, T., & Yamada, H. (1991) Eur. J. Biochem. 199, 17–24.
72. Smigel, M.D. (1986) J. Biol. Chem. 261, 1976–1982.
73. Moczydlowski, E.G., & Fortes, P.A. (1981) J. Biol. Chem. 256, 2346–2356.
74. Layne, E. (1957) Methods Enzymol. 3, 447–454.
75. Lowry, O.H., Rosebrough, N.J., Farr, A.L., & Randall, R.J. (1951) J. Biol. Chem. 193, 265–275.
76. Adams, M.W., Eccleston, E., & Howard, J.B. (1989) Proc. Natl. Acad. Sci. U.S.A. 86, 4932–4936.
77. Bradford, M.M. (1976) Anal. Biochem. 72, 248–254.
78. Edsall, J.T., & Wyman, J. (1958) Biophysical Chemistry, Vol. I, pp 263–282, Academic Press, New York.
79. Hofmeister, F. (1888) Arch. Exp. Pathol. Pharmakol. 24, 247–260.
80. Cacace, M.G., Landau, E.M., & Ramsden, J.J. (1997) Q. Rev. Biophys. 30, 241–277.
81. Huang, X., Knoell, C.T., Frey, G., Hazegh-Azam, M., Tashjian, A.H., Jr., Hedstrom, L., Abeles, R.H., & Timasheff, S.N. (2001) Biochemistry 40, 11734–11741.
82. Vogel, R., Fan, G.B., Sheves, M., & Siebert, F. (2001) Biochemistry 40, 483–493.
83. Arakawa, T., & Timasheff, S.N. (1982) Biochemistry 21, 6545–6552.
84. Arakawa, T., & Timasheff, S.N. (1984) Biochemistry 23, 5924–5929.
85. Arakawa, T., & Timasheff, S.N. (1982) Biochemistry 21, 6536–6544.
86. Geisler, N., & Weber, K. (1981) FEBS Lett. 125, 253–256.
87. Tong, J.H., & Kaufman, S. (1975) J. Biol. Chem. 250, 4152–4158.
88. Doolittle, R.F., Thomas, C., & Stone, W., Jr. (1960) Science 132, 36–37.
89. Yang, Z., Kollman, J.M., Pandi, L., & Doolittle, R.F. (2001) Biochemistry 40, 12515–12523.
90. Gerhart, J.C., & Holoubek, H. (1967) J. Biol. Chem. 242, 2886–2892.
91. Fuller, G.M., & Doolittle, R.F. (1971) Biochemistry 10, 1305–1311.
92. Kautz, J., & Schnackerz, K.D. (1989) Eur. J. Biochem. 181, 431–435.
93. Beebe, J.A., & Frey, P.A. (1998) Biochemistry 37, 14989–14997.
94. Uyeda, K., & Kurooka, S. (1970) J. Biol. Chem. 245, 3315–3324.
95. Ahern, T.J., & Klibanov, A.M. (1985) Science 228, 1280–1284.
96. D’Alessio, G., & Josse, J. (1971) J. Biol. Chem. 246, 4319–4325.
97. Sabourin, P.J., & Bieber, L.L. (1982) J. Biol. Chem. 257, 7460–7467.
98. Nimmo, G.A., & Coggins, J.R. (1981) Biochem. J. 197, 427–436.
99. Lau, E.P., Cochran, B.C., & Fall, R.R. (1980) Arch. Biochem. Biophys. 205, 352–359.
100. Sakurai, N., & Sakurai, T. (1997) Biochemistry 36, 13809–13815.
101. Lee, F.J., Lin, L.W., & Smith, J.A. (1989) Eur. J. Biochem. 184, 21–28.
102. Bogard, M., Camadro, J.M., Nordmann, Y., & Labbe, P. (1989) Eur. J. Biochem. 181, 417–421.
103. Chen, M.W., Jahn, D., O’Neill, G.P., & Soll, D. (1990) J. Biol. Chem. 265, 4058–4063.
104. Green, J.M., & Nichols, B.P. (1991) J. Biol. Chem. 266, 12971–12975.
105. Zachariou, M., & Hearn, M.T. (1996) Biochemistry 35, 202–211.
106. Hutchens, T.W., & Porath, J. (1986) Anal. Biochem. 159, 217–226.
107. Sarngadharan, M.G., Watanabe, A., & Pogell, B.M. (1970) J. Biol. Chem. 245, 1926–1929.
108. Mocali, A., & Paoletti, F. (1989) Eur. J. Biochem. 180, 213–219.
109. Volonte, C., & Greene, L.A. (1992) J. Biol. Chem. 267, 21663–21670.
110. Araki, C., & Arai, K. (1957) Bull. Chem. Soc. Jpn. 30, 287–293.
111. March, S.C., Parikh, I., & Cuatrecasas, P. (1974) Anal. Biochem. 60, 149–152.
112. Cuatrecasas, P., Wilchek, M., & Anfinsen, C.B. (1968) Proc. Natl. Acad. Sci. U.S.A. 61, 636–643.
113. Cuatrecasas, P. (1970) J. Biol. Chem. 245, 3059–3065.
114. Steers, E., Jr., Cuatrecasas, P., & Pollard, H.B. (1971) J. Biol. Chem. 246, 196–200.
115. Chan, W.W., & Takahashi, M. (1969) Biochem. Biophys. Res. Commun. 37, 272–277.
116. Berg, R.A., & Prockop, D.J. (1973) J. Biol. Chem. 248, 1175–1182.
117. Geren, C.R., & Ebner, K.E. (1977) J. Biol. Chem. 252, 2082–2088.
118. Lee, C.Y., Lappi, D.A., Wermuth, B., Everse, J., & Kaplan, N.O. (1974) Arch. Biochem. Biophys. 163, 561–569.
119. Nealon, D.A., & Cook, R.A. (1979) Biochemistry 18, 3616–3622.
120. Ryan, R.L., & McClure, W.O. (1979) Biochemistry 18, 5357–5365.
121. Huang, J.S., Huang, S.S., & Tang, J. (1979) J. Biol. Chem. 254, 11405–11417.
122. Allen, M.B., & Walker, D.G. (1980) Biochem. J. 185, 565–575.
123. Magnani, M., Serafini, G., Stocchi, V., Bossu, M., & Dacha, M. (1982) Arch. Biochem. Biophys. 216, 449–454.
124. Kaufman, B.T., & Pierce, J.V. (1971) Biochem. Biophys. Res. Commun. 44, 608–613.
125. Mendicino, J., Sivakami, S., Davila, M., & Chandrasekaran, E.V. (1982) J. Biol. Chem. 257, 3987–3994.
126. Kitani, T., & Fujisawa, H. (1983) J. Biol. Chem. 258, 235–239.
127. Grimshaw, C.E., Henderson, G.B., Soppe, G.G., Hansen, G., Mathur, E.J., & Huennekens, F.M. (1984) J. Biol. Chem. 259, 2728–2733.
128. Caron, M.G., Srinivasan, Y., Pitha, J., Kociolek, K., & Lefkowitz, R.J. (1979) J. Biol. Chem. 254, 2923–2927.
129. Deutsch, D.G., & Mertz, E.T. (1970) Science 170, 1095–1096.
130. Chibber, B.A., Deutsch, D.G., & Mertz, E.T. (1974) Methods Enzymol. 34, 424–432.
131. Raeber, A.J., Riggio, G., & Waser, P.G. (1989) Eur. J. Biochem. 186, 487–492.
132. Moomaw, J.F., & Casey, P.J. (1992) J. Biol. Chem. 267, 17438–17443.
133. Pfeuffer, E., Dreher, R.M., Metzger, H., & Pfeuffer, T. (1985) Proc. Natl. Acad. Sci. U.S.A. 82, 3086–3090.
134. Pang, I.H., & Sternweis, P.C. (1989) Proc. Natl. Acad. Sci. U.S.A. 86, 7814–7818.
135. Manenti, S., Sorokine, O., Van Dorsselaer, A., & Taniguchi, H. (1992) J. Biol. Chem. 267, 22310–22315.
136. Pettersson, I., Kusche, M., Unger, E., Wlad, H., Nylund, L., Lindahl, U., & Kjellen, L. (1991) J. Biol. Chem. 266, 8044–8049.
137. Chang, G.G., Wang, J.K., Huang, T.M., Lee, H.J., Chou, W.Y., & Meng, C.L. (1991) Eur. J. Biochem. 202, 681–688.
138. Cheng, Q., Finkel, D., & Hostetter, M.K. (2000) Biochemistry 39, 5450–5457.
139. Sugden, B., & Keller, W. (1973) J. Biol. Chem. 248, 3777–3788.
140. Hsu, Y.P., & Kohlhaw, G.B. (1980) J. Biol. Chem. 255, 7255–7260.
141. Cvetanoviac, M., Moreno de la Garza, M., Dommes, V., & Kunau, W.H. (1985) Biochem. J. 227, 49–56.
142. Payne, M.E., Schworer, C.M., & Soderling, T.R. (1983) J. Biol. Chem. 258, 2376–2382.
143. Kadonaga, J.T., & Tjian, R. (1986) Proc. Natl. Acad. Sci. U.S.A. 83, 5889–5893.
144. Reardon, J.E. (1990) J. Biol. Chem. 265, 7112–7115.
145. Amaya, Y., Yamazaki, K., Sato, M., Noda, K., Nishino, T., & Nishino, T. (1990) J. Biol. Chem. 265, 14170–14175.
146. Pringle, J.R. (1970) Biochem. Biophys. Res. Commun. 39, 46–52.
147. Nealon, K., Nicholl, I.D., & Kenny, M.K. (1996) Nucleic Acids Res. 24, 3763–3770.
148. Nicholl, I.D., Nealon, K., & Kenny, M.K. (1997) Biochemistry 36, 7557–7566.
149. Durchschlag, H., Biedermann, G., & Eggerer, H. (1981) Eur. J. Biochem. 114, 255–262.
150. Tanford, C. (1962) Adv. Protein Chem. 17, 69–165.
151. Tanford, C., & Hauenstein, J.D. (1956) J. Am. Chem. Soc. 78, 5288–5291.
152. Qin, B.Y., Bewley, M.C., Creamer, L.K., Baker, H.M., Baker, E.N., & Jameson, G.B. (1998) Biochemistry 37, 14014–14023.
153. Carr, C.W. (1953) Arch. Biochem. Biophys. 46, 417–423.
154. Carr, C.W. (1956) Arch. Biochem. Biophys. 62, 476–484.
155. Matthew, J.B., Hanania, G.I., & Gurd, F.R. (1979) Biochemistry 18, 1928–1936.
156. Isupov, M.N., Antson, A.A., Dodson, E.J., Dodson, G.G., Dementieva, I.S., Zakomirdina, L.N., Wilson, K.S., Dauter, Z., Lebedev, A.A., & Harutyunyan, E.H. (1998) J. Mol. Biol. 276, 603–623.
157. Stout, T.J., Graham, H., Buckley, D.I., & Matthews, D.J. (2000) Biochemistry 39, 8460–8469.
158. Wedekind, J.E., Trame, C.B., Dorywalska, M., Koehl, P., Raschke, T.M., McKee, M., FitzGerald, D., Collier, R.J., & McKay, D.B. (2001) J. Mol. Biol. 314, 823–837.
159. Tanford, C., Swanson, S.A., & Shore, W.S. (1955) J. Am. Chem. Soc. 77, 6414–6421.
160. Tanford, C. (1961) Physical Chemistry of Macromolecules, Wiley, New York.
161. Berne, B.J., & Pecora, R. (1976) Dynamic Light Scattering: With Applications to Chemistry, Biology, and Physics, Wiley, New York.
162. Roche, T.E., Powers-Greenwood, S.L., Shi, W.F., Zhang, W.B., Ren, S.Z., Roche, E.D., Cox, D.J., & Sorensen, C.M. (1993) Biochemistry 32, 5629–5637.
163. Chien, W.J., Cheng, S.F., & Chang, D.K. (1998) Anal. Biochem. 264, 211–215.
164. Haner, R.L., & Schleich, T. (1989) Methods Enzymol. 176, 418–446.
165. Tiselius, A., & Svensson, H. (1940) Trans. Faraday Soc. 36, 16–22.
166. Manning, G.S. (1981) J. Phys. Chem. 85, 1506–1515.
167. Debye, P., & Huckel, E. (1923) Z. Physik 24, 305–325.
168. Henry, D.C. (1931) Proc. R. Soc. London, A 133, 106–129.
169. Brown, R.A., & Timasheff, S.N. (1959) in Electrophoresis (Bier, M., Ed.) pp 317–367, Academic Press, New York.
170. Duke, J.A., Bier, M., & Nord, F.F. (1952) Arch. Biochem. Biophys. 40, 424–436.
171. Borza, D.B., Tatum, F.M., & Morgan, W.T. (1996) Biochemistry 35, 1925–1934.
172. Carbeck, J.D., & Negin, R.S. (2001) J. Am. Chem. Soc. 123, 1252–1253.
173. Pauling, L., & Itano, H.A. (1949) Science 110, 543–548.
174. Velick, S.F. (1949) J. Phys. Colloid Chem. 53, 135–149.
175. Longsworth, L.G. (1959) in Electrophoresis (Bier, M., Ed.) pp 137–177, Academic Press, New York.
176. Tiselius, A. (1937) Trans. Faraday Soc. 33, 524–531.
177. Morris, C.J.O.R. (1966) in Protides of the Biological Fluids (Peeters, H., Ed.) Vol. 14, pp 543–561, Elsevier, Amsterdam.
178. Ferguson, K.A. (1964) Metabolism 13, 985–1002.
179. Philippov, P.P., Shestakova, I.K., Tikhomirova, N.K., & Kochetov, G.A. (1980) Biochim. Biophys. Acta 613, 359–369.
180. Stahl, P.D., & Touster, O. (1971) J. Biol. Chem. 246, 5398–5406.
181. Hames, D. (1990) in Gel Electrophoresis of Proteins (Hames, B.D., & Rickwood, D., Eds.) pp 1–147, Oxford University Press, Oxford, U.K.
182. Ornstein, L. (1964) Ann. N.Y. Acad. Sci. 121, 321–349.
183. Davis, B.J. (1964) Ann. N.Y. Acad. Sci. 121, 404–427.
184. Kohlrausch, F. (1897) Ann. Phys. Chem. 62, 209–239.
185. Laemmli, U.K. (1970) Nature 227, 680–685.
186. Kyte, J., & Rodriguez, H. (1983) Anal. Biochem. 133, 515–522.
187. Jovin, T.M. (1973) Biochemistry 12, 871–879.
188. Chrambach, A., Jovin, T.M., Svendsen, P.J., & Rodbard, D. (1976) in Methods of Protein Separation (Catsimpoolas, N., Ed.) Vol. 2, pp 27–144, Plenum Press, New York.
189. Cohn, E.J., Green, A.A., & Blanchard, M.H. (1937) J. Am. Chem. Soc. 59, 509–517.
190. Ruch, F.E., & Vagelos, P.R. (1973) J. Biol. Chem. 248, 8086–8094.
191. Katze, J.R., & Konigsberg, W. (1970) J. Biol. Chem. 245, 923–930.
192. Henderson, G.B., & Snell, E.E. (1973) J. Biol. Chem. 248, 1906–1911.
193. Zampighi, G., Kyte, J., & Freytag, W. (1984) J. Cell Biol. 98, 1851–1864.
194. Oakley, B.R., Kirsch, D.R., & Morris, N.R. (1980) Anal. Biochem. 105, 361–363.
195. Switzer, R.C.R., Merril, C.R., & Shifrin, S. (1979) Anal. Biochem. 98, 231–237.
196. Kolhouse, J.F., Utley, C., & Allen, R.H. (1980) J. Biol. Chem. 255, 2708–2712.
197. Karawya, E., Swack, J.A., & Wilson, S.H. (1983) Anal. Biochem. 135, 318–325.
198. Schachter, H., Sarney, J., McGuire, E.J., & Roseman, S. (1969) J. Biol. Chem. 244, 4785–4792.
199. Li, J.J., Ross, C.R., Tepperman, H.M., & Tepperman, J. (1975) J. Biol. Chem. 250, 141–148.
200. Morley, C.G., & Stadtman, T.C. (1970) Biochemistry 9, 4890–4900.
201. Yu, C., Gunsalus, I.C., Katagiri, M., Suhara, K., & Takemori, S. (1974) J. Biol. Chem. 249, 94–101.
202. Takahashi, S., Kuzuyama, T., Watanabe, H., & Seto, H. (1998) Proc. Natl. Acad. Sci. U.S.A. 95, 9879–9884.
203. Olsen, A.S., & Milman, G. (1974) J. Biol. Chem. 249, 4030–4037.
204. Scott, W.A., & Tatum, E.L. (1971) J. Biol. Chem. 246, 6347–6352.
205. Warnick, G.R., & Burnham, B.F. (1971) J. Biol. Chem. 246, 6880–6885.
206. Fernandez-Sorensen, A., & Carlson, D.M. (1971) J. Biol. Chem. 246, 3485–3493.
207. Reed, B.C., & Rilling, H.C. (1975) Biochemistry 14, 50–54.
208. Beytia, E., Dorsey, J.K., Marr, J., Cleland, W.W., & Porter, J.W. (1970) J. Biol. Chem. 245, 5450–5458.
209. Fenton, W.A., Hack, A.M., Willard, H.F., Gertler, A., & Rosenberg, L.E. (1982) Arch. Biochem. Biophys. 214, 815–823.
210. Arnold, W.J., & Kelley, W.N. (1971) J. Biol. Chem. 246, 7398–7404.
211. Norton, I.L., Pfuderer, P., Stringer, C.D., & Hartman, F.C. (1970) Biochemistry 9, 4952–4958.
212. Mihalik, S.J., McGuinness, M., & Watkins, P.A. (1991) J. Biol. Chem. 266, 4822–4830.
213. Ohshita, T., Sakuda, H., Nakasone, S., & Iwamasa, T. (1989) Eur. J. Biochem. 179, 201–207.
214. O’Farrell, P.H. (1975) J. Biol. Chem. 250, 4007–4021.
215. Traniello, S., Melloni, E., Pontremoli, S., Sia, C.L., & Horecker, R.L. (1972) Arch. Biochem. Biophys. 149, 222–231.
216. Kennedy, S.C., Rauner, R., & Gawron, O. (1972) Biochem. Biophys. Res. Commun. 47, 740–745.
217. Gennis, L.S. (1976) Proc. Natl. Acad. Sci. U.S.A. 73, 3928–3932.
218. Kirschner, K., Eigen, M., Bittman, R., & Voigt, B. (1966) Proc. Natl. Acad. Sci. U.S.A. 56, 1661–1667.
219. Sloan, D.L., & Velick, S.F. (1973) J. Biol. Chem. 248, 5419–5423.
220. Mockrin, S.C., Byers, L.D., & Koshland, D.E., Jr. (1975) Biochemistry 14, 5428–5437.
221. Weber, K., Pringle, J.R., & Osborn, M. (1972) Methods Enzymol. 26C, 3–27.
222. Lorand, L. (1970) Methods in Enzymology, Vol. 19, Academic Press, New York.
223. Lorand, L. (1976) Methods in Enzymology, Vol. 45, Academic Press, New York.
224. Lorand, L. (1981) Methods in Enzymology, Vol. 80, Academic Press, New York.
225. North, M.J. (1989) in Proteolytic Enzymes: A Practical Approach (Benyon, R.J., & Bond, J.S., Eds.) pp 105–124, IRL Press, New York.
226. Mackall, J.C., Lane, M.D., Leonard, K.R., Pendergast, M., & Kleinschmidt, A.K. (1978) J. Mol. Biol. 123, 595–606.
227. Grant, G.A., Keefer, L.M., & Bradshaw, R.A. (1978) J. Biol. Chem. 253, 2724–2726.
228. Suzuki, H., Li, S.C., & Li, Y.T. (1970) J. Biol. Chem. 245, 781–786.
229. Dixon, M., & Webb, E.C. (1964) Enzymes, pp 794–808, Longmans, Green & Co., London.
230. Cannata, J.J. (1970) J. Biol. Chem. 245, 792–798.
231. Shiokawa, H., & Noda, L. (1970) J. Biol. Chem. 245, 669–673.
232. Iwai, K., & Taguchi, H. (1974) Biochem. Biophys. Res. Commun. 56, 884–891.
233. Monteilhet, C., & Blow, D.M. (1978) J. Mol. Biol. 122, 407–417.
234. McPherson, A. (1990) Eur. J. Biochem. 189, 1–23.
235. Pjura, P.E., Lenhoff, A.M., Leonard, S.A., & Gittis, A.G. (2000) J. Mol. Biol. 300, 235–239.
236. Nunn, R.S., Artymiuk, P.J., Baker, P.J., Rice, D.W., & Hunter, C.N. (1995) J. Mol. Biol. 252, 153.
Chapter 2
Electronic Structure
When proteins are submitted to chemical analysis, they are found to be composed of 20 amino acids: aspartic acid, asparagine, threonine, serine, glutamine, glutamic acid, proline, glycine, alanine, cysteine, valine, methionine, isoleucine, leucine, tyrosine, phenylalanine, lysine, histidine, tryptophan, and arginine. Each protein has different relative amounts of each of these amino acids. The amino acids a protein contains are coupled together in a particular order to create polymers 50–5000 amino acids in length, referred to as polypeptides. To understand the structure of molecules of protein, one must understand the amino acids, the order in which they are connected, and the way that these long polymers are folded up to produce the native conformation of the molecule. The first level of understanding is grounded in a firm knowledge of the bonding and molecular structure of small molecules. The second level of understanding requires a description of the complete covalent structure of the polymers composing proteins. The third level of understanding proceeds from crystallographic molecular models of proteins that are the products of X-ray crystallography.
It is remarkable that each molecule of a particular protein, if it has not been heterogeneously posttranslationally modified, has the same covalent structure and that when it is in its natural environment, the polypeptides from which it is composed assume the same few conformations even though the complete molecule of the protein is large. These two properties are foreign to a synthetic chemist. Molecules produced synthetically are either precise but small or large but heterogeneous. Large heterogeneous polymers produced synthetically seldom have defined structures. Yet a molecule of protein is made from atoms held together by the same covalent chemical bonds holding together the smaller molecules to which one is already accustomed. All of the rules of bonding exerted with such inescapability in small molecules are as inescapable in a molecule of protein.
The covalent bonds holding the atoms together in any molecule are pairs of electrons confined to molecular orbitals. The molecular orbitals are either localized σ molecular orbitals or delocalized π molecular orbitals. A distinction between these two types of molecular orbitals is crucial to an understanding of bond lengths, bond angles, and rotational motions about bonds.
In addition to the covalent bonds, molecules of protein are filled with lone pairs of electrons. Because σ lone pairs of electrons are the only valence electrons that do not participate in covalent bonds and because there are also lone pairs of electrons participating in π molecular orbitals, to understand the details of molecular structure one must be able to distinguish localized σ lone pairs of electrons from delocalized π lone pairs of electrons. The distinction between these two types of electrons is reflected in their basicity, their ability to house a proton.
Each lone pair of electrons in a molecule is a potential base, and each hydrogen in a molecule is a potential acid. Which lone pair will act as a base is determined by the acid dissociation constant for its conjugate acid, and which hydrogen will act as an acid is determined by its own acid dissociation constant. Every lone pair is basic and every hydrogen is acidic, but most lone pairs are such weak bases and most hydrogens are such weak acids that their basicity or acidity can be ignored. To understand the atomic structure of a molecule of protein, the significant acids and bases within it must be identified and categorized. It is also necessary to distinguish an acid dissociation, in which a proton leaves the molecule, from a tautomerization, in which protons redistribute among lone pairs of electrons within the molecule.
The chemical capacities available to a protein are a reflection of the amino acids from which it is constructed. Each of the 20 amino acids has its own peculiar set of chemical capacities. These are mixed in a unique way by the amino acid sequence and the resulting native structure to produce those of the particular protein, but to understand the mixture, the properties of the ingredients must be understood. These properties include the bonding and acid–base behavior of each of the 20 side chains of the amino acids. With the exception of the regular polyamide backbone of the polymer, the covalent bonds, acidic hydrogens, and basic lone pairs of electrons that fill a molecule of protein are contributed by these side chains.
π and σ
Molecules, including proteins, are arrays of atomic nuclei required to maintain particular distances and angular dispositions relative to each other by electrons confined to particular regions of space known as
to one nucleus or distributed between or among particular nuclei. The electrons, in their occupation of these orbitals, create the covalent structure of the molecule. The electrons present in a molecule can be divided into three categories, core electrons, π electrons, and σ electrons, that reflect the degree to which they are confined and that define their chemical reactivity.
Core electrons are the electrons that are immediately adjacent to a nucleus. Aside from hydrogen, almost all of the atoms present in molecules of protein are either carbon, oxygen, or nitrogen. Each of these atoms has two core electrons spherically confined about the nucleus. Occasionally, sulfur or phosphorus occurs in a protein, and these atoms each have 10 core electrons. Because they are confined close to the nucleus, the core electrons provide the greatest electron density and are the prominent features in a map of electron density. They are, however, chemically inert.
Valence electrons are the outermost electrons surrounding each atom. All of the chemistry of a molecule, which is the consequence of its chemical bonds and its sites of reactivity, results from these valence electrons. Unless one electron is missing, as in the case of a radical, or two electrons are momentarily missing, as in a carbocation, every carbon, nitrogen, oxygen, sulfur, or phosphorus in a molecule of protein can be formally associated with eight valence electrons. By convention, these octets are assigned by Lewis structures. This formalism divides valence electrons into bonding electrons and lone pairs of electrons and assigns formal charge to certain atoms. An example would be the Lewis structure of the model compound for glutamic acid in a polypeptide, N-acetylglutamate α-amide (Figure 2–1A). The intent of a Lewis structure is to count valence electrons.
Figure 2–1: Two ways of representing the electronic structure of N-acetylglutamate α-amide. (A) In the Lewis dot formula, each main atom is surrounded by an octet of electrons and the total number of electrons represented equals the sum of the number of valence electrons contributed by each neutral atom plus the elementary molecular charge. The negative sign surrounded by a circle locates formal charge. (B) In a σ–π stereochemical representation distinguishing types of electrons, a σ bond is designated by a line, a localized σ lone pair of electrons is designated by two dots surrounded by a circle, a π bond is indicated by a second or third line between two atoms, and a π lone pair of electrons is shown by two uncircled dots. The atoms are arranged in space to represent the tetrahedral or trigonal geometry dictated by their respective hybridizations.
A pair of bonding electrons occupies a bonding molecular orbital that is formed from the overlap of two or more atomic orbitals, each contributed by a different atom in the molecule. These bonding electrons must be clearly distinguished as occupants of either π molecular orbitals, forming π bonds, or σ molecular orbitals, forming σ bonds.
The overlap of two or more adjacent and parallel p atomic orbitals on two or more adjacent atoms in a molecule creates a system of π molecular orbitals. Two adjacent p atomic orbitals can overlap only above and below the line of centers between the two atoms from which they are contributed (Figure 2–2). This geometry has two consequences: it prevents rotation about axes connecting the nuclei of adjacent atoms and it permits a series of overlaps to occur simultaneously. Because rotation is prevented, structures containing a system of π molecular orbitals are rigid. The fact that a series of overlaps can occur permits the electrons occupying a system of π molecular orbitals to be delocalized.
Delocalization of a pair of electrons occupying one π molecular orbital in such a system results from the fact that each π molecular orbital is a linear combination of the p orbitals that overlap. Each π molecular orbital is spread over and shared by every atom that contributed a p orbital to the system unless a node is located at that atom. When a pair of electrons occupies a π molecular orbital, it cannot be assigned to a particular atom, notwithstanding the formal requirement of the Lewis dot structure that it be so localized for the purposes of bookkeeping. Confusion between actuality and accounting sometimes leaves the impression that π electrons are localized.
An example of a combination of p atomic orbitals is the system of π molecular orbitals that forms when four parallel p orbitals mix (Figure 2–2). The number of π molecular orbitals that result from any combination of this type is always equal to the number of p orbitals that have mixed; in this case there are four π molecular orbitals in the system. Each p orbital can be mixed in one of two phases, and adjacent p orbitals can be either in phase, in which case they overlap (a favorable interaction), or out of phase, in which case a node (an unfavorable interaction) occurs between them. A node is a position at which the phase inverts. In a linear system such as the one shown in Figure 2–2, the number of nodes increases by one for each molecular orbital in the series.
Each of these four π molecular orbitals in Figure 2–2 has an energy level associated with it that is equal to the energy one electron would experience were it confined
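
The counting of orbitals and nodes described for four parallel p orbitals can be checked with a short numerical sketch. It assumes the simple Hückel model for a linear chain of identical p orbitals, a model the text itself does not name, and it uses that model's closed-form coefficients. The names `huckel_chain` and `count_nodes` are illustrative only.

```python
import math


def huckel_chain(n):
    """Return the n pi molecular orbitals of a linear chain of n identical p orbitals.

    Assumption: simple Hueckel model, whose closed-form coefficients are
    c_j = sqrt(2 / (n + 1)) * sin(j * k * pi / (n + 1)).
    Orbitals are ordered from lowest energy (k = 1) to highest (k = n).
    """
    norm = math.sqrt(2.0 / (n + 1))
    return [
        [norm * math.sin(j * k * math.pi / (n + 1)) for j in range(1, n + 1)]
        for k in range(1, n + 1)
    ]


def count_nodes(coefficients):
    """Count the phase inversions between adjacent p orbitals."""
    return sum(1 for a, b in zip(coefficients, coefficients[1:]) if a * b < 0)


orbitals = huckel_chain(4)          # four parallel p orbitals, as in the text
nodes = [count_nodes(mo) for mo in orbitals]
print(len(orbitals), nodes)
```

Under these assumptions the sketch gives four π molecular orbitals, and the number of nodes rises by one at each step, from none in the lowest orbital to three in the highest.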
two resonance structures state that each of the three atoms, the oxygen, the carbon, and the nitrogen, contributes a p orbital to the system of π molecular orbitals because their bonding changes between the two Lewis structures of the resonance pair. The resonance structures state that the system of π molecular orbitals contains four π electrons because two of the pairs of electrons shift between the two structures. When three adjacent p orbitals are mixed, three π molecular orbitals are created (Figure 2–3). That four electrons occupy these three molecular orbitals places one pair in each of the two molecular orbitals of lowest energy. If coulomb effects are disregarded for the moment, the two electrons in the lowest bonding level would have half of their density distributed over carbon, one-quarter over oxygen, and one-quarter over nitrogen. The two electrons in the middle nonbonding level would have half of their density distributed over nitrogen and half over oxygen.
If the four π electrons were removed from the three atoms of the amide, the carbon and the oxygen would each have formal charges of +1 but the nitrogen would have a formal charge of +2, making it electron-deficient relative to the other two. If two pairs of π electrons occupy the two undistorted π molecular orbitals of lowest energy, oxygen would end up with a formal charge of –½; carbon, 0; and nitrogen, +½. This is the distribution of charge designated by the two resonance structures. Usually the resonance structures provide information about the distribution of electrons in the highest occupied molecular orbital or the distribution of electron deficiency in the lowest unoccupied molecular orbital. In the case of the amide, the resonance structures indicate that the pair of electrons in the highest
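
The charge bookkeeping for the amide can be reproduced with a few lines of arithmetic. The sketch below assumes, as the text does for the moment, that coulomb effects are disregarded, so the p orbitals of oxygen, carbon, and nitrogen are treated as equivalent in a simple Hückel model. That model is an assumption of this sketch, not a method named by the author, and `pi_density` is a hypothetical helper.

```python
import math


def pi_density(n, k):
    """Fraction of one electron on each atom in the k-th pi orbital (k = 1 is lowest).

    Assumption: simple Hueckel model for n equivalent p orbitals in a row,
    with squared coefficients (2 / (n + 1)) * sin(j * k * pi / (n + 1)) ** 2.
    """
    return [
        2.0 / (n + 1) * math.sin(j * k * math.pi / (n + 1)) ** 2
        for j in range(1, n + 1)
    ]


atoms = ["O", "C", "N"]           # carbon is the central atom of the amide
charge_without_pi = [1, 1, 2]     # formal charges after removing the four pi electrons (from the text)

bonding = pi_density(3, 1)        # lowest, bonding level
nonbonding = pi_density(3, 2)     # middle, nonbonding level

# Each of the two occupied levels holds one pair of electrons.
charges = {
    atom: core - 2 * (bonding[i] + nonbonding[i])
    for i, (atom, core) in enumerate(zip(atoms, charge_without_pi))
}
for atom, q in charges.items():
    print(atom, round(q, 3))
```

With the formal charges of +1, +1, and +2 stated for the atoms stripped of their π electrons, the result is approximately –½ on oxygen, 0 on carbon, and +½ on nitrogen.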
and each view, whether molecular orbitals or resonance structures, has its appropriate use.

The first decision that must be made about the electronic structure of any molecule is the location of all systems of π molecular orbitals. Any carbon, nitrogen, or oxygen that has contributed a 2p atomic orbital to a system of π molecular orbitals has only two other 2p atomic orbitals remaining to hybridize with its lone 2s atomic orbital, but any carbon, nitrogen, or oxygen that is not involved in a system of π molecular orbitals has three 2p atomic orbitals to hybridize with its 2s atomic orbital. It is these hybrids between s atomic orbitals and p atomic orbitals that overlap to form σ bonds. These σ bonds lie along the line of centers between the two respective atoms that they connect, and they are localized. Because they are localized, they are usually stronger covalent bonds than π bonds, and as a result every pair of atoms joined by one or more than one covalent bond must be joined by one σ bond. These σ bonds form the molecular skeleton defining the structure of the molecule, in particular its bond angles. This skeleton is the σ structure of the molecule. Each σ bond is also an occupied molecular orbital, but this realization is not informative in issues of molecular structure. In the particular instance of molecules in biological situations, when an atom has contributed one p orbital to a system of π molecular orbitals, it will almost always be hybridized [p, sp², sp², sp²]. At that atom in the σ structure, the molecule will be planar, and the σ covalent bonds and σ lone pairs will radiate within that plane in three directions from the atom at approximately 120° angles. When an atom has not contributed a p orbital to a system of π molecular orbitals, it will almost always be hybridized [sp³, sp³, sp³, sp³]. At that atom in the σ structure, σ covalent bonds or σ lone pairs will radiate in four directions tetrahedrally, at angles of approximately 109.5°.

Because the σ structure incorporates these bond orders and bond angles, it dictates the details of molecular structure. These details cannot be appreciated until decisions on hybridization can be made correctly. To pursue an earlier example, the oxygen, carbon, and nitrogen of an amide are each contributing a p orbital to the system of π molecular orbitals, and each is hybridized [p, sp², sp², sp²]. In the σ structure each of these three atoms and all of the σ bonds and σ lone pairs of electrons radiating from them are in a plane, and each bond angle is approximately 120° (Figure 2–3).

Lone pairs of electrons are identified by writing a Lewis structure of the molecule. Thereafter, it is conventional to ignore them, on the assumption that everyone knows that they are there. This assumption is somewhat vain; it seems to say that if you do not always realize that they are there, you are not someone. Because lone pairs of electrons are of paramount importance in biochemistry and because an understanding of a biologically important molecule is incomplete if ever they are forgotten, it is safer to include them explicitly in any structure drawn, especially if they contribute to what is being discussed.

Because of electron repulsion, a lone pair of electrons on any oxygen or nitrogen unconjugated to a system of π molecular orbitals will occupy one of the sp³ orbitals of that atom. This σ lone pair of electrons resides at one of the tetrahedral vertices of the atom. A σ lone pair of electrons is a lone pair confined to a single atom because it resides within a hybridized or unhybridized atomic orbital that does not overlap with any other atomic orbital from another atom. A σ lone pair of electrons is designated in a σ–π stereochemical representation by enclosing it within a circle (Figures 2–1B and 2–3) to symbolize its confinement. A σ–π stereochemical representation (Figure 2–1B) is a drawing of the molecule that indicates the bond angles and angles of σ lone pairs and distinguishes σ lone pairs of electrons from π lone pairs of electrons.

If an oxygen or nitrogen containing a lone pair of electrons is sterically able to rotate until that lone pair is parallel to an immediately adjacent system of π molecular orbitals and sterically able to rehybridize to sp² at its three remaining bonded positions, the lone pair of electrons is capable of entering the system of π molecular orbitals. For the lone pair of electrons to do this, the atom carrying it must rehybridize. This rehybridization requires sufficient energy to overcome the electron repulsion that originally placed the lone pair in an sp³ orbital. The favorable energy resulting from the delocalization of the lone pair into the system of π molecular orbitals must exceed this deficit. If it does, the lone pair of electrons becomes a delocalized π lone pair of electrons, occupying a π molecular orbital spread over two or more atoms. It is so designated in a drawing by not enclosing it within a circle (Figures 2–1B and 2–3) to indicate its unconfinement.

When either oxygen or nitrogen has contributed only one of its p orbitals to a system of π molecular orbitals and is left with three valence orbitals, one 2s orbital and two 2p orbitals, it is usually assumed that they mix to form three sp² orbitals that lie together within a plane normal to the system of π molecular orbitals and are arrayed at 120° angles. If there are two or three covalent σ bonds to the heteroatom, the hybridization is usually [p, sp², sp², sp²] because sp² orbitals provide maximum overlap in a σ bond. Thus a single lone pair left on a nitrogen that has contributed only one p orbital and one of its valence electrons to a system of π molecular orbitals and also participates in two σ bonds is always a σ lone pair in an sp² orbital, and it is designated as such by surrounding it with a circle. An example of such a lone pair is the lone pair on a nitrogen in an imine:

[structure 2–1: imine]
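The hybridization rule described in this passage can be written as a small lookup. The following is a minimal Python sketch and not part of the original text. The names `SigmaGeometry` and `sigma_geometry` are hypothetical, and the function encodes only the "almost always" cases given above; it deliberately ignores exceptions such as the [p, p, sp, sp] alternative that is considered for carbonyl oxygens.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SigmaGeometry:
    """Hypothetical container for the sigma-structure description of one atom."""
    hybridization: Tuple[str, ...]
    approx_angle_deg: float
    arrangement: str


def sigma_geometry(contributes_p_orbital: bool) -> SigmaGeometry:
    """Apply the rule of thumb from the text for C, N, or O.

    An atom that has given one p orbital to a pi system is treated as
    [p, sp2, sp2, sp2]; an atom that has not is treated as four sp3 hybrids.
    """
    if contributes_p_orbital:
        return SigmaGeometry(("p", "sp2", "sp2", "sp2"), 120.0, "trigonal planar")
    return SigmaGeometry(("sp3", "sp3", "sp3", "sp3"), 109.5, "tetrahedral")


# The amide example: O, C, and N each contribute a p orbital to the pi system.
amide = {atom: sigma_geometry(True) for atom in ("O", "C", "N")}
for atom, geometry in amide.items():
    print(atom, geometry.hybridization, geometry.approx_angle_deg)
```

The sketch only restates the decision; the chemical judgment of whether an atom actually joins a π system (steric freedom to rotate and rehybridize, and a net gain from delocalization) still has to be made by the reader.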
60 Electronic Structure
The situation becomes ambiguous, however, in the case of an oxygen that has contributed a p orbital and one valence electron to a system of π molecular orbitals, participates in one σ bond, and remains with two lone pairs of electrons. An example of such an oxygen would be an acyl oxygen or the oxygen of a carbonyl (Figure 2–4). The possibility arises that such an oxygen is hybridized [p, p, sp, sp]. In this case, one lone pair would occupy an sp orbital in line but opposite to the σ bond between the carbon and the oxygen, and the other lone pair would occupy a p orbital normal to both the π bond and the axis of the two sp orbitals (Figure 2–4A). Indeed, there is evidence from ultraviolet spectra and mass spectra of isolated carbonyl compounds that this occurs. The alternative possibility is that oxygen is hybridized [p, sp², sp², sp²] and that both lone pairs are in sp² orbitals (Figure 2–4B). The decision between these two alternatives is not an insignificant one, for oxygens that have contributed one p orbital and one valence electron to a system of π molecular orbitals and participate in only one σ bond are by far the majority of the oxygen atoms in a molecule of protein. In a hydrogen-bonding environment, such as the water in which all biochemistry occurs, it appears that these oxygens place their two lone pairs in two sp² orbitals. This follows from the fact that, in crystallographic molecular models of small molecules in which an N–H forms a hydrogen bond with such a carbonyl or acyl oxygen, the nitrogen–hydrogen σ bond of the N–H usually points to the location where an sp² lone pair of electrons would be located.⁵ On the basis of this observation, it will be assumed that acyl oxygens are hybridized [p, sp², sp², sp²], and their two lone pairs will both be designated as sp² by enclosing them in circles at 120° angles to the carbon–oxygen bond (Figure 2–1B).⁶ These are σ lone pairs of electrons; they lie within a plane shared with the carbon–oxygen σ bond and normal to the plane of the carbon–oxygen π bond (Figure 2–4B).

Figure 2–4: Two alternative hybridizations for an oxygen in a carbonyl. (A) One lone pair is in an sp orbital collinear with the carbon–oxygen bond, and the other is in the p orbital orthogonal to the double bond. (B) Both lone pairs are in sp² orbitals in the σ plane.

The σ structure of a molecule is the basic skeleton producing the σ bonds, the bond angles, and the fixed positions of the localized σ lone pairs. The π electrons are spread over this skeleton above and below the atoms contributing the p orbitals. Therefore, neither the bond angles of the molecule, which are defined by hybridization, nor the positions of σ lone pairs, which are localized, can be affected by resonance. It necessarily follows that when one draws two or more resonance structures, one must make certain that the same σ structure is present in each resonance structure and that only the disposition of π electrons differs among them. The best way to ensure this is to draw a σ structure for the second resonance structure identical to the σ structure of the first resonance structure before putting in the π electrons, and to draw an identical σ structure for each of the successive resonance structures before putting in the π electrons. Always include all σ lone pairs oriented as the hybridization of each atom requires. After the set of valid resonance structures has been exhausted, look closely at any lone pair that did not participate and decide if it might not be a σ lone pair. If it is not completing an aromatic complement or being withdrawn by an adjacent π bond, it is probably a σ lone pair in a σ orbital confined to only the one atom.

When the atoms contributing the p orbitals to a system of π molecular orbitals form an unbroken ring rather than being branched or linearly arrayed, the possibility of aromaticity arises. In a continuous ring of p orbitals of any size, the energy levels of the individual π molecular orbitals are arrayed in a peculiar pattern. The π molecular orbital with the lowest energy is always the completely overlapping ring of p atomic orbitals in phase with no nodes other than the one at the nuclear plane. This π molecular orbital is occupied by two electrons. If coulomb effects were disregarded, the other bonding π molecular orbitals in the ring would always come in pairs that have identical energies. Because of Hund’s rule, no such pair of orbitals can be filled with electrons to form a stable closed shell until four electrons have been provided simultaneously. These two features, the one continuous ring occupied by a pair of π electrons and the pairs of orbitals of higher energy occupied by quartets of π electrons, define an aromatic system of π molecular orbitals. An aromatic π molecular orbital system is an unbroken ring of parallel p orbitals occupied by 2, 6, 10, 14, or 18 π electrons.

From these rules it is clear that a phenyl ring is aromatic, but it is the aromatic nitrogen heterocycles such as pyridine and pyrrole

[structures: pyridine (2–2) and pyrrole (2–3)]

that are more interesting examples. Pyridine is a neutral, six-membered ring with one nitrogen. Each carbon contributes one valence electron to the π system, so nitrogen can contribute only one to complete the sextet of the aromatic system. This leaves a neutral nitrogen with two remaining valence electrons that end up as a lone pair in the σ structure confined to an sp² orbital. Pyrrole, however, is a five-membered ring. Each carbon again contributes one valence electron to the system of π molecular orbitals, and nitrogen provides the two required to complete the sextet required for the aromatic system. Nitrogen is left with one valence electron and forms a covalent N–H bond to finish the neutral molecule. Pyridinyl and pyrrolyl nitrogens appear throughout aromatic heterocycles. A nitrogen can be identified as one or the other by whether one or two of its valence electrons are used to complete the aromatic system of 6, 10, 14, or 18 π electrons.

An interesting heterocycle that serves as an example of the application of these considerations is porphine:

[structure: porphine]

The chemical properties of σ and π lone pairs of electrons are remarkably different. This difference is most clearly expressed in their behavior as bases, and it is the basicity of a lone pair of electrons that, in questionable cases, indicates whether it is a σ or π lone pair of electrons. When the basicity of the lone pair is relied upon as a criterion, a proton is being used to probe its availability. Lone pairs of electrons in π systems are far less basic than those in σ orbitals because σ lone pairs of electrons are localized and directionally oriented by the atomic orbital in which they are confined, whereas π lone pairs of electrons are delocalized and immersed within the system of π molecular orbitals.

Problem 2–1: Draw σ–π stereochemical structures as in Figure 2–1B for the N-acetyl α-amides of aspartate, asparagine, glutamine, proline, methionine, tyrosine, tryptophan, phenylalanine, histidine, and arginine.
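The electron counting used for pyridine and pyrrole in the discussion of aromaticity can be checked mechanically. The sketch below is an illustration only; the function names are hypothetical, and the generalization of the listed counts (2, 6, 10, 14, 18) to 4n + 2, commonly called Hückel's rule, is added here rather than stated in the text.

```python
from typing import Sequence


def is_aromatic_count(n_pi_electrons: int) -> bool:
    """True when the count is 2, 6, 10, 14, 18, ... (4n + 2 with n >= 0)."""
    return n_pi_electrons >= 2 and (n_pi_electrons - 2) % 4 == 0


def ring_pi_electrons(contributions: Sequence[int]) -> int:
    """Sum the pi electrons donated by each atom of an unbroken ring."""
    return sum(contributions)


def classify_ring_nitrogen(electrons_from_nitrogen: int) -> str:
    """Name the nitrogen by how many of its electrons enter the pi system."""
    if electrons_from_nitrogen == 1:
        return "pyridinyl"  # its lone pair stays in an sp2 orbital (sigma lone pair)
    if electrons_from_nitrogen == 2:
        return "pyrrolyl"   # its lone pair is part of the aromatic pi system
    raise ValueError("a ring nitrogen contributes one or two pi electrons")


# Pyridine: five carbons and one nitrogen, one pi electron each.
pyridine = [1, 1, 1, 1, 1, 1]
# Pyrrole: four carbons with one pi electron each, nitrogen with two.
pyrrole = [1, 1, 1, 1, 2]

for name, ring in (("pyridine", pyridine), ("pyrrole", pyrrole)):
    total = ring_pi_electrons(ring)
    print(name, total, is_aromatic_count(total), classify_ring_nitrogen(ring[-1]))
```

Both rings reach the sextet, but by different routes, which is why the nitrogen lone pair is a σ lone pair in pyridine and a π lone pair in pyrrole.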
Acids and Bases

The quantitative measure of the basicity of a lone pair of electrons is the microscopic acid dissociation constant of its conjugate acid. In this way all lone pairs of electrons are related to the lone pair on a molecule of water. The reaction that defines a microscopic acid dissociation for a particular proton in a molecule is

$$\mathrm{XH} + \mathrm{H_2O} \rightleftharpoons \mathrm{X^-} + \mathrm{H_3O^+} \tag{2--1}$$

The central atom in a microscopic acid dissociation is the atom directly bonded to the proton that dissociates.* The lone pair on the resulting conjugate base is usually localized on the central atom (as represented in Reaction 2–1) when it is oxygen, nitrogen, or sulfur but usually delocalized when it is carbon. In a microscopic acid dissociation, the acid is a position within the molecule from which a proton can dissociate to produce a lone pair of electrons, and the base is a lone pair of electrons with which a proton can associate. Because the reaction occurs in aqueous solution, a bare proton is transferred between the lone pair of the base and a lone pair on a molecule of water and back again. Every acid is always present in solution with a finite concentration of its conjugate base, and every base is always present in solution with a finite concentration of its conjugate acid. The equilibrium constant for Reaction 2–1 is

$$K_{\mathrm{eq}} = \frac{[\mathrm{H_3O^+}][\mathrm{X^-}]}{[\mathrm{H_2O}][\mathrm{HX}]} \tag{2--2}$$

Because [H2O] = 55 M at all times, this term is passed to the left, and for convenience [H3O+] is written as [H+].** These substitutions produce the definition of the microscopic acid dissociation constant:

$$K_{\mathrm{a}} \equiv \frac{[\mathrm{H^+}][\mathrm{X^-}]}{[\mathrm{HX}]} \tag{2--3}$$

A microscopic acid dissociation constant is the acid dissociation constant of a particular proton in a polyprotic acid. An acid dissociation constant is usually presented as a pKa, where pKa ≡ –log Ka, solely for convenience. A theoretical justification of this practice is that the pKa is directly proportional to the change in free energy for Reaction 2–1. The larger the pKa, the less likely is Reaction 2–1 in the direction written and the more likely is it in the opposite direction. Because water is the same in all acid dissociations, the difference in pKa between two acids is proportional to the free energy for transferring a proton from the one acid to the conjugate base of the other. The smaller the pKa, the more acidic is the acid and the less basic, or less available, is the lone pair of electrons on its conjugate base, and vice versa. There are several properties of the position from which the proton dissociates that affect the value of its microscopic pKa.

The atomic number of the central atom from which the proton dissociates and on which the lone pair remains has a profound effect (Table 2–1). Within the same period of the periodic table, as electronegativity increases to the right, for example, carbon, nitrogen, oxygen, the atom is more capable of supporting the lone pair, and the acidity increases. Atoms in lower periods hold a lone pair of electrons in a larger atomic orbital, making it easier to support. For example, a proton on sulfur is more acidic than one on oxygen. Because a localized σ lone pair of electrons on carbon is such a strong base, the only time that there is a lone pair associated with carbon in biochemical situations is when it is a delocalized π lone pair of electrons. Because nitrogen and oxygen are more electronegative elements than carbon, delocalized π lone pairs of electrons associated with these elements are rarely bases in biochemical situations, and bases on these atoms are almost always localized σ lone pairs of electrons.

The successive creation of negative elementary charge on the same polyprotic acid causes each dissoci-

Table 2–1: Electronic Properties Affecting Values of the Acid Dissociation Constant

effect of identity of central atom on acidity¹¹
    CH4         <   NH3         <   OH2          <   SH2
    pKa = 48        pKa = 38        pKa = 15.7       pKa = 7.0

effect of creation of charge on acidity¹²
    PO4H3       >   PO4H2⁻      >   PO4H²⁻
    pKa = 2.1       pKa = 7.2       pKa = 12.7

    ⁺H3NCH2CH2NH2   <   ⁺H3NCH2CH2NH3⁺
    pKa = 9.98          pKa = 7.52

effect of hybridization of the central atom on acidity¹¹⁻¹³
    HC≡CH       >   H2C=CH2     >   H3CCH3
    pKa = 25        pKa = 44        pKa = 50

    HC≡NH⁺      >   pyridine    >   H3CNH3⁺
    pKa = –10       pKa = 5.2       pKa = 10.6

    CH3HC=OH⁺   >   CH3CH2OH2⁺
    pKa = –6        pKa = –2
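The proportionality between pKa and free energy mentioned in this section can be made concrete. The sketch below assumes the standard relation ΔG° = –RT ln Ka = (ln 10)RT·pKa with Ka defined as in Equation 2–3, and a temperature of 298.15 K; the temperature, the gas constant value, and the function names are assumptions added here, not values taken from the text.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1 (assumed value of R)


def dissociation_free_energy(pka: float, temperature: float = 298.15) -> float:
    """Standard free energy change (J/mol) for an acid dissociation with this pKa."""
    return math.log(10.0) * GAS_CONSTANT * temperature * pka


def proton_transfer_free_energy(pka_donor: float,
                                pka_acceptor_conjugate_acid: float,
                                temperature: float = 298.15) -> float:
    """Free energy (J/mol) to move a proton from one acid to the conjugate
    base of another. Water appears on both sides and cancels, so only the
    difference in pKa matters.
    """
    delta_pka = pka_donor - pka_acceptor_conjugate_acid
    return math.log(10.0) * GAS_CONSTANT * temperature * delta_pka


# Example with two entries of Table 2-1: a proton moves from SH2 (pKa = 7.0)
# to hydroxide, the conjugate base of OH2 (pKa = 15.7).
delta_g = proton_transfer_free_energy(7.0, 15.7)
print(f"{delta_g / 1000.0:.1f} kJ/mol")
```

A negative result means the transfer is favorable in the direction written; each unit of pKa difference is worth about 5.7 kJ/mol at the assumed temperature.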
anion, the conjugate base of ethanol (pKa = 16),¹² is that the basic lone pairs of electrons in the acetate anion are hybridized sp² (Figure 2–5) rather than sp³. The system of π molecular orbitals of the acetate anion, composed of four π electrons in a three-atom system (Figure 2–3), does not provide a pair of electrons to be protonated, notwithstanding any drawing suggesting this to be the case. It is a σ lone pair of electrons orthogonal to the system of π molecular orbitals that is protonated, and acetate anion cannot be used as an example of the decrease in basicity that results when the lone pair of electrons created upon the departure of the proton is conjugated to a π system.

There is an indirect effect of conjugation on the acidity of a carboxylic acid such as acetic acid. When one of the σ lone pairs on the acetate anion is protonated or alkylated, the functional group is no longer symmetric, and the oxygen that has been so modified becomes more electronegative. This change withdraws more electron density onto the protonated or alkylated oxygen, as indicated by the resonance structures:

[resonance structures 2–5]

[drawing: carboxylate oxygens with the syn and anti lone-pair positions labeled]

The separation of charge in the structure on the right is the reason that the lone pair of electrons on the alkylated oxygen or protonated oxygen is less delocalized than the π lone pair of electrons on an unalkylated or unprotonated oxygen in the carboxylate anion. Nevertheless, the bond between the protonated or alkylated oxygen and the acyl carbon retains some of the double-bond character indicated by the less advantageous form on the right. This is manifested in the almost 120° angle (116.5°) between an alkyl carbon and an acyl carbon at the oxygen of an ester and a shortening of the bond between the oxygen of an ester and the acyl carbon by 0.09 nm relative to carbon–oxygen bonds between sp² carbons and oxygens in aryl and vinyl compounds.¹⁶ Therefore, an ester or the conjugate acid of a carboxylic acid retains the overlap of the system of π molecular orbitals, but the overlap is considerably weakened relative to the unalkylated or unprotonated anion. During protonation of a carboxylate anion, the delocalization in the orthogonal π system is considerably diminished, and this effect destabilizes the conjugate acid and lowers the pKa. A similar but less pronounced effect of a decrease in delocalization upon protonation occurs with phenol. In the case of phenol, the conjugation in the anionic conjugate base is weaker than that in the anionic conjugate base of a carboxylic acid because the elementary negative charge is distributed over the oxygen and three carbons. Consequently, the effect of diminishing this conjugation upon protonation is less, and phenol is a weaker acid than acetic acid.

The acetate anion illustrates another property of a system of π molecular orbitals: its ability to redistribute charge. The elementary negative charge in the acetate anion is shared between the two oxygens because the
[structure 2–7]

so that the elementary positive charge on the nitrogen is delocalized. In the opposite sense, an example of a shift of electron density away from the central atom occurs in the p-nitrophenolate anion, whose associated pKa is 7.2, compared to the phenolate anion, whose associated pKa is 10.0. This can be explained by the resonance structure

[resonance structures 2–8: p-nitrophenolate anion]

In each of these examples, the redistribution of charge among electronegative atoms is accomplished by the highest occupied molecular orbital of the π system, which is spread over the whole molecule.

The microscopic pKa of an acid–base is determined by a combination of all of these properties: the electronegativity and hybridization of the central atom, any creation of charge, the inductive effect, any delocalization of the lone pair of electrons, and any redistribution of charge.

The biological molecules that illustrate most extensively the various aspects of bonding and acid–bases discussed so far are the bases of the nucleosides.

Pyrimidines

[structures: 2–9, uridine (U); 2–10, cytidine (C)]

[structures: 2–11, guanosine (G); 2–12, adenosine (A)]

Each of these nucleosides, uridine, cytidine, guanosine, and adenosine, is composed of the base itself, uracil, cytosine, guanine, and adenine, respectively, and a ribosyl group attached to N1 or N9 of that base. The nucleoside bases are hybrid structures of aromatic heterocycles and amides. The most aromatic base is adenine. It is susceptible to electrophilic aromatic substitution at carbons 2 and 8 but is also susceptible to nucleophilic substitution at carbon 6 in reactions that resemble acyl exchange. The most amidic base is uracil. It is unambiguously an N-acyl-N′-alkenyl-N′-ribosylurea. The carbon–carbon double bond in uracil has almost olefinic character. It is susceptible to addition reactions, unlike the system of π molecular orbitals in an aromatic compound, which would be susceptible only to substitution.

The nucleoside bases in adenosine, guanosine, and cytidine have exocyclic nitrogens resembling the nitrogen in aniline. The lone pairs of electrons on these nitrogens are even more delocalized than the one on the nitrogen in aniline (pKa = 4.6) because the pKa for the conjugate acid of each of these nitrogens in the nucleoside bases¹⁷ is less than or equal to –2, similar to that for N-protonated urea (pKa < –4). Therefore, each of these exocyclic nitrogens is planar and trigonal, as is always depicted in drawings of the base pairs.

The nucleoside bases in uridine, cytidine, and guanosine have exocyclic oxygens resembling the oxygen in an amide. The values of pKa for the conjugate acids of these exocyclic oxygens are 0.5, <4.2, and <1.6, the upper limits being the values of pKa for the N-protonated tautomers. These values can be compared to –0.7, the pKa for the oxygen of acetamide.¹⁸ The values of pKa for these oxygens in the iminolic tautomers of these three bases, estimated from the measured¹⁹ or theoretical²⁰,²¹ values for the equilibrium constants between the amidic and iminolic tautomers, are 5, 4, and –3 for uridine, cytidine, and guanosine, well below the value of 10 for the pKa of phenol. These values of pKa as well as the fact that the iminol tautomers are far less stable than the amidic tautomers are the justification for depicting these oxygens as acyl oxygens.

There are two types of calculations performed with acid–bases. The pH of a solution to which a weak acid or
weak base has been added can in theory be calculated, and the ratio of the molar concentrations of conjugate acid and conjugate base in a solution of a given pH can in practice be calculated.

The calculation of the pH of a solution upon the addition of an acid–base is an exercise in simultaneous equations. The problem takes the form “Calculate the pH of a solution to which 0.1 mol of sodium acetate has been added for every liter.” The equations always used are the conservation of mass

$$[\mathrm{HOAc}] + [{}^{-}\mathrm{OAc}] = 0.1\ \mathrm{M} \tag{2--7}$$

where HOAc is acetic acid and –OAc is acetate anion; the conservation of charge

$$[{}^{-}\mathrm{OAc}] + [{}^{-}\mathrm{OH}] = [\mathrm{Na^+}] + [\mathrm{H^+}] \tag{2--8}$$

where [Na+] = 0.1 M; the acid dissociation constant or constants

$$K_{\mathrm{a}} = \frac{[{}^{-}\mathrm{OAc}][\mathrm{H^+}]}{[\mathrm{HOAc}]} \tag{2--9}$$

where pKa = 4.75 and Ka = 1.78 × 10⁻⁵ M; and the water constant

$$[\mathrm{H^+}][{}^{-}\mathrm{OH}] = 10^{-14}\ \mathrm{M^2} \tag{2--10}$$

These comprise four (or if necessary more, depending on the number of dissociation constants the acid has) independent simultaneous equations with four, or if necessary more, unknowns. In the case of acetate, the four unknowns are [H+], [–OH], [–OAc], and [HOAc]. These four equations with four unknowns can be readily solved for [H+] (1.33 × 10⁻⁹ M) if the assumption is made that [H+] in Equation 2–8 is negligible relative to the other terms. The value of this exercise is that the creation of the simultaneous equations and the cancellation of certain terms to avoid a cubic or quadratic equation requires an understanding of the acid–base chemistry that is occurring in the solution. For example, one is required to know that the only ions that can be present are H+, Na+, –OAc, and –OH and that sodium acetate is a base so the concentration of protons in the final solution will be small.

The calculation of the concentrations of a conjugate acid and its conjugate base at a given pH fulfills one of two purposes. First, if the solution contains an acid–base of experimental interest, such as an acid–base in a molecule of protein, this calculation will provide the molar concentrations of the conjugate acid and the conjugate base of that acid–base. Second, if a particular buffer is used to stabilize the pH at a particular value, this calculation can be used to determine the concentrations of conjugate acid and conjugate base required for the buffer. A buffer is a solution of an acid and its conjugate base, both present at high enough concentrations so that the acid can neutralize bases added to the solution and the base can neutralize acids added to the solution, and between them they can keep the pH of the solution constant.

A problem concerning a buffer can be stated “Calculate the number of moles of acetic acid and sodium acetate that are present in 2.00 L of a 0.1 M solution of acetate plus acetic acid at pH 5.5.” This problem requires only two simultaneous equations, Equations 2–7 and 2–9. Because [H+] is given as 3.16 × 10⁻⁶ M, there are only two unknowns, [HOAc] and [–OAc], which are 0.015 and 0.085 M, respectively. The answer is 0.03 mol of acetic acid and 0.17 mol of sodium acetate.

The quantitative behavior of the concentrations of the conjugate acid and conjugate base of each acid–base is described by a titration curve (Figure 2–6) that relates the fraction of the acid–base in the form of the conjugate acid or in the form of the conjugate base to the pH of the solution. This can be presented as the fraction itself (Figure 2–6A), as is usually done, but this presentation leaves the erroneous impression that the fraction of acid goes to zero about 2 pH units above the pKa and the fraction of base goes to zero about 2 pH units below the pKa. This misimpression is corrected by examining the logarithms of the fractions as a function of pH (Figure 2–6B). It can be seen that finite fractions of both acid and base are still present at high and low pH, respectively. At a distance of 2 pH units above the pKa, 1% of the acid–base is in the form of the acid, and this percentage drops off by a factor of 10 for every rise of one unit of pH but never reaches zero. The importance of this point is that often only one species of the acid–base participates in a chemical reaction, yet the reaction will occur quite well at a pH where the reactive species is present at only 1% or 0.1% or 0.01% or less of the total acid–base. Protonation and deprotonation are extremely rapid, and as the minor but reactive species is consumed in the reaction, it is continuously replaced.

Problem 2–3: Complete the following acid–base equilibria. Draw the structures of the conjugate base and the acid in σ–π stereochemical representation (Figure 2–1B).

[structures A through F, each written with H3O+; E is CH3CH2OH + H3O+]
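The two worked calculations in this section, the pH of 0.1 M sodium acetate and the composition of the acetate buffer at pH 5.5, can be reproduced numerically. The sketch below is an illustration rather than the author's procedure: instead of cancelling terms by hand, it substitutes Equations 2–7, 2–9, and 2–10 into the charge balance (Equation 2–8) and finds the root by bisection on pH. The function names and the bisection bounds (pH 0 to 14) are assumptions made for this example.

```python
KW = 1.0e-14          # water constant from Equation 2-10 (M^2)
KA_ACETIC = 1.78e-5   # Ka of acetic acid quoted in the text (M)


def charge_imbalance(h: float, total_acetate: float, sodium: float,
                     ka: float = KA_ACETIC) -> float:
    """Anions minus cations in the charge balance (Equation 2-8) at a trial [H+]."""
    acetate = total_acetate * ka / (ka + h)  # Equations 2-7 and 2-9 combined
    hydroxide = KW / h                       # Equation 2-10
    return acetate + hydroxide - sodium - h


def solve_ph(total_acetate: float, sodium: float,
             ka: float = KA_ACETIC, tol: float = 1e-10) -> float:
    """Bisect on pH; the imbalance rises monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if charge_imbalance(10.0 ** -mid, total_acetate, sodium, ka) > 0.0:
            hi = mid  # too basic: the root lies at lower pH
        else:
            lo = mid
    return 0.5 * (lo + hi)


def buffer_moles(total_molar: float, volume_l: float, ph: float, pka: float):
    """Moles of conjugate acid and conjugate base at a fixed pH (Equations 2-7 and 2-9)."""
    ratio = 10.0 ** (ph - pka)               # [base]/[acid]
    acid_molar = total_molar / (1.0 + ratio)
    return acid_molar * volume_l, (total_molar - acid_molar) * volume_l


ph_acetate = solve_ph(total_acetate=0.1, sodium=0.1)
acid_mol, base_mol = buffer_moles(0.1, 2.00, ph=5.5, pka=4.75)
print(f"pH = {ph_acetate:.2f}, [H+] = {10.0 ** -ph_acetate:.2e} M")
print(f"acetic acid = {acid_mol:.2f} mol, sodium acetate = {base_mol:.2f} mol")
```

Solving numerically avoids choosing which term to neglect, but it also hides the chemical reasoning that the hand calculation is meant to exercise, so it is best used as a check on the approximate answer.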
compound pKa
buffer                                                                        pKa
N-(2-sulfoethyl)morpholine (MES)                                              6.1
1,4-bis(2-sulfoethyl)piperazine (PIPES)                                       6.8
N,N-bis(2-hydroxyethyl)-2-aminoethanesulfonic acid (BES)                      7.1
N-(3-sulfopropyl)morpholine (MOPS)                                            7.2
N-(2-sulfoethyl)-2-amino-1,3-dihydroxy-2-hydroxymethylpropane (TES)           7.4
1-(2-hydroxyethyl)-4-(2-sulfoethyl)piperazine (HEPES)                         7.5
1-(2-hydroxyethyl)-4-(3-sulfopropyl)piperazine (EPPS)                         8.0
N-[2-hydroxy-1,1-bis(hydroxymethyl)ethyl]glycine (Tricine)                    8.1
N,N-bis(2-hydroxyethyl)glycine (Bicine)                                       8.3
N-(3-sulfopropyl)-2-amino-1,3-dihydroxy-2-hydroxymethylpropane (TAPS)         8.4
N-(2-sulfoethyl)cyclohexylamine (CHES)                                        9.3

Why is HEPES more acidic than EPPS?

Problem 2–6: From the following list, select the reason for the difference in pKa between the two molecules in each pair presented below.
(A) hybridization
(B) electronegativity
(C) π donation
(D) σ donation
(E) π withdrawal
(F) σ withdrawal–induction
(G) aromaticity

[structure], pKa = 8.05             [structure], pKa = 9.19
[structure], pKa = 11.2             [structure], pKa = 5.2
(CF3)3COH, pKa = 5.4                (CF3)2HCOH, pKa = 9.3
[structure], pKa = 7.05             [structure], pKa = 8.00
C2H5OH, pKa = 16.0                  (CH3)3N+C2H4OH, pKa = 13.9
[structure], pKa = 9.51             [structure], pKa = 14.5
m-nitroaniline, pKa = 4.88          p-nitroaniline, pKa = 6.16
CH3NO2, pKa = 10.3                  CH4, pKa = 40

Problem 2–7: What two effects in combination cause the pKa of the methyl ester of 2-methoxypropenoic acid (–3.37) to be 0.9 unit less than that of dimethylether (–2.5)?¹³,²³

Problem 2–8: What are the exact pHs of the following solutions?
10⁻² M acetic acid
10⁻² M imidazolium acetate
5 × 10⁻² M sodium dihydrogen phosphate
5 × 10⁻² M aniline
10⁻³ M pyridinium chloride
10⁻² M p-nitroanilinium chloride
10⁻² M morpholine
5 × 10⁻² M sodium 2,2-difluoroethoxide

Problem 2–9: Calculate the concentration of imidazolate anion in a 0.1 M solution of imidazole at pH 9.52.

Problem 2–10: Determine the molar concentrations of each species of the weak acids and weak bases in the following solutions.

solute and concentration          pH
0.4 M 1-aminobutane               6.5
0.2 M 1-aminobutane               11.0
0.05 M p-chlorophenol             12.0
0.01 M p-chlorophenol             7.3
0.01 M p-methylaniline            5.0
0.001 M p-methylaniline           2.0
0.03 M 2-aminoethanethiol         9.2
0.08 M 2-aminoethanethiol         5.0
0.05 M morpholine                 3.5
0.002 M piperazine                7.5
0.03 M ethanol                    6.4
0.03 M diethyl ether              8.0
0.03 M 3-hydroxypropyne           4.0
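Problems of this kind, which ask for the concentration of each species at a stated pH, all rest on the same fraction calculation that underlies a titration curve. The following is a minimal sketch for a single acid–base with one pKa; the function names are hypothetical, and the example uses acetic acid (pKa = 4.75, the value quoted in this chapter) rather than any of the solutes in the problems, whose pKa values must be looked up.

```python
import math


def fraction_acid(ph: float, pka: float) -> float:
    """Fraction of the acid-base present as the conjugate acid at this pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def species_concentrations(total_molar: float, ph: float, pka: float):
    """Molar concentrations (conjugate acid, conjugate base) at a fixed pH."""
    acid = total_molar * fraction_acid(ph, pka)
    return acid, total_molar - acid


PKA_ACETIC = 4.75  # value quoted in the text

# Two pH units above the pKa about 1% remains as the acid, and the fraction
# falls roughly tenfold for each further pH unit without reaching zero.
for ph in (6.75, 7.75, 8.75):
    f = fraction_acid(ph, PKA_ACETIC)
    print(f"pH {ph}: fraction acid = {f:.5f}, log10 = {math.log10(f):.2f}")

acid_molar, base_molar = species_concentrations(0.1, 5.5, PKA_ACETIC)
```

Plotting the logarithm of the fraction, rather than the fraction itself, is what makes the persistent minor species visible; a polyprotic solute would need one such term per dissociation and is not handled by this sketch.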
buffer       concentration of buffer (M)    initial pH    NaOH added (mol L⁻¹)
imidazole    0.1                            6.70          0.02
imidazole    0.03                           6.50          0.02
phosphate    0.01                           6.80          0.005
phosphate    1.0                            6.35          0.1
borate       0.2                            9.50          0.002
borate       0.15                           8.40          0.05
imidazole    0.1                            6.50          0.01
imidazole    0.05                           7.00          0.02
phosphate    0.2                            7.20          0.05
phosphate    0.3                            6.20          0.15
borate       0.05                           9.40          0.001
borate       0.02                           8.60          0.01

Tautomers

One isomer is a tautomer of another isomer if the only difference between them is the position of a proton. There are several tautomers of uridine (2–9):

[Equation 2–11: the three tautomers of uridine in equilibrium]

[structure 2–13]

is a stable, anionic molecule, these interconversions are rapid. In water, the tautomer of uridine that is normally written, the one in which the proton occupies the nitrogen, is the dominant one, exceeding in concentration the other two combined by a factor of more than 4000.¹⁹

By formal definition, two otherwise identical isomers are tautomers of each other only when the tautomeric proton sits on two different atoms in the two isomers, as in the case of the three tautomers of uridine in Equation 2–11. If the proton sits on two different lone pairs on the same atom, the two isomers are, by formal definition, conformational isomers of each other. An example of two such conformational isomers would be the syn and anti conformations of acetic acid:

[Equation 2–12: syn and anti conformations of acetic acid in equilibrium]

Because the barrier to rotation around the carbon–oxygen bond is large²⁴ due to the conjugation in the acid (2–5) and because protons shuttle on and off the
water should damp both the dipolar repulsion and the electron repulsion,³⁰ it has been proposed that the difference in stability between these two tautomers is the same in water as in the gas phase.³¹ If this were the case, the microscopic acid dissociation constant for the anti isomer should be about 3000 times larger than that for the syn isomer; or in other words, the syn lone pairs of electrons should be 3000 times more basic than the anti. There is, however, experimental evidence suggesting that in water the difference in basicity between the syn and anti lone pairs is much less significant.³²

If the rotation about the carbon–oxygen bond in each of the two tautomers of uridine in which the oxygens are protonated (Equation 2–11) is also sufficiently hindered that neither interconverts significantly by rotation around the carbon–oxygen bond during its lifetime, then there would be syn and anti conformations of each of them that would be in fact two tautomers of each of them. In this case, the five actual tautomers of uridine would be the five molecules resulting from the protonation in turn of the five respective σ lone pairs on anion 2–13. As the protons shuffle, the σ structure of the uridine remains constant, and a proton is simply found on a different σ lone pair of electrons.

In some sets of tautomers, however, rehybridization of the atoms in the acid–base occurs during tautomerization. Such rehybridization is required to take place when one of the lone pairs that is protonated is a π lone pair in the intermediate base. The usually cited example of this is that of the keto and enol tautomers of a carbonyl compound such as ethyl acetoacetate:

[Equation 2–13: the ketoester (KE) and the two enols (E3 and E1) of ethyl acetoacetate, each dissociating to the common enolate anion, with the tautomeric equilibria among them; structural drawings not recoverable]

The most stable form of the acid, the ketoester (KE), is in equilibrium with two tautomers, the enol at carbon 3 (E3) and the enol at carbon 1 (E1). The common conjugate base of all three of these tautomers is the enolate anion (enolate). The enolate anion has a five-atom system of π molecular orbitals, and each of the five atoms of the system is hybridized [p, sp², sp², sp²]. Two of the six π electrons of the anion, however, must be protonated at carbon to form the ketoester, an event requiring rehybridization at that central carbon.

In the case of the two enols of ethyl acetoacetate, in contrast to the tautomers of acetic acid or the tautomers of uridine, the proton can be readily transferred intramolecularly between the two oxygens. In fact, in either enol the proton forms a hydrogen bond to the adjacent carbonyl oxygen. These comparisons illustrate the specific geometric requirements for intramolecular proton transfer. Not counting the proton transferred, efficient intramolecular proton transfer requires that a ring of five or six atoms can be formed.

There are three aspects of the situation that must be clearly distinguished from each other. One is the set of tautomers itself (Equations 2–11 and 2–13). The second is the resonance structures that can be written for each member of the set of tautomers. The third is the microscopic acid dissociations of the individual tautomers.

Each of the tautomers in the set can often be drawn as a subset of resonance structures. For example, just one of the tautomers of uridine can be examined in this way:

[Structure 2–14: resonance structures of one tautomer of uridine; structural drawings not recoverable]

have independent existences. In such a subset of reso-
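The factor of about 3000 proposed above for the syn and anti lone pairs of acetic acid can be restated as a difference in pKa and as a difference in standard free energy. The sketch below is only an illustration under stated assumptions: the temperature of 25 °C (298.15 K) is not given in the text, and the function names are hypothetical.

```python
import math

R = 8.314462618      # gas constant, J mol^-1 K^-1
T_ASSUMED = 298.15   # K; assumed temperature (25 degrees C), not stated in the text


def delta_pka(ratio):
    """Difference in pKa corresponding to a ratio of two acid dissociation constants."""
    return math.log10(ratio)


def delta_g_kj(ratio, temperature=T_ASSUMED):
    """Difference in standard free energy (kJ/mol) for an equilibrium ratio: RT ln(ratio)."""
    return R * temperature * math.log(ratio) / 1000.0


ratio_anti_to_syn = 3000.0   # ratio of the microscopic Ka values quoted in the text
dpka = delta_pka(ratio_anti_to_syn)
dg_kj = delta_g_kj(ratio_anti_to_syn)
dg_kcal = dg_kj / 4.184      # 1 kcal = 4.184 kJ

print(f"delta pKa = {dpka:.2f}")
print(f"delta G   = {dg_kj:.1f} kJ/mol = {dg_kcal:.1f} kcal/mol")
```

At the assumed temperature, a ratio of 3000 corresponds to about 3.5 units of pKa and roughly 20 kJ/mol (just under 5 kcal/mol). This is the size of the difference that water is proposed to leave undamped and that the experimental evidence cited above calls into question.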
the π electrons and which atoms (in the case of uridine, all of them) are contributing p orbitals to the system of π molecular orbitals. Each of the three tautomers of uridine can be submitted to this treatment to generate three subsets of resonance structures. It becomes clear that if the hierarchy of tautomers and resonance structures is not always clearly recognized, significant confusion ensues.

Because it is a tautomer, any one of the tautomers in a set can simply lose a proton in a microscopic acid dissociation that produces its conjugate base. Although the conjugate base may itself be a member of a set of tautomers existing at its level of protonation, in the examples discussed so far, none of the conjugate bases have had acidic protons. For example, the enolate is the common conjugate base produced upon the dissociation of a proton from any one of the three tautomers of ethyl acetoacetate (Equation 2–13). The ratio between the concentrations of the members of each pair in a set of tautomers is independent of the pH of the solution because a proton appears on neither side of any chemical equation interconverting the two. For example, as the pH increases, the molar concentration of the enolate increases according to a function of the same form as that displayed for the conjugate base in Figure 2–6, and the sum of the molar concentrations of the three tautomers decreases accordingly, but the ratio between their concentrations remains unaltered at all values of pH, even when the conjugate base accounts for almost all of the molecules in the solution. In the case of ethyl acetoacetate, these ratios are defined by the three equilibrium constants among the tautomers:

$$K_{1\mathrm{K}} = \frac{[\mathrm{KE}]}{[\mathrm{E1}]} \qquad K_{3\mathrm{K}} = \frac{[\mathrm{KE}]}{[\mathrm{E3}]} \qquad K_{31} = \frac{[\mathrm{E1}]}{[\mathrm{E3}]} = \frac{K_{3\mathrm{K}}}{K_{1\mathrm{K}}} \tag{2–14}$$

are considered to be indistinguishable. As a result, the molar concentrations of all tautomers with the same number of protons must be summed, and only those undivided sums can appear in the expression defining a macroscopic acid dissociation constant. The expression for the macroscopic dissociation constant of ethyl acetoacetate is

$$K_{a\mathrm{EAA}} = \frac{[\mathrm{H^+}][\mathrm{enolate}]}{[\mathrm{KE}] + [\mathrm{E3}] + [\mathrm{E1}]} = 2.1 \times 10^{-11}\ \mathrm{M} \tag{2–16}$$

Were there more than one tautomer of the enolate, the molar concentrations of all these tautomers would be summed and that sum would be multiplied by [H+] in the numerator.

It is the macroscopic pKa that is measured during the titration of an acid–base because such a measurement makes no distinction among all of the tautomers yielding a proton at a particular pH or among all of the tautomers produced upon the surrender of the proton. All that is measured is the consumption of hydroxide ions or protons by the solution. A tautomeric acid behaves as if it were a simple acid with an acid dissociation constant equal to its macroscopic acid dissociation constant. The total concentrations of conjugate bases and conjugate acids behave as if they were the concentrations of one simple base and one simple acid. Because only the macroscopic pKa is the result of an acid–base titration and because measurements of the ratios of tautomers or their microscopic acid dissociation constants are more difficult, it is always the macroscopic pKa that appears in a table. The tabulated value¹² for the macroscopic pKa of ethyl acetoacetate (pKaEAA) is 10.68.

By simple manipulation it can be shown that
$$K_{a\mathrm{E3}} = K_{a\mathrm{EAA}}\,(1 + K_{3\mathrm{K}} + K_{31}) \tag{2–18}$$

If it is assumed that the enol at carbon 3 is the more stable (K31 < 1) and that the measured equilibrium constant between the enols and the ketoester (250)¹¹ is approximately K3K, then the pKa for the microscopic acid dissociation of the enol at carbon 3 is approximately 8.3.

As Equation 2–18 suggests, the ratios among the tautomers can also be calculated from their microscopic acid dissociation constants. In fact, all of the equilibrium constants governing the tautomers and the conjugate base of ethyl acetoacetate are dependent upon each other, or linked. The linkage is reflected in the relationships

$$K_{1\mathrm{K}} = \frac{K_{a\mathrm{E1}}}{K_{a\mathrm{KE}}} \qquad K_{3\mathrm{K}} = \frac{K_{a\mathrm{E3}}}{K_{a\mathrm{KE}}} \qquad K_{31} = \frac{K_{a\mathrm{E3}}}{K_{a\mathrm{E1}}} \tag{2–19}$$

The equalities of Equations 2–19 simply state that the ratio between the concentrations of any two tautomers is equal to the inverse of the ratio of their respective microscopic acid dissociation constants, which makes chemical sense. The stronger the bond between the heteroatom and the proton, the smaller will be its intrinsic acid dissociation constant but the greater its relative concentration.

A molecule of protein has a large number (>100) of acidic protons and basic lone pairs distributed over the side chains of its amino acids. As a result it is a waste of time even to imagine all the tautomers of that protein that are present in solution, but usually these acid–bases on the side chains are separated widely enough from each other that each behaves as an independent acid–base and can be treated as such. Occasionally, however, two or three amino acids are not only of functional significance, so attention is paid to them, but also close enough to each other that tautomers and the distinction between macroscopic acid dissociation constants and microscopic acid dissociation constants become important.³³ In thioredoxin from Escherichia coli, there is an aspartate (Aspartate 26) close enough in the structure to a cysteine (Cysteine 32) that their acid dissociations become linked.³³ When both are protonated or both are unprotonated, there are no tautomers; but when one is protonated and the other is not, there are two tautomers, one in which the proton is on the aspartic acid and the cysteinate is in the form of the anionic base, and the other in which the proton is on the cysteine and the aspartate is the anionic base:

[Equation 2–20: the four protonation states of the linked aspartic acid and cysteine (HSOH, –SOH, HSO–, and –SO–), connected by the microscopic acid dissociations KaHOSH, KaHSOH, Ka–SOH, and Ka–OSH and by the tautomerization KSO; structural drawings not recoverable]

The equilibrium constant for the tautomerization (KSO) is defined with the thiol–carboxylate as product and the thiolate–carboxylic acid as reactant. The linkage relationships are

$$K_{\mathrm{SO}} = \frac{[\mathrm{HSO}^{-}]}{[{}^{-}\mathrm{SOH}]} = \frac{K_{a\mathrm{HSOH}}}{K_{a\mathrm{HOSH}}} = \frac{K_{a\,{}^{-}\mathrm{SOH}}}{K_{a\,{}^{-}\mathrm{OSH}}} \tag{2–21}$$

and the relationships between the macroscopic dissociation constants and the microscopic dissociation constants are³⁴

$$K_{a1} = \frac{([\mathrm{HSO}^{-}] + [{}^{-}\mathrm{SOH}])[\mathrm{H^+}]}{[\mathrm{HSOH}]} = K_{a\mathrm{HOSH}} + K_{a\mathrm{HSOH}} \tag{2–22}$$

and

$$\frac{1}{K_{a2}} = \frac{[\mathrm{HSO}^{-}] + [{}^{-}\mathrm{SOH}]}{[{}^{-}\mathrm{SO}^{-}][\mathrm{H^+}]} = \frac{1}{K_{a\,{}^{-}\mathrm{OSH}}} + \frac{1}{K_{a\,{}^{-}\mathrm{SOH}}} \tag{2–23}$$

The equation describing the titration curve for the cysteine is

$$f_{\mathrm{cysteinate}} = \frac{K_{a\mathrm{HOSH}}\,([\mathrm{H^+}] + K_{a\,{}^{-}\mathrm{SOH}})}{K_{a\mathrm{HOSH}}\,([\mathrm{H^+}] + K_{a\,{}^{-}\mathrm{SOH}}) + [\mathrm{H^+}]\,([\mathrm{H^+}] + K_{a\mathrm{HSOH}})} \tag{2–24}$$

where fcysteinate is the fraction of the cysteine that is the anionic base. It is possible to walk through the titration
curve. Assume that the first and second macroscopic acid dissociation constants are well separated, that the respective pairs of microscopic acid dissociation constants are close together (KaHOSH ≅ KaHSOH > Ka–OSH ≅ Ka–SOH), that the initial pH is low, and that the titration is performed by adding hydroxide ion. As the concentration of protons decreases into the range of the first macroscopic acid dissociation constant, the inequality [H+] ≥ KaHOSH ≅ KaHSOH > Ka–OSH ≅ Ka–SOH holds and

$$f_{\mathrm{cysteinate}} \cong \frac{K_{a\mathrm{HOSH}}}{(K_{a\mathrm{HOSH}} + K_{a\mathrm{HSOH}}) + [\mathrm{H^+}]} \tag{2–25}$$

This equation describes a normal titration curve for a conjugate base (Figure 2–6A) with a macroscopic acid dissociation constant equal to KaHOSH + KaHSOH, the sum of the two lower microscopic dissociation constants, which is the macroscopic dissociation constant Ka1, and that reaches a plateau at

$$f_{\mathrm{cysteinate}} = \frac{K_{a\mathrm{HOSH}}}{K_{a\mathrm{HOSH}} + K_{a\mathrm{HSOH}}} = \frac{K_{a\,{}^{-}\mathrm{OSH}}}{K_{a\,{}^{-}\mathrm{OSH}} + K_{a\,{}^{-}\mathrm{SOH}}} = \frac{1}{1 + K_{\mathrm{SO}}} \tag{2–26}$$

which is the fraction of cysteinate in the tautomeric mixture. The plateau is reached when KaHOSH ≅ KaHSOH > [H+] > Ka–SOH ≅ Ka–OSH.

As [H+] is decreased further during the titration into the range of the second macroscopic dissociation constant and on above it, the inequality KaHOSH ≅ KaHSOH > Ka–OSH ≅ Ka–SOH ≅ [H+] holds and

$$f_{\mathrm{cysteinate}} \cong \frac{\dfrac{K_{a\,{}^{-}\mathrm{OSH}}}{K_{a\,{}^{-}\mathrm{OSH}} + K_{a\,{}^{-}\mathrm{SOH}}}\,(K_{a\,{}^{-}\mathrm{SOH}} + [\mathrm{H^+}])}{[\mathrm{H^+}] + \dfrac{K_{a\,{}^{-}\mathrm{OSH}}\,K_{a\,{}^{-}\mathrm{SOH}}}{K_{a\,{}^{-}\mathrm{OSH}} + K_{a\,{}^{-}\mathrm{SOH}}}} \tag{2–27}$$

which is the equation for a normal titration curve beginning at the tautomeric fraction (Equation 2–26), having a macroscopic acid dissociation constant equal to the term Ka–SOH Ka–OSH (Ka–OSH + Ka–SOH)⁻¹, which is the macroscopic dissociation constant Ka2, and reaching a final level at which all of the cysteine is unprotonated (the fully ionized form on the right of Equation 2–20).

The titration curve for the cysteine of thioredoxin conforms to these expectations.³³,³⁵⁻³⁷ The values observed for the macroscopic acid dissociation constants are pKa1 = 7.2 and pKa2 = 9.5 and the tautomeric ratio, as determined by the level of the plateau observed in the titration curves (Equation 2–26), is 1.3. These values, when inserted into the equations, give microscopic acid dissociation constants for the cysteine and the aspartic acid of pKaHOSH = 7.6, pKaHSOH = 7.5, pKa–OSH = 9.2, and pKa–SOH = 9.1. The titration curve for the aspartic acid has tautomeric ratios and macroscopic acid dissociation constants of about 1.3, 7.2, and 9.5, as expected.³³,³⁵,³⁶

From the microscopic acid dissociation constants, it can be seen that when the aspartic acid is protonated, the thiol is a much better acid (pKaHOSH = 7.6) than when the aspartate is unprotonated and anionic (pKa–OSH = 9.2). Because of the linkage, the same difference in microscopic acid dissociations is necessarily seen for the aspartic acid (ΔpKa = 1.6) when the cysteine is the neutral thiol or the anionic thiolate. These differences make electrostatic sense because it should be significantly more difficult to produce two adjacent negative charges than a single negative charge. For example, the pKa of the first macroscopic acid dissociation of succinic acid is 1.29 units less than that of the second. Before the ionization of these two acid–bases in thioredoxin was analyzed in terms of tautomeric equilibria and microscopic acid dissociation constants,³³ there was considerable confusion as to what was happening.³⁵,³⁷,³⁸

Glutamate 172 and Glutamate 78 in the endo-1,4-β-xylanase from Bacillus circulans are close enough to each other in the native protein to be linked by a tautomeric equilibrium.³⁹ The microscopic pKa of Glutamate 172 when Glutamate 78 is the neutral acid is 5.5, but when Glutamate 78 is the anionic carboxylate, it is 6.7.

Problem 2–12:

[Structural drawings of guanosine, xanthosine, and cytidine, with R = ribose; drawings not recoverable]

(A) Draw complete σ structures for the above heterocycles in the above tautomeric forms including all σ lone pairs. Draw them with proper bond angles. Abbreviate the ribose as R.
(B) Indicate which protons are involved in tautomeric shifts between which lone pairs of electrons. Draw some of the tautomeric forms of these neutral molecules.
(C) How many π electrons are there in each compound?
(D) The macroscopic values of pKa for guanosine are 1.6, 9.2, and 12.5; those for xanthosine are 0.0, 5.5,
and 13.0; and those for cytidine are 4.2 and 12.5. Draw vertically chemical equations for the two or three acid dissociations that have these values of pKa and horizontally next to the molecule in each level of protonation draw two of its tautomers.
(E) How many of the tautomers at each level of protonation are insignificant because they require separation of charge?
(F) Draw the σ structure of a tautomer of xanthine that could substitute for adenine in the A–T base

Amino Acids

The fundamental, covalent scaffold of a molecule of protein is the polypeptide (see 2–15 below).

A polypeptide is a long (50–5000 amino acids) linear polymer, the monomers of which are L-α-amino acids. Because a protein constructed entirely of D-α-amino acids is functionally indistinguishable from its biological enantiomer,⁴¹ the original choice of L-α-amino acids was arbitrary. The covalent bonds that

[Structure 2–15: the polypeptide; structural drawing not recoverable]
tureless random coil⁴⁷ and if that amino acid does not have an immediate neighbor in the polypeptide with an ionized side chain. When the polypeptide folds to form a globular protein, however, significant shifts in the values of pKa for its side chains occur.³⁶,³⁸,⁵⁴,⁵⁵ Neighboring charged functional groups in the compact folded structure can affect the pKa of a particular acid–base. An adjacent anion makes it harder to remove a proton from an acid and raises its pKa (Equation 2–20). An adjacent elementary positive charge makes it easier to remove a proton from an acid and lowers its pKa. If, upon the folding of the protein, the acid–base finds itself in an aprotic environment, secluded from water, the more charged form of the acid–base will be less stable relative to the less charged form than it would be in water. This shifts the pKa in the direction favoring the less charged form of the acid–base. A simple paradigm for such an effect would be the shift in the pKa of acetic acid in dimethyl sulfoxide, a relatively polar but aprotic solvent, to 12.9 from its value of 4.75 in water, which occurs because the anionic conjugate base is poorly solvated by the dimethyl sulfoxide relative to the solvation provided by water. For all of these reasons, when the polypeptide folds to form the native structure, the values for the pKa of the various amino acids shift away from their ideal values.

Alanine (A, Ala), valine (V, Val), leucine (L, Leu), and isoleucine (I, Ile) have saturated alkyl groups as side chains:*

[Structures 2–16 (alanine), 2–17 (valine), 2–18 (leucine), and 2–19 (isoleucine); structural drawings not recoverable]

* The drawings of all of the side chains in this section, except for proline, are for the entire functional group that is attached through a carbon–carbon bond to the respective α-carbon in the backbone of the polypeptide. The open bond in each drawing indicates the point of this attachment.

All of their carbons are hybridized sp³. Because alkyl groups are sterically more bulky than functional groups containing atoms hybridized [p, sp², sp², sp²], steric considerations are more important in examining the structures of these four amino acids than with most of the others. The view down every carbon–carbon bond should be staggered, and methyl or other alkyl groups should be anti to each other in the most stable conformers.

Proline (P, Pro) and glycine (G, Gly) are amino acid residues the effect of which on the polypeptide is almost entirely steric. Glycine has no side chain at all, merely a hydrogen, and as such can occupy positions in the native structure of a protein that are cramped. A proline, because it is a ring

[Structure 2–20: proline within the polypeptide, ring atoms numbered 1 to 5; structural drawing not recoverable]

forces the polypeptide to assume particular orientations.

Phenylalanine (F, Phe) is aromatic by virtue of its phenyl ring:

[Structure 2–21; structural drawing not recoverable]

The six π electrons are delocalized above and below the plane of the ring in three bonding molecular orbitals over the six carbons that contribute the six p orbitals. This causes the σ structure of the ring to be planar, and it is sandwiched between two circular clouds of π electrons. A phenylalanine side chain absorbs ultraviolet light (λmax = 253 nm, ε = 1550 M⁻¹ cm⁻¹), and its absorption spectrum displays the usual fine structure seen in unadorned alkylbenzenes.

The side chain of tryptophan (W, Trp) is an indole, which is a benzopyrrole. The indole is entirely aromatic, consisting of an unbroken ring of nine atoms each contributing a p orbital, and the aromatic system of π molecular orbitals contains 10 π electrons:

[Structure 2–22; structural drawing not recoverable]

The hydrogen on the pyrrole nitrogen of indole (pKa = 17.0)¹² is even less acidic than the hydrogen on a molecule of water (pKa = 15.7):
[Equation: acid dissociation of the hydrogen on the pyrrole nitrogen of indole (pKa = 17), ring atoms numbered 1 to 7; structural drawings not recoverable]

is somewhat less acidic (pKa = 17.5) than indole (pKa = 16.9).¹² The indolyl group of tryptophan is planar with hydrogens directed outward along its edge and clouds of π electrons above and below the σ plane. It has the strongest ultraviolet absorption of any functional group in an amino acid (λmax = 281 nm, ε = 5690 M⁻¹ cm⁻¹)⁵⁶,⁵⁷ and is the principal contributor to the absorbance of protein at 280 nm (Figures 1–6 and 1–10).

The side chains of serine (S, Ser) and threonine (T, Thr) are primary and secondary aliphatic alcohols resembling ethanol and 2-propanol, respectively, except that they are more acidic because of the electron withdrawal of the immediately adjacent polypeptide. Their oxygens are hybridized sp³ and have two σ lone pairs that can act as bases as well as an acidic hydrogen:

[Equation 2–29: the two acid dissociations (Ka1 and Ka2) of the hydroxyl group of threonine; structural drawings not recoverable]

The values of pKa for these acid–base reactions can be estimated (Table 2–2) from a series of alcohols containing appropriate electron-withdrawing substituents.

Tyrosine (Y, Tyr) resembles phenylalanine because it is aromatic and serine because it has a hydroxyl group. As a phenol, however, its properties are distinct from either. Tyrosine (pKa = 9.7) is more acidic than serine (pKa = 14.2) because of the ability of the neighboring π system to delocalize the excess electron density of the anion:

[Structure 2–23; structural drawing not recoverable]

The six p orbitals from the ring and the one p orbital from the exocyclic oxygen that overlap in the anion are distributed above and below the plane of the ring. As indi-

[Equation 2–30: acid dissociation of the phenolic hydroxyl (pKa = 9.7) and resonance structures of the resulting anion; structural drawings not recoverable]

the lowered pKa of the hydroxyl reflects the lowered pKa of an sp² oxygen–hydrogen bond. To the extent that the lone pair is not so delocalized in the conjugate acid as it is in the anion, the lowered pKa reflects the stability of the anion relative to the neutral acid resulting from the ability of the system of π molecular orbitals to spread its excess electron density over one oxygen and three carbons. Because of this increase of delocalization in the anion, a significant change in the ultraviolet spectrum of a tyrosine side chain occurs when the acid (λmax = 275 nm, ε = 1410 M⁻¹ cm⁻¹) becomes the conjugate base (λmax = 293 nm, ε = 2380 M⁻¹ cm⁻¹) upon acid dissociation.⁵⁶ It is for this reason that proteins absorb more light at 280 nm when the pH of the solution is raised.

The side chain of histidine (H, His) is also an aromatic acid–base by virtue of its imidazolyl group (Equation 2–31):
[Equation 2–31: acid dissociations of the imidazolyl group of histidine, from the cation H2+ through the neutral tautomers 1-H and 3-H, with microscopic constants including KaH3H1 and KaH3; structural drawings not recoverable]

tributed by the π system over both nitrogens, and resonance structures can be drawn to show this (Equation 2–31). The first proton adds to one of the two σ lone pairs in the imidazolate anion to form an sp² covalent bond in an acid–base reaction with a macroscopic pKa2 = 14.5. Thus, the imidazolate anion is less basic than the pyrrolate anion (pKa = 17.5)¹² because its system of π molecular orbitals can spread the excess electron density over two nitrogens, but the adenosinate anion (pKa = 12.5) is less basic than the

[Structure 2–24; structural drawing not recoverable]

and
$$K_{a\mathrm{H3H1}} = \frac{[\text{1-H}][\mathrm{H^+}]}{[\mathrm{H_2^+}]} \tag{2–33}$$

and

$$K_{a\mathrm{H1H3}} = \frac{[\text{3-H}][\mathrm{H^+}]}{[\mathrm{H_2^+}]} \tag{2–34}$$

then the ratio of the two microscopic dissociation constants is 4 as it should be.⁵⁹ If the macroscopic dissociation constant⁵⁰

$$K_{a1} = \frac{([\text{1-H}] + [\text{3-H}])[\mathrm{H^+}]}{[\mathrm{H_2^+}]} = K_{a\mathrm{H3H1}} + K_{a\mathrm{H1H3}} = 10^{-6.6} \tag{2–35}$$

then

$$K_{a\mathrm{H3H1}} = 0.8\,K_{a1} = 10^{-6.7} \tag{2–36}$$

and

[Figure 2–7 plot: fraction (0.0 to 1.0) as a function of pH (2 to 14), with curves fH2+, f1H, f3H, and fH– and with pKa1, pKa2, pKaH3H1, pKaH1H3, pKaH1, and pKaH3 marked; plot not recoverable]

Figure 2–7: Titration curves for the three conjugate acids of histidine (Equation 2–31): H2+, 1-H, and 3-H. The titration curves for histidine graphically illustrate the equations (Equations 2–32 to 2–37) governing the tautomerization. The ratio of the two tautomers remains constant over the entire range. The value of each microscopic pKa is defined by the intersection between the curve representing the concentration of the respective tautomer and the curve representing the concentration of its nontautomeric conjugate base or conjugate acid. As a result, the microscopic acid dissociation constant for a reaction producing a tautomer from a nontautomer is always less than the corresponding macroscopic acid dissociation constant, and the microscopic acid dissociation constant for the reaction in which a tautomer dissociates to form a nontautomer is always greater than the corresponding macroscopic acid dissociation constant. The pKa for each of the two macroscopic dissociations coincides with the pH at which half of the histidine is in the form of the nontautomer, the H2+ cation or the H– anion, respectively.
are more widely separated from each other (pKa1 = –0.7 and pKa2 = 17) than the two for imidazole (pKa1 = 7.1 and pKa2 = 14.5).

The side chain of arginine (R, Arg) contains a cation

[Structure 2–25; structural drawings not recoverable]

defines a plane in which its central carbon, three nitrogens, five hydrogens, and the δ carbon all reside. The hydrogens bristle from the three nitrogens at 120° angles around the periphery, and the flat clouds of π electrons sandwich the σ structure from above and below. The entire structure bears a net elementary positive charge that is neutralized by removing a proton from any one of the three nitrogens.

The side chain of lysine (K, Lys), the other strongly basic amino acid side chain (pKa = 10.5), is a simple primary ammonium cation at neutral pH:

[Equation: acid dissociation (Ka) of the primary ammonium cation of the lysine side chain; structural drawings not recoverable]
duce the four π molecular orbitals, one bonding (ψ1) and two nonbonding (ψ2 and ψ3) molecular orbitals shown,

[Equation 2–39; structural drawings not recoverable]

[Figure 2–8 scheme: methionine (thioether, sulfoxide, sulfone) and cysteine (thiol and thiolate anion, cystine, sulfenic acid, sulfinic acid, and cysteic acid, a sulfonic acid, with its sulfonate), connected by oxidation steps (– 2e⁻, + H2O, – 2H⁺) and by acid dissociations (Ka); structural drawings not recoverable]
Figure 2–8: Products of the oxidation of cysteine and methionine side chains and their conjugate acids and bases. When a cysteine side chain
is oxidized by the removal of two electrons, the sulfenic acid is formed, and when a methionine side chain is oxidized by the removal of two
electrons, the sulfoxide is formed. One of the tautomers of a sulfenic acid is the lower homolog of a sulfoxide. Cystine is the disulfide of two
cysteines formed either by their direct oxidation or by the reaction of the sulfenic acid of one cysteine with the thiol of another cysteine. When
a cysteine side chain is further oxidized by the removal of two more electrons, the sulfinic acid is formed, and when a methionine side chain
is further oxidized by the removal of two more electrons, the sulfone is formed. One of the tautomers of a sulfinic acid is the lower homolog
of a sulfone. Cysteine can be further oxidized by the removal of two more electrons to produce the sulfonic acid.
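The two oxidation series described in this caption can be written down as a small lookup table. The sketch below is only an illustration: the names OXIDATION_SERIES and oxidation_product are hypothetical, and the table covers the two-electron steps named in the caption, not the branch that forms cystine from two cysteines.

```python
# Products named in Figure 2-8, indexed by the number of two-electron
# oxidation steps away from the unoxidized side chain.
OXIDATION_SERIES = {
    "cysteine": ("thiol", "sulfenic acid", "sulfinic acid", "sulfonic acid"),
    "methionine": ("thioether", "sulfoxide", "sulfone"),
}


def oxidation_product(side_chain, electrons_removed):
    """Return the functional group reached after removing the given number of
    electrons from a cysteine or methionine side chain."""
    series = OXIDATION_SERIES[side_chain]
    if electrons_removed < 0 or electrons_removed % 2 != 0:
        raise ValueError("electrons are removed two at a time in this scheme")
    step = electrons_removed // 2
    if step >= len(series):
        raise ValueError(
            f"no product listed for {side_chain} after {electrons_removed} electrons"
        )
    return series[step]


print(oxidation_product("cysteine", 2))     # sulfenic acid
print(oxidation_product("methionine", 4))   # sulfone
```

Laid out this way, the parallel drawn in the caption is visible by position: the sulfenic acid sits at the same step as the sulfoxide, and the sulfinic acid at the same step as the sulfone, while only cysteine has a third step.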
sulfate dianion with dp p bonds, formed by the overlap of Problem 2–19: Two compounds (A and B) have been
d orbitals on phosphorus and p orbitals on oxygen. These isolated from a protein by enzymatic hydrolysis. Both
bonds are indicated by the four resonance structures for have the composition C5H10N2O3. The titration behavior
the phosphate trianion, indicating the equivalence of all of the compounds is the following:
of the bonds between phosphorus and oxygen. The
compound A compound B
s lone pairs of electrons in the unperturbed trianion
must be distributed around each oxygen in such a way pKa1 3.85 pKa1 2.15
that the tetrahedral symmetry of the entire anion is pKa2 8.25 pKa2 9.19
retained. This symmetry is, however, readily perturbed After acid hydrolysis for 20 h in 6 M HCl, both com-
because the dp p bonds are polarized owing to the differ- pounds have the composition C5H9NO4 and the follow-
ence in electronegativity between phosphorus and ing titration behavior:
oxygen. For example, the donors of hydrogen bonds are
compound A¢ compound B¢
oriented at randomly assumed angles around each of the
three equivalent oxygens in the hydrogen phosphate pKa1 2.16 pKa1 2.16
dianion bound to phosphate-binding protein from E. coli pKa2 4.32 pKa2 4.32
as if there were no incontrovertible geometry for the lone pKa3 9.95 pKa3 9.95
pairs on each of them.64 (A) What are compounds A and B?
The acid–base properties of inorganic phosphate
(B) Explain their behavior on titration.
(Table 2–1) and monoesters and diesters of phosphoric
acid reflect this ability of the system of the dp p molecu-
lar orbitals to spread negative charge over two or more Problem 2–20: Draw a linkage relationship between the
oxygens because the acid dissociation constants (Table microscopic acid dissociation constants of glycine and
2–1) are much closer together than one might expect for its two tautomers in the form of Equation 2–20. The
a series of steps that each increase the negative charge values of pKa for the two macroscopic acid dissociation
number of a small acid–base by 1 unit. The acid dissoci- constants of glycine are pKa1 = 2.34 and pKa2 = 9.6. The
ation constants for an alkyl monoester of phosphoric macroscopic pKa for glycolic acid is 3.82 and that for
acid (pKa1 = 1.7 and pKa2 = 6.7)12 and for a dialkyl diester acetic acid is 4.75. Estimate the equilibrium constant
of phosphoric acid (pKa = 1.5)12 are close to those of phos- between the two tautomers of glycine and its four micro-
phoric acid itself, but sugar phosphates, and presumably scopic equilibrium constants.
also serine phosphate and threonine phosphate, are
more acidic (pKa1 = 0.9 and pKa2 = 6.1)12 because of induc-
tive electron withdrawal. References
1. Bennet, A.J., Wang, Q.P., Slebockatilk, H., Somayaji, V.,
Problem 2–17: Brown, R.S., & Santarsiero, B.D. (1990) J. Am. Chem. Soc.
112, 6383–6385.
(A) At pH 7.0, what fraction of the lysine in the pep- 2. Wang, Y., Purrello, R., Georgiou, S., & Spiro, T.G. (1991)
tide Gly-Pro-Lys-Ala-Thr would be in the neutral J. Am. Chem. Soc. 113, 6368–6377.
nucleophilic form? What fraction at pH 12? 3. Kuhn, H., Eggert, L., Zabolotsky, O.A., Myagkova, G.I., &
(B) The e-amino group of lysine in a polypeptide Schewe, T. (1991) Biochemistry 30, 10269–10273.
reacts readily with acetic anhydride. Write a 4. Wall, M.A., Socolich, M., & Ranganathan, R. (2000) Nat.
mechanism for this reaction. Struct. Biol. 7, 1133–1138.
(C) At pH 12, 10 °C, and 0.1 M KCl, the lysine in the above pentapeptide would react with acetic anhydride at a rate of 1.3 × 10⁵ M⁻¹ min⁻¹ (kN). Write a kinetic mechanism for this reaction at any pH that involves only this rate constant and the acid dissociation constant KaK of the lysine, and solve it for the initial velocity (vi) of the reaction between lysine and acetic anhydride. Assume that the acid dissociation equilibrium is rapid compared to kN.

Problem 2–18: In the peptide CH3CO-Gly-Glu-Gly-His-NH2, which acid–bases would be titrating in the region between pH 2 and 11? What are the approximate values of each pKa? Plot as a function of pH the fraction of each of the three major ionic forms of the peptide present in solution.

5. Taylor, R., Kennard, O., & Versichel, W. (1983) J. Am. Chem. Soc. 105, 5761–5766.
6. Jelsch, C., Teeter, M.M., Lamzin, V., Pichon-Pesme, V., Blessing, R.H., & Lecomte, C. (2000) Proc. Natl. Acad. Sci. U.S.A. 97, 3171–3176.
7. Jiang, J.C., Wang, Y.S., Chang, H.C., Lin, S.H., Lee, Y.T., & Niedner-Schatteburg, G. (2000) J. Am. Chem. Soc. 122, 1398–1410.
8. Eigen, M. (1964) Angew. Chem., Int. Ed. Engl. 3, 1–19.
9. Yang, X., & Castleman, A.W., Jr. (1989) J. Am. Chem. Soc. 111, 6845–6846.
10. Wei, S., Shi, Z., & Castleman, A.W., Jr. (1991) J. Chem. Phys. 94, 3268–3270.
11. March, J. (1985) Advanced Organic Chemistry: Reactions, Mechanisms, and Structure, 3rd ed., pp 220–223, Wiley, New York.
12. Jencks, W.P., & Regenstein, J. (1976) in Handbook of Biochemistry and Molecular Biology, 3rd Edition: Physical and Chemical Data (Fasman, G.D., Ed.) Vol. I, pp 305–351, CRC Press, Cleveland, OH.
13. Taft, R.W., Gal, J.F., Geribaldi, S., & Maria, P.C. (1986) J. Am. Chem. Soc. 108, 861–863.
14. Fersht, A.R. (1971) J. Am. Chem. Soc. 93, 3504–3515.
15. Capon, B., & Zucco, C. (1982) J. Am. Chem. Soc. 104, 7567–7572.
16. Zacharias, D.E., Murray-Rust, P., Preston, R.M., & Glusker, J.P. (1983) Arch. Biochem. Biophys. 222, 22–34.
17. Abrams, W.R., & Kallen, R.G. (1976) J. Am. Chem. Soc. 98, 7789–7792.
18. Cox, R.A., Druet, L.M., Klausner, A.E., Modro, T.A., Wan, P., & Yates, K. (1981) Can. J. Chem. 59, 1568–1573.
19. Schollhorn, H., Thewalt, U., & Lippert, B. (1989) J. Am. Chem. Soc. 111, 7213–7221.
20. Sambrano, J.R., de Souza, A.R., Queralt, J.J., & Andres, J. (1976) Chem. Phys. Lett. 317, 437–443.
21. Gorb, L., & Leszczynski, J. (1998) J. Am. Chem. Soc. 120, 5024–5032.
22. Good, N.E., Winget, G.D., Winter, W., Connolly, T.N., Izawa, S., & Singh, R.M. (1966) Biochemistry 5, 467–477.
23. Kresge, A.J., Liebovitch, M., & Sikorski, J.A. (1992) J. Am. Chem. Soc. 114, 2618–2622.
24. Peterson, M.R., & Csizmadia, I.G. (1979) J. Am. Chem. Soc. 101, 1076–1079.
25. Miyazawa, T., & Pitzer, K.S. (1959) J. Chem. Phys. 30, 1076–1086.
26. Allinger, N.L., & Chang, S.H.M. (1977) Tetrahedron 33, 1561–1567.
27. Blom, C.E., & Gunthard, H.H. (1981) Chem. Phys. Lett. 84, 267–271.
28. Hocking, W.H. (1976) Z. Naturforsch., A 31A, 1113–1121.
29. Li, Y., & Houk, K.N. (1989) J. Am. Chem. Soc. 111, 4505–4507.
30. Jung, M.E., & Gervay, J. (1991) J. Am. Chem. Soc. 113, 224–232.
31. Gandour, R.D. (1981) Bioorg. Chem. 10, 169–176.
32. Tadayoni, B.M., Parris, K., & Rebek, J., Jr. (1989) J. Am. Chem. Soc. 111, 4503–4505.
33. Chivers, P.T., Prehoda, K.E., Volkman, B.F., Kim, B.M., Markley, J.L., & Raines, R.T. (1997) Biochemistry 36, 14985–14991.
34. Edsall, J.T., Martin, R.B., & Hollingworth, B.R. (1958) Proc. Natl. Acad. Sci. U.S.A. 44, 505–518.
35. Jeng, M.F., Holmgren, A., & Dyson, H.J. (1995) Biochemistry 34, 10101–10105.
36. Qin, J., Clore, G.M., & Gronenborn, A.M. (1996) Biochemistry 35, 7–13.
37. Takahashi, N., & Creighton, T.E. (1996) Biochemistry 35, 8342–8353.
38. Jeng, M., & Dyson, H.J. (1996) Biochemistry 35, 1.
39. McIntosh, L.P., Hand, G., Johnson, P.E., Joshi, M.D., Korner, M., Plesniak, L.A., Ziser, L., Wakarchuk, W.W., & Withers, S.G. (1996) Biochemistry 35, 9958–9966.
40. Holler, T.P., & Hopkins, P.B. (1988) J. Am. Chem. Soc. 110, 4837–4838.
41. Zawadzke, L.E., & Berg, J.M. (1992) J. Am. Chem. Soc. 114, 4002–4003.
42. Wolfenden, R.V., Cullis, P.M., & Southgate, C.C. (1979) Science 206, 575–577.
43. Nozaki, Y., & Tanford, C. (1967) J. Biol. Chem. 242, 4731–4735.
44. Keim, P., Vigna, R.A., Morrow, J.S., Marshall, R.C., & Gurd, F.R. (1973) J. Biol. Chem. 248, 7811–7818.
45. Ballinger, P., & Long, F.A. (1960) J. Am. Chem. Soc. 82, 795–798.
46. Calvin, M. (1954) in Glutathione (Colowick, S., Lazarow, A., Racker, E., Schwartz, D. R., Stadtman, E., & Waelsch, H., Eds.) p 9, Academic Press, New York.
47. Tanford, C. (1962) Adv. Protein Chem. 17, 69–165.
48. Tanford, C. (1968) Adv. Protein Chem. 23, 121–282.
49. Martin, R.B., Edsall, J.T., Wetlaufer, D.B., & Hollingsworth, B.R. (1958) J. Biol. Chem. 233, 1429–1435.
50. McNutt, M., Mullins, L.S., Raushel, F.M., & Pace, C.N. (1990) Biochemistry 29, 7572–7576.
51. Lennette, E.P., & Plapp, B.V. (1979) Biochemistry 18, 3933–3938.
52. Edsall, J.T., & Wyman, J. (1958) Biophysical Chemistry, Vol. I, Academic Press, New York.
53. Keim, P., Vigna, R.A., Nigen, A.M., Morrow, J.S., & Gurd, F.R. (1974) J. Biol. Chem. 249, 4149–4156.
54. Westheimer, F.H. (1995) Tetrahedron 51, 3–20.
55. Stites, W.E., Gittis, A.G., Lattman, E.E., & Shortle, D. (1991) J. Mol. Biol. 221, 7–14.
56. Gratzer, W.B., & Minalyi, E. (1976) in Handbook of Biochemistry and Molecular Biology, 3rd Edition: Proteins (Fasman, G.D., Ed.) Vol. I, pp 186–191, CRC Press, Cleveland, OH.
57. Edelhoch, H. (1967) Biochemistry 6, 1948–1954.
58. Reynolds, W.F., Peat, I.R., Freedman, M.H., & Lyerla, J.R., Jr. (1973) J. Am. Chem. Soc. 95, 328–331.
59. Matthew, J.B., & Richards, F.M. (1982) Biochemistry 21, 4989–4999.
60. Gundlach, H.G., Moore, S., & Stein, W.H. (1959) J. Biol. Chem. 234, 1761–1764.
61. Muchmore, C.R., Krahn, J.M., Kim, J.H., Zalkin, H., & Smith, J.L. (1998) Protein Sci. 7, 39–51.
62. Hirs, C.H.W. (1967) Methods Enzymol. 11, 59–62.
63. Johnson, D., & Travis, J. (1979) J. Biol. Chem. 254, 4022–4026.
64. Luecke, H., & Quiocho, F.A. (1990) Nature 347, 402–406.
Chapter 3
Sequences of Polymers

By direct chemical analysis of purified proteins, it has been shown that they are composed of linear polymers of amino acids, referred to as polypeptides. These polymers are formed by a ribosome that reads the messenger RNA and converts the sequence of codons into a sequence of amino acids coupled covalently together in the dictated order. Every polypeptide begins its existence as a single polymer of amino acids of a precise length coupled in a precise order. By and large, this polymer of amino acids is conserved in the mature protein. On its way to maturity, however, various alterations can occur. One class of such alterations is the one that includes changes to the sequence of the amino acids. Short segments of amino acids are often removed from the amino-terminal or carboxy-terminal ends of the protein or cut out of the middle, leaving a broken chain. If such an alteration occurs, it causes the actual amino acid sequence of the polypeptide in a mature protein to differ from the sequence encoded in the messenger RNA.

The sequence of the amino acids in a mature polypeptide can be determined directly, but this is rarely done anymore. It is far easier to sequence the messenger RNA for the protein and translate the sequence of nucleotides into a sequence of amino acids. Because an amino acid sequence determined today is almost always the one encoded by the messenger RNA, alterations in the amino acid sequence of a protein that occur naturally during its maturation often escape detection initially. Eventually, however, most are detected, for example, as unexpected behavior of the protein upon electrophoresis or an incorrect mass on mass spectrometry, and then the sequence of the mature protein must be defined by direct analysis. This direct analysis always relies heavily on the knowledge of the amino acid sequence encoded by the messenger RNA because the lion’s share of the original amino acid sequence usually remains in the polypeptides forming the mature protein.

As part of the process that produces a mature protein, other changes are often made to the constituent polypeptides. These changes are either chemical modifications of the amino acids themselves or the attachment of other compounds to the amino acids. For the most part, these posttranslational modifications are unpredictable, and each presents a challenge in analytical chemistry. There is a series of such modifications, however, that consists of the addition of oligosaccharides to particular amino acids, and these modifications are defined by the sequences in which the sugars are linked together in these oligomers.

With the exception of the unexpected posttranslational modifications, which are relatively infrequent, defining the covalent structure of a mature polypeptide is an exercise in the sequencing of polypeptides, nucleic acids, and oligosaccharides.

Sequencing of Polypeptides

Each naturally occurring polypeptide (2–15) has its own length and its own amino acid sequence. The amino acid sequence is the order in which the side chains of the amino acids (Ri in 2–15) are arranged along the polymer. The continuous lengths of the polypeptides found in molecules of protein, and hence the lengths of their unique sequences, can be quite long. For example, human apolipoprotein B100 is 4560 amino acids (aa) long,1 human mucin MUC2 is 5159 aa long,2 and human cardiac titin is 26,926 aa long.3 The amino acid sequence of a given polypeptide is written as a word, each of whose letters stands for an amino acid. The word begins at the amino terminus, ends at the carboxy terminus, and is usually spelled correctly.

The amino acid sequence of a polypeptide determines which protein it will become. Bovine pancreatic ribonuclease can be defined as the protein produced in the pancreas of a steer that can cleave ribonucleic acid at random along its length in a reaction that leaves the phosphate on the 2′- and 3′-positions of the products, or it can be defined as the folded polypeptide, 124 amino acids long, with the amino acid sequence KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHFDASV. That the sequence is sufficient to define ribonuclease has been demonstrated by total synthesis.4 A similar demonstration was made for the peptidase from human immunodeficiency virus.5

The complete amino acid sequences of polypeptides were, in the past, determined directly. The amino acids in a polypeptide can be removed in single steps from the amino-terminal end by the Edman degradation6 (Figure 3–1).* The strategy of the Edman degradation relies upon

* From here on only those lone pairs involved in each step of a chemical mechanism will be drawn.
[Figure 3–1 reaction scheme not reproduced. Labels in the drawing: phenylisothiocyanate, aqueous base, anhydrous TFA, separate, anilinothiazolinone, convert, phenylthiohydantoin.]

Figure 3–1: Steps in the mechanism of the Edman degradation.6 Phenyl isothiocyanate is used under basic conditions to produce an N-phenyl-N′-peptidylthiourea at the amino terminus. The nucleophilic sulfur of the thiourea then can attack intramolecularly the acyl carbon of the first peptide bond, but only under conditions of strong general acid catalysis, which promote protonation of the acyl oxygen. Anhydrous trifluoroacetic acid (TFA) is used to prevent any unwanted hydrolytic side reactions at this step. The shortened polypeptide that leaves during this second step is unreactive at its amino terminus under these conditions owing to protonation. The anilinothiazolinone and the shortened polypeptide are then separated from each other. The shortened polypeptide is recycled through coupling and cleavage. The anilinothiazolinone is opened and recyclized in aqueous acid to produce the phenylthiohydantoin of the first amino acid.
the separation of the chemistry into two discrete, controlled steps (the two labeled steps in Figure 3–1) that permit the removal of one amino acid at a time from the polypeptide as the phenylthiohydantoin. The phenylthiohydantoins from each step can be positively identified on chromatography by adsorption.7

Only in fortuitous circumstances, however, can the Edman degradation be run for more than 20 or 30 cycles. The necessity for two steps in each cycle as well as the step separating the shortened polypeptide from the thiazolinone, none of which can be performed in 100% yield, causes the cumulative yield of phenylthiohydantoin to decrease inexorably and noise to increase apace. Side reactions such as random hydrolysis of the polypeptide and cyclization of amino-terminal glutamines to pyrrolidones8 also increase noise and lower yield, respectively. Methods for sequencing a polypeptide from its carboxy terminus9 and alternative methods for sequencing one from its amino terminus10 have been described. So far, the former have been far less reliable than the Edman degradation and the latter have been supplanted by automated machines the chemistry of which relies on the Edman degradation.7 These machines are able to provide a sequence from tens of picomoles of a peptide, but they have not overcome the inherent shortcomings of the chemistry.

In its present applications, the automated Edman degradation is performed on peptides or polypeptides noncovalently11 attached to thin membranes of glass fiber7 or poly(vinylidene difluoride).12 Because the peptide remains bound to a solid phase, the reagents, in solution or as gases, can be sequentially applied to and removed from the peptide efficiently. It is also possible to transfer polypeptides that have been separated by electrophoresis onto these supports and then submit them to sequencing.12–14

Because polypeptides cannot be sequenced in their entirety by the Edman degradation, they are cleaved into pieces, or peptides, that can be. This cleavage can be performed with endopeptidases that hydrolyze the peptide bonds at the locations of specific amino acid residues in the sequence (Figure 3–2).15–25 All of these enzymes, except the papain from Zingiber officinale, have been used to digest long polypeptides specifically during elucidations of their complete amino acid sequences. Because these enzymes cleave only peptide bonds adjacent to specific amino acids, high yields of a reasonable number of peptides, each with a specific sequence, can be obtained from a long polypeptide.

If polypeptides are to be cleaved by endopeptidases, they must be unfolded or denatured. A folded, compact molecule of protein is usually resistant to digestion by endopeptidases for steric reasons. Although the most common way to denature a protein to prepare it for digestion is to precipitate it irreversibly at high temperature, this approach can fail. If it does, denaturing the protein that is to be cleaved without simultaneously denaturing the endopeptidase, which is itself a protein, requires some strategy. Usually the chemical modification of one type of amino acid in the polypeptide while it is unfolded in a solution of a salting-in solute such as urea is sufficient to prevent it from refolding after the denaturant is removed. The carboxymethylation of cysteines with 2-iodoacetate, after the cystine side chains in the protein have been reduced,26 and the maleylation of lysines17,27 are examples of this strategy. When proteins that are normally embedded in biological membranes are removed from the membrane, their polypeptides often remain soluble and unfolded and can be cleaved with endopeptidases.28 Some endopeptidases are themselves quite stable and will function in solutions of denaturants sufficient to unfold the protein to be cleaved.

At times it is useful to cleave a polypeptide at only one or two specific locations in its sequence so that long fragments can be isolated from it. The most common way that this is done is to take advantage of the resistance of the native, properly folded protein to digestion by endopeptidases. The consequence of this resistance is that when a properly folded protein is treated with an endopeptidase such as trypsin or chymotrypsin, often only one or two of its peptide bonds are exclusively hydrolyzed, and this hydrolysis produces the long fragments desired. Because this is completely the result of steric effects, no control over the location of the sites of cleavage, other than that exerted by the intrinsic specificity of the endopeptidase, can be exercised.

Polypeptides can also be cleaved chemically. The paradigm of chemical cleavages is that produced to carboxy-terminal sides of methionines by cyanogen bromide (Figure 3–3).29 Several other chemical cleavages of more limited usefulness have been developed. 2-Nitro-5-thiocyanatobenzoate induces cleavage on the amino-terminal side of cysteine residues (Figure 3–4), but the yield is less than quantitative and the amino terminus of the carboxy-terminal product is blocked.30 Cleavage at tryptophan residues can be performed chemically with brominating agents under heterolytic conditions:31

[Reaction scheme not reproduced] (3–1)

This reaction proceeds through a bromonium cation that results from insertion of Br+ into the olefin between carbons 2 and 3 of the indole to create an electrophilic center. A nucleophilic attack of the acyl oxygen five atoms away then occurs as in the cleavage with cyanogen bromide. The resulting iminolactone hydrolyzes as it does in the cleavage with cyanogen bromide to release a fragment with a free amino terminus from the carboxy-terminal side of the tryptophan. The olefin between carbons 2 and 3 in indole is an easily brominated position, and the mildest brominating agent capable of reacting at this location should be used under the mildest conditions to avoid widespread bromination of the polypeptide elsewhere.32

A chemical cleavage that can produce large fragments from a polypeptide is the cleavage that occurs preferentially at the peptide bond between an aspartate and a proline under mildly acidic conditions (Figure 3–5).33 This cleavage with acid results from intramolecular attack of the carboxylate anion of the 3-carboxy group of the aspartate on its own acyl carbon, the acyl oxygen of which has been protonated, to produce, upon departure of the amide nitrogen of the proline, an anhydride, which is subsequently hydrolyzed.34 The cleavage occurs preferentially at proline because the amine in the initial tetravalent intermediate is by far the poorer leaving group, but proline, because it is a hindered secondary amine, is the best leaving group of all the amino acids.
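The argument made earlier in this section, that no step of the Edman degradation proceeds in 100% yield and so the cumulative yield falls with every cycle, can be put into numbers. The sketch below is only an illustration: the per-cycle (repetitive) yields and the signal floor of 25% are assumed values chosen for the example, not figures from the text, and the function names are hypothetical.

```python
def cumulative_yields(repetitive_yield: float, cycles: int) -> list[float]:
    """Fraction of the starting peptide recovered as phenylthiohydantoin in each cycle.

    Assumes every cycle proceeds with the same fractional yield.
    """
    return [repetitive_yield ** n for n in range(1, cycles + 1)]


def readable_cycles(repetitive_yield: float, floor: float) -> int:
    """Number of cycles completed before the cumulative yield drops below `floor`."""
    if not (0.0 < repetitive_yield < 1.0 and 0.0 < floor < 1.0):
        raise ValueError("yield and floor must both lie strictly between 0 and 1")
    n = 0
    while repetitive_yield ** (n + 1) >= floor:
        n += 1
    return n


if __name__ == "__main__":
    # Assumed repetitive yields; the 0.25 floor is likewise an assumption.
    for y in (0.90, 0.95, 0.98):
        last = cumulative_yields(y, 30)[-1]
        print(f"yield per cycle {y:.2f}: after 30 cycles {last:.3f}, "
              f"cycles above floor {readable_cycles(y, 0.25)}")
```

With an assumed yield of 95% per cycle the signal stays above the assumed floor for 27 cycles, which is in line with the statement that runs of more than 20 or 30 cycles are fortuitous.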
[Figure 3–2 drawing not reproduced. Enzymes labeled in the drawing: peptidyl-Asp metalloendopeptidase, papain from Z. officinale, arginyl endopeptidase, trypsin, lysyl endopeptidase, glutamyl endopeptidase, thermolysin, chymotrypsin.]

Figure 3–2: Specific cleavage of a polypeptide with endopeptidases. Pancreatic trypsin hydrolyzes the peptide bonds on the carboxy-terminal sides of lysine and arginine residues with high specificity to produce a series of peptides. Each of these peptides has the respective lysine or arginine at its carboxy terminus.15 The lysine side chains can be rendered incapable of being recognized by trypsin by modification with succinic anhydride, maleic anhydride,17 or citraconic anhydride.16 The latter two modifications are reversible, and the lysines can be regenerated, after cleavage with trypsin, to yield a series of unmodified peptides the carboxy-terminal residues of which are the respective arginines. Glutamyl endopeptidase (Glu-C) from the bacterium Staphylococcus aureus, strain V8, hydrolyzes polypeptides with high specificity at the peptide bonds on the carboxy-terminal sides of glutamate residues.18 Under the proper conditions, the same enzyme also can be made to hydrolyze the bonds on the carboxy-terminal side of aspartate residues. Thermolysin, an endopeptidase from the bacterium Bacillus thermoproteolyticus, hydrolyzes polypeptides at peptide bonds on the amino-terminal sides of leucine, isoleucine, valine, phenylalanine, methionine, and occasionally alanine and tyrosine.19 Pancreatic chymotrypsin usually catalyzes the hydrolysis of the amide bonds on the carboxy-terminal sides of phenylalanine, tyrosine, and tryptophan.20 Lysyl endopeptidase (Lys-C) from either of the bacteria Achromobacter lyticus21 or Lysobacter enzymogenes22 hydrolyzes polypeptides with high specificity at the peptide bonds on the carboxy-terminal sides of lysines. Arginyl endopeptidase (Arg-C) from murine submaxillary gland hydrolyzes polypeptides at the peptide bonds on the carboxy-terminal sides of arginines.23 Peptidyl-Asp metalloendopeptidase (Asp-N) from the bacterium Pseudomonas fragi hydrolyzes polypeptides at the peptide bonds on amino-terminal sides of aspartate residues24 and, occasionally, glutamate residues. Papain from Zingiber officinale hydrolyzes peptides at the next peptide bond beyond the one to the carboxy-terminal side of proline residues with little preference for the amino acids immediately adjacent to the peptide bond that is cleaved.25
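The specificities listed in the caption of Figure 3–2 can be turned into a small in-silico digestion. This is a minimal sketch under simplifying assumptions: each enzyme is reduced to "cut on the carboxy-terminal side" or "cut on the amino-terminal side" of a fixed set of residues, the occasional cleavages mentioned in the caption are ignored, and every susceptible bond is assumed to be hydrolyzed completely. The table name `SPECIFICITY` and the function `digest` are hypothetical; the test sequence is the bovine pancreatic ribonuclease sequence quoted in the text.

```python
# Simplified rules read from the caption of Figure 3-2.
# "C": the bond on the carboxy-terminal side of the residue is hydrolyzed.
# "N": the bond on the amino-terminal side of the residue is hydrolyzed.
SPECIFICITY = {
    "trypsin": ("C", "KR"),
    "Lys-C": ("C", "K"),
    "Arg-C": ("C", "R"),
    "Glu-C": ("C", "E"),
    "chymotrypsin": ("C", "FYW"),
    "Asp-N": ("N", "D"),
    "thermolysin": ("N", "LIVFM"),
}

RNASE_A = (
    "KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFV"
    "HESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDC"
    "RETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHFDASV"
)


def digest(sequence: str, enzyme: str) -> list[str]:
    """Return the peptides from a complete digestion, amino terminus first."""
    side, residues = SPECIFICITY[enzyme]
    cuts = []
    for i, aa in enumerate(sequence):
        if aa in residues:
            cut = i + 1 if side == "C" else i
            # A cut at either end of the chain would release nothing.
            if 0 < cut < len(sequence):
                cuts.append(cut)
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    for name in ("trypsin", "Glu-C", "Asp-N"):
        peptides = digest(RNASE_A, name)
        print(name, len(peptides), "peptides:", peptides)
```

Comparing the peptide sets from two enzymes of different specificity is what allows the fragments to be placed in order, since a peptide from one digest overlaps the junctions between peptides from the other.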
[Figure 3–3 reaction scheme not reproduced. Labels in the drawing: cyanosulfonium cation, iminolactone.]

Figure 3–3: Mechanism of cyanogen bromide cleavage of a polypeptide on the carboxy-terminal side of a methionine. At acidic pH, a methionine side chain, because it is not protonated, remains nucleophilic enough to react in an acyl exchange reaction with cyanogen bromide to produce a cyanosulfonium cation. This cationic center causes the carbon of the adjacent methylene to be electrophilic. This electrophile is five atoms away from the weakly nucleophilic acyl oxygen of the same amino acid, and an intramolecular, nucleophilic substitution ensues. The conjugate acid of the iminolactone formed in this nucleophilic substitution is susceptible to hydrolysis under the acidic conditions. This hydrolysis produces a mixture of the lactone and the open γ-hydroxycarboxylic acid of homoserine at the carboxy terminus of the resulting peptide.
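As a worked instance of the cleavage in Figure 3–3, the sketch below splits the ribonuclease sequence quoted earlier after each methionine. It assumes complete cleavage at every methionine, which real reactions do not guarantee, and the function name `cnbr_fragments` is hypothetical. Following the caption, the carboxy-terminal residue of every fragment except the last is printed as homoserine, written here as `(Hse)` to stand for the mixture of lactone and open acid.

```python
import re

RNASE_A = (
    "KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFV"
    "HESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDC"
    "RETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHFDASV"
)


def cnbr_fragments(sequence: str) -> list[str]:
    """Split a one-letter sequence after every methionine (M)."""
    # Either a run ending in M, or the methionine-free carboxy-terminal remainder.
    return re.findall(r"[^M]*M|[^M]+$", sequence)


if __name__ == "__main__":
    for fragment in cnbr_fragments(RNASE_A):
        # Methionine at the new carboxy terminus has been converted to homoserine.
        shown = fragment[:-1] + "(Hse)" if fragment.endswith("M") else fragment
        print(len(fragment), shown)
```

The two adjacent methionines in this sequence give a fragment that is a single residue long, a reminder that the number of useful fragments can be smaller than the number of methionines plus one.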
[Figure 3–4 reaction scheme not reproduced.]

Figure 3–4: Cleavage of a polypeptide to the amino-terminal side of cysteine by cyanylation with 2-nitro-5-thiocyanatobenzoate.
[Figure 3–5 reaction scheme not reproduced.]

Figure 3–5: Cleavage of a polypeptide at the peptide bond between an aspartate and a proline under acidic conditions.
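The usefulness of the Asp-Pro cleavage of Figure 3–5, and of the Asn-Gly cleavage with hydroxylamine, for producing large fragments rests on how rarely those dipeptides occur. The sketch below counts both in the ribonuclease sequence quoted earlier and compares the counts with a crude expectation. The expectation assumes that all 20 amino acids are equally frequent and independent of their neighbors, which is an assumption made only for this illustration and not a claim from the text; the function names are hypothetical.

```python
RNASE_A = (
    "KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFV"
    "HESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDC"
    "RETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHFDASV"
)


def count_dipeptide(sequence: str, pair: str) -> int:
    """Count occurrences of a two-residue sequence, overlapping ones included."""
    return sum(1 for i in range(len(sequence) - 1) if sequence[i:i + 2] == pair)


def expected_count(length: int, n_types: int = 20) -> float:
    """Expected count of one specific dipeptide in a random chain.

    Assumes equal frequencies and no neighbor preferences (illustrative only).
    """
    return (length - 1) / n_types ** 2


if __name__ == "__main__":
    for pair, name in (("DP", "Asp-Pro"), ("NG", "Asn-Gly")):
        print(name, count_dipeptide(RNASE_A, pair),
              "observed;", round(expected_count(len(RNASE_A)), 2), "expected")
```

Under this assumption a given dipeptide is expected about once in every 400 positions, so a polypeptide of a few hundred residues would be cut at only a handful of sites, if at all.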
[…] and a glycine can be produced with hydroxylamine at alkaline pH and elevated temperature.37,38 Both the cleavage between aspartate and proline and the cleavage between asparagine and glycine produce large fragments of the polypeptide because the frequency at which aspartylprolyl and asparaginylglycyl positions occur within the amino acid sequence of a protein is low.

Each of these enzymatic or chemical cleavages produces a particular set of peptides from a given polypeptide, and the complex mixtures that result must be separated by chromatography. Chromatography by molecular exclusion can be used to separate the mixture into groups of peptides of different lengths (Figure 3–6).39 The larger peptides from this first step can be further separated on chromatography by ion exchange with matrices of cellulose or dextran.40 Because these larger peptides often aggregate or precipitate, these columns are generally run in solutions of trifluoroacetic acid41 or formic acid42 or denaturants such as urea. At high or low pH, the net charges on all of the peptides are negative or positive, respectively, and aggregation is discouraged by mutual electrostatic repulsion. Large peptides can also be made more soluble by modification of all the lysine side chains with citraconic anhydride to increase their net negative charge at neutral and basic pH.42

[Figure 3–6 plot not reproduced: absorbance at 230 nm and at 280 nm against fraction number, with pools I to X marked.]

Figure 3–6: Separation of peptides produced by cleavage of S-carboxymethylated human phosphoglycerate kinase with cyanogen bromide.39 The protein (50 mg) was dissolved in 70% formic acid and solid cyanogen bromide was added to a final concentration of 20 mg mL⁻¹. After 24 h, the solution was frozen and the water and cyanogen bromide were removed by sublimation. The cyanogen bromide fragments (50 mg) were applied to a column (1.9 cm × 150 cm) of Sephadex G-75 run in 0.2 M ammonium bicarbonate. The fractions of the effluent were monitored by absorbance at 230 nm and 280 nm. Pools (I–X) were made as indicated. The numbers indicate which fragments, identified later in other separations, were in each pool. Reprinted with permission from ref 39. Copyright 1980 Journal of Biological Chemistry.

The smaller peptides, either those isolated first on chromatography by molecular exclusion or those in the whole digest, can be separated on chromatography by cation exchange with sulfonated polystyrene15,43 or high-pressure liquid chromatography by reverse-phase adsorption under acidic conditions on alkylated silica gel (Figure 3–7).36,41 The latter method can also be used to separate large peptides such as cyanogen bromide fragments.41 The resolution obtained with chromatography by cation exchange and that obtained with high-pressure chromatography by adsorption are similar, but the latter has become the method of choice because of its rapidity, the continuous spectrophotometric monitoring it permits, and its adaptability to samples containing small quantities of peptide. In all cases, the art of the chromatography lies in choosing solvents and buffers that will dissolve the peptides, meet the demands of the chromatographic process chosen for the separation, and be easily removed from the peptides after they have been separated.

[Figure 3–7 plot not reproduced: axes labeled acetonitrile (%) and time (min).]

Figure 3–7: Separation of peptides from cytochrome c peroxidase on chromatography by adsorption.36 The hemoprotein cytochrome c peroxidase from Paracoccus denitrificans was dissolved in 8 M urea containing HgCl2, and after 20 h, the heme was separated from the protein by molecular exclusion chromatography performed in 5% formic acid. The solvent was evaporated and the resulting solid protein (30 nmol) was suspended in 0.1 M ammonium hydrogen carbonate. Lysyl endopeptidase from L. enzymogenes (30 mg) was added to the suspension, and after 4 h at 37 °C, the solution had clarified. The sample was evaporated to dryness and redissolved in a dilute solution of trifluoroacetic acid. The peptides were injected onto a column (0.46 × 25 cm) of a reverse-phase chromatographic medium of octadecylated silica equilibrated with 0.1% trifluoroacetic acid. The peptides were eluted with a linear gradient between 0.1% trifluoroacetic acid and 70% acetonitrile, 0.1% trifluoroacetic acid (solvent B).41 Peptides were detected by their absorbance at 220 nm. Peaks were pooled as indicated. Reprinted with permission from ref 36. Copyright 1997 American Chemical Society.

Once the peptides have been purified, their amino acid composition can be determined by hydrolysis,36 performed under vacuum in 6 M HCl, followed by quantitative cation-exchange chromatography with sulfonated polystyrene (Figure 1–3). In this way, if the peptide is pure and not too long, the amount of each amino acid it contains can be determined. Usually, however, the peptides are sequenced directly because procedures for sequencing by automated Edman degradation7 have become more sensitive than procedures for amino acid analysis.

Exopeptidases, such as carboxypeptidase A,44 carboxypeptidase B,45 serine-type carboxypeptidase,46 or leucyl aminopeptidase,47 can be used to assist in determining or confirming the sequence of a peptide. These […] these enzymes alone.

A strategy similar to those just described has been developed for the mass spectrometric analysis of mixtures of peptides produced by digesting a protein.49

A mass spectrometer is an instrument that separates a population of ionic molecules in the gas phase in the order of their mass to charge number ratio (m/z). The ionic molecules, after they have been separated by the mass spectrometer, can be registered individually by a detector to produce a mass spectrum (Figure 3–8),49 which records the amount of each ion in the sample as a function of its mass to charge number ratio. A mass spectrometer can also be used to select only ions of a particular mass to charge number ratio, which can then be directed into another instrument. Quadrupole mass spectrometers and ion-trap mass spectrometers separate the ionic molecules by passing them through specifically designed, oscillating electric fields. Time-of-flight (TOF) mass spectrometers accelerate the entire population of ionic molecules in a uniform electric field and then pass them through a vacuum chamber. Because $e E_x x = \tfrac{1}{2} m z^{-1} v^{2}$ and the electric field ($E_x$) accelerates all of the ionic molecules over the same distance ($x$), the time it takes each of them to arrive at the end of the chamber is proportional to the square root of its mass to charge ratio.

There are currently three ways to transfer a biological molecule such as a peptide, an oligosaccharide, a nucleic acid, or a molecule of protein from the aqueous solution in which it is normally found to the gas phase in the form of a monodisperse vapor of individual, ionized molecules.

The first method is to pass a dilute aqueous solution containing the macromolecule through an electrospray atomizer,50 which produces a mist or electrospray so fine that each macromolecule finds itself in its own droplet. The solvent in the droplet evaporates and leaves the intact macromolecule in the gas phase bearing one or more of the elementary positive charges or negative charges that were generated on the surface of the droplet by the atomizer. For proteins and oligosaccharides, the atomizer is usually polarized to produce positive ions, while for nucleic acids, which are already negatively charged in solution, it is polarized to produce negative ions. The elementary charges generated by the atomizer are the result of an excess or a deficit of protons, just as charge is produced on a macromolecule in solution (Figure 1–11). Electrospray produces a family of ions from each macromolecule, each one of the ions differing
[Figure 3–8 spectrum not reproduced: relative abundance against m/z (100 to 1200), with peaks labeled as a, b, and y fragments and two rows of residue assignments printed above the steps (Ser + Gln, Xle, Thr, Ala, Phe, Leu, Asp, Ser, Asn, Xle; and Leu, Phe, Ala, Thr, Xle, Gln, Ser).]

Figure 3–8: Mass spectrometry of a tryptic peptide from thioredoxin.49 Thioredoxin from Chromatium vinosum was dissolved in 6 M guanidinium chloride and 0.1 M tris(hydroxymethyl)aminomethane, pH 8.5. The cystines in the protein were reduced with dithiothreitol, and the resulting cysteines were alkylated with iodoacetamide. The product was separated from the small molecules by molecular exclusion chromatography and evaporated to dryness, and a portion (50 nmol) of the dry powder was suspended in 0.1 M ammonium hydrogen carbonate and 0.1 mM CaCl2. Bovine pancreatic trypsin (12 mg) was added to the suspension and the digestion proceeded for 2 h at 37 °C. The peptides produced by the digestion were collected by evaporating the solvent, and they were redissolved in a dilute solution of trifluoroacetic acid and injected into a column of octadecylated silica equilibrated with 0.05% trifluoroacetic acid. They were eluted with a linear gradient from 0% to 50% acetonitrile in 0.05% trifluoroacetic acid. The peptides in one of the 10 pools of peaks from this chromatographic step were vaporized by fast-atom bombardment from a matrix of glycerol and passed into a tandem mass spectrometer. The beam of monocationic (M + H+) peptide ions of mass 1208.2 Da was selected, fragmented by a beam of helium atoms of high kinetic energy, and passed into the second mass spectrometer. The abundances of the various fragments produced are displayed as a function of their mass. The fragment patterns are labeled as in Equation 3–2, and the amino acids, identified by the distances in mass units between each of the steps, are indicated above the respective steps. Fragments are produced by cleavage at successive points from each end of the peptide. Reprinted with permission from ref 49. Copyright 1987 American Chemical Society.
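The caption of Figure 3–8 states that the amino acids are identified by the distances in mass units between successive steps. A minimal sketch of that reading is given below. The residue masses are standard monoisotopic values supplied here, not numbers taken from the text; leucine and isoleucine share one mass and are entered as Xle, as in the figure; the ladder is generated from a made-up list of residues and is not the published spectrum; and the function names and the tolerance of 0.01 Da are assumptions of this example. Cysteine is listed unmodified, whereas the cysteines in the experiment of Figure 3–8 were alkylated.

```python
PROTON = 1.00728  # mass of a proton in Da

# Monoisotopic masses of amino acid residues in Da (standard values, supplied here).
RESIDUE_MASS = {
    "Gly": 57.02146, "Ala": 71.03711, "Ser": 87.03203, "Pro": 97.05276,
    "Val": 99.06841, "Thr": 101.04768, "Cys": 103.00919, "Xle": 113.08406,
    "Asn": 114.04293, "Asp": 115.02694, "Gln": 128.05858, "Lys": 128.09496,
    "Glu": 129.04259, "Met": 131.04049, "His": 137.05891, "Phe": 147.06841,
    "Arg": 156.10111, "Tyr": 163.06333, "Trp": 186.07931,
}


def b_series(residues: list[str]) -> list[float]:
    """m/z of the singly charged amino-terminal (b-type) fragments of a peptide."""
    total = PROTON
    ladder = []
    for name in residues:
        total += RESIDUE_MASS[name]
        ladder.append(total)
    return ladder


def read_ladder(mz: list[float], tolerance: float = 0.01) -> list[str]:
    """Assign a residue to each step between successive fragments.

    A step is reported as "?" unless exactly one residue mass fits it.
    """
    calls = []
    for lower, upper in zip(mz, mz[1:]):
        step = upper - lower
        fits = [n for n, m in RESIDUE_MASS.items() if abs(m - step) <= tolerance]
        calls.append(fits[0] if len(fits) == 1 else "?")
    return calls


if __name__ == "__main__":
    ladder = b_series(["Ser", "Gln", "Xle", "Thr", "Ala", "Phe"])  # hypothetical peptide
    print([round(x, 3) for x in ladder])
    print(read_ladder(ladder))
```

The first residue cannot be read from a difference, only from the mass of the smallest fragment itself, which may be why the first assignment in the figure is printed as the pair Ser + Gln.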
from the others in the number of elementary charges that it bears. For example, ions of cytochrome c (naa = 104) carrying between 11 and 21 elementary positive charges were generated by such an atomizer.50

The other two methods rely on the initial monodispersion of the individual macromolecule into a solid glass or liquid of low volatility referred to as a matrix. The matrix is formed from a small molecule such as nicotinic acid, a solid, or glycerol, a liquid. An aqueous solution of the macromolecule, at a low molar concentration, and the molecule that will form the matrix itself, at a high molar concentration, is applied to a solid surface, and the water is evaporated to produce the dilutely occupied matrix. There are two ways to shatter the matrix and in the process eject the macromolecules within it into the gas phase. A beam of neutral argon atoms of high kinetic energy can be directed onto the matrix (fast-atom bombardment, FAB), and the explosive collisions of these atoms with the matrix vaporize the macromolecules as a mixture of mainly neutral intact molecules, monoprotonated neutral molecules (monocations), and singly unprotonated neutral molecules (monoanions) dispersed in the gas phase.51,52 Alternatively, a neodymium-YAG laser emitting light of wavelength 266 nm, which is absorbed by the molecules of the matrix, for example, nicotinic acid, can be directed onto the sample (matrix-assisted-laser-desorption ionization, MALDI).53 The heat evolved from the absorption of the light by the matrix produces an explosive vaporization of its top layer, ejecting the macromolecules into the gas phase mostly as monocations53 and presumably neutral molecules and monoanions as well.

Electrospray and fast-atom bombardment are both continuous processes that produce a continuous flux of ionic molecules. This continuous flux can be directed into a quadrupole mass spectrometer to produce continuous streams of separated ionic molecules. Matrix-assisted-laser-desorption ionization, however, is accomplished with short pulses of the laser (<10 ns) to avoid overheating the sample. As a result, the source emits short pulses of ionic molecules, and each pulse contains only a few of each of the individual ionized molecules. In such a situation, a time-of-flight mass spectrometer, which requires a pulsed source, is usually used to separate the gaseous ionic molecules.

Each of these three procedures has its advantages and, unfortunately, its disadvantages. To its detriment, electrospray produces a mass spectrum in which each macromolecule is represented by an envelope of many individual peaks. For example, the envelope for α-amylase contained more than 30 peaks, each representing a molecule of α-amylase with a particular number
(30–60) of elementary positive charges.50 These large numbers of peaks generated from each molecule complicate the analysis of mixtures of molecules such as the mixtures of peptides obtained from endopeptidolytic digestion of a protein. Electrospray, however, is the mildest method for producing a high yield of a vapor of ionized molecules, and large molecules of protein can be vaporized. Matrix-assisted-laser-desorption ionization and fast-atom bombardment both produce mainly monocations or monoanions of a molecule, thereby providing only one unambiguous molecular ion for each molecule. The former technique is able to vaporize significantly larger molecules (up to 200,000 Da) than the latter (up to 20,000 Da),50 but the former has the disadvantage that the yield of ions for each pulse is low. Nevertheless, mass spectrometry has become a routine procedure, and in the sequencing of peptides it is rapidly supplanting chemical sequencing based on the Edman degradation.

When mass spectrometry is applied to dissecting proteins and sequencing the resulting peptides,49 the polypeptide of the protein is first digested with an endopeptidase. Usually the digest is then separated with one chromatographic step (Figure 3–7). Each pool of a peak from the chromatogram is subjected to vaporization, and the flux of cations produced is directed into a tandem mass spectrometer. In addition to being able to register the mass of each peptide in the pool, the first mass spectrometer of the tandem can choose the stream of only one of the ionic molecules and hence only one of the monocationic peptides. This beam of purified peptide cations is passed through an orthogonal beam of helium atoms of high kinetic energy that cleave the molecules of peptide by collision-induced dissociation (CID) into characteristic fragments:

[Equation 3–2: structural scheme in which a tetrapeptide with side chains R1 to R4 is cleaved (→) into the two fragments labeled a2 and x2.]    (3–2)

These fragment ions are then passed into the second mass spectrometer of the tandem, which can be either a quadrupole mass spectrometer or a time-of-flight mass spectrometer. The resulting pattern of masses that is observed (Figure 3–8) is a set of four separate arrays (a1–an, b1–bn, y1–yn, and x1–xn), one from each type of fragmentation (Equation 3–2). The number of mass units between each step in each of these arrays provides the sequence of the peptide. In this procedure, the first mass spectrometer performs the separation of the peptides in each chromatographic pool that would normally be performed by subsequent steps of chromatography, and the second mass spectrometer performs the sequencing that would normally be performed by automated Edman degradation.

If only the identity of the polypeptide is desired, not its complete sequence, it is possible to slice a band containing that polypeptide from a polyacrylamide gel, digest it with trypsin, and introduce the entire digest into a tandem mass spectrometer without performing the initial chromatography. Peptide ions that are well resolved by the first mass spectrometer can be selected for fragmentation, and the pattern of the fragments obtained provides the amino acid sequence of those peptides.54 In this way, a protein appearing on an electrophoretogram can be positively identified from the amino acid sequences of many of its constituent peptides.

The grand strategy for determining the complete sequence of a polypeptide directly is to separate and sequence all of the peptides from one particular cleavage, to cleave the protein at a set of different locations, to identify all of the peptides in this second set that contain the points of cleavage for the first set, and to sequence these overlapping peptides to learn the order in which the first peptides are arranged in the intact polypeptide. The dramatic epics,55–62 in each of which this strategy was applied to another protein and its sequence was revealed, are now seldom produced.36 The expectation and excitement surrounding each of them is only dimly remembered. In their place are myriads of short essays that present the sequences of often long polypeptides.

thioredoxin from Chromatium vinosum determined by high-performance tandem mass spectrometry, Biochemistry 26, 1209–1214.

Problem 3–1: Write a complete mechanism for the following chemical reaction.10 Draw in important lone pairs
and indicate the combination of nucleophiles and electrophiles with arrows. Use protons where appropriate. For what purpose is this chemical reaction used? Write the step-by-step cycle for using this reaction to accomplish this purpose.

[Reaction scheme; only its labels survive. A sulfur-containing reagent bearing a CH3 group and a carboxylate reacts with the amino terminus (H2N–) of a peptide with side chains R1, R2, and R3. First step: pH 9.3, giving a thiol (–SH) and a product marked "?". Second step: CF3COOH, giving products that include HSCH2COOH.]

Problem 3–2: Write a complete mechanism for this reaction:

N-(α-aspartyl)phenylalanine → aspartic acid + phenylalanine    (HCl, 110 °C)

Problem 3–3: A cyanogen bromide fragment has been purified from a digest of a certain protein. Consider the following information. The compositions shown in parentheses are those obtained following complete acid hydrolysis in 6 M HCl, 110 °C, for 24 h.
(A) Complete acid hydrolysis
(1) (E, F, 2 G, homoserine (Hse), K, L, R, S, V)
(B) Amino terminus
(2) (V)
(C) Amino acid composition of peptides from tryptic digest
(3) (E, G, R, V)
(4) (F, G, K, S)
(5) (Hse, L)
(D) Edman degradation
            cycle
peptide     1   2
(3)         V   E
(4)         F   S
(E) Reaction with 2,3-butanedione, followed by tryptic digest
(6) (E, F, 2 G, K, R, S, V)
(7) (Hse, L)
What is the sequence of the fragment? With which amino acid side chain does 2,3-butanedione react?

Problem 3–4: Deduce the sequence of an unknown peptide from the following information.
(A) Amino acid composition of intact peptide
(A, 2 E, G, L, K, R, 2 S, T)
(B) Tryptic peptides
(1) (E, T)
(2) (G, K, S)
(3) (A, E, L, R, S)
(C) Trypsin followed by one cycle of Edman degradation yields the phenylthiohydantoins of S, A, and T.
(D) Peptides produced by digestion with thermolysin
(4) (A, G, K, 2 S)
(5) (2 E, L, R, T)
(E) At pH 8.0, tryptic peptide 3 moved on electrophoresis with a positive charge

Problem 3–5: Deduce the sequence of a peptide from the following information.
(A) Tryptic peptides
(1) (A, E, F)
(2) (Q, S, R, V)
(3) (H, K, V)
(B) Carboxypeptidase A
(4) A then F and E
(C) Modification with methyl acetimidate followed by trypsin
(5) (H, K, Q, R, S, 2 V)
(6) (A, E, F)
(D) Amino-terminal amino acids
peptide (1), F
peptide (2), V
peptide (3), H
(E) Edman degradation
            cycle
peptide     1   2   3
(2)         V   S   Q

Problem 3–6: What are the expected masses of the 28 cations produced by fragmentation of the protonated gaseous cation (M + H+) of the peptide GGEVEATK?49

Cloning, Sequencing, Expressing, and Mutating of Deoxyribonucleic Acids

Nucleic acids are linear polymers (see 3–1 below) the monomers of which are nucleoside 5′-monophosphates. The covalent bonds that link the monomers together to form the polymer are the oxygen–phosphorus bonds that connect the 3′-hydroxyl group of one nucleoside and the 5′-phosphoryl group of the next. Each of these bonds produces a diester of the respective phosphoryl group (a phosphodiester linkage). Nucleic acids are divided structurally and biologically into ribonucleic acids (RNA), which have a 2′-hydroxyl group on each of their furanosyl rings as in 2–9 to 2–12, and deoxyribonucleic acids (DNA), which are unsubstituted at the 2′-position of their furanosyl rings as in 3–1. Aside from this distinction, every nucleic acid has the same polymer backbone. One end of a molecule of single-stranded nucleic acid is a phosphorylated 5′-hydroxyl group (5′-phosphate); the other end is a 3′-hydroxyl group. The 5′-phosphate and 3′-hydroxyl group are the 5′-end and 3′-end, respectively, of the polymer. At the pH usually encountered in living organisms (pH 7–8), the oxygens on each of the phosphoryl diesters in the backbone of a nucleic acid are unprotonated and each monomer bears a full elementary negative charge except for the monomer at the 5′-end, the phosphate of which bears an average of between 1.5 and 2 elementary negative charges, depending on the exact pH.

[Structure 3–1: a single strand of DNA drawn from its 5′-end to its 3′-end. 2′-Deoxyribose rings carrying the bases R1, R2, …, Ri, …, Rn–1, Rn on their 1′-carbons are joined through phosphodiester linkages between the 3′- and 5′-positions; the 5′-end bears a phosphate and the 3′-end a hydroxyl group.]

There are four nucleoside 5′-monophosphates incorporated into a particular nucleic acid as it is synthesized biologically. These four nucleoside 5′-monophosphates are distinguished by the heterocyclic bases they contain (Ri in 3–1). Cytosine (C) is the base in the nucleosides cytidine (2–10) and 2′-deoxycytidine, guanine (G) is the base in the nucleosides guanosine (2–11) and 2′-deoxyguanosine, and adenine (A) is the base in the nucleosides adenosine (2–12) and 2′-deoxyadenosine. The ribonucleoside 5′-monophosphates are incorporated into RNA, and the 2′-deoxyribonucleoside 5′-monophosphates are incorporated into DNA. Uracil (U) is incorporated into RNA on the 5′-monophosphate of the nucleoside uridine (2–9). Uridine, however, is converted by dehydroxylation and methylation into thymidine, the 2′-deoxyribonucleoside of 5-methyluracil, before its 5′-monophosphate is incorporated into DNA. The base 5-methyluracil is called thymine (T).

Within each nucleoside, the base is attached to its respective ribose or 2′-deoxyribose in an azaacetal (N-glycosidic) linkage (see 2–9 to 2–12) between a pyridine nitrogen or an imidazole nitrogen of the pyrimidine or purine, respectively, and the aldehydic carbon at position 1 in the furanose ring. In the unpolymerized nucleoside phosphates, a monophosphate, diphosphate, or triphosphate group is found on the 5′-carbon. A nucleoside 5′-monophosphate, 5′-diphosphate, or 5′-triphosphate is referred to as a nucleotide.

Each nucleic acid has its own length and its own sequence in which the nucleotide bases, Ri in 3–1, are arranged. The sequence of a nucleic acid is written as a word, each of whose letters stands for the base of the respective nucleotide. Unless otherwise noted, the word begins at the 5′-end and ends at the 3′-end.

Deoxyribonucleic acid usually and ribonucleic acid often occur as double helices. In a double helix, two molecules of nucleic acid, running in opposite directions, are wrapped around each other (Figure 3–9). The bases in the core of the double helix are paired, adenine next to thymine and guanine next to cytosine. Because the positions in the sequence of the one strand of DNA occupied by deoxyadenosine, deoxyguanosine, thymidine, and deoxycytidine are paired with positions in the sequence of the other strand of DNA occupied by thymidine, deoxycytidine, deoxyadenosine, and deoxyguanosine, respectively, the sequence of one strand read 5′ → 3′ complements the sequence of the other strand read 3′ → 5′ (Figure 3–10). For example, the sequence –AGCAGA– complements the sequence –TCTGCT–.

A polypeptide can be cleaved at specific sites with a particular endopeptidase (Figure 3–2), and DNA can be cleaved at specific sites with site-specific deoxyribonucleases (restriction enzymes). Just as trypsin or thermolysin catalyzes hydrolysis of amide bonds in a protein only next to particular amino acids to produce specific peptides, site-specific deoxyribonucleases catalyze the hydrolysis of double-helical DNA only at phosphate diesters within particular sequences of nucleotides (Figure 3–10). The particular sequences of nucleotides and the associated points of cleavage are known as restriction sites, and the fragments of DNA produced by these cleavages are known as restriction fragments. Unlike the situation in dissecting proteins, a much larger number of site-specific deoxyribonucleases63 are available, the specificities of which vary in their complexity. The particular sequence recognized by a given site-specific deoxyribonuclease can be anywhere from four to 12 nucleotides
long. The longer the sequence recognized, the less fre-
quently will it occur in the DNA, and the longer will be the
restriction fragments produced. By using the appropriate
site-specific deoxyribonuclease and carrying the digestion
to the appropriate degree of completion, restriction frag-
ments can be obtained of a desired size range containing
within their population the complete sequence of the orig-
inal DNA, just as a digest of a protein contains within its
population of peptides the complete sequence of the pro-
tein.
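The statement that a longer recognition sequence occurs less frequently can be made roughly quantitative. The sketch below assumes an idealized DNA in which the four bases occur with equal frequency and independently of their neighbors, an assumption that real genomes do not satisfy exactly; the function names and the short test sequence are hypothetical.

```python
def expected_spacing(site_length, alphabet_size=4):
    """Average number of nucleotides between occurrences of one particular
    sequence of the given length in idealized random DNA."""
    return alphabet_size ** site_length


def count_sites(dna, site):
    """Count occurrences (overlapping ones included) of site in one strand."""
    width = len(site)
    return sum(1 for i in range(len(dna) - width + 1) if dna[i:i + width] == site)


if __name__ == "__main__":
    # Recognition sequences of four to 12 nucleotides, as in the text.
    for n in (4, 6, 8, 12):
        print(f"{n:2d} nucleotides: about one site every {expected_spacing(n):,} nucleotides")
    # Hypothetical fragment containing two copies of the hexanucleotide CTGCAG.
    print(count_sites("CTGCAGTTCTGCAG", "CTGCAG"))
```

Under this idealization the average restriction fragment from a complete digest is about as long as the expected spacing, which is why each added nucleotide in the recognition sequence lengthens the fragments roughly fourfold.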
Site-specific deoxyribonucleases produce restric-
tion fragments with blunt ends or sticky ends. If the par-
ticular enzyme used cleaves phosphodiester linkages in
the two strands that are directly opposite each other, the
two new ends it produces will both be completely double-
stranded and blunt. If the particular enzyme used cleaves
phosphodiester linkages on the two strands that are
offset relative to each other (Figure 3–10), a short segment
of single-stranded DNA will protrude from each of the
new ends. Because they were before the cleavage, these
two segments will necessarily be complementary to each
other in sequence, will adhere to each other when they
come in contact, and, consequently, are sticky.
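The distinction between blunt and sticky ends follows from where the two strands are cut relative to each other, and it can be sketched in a few lines. This is a simplified model that handles only palindromic recognition sequences cut at symmetric positions; the function names are hypothetical. The cleavage position used for PstI (CTGCA^G) agrees with the 3′ sticky ends –TGCA mentioned in this chapter, whereas EcoRI (G^AATTC) and SmaI (CCC^GGG) are well-known enzymes brought in from outside the text for comparison.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Sequence of the opposite strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def end_type(site, cut):
    """Classify the ends left by a palindromic site.

    site: recognition sequence of the upper strand, written 5' to 3'
    cut:  number of nucleotides of the site lying 5' to the cut in each strand
    Returns (kind of end, single-stranded overhang written 5' to 3').
    """
    if reverse_complement(site) != site:
        raise ValueError("this sketch handles only palindromic sites")
    # Position of the cut in the lower strand, in upper-strand coordinates.
    opposite_cut = len(site) - cut
    low, high = sorted((cut, opposite_cut))
    overhang = site[low:high]
    if not overhang:
        return "blunt", ""
    kind = "3' overhang" if cut > opposite_cut else "5' overhang"
    return kind, overhang


if __name__ == "__main__":
    print(reverse_complement("AGCAGA"))   # the complementary pair used in the text
    print(end_type("CTGCAG", 5))          # PstI
    print(end_type("GAATTC", 1))          # EcoRI (not discussed in the text)
    print(end_type("CCCGGG", 3))          # SmaI (not discussed in the text)
    kind, overhang = end_type("CTGCAG", 5)
    # The two protruding segments are complementary to each other.
    print(reverse_complement(overhang) == overhang)
```

Because the site is palindromic, the overhang on one new end is the reverse complement of the overhang on the other, which is the sense in which the two ends are sticky.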
In addition to site-specific deoxyribonucleases,
there are several other enzymes that are used to manipulate DNA (Figure 3–10). The phosphate on the 5′-end of a nucleic acid can be removed with polynucleotide 5′-phosphatase

5′-phosphopolynucleotide + H2O ⇌ polynucleotide + HOPO3 2–    (3–3)

ATP + 5′-dephospho-DNA ⇌ ADP + DNA    (3–4)

(3–6)

Figure 3–9: Double-helical DNA in the standard B conformation.545 A segment of single-stranded DNA with a self-complementary sequence was chemically synthesized. When dissolved in solution, the individual molecules paired up and formed double-helical segments of DNA containing two identical strands running antiparallel to each other. The double-helical dimers were crystallized and a crystallographic molecular model was generated. A stereo view of the model is presented in the figure. Atoms of oxygen are white; atoms of nitrogen, light gray; atoms of phosphorus, dark gray; and atoms of carbon, black. The unattached white circles are the oxygens of molecules of water that assume fixed positions in the crystal. The dashed lines represent hydrogen bonds.
Figure 3–10: Enzymes that are used to manipulate DNA. A double-stranded segment of DNA is

repair single-stranded breaks in a double helix. The complementarity of the bases around the break usually juxtaposes the two ends to be ligated.

Site-specific deoxyribonucleases and DNA ligases are used to insert one molecule of DNA into another molecule of DNA. The molecule to be inserted has been prepared by cleaving a longer molecule of DNA with a particular site-specific deoxyribonuclease, usually one that produces sticky ends. The molecule of DNA into which the restriction fragment is to be inserted is then cleaved with the same site-specific deoxyribonuclease to produce a break with the same sticky ends as those on the restriction fragment to be inserted. The two preparations of DNA are then mixed, and the various sticky ends, for example, the two 3′ sticky ends, –TGCA, produced by PstI (Figure 3–10), spontaneously pair up. The pairs of resulting offset breaks in the two strands of the double helices are then repaired with DNA ligase (Figure 3–10) to produce a new unbroken molecule of DNA in which the restriction fragment has been inserted into the other molecule of DNA at the restriction site specific to the site-

Polymerases usually require a particular arrangement of double-helical nucleic acid. There must be a shorter strand of nucleic acid, the primer, associated in a double helix through complementary base pairing with a longer strand of nucleic acid. The longer strand of nucleic acid, the template, must extend beyond the 3′-end of the primer. The polymerase elongates the primer from its 3′-end by adding a nucleotide at each step that is complementary to the adjacent base on the template. DNA-directed DNA polymerase synthesizes a single strand of DNA that is complementary to a template of DNA and that remains associated with the template of DNA in a double helix. RNA-directed DNA polymerase synthesizes a single strand of DNA that is complementary to a template of RNA and that remains associated with the template of RNA in a double helix. The enzyme pairs A with U and T with A. DNA-directed RNA polymerase synthesizes a single strand of RNA that is complementary to a template of DNA but that does not remain associated with the template. It pairs A with T and U with A. The DNA polymerases use the 2′-deoxyribonucleoside
triphosphates as reactants; the RNA polymerases use the ribonucleoside triphosphates.

Polypeptides are synthesized biologically by ribosomes that translate the sequence of nucleotides in a single-stranded messenger RNA (mRNA) into a sequence of amino acids in a polypeptide. The two corresponding words written in the respective sequences are in the same language, the language of the structure of the protein, and they have the same spelling, but the alphabets are different. The alphabet of the polypeptide sequence consists of the 20 amino acids; the alphabet of the messenger RNA consists of triplets of nucleotides. The correspondence between the letters in the two alphabets is known as the genetic code. Each triplet specifies a particular amino acid, and the triplets are sequentially arranged in the same order as the amino acids of the protein encoded by the message (Figure 3–11). Because the sequence of the nucleotides, however, is continuous and does not indicate how they are grouped as triplets, there are three ways to divide any sequence of nucleotides into triplets, or three distinct reading frames, only one of which encodes the sequence of the protein. If the sequence and the correct reading frame of a messenger RNA have been determined, it can be immediately translated on paper into the sequence of the polypeptide which it encodes.

Messenger RNA is synthesized by DNA-directed RNA polymerase from a gene in the double-helical DNA of the genome of the organism. Its sequence matches that of one of the strands of DNA in the double helix, the sense strand, except that uridine monophosphate replaces thymidine monophosphate. During the synthesis of messenger RNA, the other strand of the DNA, the antisense strand, serves as the template (Figure 3–11). The sequence of the sense strand of a prokaryotic gene is identical to that of the messenger RNA transcribed from it, and the sequence of the protein encoded by that sense strand can be read directly from the sequence of the genomic DNA. The genomic DNA of eukaryotes, however, contains introns. An intron is a segment of unrelated DNA that has been inserted during evolution into the genomic DNA of the eukaryote and that interrupts the sequence on the sense strand that encodes the protein. These introns are spliced out of the messenger RNA before it is read by the ribosome. Although they cause no problems for the organism, introns make it difficult if not impossible to read the sequence of a eukaryotic protein from the sequence of the gene that encodes the sequence of that protein. Consequently, it is the messenger RNA for a eukaryotic protein that must be sequenced.

Almost every protein molecule present at a particular time in a living cell is being continuously produced by ribosomes from messenger RNA molecules, and it follows that if a protein is found in a eukaryotic cell or tissue, the messenger RNA encoding it should be there as well. Messenger RNA can be isolated as a complex mixture of all of the messages normally being expressed in a particular tissue. This isolation is assisted by the fact that all eukaryotic messenger RNAs have a segment of poly(adenosine monophosphate) about 200 bases in length at their 3′-ends. Affinity adsorption with a stationary phase to which poly(thymidine monophosphate) has been attached covalently is used to separate the messenger RNA from all of the other RNA in the homogenate.

The stratagem devised to obtain the nucleic acid sequence of a particular single-stranded messenger RNA in this purified mixture is to transcribe all of the single-stranded messenger RNAs in the mixture into a mixture of double-helical DNAs of the same respective sequences, separate these molecules of DNA biologically, select the DNA derived from the messenger RNA of interest, and sequence that DNA. Deoxyribonucleic acid that has the same sequence in one of its two complementary strands as the sequence of a messenger RNA is referred to as complementary DNA (cDNA). Messenger RNA is transcribed into complementary DNA in the laboratory by first using RNA-directed DNA polymerase to synthesize single-stranded DNA complementary in sequence to the messenger RNA. The single strands of DNA end up in hybrid double helices with the messenger RNAs. The RNA is then removed by digesting it with RNase, and then DNA-directed DNA polymerase is used to synthesize the complements to the single strands of DNA. Each strand of this newly synthesized DNA remains associated with its template in a double helix. In its sense strands, this double-helical DNA contains the original sequences of the messenger RNAs. One advantage of the complementary
DNA derived from a particular tissue is that it catalogs all of the genes that are being expressed in that tissue.

To clone a particular segment of DNA is to insert that DNA, usually present in a complex mixture of other DNAs, into the DNA of a bacteriophage or a bacterium and then isolate a population of identical bacteria or identical bacteriophage, all of which carry just that one segment of DNA. A bacteriophage is a virus that infects and replicates within a bacterium. For the purposes of this discussion, the segment of DNA to be cloned is a segment encoding a protein of interest. In the cloning of eukaryotic DNA encoding a protein, complementary DNA is usually the starting point because of the problem of the introns in the genomic DNA. Complementary DNA is also advantageous because tissues producing significant amounts of the protein can be chosen as the source for the messenger RNA, a strategy that increases the chances of finding its complementary DNA. In the cloning of prokaryotic DNA encoding a protein of interest, genomic DNA is usually the starting point64 because it is already double-helical DNA, and in prokaryotes there are no problems with introns. The genomic DNA of the bacterium is cut into restriction fragments long enough to contain all or most of the gene encoding the protein.

The complementary DNAs or fragments of genomic DNA in one of these complex mixtures are then usually incorporated into a library in which they can be stored, replicated, and screened. A library is a large population of bacteriophage or bacteria, each of which contains within its DNA one of the complementary DNAs or fragments of genomic DNA from the original mixture just as the usual library contains a large population of different books. Each piece of foreign DNA is integrated into the DNA of one of the bacteriophage or one of the bacteria in the library in such a way that it is replicated along with its genomic DNA, ensuring that all of the progeny of that one bacteriophage or bacterium contain the inserted complementary DNA or genomic DNA. Each of the fragments of foreign DNA is inserted into the same location in the DNA of the bacteriophage or bacteria in the library.

If the library consists of bacteriophage, the foreign DNA is inserted at the same site in the genomic DNA of each bacteriophage. These genomic DNAs containing the inserts can be biologically replicated to a high concentration by infecting a suspension of bacteria, usually Escherichia coli, with the bacteriophage.

Complementary DNAs or fragments of genomic DNA are incorporated into a population of bacteria by first inserting them into plasmids. A plasmid is a circular molecule of double-stranded DNA that is able to replicate independently of the chromosome in a bacterium. The species of bacteria usually used to carry a plasmid is E. coli. In addition to the inserted DNA, each of the various plasmids used for cloning contains a gene causing the bacterium that carries it to be resistant to a particular antibiotic. Consequently, when the plasmids have been incorporated into a population of bacteria, only the bacteria carrying one of the plasmids will replicate in the presence of the antibiotic. In the library, each antibiotic-resistant bacterium contains a plasmid and most of the plasmids contain a copy of one of the original complementary DNAs or fragments of genomic DNA. Not only do plasmids and bacteriophage permit the inserted DNA to be replicated as they themselves are replicated, they also provide a way of storing the inserted DNA, because once it has been incorporated into the bacteriophage or its plasmid has been incorporated into a bacterium, it is stable for long periods of time if the bacteriophage or bacterium is stored in its dormant state.

Occasionally, the messenger RNA in a tissue producing mainly one protein is so enriched for the messenger RNA encoding that particular protein that most of the individuals in the library carry complementary DNA for that one messenger RNA, and one of these can be picked out from the rest directly.65 Usually, however, the library has to be screened to find an individual carrying the desired complementary DNA or fragment of genomic DNA. To screen a library is to isolate bacteriophage or bacteria that carry complementary DNA or fragments of DNA encoding one particular protein from the vast majority of the bacteriophage or bacteria that carry complementary DNA or fragments of genomic DNA encoding other proteins.

When the library is stored in bacteriophage, a continuous lawn of a particular bacterium growing on an agar plate is infected with a dilute solution of those bacteriophage carrying the inserted DNA. Small circular holes or plaques appear in the lawn. Each plaque results from the infection and lysis by bacteriophage of the bacteria that had been happily growing within the lawn. All of the bacteriophage in one of the plaques are offspring of a single bacteriophage from the original solution that fell upon the lawn at the position of the center of the plaque and then replicated outward by consecutive infections of the bacteria. Ultimately, each plaque contains millions of the progeny of that one bacteriophage, and each one of the progeny contains only the one inserted DNA its common ancestor contained.

When the library is stored in plasmids in a population of bacteria, a suitably diluted suspension of those bacteria is spread on a plate of agar containing the antibiotic. Only bacteria containing plasmids can grow, and each of these replicates until a round colony of bacteria appears on the plate at the location where the original one fell. Each of the bacteria in the colony contains a plasmid because it survived the antibiotic, and each of the plasmids within the same colony contains the same segment of inserted complementary DNA or genomic DNA because all the bacteria are offspring of the original one.

Each plaque or each colony contains copies of a different complementary DNA or fragment of genomic DNA or lacks an insert. The trick is to discover which of the plaques or colonies, respectively, clearly visible to the
naked eye but numbering in the thousands to hundreds of thousands, happens to contain the complementary DNA or genomic DNA that encodes the protein of interest.

The most rapid and unambiguous method of screening is to synthesize chemically or biologically a fragment of radioactive single-stranded or double-stranded DNA, referred to as a probe, the sequence of one strand of which encodes the amino acid sequence of the protein of interest (Figure 3–11). When the double-helical DNA in a plaque or a colony containing that particular short nucleic acid sequence is heated so that it unwinds and becomes single-stranded DNA, the sequence on the antisense strand or the sequences on the sense and the antisense strands that are complementary to the sequence or the two sequences of the probe will become accessible for hybridization. Hybridization is the formation in the laboratory of double-helical DNA from two complementary single strands of DNA. Because hybridization is usually performed in a complex mixture of single-stranded DNAs such as the denatured DNA from a plaque or colony, the mixture is cooled slowly or annealed, to give the pairs of complementary single-stranded molecules of DNA enough time to find each other and form a double helix. If the clone contains sequences of DNA that are complementary to those of the probe, those sequences, after the DNA has been denatured and probe has been added, will hybridize with the sequences of the probe to form short segments of double-helical DNA, and in this way the probe is captured. This trapping of the probe makes the plaque or the colony containing the desired complementary DNA radioactive, marking the position of the bacteriophage or bacteria carrying the desired complementary DNA and allowing that one plaque or colony to be isolated.

An example66,67 will illustrate this screening procedure. Factor VIII is one of the proteins that are together responsible for the cascade of events leading to the clotting of the plasma of mammalian blood. Human Factor VIII was digested with trypsin, and the peptides that resulted from the digestion were separated66 on chromatography by adsorption. Several of these peptides were resolved cleanly from their neighbors, and they were submitted to Edman degradation. The amino acid sequence determined for one of these peptides was AWAYFSDVDLEK. A segment of radioactive, single-stranded DNA with the nucleic acid sequence CTTTTCCAGGTCAACGTCGGAGAAATAAGCCCAAGC (Figure 3–11), one of the many possible antisense sequences to that encoding the peptide, was synthesized chemically to act as a probe. Long restriction fragments (15 kb) of human genomic DNA were inserted into bacteriophage λ Charon, and these bacteriophage were used to produce plaques on lawns of E. coli. The DNA in the plaques was then denatured. During subsequent annealing and hybridization, the radioactive probe was captured by the denatured, single-stranded DNA in 15 plaques out of the 500,000 screened for DNA containing a nucleic acid sequence that would capture the probe.67 Bacteriophage from each of these 15 clones were separately grown on a large scale, and the inserted DNA was cut out of the DNA of the bacteriophage that had been carrying it with site-specific deoxyribonucleases.

The polymerase chain reaction68 can be used to produce probes for screening plaques or colonies or even a segment of the DNA encoding a significant portion of the protein of interest. This is a method for replicating to a high concentration only a specific segment from any source of DNA. To replicate only a particular segment of DNA in a complex mixture or within a much longer molecule of DNA by the polymerase chain reaction, all that is required is that the segment of double-stranded DNA to be replicated is flanked on either side by known sequences of nucleotides. Two short primers of single-stranded DNA are synthesized, one complementary to the flanking sequence at one end of the segment to be replicated and the other complementary to the flanking sequence at the other end. These two primers for the two ends, however, must be complementary to the sequences on opposite strands of the initial double-stranded DNA. The initial double-stranded DNA is melted, and the two primers are hybridized. DNA-directed DNA polymerase is then used to elongate from the 3′-end of each primer (Figure 3–10). This produces two copies of duplex DNA over the segment of interest. The new DNA is melted and rehybridized with the same two primers and elongation is performed again to produce four copies of duplex DNA for the segment of interest and so forth. If the heat-stable DNA-directed DNA polymerase from Thermus aquaticus69 or Pyrococcus furiosus70 is used for the elongation, new polymerase does not have to be added after each melting cycle. After repeated cycles of melting, annealing, and elongation, essentially all of the newly synthesized DNA is a copy of the double-stranded segment of the original DNA between and including the sequences of the priming DNA, and the concentration of this segment increases exponentially with each step.

An example of the use of this procedure is the synthesis of a probe for screening a library containing the gene for extensin from Volvox carteri.71 The amino-terminal sequence of the protein, AVSYSVSVYNNIAVTGAP–, and the sequence of a tryptic peptide from the protein, IDPPSNFGNLPVK, were used to guide the synthesis of two primers, GT(T/C/A/G)TA(T/C)AA(T/C)AA(T/C)AT(T/C/A)GC and GG(G/T)AGGTT(T/C/A/G)CCGAA(G/A)TT, where letters in parentheses indicate that two or more nucleotides were coupled in that step of the synthesis to allow for the redundancy of the genetic code. When complementary DNA from sperm packets of V. carteri was amplified with these primers in a polymerase chain reaction, a segment of 410 bp of double-stranded DNA was produced beginning with the sequence GTCTACAACAACATCGC– and ending with the
sequence –AACTTTGGCAACCTGCC on its sense strand. This sequence encoded 136 aa of the amino acid sequence of the protein. This segment of DNA was inserted into a plasmid, replicated with radioactive precursors, and used successfully as a probe to screen a library of genomic DNA from V. carteri. By use of this probe, a clone of bacteria was identified that carried a plasmid containing a segment of complementary DNA 1392 bp long, encoding 464 aa from the amino acid sequence of extensin.

The complementary DNA or the fragment of genomic DNA encoding the protein of interest that has been produced by replicating the bacteriophage or bacterium identified by the screen, or the segment of DNA encoding a portion of the protein that has been amplified by the polymerase chain reaction, can be quite long, from thousands to tens of thousands of nucleotides. The sequence of a particular piece of single-stranded DNA can be read only to a certain length (300–400 nucleotides). Therefore, long DNAs must be cleaved into smaller restriction fragments with site-specific deoxyribonucleases, just as polypeptides have to be cleaved into peptides before they can be sequenced. By trial and error, a pattern of restriction fragments ideally suited to the demands of sequencing can be prepared.

The shorter double-helical restriction fragments produced from a longer double-helical DNA are rapidly separated by preparative electrophoresis on gels of agarose. They are usually visualized by use of fluorescent dyes. Their length can be estimated from their electrophoretic mobilities. The order in which a given set of restriction fragments is arranged in the original DNA is determined by restriction mapping. To produce a restriction map of a large piece of DNA, it is cleaved separately with several site-specific deoxyribonucleases. The restriction fragments produced in each of these separate digestions are isolated and assigned a length by electrophoresis. Each of these restriction fragments of DNA is then submitted to digestion by the other sets of site-specific deoxyribonucleases, and the shorter restriction fragments that result are separated and assigned a length. This dissection is continued until the restriction fragments observed, which are designated by the pedigree of the cleavages that produced them, are consistent with only one distribution of restriction sites through the original piece of long DNA as well as being of the desired length. This unique distribution of restriction sites, the restriction map, orders the different restriction fragments that have been obtained relative to the complete sequence.

An example will serve to illustrate the complete process.72 A clone containing the complementary DNA encoding the α polypeptide of the murine nicotinic acetylcholine receptor within the tetracycline-resistant plasmid pBR322 (Figure 3–12)72 was identified by screening. The cloned complementary DNA was cut from the plasmid as an intact double-helical polymer with the site-specific deoxyribonuclease PstI, which cleaves at CTGCA↓G. This DNA was digested with the following site-specific deoxyribonucleases: AluI, which cleaves at the nucleic acid sequence AG↓CT; TaqI, which cleaves at T↓CGA; HpaII, which cleaves at C↓CGG; HaeIII, which cleaves at GG↓CC; RsaI, which cleaves at GT↓AC; and HincII, which cleaves at GTPy↓PuAC, where Py is either pyrimidine and Pu is either purine.

The pattern of restriction fragments obtained when these enzymes were used in various combinations was consistent with only one restriction map (Figure 3–12). For example, the HpaII restriction fragment between positions 478 and 1063 would give three restriction fragments about 60, 240, and 280 base pairs in length upon digestion with site-specific deoxyribonuclease AluI. The order in which these three subfragments occur in the HpaII restriction fragment could be determined by gathering the following observations. Deoxyribonuclease TaqI would cut only the AluI restriction fragment that is about 280 base pairs in length to yield the same restriction fragment, about 120 base pairs long, that it would produce from one end of the HpaII restriction fragment. Deoxyribonuclease HincII would cut only the AluI restriction fragment that is about 240 base pairs in length to give a restriction fragment about 140 base pairs in length. This restriction fragment, together with the AluI restriction fragment about 60 base pairs in length, would form the restriction fragment about 200 base pairs in length produced during the digestion of the HpaII restriction fragment with deoxyribonuclease HincII alone.

When restriction fragments of a convenient size had been produced from this complementary DNA encoding the α polypeptide of the murine acetylcholine receptor, a group of single-stranded DNAs within the set were chosen for sequencing (arrows in Figure 3–12). These single-stranded DNAs were subcloned in the single-stranded bacteriophage M13, and each was submitted to sequencing from its 5′-end.

The property of denatured, single-stranded nucleic acids that allows them to be sequenced is that they behave with extraordinary regularity upon electrophoresis. For example, when 4.5S ribosomal RNA from the chloroplasts of spinach,73 which is 107 bases long, is elongated with RNA ligase (ATP) from T4 bacteriophage by one nucleotide at its free 3′-hydroxyl group by use of [5′-32P]cytidine 3′,5′-bisphosphate and then submitted to partial alkaline hydrolysis, a random mixture of fragments of all possible lengths and all possible beginning and ending points within the sequence is produced. Only those fragments that begin at the original 3′-end, however, are radioactive. In the case of the 4.5S rRNA, these formed a set of 108 unique fragments that were of all the possible lengths between 1 and 108 nucleotides. When this mixture was submitted to electrophoresis under denaturing conditions on a gel cast from 12% acrylamide and the radioactive components
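[Figure 3–12 diagram: circular plasmid pMARα15 showing the tetracycline-resistance gene (TET), the origin of replication (O/I), and the complementary DNA insert, drawn 5′ to 3′ between two PstI sites. Restriction sites labeled on the insert, with positions in parentheses: AluI (29), TaqI (182), AluI (376), AluI (434), HpaII (478), AluI (540), AluI (785), TaqI (950), HpaII (1063), RsaI (1140), AluI (1325), HaeIII (1521), HaeIII (1596), RsaI (1686), and one HincII site with no position given.]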
Figure 3–12: Restriction map of a fragment of DNA cut out of a plasmid.72 A large fragment of complementary DNA (1.7 kb) was removed
with the site-specific deoxyribonuclease PstI from the circular plasmid pMARα15, which had been originally constructed from the circular
plasmid pBR322. The plasmid contained a gene for resistance to the antibiotic tetracycline (TET) so that only bacteria carrying the plasmid
would grow on a medium containing tetracycline. The origin of replication for the plasmid is indicated (O/I). The plasmid pMARα15 was
isolated during a screening procedure for complementary DNA encoding the α polypeptide of the murine acetylcholine receptor. The fragment
of complementary DNA was purified by electrophoresis and submitted to a series of digestions with the noted site-specific
deoxyribonucleases. The patterns of fragments established the restriction map displayed. The arrows below the restriction map indicate which
restriction fragments were submitted to sequencing from which 5′-end. The positions in the nucleic acid sequence cleaved by each site-specific
deoxyribonuclease are identified by numbers in parentheses. Reprinted with permission from ref 72. Copyright 1985 Oxford University Press.
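The arithmetic that connects the positions on this restriction map to the fragment lengths quoted in the text can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the positions are those printed in parentheses in Figure 3–12, each position is treated as the coordinate of the cut itself, and the names SITES and digest are hypothetical ones chosen for this sketch. The HincII site is left out because the figure gives no position for it.

```python
# Restriction-site positions as printed in Figure 3-12.
# Assumption: each number is used directly as the coordinate of the cut.
SITES = {
    "AluI": [29, 376, 434, 540, 785, 1325],
    "TaqI": [182, 950],
    "HpaII": [478, 1063],
}


def digest(start, end, cut_positions):
    """Lengths of the pieces produced when the segment from start to end
    is cut at every listed position that lies strictly inside it."""
    cuts = sorted(p for p in cut_positions if start < p < end)
    bounds = [start, *cuts, end]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# The HpaII restriction fragment discussed in the text (478 to 1063).
hpa_start, hpa_end = SITES["HpaII"]

alu_only = digest(hpa_start, hpa_end, SITES["AluI"])
alu_taq = digest(hpa_start, hpa_end, SITES["AluI"] + SITES["TaqI"])
taq_only = digest(hpa_start, hpa_end, SITES["TaqI"])

print(alu_only)  # [62, 245, 278]
print(alu_taq)   # [62, 245, 165, 113]
print(taq_only)  # [472, 113]
```

The three AluI subfragments of 62, 245, and 278 base pairs correspond to the fragments described as about 60, 240, and 280 base pairs in length. Adding TaqI splits only the 278 base pair subfragment, and the 113 base pair piece it releases is the same piece that TaqI alone releases from one end of the HpaII fragment, which is the observation used to place the longest AluI subfragment at that end. The lengths estimated by electrophoresis are approximate, so exact agreement with the rounded values in the text is not expected.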
were located by placing the polyacrylamide gel on a photographic film, a regular array of bands, referred to as a ladder, could be observed (Figure 3–13).73 Each of these bands, with one interesting exception that will be discussed later, represents a single-stranded RNA that begins at the labeled 3′-end of the original 4.5S rRNA, because it is radioactive and is one nucleotide longer than the nucleic acid in the band below it in the figure.

The ability of electrophoresis on polyacrylamide gels to separate nucleic acids only on the basis of their length arises from the properties of these polymers and the nature of the electrophoresis. The free electrophoretic mobility of denatured single-stranded DNA at I_c = 0.01 M, pH 7.5, and 0 °C is (1.82 ± 0.02) × 10⁻⁴ cm² V⁻¹ s⁻¹ and does not vary74 with its length. The free electrophoretic mobility of denatured RNA under the same conditions is the same, (1.77 ± 0.05) × 10⁻⁴ cm² V⁻¹ s⁻¹, and it also shows no tendency to vary with length.74 The electrophoretic mobilities of single-stranded DNA and RNA on polyacrylamide gels also conform to Equation 1–81,75 and the free mobilities extrapolated from their behavior on polyacrylamide gels are in reasonable agreement with those measured directly.76 Because their free electrophoretic mobilities are all the same, it is only the resistance posed by the polyacrylamide, exp(–K_{r,i}T_a), that separates the nucleic acids of the various lengths. It is not surprising that this sieving, accomplished at the molecular level by the strands of polyacrylamide, should be a regular, continuous, monotonic function of the lengths of the nucleic acids (Figure 3–13).

Suppose that a single-stranded deoxyribonucleic acid, labeled at its 5′-end by phosphorylation with [32P]phosphate, has been cleaved in a low yield and randomly on the 5′-side of each of the deoxyguanosines in its sequence. This partial cleavage will have produced a series of radioactive fragments of different length, each of which ends at a nucleotide whose only distinction is that it preceded a deoxyguanosine in the original sequence. When the products of this partial cleavage are submitted to electrophoresis, a series of radioactive bands will appear the mobilities of which correspond to only those rungs in the ladder the 3′-terminal nucleotide of which precedes a deoxyguanosine. The knowledge that the cleavage occurred only at deoxyguanosines and the position of the products in the ladder identifies the relative positions of every deoxyguanosine in the original sequence.

Suppose further that four samples have been prepared from the original single-stranded deoxyribonucleic acid such that they contain radioactive fragments, all of which begin at the original 5′-end because they were made radioactive by phosphorylating only that
plementary strands of DNA by use of the single-stranded DNA to be sequenced as a template in four separate elongations catalyzed by DNA-directed DNA polymerase (Figure 3–10). The nucleotides inserted by the polymerase are present in solution as their activated 5′-triphosphates. In the original method, the newly synthesized polymer of DNA is made radioactive by including [α-32P]dATP in the synthetic mixture. The successive fragments that have at their 3′-end only a particular nucleotide are produced by including a small amount of 3′-deoxythymidine triphosphate, 2′,3′-dideoxycytidine triphosphate, 2′,3′-dideoxyguanosine triphosphate, or 2′,3′-dideoxyadenosine triphosphate, each in one of the four elongations, along with the thymidine triphosphate, 2′-deoxycytidine triphosphate, 2′-deoxyguanosine triphosphate, and 2′-deoxyadenosine triphosphate present in all of them. Occasionally, a 2′,3′-dideoxynucleotide is incorporated into one of the growing polymers by the DNA-directed DNA polymerase, and its incorporation terminates polymerization because that polymer then lacks the 3′-hydroxyl group necessary for further elongation. In this way fragments satisfying two of the requirements for electrophoretic sequencing are produced.

The last requirement, that every fragment have as its 5′-terminus the same position in the complete sequence, is satisfied by taking advantage of the requirement of DNA-directed DNA polymerase for a primer to provide a 3′-hydroxyl group from which the new strand can be elongated. To initiate the reaction, a primer that is complementary to a segment of the DNA to be sequenced is annealed to the template to provide the necessary 3′-hydroxyl group. Because the DNA-directed DNA polymerase starts at the primer when it synthesizes a complementary, radioactive single strand of DNA, the sequence of the primer can be chosen so that the newly synthesized DNA will begin at a particular point in the sequence of the template. The complementary sequence to which the primer is annealed can be a short piece of DNA of known sequence that has been deliberately attached to the 3′-end of the DNA to be sequenced,86 or it can be any internal sequence for which a complementary fragment of single-stranded DNA happens to be available.85 Often this complementary fragment is a probe that had been made for purposes of screening. It is also possible to use an oligonucleotide that has the same sequence as a segment near the 3′-end of the longest single-stranded fragment that provided readable nucleotide sequence in the last set of polyacrylamide gels, to extend the sequencing of the template further to its 5′-end. In this way, one can walk along a long template and read its entire sequence.

The polyacrylamide gels that result from the application of these two methods, the chemical and the enzymatic, are similar in appearance (Figure 3–14A, B).77,85 Sequence is read from the bottom (shortest fragments) to the top (longest fragments), 5′ to 3′. In the chemical method the sequence of the original single-stranded DNA is being read. In the enzymatic method the sequence of the complement of the original single-stranded DNA is being read. Since DNA is normally double-helical with two antiparallel strands of complementary sequence, either sequence is formally the sequence of the DNA, as long as the correct direction (5′ → 3′) is assigned to the sequence by the observer.

These original methods, the chemical and the enzymatic, were both based on the use of fragments of nucleic acid made radioactive by incorporating [32P]phosphate (Figure 3–14A, B), but in the automated DNA sequencers currently in use, end-labeled fluorescent fragments of nucleic acid are used. Although chemical methods have been developed87 that may eventually be more efficient, the current automated procedures for sequencing DNA are based on the original enzymatic method of Sanger, Nicklen, and Coulson.85 The products of the terminations by the dideoxynucleotides are all separated together on the same gel of polyacrylamide, which is continuously scanned by a fluorometer.88 The products from the four respective termination reactions are end-labeled with four different fluorescent dyes that can be distinguished by the fluorometer on the basis of the colors of their fluorescence. The separate fluorescent tags are applied one of two ways.

Synthetic derivatives of 2′,3′-dideoxyadenosine triphosphate, 2′,3′-dideoxyguanosine triphosphate, 3′-deoxythymidine triphosphate, and 2′,3′-dideoxycytidine triphosphate have been prepared that each have a different fluorescent dye covalently attached to their heterocyclic bases.89,90 When these derivatives are used to terminate the single-stranded fragments and thereby label them at their 3′-ends, the fluorometer can distinguish strands of DNA terminated at deoxyadenosines, deoxyguanosines, thymidines, or deoxycytidines from each other by their differences in fluorescence.

Alternatively, four distinguishable fluorescent dyes can be attached separately to the 5′-ends of four identical samples of the primer that will be used,88 and a different one of the resulting fluorescent primers can be used in each of the four termination reactions. When the separate dideoxy terminations have been completed, the products of the four reactions are mixed. When the mixture is separated by electrophoresis, the fluorometer distinguishes each strand by the color of the fluorescence emitted by the fluorescent dye on its 5′-end, which identifies the termination mixture in which it arose.

The DNA-directed DNA polymerase used in these automated sequencers is an improved version. The original enzyme used, the Klenow fragment of DNA-directed DNA polymerase from E. coli, terminates the elongation at each position with a different yield that can vary significantly (Figure 3–14B). This variability can result in uncertainty in reading the sequence, especially when it is to be read by a machine. DNA-directed DNA polymerase from bacteriophage T7 produces a much more uniform
A B C D
–5 –1 1 10
Ser Ser Ala Gly Leu Val Leu Gly Ser Glu His Glu Thr Arg Leu Val Ala Lys Leu Phe Glu Asp Tyr Ser Ser Val Val
5'_______TC TCG TCC GCT GGC CTT GTT CTG GGC TCC GAA CAT GAG ACG CGT CTG GTG GCA AAG CTC TTT GAA GAC TAC AGC AGT GTA GTC
–20 –1 1 20 Å AluI 40
20 30 40
Arg Pro Val Glu Asp His Arg Glu Ile Val Gln Val Thr Val Gly Leu Gln Leu Ile Gln Leu Ile Asn Val Asp Glu Val Asn Gln Ile
CGG CCA GTG GAG GAC CAC CGT GAG ATT GTA CAA GTC ACC GTG GGT CTA CAG CTG ATC CAG CTT ATC AAT GTG GAT GAA GTA AAT CAG ATT
60 80 100 120 140
50 60 70
Val Thr Thr Asn Val Arg Leu Lys Gln Gln Trp Val Asp Tyr Asn Leu Lys Trp Asn Pro Asp Asp Tyr Gly Gly Val Lys Lys Ile His
GTG ACA ACC AAT GTA CGT CTG AAA CAG CAA TGG GTC GAT TAC AAC TTG AAA TGG AAT CCA GAT GAC TAT GGA GGA GTG AAA AAA ATT CAC
160 180 Å TaqI 200 220
80 90 100
Ile Pro Ser Glu Lys Ile Trp Arg Pro Asp Val Val Leu Tyr Asn Asn Ala Asp Gly Asp Phe Ala Ile Val Lys Phe Thr Lys Val Leu
ATC CCC TCG GAA AAG ATC TGG CGG CCG GAC GTC GTT CTC TAT AAC AAC GCA GAC GGC GAC TTT GCC ATT GTC AAA TTC ACC AAG GTG CTC
240 260 280 300 320
GAGGCTGAGCTAAGCCTACCTCTGTCCCAGCCATAGCCATCGCTAGGAAAGATGGAAGAGAGGAAGGTCTGTCTCCTTGAAGCCTTTCACACTTACCAAACATGCAGTGTTCTACATG
1320 Å AluI 1340 1360 1380 1400 1420
TCCTACATGTTAATGAGAGTGATCTCTGCTCACACGGCTGTATTCTTGAAGTGTCTCCCCTTTGCTTCCTGCTTTTAACACTATGGGCCTCCTTAAAGGGCGAACCCTTTGAAGTAAA
1440 1460 1480 1500 1520 Å HaeIII 1540
TAAAAGTGAGCCCTCAAAAGAAGTGTTTGCTTCTAAATGGCCCCTGGGAGATTTTGCTTGGATACTCAAGGTTTTCTGTTTCTATTGCCATGGCTAGTTGTTTTTGTTTTCTTTCCTT
1560 1580 HaeIII Å 1600 1620 1640 1660
TAATAAATATAATTGTACTTAAAAA_______3'
1680 Å RsaI
Figure 3–15: Nucleic acid sequence and deduced amino acid sequence for the α polypeptide of murine nicotinic acetylcholine receptor.72
The nucleotides are presented in the 5′ to 3′ direction for the coding strand. Both sequences are numbered starting with the first amino acid
in the mature protein. The first eight amino acids in the presented sequence are removed posttranslationally. The initiation codon for
translation was not on the cloned piece of complementary DNA. The asterisk marks the codon at which translation is terminated. The restriction
sites that produced the restriction map (Figure 3–12) are identified in the nucleic acid sequence. The PstI restriction sites situated at the two
ends of the insert that were used to remove complementary DNA from the plasmid (Figure 3–12) are lost during the insertion of the
restriction fragments of the complementary DNA into the M13 bacteriophage, but the sequence shown begins just after the initial PstI site and ends
just before the final PstI site. Reprinted with permission from ref 72. Copyright 1985 Oxford University Press.
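The relation between the two sequences in Figure 3–15 can be checked with a short Python sketch. It translates the complete codons of the first row of the figure with the standard genetic code and then locates the AluI recognition sequence AGCT in the numbering of the mature protein. The codons are copied from the figure; the names BASES, AMINO, CODON_TABLE, ROW, and alu_position are hypothetical ones chosen for this illustration, and the leading partial codon TC is omitted.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order
# (first base varies slowest, third base fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Complete codons of the first row of Figure 3-15.
ROW = (
    "TCG TCC GCT GGC CTT GTT CTG GGC "  # eight residues removed posttranslationally
    "TCC GAA CAT GAG ACG CGT CTG GTG GCA AAG CTC "
    "TTT GAA GAC TAC AGC AGT GTA GTC"
)
codons = ROW.split()

# One-letter translation of the whole row, then of the mature portion only.
protein = "".join(CODON_TABLE[c] for c in codons)
mature = protein[8:]

# Nucleotide numbering in the figure starts at the first codon of the
# mature protein, so positions are counted from codon number 9 of the row.
mature_dna = "".join(codons[8:])
alu_position = mature_dna.find("AGCT") + 1  # convert to 1-based numbering

print(protein)       # SSAGLVLGSEHETRLVAKLFEDYSSVV
print(mature)        # SEHETRLVAKLFEDYSSVV
print(alu_position)  # 29
```

The translation reproduces the amino acids printed above the codons in the figure, and the first occurrence of AGCT begins at nucleotide 29, which is the position given for the first AluI site on the restriction map in Figure 3–12.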
line phosphatase promoter, a tacII promoter, or a trc promoter. These promoters are segments of DNA that serve as unusually active sites for the initiation of the synthesis of messenger RNA by DNA-directed RNA polymerase. The DNA preceding the point of insertion on the plasmid must also have sequences necessary for the active translation of the messenger RNA into protein.

The DNA inserted into the restriction site on the expression vector is often complementary DNA or genomic DNA that has just been used for sequencing, and that DNA is cut out of the bacteriophage or plasmid in which it was screened and amplified. Occasionally, the inserted complementary DNA is from an organism the codon usage of which is so different from that of E. coli that poor expression occurs because of this mismatch. One solution to this problem is to synthesize the complementary DNA with compatible codons.117 The insertion into the expression vector is accomplished most effectively if the fragment has sticky ends that are compatible with the restriction site on the expression vector. One way this is accomplished is to use primers for the polymerase chain reaction that contain the sequences of DNA necessary to anneal to complementary sequences of the DNA at the beginning and end of the coding sequence but in addition contain sequences of DNA for the appropriate endonucleolytic cleavage sites64,118,119 and even sequences necessary for translation.120 The final DNA produced in the polymerase chain reaction will incorporate these additional sequences even though they did not exist in the initial DNA used as the template. If the complementary DNA encodes a segment of amino acid sequence that is normally removed from the native protein by a posttranslational process absent from E. coli, the portion of the DNA encoding that segment sometimes has to be removed before a fully functional protein can be expressed.121

A piece of DNA encoding another amino acid sequence is often inserted ahead of the DNA encoding the protein of interest. For example, a portion of DNA encoding a strong promoter as well as a short segment of the protein that promoter usually controls, such as a segment of β-galactosidase or the λcII protein, can be placed in front of the DNA to be expressed to guarantee that it is produced efficiently. In this instance, a stop codon followed by a start codon can be inserted between the two coding regions so that the fragment of DNA promoting transcription is not translated attached to the protein being expressed. It has been found in many instances, however, that fusion proteins, proteins in which the protein of interest is coupled during translation to another complete protein such as glutathione transferase, β-galactosidase, or ubiquitin, are expressed in much higher yield than the unfused, intact protein of interest. Often this is due to the fact that the fusion protein resists the endopeptidases of the E. coli122,123 that would otherwise degrade the protein of interest. To isolate the protein of interest without the associated fusion protein, an amino acid sequence is often introduced between the two proteins that is a target for an endopeptidase of stringent specificity, such as activated factor Xa or renin, so that the unwanted portion can be removed by cleavage with that endopeptidase.

A fusion protein can also be one between the protein of interest and a portion of a protein such as an enterotoxin that contains a signal for secretion from E. coli. In this case, the protein produced ends up in the medium rather than in the cells. In one instance, however, expression of a protein that is normally excreted from E. coli was toxic to the cells at the levels produced, and the sequences signalling excretion had to be removed to keep the protein inside the cells.124

One problem with expression of a foreign protein in E. coli is its precipitation to form large inclusion bodies. In this precipitated form, the protein being expressed is inactive and indistinguishable from any other precipitated protein. It is often possible, however, to dissolve these precipitates in a solution of a salting-in solute such as urea or guanidinium chloride and renature functionally active, fully soluble protein from this solution.

Proteins can be expressed in cells other than those of E. coli. Expression plasmids containing promoters active in Saccharomyces cerevisiae125 that can be incorporated into cells of this species of yeast are available for expressing proteins.126 One of the difficulties of expressing animal proteins in bacteria or fungi is that these cells are unable to perform normal posttranslational modifications. An animal protein that is normally modified posttranslationally is usually expressed in animal cells capable of such modifications. One such animal system that provides high yields of protein is cells of the insect Spodoptera frugiperda. These insect cells, grown in culture, can be infected with virions containing an expression vector constructed from viral DNA of the nuclear polyhedrosis virus Autographa californica127 just as a culture of E. coli can be infected with bacteriophage λ. If the DNA encoding the protein of interest is inserted at a point in the viral genome under the control of the promoter for the viral coat protein, high yields of the expressed protein are produced. Even higher yields can be produced if larvae (caterpillars) of Trichoplusia ni are infected with such a virus.128 These insect expression systems produce proteins with many of the normal posttranslational modifications of animals.129

To ensure that posttranslational modifications of mammalian proteins that are foreign to insects are correctly made or to express a mammalian protein in the biological context of a mammalian cell, proteins are often expressed in cultured mammalian cells by use of an expression vector carrying a promoter from an animal virus, such as cytomegalovirus or simian virus. Such expression vectors can be inserted into the genomic DNA
of an animal cell such as Chinese hamster ovary cells or amino acids from the sequence of a polypeptide or insert
murine L cells by transfection. extra amino acids at a particular location with this tech-
When a protein is expressed in any of these expres- nique. The method requires that the complementary
sion systems, the final product of the expression is usually a pellet of cells that is then homogenized, producing a complex mixture of proteins. Even if the expression has been so successful that the protein that has been expressed accounts for the majority of the protein in this mixture, it still must be purified. This purification is usually performed by the standard procedures because they are simple to implement, but it is possible to design the expressed protein to ease its purification. For example, the protein can be expressed with a string of six histidines attached at its carboxy terminus or amino terminus. An affinity adsorbent to which Ni²⁺ has been attached through a covalently bound iminodiacetic acid130 binds such histidine tails with high specificity, and the expressed protein can be eluted, often in pure form, with a gradient of imidazole.131 It is also possible to purify expressed proteins specifically if they have been designed to contain a short amino acid sequence on one of their termini recognized by a specific immunoglobulin immobilized on a solid phase. Fusion proteins between the protein of interest and glutathione transferase can be purified by using an affinity adsorbent on which glutathione has been covalently attached and eluting with glutathione. All of these strategies require that a short sequence of amino acids or even another protein be fused with the protein of interest, but if a short sequence recognized by a stringent endopeptidase is incorporated between the two, the protein of interest can be released in its unmodified form by digestion.
One advantage of expressing a protein in a system in which it is produced as a major fraction of the cellular protein or it has been tagged for affinity adsorption is that its purification often requires fewer steps than purification from its natural source. Because the steps of a purification are often accompanied by slow degradation of the protein, the fewer the steps, the more homogeneous will be the final purified protein. Crystals are more readily obtained from a protein the purification of which has been simple and rapid. For this reason, if they are available in high yield, expressed proteins are usually used in crystallographic studies in preference to the same proteins purified from natural sources. Often, however, expressing a protein in cells, even in E. coli, provides far less of the purified protein than can be obtained by starting with 10 kg of liver, heart, blood, or skeletal muscle. In such instances, if all that is desired is the pure protein, using an expression system is inefficient and costly. If, however, one experimental goal is to mutate specific amino acids in the sequence of the protein, an expression system is unavoidable.
Site-directed mutation132,133 converts one particular amino acid in the sequence of a polypeptide into another of the 20 amino acids. It is also possible to delete
DNA or genomic DNA for the protein of interest has been cloned and that the encoded protein can be expressed, in quantities sufficient for the contemplated experiments. The site-directed mutation is incorporated into the DNA, and the mutated DNA is used to direct the production of the modified polypeptide in which one particular amino acid has been deliberately changed. For example, a collection of 13 mutated versions of the lysozyme from T4 bacteriophage, in which Threonine 157 had been changed to 13 of the other 19 amino acids, was produced by site-directed mutation. Each of these 13 different proteins was obtained as a pure crystalline product in quantities sufficient for crystallographic analysis.134
A site-directed mutation can be introduced into a particular segment of DNA by annealing a short piece of synthetic DNA, the mutagenic oligonucleotide, to one of the two strands of the unmutated DNA to form a short section of double-helical DNA in which one or more of the nucleotide bases are mismatched.132 The mutagenic oligonucleotide is designed so that the desired mismatches occur in the middle of the duplex formed by the annealing and there are sufficiently long regions of complementary nucleotide sequence on each flank to guarantee that a stable and specific duplex is formed. The original way this was accomplished is the following.
A restriction fragment of the DNA encoding the protein of interest and containing the site to be mutated is inserted into the genome of an M13 bacteriophage, a bacteriophage that carries its genome as single-stranded DNA. Infection of a suspension of E. coli with the altered bacteriophage produces virus particles containing the enlarged genome on a closed, single-stranded circle of DNA.135 Closed, single-stranded circles containing the strand of the inserted DNA complementary to the mutagenic oligonucleotide are selected133 for hybridization. The mutagenic oligonucleotide is complementary to sequences on this single-stranded DNA except at the central, mismatched positions, chosen to produce the desired change in a particular codon. For example, the deoxyribonucleotide sequence –CTCTACTGCGGGTTTG– occurs in DNA encoding the sequence of tyrosyl-tRNA synthetase from Bacillus stearothermophilus. It encodes the amino acid sequence –LYCGF–, which contains amino acids 33–37 in the sequence of the intact protein. The mutagenic oligonucleotide –CAAACCCGCCGTAGAG– was chemically synthesized.136 It is complementary to the coding sequence of the unmutated complementary DNA except at its tenth residue, which is a C instead of the complementary A. When it was annealed to a single-stranded, circular DNA containing DNA with the unmutated sequence, it formed a short self-complementary segment of double-stranded DNA in which its C was mismatched with the T of the unmutated
sequence. It was this mismatch that eventually produced the mutated DNA with the sequence –CTCTACGGCGGGTTTG–, encoding the mutated protein sequence, –LYGGF–.
The short mutagenic oligonucleotide sits upon the single-stranded, circular M13 DNA as a primer offering a free 3′-hydroxyl group. This hydroxyl is used to initiate the synthesis of DNA by DNA-directed DNA polymerase.132,133 The enzyme synthesizes a single strand of DNA upon the circular template until it comes around the circle to the 5′-end of the mutagenic oligonucleotide where it stops. The newly synthesized, single-stranded circle is then closed with DNA ligase to produce a closed, double-stranded circle of DNA, completely complementary except at the designed mismatch. This double-stranded circular DNA is then replicated in a suspension of E. coli. Half of the resulting viral DNA should contain the mutated sequence of the segment of the inserted DNA because it is the progeny of the single strand into which the mutagenic oligonucleotide was incorporated originally.
Plaques produced by the viruses are screened to locate ones producing the mutated DNA,133 double-stranded DNA is produced from one of these mutants and amplified, and the desired restriction fragment containing the mutation is isolated and reintroduced into the original DNA to create full-length DNA incorporating the mutation. The mutant protein expressed from this full-length, mutant DNA should contain the designated substitution. For example, in the case of the mutated tyrosyl-tRNA synthetase, it was shown by direct sequencing of the purified protein that it had a glycine rather than a cysteine at position 35.136 That the modification has occurred, however, is usually verified by sequencing the mutated DNA rather than the protein itself.
Several improvements in the original method for site-directed mutation just described have been made. The most important is the adaptation of the procedure so that double-stranded plasmids, rather than single-stranded M13 DNA, can be mutated directly.137 Another improvement has been the development of strategies permitting the removal of the parental unmutated strands of DNA that served as the template for the mutation so that all of the newly synthesized DNA carries the mutated sequence,138–141 increasing the percentage of the product that bears the mutation. A related method that also selects for DNA bearing the mutation is to use two primers, one that mutates the position of interest and the other that mutates a unique restriction site on the plasmid outside of the DNA inserted into it. In this way only the DNA containing the desired mutation, which also has the mutated restriction site, is immune to cleavage at the restriction site.142 Finally, the PCR method has been applied to produce mutated DNA.143–145 Because of its importance, many different procedures are now available for site-directed mutation, and each investigator believes that the one she is using is the best.
Site-directed mutations can also be produced by insertion of cassettes of synthetic double-stranded DNA into a particular complementary DNA. In this method, preexisting or purposely designed restriction sites for site-specific deoxyribonucleases that flank the region to be mutated are chosen. These restriction sites are designed or chosen so that the piece of double-stranded DNA produced by the site-specific deoxyribonucleases is short and has single-stranded, sticky ends, such as those produced by PstI (CTGCA↓G). A double-stranded segment of DNA is synthesized so that it has the appropriate sticky ends and incorporates complementary nucleotide sequences that encode the desired mutation. This is the cassette, which is then inserted into the hole in the original complementary DNA produced by the site-specific deoxyribonucleases. The advantage of the cassette is that the mutation is produced directly by insertion of synthetic double-stranded DNA. The disadvantage is that two complementary pieces of synthetic single-stranded DNA have to be synthesized. Nevertheless, mutation with cassettes has particular advantages when sets of mutants are prepared in which all of the possible 19 substitutions need to be made at a particular location.146 A similar but much more ambitious strategy is to synthesize fragments of DNA that when ligated together constitute the entire coding sequence for a protein. In this way a mutation can be introduced anywhere by synthesizing the corresponding fragment that has the altered sequence at the position to be mutated and ligating it with the remaining unmutated fragments.147
One of the supposed drawbacks of site-directed mutation is that only the 19 other natural α amino acids are available for substitution at the mutated site. It is rather easy to synthesize an α amino acid. A large number are available commercially and if one that has been drawn on a piece of paper is not available commercially, it can usually be synthesized. It is now possible to replace an amino acid at any position in a polypeptide with any one of these unnatural amino acids. To do this, advantage is taken of the fact that there are three stop codons for translation: UAA (ochre), UAG (amber), and UGA. A rare tRNA, the amber suppressor tRNA, reads the codon UAG and normally inserts phenylalanine at that position. The triplet encoding the chosen amino acid in the coding sequence of the protein is mutated by usual site-directed mutation to TAG, and an amber suppressor tRNA (tRNACUA) to which the unnatural amino acid to be inserted has been synthetically attached is used to effect the desired substitution in a cell-free system for transcription and translation.148,149 The requirements for chemically synthesizing the derivative of the suppressor tRNA and the low yields of protein from the cell-free translation system have limited the application of these procedures, but in at least one instance protein sufficient for crystallographic studies has been prepared.150
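The design of the mutagenic oligonucleotide in the tyrosyl-tRNA synthetase example can be checked with a few lines of code. The following is a minimal sketch, not the procedure used by the original investigators: the function names (`reverse_complement`, `translate`, `mutagenic_oligo`, `mismatches`) are illustrative assumptions, the only sequence data are the 16 nucleotides quoted in the text, and the codon table is the standard genetic code written in the DNA alphabet.

```python
# Standard genetic code (DNA alphabet), built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(coding):
    """Translate complete codons of a coding strand (one-letter code)."""
    usable = len(coding) - len(coding) % 3
    return "".join(CODON_TABLE[coding[i:i + 3]] for i in range(0, usable, 3))


def mutagenic_oligo(coding, codon_index, new_codon):
    """Replace one codon (0-based index) and return (mutated coding strand, oligo).

    The oligo is the reverse complement of the mutated coding strand, so it
    anneals to the unmutated coding strand everywhere except at the mismatch.
    """
    start = 3 * codon_index
    mutated = coding[:start] + new_codon + coding[start + 3:]
    return mutated, reverse_complement(mutated)


def mismatches(oligo, template_coding):
    """1-based positions in the oligo that do not pair with the template."""
    perfect = reverse_complement(template_coding)
    return [i + 1 for i, (a, b) in enumerate(zip(oligo, perfect)) if a != b]


wild_type = "CTCTACTGCGGGTTTG"          # sequence quoted in the text
mutated, oligo = mutagenic_oligo(wild_type, 2, "GGC")  # third codon, Cys -> Gly

print(translate(wild_type), translate(mutated))  # LYCGF LYGGF
print(oligo)                                     # CAAACCCGCCGTAGAG
print(mismatches(oligo, wild_type))              # [10]
```

The single mismatch falls at the tenth residue of the oligonucleotide, with six and nine paired residues on its flanks, which is the arrangement the text describes: the mismatch near the middle of the duplex and complementary sequence on each side.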
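The amber-codon strategy for inserting an unnatural amino acid can be illustrated in the same way. This is only a sketch under stated assumptions: the function names `to_amber` and `translate` are hypothetical, the letter X is an arbitrary placeholder for the unnatural amino acid carried by the suppressor tRNA, and the 15-nucleotide fragment is the tyrosyl-tRNA synthetase sequence quoted earlier, reused purely as an example (no amber mutant of that codon is described in the text).

```python
# Standard genetic code (DNA alphabet), built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def to_amber(coding, codon_index):
    """Replace the codon at a 0-based index with the amber triplet TAG."""
    start = 3 * codon_index
    return coding[:start] + "TAG" + coding[start + 3:]


def translate(coding, suppressor_residue=None):
    """Translate until a stop codon is met.

    If suppressor_residue is given, TAG is read as that residue instead of
    as a stop, mimicking an acylated amber suppressor tRNA.
    """
    protein = []
    for i in range(0, len(coding) - len(coding) % 3, 3):
        codon = coding[i:i + 3]
        residue = CODON_TABLE[codon]
        if codon == "TAG" and suppressor_residue is not None:
            residue = suppressor_residue
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


fragment = "CTCTACTGCGGGTTT"      # illustrative fragment: L Y C G F
amber = to_amber(fragment, 2)     # third codon becomes TAG

print(amber)                      # CTCTACTAGGGGTTT
print(translate(amber))           # LY     (translation stops at TAG)
print(translate(amber, "X"))      # LYXGF  (suppressor inserts X at TAG)
```

Without the suppressor the polypeptide is truncated at the amber codon; with it, the full-length product carries the placeholder residue at the chosen position.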
Suggested Reading
Chaiyen, P., Ballou, D.P., & Massey, V. (1997) Gene cloning, sequence analysis, and expression of 2-methyl-3-hydroxypyridine-5-carboxylic acid oxygenase, Proc. Natl Acad. Sci. U.S.A. 94, 7233–7238.
Foulon, V., Antonenkov, V.D., Croes, K., Waelkens, E., Mannaerts, G.P., VanVeldhoven, P.P., & Casteels, M. (1999) Purification, molecular cloning, and expression of 2-hydroxyphytanoyl-CoA lyase, a peroxisomal thiamine pyrophosphate-dependent enzyme that catalyzes the carbon-carbon bond cleavage during α-oxidation of 3-methyl-branched fatty acids, Proc. Natl Acad. Sci. U.S.A. 96, 10039–10044.
The alkaline hydrolysis (I) and the enzymatic digestions (II–V) were carefully controlled so that only a small amount of cleavage occurred at each sensitive position. The five mixtures were then placed in adjacent lanes on a polyacrylamide gel and submitted to electrophoresis followed by autoradiography. A tracing of that autoradiogram is presented below. An autoradiogram only registers radioactive fragments.
[Figure: diagram of restriction fragments and their lengths for the preceding problem; not recoverable from the source.]
Construct a restriction map.
Problem 3–11: A piece of double-stranded DNA about 5300 base pairs in length has been produced by the action of the site-specific deoxyribonuclease EcoRI. When this fragment was digested with the site-specific deoxyribonucleases HindIII, KpnI, and BamHI, the following results were obtained. The numbers are the approximate lengths of the fragments.
[Figure: diagram of the fragments produced from the EcoRI fragment and their lengths; not recoverable from the source.]
Construct a restriction map.
Posttranslational Modification
With the exception of the evanescent Nα-formyl group on its amino terminus and perhaps the 21st primary amino acid, selenocysteine,152 the infant polypeptide as it emerges from the peptidyltransferase site on the ribosome is a polymer containing only the 20 natural amino acids. Each amino acid is coupled to its neighbors by the amides of the peptide backbone, and the amino acids are arranged in the sequence encoded by the particular messenger RNA. It is this covalent structure and only this covalent structure that can be read by the investigator from the sequence of the messenger RNA or genomic DNA. The covalent structures of many proteins, however, do not remain in this untouched state but are biologically modified. A posttranslational modification is any change in the covalent structure of a polypeptide that occurs after its emergence from the ribosome.
Although a thiopeptide bond (structure 3–2) has been observed at Glycine 445 of coenzyme-B sulfoethylthiotransferase,153,154 most posttranslational modifications of the polypeptide backbone result from endopeptidolytic cleavage or covalent rearrangements.
Modifications of the original covalent structure of the polypeptide are performed naturally by cellular endopeptidases. Such normal editing of the amino acid sequence of the protein must be distinguished from artifactual degradation by endopeptidases that can occur, for example, during the purification of a protein. In the course of a normal, natural modification, the polypeptide of a particular protein is cleaved internally, either as a mechanism for controlling its enzymatic activity or for architectural purposes. An example of the former is the
activation of endopeptidases in the pancreatic secretions or the serum by internal cleavages by endopeptidases.155 An example of the latter is the trimming of folded proinsulin to produce insulin. As in the production of insulin from proinsulin, a number of other hormones are produced by endopeptidic cleavage at –Lys-Lys– or –Arg-Lys– positions in the sequence of longer precursors.156 For example, corticotrophin, β-lipotropin, γ-lipotropin, β-endorphin, α-melanocyte-stimulating hormone, and γ-melanocyte-stimulating hormone are all cut from the same precursor 265 aa in length.157,158 Following the initial endopeptidolytic, posttranslational cleavage, the new amino terminus and carboxy terminus can be further digested by exopeptidases.111
Almost all of the proteins of animals are posttranslationally shortened by the removal of one or more of the amino acids from their amino terminus, but some proteins have particular segments removed from their amino termini as they are passed from one compartment in the cell to another compartment. These amino-terminal signal sequences159 address the proteins to the proper locations, and their removal is presumably involved in keeping them there. These successive removals of portions of the amino-terminal sequence have led to the terms pre-proprotein and proprotein.
There is a set of posttranslational modifications involving cysteines, serines, threonines, asparagines, and aspartates that result in rearrangements in the covalent structure of the polyamide backbone of a protein or self cleavage of its backbone. These five amino acids promote these modifications because they place either a nucleophile or an electrophile four atoms away from either an electrophilic acyl carbon or a nucleophilic amide nitrogen, respectively. Thus, the chemistry involved is the chemistry of five-membered heterocycles. Almost all of these posttranslational modifications are catalyzed intramolecularly by the protein itself.
Cysteines, serines, and threonines have their nucleophilic oxygens or sulfurs four atoms away from the acyl carbon of their amino-terminal neighbor. One example of a consequence of this spacing is the posttranslational modifications that produce thiazolines and oxazolines in microcin.160,161
[Reaction scheme; not recoverable from the source.]
Another is the self-catalyzed posttranslational modification that cleaves the polypeptides of human S-adenosylmethionine decarboxylase162 between Glutamine 67 and Serine 68′, histidine decarboxylase from Lactobacillus163,164 between Serine 81 and Serine 82, and aspartate 1-decarboxylase from E. coli165 between Glycine 24 and Serine 25, in each case producing a pyruvated amino terminus.
[Equation 3–9: reaction scheme; not recoverable from the source.]
The first step in this reaction is a five-membered tetravalent intermediate as in the first step of Equation 3–8, but the amine leaves the intermediate rather than water to produce the intermediate ester, which has been observed crystallographically.162,165 This step is an example of an N → O acyl migration. The next step in this reaction utilizes the superior ability of carboxylate as a leaving group to effect the dehydration ultimately producing the pyruvyl group. The oxygen of the original serine ends up in the carboxylate of the new carboxy terminus produced in the reaction.166 An α-ketobutyryl group167 is found as an acyl substituent at the amino
produce the break in the polypeptide between amino acids 205 and 206. In hedgehog protein from D. melanogaster, the thioester resulting from an N → S migration of the polypeptide at Cysteine 258 is transesterified onto the hydroxyl group of cholesterol, which takes the place of the water that would hydrolyze the ester.170 In the process, the polypeptide is cleaved between Glycine 257 and Cysteine 258, and the cholesterol ends up as a posttranslational modification esterified to the new carboxy terminus.
An asparagine or aspartic acid places an electrophile four atoms away from the amide nitrogen of its carboxy-terminal neighbor. This can lead to the production of an aspartyl imide, an isoaspartyl peptide bond, or an aspartate where there was an asparagine.171,172
[Reaction scheme: formation of the aspartyl imide; not recoverable from the source.]
asparaginyl peptide bonds.172 Both an aspartyl imide and an isoaspartyl peptide bond have been observed crystallographically at the position of Aspartate 101 in hen lysozyme, which precedes Glycine 102,174 and an isoaspartyl peptide bond has been observed crystallographically at the position of Asparagine 67 in bovine pancreatic ribonuclease, which precedes Glycine 68.175 Because the hydrolysis of an aspartyl imide can lead to the replacement of an asparagine with an aspartate still in a normal peptide bond, this reaction may be responsible for the deamidation observed at particular sites in some proteins.176,177 Because the aspartyl imide racemizes more rapidly at its α carbon than does either of the amides,172 this process also introduces D-aspartates into the polypeptide.
Both the unnatural isoaspartyl peptide bonds and the D-aspartates are recognized by a repair enzyme that methylates their free carboxylates. This methylation reinitiates the formation of the aspartyl imide, which can spontaneously racemize and hydrolyze to produce L-aspartate in a normal peptide bond, thus repairing the problem.178–181 Only a fraction of the imide racemizes before it hydrolyzes, and when it hydrolyzes the isoaspartyl peptide bond is the favored product, but if only the
The intein can be one continuous folded polypeptide connecting the amino-terminal segment preceding it to the carboxy-terminal segment following it, or it can be the carboxy-terminal segment of one folded polypeptide that is bound noncovalently to the amino-terminal segment of a second folded polypeptide.187,188 In the latter instance, the intein, after it has been spliced out, is two folded polypeptides bound to each other but the other product is still one continuous, spliced polypeptide formed from the amino-terminal segment of the first folded polypeptide and the carboxy-terminal segment of the second. The intein always begins with a serine, threonine, or cysteine and ends with a histidinyl asparagine, and the carboxy-terminal segment always begins with a serine, threonine, or cysteine.189
An even more extensive set of similar self-catalyzed posttranslational rearrangements occurs in concanavalin A from Canavalia ensiformis. After the initial polypeptide is produced by the ribosome, the α-amido group of Serine 30 couples to the α-acyl group of Asparagine 281 in place of the α-amido group of Glutamate 282, releasing the amino-terminal 29 amino acids preceding Serine 30 and the carboxy-terminal nine amino acids following Asparagine 281 as two short peptides, and the polypeptide is cleaved to the carboxy-terminal sides of Asparagines 148 and 163, releasing the intervening 26 amino acids as another short peptide.190 The final intact product of the splicing begins at Alanine 164 of the precursor and ends at Asparagine 148.
The two spontaneous cleavages to the carboxy-terminal sides of Asparagines 148 and 163 in concanavalin A are thought to result from an attack of the amide nitrogen of the asparagine on its own acyl carbon
[Reaction scheme; not recoverable from the source.]
acyl migration (Equation 3–9) of the amino-terminal segment of the protein occurring at the serine, threonine, or cysteine on the amino-terminal side of the intein (Equation 3–11).191,192 The amino-terminal segment is then passed by a transesterification to the oxygen or sulfur of the serine, threonine, or cysteine on the carboxy-terminal side of the upstream splice site (Equation 3–11) to form a branched intermediate in which the intein and the carboxy-terminal segment are still joined together and the amino-terminal segment is esterified to the serine, threonine, or cysteine.193,194 In the next step of the reaction, the peptide bond to the carboxy-terminal side of the asparagine is cleaved (Equation 3–12) to produce the free intein with an unsubstituted aspartyl imide at its carboxy terminus.189 The peptide bond between the amino-terminal segment and the carboxy-terminal segment is then formed by the respective O → N or S → N acyl migration. The amino-terminal and carboxy-terminal splice sites sit next to each other in the folded protein to permit all of these rearrangements to occur in close proximity.188,195,196
In the rearrangement of concanavalin A, Glutamate 282 in the asparaginylglutamate is replaced by Serine 30 to produce an asparaginylserine. The first step in this reaction is probably the cleavage of the peptide bond of the asparaginylglutamate at positions 281 and 282 to produce the aspartyl imide at the resulting carboxy terminus (Equation 3–12). The following steps in the reaction would then be, by analogy to those of intein splicing, N → O migration at Serine 30, attack of the α-amino group of Serine 30 on the aspartyl imide of Asparagine 281, and hydrolysis of the ester between the α-carboxyl group of Serine 29 and the hydroxyl group on the side chain of Serine 30.
The posttranslational modifications of the backbone of the initially synthesized polypeptide that are
protein went unrecognized until the complementary DNA encoding it had been sequenced.190,198
The amino terminus of a polypeptide can be N-methylated,199,200 N-2-pyruvylated,201 or N-acylated, either intramolecularly, as in pyroglutamate (Figure 3–16), or externally, as when it is N-formylated,202 N-acetylated,203 or N-glucuronylated.204 Enzymes are available that hydrolyze pyroglutamyl groups205 or remove acetyl groups.206 In a murein lipoprotein from bacterial outer membrane207 and ubiquinol oxidase (cytochrome bo3) from E. coli,208 each of the respective amino-terminal cysteines is N-acylated by a fatty acid at its α-amino group and its sulfur forms a thioether with carbon 3 of a 1,2-diacylglycerol. n-Tetradecanoyl amides of amino termini (Figure 3–16)209 were first found on protein kinases. The existence of these fatty acylated amino termini was established by isolating an amino-terminal peptide, CH3(CH2)12COHNGly-Asn-Ala, from cAMP-dependent protein kinase and confirming its structure by chemical degradation and by mass spectrometry with fast-atom bombardment.210 By similar procedures it was shown that recoverin was acylated at its amino terminus with a mixture of n-dodecanoic acid, cis-n-tetradec-5-enoic acid, and cis,cis-n-tetradeca-5,8-dienoic acid in addition to n-tetradecanoic acid.211 Such chemical demonstrations of a modification at the amino terminus of a polypeptide should be distinguished from an unsupported conjecture that the amino terminus is blocked when the Edman degradation fails.
The carboxy terminus of a polypeptide can also be modified, for example, as the primary amide (Figure 3–16), the tyrosyl amide,212 or the methyl ester.213 In at least one instance,214 the primary amide at a carboxy terminus is produced from a carboxy-terminal glycine that is first monooxygenated and then decomposes with the loss of glyoxylate to leave behind its former amino group as the carboxy-terminal amide.
\[ \mathrm{R{-}CO{-}NH{-}CH_2{-}COO^-} + \tfrac{1}{2}\,\mathrm{O_2} \longrightarrow \mathrm{R{-}CO{-}NH{-}CH(OH){-}COO^-} \longrightarrow \mathrm{R{-}CO{-}NH_2} + \mathrm{OHC{-}COO^-} \]
(3–13)
(In Equation 3–13, R–CO– stands for the remainder of the polypeptide.)
[Figure 3–17: structural drawing not reproduced. Labeled components: ethanolamine, three mannoses, glucosamine, inositol, glycerol, and fatty acids; substituent positions are marked Man, E, (Gal X), and (Gal)n.]
Figure 3–17: Structure of the linkage between phosphatidylinositol and the carboxy terminus of a polypeptide in a phosphatidylinositol-linked protein.228–232 The carboxy-terminal amino acid sequence shown is that for the linkage at the end of the variant surface glycoprotein MITat.1.4 from Trypanosoma brucei.233 The phosphatidylinositol shown is in the ditetradecanoyl form, but saturated and unsaturated fatty acids from 12 to 22 carbons in length can be esterified at either position in place of either or both of the tetradecanoyl groups. A variant in which a tetradecanoyl group is also attached to the inositol has been reported,234 as well as one in which a ceramide replaces the diacylglycerol.230 The phosphatidylinositol is coupled to an unacetylated D-glucosamine, which is coupled in turn through a trisaccharide of D-mannoses to ethanolamine phosphate, the primary amine of which is attached in amide linkage to the carboxy terminus of the protein. A variant of the more common structure displayed here has the phosphoethanolamine attached through the 3-position of the middle mannose rather than the 6-position of the end mannose.235 Within the tetrasaccharide, the position marked (Gal)n is either a hydrogen or an oligosaccharide of one or more galactosyl groups; the position marked Man is either a hydrogen, an α1-mannosyl group, or a mannosyl disaccharide; the position marked (Gal X) is either a hydrogen, a β1-galactosyl group, or a β1-N-acetylgalactosamyl group; and the position marked E is either a hydrogen or an (O-ethanolamino)phosphoryl group.
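The fatty acylations of amino termini described earlier in this section were established by mass spectrometry, and a short calculation shows why the different acyl groups found on recoverin can be told apart by mass. The sketch below is illustrative only: the function names are assumptions, the molecular formulas are those of the four free fatty acids named in the text, and the increment for each amide is taken as the monoisotopic mass of the acid minus that of the water lost on forming the amide.

```python
# Monoisotopic masses of the most abundant isotopes, in daltons.
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.9949146}

WATER = {"H": 2, "O": 1}

# Molecular formulas of the free fatty acids named in the text.
FATTY_ACIDS = {
    "n-dodecanoic acid": {"C": 12, "H": 24, "O": 2},
    "n-tetradecanoic acid": {"C": 14, "H": 28, "O": 2},
    "cis-n-tetradec-5-enoic acid": {"C": 14, "H": 26, "O": 2},
    "cis,cis-n-tetradeca-5,8-dienoic acid": {"C": 14, "H": 24, "O": 2},
}


def mono_mass(formula):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


def acylation_shift(acid_formula):
    """Mass added to a polypeptide when the acid forms an amide (loss of water)."""
    return mono_mass(acid_formula) - mono_mass(WATER)


for name, formula in FATTY_ACIDS.items():
    print(f"{name}: +{acylation_shift(formula):.3f} Da")
```

Under these assumptions the n-tetradecanoyl group adds about 210.198 Da, and each additional double bond lowers the increment by the mass of two hydrogens, about 2.016 Da, so the four acylated forms of one amino-terminal peptide appear as separate molecular ions.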
posttranslational modification causes the protein to adhere to membranes, it is called a glycosylphosphatidylinositol (GPI) anchor.
In addition to the polypeptide itself and its amino and carboxy termini, posttranslational modification of the side chain of an amino acid can occur. When it does, the derivative remains an L-α-amino acid residue because its carboxyl group and its α-amino group are protected by the amides of the backbone. There are many posttranslationally modified amino acids that have been identified in naturally occurring polypeptides (Table 3–1, Figure 3–18).
The length of Table 3–1 gives the erroneous impression that posttranslational modifications are common. Aside from glycosylation and the phosphorylation of serines, threonines, and tyrosines, the incidence of any of these modifications is quite limited, often being confined to only one protein or one small family of proteins. For example, two of the earliest recognized posttranslational modifications were the 5-hydroxylysine and 4-hydroxyproline (Figure 3–18) that, with few exceptions, are formed in the posttranslational monooxygenation of only prolines and lysines that are found in segments of amino acid sequence in which every third amino acid is a glycine.410 Such sequences occur in the various collagens and proteins related to the collagens. The modifications producing covalently bound coenzymes occur only in proteins using these coenzymes to assist in catalysis of particular reactions. Some of the other posttranslational modifications, for example, the quinones of 2,5-dihydroxytyrosine (Figure 3–18),312 6,7-dioxo-4-(2-tryptophanyl)tryptophan (Figure 3–18), and dehydroalanine,349 occur only in the active sites of particular enzymes and are designed for specific functions. 4-Carboxyglutamate (Figure 3–18)252,253,385,411 is found only in a few of the proteins that bind calcium strongly or that are involved in calcium metabolism.412 Thyroxin,261–263 O-(3,5-diiodo-4-hydroxyphenyl)-3,5-diiodotyrosine (Figure 3–18), is found only in the protein thyroglobulin,266 wherein it is formed at two positions by the intramolecular condensation of two pairs of 3,5-diiodotyrosines.413 The sole function of this large protein (2769 aa) is to produce the thyroxin, which is then liberated from the protein by its complete digestion. Diphthamide, 2-[3-carboxamido-3-(trimethylammonio)propyl]histidine (Figure 3–18), is found only in one of the elongation factors (elongation factor 2) involved in eukaryotic translation.267,269 The attachment of ADP-ribosyl groups to diphthamide, arginine, and asparagine side chains in one or the other of a
small group of proteins is catalyzed by bacterial toxins, and only proteins in individuals infected with these bacteria are modified in this way. There also seem to be enzymes in normal cells, however, that are capable of ADP-ribosylating a small number of proteins as part of their normal operation.331,414–417
Mass spectrometry is often used to identify these posttranslational modifications on the side chains of amino acids. Electrospray mass spectrometry of a purified, intact protein is often the first indication that it contains a posttranslational modification. Because the unmodified amino acid sequence of a protein as it is produced by the ribosome is often known even before it has been purified but usually soon after, any difference between the molecular mass observed by electrospray mass spectrometry and the mass calculated from the unmodified amino acid sequence indicates a posttranslational modification. Such results were the first indications that the protein Ner of bacteriophage Mu was modified at its amino terminus with a pyruvate201 and that bovine recoverin was modified at its amino terminus by one of several different fatty acids.211 Electrospray mass spectrometry of peptides purified from a digest of rat profilaggrin identified nine phosphopeptides by the fact that their molecular masses were 80 Da greater than those predicted from their amino acid sequences.418
Normal direct probe, high-resolution mass spectrometry and mass spectrometry with electron ionization have been used to provide molecular ions and fragment ions of posttranslational modifications such as polyisoprenoids219,296 or 5-mercaptouracil342 that can be removed chemically from the amino acid to which they are attached. Electrospray mass spectrometry in the negative ion mode has been used in a similar way to identify the ceramide released from the GPI anchor of the arabinogalactan proteoglycan from Pyrus communis.230 Fast-atom bombardment feeding a conventional mass spectrometer has been used to vaporize a bispeptide containing the semicarbazide derivative of 6,7-dioxo-4-(2-tryptophanyl)tryptophan and obtain a high-resolution mass spectrum with a molecular ion of 940.3262 Da which was of sufficient precision to calculate a molecular formula for the modification.385
Fast-atom bombardment or matrix-assisted-laser-desorption ionization feeding a tandem mass spectrometer has been used to identify posttranslational modifications in peptides that have been purified from digests of proteins. The negative-ion mode is used for phosphopeptides.422
Electrospray can also be used to produce ions to feed a tandem mass spectrometer.161,270 Although some difficulty arises with the multiply charged ions emitted by the electrospray, they can usually be sorted out successfully in the first mass spectrometer because peptides are short enough that only a few ions are produced from each of them.111 For example, a peptide containing covalently bound flavin from fructosyl-amino acid oxidase of Aspergillus was vaporized by electrospray ionization, the ionic molecule of m/z 659 was selected in the first quadrupole mass spectrometer, it was fragmented by collision-induced dissociation, and the fragment ions produced a mass spectrum in the second quadrupole mass spectrometer of the tandem. The pattern of fragments demonstrated that the flavin was covalently attached to Cysteine 342 of the protein.423 Such a system can also be used to identify the locations in the sequence of a protein at which it is phosphorylated.424
If the posttranslational modification cannot be identified by its mass or its pattern of fragmentation, it is usually possible to hydrolyze the polypeptide and liberate the modified amino acid. Usually the hydrolysis is performed enzymatically to avoid destruction of the modified amino acid that might occur in strong acid or strong base. Enough of the peculiar amino acid is purified to perform a proof of its structure by chemical analysis.
One way in which two or more of the amino acid side chains in a polypeptide can be modified coincidentally is during the formation of a covalent cross-link between them or among them. The cross-link can be intramolecular, connecting two or more amino acid side chains in the same polypeptide, or intermolecular, connecting two or more amino acid side chains in different polypeptides. There is no formal distinction between these two outcomes because the linkage is invariably made after the polypeptides have folded into their native structure and, subsequently, formed specific intermolecular complexes among themselves. This folding and intermolecular assembly is what brings the two or more amino acid side chains that will be cross-linked into
eter can be used to vaporize posttranslationally modified atomic contact with each other. Therefore, it is irrelevant
peptides purified from a digest of a protein, sort the whether the amino acid side chains started out on the
molecular ions in the first mass spectrometer, fragment same polypeptide or different polypeptides or whether
those ions, and then separate the fragments in the they are at positions within the amino acid sequence of
second mass spectrometer. The resulting pattern of frag- the same polypeptide that are close to or distant from
ments is often sufficient to identify the posttranslational each other. The only deciding factor is that they are
modification. Peptides containing an a-hydroxyg- immediately adjacent to each other in the final structure
lycine,324 an 8a-(N3-histidyl)flavin mononucleotide,419 of the mature protein.
and an N-acetyl-O-phosphothreonine420 were analyzed A simple example of a covalent cross-link is an
in this way. Matrix-assisted-laser-desorption ionization amide between a lysine side chain and a glutamate side
feeding a time-of-flight mass spectrometer in either the chain. Such a cross-link is formed from a glutamine
positive-ion mode304,305,350 or negative-ion mode421 has side chain and a lysine side chain, both within a
120 Sequences of Polymers
Table 3–1: Posttranslational Modifications of the Side Chains of Amino Acids in Proteins236
[Figure 3–18: structural drawings not recoverable from the extraction. Structures labeled in the figure: O-phosphotyrosine; 4-carboxyglutamate; Nω,Nω-dimethylarginine; 2-(1-mannosyl)tryptophan; O-(3,5-diiodo-4-hydroxyphenyl)-3,5-diiodotyrosine (thyroxine); 2-[3-carboxamido-3-(trimethylammonio)propyl]histidine (diphthamide); O-palmitoylthreonine; O-methylaspartate; 5-hydroxylysine; para-quinone of 2,5-dihydroxytyrosine; Nω-(ADP-ribosyl)arginine; Nε-(4-amino-2-hydroxybutyryl)lysine; dehydroalanine; 2-(S-cysteinyl)histidine; 4-hydroxyproline; O-(5'-adenylyl)tyrosine; 3,5-diiodotyrosine; 6,7-dioxo-4-(2-tryptophanyl)tryptophan; 8α-(N3-histidyl)flavin mononucleotide; 3-(3-tyrosyl)tyrosine.]
Figure 3–18: Posttranslational modifications of amino acid side chains in the interior of a polypeptide.
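The mass comparison used in this section to detect modifications (for example, the 80 Da excess that marked the phosphopeptides of rat profilaggrin) can be sketched in a few lines of Python. This is a minimal illustration rather than a published procedure: the monoisotopic residue and modification masses are standard values, while the function names, the 0.02 Da tolerance, the limit of three copies of a modification, and the example peptide are all assumptions made here.

```python
# Monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once for the free amino and carboxy termini

# Mass added by a few common modifications (Da); an illustrative subset.
MOD_DELTA = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "methylation": 14.01565,
    "hydroxylation": 15.99491,
    "myristoylation": 210.19837,
    "palmitoylation": 238.22967,
}


def peptide_mass(sequence):
    """Mass calculated from the unmodified amino acid sequence."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER


def explain_mass_difference(sequence, observed_mass, tolerance=0.02, max_count=3):
    """Return (observed - calculated, [(modification, copies), ...])."""
    difference = observed_mass - peptide_mass(sequence)
    candidates = []
    for name, delta in MOD_DELTA.items():
        for copies in range(1, max_count + 1):
            if abs(difference - copies * delta) <= tolerance:
                candidates.append((name, copies))
    return difference, candidates


# Hypothetical peptide observed about 80 Da heavier than calculated.
calculated = peptide_mass("SGSPTK")
difference, candidates = explain_mass_difference("SGSPTK", calculated + 79.96633)
print(round(difference, 3), candidates)
```

A mass difference alone only suggests candidates; as the text notes, the pattern of fragments or chemical analysis of the liberated amino acid is still needed to locate and prove the modification.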
[Figure 3–19: reaction scheme not recoverable from the extraction. Labels in the scheme: protein-lysine 6-oxidase (O2, NH3); 1) procollagen-lysine 5-dioxygenase, 2) protein-lysine 6-oxidase; aldol condensation; aldols, imines, dehydrations, etc.; 1) imine, 2) dehydration (H2O); 1) Michael addition, 2) imine, 3) enamine; aromatization; a desmosine.]
Figure 3–19: Examples of the formation of four of the more than 25 cross-links initiated by the formation of 6-deamino-6-oxolysine in collagen by protein-lysine 6-oxidase. Shown is an aldol condensation to produce the first cross-link. The β-hydroxyaldehyde can then form an imine with a lysine that dehydrates to an α,β-unsaturated imine, a product that cross-links three amino acids. The enol of another aldehyde can add to the α,β-unsaturated imine, and the initial enamine can condense to an imine with the carbonyl of the aldehyde. This forms a dihydropyridine that cross-links four amino acid residues. The pyridinium cation formed upon oxidation of the dihydropyridine is a desmosine linking the four amino acid residues. Upper right corner: 6-Deamino-5-hydroxy-6-oxolysine formed by the consecutive action of procollagen-lysine 5-dioxygenase and protein-lysine 6-oxidase produces an α-hydroxyaldehyde, which is susceptible to an even more complicated set of modifications.
[Figure 3–20: reaction scheme not recoverable from the extraction. Labels in the scheme: cystine; cysteine; ± RSH ± H+.]
Figure 3–20: Reduction of a cystine side chain by disulfide interchange. The cystine connecting two segments of polypeptide is exposed to an external thiolate (RS⁻) that displaces the cysteinyl anion by nucleophilic substitution and is in turn removed by another thiolate.
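Because a cystine differs from two cysteines by two hydrogen atoms, a bispeptide held together by a cystine is about 2 Da lighter than the sum of its two reduced halves, which is why reduction shows up in a mass spectrum as a gain of 2 Da. The following minimal sketch works this out for the two ribonuclease peptides discussed in this section, NGQTNCYH and NVACK. The monoisotopic masses are standard values; the function names are illustrative assumptions, and the sketch handles only a single cystine between two peptides.

```python
# Monoisotopic residue masses (Da) for the residues in the two peptides.
RESIDUE_MASS = {
    "A": 71.03711, "C": 103.00919, "G": 57.02146, "H": 137.05891,
    "K": 128.09496, "N": 114.04293, "Q": 128.05858, "T": 101.04768,
    "V": 99.06841, "Y": 163.06333,
}
WATER = 18.01056
HYDROGEN = 1.00783


def peptide_mass(sequence):
    """Mass of a linear peptide with every cysteine as the free thiol."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER


def bispeptide_mass(first, second):
    """Mass of two peptides joined by one cystine (two hydrogens lost)."""
    if "C" not in first or "C" not in second:
        raise ValueError("each peptide must contain a cysteine")
    return peptide_mass(first) + peptide_mass(second) - 2 * HYDROGEN


oxidized = bispeptide_mass("NGQTNCYH", "NVACK")
reduced_sum = peptide_mass("NGQTNCYH") + peptide_mass("NVACK")
print(round(reduced_sum - oxidized, 3))
```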
equilibrium constant445,446 for disulfide interchange between its disulfide (taking the place of RS-SR in Equation 3–14) and the disulfide in a normal extracytoplasmic protein [prot (S-S) in Equation 3–14] is about 10⁻³. Unlike synthetic 2,3-dihydroxy-1,4-dithiobutane or the natural protein thioredoxin, both of which can cleave disulfides in native proteins, protein disulfide-isomerase forms them.

The identification of the two cysteine side chains that are connected in a particular cystine in a native protein requires that a peptide containing only those two cysteines still joined as the cystine be isolated from a digest of the protein.447–450 Before the protein is unfolded or digested, however, it must be treated with an alkylating agent such as N-ethylmaleimide under conditions capable of capping off all the free sulfhydryls in the preparation, which if left unalkylated would catalyze disulfide interchange (Figure 3–20) and thereby scramble the disulfides.448 Ideally, the peptides with intact cystine side chains used to identify the cysteines involved should be two short peptides held together by the cystine itself. For example, one of the peptides from ribonuclease isolated from a digest of the protein performed with pepsin, trypsin, and chymotrypsin was composed of the two smaller peptides NGQTNCYH and NVACK, covalently coupled by a cystine between the two cysteine side chains.447 From this result it could be concluded that, in native ribonuclease, Cysteine 65 is coupled as a cystine with Cysteine 72.

The three digestions used in the experiments just described served the purpose of producing bispeptides containing cystine that were as small as possible. This precaution avoids the confusion of having several large peptides interlaced by multiple disulfides451 into one large, intractable peptide. This problem, however, is sometimes unavoidable, as in the case of thrombomodulin, in which three cystines occur within a short sequence of 14 amino acids and from which individual peptides containing each of them could not be obtained. In this case, the linkages were assigned452 by following the rates at which the individual cysteines appeared as the cystines were slowly cleaved with the nucleophile tris(2-carboxyethyl)phosphine.453

Because oxidation states of cysteine (Figure 2–8) other than cystine as well as covalent modifications of cysteine454 revert to yield free cysteine upon addition of a thiol such as 2,3-dihydroxy-1,4-dithiobutane, the indirect assignment of a cystine based solely upon the appearance of free cysteine after the addition of a thiol cannot be trusted.455,456

Procedures have been developed to assist in the analysis of peptides containing cystine. Sensitive methods have been described for continuously monitoring chromatograms of digests performed with endopeptidases to detect peptides containing intact cystine side chains either electrochemically after they have been reductively cleaved on the surface of an electrode430 or colorimetrically after they have been nucleophilically cleaved with tributylphosphine.457 The bis(phenylthiohydantoin) of cystine (Figure 3–1) displays a unique relative mobility on the high-pressure liquid chromatograms used to identify the products from the steps of automated Edman degradation.458,459

Peptides containing cystine that have been purified from a digest of a protein can also be positively identified by mass spectrometry.460,461 The advantage in this instance is that the gaseous molecular ion of the bispeptide, necessarily containing the intact cystine, is observed directly. The presence of a cystine within a peptide can be established by mass spectrometry because the mass of the peptide gradually increases by 2 Da as a result of photoreduction during successive shots from the laser during its vaporization.462

Suggested Reading

Carr, S.A., Biemann, K., Shoji, S., Parmelee, D.C., & Titani, K. (1982) n-Tetradecanoyl is the NH2-terminal blocking group of the catalytic subunit of cyclic AMP-dependent protein kinase from bovine cardiac muscle, Proc. Natl Acad. Sci. U.S.A. 79, 6128–6131.
Haniu, M., Horan, T., Arakawa, T., Le, J., Katta, V., Hara, S., & Rohde, M.F. (1996) Disulfide structure and N-glycosylation sites of an extracellular domain of granulocyte-colony stimulating factor receptor, Biochemistry 35, 13040–13046.

Problem 3–12: Assume that a protein containing an intein has a cysteine at the amino terminus of the intein and a serine at the amino terminus of the carboxy-terminal segment. Write the mechanism for intein splicing involving the initial N → S migration, an S → O migration, cleavage of the peptide bond between the intein and the carboxy-terminal segment to produce the unsubstituted aspartyl imide, and the final O → N migration to produce the new peptide bond.

Problem 3–13: Draw the structure of a polypeptide with an amino-terminal cysteine residue the α-amine of which is acylated with palmitate and the sulfur of which forms a thioether with C3 of a 1,2-dipalmitoyl-3-deoxyglycerol.

Problem 3–14: A remarkable feature of the enzyme glutamate–ammonia ligase from E. coli is that its catalytic properties depend on the conditions of growth under which the E. coli from which it is purified were grown. The enzyme purified from E. coli grown on NH4Cl and glucose (Type I) is less sensitive to inhibition by AMP than is the enzyme purified from E. coli grown on glutamate and glycerol (Type II). The Type II enzyme can be converted into Type I enzyme if it is treated with snake venom phosphodiesterase.463

When Type II enzyme was digested with snake venom phosphodiesterase and subsequently precipitated out of solution with trichloroacetic acid, the super-
Problem 3–15: Write mechanisms for the formation of the following posttranslational modification found in histidine ammonia-lyase464

[Structure drawing not recoverable from the extraction.]

and the following posttranslational modification producing the chromophore in red fluorescent protein from Discosoma:465

[Structure drawing not present in the extraction.]

cycle                   1     2     3        4     5     6
phenylthiohydantoins    D,G   G,W   C*,F,I   D     E,S   F

cycle                   7     8     9        10    11
phenylthiohydantoins    D,W   I,S   C*,N     E,P   E,G

*Bis(phenylthiohydantoin) of cystine.

How are the cysteines linked to form the two cystines in the peptide? What unexpected cleavage did trypsin produce?

Oligosaccharides of Glycoproteins

Living organisms are formed from three types of covalent polymers: proteins, nucleic acids, and polysaccharides. Polysaccharides used biologically for structural purposes
or for the storage of carbohydrate occur as long, often branched, uniform polymers of monosaccharides (sugars). Examples of polysaccharides would be agarose (Figure 1–7), cellulose (Figure 1–2), starch,467 hyaluronic acid, chitin, and glycogen. Although the reducing ends of these polysaccharides are sometimes attached covalently to particular proteins,468 this fact is secondary to their biological roles. Oligosaccharides are shorter, more heterogeneous oligomers of monosaccharides. Oligosaccharides are frequently attached to recently synthesized polypeptides as posttranslational modifications of serines, threonines, or asparagines. Such posttranslational modifications produce glycoproteins.

A glycoprotein is any protein to which one or more oligosaccharides are covalently attached. To define the complete covalent structure of a glycoprotein, not only the amino acid sequence of the protein but also the points of attachment and the sequences of the monosaccharides in the oligosaccharides must be established. Some of the rarely occurring oligosaccharides and their sites of attachment have been listed along with the other posttranslational modifications in Table 3–1. The more commonly encountered oligosaccharides in glycoproteins from animals and plants, however, are branched oligomers attached through N-acetylglucosamine to asparagine side chains or through N-acetylgalactosamine [...] 3–21, 3–22, and 3–23).469–472

The monomers from which these oligosaccharides are constructed are monosaccharides. The eleven major monosaccharides that are found in the oligosaccharides of glycoproteins are D-mannose, D-galactose, D-glucose, N-acetyl-D-glucosamine, N-acetyl-D-galactosamine, the sialic acids, D-glucuronic acid, L-fucose, L-rhamnose, D-xylose, and L-arabinose (Figure 3–24). Several of the monosaccharides in glycoproteins can be O-sulfated,473,474 and mannoses and N-acetylglucosamines can be O-(2-aminoethyl)phosphonylated.475

A variety of different sialic acids are known (greater than 40) that are derivatives of either D-neuraminic acid (Figure 3–24) or the closely related D-5-deamino-5(S)-hydroxyneuraminic acid (2-keto-3-deoxy-D-glycero-D-galacto-nononic acid; 3-deoxy-D-glycero-D-galacto-nonulosonic acid).476,477 These two anionic monosaccharides are modified variously by N-acetylation, N-glycolylation, O-lactylation, O-sulfation, O-methylation, O-phosphorylation, and O-acetylation.478

The covalent bonds that link the monosaccharides together are those of acetals and occasionally ketals. Glycosidic linkages are the bonds in these acetals and ketals formed between the only carbonyl carbon on each monosaccharide, enclosed within a pyranose ring or a furanose ring as a hemiacetal, and one of the hydroxyl groups of the preceding monosaccharide in the oligomer. A glycosidic linkage is formed between the oxocarbenium cation of the pyranose or furanose and a lone pair of electrons on a nitrogen or an oxygen. An example would be

[Equation 3–20: reaction scheme not recoverable from the extraction. Labels in the scheme: H+; + glucose; ± H+, H2O.] (3–20)

Branching of the oligosaccharide (Figures 3–21 to 3–23) occurs whenever two or more of the hydroxyl groups on one of the monosaccharides participate in glycosidic linkages.

Each of the oligosaccharides on a glycoprotein can be thought of as beginning at the monosaccharide that is attached to the polypeptide (the reducing end). The point of attachment is either an O-glycosidic linkage, formed between the carbonyl carbon of this initial monosaccharide and the hydroxyl group of a serine or threonine side chain, or an N-glycosidic linkage, formed between the carbonyl carbon of this initial monosaccharide and the amide nitrogen of an asparagine side chain. The first monosaccharide in an oligosaccharide attached to a serine or a threonine is usually N-acetylgalactosamine; the first sugar in an oligosaccharide attached to asparagine is almost always N-acetylglucosamine. Peripheral to this initial monomer, the oligomer will be found to branch at several points and end at each of several unsubstituted monosaccharides that occupy the

[Figure 3–23: structural drawing not recoverable from the extraction.]
Figure 3–23: Covalent structure of one of the oligosaccharides attached through serine and threonine to human colonic mucin.470 This is an example of an O-linked oligosaccharide.
(1) Sia(α2,6)GalNAc^b
(2) GlcNAc(β1,3)GalNAc
(3) GlcNAc(β1,3)GalNAc
    |
    Sia(α2,6)
(4) Gal(β1,4)GlcNAc(β1,3)GalNAc
(5) Gal(β1,4)GlcNAc(β1,3)GalNAc
    |
    Sia(α2,6)
(6) GlcNAc(β1,3)Gal(β1,4)GlcNAc(β1,3)GalNAc
(7) Sia(α2,6)Gal(β1,3)GlcNAc(β1,3)
    |
    Sia(α2,6)Gal(β1,3)GlcNAc(β1,6) Gal(β1,4)GlcNAc(β1,3)GalNAc
    |
    Sia(α2,6)
a Only 7 of the 13 oligosaccharides isolated are tabulated.
b For abbreviations see Figure 3–27.
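The linear notation used for the unbranched entries in this table, such as Gal(β1,4)GlcNAc(β1,3)GalNAc, lists each monosaccharide followed by the anomeric configuration and the two carbons joined by its glycosidic linkage, ending with the monosaccharide at the reducing end. A small parser makes that reading explicit. This is only a sketch under stated assumptions: the function name and the tuple layout are invented here, the letters a and b are accepted as stand-ins for α and β, and branched entries are deliberately rejected rather than interpreted.

```python
import re

# One residue followed by its linkage, e.g. "Gal(β1,4)" or "Sia(a2,6)".
LINK = re.compile(r"([A-Za-z0-9]+)\(([abαβ])(\d+),(\d+)\)")


def parse_linear_glycan(text):
    """Split an unbranched oligosaccharide into its linkages.

    Returns (chain, reducing_end), where chain is a list of
    (residue, anomer, from_carbon, to_carbon) tuples read from the
    nonreducing end toward the reducing end.
    """
    text = text.replace(" ", "")
    position = 0
    chain = []
    while True:
        match = LINK.match(text, position)
        if match is None:
            break
        residue, anomer, from_carbon, to_carbon = match.groups()
        anomer = {"a": "α", "b": "β"}.get(anomer, anomer)
        chain.append((residue, anomer, int(from_carbon), int(to_carbon)))
        position = match.end()
    reducing_end = text[position:]
    if not reducing_end or not reducing_end.isalnum():
        raise ValueError("branched or malformed notation is not handled")
    return chain, reducing_end


chain, reducing_end = parse_linear_glycan("Gal(β1,4)GlcNAc(β1,3)GalNAc")
for residue, anomer, from_carbon, to_carbon in chain:
    print(f"{residue}: {anomer}, C{from_carbon} to C{to_carbon}")
print("reducing end:", reducing_end)
```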
oligosaccharides were isolated from human colonic mucin.470 All of the other 12, of which six are presented in Table 3–2, were incomplete realizations of the largest (Table 3–2, entry 7). It is possible, however, that this largest one may be an incomplete realization of an even larger oligosaccharide that escaped detection.

Another view of microheterogeneity, in opposition to the view that it is haphazard and purposeless, is that it has a role in producing many different glycoforms of the same protein. This would increase the functional range of these proteins and would be advantageous in particular situations. For example, the microheterogeneity observed in the set of oligosaccharides isolated from human colonic mucin (Table 3–2) may permit the oligosaccharides on this glycoprotein to ensnare many different species of bacteria, each of which binds specifically to only one or a few oligosaccharide sequences. It is probably the case that microheterogeneity is relevant in some instances and irrelevant in others. For example, the length and amount of branching in the oligosaccharides on erythropoietin determine its biological activity,479 but the presence or absence of oligosaccharide on channel-forming intrinsic protein has no effect on its function.480 Unlike most proteins, most oligosaccharides do not assume a fixed conformation so the involvement of microheterogeneity in their biological specificity would be based mainly on differences in sequence. As the oligosaccharides, however, become more crowded481 or more branched,482 local steric effects become more numerous, and their confinement of the conformation of the oligosaccharide may contribute to differences in biological function.

From an examination of the sequences of the oligosaccharides attached to glycoproteins from animals and plants, several generalizations can be drawn. The most common of these oligosaccharides can be divided into three classes (Table 3–3). The high-mannose oligosaccharides begin with two N-acetylglucosamines linked (β1,4) to each other, the first attached to an asparagine in the protein. These oligosaccharides contain 5–9 mannoses (Figure 3–21). The complex N-linked oligosaccharides, because they are biosynthetically derived from the high-mannose oligosaccharides, also begin with two N-acetylglucosamines linked (β1,4) to each other, followed by three branched mannose residues. Beyond this structural core, variable amounts of N-acetylglucosamine, galactose, fucose, various sialic acids, and occasionally N-acetylgalactosamine486 are attached (Figure 3–22). The O-linked oligosaccharides begin with an N-acetylgalactosamine linked to a serine or threonine on the protein and contain variable amounts of N-acetylglucosamine, galactose, N-acetylgalactosamine, fucose, and various sialic acids (Figure 3–23).

a From Chinese hamster ovary cells.
b From human plasma α1-acid glycoprotein.
c From blood group A active glycoprotein in human ovarian cyst fluid.

High-mannose oligosaccharides occur in all eukaryotes, but those in fungi have differences in linkage and branching patterns487 from those in plants and animals and often contain significantly more mannose.488 Most, if not all,489 of the high-mannose oligosaccharides from the proteins of plants and animals are incomplete realizations of one complete, basic structure (Table 3–3, entry 1).471 This uniformity results from the fact that this unit is transferred in its entirety to the targeted asparagine side chain on the glycoprotein.484 Then, in a specific sequence of steps, catalyzed by three exomannosidases, it is shortened until all of the mannoses in (α1,2) linkage have been removed. When only five mannoses remain, the oligosaccharide is then elongated in a highly specific sequence of steps by specific glycosyltransferases to produce complex N-linked oligosaccharides. After the last step of this elongation, other mannosidases remove two more mannoses to leave the three found in the mature complex N-linked oligosaccharide. At the end of this process, most of these complex N-linked oligosaccharides are also incomplete realizations of one basic structure (Table 3–3, entry 2),471,485,490 but minor differences in the positions on the peripheral N-acetylglucosamines and galactoses at which the linkages are made have been noted491–493 as well as substitution of the peripheral galactoses with N-acetylgalactosamines.494 Short, repeating units of N-acetylglucosaminyl(β1,3)galactose have also been observed inserted between the peripheral galactoses and N-acetylglucosamines of some complex N-linked oligosaccharides.495 Fucoses are found attached to many of the complex N-linked oligosaccharides from animals in (α1,6) or (α1,3) linkage to one or the other of the N-acetylglucosamines in their cores496 or in (α1,3) or (α1,4) linkage to N-acetylglucosamines in their peripheries. Xyloses are found attached to a few of the complex N-linked oligosaccharides from animals497 but many of the complex N-linked oligosaccharides from plants471 in (β1,2) linkage to the central mannose in the core.

It seems that, within the same protein, the oligosaccharides on certain asparagine side chains will remain as high-mannose oligosaccharides exclusively, while the oligosaccharides on other asparagine side chains are completely converted to complex N-linked oligosaccharides.469 Occasionally, however, a hybrid N-glycan is encountered, in which one of the branches in one of these oligosaccharides is of the complex structure while the other remains of the high-mannose structure,494 presumably because the processing on the latter branch was specifically blocked.

The O-linked oligosaccharides display less uniformity than the N-linked. This may result from the fact that they are built up one sugar at a time rather than as intact units.498 The O-linked oligosaccharides drawn in Figure 3–23 and presented in Table 3–3 include some of the common structural features of this class. By far the most common monosaccharide forming the linkage to the serine or threonine is N-acetylgalactosamine, but oligosaccharides O-linked through other monosaccharides have been reported.499,500 The branches are constructed from the basic repeating unit, Gal(β1,3 or β1,4)GlcNAc(β1,3 or β1,4 or β1,6). Branching usually occurs at a galactose or at the initial N-acetylgalactosamine, rarely if ever at an N-acetylglucosamine. The
basic repeating unit of each branch can begin with either an N-acetylglucosamine (Figure 3–23) or a galactose (Table 3–3). Fucose is found in (α1,4) and (α1,3) linkages to penultimate N-acetylglucosamines in addition to (α1,2) linkage to peripheral galactoses. The branches either end with a galactose of the repeating unit or are capped by an N-acetylgalactosamine in (α1,3) or (α1,4) linkage. Sialic acids are found in (α2,6) or (α2,3) linkage to galactoses or the initial N-acetylgalactosamine. Many variations on these patterns are observed.470,483,501–508 Often O-linked oligosaccharides are quite short. An example would be NeuNAc(α2,3)Gal(β1,3)[NeuNAc(α2,6)]GalNAc.501,503 All these regularities seem to result from the fact that the sugars are added one at a time from the initial N-acetylgalactosamine outward by a limited set of glycosyltransferases. These enzymes are specific for particular sugars and attach them only to particular hydroxyl groups on particular sugars within the growing oligosaccharide.

Two of the most heavily glycosylated glycoproteins found in animals are the mucins and the proteoglycans. These two types of glycoproteins can contain up to 80% or 90% carbohydrate by mass, respectively.

The mucins are the glycoproteins that constitute mucus and also coat the surfaces of many types of cells. The human intestinal mucin MUC2 is a polypeptide 5159 aa long.2 Between Cysteine 1375 and Cysteine 1762 and between Cysteine 1858 and Isoleucine 4299, there are two regions of amino acid sequence that are rich in threonine (58% of the amino acids) and proline (24%) and are thought to contain the majority if not all of the sites for the O-linked glycosylation (Table 3–2), which occurs mainly on threonine.509 The larger of these two regions is made up almost exclusively of 101 consecutive repeats of the sequence –TTTTTVTPTPTPTGTQTPTTTPI– with only a few substitutions over the entire length of 2323 aa. There are about 1100 N-acetylgalactosylthreonyl linkages in the entire protein,509 and if all of these are confined to the two regions rich in threonine, about 85% of the threonines in these regions carry oligosaccharides. From an examination of the repeating sequence and the fact that each oligosaccharide contains an average of four monosaccharides,509 one can gain an appreciation of how closely packed these oligosaccharides must be. Other mucins also have similar regions rich in threonine and serine, usually found in repeating sequences510–512 that are also heavily glycosylated.

Proteoglycans are proteins to which particular types of regular polysaccharides are attached. Proteoglycans are secreted as extracellular matrix and
a This is the disaccharide monomer constituting the newly synthesized polymer before postsynthetic modification.
b Epimerization of glucuronic acid at carbon 5 to form iduronic acid, the presence of which distinguishes dermatan sulfate from chondroitin sulfate.
c Deacetylation and N-sulfation (–NSO3⁻) are both incomplete so that N-acetylglucosamine, glucosamine, and N-sulfoglucosamine coexist in the same proteoglycan.
d Epimerization of glucuronic acid at carbon 5 to form iduronic acid, the presence of which distinguishes heparan sulfate from heparin.
are the main constituents in such structures as cartilage, vascular wall, and tendon. Although they often carry the usual N-linked and O-linked oligosaccharides, by definition, they also carry at least one of a class of long polysaccharides formed from repeating disaccharides (Table 3–4). Each proteoglycan is defined by the repeating disaccharide that forms the polysaccharide that is attached to it. These polysaccharides of repeating units are heterogeneous because of a collection of postsynthetic modifications (Table 3–4) that are only partially accomplished, often concentrated within randomly spaced blocks of consecutive monosaccharides along the length of the polymer. One constant feature of the covalent structure of a proteoglycan is that each of its defining polysaccharides, except for keratan sulfate,515 is O-linked to the protein through the oligosaccharide –GlcA(β1,3)Gal(β1,3)Gal(β1,4)xylosylserine.516

As with those of the mucins, the polypeptides of proteoglycans can be quite long. The polypeptide of one of the human chondroitin sulfate proteoglycans is 2293 aa in length517 and has, on average, 12,000 monosaccharides in its covalently attached oligosaccharides and polysaccharides.518 Unlike the mucins, which contain short oligosaccharides densely packed together because they are on long strings of adjacent threonines, the proteoglycans contain long polysaccharides that can be attached to serines at isolated –Gly-Ser– or –Ser-Gly– sites scattered randomly over the sequence at intervals of about 50 aa.517,519 In at least one of the proteoglycans, however, there is the sequence –YS(GS)₂₄L–, to the serines of which heparin and chondroitin sulfate are attached.520

The sequence of the monosaccharides in an oligosaccharide or polysaccharide on a glycoprotein is established by chemical analysis. The starting material in this analysis is a purified preparation of the glycoprotein itself. Often, to facilitate the analysis, the oligosaccharides on the glycoprotein have been made radioactive by growing cells producing it in the presence of one or two radioactive monosaccharides, for example, [3H]mannose and [14C]glucosamine.521 Oligosaccharides attached to a glycoprotein are isolated by digesting the protein with endopeptidases, purifying the resulting glycopeptides, and releasing the oligosaccharides from these glycopeptides by chemical or enzymatic cleavage. The glycopeptides produced by the digestion are usually separated on a chromatographic system, such as chromatography by reverse-phase adsorption, to separate them on the basis of only their amino acid sequence. In this way all of the oligosaccharides attached to a particular amino acid side chain in the sequence of the glycoprotein are isolated together.522 From an examination of the amino acid sequences of a large number of such glycopeptides, it has been concluded that N-linked oligosaccharides from plants and animals are always attached to asparagines that have either a serine or a threonine two amino acids further on in the amino acid sequence (Asn-X-Ser/Thr).471,523 The serines and threonines to which O-linked oligosaccharides are attached, however, are evidently not designated by any pattern in the surrounding sequence of amino acids501 but tend to be clustered in regions of the polypeptide rich in serines, threonines, and prolines. The ultimate example of this would be the mucin MUC2.

The chemical or enzymatic cleavage used to release the several microscopically heterogeneous oligosaccharides from a particular glycopeptide depends upon the glycosidic linkage. For N-linked oligosaccharides, endoglycosidases specific for cleavage within the common segment GlcNAc(β1,4)GlcNAcAsn are usually used. For example, mannosyl-glycoprotein endo-β-N-acetylglucosaminidase (endoglycosidase H) catalyzes hydrolysis of the glycosidic linkage between two N-acetylglucosamines and releases the oligosaccharide missing its initial monosaccharide, while peptide-N4-(N-acetyl-β-glucosaminyl)asparagine amidase (peptide:N-glycosidase F) cleaves the N-glycosidic linkage between an N-linked oligosaccharide and the asparagine on a glycoprotein or glycopeptide.524 Oligosaccharides in N-glycosidic linkage to asparagine can also be released from the glycopeptide by hydrazinolysis525 and reacetylated with acetic anhydride. Regardless of the method by which it is released, the aldehyde at C1 of the initial N-acetylglucosamine in the oligosaccharide is usually reduced to the primary alcohol with Na[3H]BH4 (Figure 3–25).469 This reduction eliminates the aldehyde, simplifies the subsequent chemistry, and makes the oligosaccharide radioactive, if it is not so already. Oligosaccharides in O-glycosidic linkage to serine and threonine are usually released from the glycopeptides by treatment with base, which promotes β-elimination (Figure 3–25). The treatment with base is performed in the presence of Na[3H]BH4 to prevent, by reduction of the aldehyde at C1, the destruction of the oligosaccharide from its reducing end and to make the released oligosaccharide radioactive.

It is at this point that the technical consequences of microheterogeneity are experienced. Instead of one pure oligosaccharide released in a quantity equimolar to the amount of glycopeptide, a mixture of many oligosaccharides is produced, each present in a correspondingly small quantity. This mixture is first separated into neutral and anionic oligosaccharides chromatographically.491 Chromatographic systems that separate the oligosaccharides by molecular exclusion or by anion exchange526 are then used to perform further separations. Chromatography by molecular exclusion provides an indication of the size of each oligosaccharide in the set. After the sialic acids have been removed from the anionic oligosaccharides by hydrolysis in mild acid and separately analyzed, the composition of each oligosaccharide is determined. This can be done by methanolysis under acidic conditions to cleave the acetals and coincidentally form methyl glycosides (Figure 3–26) that are
134 Sequences of Polymers
[Figure 3–25: reaction scheme not reproduced; labeled intermediates: enolate, dehydroalanine. Surviving caption text: "[...] step in the reaction is the removal of the proton α to the amide to produce the 1,2-diaminoenolate, which in turn ejects the alcohol to form a dehydroalanine. The reducing end of the oligosaccharide is then reductively labeled with Na[3H]BH4."]
identified by their mobilities on gas chromatography. It is also possible to analyze the composition of the oligosaccharide directly by submitting it to hydrolysis in acid and separating the resulting monosaccharides and deacetylated amino sugars on chromatography by anion exchange (Figure 3–27) [480,527], the effluent from which is monitored electrochemically [528].

The sequence of each of the purified oligosaccharides is determined by indirection. A series of chemical and enzymatic reactions is performed on the oligosaccharide and the outcome of each of these reactions is assessed either directly or by determining the change in composition of the oligosaccharide that occurs. The results of these various reactions are gathered until in their entirety they are consistent with only one of the many possible structures for the oligosaccharide. This one structure is then considered to be the actual structure. The reactions used in this process are periodate oxidation, Smith degradation, treatment with glycosidases, and methylation. The results from these chemical analyses are often supplemented with nuclear magnetic resonance spectroscopy and mass spectrometry.

When sodium metaperiodate (NaIO4) is dissolved in water at acidic pH (pH 3–6) it forms a mixture of acidic hydrates referred to as periodic acid (HIO4). Periodic acid cleaves polyalcohols such as monosaccharides at the carbon–carbon bonds between vicinal diols and produces two carbonyls from the two hydroxyl groups (Figure 3–28). Both of the hydroxyl groups in the vicinal diol must be free for periodic acid to cleave the carbon–carbon bond between them. The disappearance of a monosaccharide during treatment with periodic acid demonstrates that, in the intact oligosaccharide, the sugar that disappeared had at least two adjacent hydroxyl groups unbonded in glycosidic linkages. It is also possible to identify the actual products of the periodate cleavage by mass spectrometry [228]. Oxidation by periodic acid can be performed sequentially by the Smith degradation (Figure 3–28). This series of reactions takes advantage of the lability to acid of a glycosidic linkage at carbon 1 of a sugar that has been cleaved by periodic acid and the resulting aldehydes of which have been reduced with sodium borohydride. In theory this reaction should be able to cleave sugars sequentially from the ends of the branches inward, but in practice only one cycle is usually successful because the selectivity for acyclic acetals is not great.

A more informative sequence of cleavages can often be performed with exoglycosidases. These are enzymes that remove particular sugars from the ends of branches. They are highly specific for the sugar removed, the anomeric state of the glycosidic linkage, and sometimes the location of the hydroxyl group from which the bond has been formed. Examples of such exoglycosidases would be β-galactosidase, α-L-fucosidase, β-N-acetylglucosamidase, and exo-α-sialidase. An example of the specificity for the hydroxyl group would be the exo-α-2,3-sialidase from Newcastle disease virus. The release of a monosaccharide after exposure of the oligosaccharide to an exoglycosidase is evidence that that monosaccharide was at the end of a branch and attached to it by a glycosidic linkage of the designated anomeric stereochemistry. The digestions are usually performed sequentially. After each of the monosaccharides at the ends of the branches has been catalogued, each of the shortened products of the first round of digestions is then submitted to a round of digestion to identify the penultimate monosaccharides on each branch and so on until the last sugar is released. The products of each round of digestion are monitored either chromatographically [529] or by mass spectrometry [530]. Several specific endoglycosidases, which cleave an oligosaccharide internally at particular
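The rule stated above for periodic acid (both hydroxyl groups of a vicinal diol must be free) can be illustrated with a minimal sketch. It assumes hexopyranose residues whose carbon 1 is already in glycosidic linkage, so that only the hydroxyls on carbons 2, 3, 4, and 6 need to be considered; the function names and the set-based representation are illustrative choices, not part of the original text, and reducing ends, furanoses, and the side chains of sialic acids are ignored.

```python
# Hydroxyl-bearing carbons of a hexopyranose residue whose C1 is in glycosidic
# linkage (assumption of this sketch): C5 carries the ring oxygen, not a hydroxyl.
HEXOPYRANOSE_HYDROXYLS = frozenset({2, 3, 4, 6})


def free_vicinal_diols(substituted, hydroxyls=HEXOPYRANOSE_HYDROXYLS):
    """Return pairs of adjacent carbons that both carry a free hydroxyl.

    substituted: carbons whose hydroxyls are occupied in glycosidic linkages.
    hydroxyls:   carbons that carry a hydroxyl at all (differs for amino sugars).
    """
    free = set(hydroxyls) - set(substituted)
    return [(c, c + 1) for c in sorted(free) if c + 1 in free]


def cleaved_by_periodate(substituted, hydroxyls=HEXOPYRANOSE_HYDROXYLS):
    """A residue is expected to disappear if it has at least one free vicinal diol."""
    return bool(free_vicinal_diols(substituted, hydroxyls))


# A hexose substituted at C4 keeps the free 2,3-diol and should be destroyed;
# one substituted at C3 has no two adjacent free hydroxyls and should survive.
print(cleaved_by_periodate({4}))   # True
print(cleaved_by_periodate({3}))   # False

# A 2-acetamido sugar has no hydroxyl on C2, so a different set is passed in.
print(cleaved_by_periodate({4}, hydroxyls={3, 4, 6}))  # False
```

Passing the hydroxyl-bearing carbons as a parameter keeps the same test usable for amino sugars, where carbon 2 carries an acetamido group rather than a hydroxyl.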
Oligosaccharides of Glycoproteins 135
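The logic of sequential digestion with exoglycosidases can be sketched as follows. This is only an illustration under stated assumptions: the oligosaccharide is modeled as a tree, each enzyme is reduced to a triple of sugar, anomeric configuration, and (optionally) the hydroxyl of the neighboring sugar, and the biantennary structure used here is hypothetical rather than taken from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Residue:
    sugar: str                 # e.g. "Gal"
    anomer: str = ""           # "a" or "b": configuration of the bond to the parent
    position: int = 0          # hydroxyl of the parent that carries this residue
    children: List["Residue"] = field(default_factory=list)


def _strip_once(node: Residue, sugar: str, anomer: str, position: Optional[int]) -> int:
    """Remove matching residues that are currently at the ends of branches."""
    released = 0
    kept = []
    for child in node.children:
        is_terminal = not child.children
        matches = (child.sugar == sugar and child.anomer == anomer
                   and (position is None or child.position == position))
        if is_terminal and matches:
            released += 1
        else:
            released += _strip_once(child, sugar, anomer, position)
            kept.append(child)
    node.children = kept
    return released


def digest(root: Residue, sugar: str, anomer: str, position: Optional[int] = None) -> int:
    """Digest to completion with one exoglycosidase; return the number of sugars released."""
    total = 0
    while True:
        n = _strip_once(root, sugar, anomer, position)
        if n == 0:
            return total
        total += n


def hypothetical_glycan() -> Residue:
    """Two identical branches, Neu5Ac a3 -> Gal b4 -> GlcNAc b2, on a mannose core."""
    def antenna() -> Residue:
        return Residue("GlcNAc", "b", 2, [Residue("Gal", "b", 4, [Residue("Neu5Ac", "a", 3)])])
    return Residue("Man", children=[antenna(), antenna()])


glycan = hypothetical_glycan()
print(digest(glycan, "Gal", "b"))        # 0: galactose is not yet at the end of a branch
print(digest(glycan, "Neu5Ac", "a", 3))  # 2: both terminal sialic acids are released
print(digest(glycan, "Gal", "b"))        # 2: galactose is now exposed
```

The absence of released galactose before treatment with the sialidase, and its appearance afterward, is the kind of evidence that places galactose in the penultimate position of each branch.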
same structure as the oligosaccharide isolated from the glycoprotein is available, the coincidence of the nuclear magnetic resonance spectrum of the standard and that of the unknown is proof of the structure of the unknown [534]. In the nuclear magnetic resonance spectrum of an oligosaccharide, the chemical shift for the resonance of each of the various hydrogens attached to the carbons of the monosaccharides is characteristic of the monosaccharide itself, the carbon it is attached to, and whether or

Figure 3–26: Acidic methanolysis of oligosaccharides. Protonation of the exocyclic acetal oxygen produces a leaving group, the departure of which gives the planar oxacarbenium cation. Addition of methanol to either face of the oxacarbenium cation produces a mixture of the α- and β-anomers of the methyl glycoside. [Structures not reproduced.]

Figure 3–27: Chromatographic analysis of the hydrolysate of a glycopeptide to quantify its composition of monosaccharides [527]. A sample (300 pmol) of a purified, homogeneous glycopeptide (with [...] ration, and the hydrolysate was submitted to chromatography (panel A) on a column (0.46 cm × 25 cm) of a medium for anion [...] the surface of the electrode that produces the current. Standards (25 pmol) were run (panel B) under the same conditions and separately identified as the following monosaccharides: 1, fucose; 2, galactosamine; 3, glucosamine; 4, galactose; 5, glucose; 6, mannose. From the areas of the peaks of the standards it could be calculated that the original hydrolysate contained 1.1 nmol of glucosamine, 0.79 nmol of galactose, and 0.75 nmol of mannose. The glucose observed was a contaminant. Reprinted with permission from ref 527. Copyright 1988 Academic Press, Inc. [Chromatograms not reproduced; axes: amperometric response versus time (min), 0 to 15 min.]
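The amounts reported in the caption of Figure 3–27 can be turned into a molar composition with a short calculation. The sketch below assumes that the entire 300-pmol sample of glycopeptide is represented in the hydrolysate (part of the caption is missing, so this cannot be confirmed) and that the detector responds linearly, so that a single standard of 25 pmol suffices for calibration; the function names are illustrative.

```python
GLYCOPEPTIDE_PMOL = 300.0  # amount of glycopeptide stated in the caption

# Amounts found in the hydrolysate, converted from nmol to pmol.
hydrolysate_pmol = {
    "glucosamine": 1100.0,
    "galactose": 790.0,
    "mannose": 750.0,
}


def amount_from_peak_area(sample_area, standard_area, standard_pmol=25.0):
    """Single-point external calibration: amount is proportional to peak area."""
    return sample_area / standard_area * standard_pmol


def residues_per_glycopeptide(amounts_pmol, glycopeptide_pmol):
    """Moles of each monosaccharide per mole of glycopeptide."""
    return {sugar: amount / glycopeptide_pmol for sugar, amount in amounts_pmol.items()}


composition = residues_per_glycopeptide(hydrolysate_pmol, GLYCOPEPTIDE_PMOL)
for sugar, ratio in composition.items():
    print(f"{sugar}: {ratio:.2f} mol per mol of glycopeptide")
```

Under these assumptions the ratios come to roughly 3.7 for glucosamine, 2.6 for galactose, and 2.5 for mannose; nonintegral values of this kind are what would be expected from incomplete recovery or from microheterogeneity.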
fragmentation is not sufficient to define the sequence in which they occur in the oligosaccharide. In the case of the linear, unbranched oligosaccharides of proteoglycans (Table 3–4), however, because the masses of glucuronic acid and iduronic acid differ from those of N-acetylgalactosamine and N-acetylglucosamine and because the various postsynthetic modifications change the masses of the monosaccharides, spectrometry provides significant information about sequence [538].

Suggested Reading

Baenziger, J.U., & Fiete, D. (1979) Structure of the complex oligosaccharides of fetuin, J. Biol. Chem. 254, 789–795.

van Kuik, J.A., de Waard, P., Vliegenthart, J.F.G., Klein, A., Carnoy, C., Lamblin, G., & Roussel, P. (1991) Isolation and structural characterization of novel neutral oligosaccharide-alditols from respiratory-mucus glycoproteins of a patient suffering from bronchiectasis 2. Structure of twelve hepta-to-nonasaccharides, six of which possess the GlcNAc β(1 → 3)[Gal β(1 → 4)GlcNAc β(1 → 6)]Gal β(1 → 3)GalNAc-ol common structural element, Eur. J. Biochem. 198, 169–182.

Problem 3–18: Complete the following reactions:

[Structure not reproduced: apparently a methyl pentofuranoside in Haworth projection, bearing HOH2C–, –OCH3, and hydroxyl groups on two adjacent ring carbons] + n HIO4 →

Problem 3–19: A polysaccharide, which is a polymer of glucose only, is treated in the following way:

polyglucose + NaBH4 → V
V + CH3I → W
H2O + W → X   (HCl, 60 °C)
X + NaBH4 → Y
Y + (CH3CO)2O → Z

Z is a mixture containing the following distribution of methylated glucitol acetates.

methylated glucitol acetate                  mole percent
1,4,5-triacetyl-2,3,6-trimethylglucitol      81.9
2,3-dimethyl-1,4,5,6-tetraacetylglucitol      9.0
1,5-diacetyl-2,3,4,6-tetramethylglucitol      8.8
4-acetyl-1,2,3,5,6-pentamethylglucitol        0.2

(A) On average, how many monosaccharides does each molecule of the polysaccharide contain, how many branch points are there, and how many nonreducing ends are there?

(B) Draw structures of the linkages in the main linear polymer and the structure of a branch point.

(C) If the polysaccharide were treated with periodic acid, what percentage of the glucose would be destroyed?
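One way to read a table of methylated glucitol acetates such as the one in Problem 3–19 is sketched below. It rests on assumptions that are not stated in the problem: every glucose is taken to be a pyranose, so that acetyl groups on carbons 1 and 5 mark a residue that was a closed ring during methylation, and any other acetyl group marks a hydroxyl that was in glycosidic linkage. With that assumption a 1,4,5-triacetyl derivative is read as a 4-linked pyranose, although a 5-linked furanose would give the same derivative. The function name and the data layout are illustrative.

```python
def classify(acetylated):
    """Interpret the acetylated positions of a methylated glucitol acetate.

    Returns (kind, linked_positions). Assumes pyranose rings throughout.
    """
    acetylated = set(acetylated)
    if {1, 5} <= acetylated:
        # C1 and C5 were tied up in the ring during methylation.
        linked = acetylated - {1, 5}
        if not linked:
            kind = "nonreducing end"
        elif len(linked) == 1:
            kind = "chain residue"
        else:
            kind = "branch point"
    else:
        # C1 and C5 were methylated, so the residue was already an open-chain
        # glucitol: it was the reducing end reduced in the first step.
        linked = acetylated
        kind = "reducing end"
    return kind, linked


# Acetylated positions and mole percent, copied from the table in Problem 3-19.
table = {
    (1, 4, 5): 81.9,
    (1, 4, 5, 6): 9.0,
    (1, 5): 8.8,
    (4,): 0.2,
}

percent = {}
for positions, mole_percent in table.items():
    kind, linked = classify(positions)
    percent[kind] = mole_percent
    print(kind, sorted(linked), mole_percent)

# One reducing end per molecule sets the scale for everything else.
per_molecule = {kind: p / percent["reducing end"] for kind, p in percent.items()}
total_residues = sum(percent.values()) / percent["reducing end"]
print(per_molecule, total_residues)
```

Because the scale is set by the least abundant derivative (0.2 mole percent), the numbers per molecule are estimates with a large relative uncertainty, which is why the count of nonreducing ends need not exceed the count of branch points by exactly one.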
24. Drapeau, G.R. (1980) J. Biol. Chem. 255, 839–840.
25. Choi, K.H., Laursen, R.A., & Allen, K.N. (1999) Biochemistry 38, 11624–11633.
26. Sela, M., White, F.H., & Anfinsen, C.B. (1959) Biochim. Biophys. Acta 31, 417–426.
27. Brattin, W.J., Jr., & Smith, E.L. (1971) J. Biol. Chem. 246, 2400–2418.
28. Nicholas, R.A. (1984) Biochemistry 23, 888–898.
29. Gross, E., & Witkop, B. (1962) J. Biol. Chem. 237, 1856–1860.
30. Jacobson, G.R., Schaffer, M.H., Stark, G.R., & Vanaman, T.C. (1973) J. Biol. Chem. 248, 6583–6591.
31. Witkop, B. (1961) Adv. Protein Chem. 16, 221–321.
32. Burstein, Y., & Patchornik, A. (1972) Biochemistry 11, 4641–4650.
33. Landon, M. (1977) Methods Enzymol. 47, 145–149.
34. Piszkiewicz, D., Landon, M., & Smith, E.L. (1970) Biochem. Biophys. Res. Commun. 40, 1173–1178.
35. Charbonneau, H., Tonks, N.K., Kumar, S., Diltz, C.D., Harrylock, M., Cool, D.E., Krebs, E.G., Fischer, E.H., & Walsh, K.A. (1989) Proc. Natl. Acad. Sci. U.S.A. 86, 5252–5256.
36. Hu, W., Van Driessche, G., Devreese, B., Goodhew, C.F., McGinnity, D.F., Saunders, N., Fulop, V., Pettigrew, G.W., & Van Beeumen, J.J. (1997) Biochemistry 36, 7958–7966.
37. Bornstein, P. (1970) Biochemistry 9, 2408–2421.
38. Titani, K., Koide, A., Hermann, J., Ericsson, L.H., Kumar, S., Wade, R.D., Walsh, K.A., Neurath, H., & Fischer, E.H. (1977) Proc. Natl. Acad. Sci. U.S.A. 74, 4762–4766.
39. Huang, I.Y., Welch, C.D., & Yoshida, A. (1980) J. Biol. Chem. 255, 6412–6420.
40. Koide, A., Titani, K., Ericsson, L.H., Kumar, S., Neurath, H., & Walsh, K.A. (1978) Biochemistry 17, 5657–5672.
41. Mahoney, W.C., & Hermodson, M.A. (1980) J. Biol. Chem. 255, 11199–11203.
42. Shoji, S., Parmelee, D.C., Wade, R.D., Kumar, S., Ericsson, L.H., Walsh, K.A., Neurath, H., Long, G.L., Demaille, J.G., Fischer, E.H., & Titani, K. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 848–851.
43. Bradshaw, R.A., Garner, W.H., & Gurd, F.R. (1969) J. Biol. Chem. 244, 2149–2158.
44. Petra, P.H. (1970) Methods Enzymol. 19, 460–503.
45. Folk, J.E. (1970) Methods Enzymol. 19, 504–508.
46. Hayashi, R. (1976) Methods Enzymol. 45, 568–587.
47. Himmelhoch, S.R. (1970) Methods Enzymol. 19, 508.
48. Harris, J.I., & Li, C.H. (1955) J. Biol. Chem. 213, 499–507.
49. Johnson, R.S., & Biemann, K. (1987) Biochemistry 26, 1209–1214.
50. Fenn, J.B., Mann, M., Meng, C.K., Wong, S.F., & Whitehouse, C.M. (1989) Science 246, 64–71.
51. Barber, M., Bordoli, R.S., Sedgwick, R.D., & Tyler, A.N. (1981) J. Chem. Soc., Chem. Commun., 325–327.
52. Surman, D.J., & Vickerman, J.C. (1981) J. Chem. Soc., Chem. Commun., 324–325.
53. Karas, M., & Hillenkamp, F. (1988) Anal. Chem. 60, 2299–2301.
54. Shevchenko, A., Wilm, M., Vorm, O., & Mann, M. (1996) Anal. Chem. 68, 850–858.
55. Hirs, C.H.W., Moore, S., & Stein, W.H. (1960) J. Biol. Chem. 235, 633–647.
56. Canfield, R.E. (1963) J. Biol. Chem. 238, 2698–2707.
57. Hartley, B.S. (1964) Nature 201, 1284–1291.
58. Edmundson, A.B. (1965) Nature 205, 883–887.
59. Edelman, G.M., Cunningham, B.A., Gall, W.E., Gottlieb, P.D., Rutishauser, U., & Waxdal, M.J. (1969) Proc. Natl. Acad. Sci. U.S.A. 63, 78–85.
60. Titani, K., Koide, A., Ericsson, L.H., Kumar, S., Hermann, J., Wade, R.D., Walsh, K.A., Neurath, H., & Fischer, E.H. (1978) Biochemistry 17, 5680–5693.
61. Fowler, A.V., & Zabin, I. (1978) J. Biol. Chem. 253, 5521–5525.
62. Watt, K.W., Cottrell, B.A., Strong, D.D., & Doolittle, R.F. (1979) Biochemistry 18, 5410–5416.
63. Roberts, R.J. (1983) Nucleic Acids Res. 11, r135–r167.
64. Chaiyen, P., Ballou, D.P., & Massey, V. (1997) Proc. Natl. Acad. Sci. U.S.A. 94, 7233–7238.
65. Burke, C.C., Wildung, M.R., & Croteau, R. (1999) Proc. Natl. Acad. Sci. U.S.A. 96, 13062–13067.
66. Vehar, G.A., Keyt, B., Eaton, D., Rodriguez, H., O’Brien, D.P., Rotblat, F., Oppermann, H., Keck, R., Wood, W.I., Harkins, R.N., Tuddenham, E.G.D., Lawn, R.M., & Capon, D.J. (1984) Nature 312, 337–342.
67. Wood, W.I., Capon, D.J., Simonsen, C.C., Eaton, D.L., Gitschier, J., Keyt, B., Seeburg, P.H., Smith, D.H., Hollingshead, P., Wion, K.L., Delwart, E., Tuddenham, E.G.D., Vehar, G., & Lawn, R.M. (1984) Nature 312, 330–337.
68. Mullis, K., Faloona, F., Scharf, S., Saiki, R., Horn, G., & Erlich, H. (1986) Cold Spring Harbor Symp. Quant. Biol. 51 (Pt 1), 263–273.
69. Saiki, R.K., Gelfand, D.H., Stoffel, S., Scharf, S.J., Higuchi, R., Horn, G.T., Mullis, K.B., & Erlich, H.A. (1988) Science 239, 487–491.
70. Lundberg, K.S., Shoemaker, D.D., Adams, M.W., Short, J.M., Sorge, J.A., & Mathur, E.J. (1991) Gene 108, 1–6.
71. Ertl, H., Hallmann, A., Wenzl, S., & Sumper, M. (1992) EMBO J. 11, 2055–2062.
72. Boulter, J., Luyten, W., Evans, K., Mason, P., Ballivet, M., Goldman, D., Stengelin, S., Martin, G., Heinemann, S., & Patrick, J. (1985) J. Neurosci. 5, 2545–2552.
73. Kumagai, I., Pieler, T., Subramanian, A.R., & Erdmann, V.A. (1982) J. Biol. Chem. 257, 12924–12928.
74. Olivera, B.M., Baine, P., & Davidson, N. (1964) Biopolymers 2, 245–257.
75. Fisher, M.P., & Dingman, C.W. (1971) Biochemistry 10, 1895–1899.
76. Richards, E.G., & Lecanidou, R. (1971) Anal. Biochem. 40, 43–71.
77. Maxam, A.M., & Gilbert, W. (1977) Proc. Natl. Acad. Sci. U.S.A. 74, 560–564.
78. Tamm, C., Hodes, M.E., & Chargaff, E. (1952) J. Biol. Chem. 195, 49–63.
79. Chargaff, E., Rust, P., Temperli, A., Morisawa, S., & Danon, A. (1963) Biochim. Biophys. Acta 76, 149–151.
80. Lawley, P.D., & Brooks, P. (1963) Biochem. J. 89, 127–128.
81. Maxam, A.M., & Gilbert, W. (1980) Methods Enzymol. 65, 499–560.
82. Temperli, A., Turler, H., Rust, P., Danon, A., & Chargaff, E. (1964) Biochim. Biophys. Acta 91, 462–476.
83. Hayes, D.H., & Hayes-Baron, F. (1967) J. Chem. Soc., C, 1528–1533.
84. Tamm, C., Shapiro, H.S., Lipshitz, R., & Chargaff, E. (1953) J. Biol. Chem. 203, 673–688.
85. Sanger, F., Nicklen, S., & Coulson, A.R. (1977) Proc. Natl. Acad. Sci. U.S.A. 74, 5463–5467.
86. Messing, J., Crea, R., & Seeburg, P.H. (1981) Nucleic Acids Res. 9, 309–321.
87. Saladino, R., Mincione, E., Crestini, C., Negri, R., Di Mauro, E., & Costanzo, G. (1996) J. Am. Chem. Soc. 118, 5615–5619.
88. Smith, L.M., Sanders, J.Z., Kaiser, R.J., Hughes, P., Dodd, C., Connell, C.R., Heiner, C., Kent, S.B., & Hood, L.E. (1986) Nature 321, 674–679.
89. Prober, J.M., Trainor, G.L., Dam, R.J., Hobbs, F.W., Robertson, C.W., Zagursky, R.J., Cocuzza, A.J., Jensen, M.A., & Baumeister, K. (1987) Science 238, 336–341.
90. Rosenblum, B.B., Lee, L.G., Spurgeon, S.L., Khan, S.H., Menchen, S.M., Heiner, C.R., & Chen, S.M. (1997) Nucleic Acids Res. 25, 4500–4504.
91. Tabor, S., & Richardson, C.C. (1987) Proc. Natl. Acad. Sci. U.S.A. 84, 4767–4771.
92. Tabor, S., & Richardson, C.C. (1995) Proc. Natl. Acad. Sci. U.S.A. 92, 6339–6343.
93. Chan, W.Y., Liu, Q.R., Borjigin, J., Busch, H., Rennert, O.M., Tease, L.A., & Chan, P.K. (1989) Biochemistry 28, 1033–1039.
94. Eggink, G., Engel, H., Vriend, G., Terpstra, P., & Witholt, B. (1990) J. Mol. Biol. 212, 135–142.
95. Wei, Y., Contreras, J.A., Sheffield, P., Osterlund, T., Derewenda, U., Kneusel, R.E., Matern, U., Holm, C., & Derewenda, Z.S. (1999) Nat. Struct. Biol. 6, 340–345.
96. Xu, D., Ballou, D.P., & Massey, V. (2001) Biochemistry 40, 12369–12378.
97. Hasson, M.S., Muscate, A., McLeish, M.J., Polovnikova, L.S., Gerlt, J.A., Kenyon, G.L., Petsko, G.A., & Ringe, D. (1998) Biochemistry 37, 9918–9930.
98. Andersson, I. (1996) J. Mol. Biol. 259, 160–174.
99. Keller, B., Sauer, N., & Lamb, C.J. (1988) EMBO J. 7, 3625–3633.
100. Nardelli, D., Gerber-Huber, S., Van Het Schip, F.D., Gruber, M., Ab, G., & Wahli, W. (1987) Biochemistry 26, 6397–6402.
101. Xu, M., & Lewis, R.V. (1990) Proc. Natl. Acad. Sci. U.S.A. 87, 7120–7124.
102. Ann, D.K., Smith, M.K., & Carlson, D.M. (1988) J. Biol. Chem. 263, 10887–10893.
103. Koide, T., Foster, D., Yoshitake, S., & Davie, E.W. (1986) Biochemistry 25, 2220–2225.
104. Celniker, S.E., Keelan, D.J., & Lewis, E.B. (1989) Genes Dev. 3, 1424–1436.
105. La Spada, A.R., Wilson, E.M., Lubahn, D.B., Harding, A.E., & Fischbeck, K.H. (1991) Nature 352, 77–79.
106. Perutz, M.F. (1999) Trends Biochem. Sci. 24, 58–63.
107. Perutz, M.F., Johnson, T., Suzuki, M., & Finch, J.T. (1994) Proc. Natl. Acad. Sci. U.S.A. 91, 5355–5358.
108. The Huntington’s Disease Collaborative Research Group (1993) Cell 72, 971–983.
109. Gomez, J., Sanchez-Martinez, D., Stiefel, V., Rigau, J., Puigdomenech, P., & Pages, M. (1988) Nature 334, 262–264.
110. Haydock, P.V., & Dale, B.A. (1990) DNA Cell Biol. 9, 251–261.
111. Resing, K.A., Johnson, R.S., & Walsh, K.A. (1993) Biochemistry 32, 10036–10045.
112. Morgan, D.O., Edman, J.C., Standring, D.N., Fried, V.A., Smith, M.C., Roth, R.A., & Rutter, W.J. (1987) Nature 329, 301–307.
113. Warren, J.C., Murdock, G.L., Ma, Y., Goodman, S.R., & Zimmer, W.E. (1993) Biochemistry 32, 1401–1406.
114. Kennedy, M.C., Mende-Mueller, L., Blondin, G.A., & Beinert, H. (1992) Proc. Natl. Acad. Sci. U.S.A. 89, 11730–11734.
115. Seely, O., Jr., Feng, D.F., Smith, D.W., Sulzbach, D., & Doolittle, R.F. (1990) Genomics 8, 71–82.
116. Ferreira, G.C., & Dailey, H.A. (1993) J. Biol. Chem. 268, 584–590.
117. Kervinen, J., Dunbrack, R.L., Jr., Litwin, S., Martins, J., Scarrow, R.C., Volin, M., Yeung, A.T., Yoon, E., & Jaffe, E.K. (2000) Biochemistry 39, 9018–9029.
118. Zeghouf, M., Fontecave, M., Macherel, D., & Coves, J. (1998) Biochemistry 37, 6114–6123.
119. Mathis, J.R., Back, K., Starks, C., Noel, J., Poulter, C.D., & Chappell, J. (1997) Biochemistry 36, 8340–8348.
120. Hallis, T.M., Lei, Y., Que, N.L., & Liu, H. (1998) Biochemistry 37, 4935–4945.
121. Peters, R.J., Flory, J.E., Jetter, R., Ravn, M.M., Lee, H.J., Coates, R.M., & Croteau, R.B. (2000) Biochemistry 39, 15592–15602.
122. Stewart, J., Wilson, D.B., & Ganem, B. (1990) J. Am. Chem. Soc. 112, 4582–4584.
123. Butt, T.R., Jonnalagadda, S., Monia, B.P., Sternberg, E.J., Marsh, J.A., Stadel, J.M., Ecker, D.J., & Crooke, S.T. (1989) Proc. Natl. Acad. Sci. U.S.A. 86, 2540–2544.
124. van der Linden, M.P., Mottl, H., & Keck, W. (1992) Eur. J. Biochem. 204, 197–202.
125. Hitzeman, R.A., Leung, D.W., Perry, L.J., Kohr, W.J., Levine, H.L., & Goeddel, D.V. (1983) Science 219, 620–625.
126. Hinnen, A., Hicks, J.B., & Fink, G.R. (1978) Proc. Natl. Acad. Sci. U.S.A. 75, 1929–1933.
127. Luckow, V.A., & Summers, M.D. (1989) Virology 170, 31–39.
128. Medin, J.A., Hunt, L., Gathy, K., Evans, R.K., & Coleman, M.S. (1990) Proc. Natl. Acad. Sci. U.S.A. 87, 2760–2764.
129. Luckow, V.A. (1991) in Recombinant DNA Technology and Applications (Prokop, A., Bajpai, K., & Ho, C., Eds.) pp 97–152, McGraw-Hill, New York.
130. Smith, M.C., Furman, T.C., Ingolia, T.D., & Pidgeon, C. (1988) J. Biol. Chem. 263, 7211–7215.
131. Chattopadhyay, D., Evans, D.B., Deibel, M.R., Jr., Vosters, A.F., Eckenrode, F.M., Einspahr, H.M., Hui, J.O., Tomasselli, A.G., Zurcher-Neely, H.A., Heinrikson, R.L., et al. (1992) J. Biol. Chem. 267, 14227–14232.
132. Hutchison, C.A., III, Phillips, S., Edgell, M.H., Gillam, S., Jahnke, P., & Smith, M. (1978) J. Biol. Chem. 253, 6551–6560.
133. Zoller, M.J., & Smith, M. (1982) Nucleic Acids Res. 10, 6487–6500.
134. Alber, T., Sun, D.P., Wilson, K., Wozniak, J.A., Cook, S.P., & Matthews, B.W. (1987) Nature 330, 41–46.
135. Sanger, F., Coulson, A.R., Barrell, B.G., Smith, A.J., & Roe, B.A. (1980) J. Mol. Biol. 143, 161–178.
136. Wilkinson, A.J., Fersht, A.R., Blow, D.M., & Winter, G. (1983) Biochemistry 22, 3581–3586.
References 141
137. Sugimoto, M., Esaki, N., Tanaka, H., & Soda, K. (1989) Anal. Biochem. 179, 309–311.
138. Kunkel, T.A., Roberts, J.D., & Zakour, R.A. (1987) Methods Enzymol. 154, 367–382.
139. Taylor, J.W., Ott, J., & Eckstein, F. (1985) Nucleic Acids Res. 13, 8765–8785.
140. Vandeyar, M.A., Weiner, M.P., Hutton, C.J., & Batt, C.A. (1988) Gene 65, 129–133.
141. Weiner, M.P., Costa, G.L., Schoettlin, W., Cline, J., Mathur, E., & Bauer, J.C. (1994) Gene 151, 119–123.
142. Deng, W.P., & Nickoloff, J.A. (1992) Anal. Biochem. 200, 81–88.
143. Ho, S.N., Hunt, H.D., Horton, R.M., Pullen, J.K., & Pease, L.R. (1989) Gene 77, 51–59.
144. Landt, O., Grunert, H.P., & Hahn, U. (1990) Gene 96, 125–128.
145. Jones, D.H., & Winistorfer, S.C. (1992) BioTechniques 12, 528–530, 532, 534–535.
146. Reidhaar-Olson, J.F., & Sauer, R.T. (1988) Science 241, 53–57.
147. Climie, S., & Santi, D.V. (1990) Proc. Natl. Acad. Sci. U.S.A. 87, 633–637.
148. Noren, C.J., Anthony-Cahill, S.J., Griffith, M.C., & Schultz, P.G. (1989) Science 244, 182–188.
149. Bain, J.D., Diala, E.S., Glabe, C.G., Dix, T.A., & Chamberlin, A.R. (1989) J. Am. Chem. Soc. 111, 8013–8014.
150. Judice, J.K., Gamble, T.R., Murphy, E.C., de Vos, A.M., & Schultz, P.G. (1993) Science 261, 1578–1581.
151. Endo, Y., & Wool, I.G. (1982) J. Biol. Chem. 257, 9054–9060.
152. Zinoni, F., Birkmann, A., Stadtman, T.C., & Böck, A. (1986) Proc. Natl. Acad. Sci. U.S.A. 83, 4650–4654.
153. Ermler, U., Grabarse, W., Shima, S., Goubeaud, M., & Thauer, R.K. (1997) Science 278, 1457–1462.
154. Grabarse, W., Mahlert, F., Shima, S., Thauer, R.K., & Ermler, U. (2000) J. Mol. Biol. 303, 329–344.
155. Neurath, H. (1984) Science 224, 350–357.
156. Thomas, L., Leduc, R., Thorne, B.A., Smeekens, S.P., Steiner, D.F., & Thomas, G. (1991) Proc. Natl. Acad. Sci. U.S.A. 88, 5297–5301.
157. Lazure, C., Seidah, N.G., Pelaprat, D., & Chretien, M. (1983) Can. J. Biochem. Cell Biol. 61, 501–515.
158. Nakanishi, S., Inoue, A., Kita, T., Nakamura, M., Chang, A.C., Cohen, S.N., & Numa, S. (1979) Nature 278, 423–427.
159. Blobel, G., & Dobberstein, B. (1975) J. Cell Biol. 67, 852–862.
160. Yorgey, P., Lee, J., Kordel, J., Vivas, E., Warner, P., Jebaratnam, D., & Kolter, R. (1994) Proc. Natl. Acad. Sci. U.S.A. 91, 4519–4523.
161. Kelleher, N.L., Hendrickson, C.L., & Walsh, C.T. (1999) Biochemistry 38, 15623–15630.
162. Ekstrom, J.L., Tolbert, W.D., Xiong, H., Pegg, A.E., & Ealick, S.E. (2001) Biochemistry 40, 9495–9504.
163. Recsei, P.A., & Snell, E.E. (1973) Biochemistry 12, 365–371.
164. van Poelje, P.D., & Snell, E.E. (1990) Biochemistry 29, 132–139.
165. Albert, A., Dhanaraj, V., Genschel, U., Khan, G., Ramjee, M.K., Pulido, R., Sibanda, B.L., von Delft, F., Witty, M., Blundell, T.L., Smith, A.G., & Abell, C. (1998) Nat. Struct. Biol. 5, 289–293.
166. Recsei, P.A., Huynh, Q.K., & Snell, E.E. (1983) Proc. Natl. Acad. Sci. U.S.A. 80, 973–977.
167. Kapke, G., & Davis, L. (1975) Biochemistry 14, 4273–4276.
168. Guan, C., Cui, T., Rao, V., Liao, W., Benner, J., Lin, C.L., & Comb, D. (1996) J. Biol. Chem. 271, 1732–1737.
169. Fisher, K.J., Tollersrud, O.K., & Aronson, N.N., Jr. (1990) FEBS Lett. 269, 440–444.
170. Porter, J.A., Young, K.E., & Beachy, P.A. (1996) Science 274, 255–259.
171. Swallow, D.L., & Abraham, E.P. (1958) Biochem. J. 70, 364–373.
172. Geiger, T., & Clarke, S. (1987) J. Biol. Chem. 262, 785–794.
173. Haley, E.E., Corcoran, B.J., Dorer, F.E., & Buchanan, D.L. (1966) Biochemistry 5, 3229–3235.
174. Noguchi, S., Miyawaki, K., & Satow, Y. (1998) J. Mol. Biol. 278, 231–238.
175. Esposito, L., Vitagliano, L., Sica, F., Sorrentino, G., Zagari, A., & Mazzarella, L. (2000) J. Mol. Biol. 297, 713–732.
176. McIntire, W.E., Schey, K.L., Knapp, D.R., & Hildebrandt, J.D. (1998) Biochemistry 37, 14651–14658.
177. Artigues, A., Birkett, A., & Schirch, V. (1990) J. Biol. Chem. 265, 4853–4858.
178. McFadden, P.N., & Clarke, S. (1982) Proc. Natl. Acad. Sci. U.S.A. 79, 2460–2464.
179. Aswad, D.W. (1984) J. Biol. Chem. 259, 10714–10721.
180. Lowenson, J.D., & Clarke, S. (1992) J. Biol. Chem. 267, 5985–5995.
181. Johnson, B.A., Murray, E.D., Jr., Clarke, S., Glass, D.B., & Aswad, D.W. (1987) J. Biol. Chem. 262, 5622–5629.
182. Najbauer, J., Orpiszewski, J., & Aswad, D.W. (1996) Biochemistry 35, 5183–5190.
183. Hirata, R., Ohsumi, Y., Nakano, A., Kawasaki, H., Suzuki, K., & Anraku, Y. (1990) J. Biol. Chem. 265, 6726–6733.
184. Kane, P.M., Yamashiro, C.T., Wolczyk, D.F., Neff, N., Goebl, M., & Stevens, T.H. (1990) Science 250, 651–657.
185. Davis, E.O., Sedgwick, S.G., & Colston, M.J. (1991) J. Bacteriol. 173, 5653–5662.
186. Perler, F.B., Comb, D.G., Jack, W.E., Moran, L.S., Qiang, B., Kucera, R.B., Benner, J., Slatko, B.E., Nwankwo, D.O., Hempstead, S.K., et al. (1992) Proc. Natl. Acad. Sci. U.S.A. 89, 5577–5581.
187. Martin, D.D., Xu, M.Q., & Evans, T.C., Jr. (2001) Biochemistry 40, 1393–1402.
188. Klabunde, T., Sharma, S., Telenti, A., Jacobs, W.R., Jr., & Sacchettini, J.C. (1998) Nat. Struct. Biol. 5, 31–36.
189. Xu, M.Q., Comb, D.G., Paulus, H., Noren, C.J., Shao, Y., & Perler, F.B. (1994) EMBO J. 13, 5517–5522.
190. Carrington, D.M., Auffret, A., & Hanke, D.E. (1985) Nature 313, 64–67.
191. Xu, M.Q., & Perler, F.B. (1996) EMBO J. 15, 5146–5153.
192. Shao, Y., Xu, M.Q., & Paulus, H. (1996) Biochemistry 35, 3810–3815.
193. Chong, S., Shao, Y., Paulus, H., Benner, J., Perler, F.B., & Xu, M.Q. (1996) J. Biol. Chem. 271, 22159–22168.
194. Xu, M.Q., Southworth, M.W., Mersha, F.B., Hornstra, L.J., & Perler, F.B. (1993) Cell 75, 1371–1377.
195. Duan, X., Gimble, F.S., & Quiocho, F.A. (1997) Cell 89, 555–564.
196. Ichiyanagi, K., Ishino, Y., Ariyoshi, M., Komori, K., & Morikawa, K. (2000) J. Mol. Biol. 300, 889–901.
197. Cunningham, B.A., Wang, J.L., Waxdal, M.J., & Edelman, G.M. (1975) J. Biol. Chem. 250, 1503–1512.
198. Chrispeels, M.J., Hartl, P.M., Sturm, A., & Faye, L. (1986) J. Biol. Chem. 261, 10021–10024.
199. Chang, C.N., Schwartz, M., & Chang, F.N. (1976) Biochem. Biophys. Res. Commun. 73, 233–239.
200. Stock, A., Clarke, S., Clarke, C., & Stock, J. (1987) FEBS Lett. 220, 8–14.
201. Rose, K., Simona, M.G., Savoy, L.A., Regamey, P.O., Green, B.N., Clore, G.M., Gronenborn, A.M., & Wingfield, P.T. (1992) J. Biol. Chem. 267, 19101–19106.
202. Milligan, D.L., & Koshland, D.E., Jr. (1990) J. Biol. Chem. 265, 4455–4460.
203. Persson, B., Flinta, C., von Heijne, G., & Jornvall, H. (1985) Eur. J. Biochem. 152, 523–527.
204. Lin, T.S., & Kolattukudy, P.E. (1980) Eur. J. Biochem. 106, 341–351.
205. Doolittle, R.F. (1972) Methods Enzymol. 25, 231–244.
206. Farries, T.C., Harris, A., Auffret, A.D., & Aitken, A. (1991) Eur. J. Biochem. 196, 679–685.
207. Hantke, K., & Braun, V. (1973) Eur. J. Biochem. 34, 284–296.
208. Prutsch, A., Lohaus, C., Green, B., Meyer, H.E., & Lubben, M. (2000) Biochemistry 39, 6554–6563.
209. Towler, D.A., Eubanks, S.R., Towery, D.S., Adams, S.P., & Glaser, L. (1987) J. Biol. Chem. 262, 1030–1036.
210. Carr, S.A., Biemann, K., Shoji, S., Parmelee, D.C., & Titani, K. (1982) Proc. Natl. Acad. Sci. U.S.A. 79, 6128–6131.
211. Dizhoor, A.M., Ericsson, L.H., Johnson, R.S., Kumar, S., Olshevskaya, E., Zozulya, S., Neubert, T.A., Stryer, L., Hurley, J.B., & Walsh, K.A. (1992) J. Biol. Chem. 267, 16033–16036.
212. Paturle-Lafanechere, L., Edde, B., Denoulet, P., Van Dorsselaer, A., Mazarguil, H., Le Caer, J.P., Wehland, J., & Job, D. (1991) Biochemistry 30, 10523–10528.
213. Xie, H., & Clarke, S. (1993) J. Biol. Chem. 268, 13364–13371.
214. Eipper, B.A., Perkins, S.N., Husten, E.J., Johnson, R.C., Keutmann, H.T., & Mains, R.E. (1991) J. Biol. Chem. 266, 7827–7833.
215. Stimmel, J.B., Deschenes, R.J., Volker, C., Stock, J., & Clarke, S. (1990) Biochemistry 29, 9651–9659.
216. Vorburger, K., Kitten, G.T., & Nigg, E.A. (1989) EMBO J. 8, 4007–4013.
217. Anderegg, R.J., Betz, R., Carr, S.A., Crabb, J.W., & Duntze, W. (1988) J. Biol. Chem. 263, 18236–18240.
218. Yamane, H.K., Farnsworth, C.C., Xie, H.Y., Howald, W., Fung, B.K., Clarke, S., Gelb, M.H., & Glomset, J.A. (1990) Proc. Natl. Acad. Sci. U.S.A. 87, 5868–5872.
219. Rilling, H.C., Breunger, E., Epstein, W.W., & Crain, P.F. (1990) Science 247, 318–320.
220. Ishibashi, Y., Sakagami, Y., Isogai, A., & Suzuki, A. (1984) Biochemistry 23, 1399–1404.
221. Hancock, J.F., Cadwallader, K., & Marshall, C.J. (1991) EMBO J. 10, 641–646.
222. Silvius, J.R., & l’Heureux, F. (1994) Biochemistry 33, 3014–3022.
223. Farnsworth, C.C., Kawata, M., Yoshida, Y., Takai, Y., Gelb, M.H., & Glomset, J.A. (1991) Proc. Natl. Acad. Sci. U.S.A. 88, 6196–6200.
224. Khosravi-Far, R., Lutz, R.J., Cox, A.D., Conroy, L., Bourne, J.R., Sinensky, M., Balch, W.E., Buss, J.E., & Der, C.J. (1991) Proc. Natl. Acad. Sci. U.S.A. 88, 6264–6268.
225. Giner, J.L., & Rando, R.R. (1994) Biochemistry 33, 15116–15123.
226. Ferguson, M.A., Low, M.G., & Cross, G.A. (1985) J. Biol. Chem. 260, 14547–14555.
227. Tse, A.G., Barclay, A.N., Watts, A., & Williams, A.F. (1985) Science 230, 1003–1008.
228. Ferguson, M.A., Homans, S.W., Dwek, R.A., & Rademacher, T.W. (1988) Science 239, 753–759.
229. Homans, S.W., Ferguson, M.A., Dwek, R.A., Rademacher, T.W., Anand, R., & Williams, A.F. (1988) Nature 333, 269–272.
230. Oxley, D., & Bacic, A. (1999) Proc. Natl. Acad. Sci. U.S.A. 96, 14246–14251.
231. Fankhauser, C., Homans, S.W., Thomas-Oates, J.E., McConville, M.J., Desponds, C., Conzelmann, A., & Ferguson, M.A. (1993) J. Biol. Chem. 268, 26365–26374.
232. Field, M.S., & Menon, A.K. (1993) in Lipid Modifications of Proteins (Schlesinger, M. J., Ed.) pp 83–134, CRC Press, Boca Raton, FL.
233. Ferguson, M.A., & Williams, A.F. (1988) Annu. Rev. Biochem. 57, 285–320.
234. Gerold, P., Striepen, B., Reitter, B., Geyer, H., Geyer, R., Reinwald, E., Risse, H.J., & Schwarz, R.T. (1996) J. Mol. Biol. 261, 181–194.
235. Guther, M.L., de Almeida, M.L., Yoshida, N., & Ferguson, M.A. (1992) J. Biol. Chem. 267, 6820–6828.
236. Uy, R., & Wold, F. (1977) Science 198, 890–896.
237. Lipmann, F. (1933) Biochem. Z. 262, 3–8.
238. Taborsky, G. (1974) Adv. Protein Chem. 28, 1–210.
239. Hunter, T. (1987) Cell 50, 823–829.
240. deVerdier, C. (1952) Nature 170, 804–805.
241. Eckhart, W., Hutchinson, M.A., & Hunter, T. (1979) Cell 18, 925–933.
242. Hunter, T., & Cooper, J.A. (1985) Annu. Rev. Biochem. 54, 897–930.
243. Chen, C.C., Bruegger, B.B., Kern, C.W., Lin, Y.C., Halpern, R.M., & Smith, R.A. (1977) Biochemistry 16, 4852–4855.
244. DeLuca, M., Ebner, K.E., Hultquist, D.E., Kreil, G., Peter, J.B., Moyer, R.W., & Boyer, P.D. (1963) Biochem. Z. 338, 512–525.
245. Smith, L.S., Kern, C.W., Halpern, R.M., & Smith, R.A. (1976) Biochem. Biophys. Res. Commun. 71, 459–465.
246. Pigiet, V., & Conley, R.R. (1978) J. Biol. Chem. 253, 1910–1920.
247. Degani, C., & Boyer, P.D. (1973) J. Biol. Chem. 248, 8222–8226.
248. Lewis, R.J., Brannigan, J.A., Muchova, K., Barak, I., & Wilkinson, A.J. (1999) J. Mol. Biol. 294, 9–15.
249. Sanders, D.A., Gillece-Castro, B.L., Stock, A.M., Burlingame, A.L., & Koshland, D.E., Jr. (1989) J. Biol. Chem. 264, 21770–21778.
250. Cohen-Solal, L., Cohen-Solal, M., & Glimcher, M.J. (1979) Proc. Natl. Acad. Sci. U.S.A. 76, 4327–4330.
251. Huttner, W.B. (1982) Nature 299, 273–276.
252. Nelsestuen, G.L., Zytkovicz, T.H., & Howard, J.B. (1974) J. Biol. Chem. 249, 6347–6350.
References 143
253. Stenflo, J., Ferlund, P., Egan, W., & Roepstorff, P. (1974) 285. Van Der Werf, P., & Koshland, D.E., Jr. (1977) J. Biol.
Proc. Natl. Acad. Sci. U.S.A. 71, 2730–2733. Chem. 252, 2793–2795.
254. McTigue, J.J., Dhaon, M.K., Rich, D.H., & Suttie, J.W. 286. Lowenson, J.D., & Clarke, S. (1990) J. Biol. Chem. 265,
(1984) J. Biol. Chem. 259, 4272–4278. 3106–3110.
255. Welinder, B.S. (1972) Biochim. Biophys. Acta 279, 287. Farooqui, J.Z., Tuck, M., & Paik, W.K. (1985) J. Biol.
491–497. Chem. 260, 537–545.
256. Henze, M. (1907) Hoppe-Seyler’s Z. Physiol. Chem. 51, 64. 288. Swanson, R.V., & Glazer, A.N. (1990) J. Mol. Biol. 214,
257. Wolff, J., & Covelli, I. (1969) Eur. J. Biochem. 9, 371–377. 787–796.
258. Roche, J. (1952) Experientia 8, 45–84. 289. Klotz, A.V., & Glazer, A.N. (1987) J. Biol. Chem. 262,
259. Ackermann, D., & Müller, E. (1941) Hoppe-Seyler’s Z. 17350–17355.
Physiol. Chem. 269, 146–157. 290. Lhoest, J., & Colson, C. (1977) Mol. Gen. Genet. 154,
260. Hunt, S., & Breuer, S.W. (1971) Biochim. Biophys. Acta 175–180.
252, 401–404. 291. Selmer, T., Kahnt, J., Goubeaud, M., Shima, S.,
261. Kendall, E.C. (1919) J. Biol. Chem. 39, 125–147. Grabarse, W., Ermler, U., & Thauer, R.K. (2000) J. Biol.
262. Harington, C.R. (1944) Proc. R. Soc. London, B 132, Chem. 275, 3755–3760.
223–238. 292. Park, M.H., Wolff, E.C., & Folk, J.E. (1993) Biofactors 4,
263. McQuillan, M.T., & Trikojus, V.M. (1972) in 95–104.
Glycoproteins: Their Composition, Structure, and 293. Shiba, T., Mizote, H., Kaneko, T., Nakajima, T., &
Function (Gottschalk, A., Ed.) pp 926–963, Elsevier, Kakimoto, Y. (1971) Biochim. Biophys. Acta 244,
Amsterdam. 523–531.
264. Gross, J., & Pitt-Rivers, R. (1953) Biochem. J. 53, 645–650. 294. Wolff, E.C., Park, M.H., & Folk, J.E. (1990) J. Biol. Chem.
265. Roche, J., Michel, R., & Tata, J. (1953) Biochim. Biophys. 265, 4793–4799.
Acta 11, 543–547. 295. Kamiya, Y., Sakurai, A., Tamura, S., Takahashi, N.,
266. Mercken, L., Simons, M.J., Swillens, S., Massaer, M., & Tsuchiya, E., Abe, K., & Fukui, S. (1979) Agric. Biol.
Vassart, G. (1985) Nature 316, 647–651. Chem. 43, 363–369.
267. Chen, J.Y., & Bodley, J.W. (1988) J. Biol. Chem. 263, 296. Farnsworth, C.C., Gelb, M.H., & Glomset, J.A. (1990)
11692–11696. Science 247, 320–322.
268. Evans, D.A., & Lundy, K.M. (1992) J. Am. Chem. Soc. 114, 297. Gershey, E.L., Vidali, G., & Allfrey, V.G. (1968) J. Biol.
1495–1496. Chem. 243, 5018–5022.
269. Van Ness, B.G., Howard, J.B., & Bodley, J.W. (1980) J. 298. DeLange, R.J., Smith, E.L., Fambrough, D.M., & Bonner,
Biol. Chem. 255, 10710–10716. J. (1968) Proc. Natl. Acad. Sci. U.S.A. 61, 1145–1146.
270. Hofsteenge, J., Muller, D.R., de Beer, T., Loffler, A., 299. Stoffel, W., Hillen, H., Schreoder, W., & Deutzmann, R.
Richter, W.J., & Vliegenthart, J.F. (1994) Biochemistry (1983) Hoppe-Seyler’s Z. Physiol. Chem. 364, 1455–1466.
33, 13524–13530. 300. Jing, S., & Trowbridge, I.S. (1987) EMBO J. 3, 2581–2585.
271. Loffler, A., Doucey, M.A., Jansson, A.M., Muller, D.R., 301. Schmidt, M.F. (1989) Biochim. Biophys. Acta 988,
de Beer, T., Hess, D., Meldal, M., Richter, W.J., 411–426.
Vliegenthart, J.F., & Hofsteenge, J. (1996) Biochemistry 302. Schmidt, M., Schmidt, M.F., & Rott, R. (1988) J. Biol.
35, 12005–12014. Chem. 263, 18635–18639.
272. Yoshino, K., Takao, T., Suhara, M., Kitai, T., Hori, H., 303. Bach, R., Konigsberg, W.H., & Nemerson, Y. (1988)
Nomura, K., Yamaguchi, M., Shimonishi, Y., & Suzuki, Biochemistry 27, 4227–4231.
N. (1991) Biochemistry 30, 6203–6209. 304. Redeker, V., Rossier, J., & Frankfurter, A. (1998)
273. Ambler, R.P., & Rees, M.W. (1959) Nature 184, 56–57. Biochemistry 37, 14838–14844.
274. Paik, W.K., & Kim, S. (1971) Science 174, 114–119. 305. Redeker, V., Levilliers, N., Schmitter, J.M., Le Caer, J.P.,
275. Paik, W.K., & Kim, S. (1980) Protein Methylation, John Rossier, J., Adoutte, A., & Bre, M.H. (1994) Science 266,
Wiley, New York. 1688–1691.
276. Paik, W.K., & Kim, S. (1967) Biochem. Biophys. Res. 306. Gallop, P.M., Blumenfeld, O.O., & Seifter, S. (1972)
Commun. 27, 479–483. Annu. Rev. Biochem. 41, 617–672.
277. Hempel, K., Lange, H.W., & Birkofer, L. (1968) 307. Nakajima, T., & Volcani, B.E. (1970) Biochem. Biophys.
Naturwissenschaften 55, 37. Res. Commun. 39, 28–33.
278. Ghosh, S.K., Paik, W.K., & Kim, S. (1988) J. Biol. Chem. 308. Udenfriend, S. (1966) Science 152, 1335–1340.
263, 19024–19033. 309. Bornstein, P. (1974) Annu. Rev. Biochem. 43, 567–
279. Paik, W.K., & Kim, S. (1970) J. Biol. Chem. 245, 88–92. 603.
280. Baldwin, G.S., & Carnegie, P.R. (1971) Science 171, 310. Berg, R.A., & Prockop, D.J. (1973) J. Biol. Chem. 248,
579–581. 1175–1182.
281. Karn, J., Vidali, G., Boffa, L.C., & Allfrey, V.G. (1977) J. 311. Ogle, J.D., Arlinghaus, R.B., & Logan, M.A. (1962) J. Biol.
Biol. Chem. 252, 7307–7322. Chem. 237, 3667–3673.
282. Lischwe, M.A., Cook, R.G., Ahn, Y.S., Yeoman, L.C., & 312. Janes, S.M., Mu, D., Wemmer, D., Smith, A.J., Kaur, S.,
Busch, H. (1985) Biochemistry 24, 6025–6028. Maltby, D., Burlingame, A.L., & Klinman, J.P. (1990)
283. Zobel-Thropp, P., Gary, J.D., & Clarke, S. (1998) J. Biol. Science 248, 981–987.
Chem. 273, 29283–29286. 313. McMullen, B.A., Fujikawa, K., Kisiel, W., Sasagawa, T.,
284. Vijayasarathy, C., & Rao, B.S. (1987) Biochim. Biophys. Howald, W.N., Kwa, E.Y., & Weinstein, B. (1983)
Acta 923, 156–165. Biochemistry 22, 2875–2884.
144 Sequences of Polymers
314. Fernlund, P., & Stenflo, J. (1983) J. Biol. Chem. 258, 344. Sletten, K., Aakesson, I., & Alvsaker, J.O. (1971) Nat. New
12509–12512. Biol. 231, 118–119.
315. Wang, Q.P., VanDusen, W.J., Petroski, C.J., Garsky, 345. Midelfort, C.F., & Mehler, A.H. (1972) Proc. Natl. Acad.
V.M., Stern, A.M., & Friedman, P.A. (1991) J. Biol. Chem. Sci. U.S.A. 69, 1816–1819.
266, 14004–14010. 346. Robinson, A.B., Scotchler, J.W., & McKerrow, J.H. (1973)
316. Stenflo, J., Lundwall, A., & Dahlback, B. (1987) Proc. J. Am. Chem. Soc. 95, 8156–8159.
Natl. Acad. Sci. U.S.A. 84, 368–372. 347. Wickner, R.B. (1969) J. Biol. Chem. 244, 6550–6552.
317. Choinowski, T., Blodig, W., Winterhalter, K.H., & 348. Givot, I.L., Smith, T.A., & Abeles, R.H. (1969) J. Biol.
Piontek, K. (1999) J. Mol. Biol. 286, 809–827. Chem. 244, 6341–6353.
318. Blodig, W., Smith, A.T., Doyle, W.A., & Piontek, K. (2001) 349. Langer, M., Lieber, A., & Retey, J. (1994) Biochemistry
J. Mol. Biol. 305, 851–861. 33, 14034–14038.
319. Goodwill, K.E., Sabatier, C., & Stevens, R.C. (1998) 350. Niwa, H., Inouye, S., Hirano, T., Matsuno, T., Kojima,
Biochemistry 37, 13437–13445. S., Kubota, M., Ohashi, M., & Tsuji, F.I. (1996) Proc. Natl.
320. Dorsett, L.C., Hawkins, C.J., Grice, J.A., Lavin, M.F., Acad. Sci. U.S.A. 93, 13617–13622.
Merefield, P.M., Parry, D.L., & Ross, I.L. (1987) 351. Ormo, M., Cubitt, B., Kallio, K., Gross, L.A., Tsien, R.Y.,
Biochemistry 26, 8078–8082. & Remington, S.J. (1996) Science 273, 1392–1395.
321. Waite, J.H., & Tanzer, M.L. (1981) Science 212, 352. Nakajima, T., & Ballou, C.E. (1974) J. Biol. Chem. 249,
1038–1040. 7685–7694.
322. Waite, J.H. (1983) J. Biol. Chem. 258, 2911–2915. 353. Muir, L., & Lee, Y.C. (1969) J. Biol. Chem. 244,
323. Filpula, D.R., Lee, S.M., Link, R.P., Strausberg, S.L., & 2343–2349.
Strausberg, R.L. (1990) Biotechnol. Prog. 6, 171–177. 354. Hallgren, P., Lundblad, A., & Svensson, S. (1975) J. Biol.
324. Tajima, M., Iida, T., Yoshida, S., Komatsu, K., Namba, Chem. 250, 5312–5314.
R., Yanagi, M., Noguchi, M., & Okamoto, H. (1990) J. 355. Spiro, R.G. (1967) J. Biol. Chem. 242, 4813–4823.
Biol. Chem. 265, 9602–9605. 356. Lindahl, V., & Róden, L. (1972) in Glycoproteins: Their
325. Stassen, F.L. (1976) Biochim. Biophys. Acta 438, 49–60. Composition, Structure, and Function, 2nd ed.
326. Muchmore, C.R., Krahn, J.M., Kim, J.H., Zalkin, H., & (Gottschalk, A., Ed.) pp 491–517, Elsevier, Amsterdam.
Smith, J.L. (1998) Protein Sci. 7, 39–51. 357. Lindahl, U., & Róden, L. (1966) J. Biol. Chem. 241,
327. Schmidt, B., Selmer, T., Ingendoh, A., & von Figura, K. 2113–2119.
(1995) Cell 82, 271–278. 358. Lote, C.J., & Weiss, J.B. (1971) FEBS Lett. 16, 81–85.
328. Gouet, P., Jouve, H.M., & Dideberg, O. (1995) J. Mol. 359. Weiss, J.B., Lote, C.J., & Bobinski, H. (1971) Nat. New
Biol. 249, 933–954. Biol. 234, 25–26.
329. Sjoberg, B.M., & Reichard, P. (1977) J. Biol. Chem. 252, 360. Miller, D.H., Lamport, D.T., & Miller, M. (1972) Science
536–541. 176, 918–920.
330. Wagner, A.F., Frey, M., Neugebauer, F.A., Schafer, W., 361. Torres, C.R., & Hart, G.W. (1984) J. Biol. Chem. 259,
& Knappe, J. (1992) Proc. Natl. Acad. Sci. U.S.A. 89, 3308–3317.
996–1000. 362. Hart, G.W. (1997) Annu. Rev. Biochem. 66, 315–335.
331. Just, I., Sehr, P., Jung, M., van Damme, J., Puype, M., 363. Kieliszewski, M.J., O’Neill, M., Leykam, J., & Orlando, R.
Vandekerckhove, J., Moss, J., & Aktories, K. (1995) (1995) J. Biol. Chem. 270, 2541–2549.
Biochemistry 34, 326–333. 364. Hase, S., Nishimura, H., Kawabata, S., Iwanaga, S., &
332. Moss, J., & Vaughan, M. (1977) J. Biol. Chem. 252, Ikenaka, T. (1990) J. Biol. Chem. 265, 1858–1861.
2455–2457. 365. Rodriguez, I.R., & Whelan, W.J. (1985) Biochem.
333. Oppenheimer, N.J. (1978) J. Biol. Chem. 253, 4907–4910. Biophys. Res. Commun. 132, 829–836.
334. Sekine, A., Fujiwara, M., & Narumiya, S. (1989) J. Biol. 366. Klostermeyer, H., Rabbel, K., & Reimerdes, E.H. (1976)
Chem. 264, 8602–8605. Hoppe-Seyler’s Z. Physiol. Chem. 357, 1197–1199.
335. West, R.E., Jr., Moss, J., Vaughan, M., Liu, T., & Liu, T.Y. 367. Pisano, J.J., Finlayson, J.S., & Peyton, M.P. (1969)
(1985) J. Biol. Chem. 260, 14428–14430. Biochemistry 8, 871–876.
336. Demurcia, G., & Demurcia, J.M. (1994) Trends Biochem. 368. Harding, H.W., & Rogers, G.E. (1971) Biochemistry 10,
Sci. 19, 172–176. 624–630.
337. Lindahl, T., Satoh, M.S., Poirier, G.G., & Klungland, A. 369. Sottrup-Jensen, L., Petersen, T.E., & Magnusson, S.
(1995) Trends Biochem. Sci. 20, 405–411. (1980) FEBS Lett. 121, 275–279.
338. Pappenheimer, A.M., Jr. (1977) Annu. Rev. Biochem. 46, 370. Thomas, M.L., Davidson, F.F., & Tack, B.F. (1983) J. Biol.
69–94. Chem. 258, 13580–13586.
339. Oppenheimer, N.J., & Bodley, J.W. (1981) J. Biol. Chem. 371. Klabunde, T., Eicken, C., Sacchettini, J.C., & Krebs, B.
256, 8579–8581. (1998) Nat. Struct. Biol. 5, 1084–1090.
340. Shapiro, B.M., & Stadtman, E.R. (1968) J. Biol. Chem. 372. Lerch, K. (1982) J. Biol. Chem. 257, 6414–6419.
243, 3769–3771. 373. Ito, N., Phillips, S.E., Stevens, C., Ogel, Z.B., McPherson,
341. Adler, S.P., Purich, D., & Stadtman, E.R. (1975) J. Biol. M.J., Keen, J.N., Yadav, K.D., & Knowles, P.F. (1991)
Chem. 250, 6264–6272. Nature 350, 87–90.
342. Dolinger, D.L., Schramm, V.L., & Shockman, G.D. 374. Ito, N., Phillips, S.E., Yadav, K.D., & Knowles, P.F. (1994)
(1988) Proc. Natl. Acad. Sci. U.S.A. 85, 6667–6671. J. Mol. Biol. 238, 794–814.
343. Harding, H.W., & Rogers, G.E. (1976) Biochim. Biophys. 375. Yoshikawa, S., Shinzawa-Itoh, K., Nakashima, R.,
Acta 427, 315–324. Yaono, R., Yamashita, E., Inoue, N., Yao, M., Fei, M.J.,
References 145
Libeu, C.P., Mizushima, T., Yamaguchi, H., Tomizaki, 403. Akhtar, M., Blosse, P.T., & Dewhurst, P.B. (1968)
T., & Tsukihara, T. (1998) Science 280, 1723–1729. Biochem. J. 110, 693–702.
376. Proshlyakov, D.A., Pressler, M.A., DeMaso, C., Leykam, 404. Bownds, D. (1967) Nature 216, 1178–1181.
J.F., DeWitt, D.L., & Babcock, G.T. (2000) Science 290, 405. Vanaman, T.C., Wakil, S.J., & Hill, R.L. (1968) J. Biol.
1588–1591. Chem. 243, 6420–6431.
377. LaBella, F., Keeley, F., Vivian, S., & Thornhill, D. (1967) 406. Igarashi, N., Moriyama, H., Fujiwara, T., Fukumori, Y.,
Biochem. Biophys. Res. Commun. 26, 748–753. & Tanaka, N. (1997) Nat. Struct. Biol. 4, 276–284.
378. Andersen, S.O. (1966) Acta Physiol. Scand., Suppl. 263, 407. Robinson, J.B., Jr., Singh, M., & Srere, P.A. (1976) Proc.
1–81. Natl. Acad. Sci. U.S.A. 73, 1872–1876.
379. Michon, T., Chenu, M., Kellershon, N., Desmadril, M., 408. Berg, M., Hilbi, H., & Dimroth, P. (1996) Biochemistry
& Gueguen, J. (1997) Biochemistry 36, 8504–8513. 35, 4689–4696.
380. Fry, S.C. (1982) Biochem. J. 204, 449–455. 409. Hoenke, S., Wild, M.R., & Dimroth, P. (2000)
381. Kanwar, R., & Balasubramanian, D. (2000) Biochemistry Biochemistry 39, 13223–13232.
39, 14976–14983. 410. Kivirikko, K.I., & Pihlajaniemi, T. (1998) Adv. Enzymol.
382. Andersen, S.O. (1967) Nature 216, 1029–1030. Relat. Areas Mol. Biol. 72, 325–398.
383. Fujimoto, D., Horiuchi, K., & Hirama, M. (1981) 411. Esmon, C.T., Sadowski, J.A., & Suttie, J.W. (1975) J. Biol.
Biochem. Biophys. Res. Commun. 99, 637–643. Chem. 250, 4744–4748.
384. Nomura, K., Suzuki, N., & Matsumoto, S. (1990) 412. Price, P.A., Poser, J.W., & Raman, N. (1976) Proc. Natl.
Biochemistry 29, 4525–4534. Acad. Sci. U.S.A. 73, 3374–3375.
385. McIntire, W.S., Wemmer, D.E., Chistoserdov, A., & 413. Ma, Y.A., Sih, C.J., & Harms, A. (1999) J. Am. Chem. Soc.
Lidstrom, M.E. (1991) Science 252, 817–824. 121, 8967–8968.
386. Chen, L., Durley, R., Poliks, B.J., Hamada, K., Chen, Z., 414. Moss, J., Stanley, S.J., & Watkins, P.A. (1980) J. Biol.
Mathews, F.S., Davidson, V.L., Satow, Y., Huizinga, E., Chem. 255, 5838–5840.
Vellieux, F.M., et al. (1992) Biochemistry 31, 4959– 415. Yost, D.A., & Moss, J. (1983) J. Biol. Chem. 258,
4964. 4926–4929.
387. Wang, S.X., Mure, M., Medzihradszky, K.F., 416. Takada, T., Iida, K., & Moss, J. (1993) J. Biol. Chem. 268,
Burlingame, A.L., Brown, D.E., Dooley, D.M., Smith, 17837–17843.
A.J., Kagan, H.M., & Klinman, J.P. (1996) Science 273, 417. Scaife, R.M., Wilson, L., & Purich, D.L. (1992)
1078–1084. Biochemistry 31, 310–316.
388. Margoliash, E., & Schejter, A. (1966) Adv. Protein Chem. 418. Resing, K.A., Johnson, R.S., & Walsh, K.A. (1995)
21, 113–286. Biochemistry 34, 9477–9487.
389. Williams, V.P., & Glazer, A.N. (1978) J. Biol. Chem. 253, 419. Chlumsky, L.J., Sturgess, A.W., Nieves, E., & Jorns, M.S.
202–211. (1998) Biochemistry 37, 2089–2095.
390. Ficner, R., Lobeck, K., Schmidt, G., & Huber, R. (1992) 420. Michel, H., Hunt, D.F., Shabanowitz, J., & Bennett, J.
J. Mol. Biol. 228, 935–950. (1988) J. Biol. Chem. 263, 1123–1130.
391. Walker, W.H., Kenney, W.C., Edmondson, D.E., Singer, 421. Rudiger, M., Plessmann, U., Rudiger, A.H., & Weber, K.
T.P., Cronin, J.R., & Hendriks, R. (1974) Eur. J. Biochem. (1995) FEBS Lett. 364, 147–151.
48, 439–448. 422. Dreger, M., Otto, H., Neubauer, G., Mann, M., & Hucho,
392. Edmondson, D.E., & Singer, T.P. (1976) FEBS Lett. 64, F. (1999) Biochemistry 38, 9426–9434.
255–265. 423. Wu, X., Takahashi, M., Chen, S.G., & Monnier, V.M.
393. Singer, T.P., & Edmondson, D.E. (1974) FEBS Lett. 42, (2000) Biochemistry 39, 1515–1521.
1–14. 424. Zhang, X., Herring, C.J., Romano, P.R., Szczepanowska,
394. Mewies, M., Basran, J., Packman, L.C., Hille, R., & J., Brzeska, H., Hinnebusch, A.G., & Qin, J. (1998) Anal.
Scrutton, N.S. (1997) Biochemistry 36, 7162–7168. Chem. 70, 2050–2059.
395. Steenkamp, D.J., McIntire, W., & Kenney, W.C. (1978) 425. Folk, J.E., & Finlayson, J.S. (1977) Adv. Protein Chem.
J. Biol. Chem. 253, 2818–2824. 31, 1–133.
396. Willie, A., Edmondson, D.E., & Jorns, M.S. (1996) 426. Gielens, C., De Geest, N., Xin, X.Q., Devreese, B., Van
Biochemistry 35, 5292–5299. Beeumen, J., & Preaux, G. (1997) Eur. J. Biochem. 248,
397. McIntire, W., Edmondson, D.E., Singer, T.P., & Hopper, 879–888.
D.J. (1980) J. Biol. Chem. 255, 6553–6555. 427. Williamson, P.R., & Kagan, H.M. (1986) J. Biol. Chem.
398. Cunane, L.M., Chen, Z.W., Shamala, N., Mathews, F.S., 261, 9477–9482.
Cronin, C.N., & McIntire, W.S. (2000) J. Mol. Biol. 295, 428. Toth, E.A., Worby, C., Dixon, J.E., Goedken, E.R.,
357–374. Marqusee, S., & Yeates, T.O. (2000) J. Mol. Biol. 301,
399. Maloy, W.L., Bowien, B.U., Zwolinski, G.K., Kumar, 433–450.
K.G., Wood, H.G., Ericsson, L.H., & Walsh, K.A. (1979) 429. Cleland, W.W. (1964) Biochemistry 3, 480–482.
J. Biol. Chem. 254, 11615–11622. 430. Kellaris, K.V., & Ware, D.K. (1989) Biochemistry 28,
400. Hale, G., & Perham, R.N. (1980) Biochem. J. 187, 3469–3482.
905–908. 431. Xia, Z., Dai, W., Zhang, Y., White, S.A., Boyd, G.D., &
401. Piszkiewicz, D., Landon, M., & Smith, E.L. (1970) J. Biol. Mathews, F.S. (1996) J. Mol. Biol. 259, 480–501.
Chem. 245, 2622–2626. 432. Blake, C.C., Ghosh, M., Harlos, K., Avezoux, A., &
402. Tanase, S., Kojima, H., & Morino, Y. (1979) Biochemistry Anthony, C. (1994) Nat. Struct. Biol. 1, 102–105.
18, 3002–3007. 433. White, S., Boyd, G., Mathews, F.S., Xia, Z.X., Dai, W.W.,
146 Sequences of Polymers
Zhang, Y.F., & Davidson, V.L. (1993) Biochemistry 32, Burlingame, A.L., & Villafranca, J.J. (1994) Biochemistry
12955–12958. 33, 11563–11575.
434. Wang, X., Connor, M., Smith, R., Maciejewski, M.W., 461. Poerio, E., Caporale, C., Carrano, L., Pucci, P., &
Howden, M.E., Nicholson, G.M., Christie, M.J., & King, Buonocore, V. (1991) Eur. J. Biochem. 199, 595–600.
G.F. (2000) Nat. Struct. Biol. 7, 505–513. 462. Solouki, T., Emmett, M.R., Guan, S., & Marshall, A.G.
435. Chandrasekaran, R., & Balasubramanian, R. (1969) (1997) Anal. Chem. 69, 1163–1168.
Biochim. Biophys. Acta 188, 1–9. 463. Shapiro, B.M., Kingdon, H.S., & Stadtman, E.R. (1967)
436. Strater, N., Klabunde, T., Tucker, P., Witzel, H., & Krebs, Proc. Natl. Acad. Sci. U.S.A. 58, 642–649.
B. (1995) Science 268, 1489–1492. 464. Schwede, T.F., Retey, J., & Schulz, G.E. (1999)
437. Frech, C., & Schmid, F.X. (1995) J. Mol. Biol. 251, Biochemistry 38, 5355–5361.
135–149. 465. Wall, M.A., Socolich, M., & Ranganathan, R. (2000) Nat.
438. Darby, N.J., & Creighton, T.E. (1993) J. Mol. Biol. 232, Struct. Biol. 7, 1133–1138.
873–896. 466. Sardana, M., Sardana, V., Rodkey, J., Wood, T., Ng, A.,
439. Lal, M., Rao, R., Fang, X.W., Schuchmann, H.P., & Vlasuk, G.P., & Waxman, L. (1991) J. Biol. Chem. 266,
vonSonntag, C. (1997) J. Am. Chem. Soc. 119, 13560–13563.
5735–5739. 467. Imberty, A., Chanzy, H., Perez, S., Buleon, A., & Tran, V.
440. De Lorenzo, F., Goldberger, R.F., Steers, E., Jr., Givol, (1988) J. Mol. Biol. 201, 365–378.
D., & Anfinsen, B. (1966) J. Biol. Chem. 241, 1562–1567. 468. Campbell, D.G., & Cohen, P. (1989) Eur. J. Biochem. 185,
441. Akiyama, Y., Kamitani, S., Kusukawa, N., & Ito, K. (1992) 119–125.
J. Biol. Chem. 267, 22440–22445. 469. Mellis, S.J., & Baenziger, J.U. (1983) J. Biol. Chem. 258,
442. Bardwell, J.C., McGovern, K., & Beckwith, J. (1991) Cell 11546–11556.
67, 581–589. 470. Podolsky, D.K. (1985) J. Biol. Chem. 260, 15510–15515.
443. Hirano, N., Shibasaki, F., Sakai, R., Tanaka, T., Nishida, 471. Sturm, A. (1991) Eur. J. Biochem. 199, 169–179.
J., Yazaki, Y., Takenawa, T., & Hirai, H. (1995) Eur. J. 472. Mattei, B., Bernalda, M.S., Federici, L., Roepstorff, P.,
Biochem. 234, 336–342. Cervone, F., & Boffi, A. (2001) Biochemistry 40, 569–576.
444. McCarthy, A.A., Haebel, P.W., Torronen, A., Rybin, V., 473. Spiro, R.G., & Bhoyroo, V.D. (1988) J. Biol. Chem. 263,
Baker, E.N., & Metcalf, P. (2000) Nat. Struct. Biol. 7, 14351–14358.
196–199. 474. Roux, L., Holojda, S., Sundblad, G., Freeze, H.H., &
445. Zapun, A., Bardwell, J.C., & Creighton, T.E. (1993) Varki, A. (1988) J. Biol. Chem. 263, 8879–8889.
Biochemistry 32, 5083–5092. 475. Hard, K., Van Doorn, J.M., Thomas-Oates, J.E.,
446. Kortemme, T., Darby, N.J., & Creighton, T.E. (1996) Kamerling, J.P., & Van der Horst, D.J. (1993)
Biochemistry 35, 14503–14511. Biochemistry 32, 766–775.
447. Spackman, D.H., Stein, W.H., & Moore, S. (1960) J. Biol. 476. Nadano, D., Iwasaki, M., Endo, S., Kitajima, K., Inoue,
Chem. 235, 648–659. S., & Inoue, Y. (1986) J. Biol. Chem. 261, 11550–11557.
448. Haniu, M., Horan, T., Arakawa, T., Le, J., Katta, V., Hara, 477. Angata, T., Nakata, D., Matsuda, T., Kitajima, K., & Troy,
S., & Rohde, M.F. (1996) Biochemistry 35, 13040– F.A. (1999) J. Biol. Chem. 274, 22949–22956.
13046. 478. Manzi, A.E., Dell, A., Azadi, P., & Varki, A. (1990) J. Biol.
449. Hoffman, R.C., Andersen, H., Walker, K., Krakover, J.D., Chem. 265, 8094–8107.
Patel, S., Stamm, M.R., & Osborn, S.G. (1996) 479. Takeuchi, M., Inoue, N., Strickland, T.W., Kubota, M.,
Biochemistry 35, 14849–14861. Wada, M., Shimizu, R., Hoshi, S., Kozutsumi, H.,
450. McMullen, B.A., Fujikawa, K., & Davie, E.W. (1991) Takasaki, S., & Kobata, A. (1989) Proc. Natl. Acad. Sci.
Biochemistry 30, 2050–2056. U.S.A. 86, 7819–7822.
451. Burman, S., Wellner, D., Chait, B., Chaudhary, T., & 480. van Hoek, A.N., Wiener, M.C., Verbavatz, J.M., Brown,
Breslow, E. (1989) Proc. Natl. Acad. Sci. U.S.A. 86, D., Lipniunas, P.H., Townsend, R.R., & Verkman, A.S.
429–433. (1995) Biochemistry 34, 2212–2219.
452. White, C.E., Hunter, M.J., Meininger, D.P., Garrod, S., 481. Gerken, T.A., Butenhof, K.J., & Shogren, R. (1989)
& Komives, E.A. (1996) Proc. Natl. Acad. Sci. U.S.A. 93, Biochemistry 28, 5536–5543.
10177–10182. 482. Homans, S.W., Dwek, R.A., & Rademacher, T.W. (1987)
453. Gray, W.R. (1993) Protein Sci. 2, 1732–1748. Biochemistry 26, 6553–6560.
454. Kellaris, K.V., Ware, D.K., Smith, S., & Kyte, J. (1989) 483. Wu, A.M., Kabat, E.A., Nilsson, B., Zopf, D.A., Gruezo,
Biochemistry 28, 3469–3482. F.G., & Liao, J. (1984) J. Biol. Chem. 259, 7178–7186.
455. Thompson, S.A. (1992) J. Biol. Chem. 267, 2269–2273. 484. Li, E., Tabas, I., & Kornfeld, S. (1978) J. Biol. Chem. 253,
456. Eriksson, A.E., Cousens, L.S., Weaver, L.H., & Matthews, 7762–7770.
B.W. (1991) Proc. Natl. Acad. Sci. U.S.A. 88, 3441– 485. Fournet, B., Montreuil, J., Strecker, G., Dorland, L.,
3445. Haverkamp, J., Vliegenthart, F.G., Binette, J.P., &
457. Sueyoshi, T., Miyata, T., Iwanaga, S., Toyo’oka, T., & Schmid, K. (1978) Biochemistry 17, 5206–5214.
Imai, K. (1985) J. Biochem. (Tokyo) 97, 1811–1813. 486. Green, E.D., & Baenziger, J.U. (1988) J. Biol. Chem. 263,
458. Marti, T., Rosselet, S.J., Titani, K., & Walsh, K.A. (1987) 25–35.
Biochemistry 26, 8099–8109. 487. Trimble, R.B., Atkinson, P.H., Tschopp, J.F., Townsend,
459. Hojrup, P., & Magnusson, S. (1987) Biochem. J. 245, R.R., & Maley, F. (1991) J. Biol. Chem. 266, 22807–22817.
887–891. 488. Ballou, L., Hernandez, L.M., Alvarado, E., & Ballou, C.E.
460. Robertson, J.G., Adams, G.W., Medzihradszky, K.F., (1990) Proc. Natl. Acad. Sci. U.S.A. 87, 3368–3372.
References 147
489. Van Kuik, J.A., Van Halbeek, H., Kamerling, J.P., & 514. Safaiyan, F., Lindahl, U., & Salmivirta, M. (2000)
Vliegenthart, J.F. (1986) Eur. J. Biochem. 159, 297–301. Biochemistry 39, 10823–10830.
490. Yamashita, K., Inui, K., Totani, K., Kochibe, N., 515. Nilsson, B., Nakazawa, K., Hassell, J.R., Newsome, D.A.,
Furukawa, M., & Okada, S. (1990) Biochemistry 29, & Hascall, V.C. (1983) J. Biol. Chem. 258, 6056–6063.
3030–3039. 516. Gunnarsson, A., Svensson, S., & Roden, L. (1984)
491. Shoji, H., Takahashi, N., Nomoto, H., Ishikawa, M., Carbohydr. Res. 133, 75–82.
Shimada, I., Arata, Y., & Hayashi, K. (1992) Eur. J. 517. Pluschke, G., Vanek, M., Evans, A., Dittmar, T., Schmid,
Biochem. 207, 631–641. P., Itin, P., Filardo, E.J., & Reisfeld, R.A. (1996) Proc. Natl.
492. Pfeiffer, G., Dabrowski, U., Dabrowski, J., Stirm, S., Acad. Sci. U.S.A. 93, 9710–9715.
Strube, K.H., & Geyer, R. (1992) Eur. J. Biochem. 205, 518. Campbell, S.C., Krueger, R.C., & Schwartz, N.B. (1990)
961–978. Biochemistry 29, 907–914.
493. Bendiak, B., Harris-Brandts, M., Michnick, S.W., 519. Zimmermann, D.R., & Ruoslahti, E. (1989) EMBO J. 8,
Carver, J.P., & Cumming, D.A. (1989) Biochemistry 28, 2975–2981.
6491–6499. 520. Avraham, S., Stevens, R.L., Gartner, M.C., Austen, K.F.,
494. Nakata, N., Furukawa, K., Greenwalt, D.E., Sato, T., & Lalley, P.A., & Weis, J.H. (1988) J. Biol. Chem. 263,
Kobata, A. (1993) Biochemistry 32, 4369–4383. 7292–7296.
495. Knibbs, R.N., Perini, F., & Goldstein, I.J. (1989) 521. Kornfeld, S., Li, E., & Tabas, I. (1978) J. Biol. Chem. 253,
Biochemistry 28, 6379–6392. 7771–7778.
496. Rudd, P.M., Downing, A.K., Cadene, M., Harvey, D.J., 522. Leonard, C.K., Spellman, M.W., Riddle, L., Harris, R.J.,
Wormald, M.R., Weir, I., Dwek, R.A., Rifkin, D.B., & Thomas, J.N., & Gregory, T.J. (1990) J. Biol. Chem. 265,
Gleizes, P.E. (2000) Biochemistry 39, 1596–1603. 10373–10382.
497. van Kuik, J.A., van Halbeek, H., Kamerling, J.P., & 523. Spiro, R.G. (1970) Annu. Rev. Biochem. 39, 599–638.
Vliegenthart, J.F. (1985) J. Biol. Chem. 260, 524. Tai, T., Yamashita, K., Ogata-Arakawa, M., Koide, N., &
13984–13988. Muramatsu, T. (1975) J. Biol. Chem. 250, 8569–8575.
498. Strous, G.J. (1979) Proc. Natl. Acad. Sci. U.S.A. 76, 525. Takasaki, S., Mizuochi, T., & Kobata, A. (1982) Methods
2694–2698. Enzymol. 83, 263–268.
499. Nishimura, H., Takao, T., Hase, S., Shimonishi, Y., & 526. Hardy, M.R., & Townsend, R.R. (1988) Proc. Natl. Acad.
Iwanaga, S. (1992) J. Biol. Chem. 267, 17520–17525. Sci. U.S.A. 85, 3289–3293.
500. Harris, R.J., van Halbeek, H., Glushka, J., Basa, L.J., Ling, 527. Hardy, M.R., Townsend, R.R., & Lee, Y.C. (1988) Anal.
V.T., Smith, K.J., & Spellman, M.W. (1993) Biochemistry Biochem. 170, 54–62.
32, 6539–6547. 528. Lacourse, W.R., & Johnson, D.C. (1993) Anal. Chem. 65,
501. Mellis, S.J., & Baenziger, J.U. (1983) J. Biol. Chem. 258, 50–55.
11557–11563. 529. Davidson, D.J., & Castellino, F.J. (1991) Biochemistry 30,
502. Dua, V.K., Rao, B.N., Wu, S.S., Dube, V.E., & Bush, C.A. 625–633.
(1986) J. Biol. Chem. 261, 1599–1608. 530. Zhao, Y., Kent, S.B., & Chait, B.T. (1997) Proc. Natl. Acad.
503. Thomas, D.B., & Winzler, R.J. (1969) J. Biol. Chem. 244, Sci. U.S.A. 94, 1629–1633.
5943–5946. 531. Hakomori, S. (1964) J. Biochem. (Tokyo) 55, 205–208.
504. Slomiany, B.L., Murty, V.L., & Slomiany, A. (1980) J. Biol. 532. Nomoto, H., Takahashi, N., Nagaki, Y., Endo, S., Arata,
Chem. 255, 9719–9723. Y., & Hayashi, K. (1986) Eur. J. Biochem. 157, 233–242.
505. Adamany, A.M., Blumenfeld, O.O., Sabo, B., & 533. Damm, J.B., Voshol, H., Hard, K., Kamerling, J.P., &
McCreary, J. (1983) J. Biol. Chem. 258, 11537–11545. Vliegenthart, J.F. (1989) Eur. J. Biochem. 180, 101–110.
506. Klein, A., Carnoy, C., Lamblin, G., Roussel, P., van Kuik, 534. Sturm, A., Bergwerff, A.A., & Vliegenthart, J.F. (1992)
J.A., de Waard, P., & Vliegenthart, J.F. (1991) Eur. J. Eur. J. Biochem. 204, 313–316.
Biochem. 198, 151–168. 535. Kitagawa, H., Nakada, H., Kurosaka, A., Hiraiwa, N.,
507. Capon, C., Leroy, Y., Wieruszeski, J.M., Ricart, G., Numata, Y., Fukui, S., Funakoshi, I., Kawasaki, T.,
Strecker, G., Montreuil, J., & Fournet, B. (1989) Eur. J. Yamashina, I., Shimada, I., & et al. (1989) Biochemistry
Biochem. 182, 139–152. 28, 8891–8897.
508. Hounsell, E.F., Lawson, A.M., Stoll, M.S., Kane, D.P., 536. Sasaki, H., Ochi, N., Dell, A., & Fukuda, M. (1988)
Cashmore, G.C., Carruthers, R.A., Feeney, J., & Feizi, T. Biochemistry 27, 8618–8626.
(1989) Eur. J. Biochem. 186, 597–610. 537. Nemeth, J.F., Hochgesang, G.P., Jr., Marnett, L.J.,
509. Byrd, J.C., Nardelli, J., Siddiqui, B., & Kim, Y.S. (1988) Caprioli, R.M., & Hochensang, G.P., Jr. (2001)
Cancer Res. 48, 6678–6685. Biochemistry 40, 3109–3116.
510. Bobek, L.A., Tsai, H., Biesbrock, A.R., & Levine, M.J. 538. Shriver, Z., Raman, R., Venkataraman, G., Drummond,
(1993) J. Biol. Chem. 268, 20563–20569. K., Turnbull, J., Toida, T., Linhardt, R., Biemann, K., &
511. Lan, M.S., Batra, S.K., Qi, W.N., Metzgar, R.S., & Sasisekharan, R. (2000) Proc. Natl. Acad. Sci. U.S.A. 97,
Hollingsworth, M.A. (1990) J. Biol. Chem. 265, 10359–10364.
15294–15299. 539. Aspinall, G.O., McDonald, A.G., Pang, H., Kurjanczyk,
512. Bhargava, A.K., Woitach, J.T., Davidson, E.A., & L.A., & Penner, J.L. (1994) Biochemistry 33, 241–
Bhavanandan, V.P. (1990) Proc. Natl. Acad. Sci. U.S.A. 249.
87, 6798–6802. 540. Kitagawa, H., Nakada, H., Fukui, S., Funakoshi, I.,
513. Maimone, M.M., & Tollefsen, D.M. (1990) J. Biol. Chem. Kawasaki, T., Yamashina, I., Tate, S., & Inagaki, F. (1991)
265, 18263–18271. Biochemistry 30, 2869–2876.
148 Sequences of Polymers
541. Kocharova, N.A., Knirel, Y.A., Widmalm, G., Jansson, 544. Arima, T., & Spiro, R.G. (1972) J. Biol. Chem. 247,
P.E., & Moran, A.P. (2000) Biochemistry 39, 4755– 1836–1848.
4760. 545. Drew, H.R., Wing, R.M., Takano, T., Broka, C., Tanaka,
542. Sasaki, H., Bothner, B., Dell, A., & Fukuda, M. (1987) J. S., Itakura, K., & Dickerson, R.E. (1981) Proc. Natl. Acad.
Biol. Chem. 262, 12059–12076. Sci. U.S.A. 78, 2179–2183.
543. Misaki, A., & Goldstein, I.J. (1977) J. Biol. Chem. 252,
6995–6999.
Chapter 4
Crystallographic Molecular Models
To this point, it has been described how proteins are composed of long polymers of amino acids and how these polymers are posttranslationally modified by processes that alter the backbone of the polypeptide or the side chains of the amino acids or that add oligosaccharides to the polypeptides. All of the specific covalent bonds connecting all of the atoms in each of the posttranslationally altered polypeptides composing a particular protein can be defined by chemical analysis. The bond lengths and fixed bond angles of the monomers, amino acids and monosaccharides, and of the bonds coupling the monomers into polymers, amides and acetals, are known precisely. With these values, every bond length, the hybridization of every atom, and every fixed bond angle in each complete, posttranslationally modified polypeptide can be assigned unambiguously. From this information a long flexible molecular model of a particular posttranslationally modified polypeptide can be constructed with high precision.

The problem with defining the complete structure of any polymer, polypeptides included, is the rotational degrees of freedom about the large number of exocyclic, unconjugated single bonds that are present in the polymer. In a finished polypeptide there are from hundreds to tens of thousands of such single bonds. In a commercial polymer, such as polystyrene, rotation about its many single bonds causes each molecule of the polymer, even though it may be covalently identical to other molecules of the polymer in the sample, to assume a different three-dimensional structure, and if the polymer is in solution, the structure of each molecule usually changes constantly and randomly with time. The polypeptides in a protein, however, assume only one unchanging structure, or a small number of interchanging structures, uniquely determined by the amino acid sequences of those polypeptides. Each molecule of the same protein assumes the same or one of a small number of three-dimensional structures. This structure or these few structures are exclusively assumed because almost all of the exocyclic single bonds composing the backbones of the polymer and most of the exocyclic single bonds of the side chains of the amino acids are confined to particular dihedral angles. It is crystals of proteins that have provided both this insight and the opportunity to observe

organic chemistry, it can be concluded that the molecules in the crystal are all covalently identical or almost identical to each other. Furthermore, if a crystal exists, the covalently identical molecules can be present only in a small number of specific three-dimensional conformations. In the case of proteins, all of the molecules in the crystal usually have the same structure or one of a small number of almost identical structures. It is now also known that the structure of a molecule of protein in a crystal is essentially identical to its only structure or one of its few structures when it is free in solution. When the crystal is submitted to X-ray crystallography, that unique structure can be observed.

Maps of Electron Density 1

Suppose that one could see X-radiation. If one were to pick up a crystal of a purified protein and tumble it in his hand under a beam of X-radiation of one wavelength, it would glitter as does a jewel in a ray of sunlight. There would be, however, a peculiarity to this glitter. A jewel glitters because its facets reflect the sunlight as small individual mirrors. This means that if one follows a facet carefully as the jewel turns, one would see that it is always reflecting the sunlight and realize that the glittering sensation only arises because the eye is at rest with respect to the moving reflected beam. The glitter from a crystal of protein, however, arises because its facets produce flashes, and these flashes occur only when a facet is aligned in one precise direction relative to the direction of the incident beam of X-radiation. The reason for this is that the flashes are produced by the summation in phase of the reflections from a stack of evenly spaced, parallel mirrors. This summation in phase is diffraction. It is only at certain angles that the reflections sum in phase.

If one played with the crystal of protein long enough, it would become clear that there were axes running through it. Rotation about any one of these axes would produce flashes that were regularly arrayed. This regular array of flashes would be reminiscent of the array of reflections that emanates from one of the rotating mirrored spheres in a ballroom. One difference, however, would be that while each mirror on the sphere continu-
molecular models representing these structures. ously reflects the spotlight when it is on the illuminated
The existence of a crystal of any protein permits side, as can be discerned by following the reflected
certain conclusions to be drawn about that protein. As in beams on the walls, each of the mirrors in the crystal of
150 Crystallographic Molecular Models
protein reflects only when it passes through certain precise orientations, as could also be discerned by watching the patterns of the flashes on the walls. In addition, the mirrors in the rotating crystal, referred to as the reflecting faces, reflect onto the walls behind the crystal as well as in front of the crystal because the crystal is not opaque to X-radiation, as is the ballroom sphere to light, and both sides of each mirror can reflect.

The easiest way to verify this behavior is photographically. A crystal is mounted on the end of a rotating shaft the axis of which is coincident with the axis of a cylinder of photographic film. The crystal is attached to the shaft in an orientation such that one of its principal axes is parallel to the axis around which the shaft is rotating. The cylinder of film has a slot through which a beam of X-radiation perpendicular to the axis of rotation can be directed upon the crystal (Figure 4–1A).2 After an appropriate exposure, the film is developed. The image observed is that of reflected flashes arrayed on lines of latitude (Figure 4–1B). Each line of latitude, referred to as a layer line, arises from all of the mirrors that are tilted at the same angle with respect to the beam of incident X-radiation. Because the spots on the film produced by the flashes occur along layer lines, the tilt of the mirrors relative to the axis of the crystal must be able to assume only certain values. Because the layer lines are made up of discrete spots, each the result of one flash, each mirror must reflect only when the angle between its face and the incident beam assumes unique values.

The profound insight into this curious phenomenon was the realization that the remarkable variations in the intensities of the flashes (Figure 4–1B) contained information and that, from the information they contained, the atomic structure of the molecules from which the crystal was formed could be deduced. With the promise that this is the reward, one can now ask, what are these mirrors, why do they flash, and why does each one flash with a different intensity?

A crystal of protein is a solution to a warehousing problem. It is a solid object formed from a huge number of the same protein molecules, neatly stacked as the boxes or barrels in a warehouse, with the vacancies between the molecules of the protein filled with water. It is, for all intents and purposes, an infinite, three-dimensional array of identical enantiomeric objects. It can be shown that there are only 71 ways to arrange enantiomeric objects to form an infinite array. Each crystal represents a particular one of these 71 solutions.

Each of these 71 different arrangements can be divided in its entirety into a stack of boxes, each of which is identical in its size, shape, contents, and the arrangement of its contents to every other one. These boxes are always parallelepipeds, and they are referred to as unit cells. A unit cell is the smallest parallelepiped of matter that, by only simple translational movements along the three axes of the crystal, can be stacked to create and fill completely the whole crystal. Keep in mind that each of these parallelepipeds is filled with molecules of protein,
[Figure 4–1A labels: rotating shaft, cylinder of film, crystal, slot in film, source of X-rays]
Figure 4–1: (A) Schematic drawing of a camera used to take an oscillation photograph of a crystal turning about one of its crystallographic axes. (B) A photograph from such a camera.2 The crystal was aligned such that one axis of the unit cell was perpendicular to the beam of X-radiation, and the crystal was then rotated back and forth around this vertical axis. Each of these oscillations covered the same excursion of about 20°. The axis of rotation was aligned vertically with respect to the film as it is displayed. The white shadow in the center of the photograph is of a beam stop used to protect the film from the majority of the X-radiation, which passes through and around the crystal. The beam was pointed at the circular top of the beam stop. The five layer lines are labeled as if the rotation had occurred around the a axis of the crystal. The middle layer line (0,k,l) is the equator. Reprinted with permission from ref 2. Copyright 1968 Macmillan.
Figure 4–4: Sets of parallel lines, defining sets of unit cells, that pass through a two-dimensional lattice with axes a and b. The sets of parallel
lines with indices (1,1), (2,1), and (3,1) and (–1,1), (–2,1), and (–3,1) are presented.
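The indexing convention for the sets of lines in Figure 4–4, and for the sets of planes treated in this section, can be sketched in a few lines of Python. This is a minimal illustration and not something taken from the text: the function names `axis_intercepts` and `friedel_mate` are hypothetical, and the sketch assumes that successive planes of a set with index (h,k,l) cross the a, b, and c axes at the fractional positions n/h, n/k, and n/l, which is one way of stating that the planes divide each edge of the fundamental unit cell into |h|, |k|, and |l| segments.

```python
from fractions import Fraction


def axis_intercepts(index, n=1):
    """Fractional positions at which the n-th plane of a set crosses each axis.

    An index h along an axis means the set of planes divides that edge of
    the unit cell into |h| segments, so plane n crosses the axis at n/h.
    An index of 0 means the planes are parallel to that axis (returned as
    None because they never cross it).
    """
    return tuple(Fraction(n, i) if i != 0 else None for i in index)


def friedel_mate(index):
    """Index of the reflection from the opposite face of the same planes."""
    return tuple(-i for i in index)


# The set of planes with index (4,2,3): the first plane away from the
# origin cuts the a, b, and c axes at 1/4, 1/2, and 1/3 of the cell edges.
print(axis_intercepts((4, 2, 3)))

# With index (4,-2,3), the intersections with the b axis progress in the
# negative direction as those with the a axis progress in the positive one.
print(axis_intercepts((4, -2, 3), n=2))

# The two members of a Friedel pair have indices of opposite sign.
print(friedel_mate((3, -2, 4)))
```

Exact fractions are used so that the relation between an index and the number of segments along an axis stays visible; a sign change in one index reverses the direction in which the intercepts progress along that axis.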
increases, so does the magnitude of this integer, monotonically and continuously. A given set of parallel planes (Figure 4–6) is assigned three integers, h, k, and l. The magnitude of the integer h is the number of segments into which the planes divide the length of the fundamental unit cell along its a axis. The magnitude of the integer k is the number of segments into which the planes divide the length of the fundamental unit cell along the b axis; and the magnitude of the integer l, along the c axis. When the set of planes is parallel to one of the axes of the fundamental unit cell, as all of the planes in Figure 4–5 are to the c axis, it is assigned 0 for the respective index.

The signs of the integers assigned to each reflection are determined by the relative progressions of the planes along the three axes. If, as one progresses from one plane to the next along the a axis in a positive direction, the intersections of the successive planes with the b axis progress also in a positive direction, as they do in Figure 4–6, then the signs of the integers h and k are the same. If, however, as one progresses from one plane to the next along the a axis in a positive direction, the intersections of the successive planes with the b axis progress in a negative direction, as they do in Figure 4–5, then the signs of the integers h and k are opposite each other. The same holds for the relationship between the signs of the integers h and l.

Each plane in a set of parallel planes has two faces, and either can reflect X-radiation. The two reflections, one from each of the two sides of that set of reflecting planes, are a Friedel pair.2 In the indices (h,k,l) assigned respectively to the two reflections of the Friedel pair, the signs of the integers h, k, and l are opposite. For example, the two reflections with indices (3,–2,4) and (–3,2,–4), respectively, are from the opposite faces of the same set of parallel planes and are a Friedel pair.

The reflections from the faces of the sets of reflecting planes are produced by the electrons in the crystal and they are emitted by diffraction.

Electrons scatter X-radiation, and molecules are clouds of electrons confined within atomic and molecular orbitals. The molecule or molecules of protein and the molecules of water distributed through any unit cell in a crystal are clouds of electrons, and they will scatter X-radiation. Electrons scatter X-radiation by being excited to vibrate by the oscillating electric field of the incident beam and then radiating X-radiation of the same wavelength in all directions.

Cut a crystal of protein across its entire width with any plane parallel to the set of planes of a given index (Figure 4–5). Examine one of the two smooth, flat faces produced by that random transection of the crystal. That face contains a particular amount of electron density
from those atoms within each unit cell that were transected by the plane. Each unit cell defined by the set of planes parallel to the transection contributes exactly the same amount of electron density to the face because it is sliced at exactly the same angle and at exactly the same level. All of the electron density in the entire face will scatter X-radiation. The electron density in the face scatters the X-radiation just as the silver on the smooth, flat surface of a mirror scatters light, and the face is therefore a mirror for X-radiation. All of the quanta of X-radiation reflected by that mirror at a certain angle will be reflected in phase as is the case with any planar mirror (Figure 4–7). As a result, the scattering elements can be anywhere in the reflecting face and the regularly arrayed, repeating pattern of electron density can be translated along axes parallel to the plane, without affecting either the amplitude or the phase of the reflection. It is this insensitivity of reflected electromagnetic radiation to translation that creates the requirement that a unit cell be only a translational repeating unit. The amplitude of the reflection produced by this mirror will be proportional to the quantity of electron density it contains, which is equal to the amount of electron density contributed by each unit cell times the number of unit cells it transects.

Figure 4–6: Assignment of an index to a set of planes creating the reflecting faces. The index h, k, or l relative to a given axis, a, b, or c, respectively, is the number of segments into which the respective axis is intersected by the set of planes over the length of the fundamental unit cell. The index of this set of parallel planes is (4,2,3).

Figure 4–7: Incident electromagnetic radiation at an angle φ to a plane of reflection emerges from the reflection in phase regardless of the locations of the points on the plane at which reflection occurs.

Consider the two planes parallel to the one just described at a distance the width of one unit cell above it and at a distance the width of one unit cell below it. Each of these two planes creates a reflecting face with an identical orientation and an identical repeating pattern of electron density to the one just described. The three reflecting faces considered so far, however, are undistinguished members of a set of reflecting faces evenly spaced throughout the entire crystal that each contain an identical repeating pattern of electron density, that are each the distance of a unit cell above and below their neighbors, and that together include identical transections through all of the unit cells of the same index in the crystal. Each of the members in this set of faces will produce a reflection. If the crystal is being rotated in a beam of X-radiation, when the angle of the incident beam of X-radiation assumes one of a set of particular values, θhkl, with respect to the set of planes that produced this set of reflecting faces, the reflections from all of the reflecting faces in the set will add in phase to produce a burst or flash of X-radiation by diffraction (Figure 4–1B).

The values of θhkl at which this diffracted reflection occurs are defined by Bragg’s law2

$$\theta_{hkl} = \sin^{-1}\left(\frac{n\lambda}{2d_{hkl}}\right) \tag{4–1}$$

where n is any integer, λ is the wavelength of the incident X-radiation, and dhkl is the perpendicular distance, or Bragg spacing, between the reflecting faces, which is the width of the unit cells between the planes. Only diffracted reflections are emitted by the crystal because when the incident angle of the X-radiation on a set of faces is not equal to one of these values θhkl, there are so many reflections from that set of faces that are out of phase with each other that all of them cancel completely.

From Equation 4–1 it follows that X-radiation is diffracted by every set of reflecting faces for which the spacing between the planes is larger than λ/2. The distance λ/2 is the diffraction limit. Sets of faces with spacings less than the diffraction limit do not diffract the X-radiation,

face in the first set. The phase of the diffraction from the second set will also be different from the phase of the diffraction from the first set of reflecting faces because the second set is displaced a distance ds from the first.

This process of slicing the crystal with sets of reflecting faces each displaced from the set before it by a distance ds can be repeated until the entire unit cell, and hence the entire crystal, has been sliced (Figure 4–8). Each one of these different sets of reflecting faces will diffract at the same angle, θhkl, because they all have the spacing of the planes of the given index. The single amplitude and single phase of the total diffracted reflection produced by the complete set of all of these reflecting faces will be the sum of the individual amplitudes and individual phases of all of the component sets. The diffracted reflection from the complete set will be observed at the angle θhkl to the incident beam of X-radiation, and its amplitude and phase will necessarily contain information concerning the distribution of electron density within the crystal. Each complete set of faces of a given index passing through the lattice will produce its own diffracted reflection. Each of the spots on the film in Figure 4–1 is the diffracted reflection from the complete set of reflecting faces of a particular index.

The phase of the reflection from the set of faces hkl is designated αhkl. The phase of the reflection is the distance between a crest of the emitted wave and a point of reference common to all of the emitted X-radiation. The phase is expressed in units of wavelength so that, were its value 1, there would be an integral number of wavelengths between the point of reference and the crest. Because the wave is periodic, the phase is expressed as a dimensionless fraction between 0 and 1. Because the
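Bragg’s law (Equation 4–1) and the diffraction limit of λ/2 lend themselves to a short numerical sketch. The Python below is only an illustration: the function names are hypothetical, and the Bragg spacing of 0.28 nm and the wavelength of 0.154 nm are example values chosen for convenience, not a calculation from any particular data set.

```python
import math


def bragg_angles(d_nm, wavelength_nm):
    """Angles (degrees) satisfying Equation 4-1 for every allowed order n.

    An order n is allowed only while n * wavelength / (2 * d) does not
    exceed 1, because the inverse sine is undefined beyond that value.
    """
    angles = []
    n = 1
    while n * wavelength_nm / (2.0 * d_nm) <= 1.0:
        ratio = n * wavelength_nm / (2.0 * d_nm)
        angles.append((n, math.degrees(math.asin(ratio))))
        n += 1
    return angles


def diffracts(d_nm, wavelength_nm):
    """True if the spacing is larger than the diffraction limit, lambda/2."""
    return d_nm > wavelength_nm / 2.0


# Illustrative values: a Bragg spacing of 0.28 nm and X-radiation
# with a wavelength of 0.154 nm.
for order, theta in bragg_angles(0.28, 0.154):
    print(f"n = {order}: theta = {theta:.2f} degrees")

print(diffracts(0.28, 0.154))   # spacing larger than lambda/2
print(diffracts(0.05, 0.154))   # spacing smaller than lambda/2
```

The loop stops at the first order for which nλ/2dhkl exceeds 1, which is the same condition that produces the diffraction limit when n = 1.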
and the experimentally measurable distribution of electron density in the regions of the crystal that contain the fixed portion of the protein is the average over these small vibrational displacements. Within these limits, it is featured. The distribution of featured electron density in a crystal is a periodic function, by definition, and it is this periodicity that leads to the reflections.

It can be shown2 that for a crystal of protein

$$\rho(x,y,z) = \frac{1}{V}\sum_{h=-\infty}^{\infty}\sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} F_{hkl}\exp\left[-2\pi i\,(hx + ky + lz - \alpha_{hkl})\right] \tag{4–2}$$

The imaginary portion of Equation 4–2 is somewhat disconcerting because the intention of the equation is to calculate a real electron density. This conundrum is solved by noting that

$$\exp(i\omega) = \cos\omega + i\sin\omega \tag{4–3}$$

and that, because a complete data set is a complete set of Friedel pairs, all terms in i sin ω cancel in pairs.2 As a result

$$\rho(x,y,z) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l} F_{hkl}\cos 2\pi\,(hx + ky + lz - \alpha_{hkl}) \tag{4–4}$$
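The cancellation that turns Equation 4–2 into Equation 4–4 can be checked numerically. The following Python is a minimal sketch under stated assumptions: the amplitudes and phases are invented for illustration, the function names are hypothetical, the volume V is set to 1, and the two members of each Friedel pair are assumed to have equal amplitudes and phases of opposite sign, which is the condition under which the terms in i sin ω cancel.

```python
import cmath
import math


def with_friedel_mates(reflections):
    """Add the Friedel mate (-h,-k,-l) of every reflection.

    Assumption of this sketch: the mate has the same amplitude and the
    opposite phase, with phases kept as fractions of a wavelength.
    """
    full = dict(reflections)
    for (h, k, l), (amplitude, alpha) in reflections.items():
        full[(-h, -k, -l)] = (amplitude, (-alpha) % 1.0)
    return full


def density_complex(reflections, x, y, z, volume=1.0):
    """Electron density at (x, y, z) from the complex sum of Equation 4-2."""
    total = 0j
    for (h, k, l), (amplitude, alpha) in reflections.items():
        total += amplitude * cmath.exp(
            -2j * math.pi * (h * x + k * y + l * z - alpha))
    return total / volume


def density_cosine(reflections, x, y, z, volume=1.0):
    """Electron density at (x, y, z) from the cosine sum of Equation 4-4."""
    total = 0.0
    for (h, k, l), (amplitude, alpha) in reflections.items():
        total += amplitude * math.cos(
            2 * math.pi * (h * x + k * y + l * z - alpha))
    return total / volume


# Invented amplitudes and phases (fractions of a wavelength) for four
# reflections; (0,0,0) is its own Friedel mate and has a phase of 0.
half_set = {
    (0, 0, 0): (10.0, 0.0),
    (1, 0, 0): (4.0, 0.25),
    (1, 2, 0): (2.5, 0.60),
    (2, -1, 3): (1.5, 0.10),
}
full_set = with_friedel_mates(half_set)

rho = density_complex(full_set, 0.1, 0.2, 0.3)
print(rho.real, rho.imag)                       # imaginary part is ~0
print(density_cosine(full_set, 0.1, 0.2, 0.3))  # matches the real part
```

Running the complex sum over only `half_set` leaves a nonzero imaginary part, which is a quick way to see why the complete set of Friedel pairs is needed for the density to come out real.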
but the amplitudes of the reflections will differ (Figure 4–9)12 as well as the phases.

The structure factor of a reflection from a set of faces with the index hkl can be represented as a vector Fhkl. The length of the vector is the amplitude of the structure factor, Fhkl, and its direction is defined by the phase of the reflection, 2παhkl radians. Because the computations are performed in complex space (Equation 4–2), complex coordinates are chosen to represent this vector:

$$\mathbf{F}_{hkl} = F_{hkl}\,(\cos 2\pi\alpha_{hkl} + i\sin 2\pi\alpha_{hkl}) = F_{hkl}\exp(2\pi i\,\alpha_{hkl}) \tag{4–5}$$

The real component of the vector Fhkl is Fhkl cos 2παhkl, and the imaginary component is Fhkl sin 2παhkl. The amplitude of the vector is

$$F_{hkl} = F_{hkl}\,(\cos^{2} 2\pi\alpha_{hkl} + \sin^{2} 2\pi\alpha_{hkl})^{1/2} \tag{4–6}$$

Equation 4–2 states that the electron density is the Fourier transform of the structure factors. It follows that the structure factors must be the Fourier transforms of the electron density. As a result, the amplitude and phase of a given structure factor from a crystal can be calculated if the distribution of atoms in a fundamental unit cell of that crystal is known

$$\mathbf{F}_{hkl} = \sum_{j} f_{j}\exp\left[2\pi i\,(hx_{j} + ky_{j} + lz_{j})\right] \tag{4–7}$$

where fj is the scattering factor for atom j and (xj,yj,zj) is its position in the unit cell. The scattering factor is determined by the number of electrons in atom j and their distributions over their respective orbitals. Because the sizes of the orbitals are of the order of the wavelength of the X-radiation, the numerical value of the scattering factor for a given atom is a function of the angle θhkl of the reflection. As θhkl increases, the scattering produced by the electrons around an atom decreases as a result of interference. Values for scattering factors have been tabulated for all atoms and systematic values of θ.

It can be seen that, since Equation 4–7 is a summation

$$\mathbf{F}_{hkl,\mathrm{H+P}} = \mathbf{F}_{hkl,\mathrm{H}} + \mathbf{F}_{hkl,\mathrm{P}} \tag{4–8}$$

where Fhkl,H+P is the structure factor from the crystal containing the heavy atom, Fhkl,P is the same structure factor from the unadorned crystal, and Fhkl,H would be the structure factor from a crystal in which only the heavy atoms were present at the same locations they occupy in the existing isomorph. This summation can be presented geometrically (Figure 4–13A).7,13 If one knows where the heavy atoms are located in the fundamental unit cell, the vectors Fhkl,H, both amplitudes and phases, can be calculated with Equation 4–7. Discovering the locations of the heavy atoms in a given isomorph is an art, the description of which is dramatic but not germane to this discussion. Their locations are eventually determined, and these locations are used to calculate each of the values of Fhkl,H.

Unless the heavy atom chosen displays strong anomalous dispersion, at least two isomorphous crystals, each substituted with a heavy atom in a different way, are required for a unique determination of the phases. The data that are available are Fhkl,P, Fhkl,(H+P)g, and Fhkl,Hg, where the index g refers to each of the several isomorphous replacements from the crystals of which reflections have been measured. From Equations 4–5 and 4–8, these data provide a set of simultaneous vector equations equal in number to the number of isomorphous replacements for each structure factor. In theory, any two of these vector equations can be solved for the phase, 2παhkl,P, of structure factor Fhkl,P; in practice, as many as are available are used.

There is a geometric solution to this set of simultaneous vector equations. The amplitude of the vector Fhkl,P is known from the data set, but not its phase, 2παhkl,P. Therefore, what is known about Fhkl,P defines a circle of radius Fhkl,P with its center at point P (Figure 4–13B). Both the amplitude and the phase of a given Fhkl,H are known, and this vector can be placed so that its head is at point P. Its tail defines the position, point D, of the tail of vector Fhkl,H+P from the isomorphous derivative in the vector sum (Figure 4–13A). The phase of vector Fhkl,H+P is unknown but its amplitude, which is known, defines a second circle with its center at point D. Because the vector sum must balance (Equation 4–8), the two points where the two circles intersect must represent two possibilities for the one actual vector sum. In theory, the correct one of the two possibilities can be determined by going through the same steps with the data from a second isomorphous replacement because the phase of Fhkl,P must be the same in both, and only one of the two possibilities for Fhkl,P, namely, the one defined by the actual vector sum, should be the same in both.

A particularly gratifying example of this way of choosing the correct point defining the head of vector Fhkl,P was the definition of the phase of structure factor F9,1,–2 for a crystal of hemoglobin by use of six different isomorphous replacements (Figure 4–13C). All seven circles intersect at approximately the same point and define the phase α9,1,–2. This is, of course, the best example from the thousands of structure factors for hemoglobin; and, in practice, the circles almost never intersect in the same spot or even near the same spot. The phase of each Fhkl,P must be estimated by taking a statistical average of all of the points of intersection.7 The uncertainty in this average value for each phase is then used to weight the contribution of the respective structure factor to the summations of Equation 4–2 or 4–4.
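The geometric solution described above has a simple algebraic counterpart that can be sketched in Python. Everything in the sketch is an assumption made for illustration: the heavy-atom positions, scattering factors, index, and the "true" protein structure factor are invented, the function names are hypothetical, and the final step simply picks the pair of candidate phases that agree best instead of taking the statistical average over all points of intersection used in practice. The two candidates from each derivative follow from Equation 4–8, because squaring the amplitudes gives F(H+P)² = FP² + FH² + 2 FP FH cos(φP – φH), where φ denotes a phase angle.

```python
import cmath
import math


def structure_factor(atoms, hkl):
    """Equation 4-7 for a list of atoms given as (f, x, y, z)."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
               for f, x, y, z in atoms)


def candidate_phases(amp_p, amp_hp, f_h):
    """Two phases (degrees) of F_P consistent with one derivative.

    These are the two intersections of the circles in Figure 4-13B.
    """
    amp_h = abs(f_h)
    cos_delta = (amp_hp**2 - amp_p**2 - amp_h**2) / (2.0 * amp_p * amp_h)
    cos_delta = max(-1.0, min(1.0, cos_delta))  # guard against rounding
    delta = math.degrees(math.acos(cos_delta))
    phi_h = math.degrees(cmath.phase(f_h))
    return ((phi_h + delta) % 360.0, (phi_h - delta) % 360.0)


def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def estimate_phase(cands_a, cands_b):
    """Midpoint of the closest pair of candidates from two derivatives."""
    _, a, b = min((angular_difference(a, b), a, b)
                  for a in cands_a for b in cands_b)
    return (a + ((b - a + 180.0) % 360.0 - 180.0) / 2.0) % 360.0


# Invented test case: the "true" F_P is known only so that the
# derivative amplitudes can be simulated; the phase is then recovered
# from amplitudes and calculated heavy-atom vectors alone.
hkl = (2, 1, 3)
true_f_p = cmath.rect(20.0, math.radians(50.0))
heavy_1 = [(6.0, 0.10, 0.20, 0.30)]   # hypothetical heavy-atom site
heavy_2 = [(8.0, 0.35, 0.05, 0.40)]   # hypothetical heavy-atom site

f_h1 = structure_factor(heavy_1, hkl)
f_h2 = structure_factor(heavy_2, hkl)
amp_p = abs(true_f_p)
amp_hp1 = abs(true_f_p + f_h1)        # Equation 4-8, amplitude only
amp_hp2 = abs(true_f_p + f_h2)

cands_1 = candidate_phases(amp_p, amp_hp1, f_h1)
cands_2 = candidate_phases(amp_p, amp_hp2, f_h2)
phase_estimate = estimate_phase(cands_1, cands_2)
print(cands_1, cands_2, phase_estimate)
```

Each derivative alone leaves two candidate phases; only the candidate shared by both derivatives corresponds to the actual vector sum. With measured amplitudes the candidates would not coincide exactly, which is why a weighted average over all derivatives is used in practice.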
[Figure 4–13, panels A to C. Labels in panel B: Protein alone; Isomorphous derivative]
Figure 4–13: Assignment of phases by isomorphous replacement. (A) The vector equation that must define the actual relationship between the three actual vectors FP, FH, and FH+P. (B) The amplitudes FP (parent) and FH+P (derivative) define two circles.7 The centers of these two circles, P and D, must be at the head and tail, respectively, of vector FH. The two points of intersection (φ1 and φ2) are possible locations for the head of vector FP in the actual vector equation (panel A). Reprinted with permission from ref 7. Copyright 1977 Academic Press. (C) Seven circles, the one defined by FP and the six defined by F(Hi+P)g from six isomorphous derivatives, for the structure factor F9,1,–2 from a crystal of equine hemoglobin.13 The origins of each of the six circles for the isomorphous replacements are displaced from the origin of the circle for the native protein by the respective vector FHg calculated from the particular distribution of heavy metals in the fundamental unit cell. Three of those vectors are labeled FH1, FH2, and FH3, respectively. Reprinted with permission from ref 13. Copyright 1961 Royal Society.
enzyme, provided useful isomorphous replacements.24 In the case of apoferritin, however, only two isomorphous replacements, made with p-mercuribenzoate and K2UO2F5, were used to determine the phases to Bragg spacings of 0.28 nm.25

Once the positions of two or more separate sets of heavy metal atoms are known within the fundamental unit cell, the reagents can be used in pairs to generate additional unique isomorphous replacements. The advantage is that because the positions in each of the original isomorphous replacements are already available, the positions in the combined isomorphous replacement can be readily established. Isomorphous replacements were made from crystals of alcohol dehydrogenase with K2Pt(CN)4 and KAu(CN)2, and the positions of the platinum and gold, respectively, in the resulting fundamental unit cells were determined. In combination, these two anions produced a third isomorphous replacement.26 From crystals of deoxyribonuclease I, it was possible to make three isomorphous replacements, one each with TbCl3, K2PtCl4, and Pb(NO3)2, which could then be used in the three possible combinations to generate three additional, unique isomorphous replacements.8

Today, however, phases are usually estimated by taking advantage of the anomalous dispersion of the heavy atoms in only one isomorphous derivative.27–29 The real and imaginary components of the scattering factors fj (Equation 4–7) for atoms such as copper,29 selenium,30 holmium,31 terbium,32 tantalum,33 uranium,34 platinum,35 and bromine36 change with the wavelength of the X-radiation in the vicinity of their respective absorption edges. The changes are dramatic enough that if data sets are gathered at three or four different wavelengths properly chosen with respect to the absorption edge of the heavy atom, those data sets can be equivalent, in terms of the differences produced in the intensities of the reflections, to sets of reflections measured from three or four isomorphous replacements. The advantage of this procedure is that the same crystal containing the heavy atoms is used for all of the measurements, so that the errors associated with combining data from different crystals are avoided.

The appropriate heavy atoms are usually incorporated into the crystal by soaking. In the case of basic blue copper protein, however, only the copper ion already within the native protein was used as the heavy atom, and data sets gathered at four different wavelengths were sufficient to establish experimental phases to Bragg spacings of 0.25 nm with no isomorphous replacement at all.29 It is also possible to take advantage of the anomalous dispersion of one isomorphous derivative in combination with the normal diffraction from several others to establish experimental phases.37 A common way of introducing a heavy atom susceptible to anomalous dispersion30 is to express the protein to be crystallized in a bacterium auxotrophic for methionine growing on a medium containing selenomethionine rather than methionine. The selenium atoms end up at each position in the sequence of the protein normally occupied by methionine and are positioned at precise locations in the unit cell by the tertiary structure of the protein.

There is an additional component of a crystal of protein that is formally equivalent to a set of heavy atoms in an isomorphous derivative. This component is the featureless aqueous solvent that surrounds the protein. The fact that it should be featureless allows it to be used, much as an electron-rich atom is used, to improve the phases by solvent flattening.38 A map of electron density is prepared with the available estimates of the phase for each structure factor gathered from isomorphous replacement and anomalous dispersion. If the map is clear enough that the boundary between protein and solvent can be defined (Figure 4–12), all of the region of the fundamental unit cell occupied by solvent is forced to have the same uniform electron density even though in this original map it was not uniform in density (notice the noise in the regions of the map occupied by solvent in Figure 4–12). From this geometric solid of uniform electron density, a set of structure factors equivalent to an additional Fhkl,H for Equation 4–8 could be calculated with Equation 4–7 and used as an additional constraint on the phases (Figure 4–13), but solvent flattening is more successful if used iteratively.

The updated map of electron density with the vaguely defined features of the protein and the solvent that has been purposely flattened is used in its entirety to calculate a set of phases. These calculated phases are used in combination with the available estimates of the phase from isomorphous replacement to arrive at a set of improved phases. These improved phases and the observed amplitudes are used to calculate a new map of electron density. The regions of solvent in the new map are defined, the electron density in these regions is again forced to be uniform, and the process is repeated. As the iterations progress, the solvent in each new map becomes flatter and the protein more detailed. In theory38 and in practice,39 the method can provide adequate phases in the absence of measurements of anomalous dispersion when only one isomorphous heavy atom derivative is available. Usually, however, solvent flattening is used to improve the phases that have been gathered with multiple isomorphous replacements or by anomalous dispersion.

Because the quality of the final map of electron density (Figure 4–12) depends so heavily on the quality of the phases, the uninvolved observer can evaluate the results only if she is informed. It is important to learn how many isomorphous replacements were made, how many wavelengths were used for anomalous dispersion, and which data sets were used to calculate the phases. It is also essential to see at least a portion of the calculated, unrefined map of electron density (Figure 4–12) to get a feeling for its quality.40 It must be emphasized that,
unless a map of electron density is already available for a closely related protein, the calculation of the initial map of electron density from the phases derived from isomorphous replacement is an unavoidable step in crystallography, and the quality of this map can affect significantly the remainder of the process. The work involved in obtaining this initial map is extensive, and many crystallographic experiments are designed specifically to avoid this work.

When the map of electron density within the fundamental unit cell or from several neighboring fundamental unit cells is examined, the electron density that corresponds to the intact molecule of protein can be discerned. Since a large fraction of the crystal is liquid water, which is featureless, the protein, which is fixed and highly featured, stands out (Figure 4–12). The compact globule of electron density eventually assigned to an individual molecule of protein usually has an overall size and shape consistent with its amino acid sequence, its frictional coefficient, and other molecular parameters. Within this globular solid, features can be seen in relatively sharp detail, but only seldom at atomic resolution.

Suggested Reading

Stout, G.H., & Jensen, L.H. (1989) X-ray Structure Determination, A Practical Approach, 2nd ed., Wiley, New York.

Problem 4–1: Below there is a generic unit cell. Make several xerographic copies of it. Draw a right-handed set of axes labeled a, b, and c next to each of your copies of the unit cell. Draw a diagram of the (4,2,3) set of reflecting planes passing through the first unit cell as in Figure 4–6. Draw a diagram of the (4,–2,3) set of reflecting planes passing through the second unit cell. Draw a diagram of the (4,2,–3) set of reflecting planes passing through the third unit cell. Label each of your diagrams with the index number.

Problem 4–2: The amplitude of a particular structure factor from a crystal of protein, FP, is 22.2. The amplitude of the structure factor with the same index from a crystal of the first isomorphous replacement, F1, is 24.2. The structure factor with the same index calculated from the established positions of the heavy metal ions in the unit cell of the first isomorphous replacement has an amplitude FH1 of 5.4 and a phase of 110°. The amplitude of the structure factor with the same index from a crystal of the second isomorphous replacement, F2, is 21.0. The structure factor with the same index calculated from the established positions of the heavy metal ions in the unit cell of the second isomorphous replacement has an amplitude FH2 of 8.9 and a phase of 65°. Estimate graphically the phase for the structure factor of this index from the crystal of protein alone.

Problem 4–3: Pig heart citrate (si) synthase crystallizes from solution at pH 7.4. The crystals are tetragonal. The dimensions of the fundamental unit cell are a = b = 7.74 nm and c = 19.64 nm.41 A crystal was submitted to diffraction with X-radiation generated from a rotating anode of copper. The Kα emission of the copper (λ = 0.154 nm) was selected for the source of the X-radiation. On graph paper draw a view of the fundamental unit cell looking down the c axis with the set of (2,–4,0) faces intersecting it. At what angle θ to the incident beam of X-radiation will the reflection from the (2,–4,0) set of faces emerge from the crystal?

The Molecular Model

An irregular tube of electron density can be observed to meander through and account for the globule of featured electron density assigned to the intact molecule of protein in the map of electron density. Sections of one such continuous tube can be seen embedded in the flat slice
of electron density presented in Figure 4–12. This tube is the polypeptide of the protein (2–8) that has folded to assume the native structure of the molecule. It is into this tube that a molecular model of the known covalent structure of the polypeptide must be fit.

Once the covalent sequences of the polypeptides, the covalent sequences and points of attachment of any covalently bound oligosaccharides, and the identity and points of attachment of any other posttranslational modifications have been established and even before a map of electron density is available, it is possible to construct a molecular model of the fully modified and glycosylated polypeptide known to constitute a molecule of protein. Such a model would incorporate bond lengths and bond angles that have been measured with high precision during crystallographic studies of small molecules. These small molecules used as standards are molecules the covalent structures of which are identical to segments of polypeptide, the side chains of the amino acids, segments of oligosaccharide, or the monosaccharides in the oligosaccharide. As with any molecular model of such a size and complexity, the one of a polypeptide would be a flexible, protean object that assumes a new shape each time rotation around one of its acyclic single bonds occurs.

It is this long, flexible model that must be fit, amino acid by amino acid, into the map of electron density. Until recently, the process of fitting the model into the map was always performed visually by the crystallographer.40,42 It is now possible,43 however, for a computer to fit the model into the map automatically. Nevertheless, the success of this automated process for a particular map of electron density still must be carefully evaluated by the crystallographer,44 and the fit must be altered accordingly by manual adjustments. To determine whether or not the molecular model has been correctly fit into the map of electron density, there are no automated rules that are as reliable as the judgment and accumulated knowledge of the crystallographer. If careful human evaluation of each fit is not performed routinely, there is a risk that the frequency at which incorrect crystallographic molecular models are published will increase as more and more crystallographic molecular models are produced in an automated fashion.

One criterion that the molecular model of the polypeptide has been successfully fit into the map of electron density is the correspondence between the sequence in which amino acids of different sizes (Figure 4–14) are known to occur along the amino acid sequence of the polypeptide and the sequence in which protrusions of different size occur at regular intervals along the tube of electron density (Figure 4–15). The 20 different amino acids are, in order of increasing electron density (Figure 4–14), glycine, alanine, serine, proline, cysteine, valine, threonine, aspartate, asparagine, leucine, isoleucine, glutamate, glutamine, methionine, lysine, histidine, phenylalanine, arginine, tyrosine, and tryptophan. In terms of electron density, many of them are indistinguishable, for example, valine and threonine or aspartate, asparagine, leucine, and isoleucine, and only a few of them, for example, tryptophan, tyrosine, and phenylalanine, are of sufficient size and peculiar enough shape to be identified unambiguously with the protrusions jutting out from the continuous tube in the map of electron density (Figure 4–15).* Together, however, the sequence in which the amino acids are arranged in a given protein and their relative sizes usually provide sufficient reassurance that the molecular model of the polypeptide has been fit into the map correctly.

An additional reassurance that the polypeptide has been properly fit into the map can be obtained from anomalous dispersion. If the protein has been expressed so that it contains selenomethionine instead of methionine, the electron density at the locations in the map that are occupied by the selenium atoms will vary in intensity when the wavelength of X-radiation used to produce the reflections is varied near the absorption edge of the selenium. These variations in intensity can be used to locate the positions at which the methionines must end up after the molecular model has been fit properly into the tube of electron density.45 Although the anomalous dispersion from sulfur itself is weak, under appropriate circumstances the positions of both the cysteines and the methionines in the map of electron density of a molecule of protein expressed normally can be located by the anomalous dispersion of their sulfurs.46

The reassurance provided by the agreement between the known amino acid sequence of the polypeptide and the sequence of the sizes of the protrusions along the continuous tube of electron density or the positions of atoms capable of anomalous dispersion is not inconsequential. It is rarely the case that the tube of electron density representing the polypeptide in the map of electron density is continuous over its entire length. Portions of the polypeptide that are so flexible that they vibrate too widely will not contribute to the diffraction and will produce no structured electron density. Often segments of the polypeptide can assume several different conformations while in the crystal. The movement among these conformations within a particular molecule of the protein can be rapid or a particular conformation can be statically occupied. Regardless of the rate at which the conformations interconvert, if at any given instant these segments from different molecules of the protein in the crystal assume different conformations, this disorder will prevent them from contributing to the diffraction and hence to the structured electron density. Occasionally

* Although the side chain of cysteine has the same number of electrons as those of threonine and valine and the side chain of methionine the same number as those of glutamate, glutamine, and lysine (Figure 4–14), the sulfurs in cysteine and methionine, because they have 10 core electrons, produce strong localized features of electron density (Figure 4–15) that permit them to be distinguished from the others.
[Figure 4–14 image: labeled silhouettes of the side chains, with the number of electrons in each side chain in parentheses: A, Ala (9); S, Ser (17); P, Pro (24); V, Val (25); T, Thr (25); C, Cys (25); N, Asn (31); D, Asp (31); L, Leu (33); I, Ile (33); E, Glu (39); Q, Gln (39); M, Met (41); K, Lys (41); H, His (43); F, Phe (49); R, Arg (55); Y, Tyr (57); W, Trp (69).]

Figure 4–14: Silhouettes of the side chains of the amino acids. Space-filling models of the amino acids were constructed with the program Chem 3D Plus. Each of the models, except those of the aromatic amino acids, was then rotated to produce the largest silhouette of its side chain while the bond between the β carbon and the α carbon was kept vertical and in the plane of the page. For each of the aromatic side chains, a view was chosen in which the plane of the ring was in the plane of the page so that the silhouette was as large as possible. In this way, each of the two-dimensional silhouettes represents the relative three-dimensional bulk of each side chain. To produce the silhouettes, the hydrogens were erased from the models; the α carbon, the carboxy group, and the amino group were deleted; and all of the remaining atoms were turned black. In all of the silhouettes, except those of the aromatic side chains, the α carbon would occupy the position of the label. In each of the silhouettes of the side chains of the aromatic amino acids, the β carbon is directly above the label. The number of electrons in each side chain is indicated in parentheses. The standard crystallographic code for each atom in each side chain is indicated. A silhouette of phenylalanine viewed edge on is also included.
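The electron counts listed in Figure 4–14 give a rough way to see which side chains cannot be told apart by the amount of electron density alone. The following is a minimal sketch, assuming only the counts printed in the figure (glycine, which has no silhouette, is left out); the names SIDE_CHAIN_ELECTRONS and group_by_electron_count are hypothetical and chosen for illustration. It ignores shape and the strong localized density of sulfur, both of which the text notes are used in practice.

```python
from collections import defaultdict

# Electrons per side chain, copied from the parentheses in Figure 4-14.
SIDE_CHAIN_ELECTRONS = {
    "Ala": 9, "Ser": 17, "Pro": 24, "Val": 25, "Thr": 25, "Cys": 25,
    "Asn": 31, "Asp": 31, "Leu": 33, "Ile": 33, "Glu": 39, "Gln": 39,
    "Met": 41, "Lys": 41, "His": 43, "Phe": 49, "Arg": 55, "Tyr": 57,
    "Trp": 69,
}


def group_by_electron_count(counts):
    """Return {electron count: sorted residue names sharing that count}."""
    groups = defaultdict(list)
    for residue, electrons in counts.items():
        groups[electrons].append(residue)
    return {n: sorted(names) for n, names in sorted(groups.items())}


groups = group_by_electron_count(SIDE_CHAIN_ELECTRONS)

# Side chains that share an electron count are candidates for confusion
# when only the size of a protrusion in the map is considered.
ambiguous = {n: names for n, names in groups.items() if len(names) > 1}
for n, names in ambiguous.items():
    print(n, ", ".join(names))
```

Side chains that appear alone in their group, such as tryptophan, are the ones most readily recognized by size, which is consistent with the examples given in the text.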
Figure 4–16: Four types of secondary structure found in molecules of protein: (A) α helix,51 (B) parallel β sheet,52 (C) antiparallel β sheet,52 and (D) two types of β turn (type I and type II).53 The polyamide backbone can be traced by the pattern …N, Cα, CO, N, Cα, CO, N, Cα, CO… and the side chains are the groups protruding (marked ®,©ß, or Ri, respectively). Side views of the β sheets are shown to the right of each overhead view to demonstrate the pleats. Reprinted with permission from refs 51–53. Copyright 1951 National Academy of Sciences and 1981 Academic Press.
on the acyl oxygen of another amide of the backbone of the polypeptide. These hydrogen bonds are indicated in Figure 4–16 by dashed lines. A hydrogen bond connects the acyl oxygen contributed to the polyamide backbone by each amino acid in a sequence coiled into an α helix with the amido nitrogen–hydrogen contributed to the backbone by the amino acid four positions farther along (Figure 4–16A). In pleated sheets of parallel β structure (Figure 4–16B), the amido nitrogen–hydrogen and the acyl oxygen contributed by an amino acid in one of the polypeptides are connected by hydrogen bonds to the acyl oxygen and amido nitrogen–hydrogen, respectively, of amino acids two positions apart from each other in the sequence of a neighboring polypeptide to form a ring containing 12 atoms. In pleated sheets of antiparallel β structure (Figure 4–16C), hydrogen-bonded rings of 14 atoms and 10 atoms alternate along a ladderlike structure. The only structural element that defines the conformation of a β turn (Figure 4–16D) is a hydrogen bond between the acyl oxygen of the first amino acid in the turn and the amido nitrogen–hydrogen of the fourth and last amino acid in the turn.

Secondary structures enforce particular geometries on the conformation of the polypeptide. The β turn causes the polypeptide to double back on itself, often to form a hairpin the two tines of which are cross-connected in antiparallel β structure. An α helix has a right-handed pitch,* and the absolute stereochemistry of the L-amino acids causes each side chain, the R groups in Figure 4–16A, to cant toward its amino terminus. The side chains protrude from the helical core at intervals of about 100°. β Structure is pleated when viewed from the side (Figure 4–16B,C) owing to unavoidable steric requirements resulting from the angles of the covalent bonds along the polypeptide. In pleated sheets of β structure, the side chains of the amino acids in the sequence of each strand alternately protrude to one side and then the other of the surface in which the strands of polypeptide lie.

When the molecular model of the polypeptide has been fit into the tube of electron density, the final structure of its conformation represents, within the accuracy of the map of electron density, a skeleton of the actual molecule of protein. This crystallographic molecular model is the product of fitting atomically accurate molecular models of known covalent structures into a map of electron density.† The resulting arrangement of the segments of secondary structure in three dimensions produces a representation of the tertiary structure of the folded polypeptide. The tertiary structure of a protein is the complete conformation into which its polypeptide is folded in its native form.

An example of a crystallographic molecular model is the one constructed for the protein penicillopepsin (Figure 4–17).56 To obtain a full understanding of this molecular model, it must be viewed stereoscopically. The five panels of Figure 4–17 show drawings of the same view of the model. In Figure 4–17A, all of the atoms in the crystallographic molecular model are displayed in skeletal representation; in Figure 4–17B, the side chains of the amino acids have been removed to focus attention on only the polyamide backbone of the polypeptide and its hydrogen bonds, which are indicated by dashed lines; and in Figure 4–17C, only the α carbons of each amino acid are displayed, each connected to its two immediate neighbors in the amino acid sequence by line segments to create an α-carbon diagram. In all of the panels, the amino terminus is on the upper right at about 10 o’clock and the carboxy terminus is to the back at about 8 o’clock. You should follow the polypeptide [ … N, Cα, CO, N, Cα, CO, N, Cα, CO … ] through the whole drawing in Figure 4–17B. Note the α helices, β structures, and β turns. Compare what you see to the drawings presented by Pauling (Figure 4–16A–C). Note that α helices are rigid tubes while β structures are sinuous and flexible. Note the pleats in the β sheets. Distinguish between sheets of β structure formed from three or more strands and ribbons of β structure formed from only two strands. Now follow the polypeptide through the crystallographic molecular model in Figure 4–17A. Note the disposition of the side chains along secondary structures, and try to identify some of the amino acids.

The tertiary structure observed in a crystallographic molecular model is often presented diagrammatically (Figure 4–17D) in a cartoon where flat arrows are used to represent strands of polypeptide in β structure, with the head of the arrow at the carboxy terminus of the strand to provide the direction in which the chain is oriented, and cylinders are used to represent α helices. The tertiary structure of penicillopepsin, which you have explored in detail in Figure 4–17A, is represented, in the same orientation, by the diagram in Figure 4–17D. Follow the polypeptide through Figure 4–17, panels B and D, simultaneously.

The first three of the representations of the structure of a protein molecule that have been presented so far are skeletons of the crystallographic molecular model. The advantage of the skeletons is that the whole molecule can be examined simultaneously even in its interior. As with all molecules, flesh resides upon the bones in the form of the electron clouds that produced the map of electron density in the first place. It is possible to construct a model of a molecule of protein from space-filling units of the kind developed by Pauling and

* Put the four fingers of your right hand together, bent inward and horizontal, and put your thumb up. As you slide your fingers around a right-handed helix in the direction in which they are pointed, the helix rises in the direction in which your thumb is pointed. As you slide the fingers of your left hand around a left-handed helix, the helix rises in the direction of the thumb.

† To view a crystallographic molecular model on your own computer, find the file of the coordinates for the model in which you are interested at http://betastaging.rcsb.org/pdb/Welcome.do and download the file as text. Open the file with the program SwissPdbViewer, which can be obtained free of charge from us.expasy.org/spdbv/.
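Because a file of coordinates is plain text, the α carbons needed for an α-carbon diagram can also be pulled out of it directly. The following is a minimal sketch, assuming the fixed-column layout of ATOM records in the PDB format (atom name in columns 13–16, residue name in 18–20, residue number in 23–26, and x, y, z in columns 31–54, in ångströms). The helper names make_atom_line and alpha_carbons are hypothetical, and the coordinates are invented for illustration; they do not come from any deposited model.

```python
import math


def make_atom_line(serial, name, res, chain, seq, x, y, z):
    """Format one ATOM record in the fixed-column PDB layout."""
    return (f"ATOM  {serial:5d} {name:<4s} {res:>3s} {chain}{seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Invented coordinates (in angstroms), for illustration only.
SAMPLE = "\n".join([
    make_atom_line(1, " N  ", "GLY", "A", 1, -1.200, 0.800, 0.000),
    make_atom_line(2, " CA ", "GLY", "A", 1, 0.000, 0.000, 0.000),
    make_atom_line(3, " C  ", "GLY", "A", 1, 1.300, 0.700, 0.000),
    make_atom_line(4, " CA ", "ALA", "A", 2, 3.800, 0.000, 0.000),
    make_atom_line(5, " CA ", "SER", "A", 3, 3.800, 3.800, 0.000),
])


def alpha_carbons(pdb_text):
    """Return (residue name, residue number, (x, y, z)) for each alpha carbon."""
    found = []
    for line in pdb_text.splitlines():
        # Keep only ATOM records whose atom-name field is CA.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            found.append((line[17:20].strip(), int(line[22:26]), xyz))
    return found


cas = alpha_carbons(SAMPLE)

# Distances between consecutive alpha carbons, converted from angstroms to nm.
spacings = [0.1 * math.dist(a[2], b[2]) for a, b in zip(cas, cas[1:])]
for (res, num, _), d in zip(cas[1:], spacings):
    print(f"{res}{num}: {d:.2f} nm from the previous alpha carbon")
```

Joining the returned points in sequence with line segments gives the kind of α-carbon diagram described for Figure 4–17C. A real file would be read from disk and passed to alpha_carbons in place of SAMPLE.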
Figure 4–17: Crystallographic molecular model of penicillopepsin, from the mold Penicillium janthinellum.56 In the first skeletal drawing (A), both the peptide backbone (heavy line segments) and the side chains (light line segments) of the amino acids are displayed, and no potential hydrogen bonds are indicated. This drawing was produced with MolScript.139 In the second skeletal drawing (B), the side chains are left out and the crystallographer has assigned subjectively the locations of hydrogen bonds (dashed lines). Every tenth amino acid is identified and numbered to assist you in tracing the chain. Reprinted with permission from ref 56. Copyright 1983 Academic Press. In the α-carbon diagram (C), the positions of the α carbons of the amino acids in the crystallographic molecular model are designated by points and the points are joined by line segments. This α-carbon diagram often gives a clearer picture of the patterns of secondary and tertiary structure. This drawing was produced with MolScript.139 In the cartoon (D), the skeletal drawing of panel B is represented diagrammatically. In a space-filling representation (E), each atom in the crystallographic molecular model is represented by a sphere with its van der Waals radius. This drawing was produced with MolScript.139 As in the stereo image of Figure 3–9, black spheres are carbon atoms; gray, nitrogen; and white, oxygen.
globular structures are held together by flexible unstructured segments of polypeptide, there is little doubt that the one structure present in the crystal is the same as the unique structure, or is the same as one of a limited number of unique structures, assumed by the protein in free solution and therefore its native structure. First, unlike the usual anhydrous crystals of small molecules, a crystal of protein is 40–70% water.7 This water usually surrounds each molecule of protein almost entirely, and the contacts between molecules of protein in the crystal are adventitious and not extensive.61 Consequently, the molecule of protein is still dissolved in the same aqueous solution from which it crystallized. Second, there are many instances in which the same protein has been crystallized under two or more different conditions and was found to be incorporated into the two or more different fundamental unit cells with completely different orientations, yet the respective maps of electron density were almost indistinguishable from each other and could be superposed.7 For example, the polypeptides in the two crystallographic molecular models of subtilisin from Bacillus alcalophilus produced from the two nonisomorphous crystal types coincided to within less than 0.1 nm except at two short surface loops.62 T4 Lysozyme has been crystallized in 25 different nonisomorphous forms and crystallographic molecular models have been prepared from all of them. This molecule contains two independently folded globular portions connected by a flexible segment of polypeptide, and the angle between these two portions can vary by up to 45° over the different crystals, but within each of the two portions the conformation into which the polypeptide is folded is always the same.63 Third, crystals of a protein usually retain its enzymatic activity,7 albeit sometimes at a lower rate, and this also indicates that the structure of the protein has not changed during its crystallization. In fact, when crystals of protein are suspended in an organic solvent that is sufficiently immiscible with water that the crystals retain all their water of crystallization and their interior remains a separate aqueous phase, the protein is unaffected and the crystals retain their enzymatic activity.64 Fourth, Raman spectroscopy can be performed on solids as well as liquids, and when the Raman spectrum of ribonuclease in solution was compared to its Raman spectrum in the crystal, the two were virtually identical in the region of the amide III vibrations, a region that would be sensitive to any changes in the structure of the polypeptide chain that might have occurred during crystallization.65 Finally, molecular models from a number of proteins in solution have been obtained by nuclear magnetic resonance spectroscopy, and at their level of accuracy, they are indistinguishable from the crystallographic molecular models of the same proteins.66,67

There are proteins that have been shown to be able to assume two stable structures in rapid equilibrium with each other in solution, and in some cases, two different crystals can be made, each exclusively incorporating one of these respective structures. The crystallization and elucidation of the structures of deoxyhemoglobin and oxyhemoglobin provide an example.68 When such crystals are exposed to a ligand that binds to the protein they contain and coincidentally elicits the change in the structure of the protein, the crystals will often shatter69 as the protein assumes the new structure, which is incompatible with the former crystal lattice. In addition to presenting another observation consistent with the conclusions that the molecules of protein in the crystal retain the potentialities that they assume in solution, this observation suggests why some crystals are not enzymatically active. If expression of enzymatic activity requires that the protein change its shape slightly and reversibly each time it catalyzes the reaction and that change in shape is sterically hindered by the lattice of the crystal, the protein would not be able to display activity.

Crystals of citrate (si) synthase provide an example of such a situation.41 This protein can be crystallized under different sets of conditions that yield two different types of crystals containing two different conformations of the protein. From a careful examination of the maps of electron density for these two conformations, it became clear that each time the enzyme in free solution converts acetyl-SCoA and oxaloacetate into citrate and coenzyme A, it passes back and forth between these two conformations. Neither crystal is enzymatically active, but upon dissolving either, full activity is restored. The conclusion drawn was that the packing of the molecules of protein in the crystal sterically prevented the movement between the two conformations necessary for enzymatic activity, not that either crystallographic molecular model was unrepresentative of the enzyme.

The most compelling argument for the identity of the structure seen in the crystal and the structure assumed by the protein in solution is that the structure seen in the crystal makes sense. Over the more than three decades that crystallographic molecular models of high accuracy have been available for examination, what has been seen has consistently provided reasonable explanations for the behavior of the respective proteins in solution. These explanations have stimulated experiments to test those explanations that have usually yielded informative results. Often an experiment will rule out a hypothesis based on an examination of the structure, but the more informed reexamination of the structure that then occurs usually turns up the original error of judgment. The fact that a crystallographic molecular model makes sense is an unambiguous verification that it represents the actual structure of the molecule of protein even when it is in the crystal, let alone in solution.

Suggested Reading

Wyckoff, H.W., Tsernoglou, D., Hanson, A.W., Knox, J.R., Lee, B., & Richards, F.M. (1970) The three-dimensional structure of
ribonuclease S: Interpretation of an electron density map at a nominal resolution of 2 Å, J. Biol. Chem. 245, 305–328.

Brändén, C., and Jones, T.A. (1990) Between objectivity and subjectivity, Nature, 343, 687–689.

Problem 4–4: The structure below was drawn from a crystallographic molecular model of a particular protein.70 It depicts only a small portion of the entire molecule. Trace the polypeptide backbone through the structure.
(A) How many lengths of polymer enter the figure?
(B) Identify as many of the amino acids as you can along the polymer, and write out its sequence or sequences.
(C) Identify the symbols for the individual atoms. Which atoms are not depicted? Why?

Refinement

The result of fitting the molecular model of a protein into its map of electron density, either manually or automatically by computer, is an initial crystallographic molecular model. At this stage, the accuracy of the crystallographic molecular model is sufficient to define the patterns in which secondary structures are arranged. Because individual atoms, however, usually do not appear in the initial map of electron density, the initial crystallographic molecular model usually does not have sufficient accuracy to establish atomic details. These are of importance in their own right as well as being essential to understanding most of the biological functions of proteins. If the data set has been gathered to narrow enough Bragg spacing (0.3–0.25 nm or less), the accuracy of a crystallographic molecular model can be improved significantly by the process of refinement. The refinement of a crystallographic molecular model is the systematic adjustment of the positions of its atoms and the uncertainties of those positions and the addition to the model of portions of its covalent structure unobserved initially as well as molecules of solutes and water so that the amplitudes of the set of structure factors calculated from the model reproduce the observed amplitudes of the data set as closely as possible.

Although the fold of the polypeptide chain and the general positions of the side chains of the individual amino acids usually do not change significantly upon refinement, the atomic details of both the polypeptide and the side chains almost always change dramatically. If it is the case that the dramatic changes occurring during refinement actually do bring the molecular model closer to reality, then the atomic details observed in initial, unrefined molecular models are best ignored until the refinement has validated their existence.

The first step in a refinement is to calculate the amplitudes of the structure factors that the initial molecular model itself would produce, so that these amplitudes can be compared to the observed amplitudes of the data set. Once the initial molecular model of the polypeptide has been fit into the map of electron density to the satisfaction of the crystallographer, the coordinates within the fundamental unit cell of each of its atoms other than the hydrogens can be determined by direct measurements of the model. Often, if the initial map of electron density is of high enough quality, some individual molecules of water and solutes can be observed and included in the initial model and their coordinates also measured. There are always large regions of bulk solution in which individual molecules of water and solutes are never delineated because these regions are fluid in the crystal and thus unstructured. These regions of bulk solvent are included in the initial model as geometric solids of the appropriate uniform electron density.

A set of theoretical structure factors can be calculated by Fourier transformation (Equation 4–7) from the coordinates of the atoms, the shape of the geometric solid occupied by the solvent, and the scattering functions for each atom and for the solvent. The amplitudes of this set of calculated structure factors are referred to as the calculated amplitudes, and the set of these amplitudes is designated Fc. The set of simultaneously calculated phases of those structure factors is designated as αc. The amplitudes of the original experimental data set or any subset thereof are referred to as the observed amplitudes and designated Fo. The set of phases estimated by isomorphous replacement are euphemistically referred to as the observed phases, and the set containing their values is designated as αo. All of these designations, Fc, αc, Fo, and αo, refer to three-dimensional matrices, each containing 5000–100,000 elements, all individually indexed as either Fhkl or αhkl.

The only directly observed quantities are the observed amplitudes Fo, and they are the only parameters against which the success of the construction of any molecular model can be judged. If the molecular model were an exact representation of the molecules of protein, small solutes, and water within the crystal and there were no systematic errors in the observed data set, the calculated amplitudes Fc would be identical to the observed amplitudes Fo. It is traditional to quantify the degree of this correspondence with a crystallographic R-factor:

\[ R \equiv \frac{\sum_h \sum_k \sum_l \left| F_{o,hkl} - F_{c,hkl} \right|}{\sum_h \sum_k \sum_l F_{o,hkl}} \tag{4–9} \]

where Fo,hkl and Fc,hkl are the observed and calculated amplitudes, respectively, of the structure factor hkl. The summation is performed over all available pairs of corresponding observed and calculated amplitudes or some subset of the available pairs. Once the Bragg spacings included in the data set are less than about 0.5 nm, so that the electron density of the solvent can be properly reproduced, the differences between the initial molecular model and the real structure usually become more significant the smaller the Bragg spacing, and the value of the R-factor has a tendency to increase in magnitude as the data set is expanded to include the amplitudes of structure factors of smaller and smaller Bragg spacing.71 Therefore the minimum Bragg spacings of the reflections included in the data set must be known to assess the significance of the value of the R-factor.

The value of the R-factor is often presented as a measure of the validity of a particular crystallographic molecular model. Such claims should be ignored. An incorrect crystallographic molecular model can give a reasonable R-factor.40 For example, an incorrect crystallographic molecular model (Bragg spacing ≥ 0.2 nm)49 for the ferredoxin from Azotobacter vinelandii had an R-factor of 0.24, while the later, presumably correct, crystallographic molecular model (Bragg spacing ≥ 0.2 nm)72 had an R-factor of 0.21. An incorrect crystallographic molecular model (Bragg spacing ≥ 0.3 nm)48 for the ras protein had an R-factor of 0.29, while the later, presumably correct, crystallographic molecular model (Bragg spacing ≥ 0.26 nm)73 had an R-factor of 0.23. It is not the value of the R-factor that should be used as a validation of the model but the agreement of the model with independent chemical observations. In the case of the ferredoxin from A. vinelandii, it was disagreements between the earlier crystallographic molecular model and several direct chemical observations of the protein that prompted a reevaluation.72 Both of these examples were situations in which the chains were incorrectly traced in the original maps of electron density, and this produced very large errors in the molecular model.40 Smaller errors may often go undetected.

Usually, the initial molecular model yields an R-factor of 0.30–0.60. This means that the amplitudes calculated from the model differ on the average by 30–60% from the observed amplitudes. At first glance this seems alarming because a completely random acentric structure would give an R-factor of 0.59. It is not so disturbing, however, because it is obvious from direct observation (Figure 4–12) that a unique structure has been defined by the map of electron density. Nevertheless, such a large value of the R-factor indicates that the initial molecular model does not duplicate the structure of the actual molecule very accurately and suggests that there is room for improvement. The improvements made in the structure after the initial model has been constructed are the refinements. The goal of refinement is to produce a molecular model the calculated structure factors of which have amplitudes as close as possible to the respective observed amplitudes. To accomplish this goal, the positions of each of the atoms in the model are adjusted in such a way that the R-factor decreases in magnitude. Only when it is realized that models of molecules of protein have 500–10,000 atoms that are not hydrogen and that the movement of any one of these atoms in the model affects the amplitudes of all of the structure factors in the set Fc is the task of refinement placed in a proper perspective.

The most easily understood way to perform a refinement proceeds by calculating difference maps of electron density. When two sets of crystallographic amplitudes are available for the same structure or for two structures so similar that the same set of phases, αhkl, can be used for both, a difference map of electron density, Δρ(x,y,z), can be calculated:

\[ \Delta\rho(x,y,z) = \frac{1}{V} \sum_h \sum_k \sum_l \left( F_{hkl} - F'_{hkl} \right) \exp\left[ -2\pi i \left( hx + ky + lz - \alpha_{hkl} \right) \right] \tag{4–10} \]

where Fhkl and F′hkl refer to the entries in the two available sets of amplitudes. Equation 4–10 produces a map that has positive electron density wherever ρ(x,y,z) is greater than ρ′(x,y,z) and negative electron density wherever ρ(x,y,z) is less than ρ′(x,y,z), where ρ(x,y,z) and ρ′(x,y,z) are the two maps of electron density that would be cal-
174 Crystallographic Molecular Models
culated directly from the respective amplitudes and vector the elements of which are !q/!xi. Equation 4–12 is
phases. solved75 for h, and its solution defines the shifts, D xj, in
Difference maps of electron density have many each variable Dxi, required to produce a minimum value
more uses than in refinement; but, in this particular for q.
¢ are the
instance, Fhkl are the entries in the set Fo and Fhkl Suppose the variables xj are the positions of the
entries in the set Fc, and ahkl are almost always those in atoms j in the molecular model of a protein, and
the set of calculated phases. The intention of such a dif-
ference map of electron density is to indicate where the
molecular model differs from the actual molecule. Where q = ∑ ∑ ∑ whkl ( Fo,hkl – Fc,hkl ) 2 (4–14)
there is positive electron density in the map, the actual h k l
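As a rough illustration of Equations 4–9 and 4–14, the following minimal sketch computes an R-factor and the least-squares residual $q$ from paired amplitudes. The reflection list, the function names, and the `d_min` cutoff are all hypothetical, invented for this example; the cutoff only mimics the remark that the value of the R-factor depends on the minimum Bragg spacing included.

```python
def r_factor(reflections, d_min=0.0):
    """Equation 4-9 over the reflections whose Bragg spacing is >= d_min (nm)."""
    kept = [(fo, fc) for d, fo, fc in reflections if d >= d_min]
    if not kept:
        raise ValueError("no reflections at or above the requested Bragg spacing")
    return sum(abs(fo - fc) for fo, fc in kept) / sum(fo for fo, _ in kept)


def weighted_residual(reflections, weights=None):
    """Equation 4-14: q = sum of w_hkl * (Fo - Fc)^2; unit weights if none are given."""
    if weights is None:
        weights = [1.0] * len(reflections)
    return sum(w * (fo - fc) ** 2 for w, (_, fo, fc) in zip(weights, reflections))


# Hypothetical (Bragg spacing in nm, Fo, Fc) triples, invented for illustration only.
reflections = [
    (0.50, 100.0, 95.0),
    (0.30, 80.0, 72.0),
    (0.25, 60.0, 50.0),
    (0.20, 40.0, 28.0),
]

print(r_factor(reflections, d_min=0.30))  # only the two widest spacings
print(r_factor(reflections))              # all four reflections
print(weighted_residual(reflections))     # q with unit weights
```

In this invented data set the disagreement grows as the spacing narrows, so the R-factor over all four reflections is larger than the R-factor restricted to spacings of 0.30 nm or more.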
where $d_{\mathrm{s},q}$ is the ideal, standard distance between any two atoms j that are rigidly connected by the covalent structure, for example, one of the ortho carbons and the para carbon of a phenyl ring, $d_{\mathrm{c},q}$ is the distance between them in the final, refined molecular model, and $w_q$ is a weight the magnitude of which is chosen on the basis of how constrained the particular distance must be. If the two atoms j the positions of which are $x_i$ and $x_j$, respectively, are directly attached to each other, $w_q$ is large. If there are three or four covalent bonds between them, $w_q$ is small. By adding the second term in Equation 4–15, bond distances and any rigid bond angles, such as those in a phenyl ring, are retained during the minimization.

The choice of which bond angles and bond distances to constrain is a subjective one that has a significant effect on the final crystallographic molecular model. Phenyl, indolyl, or imidazolyl rings are obvious, but exocyclic bond angles less so. It is usually unwise to constrain these bond angles in any structure other than the routine polypeptide, for example, in a covalently bound enzymatic inhibitor.77 When accurate values for bond angles and bond lengths for an Fe2S2 cluster were available from a model compound, these were used as constraints early in the refinement of the crystallographic molecular model of the ferredoxin from Anabaena but then were removed from the process later on to incorporate the actual differences between the structures of the cluster in the protein and in the model compound.78 On the contrary, it was concluded that the orientations of the ligands from the protein to the two irons in the nuclear cluster in the crystallographic molecular model of ribonucleoside-diphosphate reductase from Escherichia coli were inconsistent with spectral studies when no constraints on those orientations were applied but that constraining its structure to a conformation consistent with the spectral observations produced as satisfactory a refinement.79 A compromise must be made between including enough constraints to hold the atoms j together and in reasonable orientation and including so many constraints that ideality replaces reality.

Alternatively, it has been proposed80 that $q$ can be written as

$$q = \sum_h \sum_k \sum_l w_{hkl} \left( F_{\mathrm{o},hkl} - F_{\mathrm{c},hkl} \right)^2 + w_{\mathrm{e}} E_{\mathrm{p}} \tag{4–16}$$

where $E_{\mathrm{p}}$ is a theoretically calculated value of the potential energy for the molecular model and $w_{\mathrm{e}}$ is a weight given to this term. The weight $w_{\mathrm{e}}$ is arbitrarily adjusted to make it more or less important during the refinement. In this approach, covalent bonds between atoms j remain because their distortion would produce a major increase in $E_{\mathrm{p}}$. This approach has an advantage over the consideration of only interatomic distances (Equation 4–15) because any shift in an atom j in the molecular model causing it to overlap another atom j automatically causes $E_{\mathrm{p}}$ to increase dramatically. The disadvantage of using $E_{\mathrm{p}}$ is that once overlaps are eliminated and covalent bonds are retained, the refinement is influenced by a large number of noncovalent forces imposed by the theoretical function and these may or may not be realistic. These biases influence the shifts of the atoms j dictated by h.

Even in a rigid, anhydrous crystal of a small molecule, the atoms j and functional groups retain rotational and vibrational motion, which displaces them continuously and rapidly from their mean positions. The vibrational and rotational motion of the atoms j of a macromolecule of protein in a hydrous crystal is much more dramatic. There are vibrational motions involving segments of the polypeptide as well as those of the individual atoms j, and the water surrounding the molecule of protein does not sterically hinder the rotational or vibrational motion of its functional groups so dramatically as the immediate neighbors hinder the rotational motions of functional groups in an anhydrous crystal. Often the vibrational and rotational motion that occurs within the molecule of protein in the crystal is sufficient to blur the electron density of a side chain or a segment of the polypeptide so extensively that it is never present even in the refined map of electron density. Every atom j for which electron density is observed, however, is subject to vibrational if not rotational motion, and the extent of the resulting displacements of each atom j differs depending on the rigidity of its bonding and the rigidity of its surroundings. During the refinement of the crystallographic molecular model, it is possible to estimate the magnitudes of the actual displacements from its mean position experienced by each atom j in the molecule within the crystal.

The scattering factors $f_j$ inserted into Equation 4–7 are affected by the displacement of each atom j from its mean position. It was noted earlier that, because of interference, as $\theta_{hkl}$ increases, the scattering from a given atom j decreases. Vibrational motion and rotational motion, because they also increase interference, also cause the scattering produced by the electrons around an atom j to decrease. It has been shown1 that for the scattering factor of atom j in a molecule within a crystal

$$f_j = f_{0,j} \exp\left[ -8\pi^2\, \overline{u_j^{2}} \left( \sin^2 \theta_{hkl} \right) \lambda^{-2} \right] \tag{4–17}$$

where $f_{0,j}$ is the scattering factor for atom j at rest, obtained from the usual table listing scattering factors as a function of scattering angle, and $\overline{u_j^{2}}$ is the mean square amplitude of the displacement in all directions of atom j from its mean position, regardless of the reason for that displacement. This mean square amplitude of the displacement incorporates not only the vibrational and rotational motion experienced by the atom j but also static disorder that may occur within the crystal lattice and that consequently affects the position of the atom j when it is averaged over the whole crystal. It is customary to define a B value (temperature factor) for atom j
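To get a feeling for the size of the effect described by Equation 4–17, the following minimal sketch evaluates the ratio $f_j/f_{0,j}$ at several Bragg spacings. It rests on two assumptions that are not spelled out in the passage above: the conventional definition $B = 8\pi^2\,\overline{u^2}$, and Bragg's law in the form $\sin\theta/\lambda = 1/(2d)$. The function names and the two mean square displacements are hypothetical values chosen only for illustration.

```python
import math


def b_value(mean_square_displacement_nm2):
    """B = 8 * pi^2 * <u^2> (assumed conventional definition), in nm^2."""
    return 8.0 * math.pi ** 2 * mean_square_displacement_nm2


def attenuation(mean_square_displacement_nm2, bragg_spacing_nm):
    """Ratio f_j / f_0,j from Equation 4-17 at a given Bragg spacing."""
    # Bragg's law, lambda = 2 d sin(theta), gives sin(theta)/lambda = 1/(2 d).
    sin_theta_over_lambda = 1.0 / (2.0 * bragg_spacing_nm)
    return math.exp(-b_value(mean_square_displacement_nm2) * sin_theta_over_lambda ** 2)


# Hypothetical atoms: a rigid interior atom and a mobile surface atom (<u^2> in nm^2).
for label, u2 in (("interior", 0.0010), ("surface", 0.0050)):
    for d in (0.5, 0.3, 0.2):
        print(label, d, round(b_value(u2), 3), round(attenuation(u2, d), 3))
```

Under these assumptions the attenuation depends only on the product of B and $1/(4d^2)$, so the more mobile atom loses its contribution to the reflections of narrow Bragg spacing first.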
the interior of the molecule are confined by the surrounding atoms j and have low B values while those on

The flexibility of a segment of polypeptide is indicated by the set of B values for the atoms j of which it is composed.

Figure 4–19: [plot not reproduced; vertical axis: R-factor, with ticks at 0.35 and 0.45; segments of the curve labeled with ranges of Bragg spacing of 0.6–0.20, 0.8–0.20, 0.6–0.25, 1.0–0.20, and 0.6–0.22 nm; arrows labeled ΔF]
(designated ΔF in Figure 4–19), the decision is made to calculate a difference map of electron density. Adjustments of the current molecular model are made by manual tuning, and this allows the minimization to enter realistically a new trajectory. After this trajectory reaches a new local minimum, more tuning is performed. This strategy, however, is never followed today.

The manual adjustments performed at various times during this simplified strategy for refinement require a significant amount of time. Whenever the refinement reaches a plateau and no further progress is evident (Figure 4–19), the molecular model must be examined in detail and manually adjusted with the assistance of difference maps of electron density before a new trajectory can be initiated. It has been found that one way to avoid such time-consuming manual adjustments during the refinement is to combine molecular dynamics and refinement.84

In a molecular dynamics simulation, atoms j are positioned in space, for example, by the fitting of the molecular model into the initial map of electron density and perhaps an initial round of refinement. A global potential energy function $E_{\mathrm{p}}$, incorporating the individual potential energy functions of the covalent bonds and the nonbonded interactions, is calculated. The atoms j are then given kinetic energies appropriate to a certain temperature and allowed to move for a short interval (less than 1 fs) within this global potential energy function according to classical laws of motion. The new positions in turn create a new global potential energy function and the atoms j are allowed to move again in response to the new component potential energy functions, and so forth.

When molecular dynamics is used in crystallographic refinement, the global potential energy function, $E_{\mathrm{p},i}$, for each step i in the usual molecular dynamics calculation is augmented by an effective potential energy

$$E_{\mathrm{p,f}} = w_x \sum_h \sum_k \sum_l w_{hkl} \left( F_{\mathrm{o},hkl} - F_{\mathrm{c},hkl} \right)^2 \tag{4–19}$$

where $w_x$ is a weighting factor chosen so that $E_{\mathrm{p,f}}$ has the same magnitude as $E_{\mathrm{p}}$ and $F_{\mathrm{c}}$ is the set of the amplitudes of the structure factors calculated for the instantaneous distribution of atoms j after each step in the molecular dynamics calculation. This effective potential energy, $E_{\mathrm{p,f}}$, constrains the atoms j during the molecular dynamics trajectory to the vicinity they occupied in the original molecular model, but if a high enough kinetic energy is applied, the atoms j can move as much as 0.3 nm from their initial positions.85,86 This is what allows the structure to break out of local minima of the function $q$.

The process proceeds in several steps referred to as simulated annealing.87 Initially a high kinetic energy is applied to the atoms j (high temperature), and then the kinetic energy is decreased to finish within a minimum of potential energy. It is while the kinetic energy is high that local minima of potential energy, and hence local minima of the function $q$, can be passed through. It has been shown that in this way the R-factor can be minimized with much less need for manual adjustment of the molecular model.88 Because rather high simulated temperatures are used, however, unexpectedly large movements of segments of the model can occur,89 so the necessity to examine difference maps of electron density and perform manual adjustments remains. Nevertheless, with proper precautions, refinement by molecular dynamics converges on the same structure as refinement performed entirely by least-squares minimization and manual adjustment.85,86,88

The use of simulated annealing by molecular dynamics for the purpose of pushing the refinement out of local minima includes coincidentally a large number of hidden constraints in the potential functions used for covalent bonds and nonbonding interactions that are necessary to cause the atoms j to move in each step. These potential functions were not constructed for solutes in aqueous solution, which is what a molecule of protein in a crystal is. As a result, they introduce significant, uncontrolled biases into the final molecular model.

These biases are most clearly manifested in the final positions of the charged side chains. The choice of a charge number for a side chain has a dramatic effect on its location in the final crystallographic molecular model in which simulated annealing is used for refinement.89 This, however, should not be the case for charged groups in aqueous solution of moderate ionic strength. A compilation of the frequency of hydrogen bonds between oppositely charged donors and acceptors of hydrogen bonds in crystallographic molecular models found them to be no more frequent than hydrogen bonds between a neutral donor and a neutral acceptor or between a charged donor or acceptor and a neutral acceptor or donor, respectively.90 This compilation was gathered from crystallographic molecular models refined by manual adjustments rather than by simulated annealing. Another compilation,91 gathered after refinement by simulated annealing had become widespread, found that hydrogen bonds between oppositely charged donors and acceptors were almost 5 times more frequent. Since the actual frequency cannot change, this latter result suggests that refinement by simulated annealing does consistently introduce artifacts into crystallographic molecular models. Because the potential functions for the attraction between these oppositely charged groups in simulations performed by simulated annealing are unrealistically strong, it would not be surprising if this procedure produced such interactions artifactually. Another indication of the unreliability of assignments of hydrogen bonds between oppositely charged donors and acceptors is that their identity usually changes significantly, and often dramatically, from an earlier version to
a later version of a crystallographic molecular model even though both versions were built from refined maps of electron density calculated from data sets gathered from the same crystal.

There are now at least five widely used procedures for refining crystallographic molecular models. It is reassuring that even though the final models prepared with each method differ detectably,92 when two of them are used to refine the same model with the same data set, the two refinements usually converge to a common structure.93 Often two different refinement procedures are purposely used to reassure the investigators that a peculiar aspect of the molecular model is real.77,94

As a refinement progresses, there is a noticeable improvement in the shape and continuity of the tube of refined electron density representing the polypeptide.35,95 Segments of the polypeptide in the initial molecular model often move during the refinement, sometimes as much as 1 nm, to assume their positions in the final molecular model,96,97 especially in regions where the initial map of electron density was vague. Elements of secondary structure missing in the initial map of electron density can appear and elements of secondary structure in the initial map can occasionally disappear upon refinement, and positions assigned to specific amino acids in the sequence of the protein within secondary structures can shift dramatically.97

Locations where the published amino acid sequence is in error become obvious,98 and it is sometimes possible visually to read the amino acid sequence of a segment of the protein as yet unsequenced.99 If the map of initial electron density has been calculated from a data set gathered to narrow enough Bragg spacing (<0.16 nm), the electron density for individual amino acids in the initial map can be sharp enough that they can be tentatively identified and their side chains incorporated into the original molecular model even though the sequence of the protein is unavailable.100 As the refinement progresses, mistakes in these initial assignments become obvious and can be corrected.

The electron density for the carbohydrate attached to a glycoprotein, if it is not disordered in the crystal, becomes progressively more detailed. The electron density for coenzymes, which are almost always held rigidly within the protein, also becomes easier to interpret. Posttranslational modifications, sometimes unexpected,101 as for example 3-(S-cysteinyl)tyrosine and β-hydroxytryptophan (Table 3–1), and sometimes hoped for, as for example the ester intermediate in the self-catalyzed pyruvylation of aspartate 1-decarboxylase (Equation 3–9),102 begin to appear in the difference maps of electron density. Previously unaccounted-for molecules of water (oxygen atoms j) and anions and cations from the crystallization solution that are bound at specific locations on the surface or in the interior of the molecules of protein begin to appear in the difference maps and they become sharp and reproducible features. Any of these features not incorporated into the initial molecular model appear in a difference map of electron density because they are fixed at certain locations in the real fundamental unit cell by their specific covalent bonds and noncovalent interactions with the molecules of protein but are as yet missing in the model.

When the identity, location, and structure of each of these fixed molecules or portions of the covalent structure of the polypeptide that were not included in the initial molecular model, because they did not appear in the initial map of electron density, become sufficiently unambiguous, they are incorporated into the molecular model at that cycle of the refinement. Their inclusion causes a significant decrease in the R-factor because they are as real a feature of the actual crystallographic unit cell as the individual amino acids in the polypeptide, and they contribute accordingly to $F_{\mathrm{o}}$. For example, in the refinement for deoxyribonuclease I (Figure 4–19), the inclusion of the water molecules observed in difference maps at cycle 98 caused the molecular model to be much more realistic and permitted the refinement to produce a significantly lower minimum of the R-factor than it had before they were included. This reasonable consequence suggests that the refinement is registering reality, but all of the changes taking place during the refinement are adequate evidence that a crystallographic molecular model is always provisional.

There are now more than 30 crystallographic molecular models of proteins that have been fit into maps of electron density calculated from data sets with minimum Bragg spacings so narrow (≤0.1 nm) that the individual atoms j, and in one instance even bonding electron density,103 are clearly observed in the initial map.5,6,104–106 In these instances, few if any constraints were required during refinement. Nevertheless, almost all of even the most recently constructed crystallographic molecular models of proteins have not had the benefit of such accurate maps of electron density.44 Consequently, regardless of how the refinement is performed, ideal bond lengths and bond angles are almost always enforced upon the crystallographic molecular model because if they were not, the refinement could not be performed at all. Therefore, if a refinement were performed entirely by the computer, the final molecular model would be confined by all of these implicit and often unsubstantiated constraints. To verify that the process of refinement has not biased the final structure, careful inspections of difference maps of electron density are always required to identify locations where the actual structure of the protein deviates from these simple expectations.

This inspection is routinely done by using omit maps of difference electron density (Figure 4–20). A segment of amino acids, a coenzyme, or a posttranslational modification in the final refined crystallographic molecular model is omitted, and the truncated model that
results is used to calculate $F_{\mathrm{c,omit}}$ and $\alpha_{\mathrm{c,omit}}$. The observed data set $F_{\mathrm{o}}$ and $F_{\mathrm{c,omit}}$ and $\alpha_{\mathrm{c,omit}}$ are used to calculate (Equation 4–10) a difference map of electron density. In this difference map the omitted segment appears as positive electron density. This positive electron density has the advantage that its details are defined only by the observed data set because nothing is present at this location in the truncated molecular model. The atoms j in the refined molecular model in this region are adjusted, if necessary, to fit within this difference electron density and added back to the molecular model. Then another segment of the updated molecular model is omitted and so forth over the whole structure. In this way, an attempt is made to incorporate into the final molecular model the ways in which the actual structure of the protein deviates from the ideal structure dictated by ideal bond lengths and bond angles and empirical functions of potential energy used during the refinement. It should be stressed at this point that the goal of all refinement is to produce a crystallographic molecular model that reproduces as accurately as possible the actual structure of the molecule of protein, including all of its perversities,107 rather than some ideal structure consistent with a set of theoretical potential energy functions.

There is one interesting and enlightening aspect of the process of producing an omit map. After a segment of the molecular model has been omitted, it is necessary to perform additional cycles of refinement on the molecular model missing that segment before calculation of the $F_{\mathrm{c,omit}}$ and $\alpha_{\mathrm{c,omit}}$ used to produce the omit map of difference electron density.108–110 The reason for this requirement is that the positions of all of the atoms j in the initial refined model, not just the atoms j omitted themselves, contain information about the positions of the atoms j omitted. This information would be transmitted to the calculated amplitudes, $F_{\mathrm{c,omit}}$, but even more critically to the calculated phases, $\alpha_{\mathrm{c,omit}}$, that would be used to calculate the difference map of electron density were it not

Figure 4–20: Omit map of electron density.138 The initial crystallographic molecular model for the amino-terminal domain of a variant surface glycoprotein from Trypanosoma brucei was built from a map of electron den-
molecular model of this segment of polypeptide in its final

The omit maps calculated for successive segments of the molecular model (Figure 4–20) must be examined carefully by the crystallographer to ensure that they do actually represent that segment of the model. If the polypeptide has been incorrectly traced and the wrong
segment of the molecular model has been fit into a particular segment of electron density, that error will be obvious in an omit map of that segment of electron density.111 The fit of the molecular model into the omit map of electron density must be adjusted manually by the crystallographer, not because a computer could not do so but because she must be convinced that the fit justifies the final conformation imposed upon the molecular model. Only in this way, with properly calculated omit maps and properly adjusted conformations, can the ideal structure resulting from the theoretical biases of the automated fitting and refinement be replaced by the real structure dictated by the observed amplitudes. For example, a hydrogen bond produced solely by the constraints of the refinement for which there is no evidence within the observed amplitudes will not appear in a properly calculated omit map and must be removed from the crystallographic molecular model. The conformation of a side chain produced by the constraints of the refinement may differ significantly from the conformation observed in an omit map and must be adjusted accordingly.

In addition to the polypeptide, the process of refinement adjusts the conformations of coenzymes and oligosaccharides in the crystallographic molecular model. Coenzymes can be either covalently bonded to the polypeptide as additional examples of posttranslational modifications (Table 3–1) or enclosed within it so tightly that they form an integral structural component. In either case, the coenzyme never leaves the protein and is incorporated with the protein into a crystal. At this point, these molecules will be considered to be merely small clouds of electrons that have interesting shapes. Usually the existence and covalent structure of these coenzymes are known before the protein is crystallized.

The electron density contributed by coenzymes known to be associated with a protein is always clearly featured because these molecules are enclosed within the protein and precisely aligned for functional purposes. The shapes of most coenzymes are unique, and they can usually be placed unambiguously into one of the envelopes of electron density unfilled by the polypeptide, but the decision as to when during the refinement they are included in the molecular model depends on the situation. If the initial, unrefined map is detailed enough and the coenzyme is large enough and of a shape peculiar enough, it can be inserted into its electron density at the same time the polypeptide is fit into its map of electron density. For example, the envelopes of electron density representing the four bacteriochlorophylls b in the initial map of electron density for the photosynthetic reaction center calculated from the phases estimated by isomorphous replacement were clear enough that molecular models of bacteriochlorophyll b, with its characteristic queue, could be inserted into several of them (Figure 4–21)112 even before the polypeptide could be fit into its electron density. In fact, the envelopes of electron density for the bacteriochlorophylls b could be distinguished from the envelopes for the almost identical bacteriopheophytins by the bulge of electron density due to the magnesium ions present in the former but missing from the latter. Usually, however, a coenzyme is added to the model at a step in the refinement when its electron density in the difference map becomes detailed enough to insert it unambiguously, but adjustments, often major ones,113 are made in its position and configuration as the map of electron density becomes more detailed during the cycles of refinement. The precise orientation of a coenzyme in the model is assigned in the end with an omit map (Figure 4–22).114

The crystallographic molecular model for myoglobin (Figure 4–18) displays the characteristic, intimate association between a coenzyme and the polypeptide that enfolds it. In this case, the heme is embraced by the α helices arranged to compose the entire structure, the purposes of which are to isolate the heme from the solution, to prevent two hemes from colliding, and to permit the heme to dissolve in water in addition to providing a fifth ligand to the iron.

Often ligands that are known to be specifically bound by the protein are included during the crystallization and are bound by the protein in the crystal. Significant changes can occur in the position and orientation of a ligand during refinement. For example, the molecular model of methotrexate inserted into the initial map of electron density of dihydrofolate reductase had to be adjusted significantly during refinement.115 The final position and orientation of the ligand are assigned in the final crystallographic molecular model by fitting them into features of electron density in omit maps.116

The positions in the amino acid sequence of a glycoprotein at which the oligosaccharides are attached are often known, so the locations of these serines, threonines, or asparagines in the map of electron density can be identified as soon as the polypeptide has been fit into it. Oligosaccharides are located on the outer surface of a protein and usually protrude into the aqueous phase surrounding it (Figure 4–23).83 Under these circumstances, they are fully solvated, flexible, and structureless. This absence of a fixed structure is carried into the crystal, and the region within the fundamental unit cell occupied by the oligosaccharide is often featureless. Attempts to assign an atomic structure to such regions are probably irrelevant to an understanding of the behavior of an oligosaccharide in a biological situation where it will have no defined structure anyway. Sometimes, however, the carbohydrate is surrounded sufficiently by protein to assume a defined conformation and produce structured electron density. If the initial map of electron density is calculated from a data set gathered to narrow enough Bragg spacing and the oligosaccharide is sufficiently confined by the structure of the protein, a molecular model built from its previously determined sequence of monosaccharides can be unambiguously fit into that initial
Figure 4–22: Omit map of electron density for one of the phyco-
erythrobilins in B-phycoerythrin from Porphyridium sordidum.114
A crystallographic molecular model constructed from an initial
map of electron density (Bragg spacing ≥ 0.22 nm) was refined
against the observed amplitudes with the assistance of simulated
annealing. The phycoerythrobilin covalently bound by Cysteines
50 and 61 of the β subunit of the protein was then omitted from the
molecular model, and an omit map of electron density was calcu-
lated. The final conformation chosen for the coenzyme is posi-
tioned within the electron density.
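The calculation behind an omit map of difference electron density can be sketched in one dimension. The toy example below applies a one-dimensional analogue of Equation 4–10 to point atoms in a unit cell of length 1: "observed" amplitudes are simulated from a complete model, calculated amplitudes and phases come from the same model with one atom omitted, and the difference map is synthesized from the amplitude differences and the calculated phases. Everything here (the atom list, the function names, the grid, the use of point atoms) is a hypothetical simplification, and the additional cycles of refinement of the truncated model that precede a real omit map are not modeled.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """F_h = sum_j f_j exp(2*pi*i*h*x_j), h = -h_max..h_max, for point atoms in a 1D cell."""
    return {
        h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
        for h in range(-h_max, h_max + 1)
    }


def difference_map(obs_amplitudes, calc_factors, n_grid, cell_length=1.0):
    """One-dimensional analogue of Equation 4-10 using the calculated phases."""
    density = []
    for i in range(n_grid):
        x = i / n_grid
        total = 0.0
        for h, fc in calc_factors.items():
            alpha = cmath.phase(fc) / (2.0 * math.pi)  # phase in cycles
            delta = obs_amplitudes[h] - abs(fc)
            # Real part of exp[-2*pi*i*(h*x - alpha)]; the imaginary parts
            # cancel between h and -h.
            total += delta * math.cos(2.0 * math.pi * (h * x - alpha))
        density.append(total / cell_length)
    return density


# Hypothetical model: (fractional position, scattering power) pairs.
full_model = [(0.10, 6.0), (0.27, 7.0), (0.45, 8.0), (0.62, 6.0), (0.85, 8.0)]
omitted = full_model[2]
truncated_model = [atom for atom in full_model if atom is not omitted]

H_MAX, N_GRID = 20, 200
# Stand-in for measured data: amplitudes only, phases discarded.
observed = {h: abs(f) for h, f in structure_factors(full_model, H_MAX).items()}
omit_map = difference_map(observed, structure_factors(truncated_model, H_MAX), N_GRID)

peak_index = max(range(N_GRID), key=omit_map.__getitem__)
print("omitted atom at x =", omitted[0])
print("largest positive feature at x =", peak_index / N_GRID)
```

In a toy case such as this one would expect the largest positive feature to fall at or near the position of the omitted atom, although its height is reduced because the phases come from the truncated model alone. If nothing is omitted, every amplitude difference is zero and the map is flat.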
Figure 4–23: Skeletal drawing of the glycoprotein bovine deoxyribonuclease I. The skeletal drawing of the polyamide backbone (260 aa) is drawn with thin lines; and the oligosaccharide and the side chain of the asparagine (at the top of the structure), with thick lines. The positions of the atoms are from the crystallographic molecular model of the protein (Bragg spacing ≥ 0.20 nm).83 Structured electron density could be observed in the vicinity of Asparagine 18 in the crystallographic model, but it was only large enough to contain the first two N-acetylglucosamines and one of the mannoses of the high-mannose oligosaccharide (GlcNAc2Man5; incomplete realization of the first entry in Table 3–3) known to be attached at this asparagine. The other four mannoses of the molecular model were arbitrarily positioned. The core of the structure is a laminate of two pleated β sheets. The laminate is flanked above and below by α helices and random meander. There was no electron density for the two amino acids preceding Cysteine 104 because they were either vibrating too widely or statically disordered in the crystal. The locations of convenient positions in the amino acid sequence of the protein are numbered to assist you in tracing the chain of the polypeptide. This drawing was produced with MolScript.139
atoms of hydrogen are almost never observed in X-ray
crystallographic studies of proteins because, unlike
atoms of carbon, nitrogen, oxygen, and sulfur, atoms
of hydrogen have no inner-shell electrons. Because they
are in smaller orbitals, inner-shell electrons have the
highest electron density and produce most of the fea-
tures observed in the usual maps of electron density. In
general, if hydrogens are present in a crystallographic
molecular model, it is because the crystallographer
knows they are there even though they were not
observed. When, however, crystallographic molecular
models obtained from data sets gathered to Bragg spac-
ings of less than 0.1 nm are submitted to extensive
refinement, spherical features of positive electron den-
sity appear in difference maps of electron density at
positions that are occupied by hydrogens in the real
molecule of protein.5,6,105,106 These features arise because
the molecular model has no hydrogens but the molecule
of protein does. Of particular interest are those features
of difference electron density that can be assigned to the
hydrogens in hydrogen bonds.125,126
Another peculiar feature that becomes apparent as
the actual amino acid spends part of its time in one con-
formation and part of its time in the other so the electron
density in the map, which is averaged over the period in
have features precisely resembling these side chains because Fc and αc are calculated from the molecular model itself, which always has ideal bond angles and bond lengths for the entire polypeptide, and the minimization has automatically caused Fc to be as close as
human immunodeficiency virus, type 1 (naa = 99), so
that there are 30 identical positions and four gaps, all in
the shorter protein.130 It necessarily follows that the
structures of these two proteins are superposable. The
three long gaps in the shorter amino acid sequence (10,
5, and 6 amino acids, respectively) can be assumed to
represent loops on the surface of the larger protein
missing from the smaller. A crystallographic molecular
model, produced by multiple isomorphous replace-
ment, was available for the larger of the two proteins,
that from Rous sarcoma virus.131 Crystals of the protein
from the human immunodeficiency virus were pro-
duced, and a data set was collected from them. The side
chains of the amino acids in the crystallographic molec-
ular model of the protein from Rous sarcoma virus were
replaced with the corresponding side chains in the
aligned amino acid sequence of the protein from the
human immunodeficiency virus. The loops correspon-
ding to the gaps in the alignments were removed from
the model to produce a preliminary molecular model
for the protein from the human immunodeficiency
virus.
This model was computationally aligned in the
fundamental unit cell defined by the data set collected
from crystals of the protein from human immuno-
deficiency virus. This preliminary model was then
submitted to refinement to produce a final structure with
an R-factor of 0.18.132 As this example illustrates, the
purpose of using molecular replacement is to avoid
the experimental difficulties of obtaining phases.
Because there are so many proteins for which crystallo-
graphic molecular models are already available
(http://betastaging.rcsb.org/pdb/Welcome.do), the like-
lihood that the protein in a new crystal is related closely
enough to one for which a model has already been made
is fairly high. Consequently, many of the newly reported
maps of electron density have been calculated by molec-
ular replacement.
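The bookkeeping behind the preliminary model described above (count the identical positions and the gaps in the alignment, then decide which positions of the known model supply the backbone for each position of the new protein) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `alignment_summary`, the use of "-" as the gap character, and the two short sequences are hypothetical and are not the sequences of the two retroviral proteins.

```python
def alignment_summary(known: str, new: str, gap: str = "-"):
    """Summarize a pairwise alignment given as two equal-length gapped strings.

    `known` is the aligned sequence of the protein that already has a
    crystallographic molecular model; `new` is the aligned sequence of the
    protein in the new crystal. Returns the number of identical positions,
    the number of gaps (contiguous runs) in `new`, and a map from each
    position in `new` to the position in `known` whose backbone it reuses.
    """
    if len(known) != len(new):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    gaps_in_new = 0
    mapping = {}            # 1-based position in new -> 1-based position in known
    pos_known = pos_new = 0
    in_gap = False
    for a, b in zip(known, new):
        if a != gap:
            pos_known += 1
        if b == gap:
            # This residue of the known model has no counterpart: it is part
            # of a loop that would be removed from the preliminary model.
            if not in_gap:
                gaps_in_new += 1
            in_gap = True
            continue
        in_gap = False
        pos_new += 1
        if a != gap:
            # Backbone is kept; the side chain is replaced if a != b.
            mapping[pos_new] = pos_known
            if a == b:
                identical += 1
    return identical, gaps_in_new, mapping


# Hypothetical toy alignment, not real data.
identical, gaps, mapping = alignment_summary("MKTAYIAKQR", "MKT--IARQR")
print(identical, gaps, mapping)
```

In this toy case the two residues of the known model opposite the gap would be deleted, and the one aligned position at which the letters differ would have its side chain replaced.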
Figure 4–24: Fitting of an oligosaccharide into its assigned
electron density.118 The oligosaccharide Man(α1,3)
Man(α1,6)[Man(α1,3)]Man(β1,4)GlcNAc(β1,4)GlcNAc was
inserted into the final, refined map of electron density
Suggested Reading
Oefner, C., & Suck, D. (1986) Crystallographic refinement and structure of DNase I at 2-Å resolution, J. Mol. Biol. 192, 605–632.
Problem 4–5: The figure to the right136 is a stereo view of the crystallographic molecular model of a protein containing 71 amino acids. This drawing was produced with MolScript.139 By examining the molecular model in stereo, you will be able to ascertain almost all of the amino acid sequence of the protein. You will not be able to distinguish threonine from valine, glutamate from glutamine, or asparagine from aspartate. Make an educated guess for threonine and valine, and just choose at random for asparagine and aspartate and glutamine and glutamate. If you can’t make out an amino acid, put an X in its position in the sequence.
[Stereo drawing not reproduced; positions 1, 5, 10, and every fifth position through 65, and position 71 are numbered in the drawing.]
(A) Write out the amino acid sequence of the protein in one-letter code. Number every tenth amino acid in your sequence to keep everything in register.
(B) Which pairs of amino acids in the protein are cystines? Identify each pair by the sequence positions of the two cysteines that form the cystine.
(C) What do the isolated atoms scattered around in the crystallographic molecular model represent?
(D) How did the crystallographer distinguish between threonine and valine, between glutamate and glutamine, and among aspartate, asparagine, and valine?
References
1. Stout, G.H., & Jensen, L.H. (1989) X-ray Structure Determination, A Practical Guide: 2nd ed, Wiley, New York.
2. Stout, G.H., & Jensen, L.H. (1968) X-ray Structure Determination; a Practical Guide, Macmillan, New York.
3. Andersson, I. (1996) J. Mol. Biol. 259, 160–174.
4. Wilson, K.S. (1998) Nat. Struct. Biol. 5 Suppl, 627–630.
5. Kuhn, P., Knapp, M., Soltis, S.M., Ganshaw, G., Thoene, M., & Bott, R. (1998) Biochemistry 37, 13446–13452.
6. Dauter, Z., Wilson, K.S., Sieker, L.C., Meyer, J., & Moulis, J.M. (1997) Biochemistry 36, 16065–16073.
7. Matthews, B.W. (1977) in The Proteins: 3rd ed. (Neurath, H., & Hill, R. L., Eds.) Vol. III, pp 404–590, Academic Press, New York.
8. Suck, D., Oefner, C., & Kabsch, W. (1984) EMBO J. 3, 2423–2430.
9. Fraser, R.D.B., & MacRae, T.P. (1969) in Physical Principles and Techniques in Protein Chemistry Part A (Leach, S. J., Ed.) pp 59–100, Academic Press, New York.
10. Bokhoven, C., Schoone, J.C., & Bijvoet, J.M. (1951) Acta Crystallogr. 4, 275–280.
11. Worthylake, D., Meadow, N.D., Roseman, S., Liao, D.I., Herzberg, O., & Remington, S.J. (1991) Proc. Natl. Acad. Sci. U.S.A. 88, 10382–10386.
12. Dickerson, R.E. (1964) in The Proteins: 2nd ed. (Neurath, H., Ed.) Vol. II, pp 603–778, Academic Press, New York.
13. Cullis, A.F., Muirhead, H., Perutz, M.F., Rossmann, M.G., & North, A.C.T. (1961) Proc. R. Soc. London, A 265, 15–38.
14. Weston, S.A., Camble, R., Colls, J., Rosenbrock, G., Taylor, I., Egerton, M., Tucker, A.D., Tunnicliffe, A., Mistry, A., Mancia, F., de la Fortelle, E., Irwin, J., Bricogne, G., & Pauptit, R.A. (1998) Nat. Struct. Biol. 5, 213–221.
15. Tilton, R.F., Jr., Kuntz, I.D., Jr., & Petsko, G.A. (1984) Biochemistry 23, 2849–2857.
16. Sugio, S., Petsko, G.A., Manning, J.M., Soda, K., & Ringe, D. (1995) Biochemistry 34, 9661–9669.
17. Weaver, L.H., Grutter, M.G., & Matthews, B.W. (1995) J. Mol. Biol. 245, 54–68.
18. Ollis, D.L., Brick, P., Hamlin, R., Xuong, N.G., & Steitz, T.A. (1985) Nature 313, 762–766.
19. Liljas, A., Kannan, K.K., Bergstén, P.C., Waara, I., Fridborg, K., Strandberg, B., Carlbom, U., Järup, L., Lövgren, S., & Petef, M. (1972) Nat. New Biol. 235, 131–137.
20. Blake, C.C., & Evans, P.R. (1974) J. Mol. Biol. 84, 585–601.
21. Kissinger, C.R., Liu, B.S., Martin-Blanco, E., Kornberg, T.B., & Pabo, C.O. (1990) Cell 63, 579–590.
22. Spurlino, J.C., Lu, G.Y., & Quiocho, F.A. (1991) J. Biol. Chem. 266, 5202–5219.
23. Schneider, F., Lowe, J., Huber, R., Schindelin, H., Kisker, C., & Knablein, J. (1996) J. Mol. Biol. 263, 53–69.
24. Leslie, A.G. (1990) J. Mol. Biol. 213, 167–186.
25. Banyard, S.H., Stammers, D.K., & Harrison, P.M. (1978) Nature 271, 282–284.
26. Brändén, C.I., Eklund, H., Nordström, B., Boiwe, T., Söderlund, G., Zeppezauer, E., Ohlsson, I., & Akeson, A. (1973) Proc. Natl. Acad. Sci. U.S.A. 70, 2439–2442.
27. Blow, D.M. (1958) Proc. R. Soc. London, A 247, 302–336.
28. Bijvoet, J.M. (1954) Nature 173, 888–891.
29. Guss, J.M., Merritt, E.A., Phizackerley, R.P., Hedman, B., Murata, M., Hodgson, K.O., & Freeman, H.C. (1988) Science 241, 806–811.
30. Yang, W., Hendrickson, W.A., Crouch, R.J., & Satow, Y. (1990) Science 249, 1398–1405.
31. Weis, W.I., Kahn, R., Fourme, R., Drickamer, K., & Hendrickson, W.A. (1991) Science 254, 1608–1615.
32. Kahn, R., Fourme, R., Bosshard, R., Chiadmi, M., Risler, J.L., Dideberg, O., & Wery, J.P. (1985) FEBS Lett. 179, 133–137.
33. Cramer, P., Bushnell, D.A., Fu, J., Gnatt, A.L., Maier-Davis, B., Thompson, N.E., Burgess, R.R., Edwards, A.M., David, P.R., & Kornberg, R.D. (2000) Science 288, 640–649.
34. Shapiro, L., Fannon, A.M., Kwong, P.D., Thompson, A., Lehmann, M.S., Grubel, G., Legrand, J.F., Als-Nielsen, J., Colman, D.R., & Hendrickson, W.A. (1995) Nature 374, 327–337.
35. Ryu, S.E., Kwong, P.D., Truneh, A., Porter, T.G., Arthos, J., Rosenberg, M., Dai, X.P., Xuong, N.H., Axel, R., Sweet, R.W., et al. (1990) Nature 348, 419–426.
36. Geiger, J.H., Hahn, S., Lee, S., & Sigler, P.B. (1996) Science 272, 830–836.
37. Xu, R.X., Hassell, A.M., Vanderwall, D., Lambert, M.H., Holmes, W.D., Luther, M.A., Rocque, W.J., Milburn, M.V., Zhao, Y., Ke, H., & Nolte, R.T. (2000) Science 288, 1822–1825.
38. Wang, B.C. (1985) Methods Enzymol. 115, 90–112.
39. Rypniewski, W.R., Breiter, D.R., Benning, M.M., Wesenberg, G., Oh, B.H., Markley, J.L., Rayment, I., & Holden, H.M. (1991) Biochemistry 30, 4126–4131.
40. Brändén, C., & Jones, T.A. (1990) Nature 343, 687–689.
41. Remington, S., Wiegand, G., & Huber, R. (1982) J. Mol. Biol. 158, 111–152.
42. Wyckoff, H.W., Tsernoglou, D., Hanson, A.W., Knox, J.R., Lee, B., & Richards, F.M. (1970) J. Biol. Chem. 245, 305–328.
43. Perrakis, A., Morris, R., & Lamzin, V.S. (1999) Nat. Struct. Biol. 6, 458–463.
44. Kleywegt, G.J., & Jones, T.A. (2002) Structure 10, 465–472.
45. Lawrence, C.M., Rodwell, V.W., & Stauffacher, C.V. (1995) Science 268, 1758–1762.
46. Story, R.M., Weber, I.T., & Steitz, T.A. (1992) Nature 355, 318–325.
47. Gruez, A., Pignol, D., Zeghouf, M., Coves, J., Fontecave, M., Ferrer, J.L., & Fontecilla-Camps, J.C. (2000) J. Mol. Biol. 299, 199–212.
48. de Vos, A.M., Tong, L., Milburn, M.V., Matias, P.M., Jancarik, J., Noguchi, S., Nishimura, S., Miura, K., Ohtsuka, E., & Kim, S.H. (1988) Science 239, 888–893.
49. Ghosh, D., O’Donnell, S., Furey, W., Jr., Robbins, A.H., & Stout, C.D. (1982) J. Mol. Biol. 158, 73–109.
50. Eklund, H., Nordström, B., Zeppezauer, E., Söderlund, G., Ohlsson, I., Boiwe, T., Söderberg, B.O., Tapia, O., Brändén, C.I., & Akeson, A. (1976) J. Mol. Biol. 102, 27–59.
51. Pauling, L., Corey, R.B., & Branson, H.R. (1951) Proc. Natl. Acad. Sci. U.S.A. 37, 205–211.
52. Pauling, L., & Corey, R.B. (1951) Proc. Natl. Acad. Sci. U.S.A. 37, 729–740.
53. Richardson, J.S. (1981) Adv. Protein Chem. 34, 167–339.
54. Perutz, M.F. (1951) Nature 167, 1053–1054.
55. Venkatachalam, C.M. (1968) Biopolymers 6, 1425–1436.
56. James, M.N., & Sielecki, A.R. (1983) J. Mol. Biol. 163, 299–361.
57. Harte, R.A., & Rupley, J.A. (1968) J. Biol. Chem. 243, 1663–1669.
58. Sakon, J., Liao, H.H., Kanikula, A.M., Benning, M.M., Rayment, I., & Holden, H.M. (1993) Biochemistry 32, 11977–11984.
59. Teng, T.Y., Srajer, V., & Moffat, K. (1994) Nat. Struct. Biol. 1, 701–705.
60. Silva, M.M., Poland, B.W., Hoffman, C.R., Fromm, H.J., & Honzatko, R.B. (1995) J. Mol. Biol. 254, 431–446.
61. Crosio, M.P., Janin, J., & Jullien, M. (1992) J. Mol. Biol. 228, 243–251.
62. Sobek, H., Hecht, H.J., Aehle, W., & Schomburg, D. (1992) J. Mol. Biol. 228, 108–117.
63. Zhang, X.J., Wozniak, J.A., & Matthews, B.W. (1995) J. Mol. Biol. 250, 527–552.
64. Zaks, A., & Klibanov, A.M. (1988) J. Biol. Chem. 263, 3194–3201.
65. Yu, N., & Jo, B.H. (1973) J. Am. Chem. Soc. 95, 5033–5037.
66. Redfield, C., & Dobson, C.M. (1990) Biochemistry 29, 7201–7214.
67. Bycroft, M., Sheppard, R.N., Lau, F.T., & Fersht, A.R. (1990) Biochemistry 29, 7425–7432.
68. Shaanan, B. (1983) J. Mol. Biol. 171, 31–59.
69. Haurowitz, F. (1938) Hoppe-Seyler’s Z. Physiol. Chem. 254, 266–274.
70. Freer, S.T., Kraut, J., Robertus, J.D., Wright, H.T., & Nguyen Huu, X. (1970) Biochemistry 9, 1997–2009.
71. Luzzati, P.V. (1952) Acta Crystallogr. 5, 802–810.
72. Stout, C.D. (1989) J. Mol. Biol. 205, 545–555.
73. Pai, E.F., Kabsch, W., Krengel, U., Holmes, K.C., John, J., & Wittinghofer, A. (1989) Nature 341, 209–214.
74. Waser, J. (1963) Acta Crystallogr. A 16, 1091–1094.
75. Hestenes, M.R., & Stiefel, E. (1952) J. Natl. Bur. Stand. 49, 409–436.
76. Konnert, J.H. (1976) Acta Crystallogr. A32, 614–617.
77. Takahashi, L.H., Radhakrishnan, R., Rosenfield, R.E., Jr., Meyer, E.F., Jr., & Trainor, D.A. (1989) J. Am. Chem. Soc. 111, 3368–3374.
78. Jacobson, B.L., Chae, Y.K., Markley, J.L., Rayment, I., & Holden, H.M. (1993) Biochemistry 32, 6788–6793.
79. Yang, Y.-S., Baldwin, J., Ley, B.A., Bollinger, J.M., Jr., & Solomon, E.I. (2000) J. Am. Chem. Soc. 122, 8495–8510.
80. Jack, A., & Levitt, M. (1978) Acta Crystallogr. A34, 931–935.
81. Konnert, J.H., & Hendrickson, W.A. (1980) Acta Crystallogr. A36, 344–350.
82. Brunger, A.T. (1992) Nature 355, 472–475.
83. Oefner, C., & Suck, D. (1986) J. Mol. Biol. 192, 605–632.
84. Bruenger, A., Karplus, M., & Petsko, G.A. (1989) Acta Crystallogr. A45, 50–61.
85. Bruenger, A.T., Kuriyan, J., & Karplus, M. (1987) Science 235, 458–460.
86. Johnson, L.N., Acharya, K.R., Jordan, M.D., & McLaughlin, P.J. (1990) J. Mol. Biol. 211, 645–661.
87. Kirkpatrick, S., Gelatt, C.D., & Vecchi, M.P. (1983) Science 220, 671–680.
88. Brunger, A.T. (1988) J. Mol. Biol. 203, 803–816.
89. Weis, W.I., Brunger, A.T., Skehel, J.J., & Wiley, D.C. (1990) J. Mol. Biol. 212, 737–761.
90. Kyte, J. (1995) Structure in Protein Chemistry, 1st ed., p 234, Garland Publishing, New York.
91. Stickle, D.F., Presta, L.G., Dill, K.A., & Rose, G.D. (1992) J. Mol. Biol. 226, 1143–1159.
92. Laskowski, R.A., Moss, D.S., & Thornton, J.M. (1993) J. Mol. Biol. 231, 1049–1067.
93. Ogata, C.M., Gordon, P.F., de Vos, A.M., & Kim, S.H. (1992) J. Mol. Biol. 228, 893–908.
94. Blanchard, H., & James, M.N. (1994) J. Mol. Biol. 241, 574–587.
95. Ji, X., Zhang, P., Armstrong, R.N., & Gilliland, G.L. (1992) Biochemistry 31, 10169–10184.
96. Gros, P., Betzel, C., Dauter, Z., Wilson, K.S., & Hol, W.G. (1989) J. Mol. Biol. 210, 347–367.
97. Wilmanns, M., Priestle, J.P., Niermann, T., & Jansonius, J.N. (1992) J. Mol. Biol. 223, 477–507.
98. Breiter, D.R., Meyer, T.E., Rayment, I., & Holden, H.M. (1991) J. Biol. Chem. 266, 18660–18667.
99. Li, H.M., Wang, D.C., Zeng, Z.H., Jin, L., & Hu, R.Q. (1996) J. Mol. Biol. 261, 415–431.
100. Kumar, P.R., Eswaramoorthy, S., Vithayathil, P.J., & Viswamitra, M.A. (2000) J. Mol. Biol. 295, 581–593.
101. Ursby, T., Adinolfi, B.S., Al-Karadaghi, S., De Vendittis, E., & Bocchini, V. (1999) J. Mol. Biol. 286, 189–205.
102. Albert, A., Dhanaraj, V., Genschel, U., Khan, G., Ramjee, M.K., Pulido, R., Sibanda, B.L., von Delft, F., Witty, M., Blundell, T.L., Smith, A.G., & Abell, C. (1998) Nat. Struct. Biol. 5, 289–293.
103. Jelsch, C., Teeter, M.M., Lamzin, V., Pichon-Pesme, V., Blessing, R.H., & Lecomte, C. (2000) Proc. Natl. Acad. Sci. U.S.A. 97, 3171–3176.
104. Teeter, M.M., Roe, S.M., & Heo, N.H. (1993) J. Mol. Biol. 230, 292–311.
105. Anderson, D.H., Weiss, M.S., & Eisenberg, D. (1997) J. Mol. Biol. 273, 479–500.
106. Housset, D., Habersetzer-Rochat, C., Astier, J.P., & Fontecilla-Camps, J.C. (1994) J. Mol. Biol. 238, 88–103.
107. Schreuder, H.A., Prick, P.A., Wierenga, R.K., Vriend, G., Wilson, K.S., Hol, W.G., & Drenth, J. (1989) J. Mol. Biol. 208, 679–696.
108. Lauble, H., Kennedy, M.C., Beinert, H., & Stout, C.D. (1994) J. Mol. Biol. 237, 437–451.
109. Baca, M., Borgstahl, G.E., Boissinot, M., Burke, P.M., Williams, D.R., Slater, K.A., & Getzoff, E.D. (1994) Biochemistry 33, 14369–14377.
110. Mosimann, S.C., Newton, D.L., Youle, R.J., & James, M.N. (1996) J. Mol. Biol. 260, 540–552.
111. Bewley, M.C., Marohnic, C.C., & Barber, M.J. (2001) Biochemistry 40, 13574–13582.
112. Deisenhofer, J., Epp, O., Miki, K., Huber, R., & Michel, H. (1984) J. Mol. Biol. 180, 385–398.
113. Bruns, C.M., & Karplus, P.A. (1995) J. Mol. Biol. 247, 125–145.
114. Ficner, R., Lobeck, K., Schmidt, G., & Huber, R. (1992) J. Mol. Biol. 228, 935–950.
115. Bolin, J.T., Filman, D.J., Matthews, D.A., Hamlin, R.C., & Kraut, J. (1982) J. Biol. Chem. 257, 13650–13662.
116. Thompson, T.B., Garrett, J.B., Taylor, E.A., Meganathan, R., Gerlt, J.A., & Rayment, I. (2000) Biochemistry 39, 10662–10676.
117. Ida, K., Norioka, S., Yamamoto, M., Kumasaka, T., Yamashita, E., Newbigin, E., Clarke, A.E., Sakiyama, F., & Sato, M. (2001) J. Mol. Biol. 314, 103–112.
118. Aleshin, A., Golubev, A., Firsov, L.M., & Honzatko, R.B. (1992) J. Biol. Chem. 267, 19291–19298.
119. Shaanan, B., Lis, H., & Sharon, N. (1991) Science 254, 862–866.
120. Gomis-Ruth, F.X., Gohlke, U., Betz, M., Knauper, V., Murphy, G., Lopez-Otin, C., & Bode, W. (1996) J. Mol. Biol. 264, 556–566.
121. Burkhard, P., Tai, C.H., Jansonius, J.N., & Cook, P.F. (2000) J. Mol. Biol. 303, 279–286.
122. Brautigam, C.A., Sun, S., Piccirilli, J.A., & Steitz, T.A. (1999) Biochemistry 38, 696–704.
123. Murphy, M.E., Turley, S., Kukimoto, M., Nishiyama, M., Horinouchi, S., Sasaki, H., Tanokura, M., & Adman, E.T. (1995) Biochemistry 34, 12107–12117.
124. Cooper, S.J., Garner, C.D., Hagen, W.R., Lindley, P.F., & Bailey, S. (2000) Biochemistry 39, 15044–15054.
125. Sanders, D.A., Moothoo, D.N., Raftery, J., Howard, A.J., Helliwell, J.R., & Naismith, J.H. (2001) J. Mol. Biol. 310, 875–884.
126. Esposito, L., Vitagliano, L., Sica, F., Sorrentino, G., Zagari, A., & Mazzarella, L. (2000) J. Mol. Biol. 297, 713–732.
127. Martinez-Oyanedel, J., Choe, H.W., Heinemann, U., & Saenger, W. (1991) J. Mol. Biol. 222, 335–352.
128. Wilson, M.A., & Brunger, A.T. (2000) J. Mol. Biol. 301, 1237–1256.
129. Czapinska, H., Otlewski, J., Krzywda, S., Sheldrick, G.M., & Jaskólski, M. (2000) J. Mol. Biol. 295, 1237–1249.
130. Weber, I.T., Miller, M., Jaskólski, M., Leis, J., Skalka, A.M., & Wlodawer, A. (1989) Science 243, 928–931.
131. Miller, M., Jaskólski, M., Rao, J.K., Leis, J., & Wlodawer, A. (1989) Nature 337, 576–579.
132. Wlodawer, A., Miller, M., Jaskólski, M., Sathyanarayana, B.K., Baldwin, E., Weber, I.T., Selk, L.M., Clawson, L., Schneider, J., & Kent, S.B. (1989) Science 245, 616–621.
133. MacArthur, M.W., & Thornton, J.M. (1996) J. Mol. Biol. 264, 1180–1195.
134. Artymiuk, P.J., & Blake, C.C. (1981) J. Mol. Biol. 152, 737–762.
135. Czjzek, M., Payan, F., Guerlesquin, F., Bruschi, M., & Haser, R. (1994) J. Mol. Biol. 243, 653–667.
136. Betzel, C., Lange, G., Pal, G.P., Wilson, K.S., Maelicke, A., & Saenger, W. (1991) J. Biol. Chem. 266, 21530–21536.
137. Ito, N., Phillips, S.E., Yadav, K.D., & Knowles, P.F. (1994) J. Mol. Biol. 238, 794–814.
138. Freymann, D., Down, J., Carrington, M., Roditi, I., Turner, M., & Wiley, D. (1990) J. Mol. Biol. 216, 141–160.
139. Kraulis, P.J. (1991) J. Appl. Crystallogr. 24, 946–950.
Chapter 5
Noncovalent Forces
Crystallographic studies have demonstrated that a molecule of protein, dissolved in aqueous solution, is composed of polypeptides, each of which is folded into a structure that is the same as or closely similar to the structure of all of the other polypeptides of the same amino acid sequence. A polypeptide, as it emerges from the ribosome, however, is a fluid polymer of undefined structure. Each newly synthesized polypeptide then folds spontaneously to assume its unique secondary and tertiary structure.
The folding of polypeptides to form the native structure of a protein, the association of folded polypeptides to form multimeric proteins, and the binding of substrates, coenzymes, or other molecules to proteins usually proceed without the formation of covalent bonds and are consequently controlled by noncovalent forces. It appears that four noncovalent forces are involved in these chemical reactions: ionic interactions, hydrogen bonds, the hydrophobic effect, and van der Waals forces. In the refined crystallographic molecular model of a folded protein, the consequences of these noncovalent forces are evident. The chemical and physical properties of these interactions, as they occur in aqueous solution, must be understood before those consequences can be appreciated. Therefore, a discussion of these interactions must precede a detailed description of the atomic details of refined crystallographic molecular models. None of the four categories of noncovalent forces (ionic interactions, hydrogen bonds, the hydrophobic effect, and van der Waals forces) can be completely separated from all of the others. Van der Waals forces must play a part in each of

$$\mathrm{A(H_2O)}_x + \mathrm{B(H_2O)}_y \rightleftharpoons \mathrm{A\cdot B\,(H_2O)}_z + (x + y - z)\,\mathrm{H_2O} \tag{5--1}$$

The species A(H2O)x and B(H2O)y are the separated solutes dissolved in water and surrounded on all sides by water. Presumably, there are a certain number of water molecules, x and y, respectively, that are significantly affected by the presence of A or B. The effects of the solute on the surrounding molecules of water and the effects of the surrounding molecules of water on the solute are referred to as solvation or hydration. Around a particular molecule of solute at a particular instant, a particular number of water molecules are affected significantly by the presence of the solute. This number fluctuates with time, and the coefficients x, y, and z are intended to represent averages over a range of possible configurations for the hydration. When A and B associate to form the noncovalent complex, that complex will also be surrounded by water, and there will be a number of water molecules, z, that are significantly influenced by the complex. As A·B always has a smaller surface area than the sum of the surface areas of A and B, z should be less than x + y, and (x + y – z) molecules of water will return to the bulk phase of the water when the complex is formed.
The change in standard free energy for the overall reaction can be expressed as

$$\Delta G^{\circ} = \Delta G^{\circ}_{\mathrm{A\cdot B}} + \Delta G^{\circ}_{\mathrm{hyd(A\cdot B)}} + \Delta G^{\circ}_{\mathrm{r\,H_2O}} - \Delta G^{\circ}_{\mathrm{hyd(A)}} - \Delta G^{\circ}_{\mathrm{hyd(B)}}$$
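As a rough illustration of how the expression above for the overall change in standard free energy is used, the sketch below adds the five named terms with their signs and converts the total into an equilibrium constant through the standard relation ΔG° = –RT ln K. The function names and the numerical values of the individual terms are hypothetical placeholders chosen only to show the arithmetic; they are not measurements from the text.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def overall_delta_g(dg_ab, dg_hyd_ab, dg_r_h2o, dg_hyd_a, dg_hyd_b):
    """Sum the terms of the expression for the overall standard free energy.

    The three terms for the products side are added and the hydration terms
    of the separated solutes A and B are subtracted. All values must be in
    the same units (kJ mol^-1 here).
    """
    return dg_ab + dg_hyd_ab + dg_r_h2o - dg_hyd_a - dg_hyd_b


def association_constant(delta_g_kj_per_mol, temperature_k=298.15):
    """Equilibrium constant K from a standard free energy change (kJ mol^-1)."""
    return math.exp(-delta_g_kj_per_mol * 1000.0 / (R * temperature_k))


# Hypothetical values in kJ mol^-1, for illustration only.
total = overall_delta_g(dg_ab=-50.0, dg_hyd_ab=-30.0, dg_r_h2o=-5.0,
                        dg_hyd_a=-40.0, dg_hyd_b=-35.0)
print(total, association_constant(total))
```

The point of the arithmetic is that large individual terms of opposite sign can leave a small net change in standard free energy, so the equilibrium constant depends on their balance rather than on any one of them.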
almost any other solvent but are dissociated by water, while the hydrophobic effect is observed only when the solvent is water. To appreciate fully these influences of water on the outcome of noncovalent associations, the properties of liquid water itself must be understood.
Figure 5–1: Selected dimensions of the dimer of two molecules of water in the gas phase. The distances and angles were obtained by microwave spectroscopy.5,6 [Drawing not reproduced; labeled dimensions: 0.298 nm and 60°.]
Water
The properties of liquid water, when considered in their entirety, are unlike those of any other liquid. For example, the surface tension of water at 20 °C is 73 dyne cm–1, while those of most other liquids are between 20 and 40 dyne cm–1. The relative permittivity,* εr, of water at 20 °C is 80.2, while the relative permittivities of other liquids, with few exceptions, are less than 30. The high melting point and boiling point of water, for a molecule of its size and composition, are well-publicized anomalies. Not only are the numerical values of the physical constants anomalous, but the qualitative behaviors of the thermodynamic properties of the liquid, when it is exposed to variations of physical forces such as pressure, temperature, electric field, and electromagnetic energy, are unique. The details of these peculiarities provide an intuitive picture of the structure of liquid water that can serve as a basis for understanding the behavior of solutes such as polypeptides in this solvent. Unfortunately, there is no adequate molecular model for the structure of liquid water, and an informed intuitive picture is the closest approach to reality currently available.
* The relative permittivity or dielectric constant of a substance is its permittivity relative to the permittivity of vacuum.
A water molecule in the dilute, ideal vapor is an oxygen atom bonded covalently to two hydrogen atoms. Quantum mechanical calculations1,2 of the isolated molecule in the vacuum seem3 to support the conventional orbital picture of an oxygen hybridized sp3 with two covalent bonds to two hydrogens and two σ lone pairs of electrons; these four substituents are oriented tetrahedrally around the oxygen. The HOH bond angle4 is 104.5°, distorted from 109.5° by the electron repulsion of the lone pairs or by a rehybridization, driven by energy of promotion, that gives the oxygen–hydrogen σ bonds more p character. The oxygen–hydrogen bond lengths are 0.096 nm.
In more concentrated vapor, dimers of water form (Figure 5–1).5,6 From results of molecular beam microwave spectroscopy, the mean structure of the dimer can be calculated.5,6 The two oxygens are separated by a distance of 0.298 nm. One of the four hydrogens lies on the line of centers between the two oxygens, and it is covalently bonded to one of them, which is referred to as the proton donor. The other oxygen, which is referred to as the proton acceptor, has two of the four hydrogens covalently bonded to it. The plane defined by the two hydrogens and the oxygen of the proton acceptor is inclined at an angle of 60° to the line of centers between the two oxygen atoms. This means that the four substituents around the acceptor (the two hydrogens, the shared hydrogen, and the lone pair of electrons) are tetrahedrally arrayed in the dimer. This arrangement suggests that the oxygen–hydrogen bond on the donor points directly at one of the two σ lone pairs of electrons on the acceptor oxygen. The axis of the sp3 orbital in which that σ lone pair resides should be congruent with the line of centers. The interaction between the hydrogen–oxygen σ bond on the donor molecule of water and the σ lone pair of electrons on the acceptor is an unhindered, intermolecular example of a hydrogen bond. The formation of dimers and higher oligomers in steam contributes significantly to its nonideal behavior at higher concentrations of water in the gas phase.
The ice that is in equilibrium with liquid water at atmospheric pressure and 0 °C is known as ice Ih. Ice Ih is a tetrahedral diamond lattice of oxygen atoms (Figure 5–2A),4 each 0.276 nm from its nearest neighbor.7 The oxygens are held in the lattice by hydrogen bonds to each of their four nearest neighbors. Between any oxygen atom and each of its four nearest neighbors in the lattice is one hydrogen atom. At any instant, each hydrogen is covalently bound to one of the two oxygens between which it is found, and every oxygen has only two hydrogens covalently bound to it. These two requirements create a situation in which only a predictable number of arrangements for these hydrogens can occur, and this number of arrangements can explain almost exactly the observed residual entropy of ice Ih at 0 K.4 There is a significant amount of empty space in ice Ih (Figure 5–2B), and this is one of the properties permitting it to be less dense than the liquid water with which it can be in equilibrium.
The structure and properties of ice Ih and water vapor have been exhaustively investigated and unambiguously established. At atmospheric pressure, liquid water lies between these two extremes on the phase diagram, and its properties can be compared with them. From the transitions between solid and liquid and between liquid and vapor, insight into the structure of the liquid can be gained.
When ice melts, the reaction involves a standard enthalpy of fusion, and when the liquid vaporizes, the reaction involves a standard enthalpy of vaporization. The enthalpy of water at atmospheric pressure can be
[Figure residue; plot not reproduced. Recoverable labels: curves for CP (J mol–1 K–1), (S – S0)/2 (J mol–1 K–1), H – H0 (kJ mol–1), and G – H0 (kJ mol–1), with the melting point and the boiling point marked; vertical scale from –40 to 100.]
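The counting argument for the residual entropy of ice Ih mentioned in the description of the lattice can be made concrete with a short calculation. The sketch below uses Pauling's approximate count, which is an assumption here because the text does not name the estimate it has in mind: each oxygen has two covalently bound hydrogens to distribute over four hydrogen-bond directions, and each choice is compatible with a given neighbor only half the time. The function names are illustrative.

```python
from math import comb, log

R = 8.314462618  # gas constant, J mol^-1 K^-1


def arrangements_per_molecule() -> float:
    """Approximate number of allowed hydrogen arrangements per water molecule."""
    # Ways to choose which 2 of the 4 hydrogen-bond directions around an
    # oxygen carry its own covalently bound hydrogens.
    orientations = comb(4, 2)
    # Each of the 2 chosen directions is acceptable only if the neighboring
    # oxygen has not also placed a hydrogen on that bond (probability 1/2).
    acceptance = (1 / 2) ** 2
    return orientations * acceptance


def residual_entropy() -> float:
    """Molar residual entropy S = R ln W, in J mol^-1 K^-1."""
    return R * log(arrangements_per_molecule())


print(arrangements_per_molecule(), residual_entropy())
```

Under these assumptions the count is 3/2 arrangements per molecule, which gives a residual entropy of about 3.37 J mol–1 K–1.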
translational, and rotational energy levels of these two substances, agree quite closely with observed values.4 The observed values of the isochoric heat capacity of liquid water, however, are almost twice those calculated from its estimated vibrational, translational, and rotational energy levels (Figure 5–4).4 This excess or configurational heat capacity can be explained by postulating that much of the hydrogen-bonded structure of ice remains in the liquid and that its gradual deterioration as the temperature is raised is responsible for the anomalous absorption of heat. The high and relatively constant value for the heat capacity throughout the range of temperature between 0 and 100 °C suggests that the hydrogen-bonded network in the liquid is gradually and constantly deteriorating as the temperature is rising.
Another indication of the extensive hydrogen-bonded structure in liquid water is its high static relative permittivity (εr = 88 at 0 °C), which is almost equivalent to that of ice Ih (εr = 99 at 0 °C). The large value for the relative permittivity of ice Ih is usually explained semiquantitatively4 as a result of the high correlation among the orientations of the individual dipole moments of the water molecules caused by their rigid arrangement in the hydrogen-bonded lattice. When an electric field is applied, the dipole moments reorient cooperatively, producing the large relative permittivity. The fact that the liquid has almost the same relative permittivity as the solid indicates that much of the lattice remains.
The molar volume of ice (19.6 cm3 mol–1) is somewhat greater than that of liquid water (18.0 cm3 mol–1) at 0 °C and much greater than the molar volume that would be expected if spheres with the radius of a molecule of water (0.14 nm) were randomly packed in an unstructured, disordered array (10 cm3 mol–1).4 The large molar volume of ice Ih is due to the vacant space created by the fact that oxygens are held in a tetrahedral array by the hydrogen-bonded network (Figure 5–2B). When ice melts, the molecules of water are allowed to occupy some of the vacant space in the hydrogen-bonded lattice and the density increases. A related fact is that the molar volume of liquid water increases as the temperature is decreased below 4 °C, presumably because the expansion caused by the strengthening of the hydrogen-bonded lattice is greater than the usual contraction experienced by most liquids resulting from the decrease in thermal energy. It is only above 4 °C that the latter effect becomes dominant. The contraction of water upon melting and the expansion of the liquid upon cooling below 4 °C are almost unprecedented. Diamond, silicon, and germanium are tetrahedral solids that also float upon their melts, as ice floats upon water. Aside from these peculiar features, the molar volumes of ice and liquid water at 0 °C are both large and not that different from each other. Consequently, much of the vacant space created by the hydrogen-bonded lattice in the solid remains in the liquid.
The fact that much of the vacant space remains in liquid water also explains the unique decrease in isothermal compressibility that occurs in liquid water as temperature is raised from 0 to 50 °C (Figure 5–5).4 The
Figure 5–4: Separation of the observed isochoric heat capacity CV of water (solid line) into calculated (dashed line) and configurational (shaded difference) components.4 The heat capacity of ice Ih was calculated from the two vibrational absorption bands of lowest energy (ν = 840 cm–1 and ν = 230 cm–1); the heat capacity of water vapor was calculated from the vibrational, rotational, and translational energies of the water molecules; and the heat capacity of the liquid was calculated on the assumption that each molecule in the liquid has three hindered degrees of translation and three hindered librations. Adapted with permission from ref 4. Copyright 1969 Clarendon Press. [Plot not reproduced; axes: CV (J mol–1 K–1) against temperature (°C) from –40 to 120, spanning ice Ih, liquid water, and water vapor; curves labeled Observed CV and Configurational CV.]
Figure 5–5: Isothermal compressibility κT (gigapascals–1) for liquid water at unit atmosphere (101.3 kPa) pressure, presented as a function of temperature.4 Reprinted with permission from ref 4. Copyright 1969 Clarendon Press. [Plot not reproduced; axes: κT (GPa–1) from 0.40 to 0.52 against temperature (°C) from 0 to 100.]
Water 193
in units of reciprocal pressure (pascal⁻¹). In almost every other liquid, isothermal compressibility increases monotonically with temperature. In liquid water at low temperatures, most of the structured vacant space of ice Ih remains when the transition from solid to liquid occurs, and this structured vacant space is gradually replaced with randomly distributed, unstructured vacant space, similar to that in other liquids as the temperature is raised. The high compressibility at low temperatures results from the ability of the lattice to decrease its volume upon the application of pressure at the expense of the significant vacant space among the oxygen atoms.
The idea that liquid water at lower temperatures retains a structure similar to that of ice Ih is also supported by the small cubic expansion coefficient of liquid water. Upon heating at atmospheric pressure between temperatures of 20 and 30 °C, other liquids expand about 4 times more rapidly than does water (Figure 5–6).8 As pressure is applied, however, the cubic expansion coefficient for water increases while the coefficients of thermal expansion for other liquids decrease. At high pressures, both water and other liquids have about the same cubic expansion coefficient. If liquid water at atmospheric pressure is extensively hydrogen-bonded with an expanded structure similar to that of ice Ih (Figure 5–2B), then as the temperature is raised, the decrease in structured empty volume due to the deterioration of this hydrogen-bonded network could almost cancel the increase in unstructured volume due to increased thermal motion. As pressure is applied, however, it causes the hydrogen-bonded network to deteriorate or restructure and the liquid to have a more normal cubic expansion coefficient. In this view, the ability of pressure to change the structure of the liquid is due to the fact that liquid water is in an extensively hydrogen-bonded form at normal pressures but not at higher pressures.

[Figure 5–6: plot of the cubic expansion coefficient (°C⁻¹); caption not recovered.]

Such a transition between an ordered and a less ordered state caused by an increase in pressure would also explain why the application of pressure decreases the viscosity of liquid water rather than increasing it as it does the viscosities of other liquids.8 The viscosity of water is anomalously large in the first place (η = 1.00 mPa s at 20 °C) compared to the viscosity of liquids such as acetonitrile (η = 0.36 mPa s at 20 °C), pentane (η = 0.24 mPa s at 20 °C), and carbon disulfide (η = 0.36 mPa s at 20 °C).
Additional evidence for the retention of a significant fraction of the hydrogen-bonded lattice in liquid water is provided by scattering of X-radiation. When a beam of X-radiation is passed through a liquid, it is scattered by the electrons of the molecules in the liquid. The intensity of the scattered X-radiation varies as a function of the angle between the incident beam and the direction at which the scattered radiation emerges from the solution. This angular dependence of the intensity can be used to calculate a radial molecular correlation function, GM(r). This function is an approximation9 of the variation of electron density as a function of the radial distance from any one molecule in the liquid. The actual variation of electron density is distinguished from its approximation by designating it as g(r). The function GM(r) registers any local variations in the electron density of the liquid, relative to the mean electron density of the liquid, that are maintained around any one of the molecules. Because it is a relative quantity, the value of GM(r) is unity when the electron density is equal to the mean electron density. Any variations in density that are observed are assumed to be permanent features of the structure of the liquid. As GM(r) or g(r) is proportional to the electron density as a function of radial distance from

$$ n_s = 4\pi\rho_0 \int_{r_1}^{r_2} r^2\, g(r)\, dr \tag{5–4} $$
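The integral in Equation 5–4 can be evaluated numerically once a form for g(r) is chosen. What follows is only a minimal sketch in Python and is not taken from the text: the function names, the trapezoidal rule, and the Gaussian peak (its height, center, and width) are illustrative assumptions. The number density of liquid water is computed from an assumed density of 1.000 g cm⁻³ and a molar mass of 18.015 g mol⁻¹.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1

# Assumed mean number density of liquid water (molecules nm^-3):
# 1.000 g cm^-3 / 18.015 g mol^-1 * N_A, with 1 cm^3 = 1e21 nm^3.
RHO_0 = 1.000 / 18.015 * AVOGADRO / 1e21


def neighbors_in_shell(g, r1, r2, rho0=RHO_0, steps=2000):
    """Evaluate n_s = 4*pi*rho0 * integral_{r1}^{r2} r^2 g(r) dr (Equation 5-4).

    g is a function of r (nm); r1 and r2 are in nm; rho0 is in nm^-3.
    The integral is approximated with the trapezoidal rule.
    """
    h = (r2 - r1) / steps
    total = 0.0
    for i in range(steps + 1):
        r = r1 + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * r * r * g(r)
    return 4.0 * math.pi * rho0 * total * h


def gaussian_peak(height, center, sigma):
    """Return a hypothetical Gaussian-shaped first maximum of g(r)."""
    def g(r):
        return height * math.exp(-((r - center) ** 2) / (2.0 * sigma ** 2))
    return g


# Hypothetical peak parameters, chosen only to illustrate the call.
first_peak = gaussian_peak(height=2.4, center=0.28, sigma=0.02)
n_first_shell = neighbors_in_shell(first_peak, 0.20, 0.36)
print(f"neighbors under the hypothetical first peak: {n_first_shell:.2f}")
```

For a structureless liquid, g(r) = 1, the same function returns the number of molecules expected in the shell from the mean density alone, which is a convenient check on the integration.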
the curve at 4 °C, upon the assumption that it is a Gaussian function, indicates that it is produced by about four nearest neighbors. In ice there are four nearest neighbors to each water molecule and they are held at a distance of 0.276 nm. It can be assumed that these are retained in the liquid. That the peak is centered at a distance so close to the hydrogen-bonded distance in ice has been interpreted to mean that each water molecule in the liquid has about four hydrogen-bonded nearest neighbors.
A radial molecular correlation function can be calculated11 for liquid molecules of water confined to the tetrahedral lattice of ice Ih (Figure 5–7). In ice Ih, there are four nearest neighbors at 0.276 nm, 12 next neighbors at 0.45 nm, and 12 farther neighbors at 0.52 nm

Figure 5–7: Molecular correlation functions for liquid water7,9 and ice Ih.10 The molecular correlation functions for liquid water at several temperatures (solid lines) were calculated from the angular dependence of the intensity of the scattered X-rays from samples of pure water through which a collimated beam of X-rays was passed. A molecular correlation function for liquid molecules arranged on the lattice of ice Ih (dashed line) was calculated from the length of the hydrogen bonds in ice Ih (0.276 nm) and the fact that the oxygen atoms lie upon a tetrahedral diamond lattice. The calculation was performed with the assumption that the distributions of electron density around the maxima defined by the lattice could be approximated by error functions. The width of the first error function was made the same as the width of the first maximum in liquid water at 4 °C, and the widths of the two subsequent error functions were made proportional to the square of their distances from the origin.7 Adapted with permission from ref 7, originally from ref 9. Copyright 1971 American Institute of Physics.
[Plot: molecular correlation functions at 4, 20, 25, and 50 °C against r (nm) from 0.3 to 0.9 nm.]

hydrogen-bonded network has become considerably more elastic in water than in ice, permitting the second and third groups of neighbors to approach the molecule at the origin much more closely, rather than being held at a distance by a rigid lattice. There also seems to be too much electron density in the actual liquid between the first maximum and the second maximum.10 This has been interpreted to mean that molecules are able to break out of the lattice and become interstitial molecules of water, transiently occupying the vacant spaces (Figure 5–2B).12
So far the discussion has emphasized similarities between ice Ih at 0 °C and liquid water at low temperatures. There are, of course, remarkable differences. The most obvious is the fact that ice is a solid and water is a liquid. Even though ice Ih is a solid, however, it, like liquid water, is able to flow. In order for condensed matter to flow, layers of molecules in that matter must be able to slide past layers of other molecules above and below them. In the case of water or ice Ih the manifestation of this ability requires extensive and simultaneous disruption of continuous layers of hydrogen bonds in the liquid or solid as it flows. This capacity to flow is far more evident in water than in ice. It is quantified by values for the viscosity of the liquid and the solid. Liquid water at 0 °C has a viscosity of 1.8 mPa s, and ice Ih at 0 °C has a viscosity of about 10¹⁶ mPa s.13,14 The difference between liquid water and ice Ih is so large because, to flow, hydrogen bonds must be broken simultaneously over significant regions. There are, however, measurements that quantify the behavior of individual molecules of water.
When individual molecules in water change their relative positions, hydrogen bonds must be broken and re-formed elsewhere. The capacity to change positions is reflected in the process of self-diffusion, a measure of the rate at which the average molecule of water diffuses through a condensed phase of water molecules. The self-diffusion coefficient for ice Ih is about 10⁻¹⁰ cm² s⁻¹ at 0 °C, and for liquid water it is 1.4 × 10⁻⁵ cm² s⁻¹ at 5 °C.4 This difference of 10⁵ demonstrates that water molecules can exchange their hydrogen-bonded neighbors far more rapidly in the liquid than in the solid. To the extent that this exchange involves breaking and making of hydrogen bonds, the hydrogen bonds in the liquid are weaker than those in the solid.
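The difference between the two self-diffusion coefficients quoted above can be made concrete with a short calculation. The sketch below is an illustration only: it uses the Einstein relation for three-dimensional diffusion, ⟨r²⟩ = 6Dt, which is an assumption brought in here and is not used in the text, and it takes the hydrogen-bond length of 0.276 nm as a convenient, arbitrarily chosen distance. The function name is hypothetical.

```python
# Self-diffusion coefficients quoted in the text, converted from cm^2 s^-1 to m^2 s^-1.
D_ICE = 1e-10 * 1e-4       # ice Ih at 0 °C
D_LIQUID = 1.4e-5 * 1e-4   # liquid water at 5 °C

H_BOND_LENGTH = 0.276e-9   # m, hydrogen-bond length in ice Ih


def mean_diffusion_time(distance_m, diffusion_coefficient):
    """Time for the mean-square displacement to reach distance^2.

    Assumes the Einstein relation <r^2> = 6 D t for diffusion in three dimensions.
    """
    return distance_m ** 2 / (6.0 * diffusion_coefficient)


ratio = D_LIQUID / D_ICE
t_liquid = mean_diffusion_time(H_BOND_LENGTH, D_LIQUID)
t_ice = mean_diffusion_time(H_BOND_LENGTH, D_ICE)

print(f"D(liquid)/D(ice)         = {ratio:.1e}")
print(f"time to move 0.276 nm, liquid = {t_liquid:.1e} s")
print(f"time to move 0.276 nm, ice    = {t_ice:.1e} s")
```

On these assumptions a molecule in the liquid wanders one hydrogen-bond length in picoseconds, whereas a molecule in ice requires about a microsecond; the ratio of the two times is the same factor of about 10⁵ that separates the diffusion coefficients.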
An even more easily understood measurement of the rate at which a molecule of water can detach itself from the hydrogen-bonded network in the liquid in order to reorient itself is the dielectric relaxation of liquid water. The relative permittivity of a chemical substance is a function of the frequency of the alternating electric field used to measure it. Tabulated relative permittivities are usually static relative permittivities that are measured with an alternating electric field with a frequency of alternation so low that the measured values may be confidently extrapolated to zero frequency. The low frequency of alternation allows the molecules in the substance more than ample time to align themselves, as far as they are able, with the electric field while the measurement is made. If the frequency of the applied field, however, is gradually increased, at some point the molecules in the substance are unable to invert their alignments at rates sufficient to keep up with the alternations of the applied field. Their inability to keep up results from intermolecular forces that hinder their rotation. The dielectric relaxation time is the time that an applied field must be in operation before exp(–1) of the increase in relative permittivity due to the rotation of the molecules aligning themselves with the field has occurred. The dielectric relaxation time of ice Ih at 0 °C is 2 × 10⁻⁵ s, that of liquid water at 0 °C is 2 × 10⁻¹¹ s, and that of a water molecule in a dilute solution of water in benzene is 1 × 10⁻¹² s.4 A similar value for the rotational correlation time of a water molecule in ice Ih (1.5 × 10⁻⁵ s at –6 °C)15 has been measured by nuclear magnetic resonance. Although a water molecule in liquid water is constrained so that it rotates 20 times more slowly than it does in a condensed phase lacking hydrogen bonds, it rotates 10⁶ times faster in liquid water than in ice. This again demonstrates that the hydrogen bonds between water molecules in liquid water are weaker than those in ice.
This weakening of the hydrogen bonds in the liquid is also reflected in a shift that occurs in the frequency of the maximum infrared absorption of the oxygen–hydrogen stretching vibration of water when it melts.4 The frequency at which a covalent bond absorbs infrared electromagnetic energy is correlated with its bond energy. The greater the bond energy, the higher the frequency of the light required to excite its vibration. In the case of the stretching frequency of the oxygen–hydrogen bond in water, the stronger the hydrogen bond in which it participates, the weaker will be the covalent oxygen–hydrogen bond itself and the lower the frequency of its absorption. The stretching frequency of the oxygen–hydrogen bond in ice Ih is 3220 cm⁻¹, in liquid water it is 3490 cm⁻¹, and in the dilute vapor it is 3700 cm⁻¹. In the dilute vapor, no hydrogen bond weakens the oxygen–hydrogen bond. In ice Ih, a strong, fixed hydrogen bond weakens the oxygen–hydrogen bond significantly. In liquid water, less than half the decrease in frequency between the vapor and ice Ih occurs, presumably because the hydrogen bonds formed when the vapor is converted into the liquid are weaker than the hydrogen bonds formed when the vapor is converted into the solid.
The weakening of the hydrogen bonds upon melting that is indicated by both the increases in self-diffusion and dielectric relaxation and the increase in the stretching frequency of the oxygen–hydrogen bond requires that the dissociation constant for the hydrogen bond in liquid water be significantly larger than that for ice Ih. This increase in dissociation constant upon melting may be large enough to produce a significant population of unbonded molecules of water in the liquid, presumably the interstitial water the existence of which is implied in the radial molecular correlation function.
There are infrared spectra of liquid water which suggest that there are two distinct species of oxygen–hydrogen bonds in the liquid,16 and these two species could represent intact and broken hydrogen bonds.17 It is possible to fit the temperature dependence of both these infrared spectra and the heat capacity of the liquid with a model of the liquid that assumes that there are only two types of oxygen–hydrogen bonds present, those participating in intact hydrogen bonds and those the hydrogen bonds of which are broken.17 From such a fit, the standard free energy of formation of a hydrogen bond in liquid water is estimated to be –2.0 kJ mol⁻¹ at 25 °C; and the fraction of broken hydrogen bonds, 0.30 at 25 °C. From this fraction it would follow that at a given instant about 3–4% of the molecules of water would be either attached to the lattice by only one hydrogen bond or completely free of the lattice. There are, however, results suggesting both that these infrared spectra do not result from only two populations of oxygen–hydrogen bonds8 and that no simple two-state model can explain both the cubic expansion coefficient and the temperature coefficient of the isothermal compression of liquid water simultaneously.7 Consequently, the question of the molar concentration of intact hydrogen bonds in liquid water remains open.
The mental picture of liquid water that forms intuitively as its peculiarities are described is presently more adequate than any sophisticated physical model of its structure. The impression that is formed from a consideration of these properties is that liquid water retains most of the hydrogen bonds that are present in ice Ih but that these hydrogen bonds are more elastic, weaker, and break and re-form much more rapidly than those in ice Ih.

Suggested Reading

Eisenberg, D., & Kauzmann, W. (1969) The Structure and Properties of Water, Clarendon Press, Oxford, England.

Problem 5–1: The isopiestic heat capacity of a substance, Cp, is defined as the amount of heat required to raise one mole of the substance one degree in temperature at constant pressure. The units of this quantity are joules degree⁻¹ mole⁻¹.
196 Noncovalent Forces
A substance has a certain intrinsic enthalpy at 0 K, and this intrinsic enthalpy increases as the temperature increases and the substance absorbs heat:

$$ H_T - H_0 = \int_0^T C_p \, dT + \Delta H_{pc} $$

where HT is the intrinsic enthalpy at T = T, H0 is the intrinsic enthalpy at T = 0 K, and ΔHpc is the sum of the enthalpy changes for all phase transitions between 0 K and T. The heat capacity of H2O is the following function of temperature (Figure 5–3):

Cp = (0.172 J K⁻² mol⁻¹)T [T = 0–60 K]
Cp = 2.47 J K⁻¹ mol⁻¹ + (0.129 J K⁻² mol⁻¹)T [T = 60–273 K]
Cp = 77.5 J K⁻¹ mol⁻¹ [T = 273–373 K]

and

ΔHfus = 6.0 kJ mol⁻¹ at 0 °C
ΔHvap = 40.7 kJ mol⁻¹ at 100 °C

$$ S_T - S_0 = \int_0^T \frac{C_p}{T} \, dT + \Delta S_{pc} $$

where ST is the intrinsic entropy at T = T, S0 is the intrinsic entropy at T = 0, and ΔSpc is the sum of the entropy changes for all phase transitions.
When phase transitions occur, ΔG° = 0 at the transition temperature. Use ΔH°fus and ΔH°vap to calculate ΔS°fus and ΔS°vap.
(B) Draw a graph of ST – S0 as a function of T.
(C) If GT – G0 = HT – H0 – STT + S0T, are changes in G, as the temperature changes, greater than or less than changes in H in the case of H2O?

Standard States and Units of Concentration

Whenever standard entropy or its representative, standard free energy, is calculated from experimental observations, a decision must be made on the standard state to be used. Unlike those of the standard enthalpy and those of the heat capacity, the numerical values of both standard entropy and standard free energy depend significantly on this choice of standard states.18,19 When dealing with reactants and products dissolved in solution, such as molecules of proteins and their ligands in water or alkanes in hexadecane, the choice of standard state, other than the obvious conventions of standard temperature and pressure, is a choice of the units in which the concentrations of the reactants and products are to be expressed. The desire in choosing the units for the concentrations is to eliminate any contributions to the entropy arising simply from the act of dispersing the solutes in the solvent and inescapably from the volumes of the solutes and the solvent. These contributions are the entropy of mixing. The reason for eliminating entropy of mixing is that the entropy that remains is the entropy of only the reaction itself and changes in the entropy of solvation that accompany the reaction.
It can be assumed, as seems reasonable, that the thermodynamic activity of benzene should be the same whether it is dissolved in octane, decane, dodecane, tetradecane, or hexadecane. It has been shown experimentally20 that this assumption is valid only if the thermodynamic activity of benzene is expressed in units of corrected volume fraction as defined by the equation18

in the experiment), γv,A,j is the activity coefficient necessary to convert real behavior into ideal behavior, VA,j is the volume of a mole of solute A when it is dissolved in a solution with solvent j, Vj is the volume of a mole of solvent j in the solution, and φA,j is the volume fraction of solute A in the solution with solvent j:

$$ \phi_{A,j} = \frac{n_A V_{A,j}}{n_A V_{A,j} + n_j V_j} = [A]\, V_{A,j} \tag{5–6} $$

where nA and nj are the moles of solute A and solvent j, respectively, in the solution and [A] is its molar concentration.*

* One must remember that molarity is defined as moles liter⁻¹ and the volume of a mole of a substance is defined as centimeters³ mole⁻¹.

Most measurements of activity are performed in such a way that the activity coefficient γv,A,j is insignificantly different from 1 or becomes 1 by extrapolation. That the thermodynamic activity of a solute should be
defined by Equation 5–5 was predicted theoretically21,22 before it was verified experimentally. Expressing activities of solutes by using Equation 5–5 can be thought of as correcting the concentration of the solute in units of mole fraction for the differences in the volumes of solute and solvent because when the volumes of a mole of solvent and a mole of solute are the same and γv,A,j is 1, Equation 5–5 becomes

$$ a_{A,j} = \frac{n_A}{n_A + n_j} = x_{A,j} \tag{5–7} $$

where xA,j is the mole fraction of solute A in solvent j. Equation 5–7 is Raoult’s law.
The difficulty with the corrected volume fraction is deciding what volume to use for the volume of a mole of solute in the solution. When the solute is a liquid dissolved miscibly in a nonpolar liquid, the molar volume of the solute, Vm, which is the volume of a mole of the pure liquid solute, is a reasonable choice. If, however, the solute is a gas or a solid at the temperature of the measurement, its volume in the solution may be quite different from its volume at the temperature or pressure required to liquefy it. In water, even a solute the pure phase of which is a liquid at the temperature of the measurement may have a volume in the solution that is significantly different from its molar volume. In the present discussion, the partial molar volume of the solute at infinite dilution has been chosen as an approximate estimate of the volume of a mole of the solute in the solution. Unlike the partial molar volume, however, which is only an estimate of the volume of a mole of the solute in the solution, the actual volume of a mole of the solute in the solution does not vary with the concentration of that solute.
The partial molar volume (centimeters³ mole⁻¹) of solute A or solvent j is defined as the increase in the volume of the solution, ∂V, that occurs when an infinitesimally small number of moles, ∂n, of solute A or solvent j is added to the solution

$$ \bar{V} = \left( \frac{\partial V}{\partial n} \right)_{T,P} \tag{5–8} $$

at the concentrations of solute and solvent at which the measurement is made. If the solvent and solute are both hydrocarbons, it is usually assumed that the partial molar volumes of solvent and solute are

$$ \bar{V} = \frac{M}{\rho} = V_m \tag{5–9} $$

where M is the molar mass (grams mole⁻¹) and ρ is the density (grams milliliter⁻¹) of the pure phase of the solvent or the solute and Vm is the molar volume.20 Equation 5–9 is also used to estimate the partial molar volumes of other solvents, including water, when solutions are dilute.
The partial molar volumes of most solutes when they are dissolved in water are significantly less than those defined by Equation 5–9.23,24 If the solute is a hydrocarbon, it occupies a significant portion of the empty space already present in the water (Figure 5–2). Direct measurements of partial molar volumes of hydrocarbons in water have rarely been performed, usually because such solutes are poorly soluble in water. In the absence of such measurements, the algorithms of Traube23,24 are used to estimate partial molar volumes of hydrocarbons in water and those of most other solutes as well.
Traube concluded from direct measurement that the partial molar volume of any neutral solute (centimeters³ mole⁻¹) when it is dissolved in water is the sum of the partial molar volumes of its atoms and functional groups plus the covolume, which is a universal correction. The partial molar volumes of the atoms and functional groups at 25 °C are for hydrogen, 3.1 cm³ mol⁻¹; carbon, 10.0 cm³ mol⁻¹; nitrogen, 1.5 cm³ mol⁻¹; the oxygen in an ether, 5.5 cm³ mol⁻¹; a hydroxyl group (–OH), 5.4 cm³ mol⁻¹; the oxygen in an amide, thioester, ketone, or aldehyde (=O), 5.5 cm³ mol⁻¹; an acyl group (–COO–), 15.9 cm³ mol⁻¹; phosphorus, 17.1 cm³ mol⁻¹; sulfur, 15.6 cm³ mol⁻¹; chlorine, 13.3 cm³ mol⁻¹; bromine, 17.8 cm³ mol⁻¹; and iodine, 21.6 cm³ mol⁻¹. From the sum of the partial molar volumes of its atoms and functional groups, 8.2 cm³ mol⁻¹ must be subtracted for each monocyclic ring, either saturated or unsaturated, and 26.6 cm³ mol⁻¹ for each bicyclic aromatic ring, such as a naphthyl group. To the final sum for the constituents of a particular molecule, a covolume of 12.5 cm³ mol⁻¹ must be added.
When ions are dissolved in water, their charge constricts the solvent in their vicinity. These electrostrictions vary between –10 and –30 cm³ mol⁻¹ for the addition of salts of monovalent ions or monovalent zwitterions.24 How electrostriction is to be treated in estimating the volumes of ions to be used in calculating their corrected volume fractions is unclear. The volumes of their ionic solids, however, are also poor estimates of their volumes in solution.
When a reaction takes place in solution, its equilibrium constant can be defined by use of units of corrected volume fraction (Equation 5–5), which have the convenient advantage that they are dimensionless. For example, the equilibrium constant for the association

$$ \mathrm{A} + \mathrm{B} \rightleftharpoons \mathrm{C} \tag{5–10} $$

occurring in solvent j when activity coefficients can be ignored, would be
$$ \frac{[C]\, V_{C,j}}{[A][B]\, V_{A,j} V_{B,j}} \exp\left( \frac{V_{A,j} + V_{B,j} - V_{C,j} - V_j}{V_j} \right) \tag{5–11} $$

If the reaction proceeds with no change in volume

$$ V_{A,j} + V_{B,j} = V_{C,j} \tag{5–12} $$

and

$$ K_{eq} = \frac{a_{C,j}}{a_{A,j}\, a_{B,j}} = \frac{[C] \left( V_{A,j} + V_{B,j} \right)}{[A][B]\, V_{A,j} V_{B,j}} \exp(-1) \tag{5–13} $$

the molar concentrations is a constant, namely, the equilibrium constant that is usually measured when units are molarity. But if entropy of mixing is to be eliminated, the equilibrium constant in units of corrected volume fraction (Equation 5–11) should be used for the calculation of the standard free energy of the reaction:

$$ \Delta G^\circ = -RT \ln K_{eq} \tag{5–14} $$

example benzene, is added to the system at low concentration, and its partition between the two phases is allowed to reach equilibrium.20 The concentration of the solute in each phase is measured, and a partition coefficient, Kp,A, is calculated. Although the concentration of solute A in each of the two phases is initially tabulated in units convenient to the method of measurement,25 for example grams of solute (grams of solvent)⁻¹, the choice of units for the concentrations used to calculate the standard free energies of transfer, and hence the definition of standard state, has, as always, a significant effect on the magnitude of the standard free energy of transfer. If it is assumed that corrected volume fractions are the proper units and also that activity coefficients can be ignored, the partition coefficient for the transfer of solute A from water to solvent j is

(5–17)

and the standard free energy of transfer is

$$ \Delta G^\circ_{A,\mathrm{H_2O} \to j} = \lim_{a_A \to 0} \, -RT \ln \left( \frac{K_{p,A}\, V_{A,\mathrm{H_2O}}}{V_{A,j}} \right) = $$
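Two of the procedures described in this section lend themselves to a short numerical illustration: the additive estimate of partial molar volume attributed to Traube, and the conversion of an equilibrium constant measured in molarity into one in units of corrected volume fraction (Equations 5–11, 5–13, and 5–14). The Python sketch below is offered only as an illustration under stated assumptions. The increments are those quoted in the text; the function names, the dictionary keys, and the way a molecule is described by counts of atoms and groups are hypothetical choices, and activity coefficients are taken to be 1. Volumes are converted from cm³ mol⁻¹ to L mol⁻¹ so that they combine consistently with molar concentrations.

```python
import math

R = 8.314462618  # gas constant, J K^-1 mol^-1

# Increments (cm^3 mol^-1 at 25 °C) as quoted in the text; the keys are hypothetical labels.
TRAUBE_INCREMENTS = {
    "H": 3.1, "C": 10.0, "N": 1.5, "O_ether": 5.5, "OH": 5.4,
    "O_carbonyl": 5.5, "COO": 15.9, "P": 17.1, "S": 15.6,
    "Cl": 13.3, "Br": 17.8, "I": 21.6,
}
MONOCYCLIC_RING = 8.2           # subtracted per monocyclic ring
BICYCLIC_AROMATIC_RING = 26.6   # subtracted per bicyclic aromatic ring
COVOLUME = 12.5                 # added once per molecule


def traube_volume(counts, monocyclic_rings=0, bicyclic_aromatic_rings=0):
    """Estimate the partial molar volume (cm^3 mol^-1) of a neutral solute in water."""
    total = sum(TRAUBE_INCREMENTS[key] * n for key, n in counts.items())
    total -= MONOCYCLIC_RING * monocyclic_rings
    total -= BICYCLIC_AROMATIC_RING * bicyclic_aromatic_rings
    return total + COVOLUME


def k_eq_corrected(k_molar, v_a, v_b, v_c, v_j):
    """Equation 5-11 for A + B <=> C with activity coefficients ignored.

    k_molar is [C]/([A][B]) in L mol^-1; volumes are in cm^3 mol^-1.
    """
    to_liters = 1e-3  # cm^3 mol^-1 -> L mol^-1
    prefactor = (v_c * to_liters) / ((v_a * to_liters) * (v_b * to_liters))
    return k_molar * prefactor * math.exp((v_a + v_b - v_c - v_j) / v_j)


def standard_free_energy(k_eq, temperature=298.15):
    """Equation 5-14, returned in kJ mol^-1."""
    return -R * temperature * math.log(k_eq) / 1000.0


# Benzene: six carbons, six hydrogens, one monocyclic ring.
v_benzene = traube_volume({"C": 6, "H": 6}, monocyclic_rings=1)
print(f"estimated partial molar volume of benzene in water: {v_benzene:.1f} cm^3/mol")

# Hypothetical association with no change in volume (Equation 5-12), in water
# (molar volume of water taken as 18 cm^3/mol, an assumed round value).
k = k_eq_corrected(k_molar=1.0e3, v_a=80.0, v_b=120.0, v_c=200.0, v_j=18.0)
print(f"K_eq = {k:.3g}, standard free energy = {standard_free_energy(k):.1f} kJ/mol")
```

When the volume of the product is set equal to the sum of the volumes of the reactants, the exponential factor reduces to exp(–1) regardless of the volume of the solvent, which is the simplification expressed by Equation 5–13.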
either solute A or the molecules of water or solvent that surrounds it is affected in its behavior by the presence of another molecule of solute A. This is the reason for the limit in Equation 5–18, which defines the standard state of infinite dilution. In this way, the only contributions to the difference in standard free energy of solvation are the specific interactions between the molecule of solute A and the solvent j or the water.
The choice of units of concentration and standard state is also critical in calculating the transfer of a solute from the gas phase to a solution. The usual choice of standard state for the solution in such a reaction is the solute at infinite dilution in the solvent so that the solute is fully solvated and no interactions occur among the molecules of solute. The usual choice of standard state for the gas is the real gas extrapolated to zero pressure in order to eliminate the nonideal behavior of the real gas represented by its virial coefficients. Because of the proportionality between molarity and pressure, the practical units of concentration for a gas are usually pressure, but the thermodynamic activity of the gas should be defined as its molarity.18
To avoid both the standard entropy of mixing and changes in volume at constant pressure during the transfer of solute from the gas phase to a solution, the volume occupied by a mole of the solute in the gas phase would have to be equal to the volume occupied by a mole of the solvated solute in the solution.18,26 Consider a large volume of solution at standard state in contact with a large volume of the gaseous solute. Only when the pressure of the gas is such that 1 mol of gaseous solute has the same volume as the partial molar volume of the solute in the solution will the volume of the system not change when 1 mol of solute is transferred from the gas to the solution at constant pressure. Only under these circumstances is the transfer both isochoric and isobaric. As a result, the standard entropy of mixing is 0, no work is performed by the system, and the transfer occurs at constant pressure. To achieve this condition, the gas must be compressed mathematically to a volume equal to the partial molar volume of the solute in the solution. The standard free energy change for the compression of the gaseous solute to a volume equal to its partial molar volume in the solution is

standard free energy of transfer of solute A from the gas phase to a solution in solvent j, ΔG°A,g→j, becomes

$$ \Delta G^\circ_{A,g \to j} = -RT \ln \left[ \frac{\phi_{A,j}}{[A]_g\, V_{A,j}} \exp\left( 1 - \frac{V_{A,j}}{V_j} \right) \right] = -RT \ln \left[ \frac{[A]_j}{[A]_g} \exp\left( 1 - \frac{V_{A,j}}{V_j} \right) \right] \tag{5–21} $$

Again, the intention of Equation 5–21 is to apply the appropriate corrections so that the standard free energy of transfer is only the standard free energy of solvation for solute A by solvent j.

Ionic Interactions

The possibility that a positively charged cation might interact favorably with a negatively charged anion and bring two molecules or two segments of the same polypeptide together has a lasting appeal. Such an association seems plausible because, as everyone knows, unlike charges attract each other. When a positive ion in a solution encounters a negative ion and a complex between these two ions is formed, it is referred to as an ion pair. In terms of Equation 5–1, a hydrated ion pair forms whenever a hydrated anion associates with a hydrated cation. In this reaction, the various changes of standard free energy identified in Equation 5–2 can be separately considered by writing the following thermodynamic cycle:

                    ΔG°A+·B–
A+(g)   +   B–(g)      ⇌      A+·B–(g)
  +           +                  +
x H2O       y H2O              z H2O

ΔG°hyd(A+)  ΔG°hyd(B–)         ΔG°hyd(A+·B–)
DH ∞ (kJ mol-1)
a A+ + a B– –600
where zA+ and zB- are the charge numbers of the ions, ea is
the elementary charge (1.602 ¥ 10–19 C), and aA+ and aB- –400
are the radii of the two ions. Values for the ionic radii in
crystalline lattices, based on crystallographic studies of
salts,28 are usually used for aA+ and aB-.7,29 The standard
enthalpy of formation defined by Equation 5–23 can be –200
presented for monovalent ions (zA+ = -zB- = 1) as a func-
tion of the sum of the two ionic radii (Figure 5–8).
When an ion is transferred from the gas phase to
water, there is a large release of heat.* This large negative 0.1 0.2 0.3 0.4
change in standard enthalpy is referred to as the stan- Ionic Radius (nm)
dard enthalpy of hydration, H ∞hyd(I). Measurements of
these standard enthalpies of hydration have been tabu- Figure 5–8: Electrostatic enthalpies and standard enthalpies of
hydration. The standard enthalpy change for bringing together a
lated⁷ for a number of monovalent, divalent, and trivalent spherical ions. The values for the spherical monovalent cations and anions can be presented as a function of their ionic radii (Figure 5–8).

… monovalent cation and a monovalent anion in a vacuum is presented as a function of the sum of the two respective ionic radii (solid dark line), as calculated from Equation 5–23. The standard enthalpies of hydration⁷ for monovalent cations are presented as a function of their ionic radii. The ions are, in order of increasing radius, Li⁺, Na⁺, K⁺, Rb⁺, and Cs⁺. The line connecting the points is drawn by hand. The standard enthalpies of hydration⁷ for monovalent anions are presented as a function of their ionic radii. The ions are, in order of increasing ionic radius, F⁻, Cl⁻, Br⁻, and I⁻. The line connecting the points is drawn by hand. The standard enthalpy change for the hydration of a monovalent ion of either charge, based on the assumption that the standard enthalpy of hydration is due only to the difference in self-charging energies in the vacuum and in water (Equation 5–25), is presented as a function of ionic radius (dark dashed line). All enthalpies are presented in kilojoules mole⁻¹ for a standard temperature of 25 °C.

* This large release of heat when any ion is transferred from the gas phase to water should not be confused with the small releases or absorptions of heat that occur when the ions in the solid crystals of a salt are dissolved in water.

The large negative standard enthalpies of hydration for ions are commonly explained to be the result of the ability of the fixed charge on an ion to gather around itself a layer of tightly held molecules of water that are oriented either with the positive ends of their dipoles, their hydrogens, toward an anion or the negative ends of their dipoles, their lone pairs, directed toward a cation (Figure 5–9). This explanation is probably incorrect. From measurements of the standard enthalpy of formation for complexes between monovalent cations in the gas phase and 1–7 molecules of water, it has been concluded³⁰ that about four molecules of water are sufficient to hydrate a cation such as NH₄⁺, H₃O⁺, H₂COH⁺, Li⁺, or Na⁺. This result suggests that the innermost shell of the layer of hydration around an ion is not large. Furthermore, when the standard enthalpy changes for the formation of 1:1 complexes between a molecule of water and various cations and anions in polar nonaqueous solvents were determined, the values observed were quite small (0 > ΔH° > −13 kJ mol⁻¹).³¹ These two results suggest that the large standard enthalpies of hydration observed for ions arise far more from the influence exerted by the ion over a significant region of the water surrounding it than from the specific, intimate noncovalent contacts between the ion and its immediate neighbors.

Figure 5–9: Schematic drawing of molecules of water with the negative ends of their dipoles directed toward a cation and the positive ends of their dipoles directed toward an anion.

In electrostatics the self-charging energy is the energy required to charge a sphere of a given radius a in a medium of relative permittivity ε_r.³² The self-charging energy, E_sc, for placing the charge z_j e_a on an ion j of radius a_j would be

\[ E_{\mathrm{sc}} = \frac{z_j^{2} e_a^{2}}{2 a_j \varepsilon_r}\, N_A \tag{5–24} \]
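As a rough numerical illustration of Equation 5–24, the sketch below evaluates the self-charging energy of a monovalent ion in a vacuum and in water. It rests on two assumptions that are not stated at this point in the text: that e_a² stands for e²/(4πε₀), and that water at 25 °C has ε_r = 78. The function name and the radius of 0.1 nm are hypothetical choices made only for the example.

```python
import math

# Physical constants in SI units.
ELEMENTARY_CHARGE = 1.602176634e-19     # C
VACUUM_PERMITTIVITY = 8.8541878128e-12  # F m^-1
AVOGADRO = 6.02214076e23                # mol^-1

# Assumption: e_a^2 in Equation 5-24 is read as e^2 / (4*pi*eps0), in J m.
E_A_SQUARED = ELEMENTARY_CHARGE**2 / (4 * math.pi * VACUUM_PERMITTIVITY)


def self_charging_energy(z, radius_nm, rel_permittivity):
    """Self-charging energy of Equation 5-24, returned in kJ mol^-1.

    z is the charge number, radius_nm the ionic radius in nanometers,
    and rel_permittivity the relative permittivity of the medium.
    """
    radius_m = radius_nm * 1e-9
    joules_per_mole = (
        AVOGADRO * z**2 * E_A_SQUARED / (2 * radius_m * rel_permittivity)
    )
    return joules_per_mole / 1000.0


if __name__ == "__main__":
    radius = 0.1  # nm, hypothetical ionic radius
    in_vacuum = self_charging_energy(1, radius, 1.0)
    in_water = self_charging_energy(1, radius, 78.0)
    print(f"vacuum: {in_vacuum:.1f} kJ/mol")
    print(f"water:  {in_water:.1f} kJ/mol")
    print(f"water - vacuum: {in_water - in_vacuum:.1f} kJ/mol")
```

Under these assumptions the energy is roughly 695 kJ mol⁻¹ in a vacuum and roughly 9 kJ mol⁻¹ in water, so almost all of the self-charging energy disappears when the ion enters the medium of high relative permittivity.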
The standard enthalpy change ΔH°sc associated with the electrostatic energy required to move an ion from the vacuum (ε_r = 1) to water (ε_r = 78) at 25 °C would be the difference in the two self-charging energies:

\[ \Delta H^{\circ}_{\mathrm{sc}} = \frac{z_j^{2} e_a^{2}}{2 a_j}\left(\frac{1}{\varepsilon_{r,\mathrm{H_2O}}} - 1\right) N_A \tag{5–25} \]

… water, however, is actually a result of the cooperative behavior of the molecules of water in the liquid over a significant volume. Because the high relative permittivity of water arises from the correlation of the individual dipoles of the molecules of water, the necessity to rely on that relative permittivity to explain the large enthalpy of hydration means that an ion influences the structure of the water over a significant distance, not just in its immediate vicinity.

The standard enthalpy of hydration for a monovalent ion pair, ΔH°hyd(A⁺·B⁻), can be estimated from electrostatic theory just as standard enthalpies of hydration were estimated. It should be equal to the difference between the sum of the standard enthalpies of charging the two ions separately and the standard enthalpy of bringing them to within a certain distance of each other:

\[ \Delta H^{\circ}_{\mathrm{hyd}(A^{+}\cdot B^{-})} = \frac{e_a^{2}}{2}\left(\frac{1}{a_{A^{+}}} + \frac{1}{a_{B^{-}}} - \frac{2}{d_{A^{+}\cdot B^{-}}}\right)\left(\frac{1}{\varepsilon_{r,\mathrm{H_2O}}} - 1\right) N_A \tag{5–26} \]

…

\[ \Delta H^{\circ}_{\mathrm{IP}} \cong -\frac{e_a^{2}}{\varepsilon_{r,\mathrm{H_2O}}}\,\frac{1}{a_{A^{+}} + a_{B^{-}}}\, N_A \tag{5–29} \]

For a_A⁺ ≥ 0.1 nm and a_B⁻ ≥ 0.1 nm, −9 kJ mol⁻¹ < ΔH°IP < 0 kJ mol⁻¹.

These changes in standard enthalpy do demonstrate quite clearly why an ion pair sequestered in the middle of a folded polypeptide would be unstable relative to the separated hydrated ions in solution. The only reason an ion pair is almost stable in aqueous solution is that there is considerable standard enthalpy of hydration for the ion pair itself, ΔH°hyd(A⁺·B⁻). In the center of a protein, this standard enthalpy of hydration would not be exerted and the ion pair would be much less stable. There will never be sufficient electrostatic energy in the ion pair alone to overcome the large negative standard enthalpies of hydration that are lost when the separated ions are removed from water during the folding of the protein. This fact can be verified by examining Figure 5–8. The total standard enthalpy of hydration lost is the sum of the two values for the individual enthalpies of hydration. The standard enthalpy of association gained is that for the sum of the two ionic radii. The former is always of a greater magnitude than the latter.

The standard entropies of hydration,³⁵ in marked contrast to the standard enthalpies of hydration, are small. Values of the entropies of hydration for a series of small monovalent ions of either charge lie between −67 and +21 J K⁻¹ mol⁻¹ when the two standard states are chosen as the molten salt at one mole fraction in the ion and the ideal solution at one mole fraction in the ion.³⁵ At 298 K, these standard entropies of hydration would cause the standard free energies of hydration to differ from enthalpies of hydration by less than 4%, certainly less than the error in the estimation of enthalpies of hydration from experimental data.⁷

The small standard entropies of hydration seem at first glance to be inconsistent with the formation of a region of oriented water around an ion, which is the explanation given for the large standard enthalpies of hydration. The apparent inconsistency is usually explained by noting that the region of oriented molecules of water surrounding either an anion or a cation cannot merge flawlessly with the hydrogen-bonded lattice of the bulk water. Therefore, there must be an outer spherical shell of disorder between the inner sphere of order and the order of the lattice beyond the influence of the ion. The negative standard entropy change of forming the sphere of oriented water should be canceled by the positive standard entropy change of forming this outer shell of the disordered transition.⁷

Explanations of the large enthalpies of hydration and the small entropies of hydration both predict that an ion will affect the structure of the water well beyond the few molecules in its immediate vicinity. Direct evidence for such an extended region of oriented water comes from measurements of the repulsion of hydration.³⁶,³⁷ When two identical surfaces that have dense arrays of both negative and positive ions spread over them (both, however, in exactly equal concentration so that each of the two surfaces is electrostatically neutral) are brought together in water, a repulsive force between the surfaces is evident. This repulsive force becomes significant when the two surfaces come within about 2 nm of each other and increases in magnitude exponentially as the distance is decreased. It has been proposed that this repulsive force is the resistance of the layers of hydration around the ions on each surface to their interpenetration. That this repulsion of the layers of hydration extends out from each surface by at least 1 nm indicates that each ion orders the waters around it over a significant distance.*

* The distance of this repulsive force (1 nm) is about the distance (0.7 nm) calculated for the decrease in the electrostatic field around a univalent ion in water to a potential energy equal to kT. If an ion significantly influences the water around it to a radius of about 1 nm, this region of its influence would contain about 100 molecules of water.

In the case of molecules of protein, the ion pairs that have received the most attention are those that would form between the carboxylate ion of an aspartate or a glutamate and the ammonium ion of a lysine or the guanidinium ion of an arginine. The association constant in water³⁸ for the ion pair between an acetate ion and an ammonium ion is around 0.5 M⁻¹, and that for the ion pair between an acetate ion and a guanidinium ion is somewhat less than 0.5 M⁻¹. Consequently, the concentration of either an ammonium or a guanidinium cation would have to be greater than 2 M for half of the acetate anion in the solution to be complexed with it. These weak interactions have free energies of formation of around −3 kJ mol⁻¹ when the concentrations are expressed in units of corrected volume fraction. They are probably the result of hydrogen bonding between the anion and cation rather than ionic interactions, because the complex is stronger for ammonium than guanidinium and there is no evidence that small monovalent cations and anions that lack donors and acceptors of hydrogen bonds associate to form ion pairs in water.

Several additional observations demonstrate that ion pairs between an ammonium ion and a carboxylate ion are unstable relative to the separated ions. The dielectric increment is the change in the relative permittivity of a solution with the concentration of an added solute. The dielectric increments for a series of zwitterionic amino acids containing an ammonium and a carboxylate, namely, glycine, 3-aminopropionate, 4-aminobutyrate, 5-aminopentanoate, and 6-aminohexanoate, have been measured. The values display a monotonic increase with the number of methylenes between the positively charged ammonium ion and the negatively charged carboxylate ion. The values observed are in agreement with theoretical calculations of their magnitude from a simple model in which the distance between the elementary positive charge and the elementary negative charge is determined only by random, unbiased rotation around the carbon–carbon bonds connecting them.²⁴ Were the formation of an ion pair between an ammonium cation and a carboxylate anion a favorable interaction in aqueous solution, this regularity could not have occurred. In glycine, an intramolecular ion pair cannot form. In 3-aminopropionate and 4-aminobutanoate, excellent intramolecular ion pairs, forming rings five and six atoms in size, should form even more readily than a similar intermolecular ion pair. If these intramolecular ion pairs were able to form, however, the dielectric increments of these two amino acids should both be less than that of glycine, yet no anomaly is observed in their dielectric increments compared to the other compounds within the complete series. The behavior of the interaction volumes for the same series of amino acids also shows no evidence of peculiarities that would result from intramolecular ion pairing.³⁹

There is no steric hindrance to the formation of an ion pair between the ammonium ion of the lysine in the peptide Nα-acetyl-WLKLL and its carboxy terminus, and such an intermolecular ion pair forms readily when the peptide is dissolved in octanol. When the peptide is dissolved in water, however, no ion pair can be detected.⁴⁰

Although ion pairs do not have net favorable standard free energies of formation in aqueous solution and do not contribute to the stability of a folded polypeptide, the electrostatic repulsion of amino acids of like charge can destabilize a particular structure. This distinction is illustrated by the effect of ionic strength on the stability of coiled coils of α helices.⁴¹ A coiled coil of α helices is a stable structure that forms when two α helices coil around each other in a supercoil. This supercoil can stabilize the two α helices sufficiently that they can form in water. Few isolated α helices are stable in aqueous solution, but a coiled coil is one way to circumvent this problem. A series of peptides designed to form coiled coils were synthesized chemically. One of the peptides (n_aa = 30) had glutamates at the positions flanking its hydrophobic core; the other (n_aa = 30) had arginines flanking the core. The stability of the heterodimeric coiled coil formed from a positively charged peptide and a negatively charged peptide was not affected by changing the ionic strength, and this result indicated that electrostatic interactions such as ion pairing were not contributing to the standard free energy of formation of that coiled coil. The stabilities of homodimers formed from either two of the positively charged peptides or two of the negatively charged peptides, however, decreased significantly as the ionic strength of the solution was lowered, and this result demonstrates that these complexes were destabilized by charge repulsion. It is this destabilization of the two homodimers, not the formation of ion pairs, that accounts for the fact that the heterodimer forms preferentially.⁴¹

Proteins are generally dissolved in aqueous solutions the ionic strengths of which are between 0.1 and 0.3 M, and the effect of such ionic strengths on ionic interactions, although small, should be noted. The activity coefficients of electrolytes decline sharply from a value of 1 at low concentrations to minimum values, depending on the salt, of from 0.05 to 0.8 at concentrations of about 0.3 M.⁴² When activity coefficients of solutes are less than 1, it means that the solute is behaving with a chemical potential less than it would have if it were an ideal solute at the same concentration, and its tendency to leave the solution is less than it should be. The fact that, at low concentrations, the activity coefficients of ions are near 1 means that as long as they are far enough apart their activities increase in proportion to their concentration as expected for any solute. As their concentrations become high enough that each ion begins to experience the presence of the others, the presence of the others decreases the tendency of that ion to leave the solution. This decreased tendency arises from the departure of all of the ions in the solution from a random distribution in such a way that a region enriched in counterions forms around each individual ion as expected of an ionic double layer (Equation 1–71).⁴² These enriched counterionic layers around each dissolved ion make each of them more stable in the aqueous solution than it would be if it were an ideal solute, and this is what causes the decrease in its activity coefficient.

This effect of ionic strength makes the formation of an ion pair even less likely than it would be in the absence of added salt, because its formation would involve the diminishment of a considerable fraction of the counterionic layer around each separated ion. The presence of these ionic layers also makes it more difficult to remove an ionic functional group from a solution of moderate ionic strength than it would be to remove it from pure water. The activity coefficients for most ionic solutes are between 0.2 and 0.8 at the ionic strengths encountered in biochemical situations, and these values should lead to decreases in the standard free energies of hydration between −4 and −0.4 kJ mol⁻¹, respectively.⁴²

Although ion pairs between simple monovalent cations and anions have positive standard free energies of formation, there are two situations in which ion pairs are favorable. Ion pairs involving divalent metal ions often have negative standard free energies of formation. For example, significant concentrations of the ion pairs Ca²⁺·SO₄²⁻ and Mg²⁺·SO₄²⁻ are present in aqueous solutions of the respective salts, and ion pairs between divalent cations such as Ba²⁺, Ca²⁺, and Mg²⁺ and hydroxide ion in aqueous solution show appreciable stabilities.²⁹ There are, however, no divalent side chains among the 20 natural amino acids. Phosphorylated amino acids, such as serine phosphate (2–30), are divalent at high pH and can readily form ion pairs with divalent cations such as Ca²⁺.

The other situation in which ion pairs become favorable is encountered when chelation can occur. Chelation is the binding of an ion to a molecule, the chelating agent. The chelating agent contains two or more functional groups of opposite charge to the bound ion that can simultaneously associate with it, or it contains two or more dipoles that simultaneously can be favorably directed toward the bound ion, or it contains some combination of such charges and dipoles. The paradigm of chelating agents is N,N,N′,N′-tetracarboxymethyl-1,2-diaminoethane:
[Structure 5–1: skeletal drawing showing two nitrogens and four carboxylates (COO⁻)]

which can wrap its nitrogens and carboxylates around a divalent or trivalent metal ion and form an ion pair of high stability. It has already been mentioned that the binding of monovalent cations and anions by proteins is thought to involve particular binding sites that have advantageous dispositions of functional groups, often with charge opposite to the charge on the bound ion. Chelation, however, assumes a preexisting arrangement of two or more charged groups or dipoles that create a pocket within which an ion can be held, and this arrangement does not exist in an unfolded polypeptide or with isolated anions and cations in solution. Chelation could be important in forming an interface between two already folded polypeptides or binding a charged substrate to an already folded enzyme.

Suggested Reading

Parsegian, A. (1969) Energy of an ion crossing a low dielectric membrane: solutions to four relevant electrostatic problems, Nature 221, 844–846.

… the magnitude of the charges z_j e_a and the distance, r, that separates them: μ = z_j e_a r. In each structure you have made, z_j e_a is the same but r changes.

(C) Examine the structures you have drawn and rank them in order of increasing dipole moment. Indicate ranking with the symbols < and =.

The observed dipole moments for these molecules dissolved in water at pH 7.0 are⁴³

n    1      2      3      4      5      6
μ    12 D   15 D   18 D   20 D   22 D   24 D

(D) Explain why the actual dipole moments for these molecules fail to agree with the theoretical predictions that you made in part C.

The Hydrogen Bond

A hydrogen bond is a noncovalent force that arises between an acid, known as the donor, A–H, and a base, known as the acceptor, :B. The atoms A and B in the case of proteins are the heteroatoms oxygen, nitrogen, and sulfur. A hydrogen bond is an intermediate on the trajectory of an acid–base reaction:⁴⁴,⁴⁵

[Equation 5–30: scheme of the transfer of a proton from an O–H donor to a nitrogen acceptor by way of the hydrogen-bonded intermediate O–H···N]
… gas. The water dimer (Figure 5–1) is an example of such a situation; its existence lowers the pressure of water vapor. Abnormally positive values for the standard enthalpy of vaporization or abnormally negative values for the standard enthalpy of mixing can often be explained as the result of either the breaking of hydrogen bonds as the molecules depart the liquid or the formation of hydrogen bonds as a donor and acceptor are mixed, respectively. When an acceptor is added to a solution of a donor, the infrared spectrum of the resulting mixture often displays a new absorption band, at a lower frequency than the absorption of the A–H stretching vibration observed with the solution of the donor alone. This new absorption increases in magnitude in proportion to the amount of acceptor added, while the amplitude of the absorption of the unshifted stretching vibration of the A–H bond of the donor decreases in proportion. The new stretching vibration is assigned to that of the A–H covalent bond within a hydrogen bond between the donor and the added acceptor. A similar observation is made in nuclear magnetic resonance spectra of mixtures of donors and acceptors. In this case, two separate absorptions are not observed because the rates at which the hydrogen bonds are interchanging among the molecules in the solution are faster than the time resolution of the method, but the chemical shift of the proton participating in the A–H bond moves downfield until it reaches a maximum value, associated with the chemical shift of the proton within the hydrogen bond.

Taken together, these commonly encountered observations demonstrate three features of a hydrogen bond. First, a hydrogen bond causes two molecules to associate with each other and form a complex that prevents them from changing their relative positions as readily as they would otherwise; in other words, it correlates their movements. Second, there is a release of heat associated with the formation of this complex. Third, the proton in the A–H covalent bond of the donor experiences a change in its environment during the formation of this complex. The results of both infrared and nuclear magnetic resonance spectroscopy are consistent with a lengthening of the covalent bond between A and H concomitant with a movement of the proton away from the electrons of the σ bond.

The arrangement of the atoms in crystallographic molecular models of small molecules that display these physical manifestations of hydrogen bonding usually displays a pattern that can be assigned to the hydrogen bond itself. The positions in the unit cell of the atoms of the second and third periods of the periodic table, for example, carbon, nitrogen, oxygen, and sulfur, are determined by X-ray crystallography, and the positions of the protons, often as deuterons, are determined most reliably by neutron diffraction. Whereas a proton has little ability to scatter X-rays, inasmuch as it has no core electrons, a deuteron scatters neutrons as readily as a carbon or an oxygen;⁴⁶ and deuterons are prominent features in maps of neutron scattering density, as opposed to hydrogens in maps of electron density. Furthermore, a proton has a negative scattering amplitude for neutrons while a deuteron has a positive scattering amplitude. This causes difference maps of neutron scattering density for deuteronated against protonated molecules to display sharp maxima where the protons are located in the former.

In crystallographic molecular models of small molecules known to be hydrogen-bonded, the bond is recognized as an enforced orientation of the donor and acceptor (Figure 5–10).⁴⁷ Associated with this orientation are certain bond lengths and bond angles.⁴⁶ It is these bond lengths and bond angles that are the most important property of a hydrogen bond as far as the structures of proteins are concerned. The hydrogen bond provides no net standard free energy to the process of folding a polypeptide, but hydrogen bonds are responsible for aligning atoms and holding them at precise distances and constrained angles to each other in the folded structure. The A–H σ bond of the donor in a hydrogen bond is pointed at the heteroatom B of the acceptor. The distance, d, between A and B is always less than it would be if the proton on the donor atom and the atom acting as the acceptor were simply in van der Waals contact. For example, in a hydrogen bond of the type O–H···N (Equation 5–30), the distance between oxygen and nitrogen is 0.28 ± 0.01 nm,⁴⁶ while the distance between carbon and nitrogen in a van der Waals contact of the type C–H···N would be 0.35 nm. It is this shortened distance between donor and acceptor that reflects the bonding. The bond lengths, d, for most types of sterically unconstrained hydrogen bonds (Table 5–1) between neutral donors and neutral acceptors lie between 0.25 and 0.30 nm, but the bond angles are more variable.

In general, the angle a between the axis of the hydrogen bond and one of the σ covalent bonds to the heteroatom of the donor, A (Figure 5–10A), will reflect

Table 5–1: Length of Hydrogen Bondsᵃ

A–H···B    compounds           average bond lengthᵇ (nm)
OH···O     carboxylic acids    0.26 ± 0.01ᶜ
OH···O     phenols             0.27 ± 0.01
OH···O     alcohols            0.27 ± 0.01
OH···N     all O–H             0.28 ± 0.01
NH···O     ammoniums           0.29 ± 0.01
NH···O     amides              0.29 ± 0.01
NH···O     amines              0.30 ± 0.01
NH···N     all N–H             0.31 ± 0.01

ᵃ The values in this table are reproduced directly from tables in ref 46. With the exception of the hydrogen bonds involving ammonium cations, these are hydrogen bonds between a neutral donor and a neutral acceptor. ᵇ These are the distances between the heteroatoms, nitrogens or oxygens. ᶜ These standard deviations may be standard deviations of actual lengths or standard deviations of the measurement or both.
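One way to put the tabulated lengths to use is to ask whether a distance measured between two heteroatoms is compatible with a given type of hydrogen bond. The sketch below stores the means and standard deviations of Table 5–1 and compares a distance against them. The function name, the ASCII keys, and the cutoff of two standard deviations are hypothetical choices for illustration, not criteria taken from the text.

```python
# Mean heteroatom-heteroatom distances (nm) and standard deviations
# transcribed from Table 5-1; keys are (bond type, class of compound).
TABLE_5_1 = {
    ("OH...O", "carboxylic acids"): (0.26, 0.01),
    ("OH...O", "phenols"): (0.27, 0.01),
    ("OH...O", "alcohols"): (0.27, 0.01),
    ("OH...N", "all O-H"): (0.28, 0.01),
    ("NH...O", "ammoniums"): (0.29, 0.01),
    ("NH...O", "amides"): (0.29, 0.01),
    ("NH...O", "amines"): (0.30, 0.01),
    ("NH...N", "all N-H"): (0.31, 0.01),
}


def within_tabulated_range(bond, compounds, distance_nm, n_sd=2.0):
    """Return True if distance_nm lies within n_sd standard deviations
    of the tabulated mean for this type of hydrogen bond.

    The cutoff n_sd is an illustrative assumption, not a rule from the text.
    """
    mean, sd = TABLE_5_1[(bond, compounds)]
    return abs(distance_nm - mean) <= n_sd * sd


if __name__ == "__main__":
    # An oxygen-nitrogen distance close to the tabulated 0.28 nm.
    print(within_tabulated_range("OH...N", "all O-H", 0.285))
    # A distance typical of van der Waals contact (0.35 nm in the text).
    print(within_tabulated_range("OH...N", "all O-H", 0.35))
```

With this cutoff, a separation of 0.35 nm, the van der Waals distance quoted above for C–H···N, falls well outside the range for an O–H···N hydrogen bond.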
… bonds to the heteroatom of the acceptor, B, although much more flexible than angle a, will tend to reflect the hybridization of the lone pair of electrons on atom B.

The type of hydrogen bond that accounts for the majority of those in biological macromolecules, both proteins and nucleic acids, is the hydrogen bond between the sp² lone pair on an acyl oxygen as an acceptor and the nitrogen–hydrogen bond of an acyl derivative such as an amide or an amidine as a donor (Figure 5–10B). From the …

… between the nitrogen and oxygen of the hydrogen bond defines a line, two angles define the hydrogen bond: angle b, the angle in the plane between the projection upon the plane of the line and the carbon–oxygen double bond (Figure 5–10B); and angle c, the angle that the line of centers between the nitrogen and oxygen makes with the projection of that line of centers on the plane of the acyl group (Figure 5–10C). Angle c determines how far …

[Figure 5–10: drawings of the geometry of hydrogen bonds between N–H donors and oxygen acceptors; surviving labels are the distance d, the angle b, the substituents X and Y, and the legend X = N, O, C]
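The two angles just defined can be computed from atomic coordinates. The sketch below is one possible implementation under stated assumptions: the plane of the acyl group is taken from the acyl carbon, the oxygen, and one substituent on the carbon, and angle b is measured from the direction oxygen-to-carbon, so that 180° corresponds to a nitrogen collinear with the carbon–oxygen bond. That convention is chosen to match the 90–180° range plotted for angle b in Figure 5–11. The function name and the coordinates in the example are hypothetical.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _norm(u):
    return math.sqrt(_dot(u, u))


def hydrogen_bond_angles(carbon, oxygen, substituent, nitrogen):
    """Return (angle_b, angle_c) in degrees for an N-H...O=C hydrogen bond.

    All arguments are (x, y, z) coordinates in any consistent unit.
    """
    # Unit normal to the acyl plane, defined by C, O, and a substituent on C.
    normal = _cross(_sub(oxygen, carbon), _sub(substituent, carbon))
    n_hat = tuple(x / _norm(normal) for x in normal)

    line = _sub(nitrogen, oxygen)   # line of centers, O -> N
    height = _dot(line, n_hat)      # component of the line out of the plane
    projection = tuple(l - height * n for l, n in zip(line, n_hat))

    # Angle c: between the line of centers and its projection on the plane.
    angle_c = math.degrees(math.asin(min(1.0, abs(height) / _norm(line))))

    if _norm(projection) < 1e-12:
        raise ValueError("nitrogen lies directly above oxygen; angle b is undefined")

    # Angle b: in-plane angle between the projection and the O -> C direction.
    o_to_c = _sub(carbon, oxygen)
    cos_b = _dot(projection, o_to_c) / (_norm(projection) * _norm(o_to_c))
    angle_b = math.degrees(math.acos(max(-1.0, min(1.0, cos_b))))
    return angle_b, angle_c


if __name__ == "__main__":
    # Hypothetical coordinates in nm: acyl group in the xy plane.
    c_atom = (0.0, 0.0, 0.0)
    o_atom = (0.123, 0.0, 0.0)
    x_atom = (-0.07, 0.12, 0.0)
    n_atom = (0.123 + 0.29 * 0.5, 0.29 * math.sin(math.radians(60.0)), 0.0)
    print(hydrogen_bond_angles(c_atom, o_atom, x_atom, n_atom))
```

In the example the nitrogen is placed in the plane of the acyl group along the direction expected for an sp² lone pair, so the sketch should report an angle b near 120° and an angle c near 0°.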
[Figure 5–11: four histograms; ordinate, number of bonds (ticks at 25 and 50); abscissa, sin c (1, 2/3, 1/3, 0) in the two left panels and angle b (90, 120, 150, 180 degrees) in the two right panels]

Figure 5–11: Distribution of values for the angle b (Figure 5–10B) and the sine of angle c (Figure 5–10C) over a population of hydrogen bonds between nitrogen–hydrogen donors and carbonyl oxygen acceptors or acyl oxygen acceptors observed in crystallographic molecular models of small molecules.⁴⁷ (A) Hydrogen bonds involving a carbonyl oxygen or acyl oxygen in which the oxygen atom accepts no other hydrogen bonds. (B) Hydrogen bonds involving a carbonyl oxygen or acyl oxygen in which the oxygen atom accepts one other hydrogen bond. In each of the four panels, the number of bonds falling within a range of values is plotted as the value of the ordinate. In the two left panels, the values on the abscissa defining the ranges are values of the sine of angle c (sin c). In the two right panels, the values on the abscissa defining the ranges are values of the angle b in degrees. Adapted with permission from ref 47. Copyright 1983 American Chemical Society.

All of the observations presented in Figure 5–11 are for acyl oxygens that are not in carboxylates. In the case of the oxygens in carboxylates, such as those on the side chains of aspartate and glutamate, the tendency for the nitrogen to reside in the plane of the carboxylate is lessened and the tendency for angle b to assume 120° is increased.⁴⁷ Although it has been proposed that a syn pair of electrons on a carboxylate (Equation 2–12) should be more basic than an anti pair, no preference is shown for one over the other in forming hydrogen bonds in crystallographic molecular models of small molecules⁴⁹ or proteins.⁵⁰

The shift in frequency of the infrared absorption for an oxygen–hydrogen stretching vibration has been used to examine the effect of stereochemistry on the strength …

There has been some disagreement over the ability of sulfur to participate as an acceptor in a hydrogen bond because of the poor overlap between its atomic orbitals and those of nitrogen or oxygen. In a survey of crystallographic structures for a number of compounds in which nitrogen donors and sulfur acceptors both appear,⁵² juxtapositions were frequently observed and these were consistent with hydrogen bonds of the type NH···S. The most telling observation in favor of the existence of such hydrogen bonds was the fact that the nitrogen–sulfur distances (0.33–0.35 nm) were shorter than the distance expected from purely van der Waals contact.

The pairs of electrons in π bonds in a simple olefin or a phenyl ring are less basic than the σ lone pairs on the acyl oxygen in a secondary amide (pKa = −0.5)⁵³ or the oxygen of a molecule of water (pKa = −1.7). The values of pKa for the conjugate acids of ethene and propene are −24.3 and −19.3, respectively.⁵⁴ The values of pKa for the conjugate acids in which a carbon in the ring is protonated are −24.3 for benzene, −16.3 for benzofuran, −10 for 3-hydroxy-5-methyltoluene, −7.8 for 3-hydroxyphenol, −5.8 for 3,5-dihydroxytoluene, and −3.1 for 3,5-dihydroxyphenol.⁵⁴,⁵⁵ From these values, the values of pKa for the conjugate acids of a phenylalanyl side chain and a tyrosyl side chain, in which a carbon in the ring is protonated, should be around −20 and −13. The phenyl ring of a tryptophanyl side chain should have a pKa somewhat greater than that of a tyrosyl side chain. The differences between the values of pKa for donor and acceptor in the hydrogen bonds between two water molecules or between two molecules of N-methylacetamide, however, are already large, around 17–18 units, so it would not be surprising if the differences in pKa required to use one of these aromatic side chains as an acceptor, although even larger, would still permit the formation of a hydrogen bond.

There are indications that hydrogen bonds can
form between the π clouds of aromatic rings as acceptors and biologically relevant donors.⁵⁶ In the complex between water and benzene in the gas phase, the water sits upon the π cloud with the positive end of its dipole oriented towards the ring and its two hydrogens lie 0.1 nm closer to the plane of the ring than van der Waals contact should allow.⁵⁷ All of these features suggest that a hydrogen bond has been formed. There are crystallographic studies of other complexes that also suggest that hydrogen bonds between a hydroxyl group and the π electrons of an aromatic ring do form,⁵⁸,⁵⁹ and theoretical calculations suggest that a hydrogen bond between an amido nitrogen–hydrogen and a phenyl ring could be as much as half as strong as a normal hydrogen bond between an amido nitrogen–hydrogen and a σ lone pair of electrons.⁶⁰

Associated with any hydrogen bond are two wells of potential energy (Figure 5–12), the well of potential energy for a proton within the lone pair of the donor and the well of potential energy for a proton within the lone pair of the acceptor. As there is only one proton between the donor and the acceptor, only one of the two wells is occupied at any given instant. When the proton is located in a particular well, it is participating in a covalent bond with the heteroatom to which the well belongs. When in that covalent bond the proton cannot have an energy less than that of the lowest or first vibrational energy level, and it is usually occupying that level. Because energy is quantized, the energy of the first vibrational level is above the bottom of the well of potential energy. The energy of the first vibrational level is the zero-point energy of the bond to which the well applies.

[Figure 5–12: two panels of overlapping wells of potential energy; surviving labels are "Zero point", d_HH, and H]

Figure 5–12: Overlap of wells of potential energy for the covalent bonds between the proton and the heteroatom of the donor (left panel) and between the proton and the heteroatom of the acceptor (right panel) in a hydrogen bond. In a case where the values of pKa for donor and acceptor are matched, the zero-point energies (thin horizontal lines) are the same. (Left panel) If the distance between donor and acceptor is long, the intersection of the two wells of potential energy is above the zero-point energy and there is a barrier to transfer of the proton between the wells (arrow pointing to the left). The proton divides its time between the wells and the two mean positions it assumes are separated by d_HH, the distance between the bottoms of the two wells. (Right panel) If the distance between donor and acceptor is short, the intersection between the two wells is below the zero-point energy and the barrier to transfer between the wells (arrow pointing to the left) is no longer effective. The proton (H) is found halfway between donor and acceptor.

In the separated donor and acceptor, these two wells of potential energy are also present and are the wells of potential energy associated with protonating the acceptor or protonating the conjugate base of the donor. As the donor and acceptor approach each other, these wells of potential energy overlap. The point of their intersection (Figure 5–12) is the height of the barrier of potential energy that must be crossed if the proton is to be transferred from donor to acceptor (Equation 5–30). The more closely the heteroatom of the donor and the heteroatom of the acceptor approach each other, the lower will be this barrier.

The difference in the zero-point energies between the well for the donor and the well for an oxygen–hydrogen bond in the hydronium ion is the standard enthalpy change associated with the pKa of the donor; the difference between the zero-point energies of the well for the conjugate acid of the acceptor and the well for the hydronium ion is the standard enthalpy change associated with the pKa of the acceptor; and the difference in zero-point energies of the well for the donor and the well for the acceptor is the standard enthalpy change associated with the difference in pKa (ΔpKa) between them. If the difference in pKa between donor and acceptor is small, the two wells of potential energy will have about the same minimum; or better yet, if the acceptor is the conjugate base of the donor, the two wells of potential energy will be mirror images of each other. In such situations, as the heteroatoms are brought closer together, the barrier between them decreases rapidly until it is equal to or less than the zero-point energy (Figure 5–12). When this occurs, the two wells become continuous, as far as the proton is concerned, even though there are still two minima of potential energy. Hydrogen bonds in which the distance between the heteroatoms of donor and acceptor approaches but does not necessarily reach this point at which the barrier vanishes are low-barrier hydrogen bonds.

In a hydrogen bond in which the distance between donor and acceptor has become short enough that the barrier has vanished, the two wells have become one and the proton necessarily occupies a position midway between the two heteroatoms.⁶¹ A number of such hydrogen bonds have been observed by neutron diffraction in the crystalline state. When the same hydrogen bond in which the proton is found to be centered in the crystalline state is formed in solution, however, the proton is usually not centered⁶²,⁶³ because solvation of the bond is more favorable for the situation in which the proton is closer to one of the heteroatoms than to the other.⁶⁴ It is as if solvation has recreated the barrier, probably by biasing the relative energies of the occupied and unoccupied wells at a given instant even if they are identical when unoccupied. When the proton is transferred to the other heteroatom, the change in solvation causes the levels of the wells to switch. Because water strongly solvates dipoles, a barrierless hydrogen bond in
The Hydrogen Bond 209
which the proton is centered between the heteroatoms and in which the distinction between donor and acceptor has disappeared rarely if ever exists in water.63

The length of the covalent bond A–H between the proton and the heteroatom of the donor is longer when it is in a hydrogen bond than when it is not. In a series of hydrogen bonds of the same type (Figure 5–13),65 regardless of whether they are intermolecular or intramolecular examples of the class,65 as the distance dAB between the two heteroatoms in a hydrogen bond decreases, the length of the bond between the proton and the heteroatom of the donor increases66,67 from its length when it is not hydrogen-bonded (horizontal dashed lines) until the bond becomes so short that the proton sits halfway between donor and acceptor (Figure 5–12). There are several physical measurements that register this increase in the length of the bond between the proton and the heteroatom of the donor as the bond shortens.

[Figure 5–13: plot of d(O–H) (nm), 0.09 to 0.12, against d(H···O) (nm), 0.12 to 0.20.]
Figure 5–13: Length of the bond between the proton and the oxygen atom of a donor [d(O–H)] in a hydrogen bond between two oxygens as a function of the distance between the oxygen atom of the acceptor and the proton [d(H···O)].65 Crystallographic molecular models of complexes containing either intermolecular or intramolecular hydrogen bonds between two oxygens were retrieved from the Cambridge Crystallographic Database. The types of complexes collected were carboxylic acid–carboxylates, metal oximes, inorganic acid salts, hydronium hydroxyls, β-diketone enols, carboxylic–carboxylics, alcohols, and ice Ih. The dashed diagonal line in the upper left-hand corner is drawn for d(O–H) = d(H···O). In the shortest hydrogen bonds, d(O–H) does equal d(H···O), the proton sits halfway between donor and acceptor, and donor and acceptor are indistinguishable. As the hydrogen bond increases beyond a length of about 0.24 nm, the proton is closer to the more basic oxygen, and donor and acceptor become distinguishable. The horizontal dashed lines indicate the range of the values for the length of an oxygen–hydrogen bond in an isolated, non-hydrogen-bonded molecule in the gas phase.

The movement of the proton away from the heteroatom of the donor that occurs as the hydrogen bond is formed decreases the electron density of the covalent bond surrounding that proton, and this deshielding shifts its peak of absorption downfield in a nuclear magnetic resonance spectrum. In a series of hydrogen bonds of the same type, as the hydrogen bond becomes shorter and the proton moves farther away from the heteroatom of the donor and becomes even more deshielded, its chemical shift becomes even larger.68 An absorbance in nuclear magnetic resonance spectroscopy between 16 and 24 ppm for a proton demonstrates that it is in a low-barrier hydrogen bond. For example, chemical shifts of 20.5 ppm for the proton between the two oxygens in hydrogen maleate monoanion (5–2), of 16.1 ppm for the proton between the two oxygens of the enol of 2,4-dioxopentane (5–3), and 18.5 ppm for the proton between the two nitrogens in hydrogen 1,8-diamino-N,N,N′,N′-tetramethylnaphthalene monocation (5–4), each measured in organic solvents,69 indicate that these are low-barrier hydrogen bonds, as do their lengths (0.241, 0.243–0.251, and 0.258 nm, respectively).70–75

[Structures 5–2, 5–3, and 5–4: drawings not recoverable.]

When the donor enters into a hydrogen bond and its A–H bond becomes longer, the force constant for its stretching vibration becomes smaller and the frequency at which it absorbs infrared light becomes lower than when it is not in a hydrogen bond. Consequently, the peak of absorption for the A–H bond of the donor when it is in a hydrogen bond appears in the infrared spectrum at a lower frequency than the peak of absorption for the free A–H bond of the donor. The existence of these two distinct peaks of absorption allows the concentrations of bonded and unbonded donor to be quantified (Problem 5–7). In a series of hydrogen bonds of the same type, as the hydrogen bond becomes shorter and the A–H bond becomes longer, the stretching frequency of the A–H bond decreases.

210 Noncovalent Forces

The fractionation factor f is the equilibrium constant defined by

\[
f = \frac{[\mathrm{AD{\cdots}B}]\,[\mathrm{L_2O{\cdots}HOL}]}{[\mathrm{AH{\cdots}B}]\,[\mathrm{L_2O{\cdots}DOL}]} \tag{5–31}
\]

where H is protium, D is deuterium, and L is either protium or deuterium. AD···B is the hydrogen bond between the deuterated donor and the acceptor, L2O···HOL is a hydrogen bond between two molecules of water in which a proton is within the hydrogen bond, AH···B is the hydrogen bond between undeuterated donor and acceptor, and L2O···DOL is a hydrogen bond between two molecules of water in which a deuteron is within the hydrogen bond. A fractionation factor of less than 1 indicates that a proton has a greater preference than a deuteron for sitting in the hydrogen bond being examined, relative to the preferences of proton and deuteron for sitting in a hydrogen bond between two molecules of water. The fractionation factor scales the relative preferences of proton and deuteron for any hydrogen bond to their relative preferences for the reference hydrogen bond between two water molecules, much as the acid dissociation constant scales the basicity of any lone pair of electrons to the basicity of a lone pair of electrons on a molecule of water. The fractionation factor is measured by following the concentration of the protonated form of the hydrogen bond of interest as the mole fraction of H2O is varied in mixtures of H2O and D2O.76

In a series of hydrogen bonds of the same type, as the distance between the heteroatoms of donor and acceptor decreases, so does the fractionation factor.77 This decrease states that as the hydrogen bond becomes shorter, the proton has a greater and greater preference for its occupation relative to that of a deuteron. A value of less than 1 indicates that the hydrogen bond is a short, low-barrier hydrogen bond. The fractionation factors for the hydrogen bonds in hydrogen maleate monoanion (5–2) and hydrogen 1,8-diamino-N,N,N′,N′-tetramethylnaphthalene monocation (5–4) are 0.84 and 0.90 in water.78,79 The fractionation factor for aqueous FHF⁻, which contains one of the shortest hydrogen bonds, is 0.60.80 Fractionation factors for hydrogen bonds in organic solvents, however, can be as small as 0.4.81

The chemical shift, the stretching frequency, and the fractionation factor all monitor the length of the hydrogen bond. They do not, however, provide any indication of its strength.

The strength of a hydrogen bond is expressed in thermodynamic parameters. The standard enthalpy of formation, or the heat released when the bond forms, is a measure of the electronic strength of the bond. It is usually the property that is referred to when the strength of the bond is discussed indiscriminately. The standard free energy of formation determines the degree to which the hydrogen bond will be favored over the unbonded reactants. Its magnitude is complicated by the fact that it is a function of both the standard enthalpy of formation, the electronic term, and the standard entropy of formation, the quantitative measure of the total change in disorder occurring during the reaction. The standard entropy of formation is usually a negative term because order increases when hydrogen bonds are formed. It is also affected significantly by changes in solvation. In addition, the standard entropy of formation depends on the choice of units for concentration because of the entropy of mixing. The association equilibrium constant, which is usually the quantity that is directly measured, is connected directly to the standard free energy of formation, not to the standard enthalpy of formation.

Ordinarily, the values of these thermodynamic properties are obtained systematically.46 A method of measurement, such as infrared spectroscopy, is used to provide values for the molar concentration of free donor, [HA], the molar concentration of free acceptor, [B:], and the molar concentration of hydrogen bonds, [B···HA], in a solution. The total concentrations of donor and acceptor are systematically varied at a given temperature, and the experimental association equilibrium constants are measured for each set of concentrations:

\[
K_{\mathrm{AHB}} = \frac{[\mathrm{B{\cdots}HA}]}{[\mathrm{B{:}}]\,[\mathrm{HA}]} \tag{5–32}
\]

From the association equilibrium constant at a particular temperature converted into the proper units (Equation 5–13), the standard free energy of formation of the hydrogen bond can be calculated (Equation 5–14). The variation of the equilibrium constant with temperature is determined experimentally, and from these observations, the standard enthalpy of formation of the hydrogen bond can be calculated:

\[
\left( \frac{\partial \ln K_{\mathrm{AHB}}}{\partial T^{-1}} \right)_{P} = -\frac{\Delta H^{\circ}_{\mathrm{AHB}}}{R} \tag{5–33}
\]

Finally, the standard entropy of formation is calculated from the experimental results by the relationship

\[
\Delta S^{\circ} = \frac{\Delta H^{\circ} - \Delta G^{\circ}}{T} \tag{5–34}
\]

The standard enthalpies of formation for hydrogen bonds between uncharged donors and acceptors of biological interest (Table 5–2) lie between –12 and –23 kJ (mol of bond)⁻¹ when the donor and acceptor are dissolved in organic solvents such as CCl4 or benzene. In spite of these favorable standard enthalpies of formation, the equilibrium constants for the formation of the complexes in a similar set of hydrogen bonds, disregarding those that involve two hydrogen bonds, are quite small when expressed in units of molarity⁻¹ (Table 5–3). When expressed in units of corrected volume fraction (Equations 5–5 and 5–13) to eliminate entropy of mixing, the values are somewhat larger. The small magnitude of these values results from the fact that the negative standard enthalpy of formation is canceled to a considerable degree by a negative standard entropy of formation because even though the correction for volume fraction takes care of the entropy change involved in their finding each other, the two molecules still must reach the proper relative orientations so that the bond can form. Even in the best of circumstances, a hydrogen bond is a weak interaction.

The standard enthalpy of formation of a hydrogen bond is a function of the difference in pKa between the
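The procedure described by Equations 5–32 to 5–34 can be sketched numerically. What follows is only an illustration under stated assumptions, not the analysis used in ref 46: the association constants are synthetic (generated from an assumed enthalpy and entropy rather than measured), they are taken to be already expressed in dimensionless units of corrected volume fraction, and the function name `thermodynamics_from_vant_hoff` is hypothetical. The slope of ln K against 1/T gives –ΔH°/R (Equation 5–33), ΔG° is –RT ln K, and ΔS° follows from Equation 5–34.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def thermodynamics_from_vant_hoff(temps_K, K_values, T_ref):
    """Least-squares fit of ln K = intercept + slope * (1/T).

    The slope equals -dH/R (Equation 5-33). Returns (dG, dH, dS) at T_ref,
    in kJ mol^-1, kJ mol^-1, and kJ mol^-1 K^-1, respectively.
    """
    x = [1.0 / T for T in temps_K]
    y = [math.log(K) for K in K_values]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    dH = -R * slope                      # Equation 5-33
    ln_K_ref = intercept + slope / T_ref
    dG = -R * T_ref * ln_K_ref           # dG = -RT ln K
    dS = (dH - dG) / T_ref               # Equation 5-34
    return dG, dH, dS


# Synthetic "measurements": assumed values chosen only for illustration.
ASSUMED_DH = -20.0    # kJ mol^-1
ASSUMED_DS = -0.050   # kJ mol^-1 K^-1
temps = [283.15, 298.15, 313.15, 328.15]
Ks = [math.exp(-(ASSUMED_DH - T * ASSUMED_DS) / (R * T)) for T in temps]

dG, dH, dS = thermodynamics_from_vant_hoff(temps, Ks, T_ref=298.15)
print(f"dG = {dG:.2f} kJ/mol, dH = {dH:.2f} kJ/mol, dS = {dS * 1000:.1f} J/(mol K)")
```

With these assumed inputs the favorable enthalpy is largely offset by the unfavorable entropy term, which is the cancellation the text describes.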
[The structural drawings of the hydrogen-bonded complexes in the first column are not recoverable; the remaining columns of each row are given below.]
[complex]    CCl4       25    0.64    8.1
[complex]    CCl4       21    2.3     19
[complex]    CCl4       20    55      480
[complex]    CCl4       25    1.7     17
[complex]    benzene    25    6.2     60
(a) Values are copied directly from tables in ref 46. (b) Values for the association constant are given in the dimensionless units of corrected volume fraction (Equation 5–13).
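As a worked illustration, the dimensionless association constants in the table can be converted into standard free energies of formation with ΔG° = –RT ln K. The sketch below rests on assumptions about the column layout, which the surviving rows do not label: the number after the solvent is taken to be the temperature in °C, the next to be the association constant in M⁻¹, and the last to be the association constant in corrected volume fraction (the quantity footnote b describes). The function name is hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def standard_free_energy(K, temp_celsius):
    """Return dG = -RT ln K (kJ mol^-1) for a dimensionless constant K."""
    T = temp_celsius + 273.15
    return -R * T * math.log(K)


# (solvent, temperature in deg C, K in M^-1, K in corrected volume fraction)
# Column meanings are assumed, as explained above.
rows = [
    ("CCl4", 25, 0.64, 8.1),
    ("CCl4", 21, 2.3, 19),
    ("CCl4", 20, 55, 480),
    ("CCl4", 25, 1.7, 17),
    ("benzene", 25, 6.2, 60),
]

free_energies = [standard_free_energy(K_vf, t) for (_, t, _, K_vf) in rows]

for (solvent, t, K_molar, K_vf), dG in zip(rows, free_energies):
    print(f"{solvent:8s} {t:3d} C   K = {K_vf:6.1f}   dG = {dG:6.1f} kJ/mol")
```

Under these assumptions every row gives a negative but modest ΔG°, between roughly –5 and –15 kJ mol⁻¹.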
the pKa of the donor is equivalent to the pKa of the conjugate acid of the acceptor. If the acidity of the donor is increased further, the proton will be transferred between donor and acceptor, the conjugate acid of the former acceptor becomes the new donor, and the conjugate base of the former donor becomes the new acceptor. If the pKa of the former donor is decreased below the pKa of the conjugate acid of the former acceptor, the standard enthalpy of formation for the hydrogen bond, because donor and acceptor have switched roles, will begin to increase. A corresponding argument could be made for the situation in which the donor remains the same and a series of acceptors, the conjugate acids of which increase in pKa, is examined. From these considerations and the fact that ΔG°AHB tracks ΔH°AHB, it follows that the strength of the hydrogen bond, as measured by its association equilibrium constant KAHB, is determined by the difference in pKa between the donor and the conjugate acid of the acceptor (Figure 5–14); the smaller the difference, the stronger the bond. If this is the case, it must follow that the strongest possible hydrogen bond in a given series is the one in which the pKa of the donor is equal to the pKa of the conjugate acid of the acceptor. The most obvious examples of such a hydrogen bond are those between a donor and its conjugate base.

It is important to note, however, that such a symmetric hydrogen bond, even one between a donor and its conjugate base, is no stronger than would be pre-
[Structure 5–5 and plot residue: drawings not recoverable; surviving axis labels: KAHB and Energy (kJ mol⁻¹).]
hydrogen bonds are special cases that are irrelevant to hydrogen bonds in a molecule of protein at neutral pH. Other hydrogen bonds between acids and their conjugate bases, however, are more relevant to the acids and bases found in a protein (Table 5–4). The hydrogen bond between an acid of a type found in proteins and its conjugate base is about 0.02–0.03 nm shorter than the hydrogen bond between the neutral acid and itself or an equivalent functional group on a different molecule (Table 5–1) because it is a stronger hydrogen bond (Figure 5–14). It seems reasonable that such strong, short hydrogen bonds (for example, that between the imidazole of histidine and the imidazolium of another histidine or between the carboxylate of a glutamate and the carboxylic acid of another glutamic acid) should be found in crystallographic molecular models of proteins, but they are rarely observed, probably because the neutralization of the cationic acid or the anionic base required to produce the acceptor or the donor, respectively, requires more free energy at neutral pH than would be gained by forming the stronger hydrogen bond.

Table 5–4: Lengths of Hydrogen Bonds between Acids and Their Conjugate Bases(a)

acid or conjugate base                bond length (nm)
p-nitrophenol93                       0.246
8-hydroxyquinoline94                  0.243
pentachlorophenol95                   0.244
                                      0.248(b)
1-(p-hydroxyphenyl)thianium96         0.247
cyclohexylamine97                     0.280
1,10-diaminodecane98                  0.280(c)
N,N,N-tris(2-aminoethyl)amine99       0.280(c)
                                      0.285
N-methylimidazole100                  0.265

(a) Values presented in this table were gathered during a search of the Cambridge Structural Database by Dr Hens Borkent at the Catholic University of Nijmegen. (b) Two different hydrogen bonds in the same unit cell. (c) Hydrogen bond between two monocations of the diamine or triamine, respectively. (d) Hydrogen bond between two hydrogen succinates.

It is also possible to shorten a hydrogen bond by physically compressing it within a covalent framework. A number of intramolecular hydrogen bonds are short because of such compression. For example, the hydrogen bond in hydrogen maleate monoanion* (0.241 nm)70–72 is shorter than the unconstrained hydrogen bond between two hydrogen fumarate monoanions* (0.247 nm),102–105 and that in hydrogen 1,8-diamino-N,N,N′,N′-tetramethylnaphthalene cation (0.258 nm) is shorter than the unconstrained hydrogen bond between a dimethylalkylamine and its dimethylalkylammonium cation* (0.264 nm).106 Both of these short intramolecular hydrogen bonds display downfield chemical shifts (20.5 and 18.5 ppm) and fractionation factors less than 1 (0.84 and 0.90). In a comparison between intramolecular and intermolecular hydrogen bonds between carboxylate anions and the corresponding carboxylic acid or between enols of β-diketones and the corresponding carbonyl oxygen,65 the ranges of lengths of intramolecular hydrogen bonds (0.239–0.242 and 0.243–0.255 nm, respectively) were about 0.006 nm shorter than those of the intermolecular hydrogen bonds (0.244–0.249 and 0.246–0.265 nm, respectively).

In such intramolecular situations where a hydrogen bond is constrained by the framework of the molecule to be shorter, this compressed hydrogen bond must have a less negative standard enthalpy of formation relative to an equivalent, intermolecular, uncompressed one. This conclusion follows from the fact that repulsive potential energy has to be overcome to compress the bond (Figure 5–15). The energy necessary to compress the bond is provided by the covalent framework of the molecule. It follows that an intramolecular hydrogen bond that is shorter than an equivalent intermolecular hydrogen bond must be weaker than that intermolecular hydrogen bond even though it has a lower barrier to proton transfer (Figure 5–12).

The measurements available for the free energies of formation for such intramolecularly shortened hydrogen bonds confirm this expectation. The hydrogen bond in hydrogen maleate anion in dimethyl sulfoxide is only –18 kJ mol⁻¹ more stable than that in neutral maleic acid,107 about what one would expect for the difference in standard enthalpy of formation for two hydrogen bonds the acceptors of which differ in pKa by 10 units. In water, the difference is only –2 kJ mol⁻¹. The hydrogen bond between the carboxylic acid and the carboxylate anion in 5–6

[Structure 5–6: drawing not recoverable.]
mation differs from the standard free energy of formation for the hydrogen bond in the homologous acid amide, in which an NH2 replaces the OH, by only –10 kJ mol⁻¹ in benzene (εr = 2.3) and –6 kJ mol⁻¹ in dichloromethane (εr = 8.9).108 These differences, if anything, are less than expected for the differences in free energies of formation in these two solvents for two hydrogen bonds the donors of which differ by so much in pKa. The proton in the hydrogen bond in zwitterionic cis-urocanic acid

[Structure of zwitterionic cis-urocanic acid: drawing not recoverable.]

relieves the repulsion. It comes as no surprise that the activation energy for the exchange of the proton in such a confined location is much higher than that for a proton in an unconstrained, intermolecular hydrogen bond.110

Whether an intermolecular hydrogen bond is shortened by its strength or an intramolecular hydrogen bond is shortened by the compression exerted by the molecular framework to which it is covalently attached, the same increase in the overlap of the wells of potential energy for the proton associated with the heteroatoms of donor and acceptor (Figure 5–12) and the same lowering of the barrier occur. Both strong intermolecular hydrogen bonds and compressed intramolecular bonds can be low-
ones, and stronger, shorter hydrogen bonds may have more covalent character.65,80
[Figure 5–16: molecular orbital diagram for atoms A, H, and B; drawing not recoverable.]
Figure 5–16: Molecular orbitals for a covalent hydrogen bond. The molecular orbitals are for a symmetric hydrogen bond in which the donor and the conjugate acid of the acceptor are equivalent. The covalent molecular orbital system is formed from two sp² or sp³ orbitals, one from atom A and one from atom B, and the s orbital on hydrogen. These three atomic orbitals combine to form the three molecular orbitals (bonding, nonbonding, and antibonding) shown in the middle of the diagram. The final molecular orbital system is constructed formally in steps by first mixing the atomic orbitals of atom A and the hydrogen to form the two molecular orbitals (bonding and antibonding) of the A–H covalent bond and then mixing the A–H molecular orbital system with the atomic orbital on atom B containing the lone pair of electrons.

the less advantage will there be in their combination to form the hydrogen bond. The higher the relative permittivity of the solvent, the weaker will be the bond. One way to quantify this effect80 is to compare the slopes of correlations between the free energies of formation of a set of hydrogen bonds and either the values for pKa of the donor (Figure 5–14) or the values for pKa of the conjugate acid of the acceptor (Equation 5–37). In water (εr = 78 at 25 °C), the slope for such a correlation for hydrogen bonds between phenolate ion and a set of ammonium ions (Figure 5–17)114 is 0.6 kJ mol⁻¹ (unit of pKa)⁻¹ and that for a correlation114 of hydrogen bonds between ethylenediammonium dication and a series of phenolate ions is –0.9 kJ mol⁻¹ (unit of pKa)⁻¹. The magnitudes of these slopes are less than the 3.1 kJ mol⁻¹ (unit of pKa)⁻¹ for the correlation of hydrogen bonds between 4-nitrophenolate or 3,4-dinitrophenolate ion (Figure 5–14) and a series of phenols in tetrahydrofuran (εr = 7.5) or the –1.3 kJ mol⁻¹ (unit of pKa)⁻¹ for the correlation (Equation 5–37) of hydrogen bonds between fluorophenol and a diverse set of bases in CCl4 (εr = 2.2).

[Figure 5–17: plot against pKa,AH + log(p/q), 8 to 16; drawing not recoverable.]
Figure 5–17: The apparent association equilibrium constants (Kapp,AHB) for a series of hydrogen bonds between the phenolate ion and a series of aliphatic ammonium ions in aqueous solution as a function of the acid dissociation constants Ka,AH for those ammonium ions.114 The associations between the phenolate ion and the ammonium ions were followed spectrophotometrically by changes in absorbance at 300 nm as ammonium ion was added to an aqueous solution of phenolate ion at 2 M ionic strength and 25 °C. The values of the apparent association equilibrium constants were divided by the number of protons (p) on the respective ammonium ion to convert the molar concentration of the free cation to the molar concentration of donors. The acid dissociation constants were also statistically corrected by multiplying the observed acid dissociation constants by the number of lone pairs on the conjugate base (q) and dividing by the number of protons on the conjugate acid (p), so that the corrected values are for the molar concentration of protons on the respective conjugate acid and the molar concentration of the lone pairs of electrons on the respective conjugate base. The logarithms of the apparent association equilibrium constants (in units of molarity⁻¹) are linearly correlated with the logarithms of the corrected acid dissociation constants (in units of molarity⁻¹) by a line with Brønsted coefficient of 0.15. As the same corrections would be made to both the association constant and the acid dissociation constant in converting to units of corrected volume fraction, the slope of the line would be unaffected. The value of log (Kapp,AHB p⁻¹) calculated by Equation 5–51 for an acid with a pKa equal to that of water is indicated by a filled square. The ammonium ions used were (1) hydroxylammonium ion, (2) piperazine dication, (3) sym-tetramethylethylenediammonium dication, (4) N,N,N-trimethylethylenediammonium dication, (5) ethylenediammonium dication, (6) 2-hydroxy-1,3-diaminopropane dication, (7) 1,3-diaminopropane dication, and (8) (2-hydroxyethyl)ammonium ion. Adapted with permission from ref 114. Copyright 1986 American Chemical Society.

To this point, most of the hydrogen bonds that have been discussed are those formed in aprotic organic solvents such as carbon tetrachloride or benzene. The situation changes dramatically when the donor and acceptor are dissolved in water because of the competition of the donors and acceptors of the water molecules themselves. Pauling and Pressman115 noted that the standard free energy of formation of a hydrogen bond in water must be the difference between its own standard free energy of formation and the free energies of formation of the hydrogen bonds of its donor and acceptor with water.
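The slopes quoted above for water, tetrahydrofuran, and CCl4 can be put on a common, dimensionless footing by dividing each by 2.303RT, which converts kJ mol⁻¹ per unit of pKa into the change in log K per unit of pKa. The following sketch does this for the four values given in the text. It assumes 25 °C for every solvent (the text states the temperature only for water), compares magnitudes only, and uses illustrative variable names.

```python
import math

R = 8.314462618e-3   # gas constant in kJ mol^-1 K^-1
T = 298.15           # assumed temperature for all four correlations, K

# 2.303 RT: free energy corresponding to one unit of log K
per_log_unit = math.log(10) * R * T

# (relative permittivity, slope in kJ mol^-1 per unit of pKa), from the text
slopes = {
    "water: phenolate + ammonium ions": (78.0, 0.6),
    "water: ethylenediammonium + phenolates": (78.0, -0.9),
    "tetrahydrofuran: nitrophenolates + phenols": (7.5, 3.1),
    "CCl4: fluorophenol + bases": (2.2, -1.3),
}

# Magnitude of d(log K)/d(pKa) for each correlation
coefficients = {
    name: abs(slope) / per_log_unit for name, (_, slope) in slopes.items()
}

for name, (eps, _) in slopes.items():
    print(f"{name:45s} er = {eps:5.1f}  |d log K / d pKa| = {coefficients[name]:.2f}")
```

On this scale both correlations in water give coefficients below 0.2, smaller than either of the correlations in the solvents of low permittivity.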
The fact that the concentrations of donors and acceptors in water are both 110 M is a sufficient observation in itself to lead to the conclusion that the hydrogen bond between a solute A–H and a solute :B would be unlikely to form.

The intermolecular hydrogen bond between the nitrogen–hydrogen bond of an amide and the lone pair of electrons on the acyl oxygen of another amide can be used again as an example of the majority of the hydrogen bonds in biological macromolecules. When N-methylacetamide is dissolved in carbon tetrachloride or dioxane, an absorption appears in the infrared spectra of the two solutions that can be assigned116 to the stretching vibration of the hydrogen–nitrogen bond in a hydrogen bond of the structure

[Structure 5–8: hydrogen bond between two molecules of N-methylacetamide; drawing not recoverable.]

Hydrogen bond 5–8 is the one supposedly holding α helices and β structure together in a molecule of protein and the base pairs together in DNA. From calorimetric measurements, it can be calculated that the standard free energy of formation of this hydrogen bond117 at 25 °C in CCl4 is –16 kJ mol⁻¹, when an infinitely dilute solution of N-methylacetamide is defined as the standard state and the association equilibrium constant is expressed in units of corrected volume fraction (Equation 5–13). When N-methylacetamide is dissolved in water, however, the infrared absorption arising from hydrogen bond 5–8 can barely be detected even at a concentration of 12.5 M. From the small absorption that was observed, the standard free energy of formation of the hydrogen bond in aqueous solution was judged116 to be about +7 kJ mol⁻¹ at 25 °C, again in units of corrected volume fraction.

A more complete picture of the situation is gained
amido nitrogen–hydrogen and one acyl carbon–oxygen (indicated in Figure 5–18 by N–H and C=O:). The complete standard free energy diagram for the hydrogen bond (Figure 5–18)117–119 suggests that the standard free energy of transfer of a hydrogen-bonded amido nitrogen–hydrogen and acyl carbon–oxygen from water to CCl4 at 25 °C is around +2 kJ mol⁻¹, a value that registers the polarity of the hydrogen bond. The most significant difference between CCl4 and H2O, however, is the high stability of the separated donors and acceptors in the H2O. The large unfavorable standard free energy of transfer for the amido group from water to CCl4 reflects the necessity to break hydrogen bonds between it and the water before the transfer can occur.

[Figure 5–18: standard free energy diagram relating N–H + O=C and N–H···O=C in H2O and in CCl4; labels recovered from the drawing: –16 kJ mol⁻¹ and +25 kJ mol⁻¹.]

If this is the case, the formation of the hydrogen bond in water must be written, in analogy to Equation 5–1, as

\[
\mathrm{N{-}H{\cdots}OH_2} + \mathrm{C{=}O{\cdots}HOH}
\;\overset{K_{\mathrm{AHWBW}}}{\rightleftharpoons}\;
\mathrm{N{-}H{\cdots}O{=}C} + \mathrm{HOH{\cdots}OH_2} \tag{5–39}
\]

\[
\begin{array}{ccc}
\mathrm{N{-}H{\cdots}OH_2} + \mathrm{C{=}O{\cdots}HOH} & \rightleftharpoons & \mathrm{N{-}H{\cdots}O{=}C} + \mathrm{HOH{\cdots}OH_2} \\[4pt]
\updownarrow\ \dfrac{1}{K_{\mathrm{ass,AHW}}},\ \dfrac{1}{K_{\mathrm{ass,BW}}} & & \updownarrow\ \dfrac{1}{K_{\mathrm{ass,AHB}}},\ \dfrac{1}{K_{\mathrm{ass,WW}}} \\[4pt]
\mathrm{N{-}H} + \mathrm{H_2O} + \mathrm{O{=}C} + \mathrm{H_2O} & \rightleftharpoons & \mathrm{N{-}H} + \mathrm{O{=}C} + \mathrm{H_2O} + \mathrm{H_2O}
\end{array} \tag{5–40}
\]
where the association constant for any of the complexes, Kass,XY, is

\[
K_{\mathrm{ass,XY}} = \frac{[\mathrm{X{\cdots}Y}]}{[\mathrm{X}]\,[\mathrm{Y}]} \tag{5–41}
\]

It is entirely possible44 that the concentrations of unbonded donors and unbonded acceptors in aqueous solutions are negligible and that only the upper part of Equation 5–40 is thermodynamically relevant.

The equilibrium constant for the upper part of Equation 5–40, KAHWBW, is defined by

\[
K_{\mathrm{AHWBW}} = \frac{[\mathrm{B{\cdots}HA}]\,[\mathrm{H_2O{\cdots}H_2O}]}{[\mathrm{B{\cdots}H_2O}]\,[\mathrm{H_2O{\cdots}HA}]} \tag{5–42}
\]

where HA is the amido nitrogen–hydrogen bond and B: is a lone pair of electrons on the acyl oxygen. The equilibrium constant actually observed, Kapp,AHB, is

\[
K_{\mathrm{app,AHB}} = \frac{[\mathrm{B{\cdots}HA}]}{\left([\mathrm{B{\cdots}H_2O}] + [\mathrm{B{:}}]\right)\left([\mathrm{H_2O{\cdots}HA}] + [\mathrm{HA}]\right)} \tag{5–43}
\]

from which it follows that

\[
K_{\mathrm{app,AHB}} = K_{\mathrm{ass,AHB}}
\left( \frac{1}{1 + \dfrac{K_{\mathrm{ass,AHW}}}{(K_{\mathrm{ass,WW}})^{1/2}}\,[\mathrm{H_2O{\cdots}H_2O}]^{1/2}} \right)
\times
\left( \frac{1}{1 + \dfrac{K_{\mathrm{ass,WB}}}{(K_{\mathrm{ass,WW}})^{1/2}}\,[\mathrm{H_2O{\cdots}H_2O}]^{1/2}} \right) \tag{5–44}
\]

If, as is reasonable, Kass,AHW > (Kass,WW)^1/2, Kass,WB > (Kass,WW)^1/2, and [H2O···H2O] > 1 M, it follows that

\[
K_{\mathrm{app,AHB}} \cong \frac{K_{\mathrm{AHWBW}}}{[\mathrm{H_2O{\cdots}H_2O}]} \tag{5–45}
\]

where [H2O···H2O] is the molar concentration of hydrogen bonds in the water.

The difference in pKa between the nitrogen–hydrogen bond in N-methylacetamide as a donor (pKa = 16) and the oxygen–hydrogen bond in water as a donor (pKa = 15.7) should be negligible, as should be the difference between the lone pair of electrons on the acyl oxygen of N-methylacetamide (pKa = –0.6) and the lone pair of electrons on water (pKa = –1.7) as acceptors. Therefore, the standard enthalpy of formation for the following four hydrogen bonds should be similar

[Structures 5–9, 5–10, 5–11, and 5–12: drawings not recoverable.]

and the standard enthalpy change for the upper part of Equation 5–40 should be near zero. If the upper part of Equation 5–40 were isoentropic as well as isoenthalpic, so that KAHWBW ≅ 1 (Equation 5–42), then Kapp,AHB would be equal to [H2O···H2O]⁻¹. The observed apparent association constant for the formation of hydrogen bond 5–8 in aqueous solution when expressed in units of reciprocal molarity116 is about (190 M)⁻¹, which is in the range expected for the reciprocal of the concentration of hydrogen bonds in pure water, [H2O···H2O] < 110 M. The conclusion to be drawn from these considerations is that a hydrogen bond in aqueous solution will always have a small apparent association constant and a large apparent standard free energy of formation because the concentration of hydrogen bonds between water molecules in the solution is a hidden and significant term in that apparent association constant (Equation 5–45).

The hydrogen bond represented by that of N-methylacetamide accounts for the majority of those found in proteins and nucleic acids, and yet its standard free energy of formation is positive by a considerable degree. From this it follows that each hydrogen bond of this type in a protein or a nucleic acid is an energetic liability rather than an asset. It is possible, however, that some other combination of donor and acceptor might produce a hydrogen bond strong enough to overcome the competition of the water and provide a negative standard free energy of formation. In assessing this possibility, it would be useful to have an equation that could be used to estimate the apparent
equilibrium constant, Kapp,AHB, for the formation of any hydrogen bond in aqueous solution. Such an equation has been derived44 and has been demonstrated to be reliable.114

Consider the hydrogen bond

\[
\Delta H^{\circ}_{\mathrm{AHWBW}} = n_{\cdots\mathrm{H}}\,(\sigma_{\mathrm{A}} - \sigma_{\mathrm{OH}})\,(\sigma_{\mathrm{B}} - \sigma_{\mathrm{H_2O}}) \tag{5–48}
\]

where σOH is the σ constant for OH taking the place of A in 5–13 and σH2O is the σ constant for H2O taking the place of B in 5–13. As the values of the σ constants are proportional to the values of pKa for the appropriate acids

\[
\Delta H^{\circ}_{\mathrm{AHWBW}} = 2.303\,RT\,\tau\,(\mathrm{p}K_{\mathrm{a,HA}} - \mathrm{p}K_{\mathrm{a,HOH}})\,(\mathrm{p}K_{\mathrm{a,HB}} - \mathrm{p}K_{\mathrm{a,H_3O^+}}) \tag{5–49}
\]

where τ incorporates the constants of proportionality. If it is assumed for the moment that the standard entropy change for the upper part of Equation 5–40 is negligible and that differences in standard enthalpy are the only significant determinants of the relative strengths of the hydrogen bonds being considered, then

\[
-\log K_{\mathrm{AHWBW}} = \tau\,(\mathrm{p}K_{\mathrm{a,HA}} - \mathrm{p}K_{\mathrm{a,HOH}})\,(\mathrm{p}K_{\mathrm{a,HB}} - \mathrm{p}K_{\mathrm{a,H_3O^+}}) \tag{5–50}
\]

and44

[Structure 5–14: drawing not recoverable.]

where the substituents X and R were various electron-donating and electron-withdrawing groups chosen to vary the values of pKa for the donor and acceptor. The logarithms of the association equilibrium constants for the formation of these hydrogen bonds varied with the pKa of either the donor (Figure 5–17) or the acceptor as predicted by Equation 5–51. Extrapolating the relationships to either pKa,HA = pKa,HOH or pKa,HB = pKa,H3O+ gave the same value, 2.0, for log [H2O···H2O]. This numerical value is a reasonable estimate for the logarithm of the concentration of hydrogen bonds in liquid water, where [H2O···H2O] < 110 M, and it is in reasonable agreement with the results gathered independently with N-methylacetamide.

The value of τ (Equations 5–49 and 5–51) at 25 °C and 2 M ionic strength was found to be 0.013, from which it follows that even the strongest possible hydrogen
220 Noncovalent Forces
bond, where the pKa of the donor equals the pKa of the acceptor, would have an association equilibrium constant in aqueous solution of considerably less than 1 M⁻¹. For example, the hydrogen bond between the lone pair of electrons on imidazole (pKa,HB = 6.4) and the nitrogen–hydrogen bond of the imidazolium cation (pKa,HA = 6.4) would have an apparent association equilibrium constant of only 0.040 M⁻¹ at 25 °C. At pH 6.4, a 2 M solution of imidazole would have a concentration of hydrogen bonds 5–15 equal to only 0.04 M.

ever, to decrease its standard free energy of formation by increasing its standard entropy of formation.

Suggested Reading
Jencks, W.P. (1987) Hydrogen Bonds, in Catalysis in Chemistry and Enzymology, Chapter 6, pp 323–350, Dover, New York.
Perrin, C.L., & Nielson, J.B. (1997) “Strong” hydrogen bonds in chemistry and biology, Annu. Rev. Phys. Chem. 48, 511–544.
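The arithmetic behind the imidazole example (an apparent association constant of 0.040 M⁻¹ giving about 0.04 M of hydrogen bonds 5–15 in a 2 M solution at pH 6.4) can be checked with a minimal sketch of the law of mass action. It assumes that, because the pH equals the pKa of 6.4, neutral imidazole and the imidazolium cation are each present at 1 M. The function name and the optional correction for depletion of the free species are illustrative choices and are not part of the original treatment.

```python
import math

def hydrogen_bond_concentration(k_app, donor_total, acceptor_total):
    """Return (simple, exact) molar concentrations of the hydrogen-bonded pair.

    simple: K_app * [donor] * [acceptor], ignoring depletion of the free species
    exact:  the root x of K_app = x / ((donor_total - x) * (acceptor_total - x))
    """
    simple = k_app * donor_total * acceptor_total
    # Rearranged mass action: K*x^2 - (K*(d + a) + 1)*x + K*d*a = 0
    a_coef = k_app
    b_coef = -(k_app * (donor_total + acceptor_total) + 1.0)
    c_coef = k_app * donor_total * acceptor_total
    discriminant = b_coef ** 2 - 4.0 * a_coef * c_coef
    # The smaller root is the physically meaningful one (x < d and x < a).
    exact = (-b_coef - math.sqrt(discriminant)) / (2.0 * a_coef)
    return simple, exact

# Values taken from the text: K_app = 0.040 M^-1; 2 M total imidazole at pH = pKa,
# so imidazolium (donor) and imidazole (acceptor) are assumed to be 1 M each.
simple, exact = hydrogen_bond_concentration(0.040, 1.0, 1.0)
print(f"neglecting depletion: {simple:.3f} M; with depletion: {exact:.3f} M")
```

Either way the hydrogen-bonded pair is only a few percent of the solute, which is the point the example makes.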
Problem 5–4: Draw structures that represent all of the possible hydrogen bonds that can form between the fol-

(B) Write the acid dissociations to which the following apparent values of pKa refer, and correct them for number of protons and number of lone pairs.

compound                              pKa1     pKa2
CH₃C(O)NH₂                            0.6      15.70
H₂O                                   –1.75    15.75
H₃C-substituted imidazole (ring)      –7.51    15.10

(C) Rank the eight hydrogen bonds in order of strength.

Problem 5–6: Draw the structure of only the most stable hydrogen bond that forms between side chains of the following pairs of amino acids. Include all relevant angles and distances around the various hydrogen bonds.
(A) glutamic acid and histidine
(B) serine and tyrosine
(C) glutamine and histidine

Problem 5–7: The panel below is an infrared spectrum of propionamide in carbon tetrachloride.124 The bands marked M1 and M2 are the absorptions of the monomeric propionamide, and the bands marked P1–P5 are absorptions of hydrogen-bonded species. As the concentration of propionamide is increased, within each of these two sets [(M1, M2) and (P1–P5)] the amplitudes of the individual absorptions remain in constant ratio to each other and must be different absorptions of the same species.

Below is a table of amplitudes from the infrared spectra for various total concentrations of propionamide at 298 K in CCl₄.

[propionamide]TOT (M)    AM1      AP1
1.71 × 10⁻³              0.415    0.055
2.15 × 10⁻³              0.509    0.083
2.43 × 10⁻³              0.566    0.102
4.72 × 10⁻³              0.981    0.308
6.90 × 10⁻³              1.32     0.558
10.40 × 10⁻³             1.79     1.020

Recall that, by Beer’s law, [monomer] = (εM1)⁻¹AM1 and [polymeric species] = (εP1)⁻¹AP1. If the hydrogen bonding is a dimerization, it should be described by the following equation:

2 propionamide ⇌ propionamide₂

$$K_{\mathrm{eq}} = \frac{[\mathrm{propionamide_2}]}{[\mathrm{propionamide}]^{2}}$$

(A) Show that the data are consistent with a dimerization.
(B) Use the data to determine Keq at 298 K in units of corrected volume fraction. (Hint: [propionamide]TOT = [propionamide] + 2[propionamide₂].)

Values of Keq were determined at a number of different temperatures.

temp (K)    Keq (M⁻¹)
303         35.5
313         24.6
329         13.3
s faces of the bases are transferred from water to a completely nonpolar environment (Figure 6–48) and that the proper tautomers for base pairing are present at the pH of the experiment.
(A) Write complete equations for the formation of an A·T pair and a G·C pair as the double helix forms, drawing in all hydrogen-bonded waters to all donors and acceptors on each side of the equation and the hydrogen bonds that form between the residual waters. Assume that the donors and acceptors of hydrogen bonds accessible to water in the major and minor grooves retain their hydrogen bonds with H₂O.
(B) Why is poly[d(GC)·d(GC)] more stable than poly[d(AT)·d(AT)]?

Problem 5–9: There are two possible hydrogen bonds that can form between the neutral form of phenol, a model for tyrosine, and the free base of imidazole, a model for histidine.
(A) Draw the full structures of both partners in both of the possible hydrogen bonds with proper hybridization on the central atoms and proper bond angles around the hydrogen bond.
(B) Which of the two possible hydrogen bonds is the more stable? Why?
(C) Estimate the standard free energy of formation of the more stable hydrogen bond at 25 °C when it is formed in aqueous solution if the statistically corrected values of pKa are 9.65 for phenol and 7.35 for the conjugate acid of imidazole.

Intramolecular and Intermolecular Processes: Molecularity and Approximation

Intramolecular chemical reactions often occur at rates much faster than equivalent intermolecular reactions, and intramolecular associations often occur with association equilibrium constants much larger than those of equivalent intermolecular associations. A particularly informative series illustrating such effects can be gathered125 from among the reactions involving intramolecular nucleophilic catalysis of the hydrolysis of phenyl esters by the carboxylate anion. The mechanism for this nucleophilic catalysis has been shown to involve the formation of an intermediate anhydride, which in the intramolecular examples such as phenyl succinate would be cyclic:

[Equation 5–53: reaction scheme in which the reactant forms tetrahedral intermediate 5–16 with equilibrium constant Keq,TI, and 5–16 then ejects phenolate with rate constant kE to give the cyclic anhydride]

The tetrahedral intermediate 5–16 that leads to the anhydride has both a phenolate and a carboxylate as potential leaving groups, and because the latter is the better leaving group, the intermediate should decompose to reactant much more frequently than to anhydride. Therefore, the reaction involves a preequilibrium between the reactant and the tetrahedral intermediate. Occasionally the phenolate is ejected from the tetrahedral intermediate in a kinetically irreversible step. It is the ejection of the phenolate that is monitored as the reaction progresses. The first-order rate constant, in units of reciprocal seconds, for the appearance of phenolate would be equal to Keq,TI kE, where Keq,TI is the equilibrium constant for the formation of the tetrahedral intermediate. If it is assumed that for all compounds in the series the rate constant kE, which in all cases is for a chemically equivalent first-order reaction, has the same value, then the differences in observed rates result from differences in Keq,TI, an equilibrium constant.

A comparison of the first-order rate constants of phenolate release for a series of intramolecular reactions125 to the estimated pseudo-first-order rate constant for the intermolecular reaction between phenyl acetate and excess acetate anion, when the catalysis by acetate anion proceeds through acetic anhydride as an intermediate,126,127 indicates how large these intramolecular increases in an association constant can be (Table 5–5). The 6 × 10⁵ increase in the association equilibrium constant for the intramolecular formation of succinic anhydride,128 when compared to the association equilibrium constant for the intermolecular formation of acetic anhydride expressed in units of corrected volume fraction, is somewhat greater than the increase seen with phenyl succinate (Table 5–5). This fact suggests that the increases listed in Table 5–5 are reasonable.

A situation similar to the intramolecular hydrolysis of phenyl esters is encountered in the alkaline hydrolyses of endo-6-hydroxybicyclo[2.2.1]heptane-endo-2-carboxamides in which a preequilibrium between reactant and tetrahedral intermediate precedes the expulsion of the amine:129
Table 5–5: Relative Preequilibrium Constants125,126 for the Formation of a Tetrahedral Intermediateᵃ

reactant(s)                                                   relative Keq,TIᵇ    TΔS°approxᶜ (kJ mol⁻¹)
acetate anion + phenyl acetate (intermolecular reference)     1.0
phenyl monoester of a dicarboxylic acid [structure]           100                 –11
phenyl monoester of a dicarboxylic acid [structure]           2 × 10⁴             –24
phenyl monoester of a dicarboxylic acid [structure]           1 × 10⁶             –34
phenyl monoester of a dicarboxylic acid [structure]           4 × 10⁶             –38

ᵃ The first-order rate constants for the ejection of phenol from the several phenyl monoesters of dicarboxylic acids (last four entries) were determined125 as a function of pH in the range pH 4–8. From the pH-rate behavior of each of these rate constants, the first-order rate constant for intramolecular nucleophilic catalysis of the respective ejection by the appended carboxylate could be calculated. From these values, the first-order rate constants for intramolecular anhydride formation could be calculated. These rate constants were determined for each phenyl monoester at 25, 30, or 35 °C. The values of these rate constants were adjusted to the same temperature and originally presented relative to the first-order rate constant for intramolecular ejection of phenol from phenyl glutarate.125 These first-order rate constants were later related to the pseudo-first-order rate constant for the intermolecular formation of acetic anhydride from excess acetate anion and phenyl acetate during the intermolecular nucleophilic catalysis of the hydrolysis of phenyl acetate by acetate anion.126
ᵇ First-order rate constants for the formation of anhydride were originally presented relative to the calculated126 pseudo-first-order rate constant for the formation of acetic anhydride from phenyl acetate and acetate anion. The latter intermolecular rate constant was in units of molarity⁻¹ second⁻¹. It has been assumed that all of the rate constants for the formation of the anhydrides are directly proportional to the equilibrium constants for the formation of the respective tetrahedral intermediates (Equation 5–53). The resulting units of molarity⁻¹ for the equilibrium constant for the formation of the tetrahedral intermediate from acetate anion and phenyl acetate were converted to units of corrected volume fraction with Equation 5–13. No correction is required for the intramolecular reactions if it is assumed that they involve only negligible changes in molar volume. All equilibrium constants for the formation of the tetrahedral intermediates are presented relative to that for the reaction of acetate anion with phenyl acetate.
ᶜ Standard entropy of approximation, calculated by Equation 5–60 with the intermolecular equilibrium constant in units of corrected volume fraction and with the assumptions that all of the rate constants are directly proportional to the equilibrium constants for the formation of the tetravalent intermediate and that ΔΔH° ≅ 0. This entropy of approximation was multiplied by 298 K.
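The last column of Table 5–5 follows from the relative constants by Equation 5–60 with ΔΔH° taken as zero, as footnote c states. The following minimal sketch repeats that arithmetic. The function name, the value R = 8.314 J K⁻¹ mol⁻¹, and the tolerance used in the comparison are assumptions of the sketch; the relative constants and the tabulated values of TΔS°approx are copied from the table.

```python
import math

R = 8.314    # gas constant, J K^-1 mol^-1
T = 298.0    # temperature named in footnote c of Table 5-5, K

def t_delta_s_approx_limit(k_ratio, delta_delta_h_kj=0.0, temperature=T):
    """Upper limit of T*dS_approx in kJ/mol from Equation 5-60.

    k_ratio is K_eq,intra / K_eq,inter; delta_delta_h_kj is ddH in kJ/mol
    (zero by default, the assumption made for Table 5-5).
    """
    return -R * temperature * math.log(k_ratio) / 1000.0 + delta_delta_h_kj

# (relative K_eq,TI, tabulated T*dS_approx in kJ/mol) for the four intramolecular entries
TABLE_5_5 = [(100.0, -11.0), (2e4, -24.0), (1e6, -34.0), (4e6, -38.0)]

for k_rel, tabulated in TABLE_5_5:
    calculated = t_delta_s_approx_limit(k_rel)
    print(f"K_rel = {k_rel:9.1e}  calculated = {calculated:6.1f}  tabulated = {tabulated:6.1f}")
```

Because the relative constants are quoted to one significant figure, the calculated values can be expected to agree with the tabulated ones only to within about 1 kJ mol⁻¹.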
presented in Table 5–5 are among the largest that have been noted for each particular size of ring formed during each of these intramolecular reactions. In some of the other instances, electronic effects are difficult to separate from the effects of approximation. For example, in the intramolecular reaction

[Equation 5–55: scheme of an intramolecular reaction at equilibrium that releases H₂O]

the two reactants are connected electronically by the π system in addition to being juxtaposed by the σ system.

At one time it was fashionable to refer to these accelerations in rate or increases in association as the result of an increase in the effective molarity of one of the reactants brought about by attaching it covalently to the other. As more exaggerated examples of this phenomenon were reported, however, the unreality of discussing concentrations of millions of molar became apparent,128 and a more reasonable view of the situation was required.128,130

In all instances in which an intramolecular association, for example, the intramolecular formation of a hydrogen bond in a folding polypeptide, is compared to an equivalent intermolecular association, the difference observed in the two equilibrium constants Keq,intra and Keq,inter is due in large part to an increase in the change in standard entropy caused simply by the fact that a unimolecular reaction is being compared to a bimolecular reaction. This increase in the standard entropy change for the intramolecular association results from the fact that the standard entropy of approximation is missing from the standard free energy change for the intramolecular association. The standard entropy of approximation, ΔS°approx, is the change in standard entropy due solely to bringing the separate reactants together into the same molecule or into the same complex, respectively, prior to the beginning of the reaction. It is a negative number because the intrinsic entropy of two separate reactants relative to the unimolecular product of a reaction is greater than the intrinsic entropy of one molecule containing the two reactants or of one complex into which the two reactants have been assembled relative to the product of the intramolecular reaction. For example, the intrinsic entropy of a free acetate anion and a free phenyl acetate is much larger relative to the tetrahedral intermediate formed when they associate than is the intrinsic entropy of a phenylsuccinate anion relative to the cyclic tetrahedral intermediate in Equation 5–53. Because the standard entropy of approximation is missing from the standard entropy change for the intramolecular reaction, owing to the fact that approximation has already been accomplished synthetically or biologically, the standard entropy change for the intramolecular reaction is more positive than the standard entropy change for the intermolecular reaction.

These relationships can be expressed in equations. These equations are not intended to reflect actual changes in standard entropy, which are often dominated by changes in solvation, but to represent the quantitative consequences of approximation underlying the enhancements in rate or equilibrium constant. The standard entropy change of the intramolecular reaction, ΔS°intra, should be related to the standard entropy change of the intermolecular reaction, ΔS°inter, by

$$\Delta S^{\circ}_{\mathrm{inter}} = \Delta S^{\circ}_{\mathrm{intra}} + \Delta S^{\circ}_{\mathrm{approx}} \qquad (5\text{–}56)$$

if the same reaction with the same change in standard enthalpy is occurring once the reactants have been approximated. If this were an adequate description of the situation and the same change in standard enthalpy did occur in each reaction, then

$$R \ln\left(\frac{K_{\mathrm{eq,intra}}}{K_{\mathrm{eq,inter}}}\right) = \Delta S^{\circ}_{\mathrm{intra}} - \Delta S^{\circ}_{\mathrm{inter}} = -\Delta S^{\circ}_{\mathrm{approx}} \qquad (5\text{–}57)$$

Because ΔS°approx < 0, Keq,intra > Keq,inter.

The magnitude of the standard entropy of approximation is determined by the difference between two other standard entropy changes, the standard entropy of molecularity and the standard entropy of rotational restraint.128,130 The formation of a unimolecular product during an intermolecular reaction requires that two or more independent molecules become one molecule, and this involves a considerable decrease in standard entropy. The standard entropy change responsible for this decrease, the standard entropy of molecularity, ΔS°molec, has a negative value and is a major, unavoidable, unfavorable term in the change in standard free energy in any intermolecular reaction. In an intramolecular reaction, however, the decrease in standard entropy due to the standard entropy of molecularity does not occur, because reactants are already on only one molecule, and this has the effect of increasing dramatically the change in standard entropy for the intramolecular reaction relative to the intermolecular reaction and hence increasing its yield of product. There is affiliated with an intramolecular reaction, however, a standard entropy of rotational restraint, which, conversely, is irrelevant to an intermolecular reaction. The standard entropy of rotational restraint is the increase in standard entropy that results from the fact that the formation of the transition state or product during an intramolecular reaction requires that a portion of the rotational entropy in the molecule be eliminated because only a fraction of the accessible rotational isomers can participate in the reaction productively. The standard entropy change accompanying this decrease in the number of rotational isomers, the standard entropy of rotational restraint, ΔS°rot, has a negative value and its inclusion causes the standard entropy change for the reaction to be smaller than it would be if no rotational freedom were lost during the reaction because only a productive rotational isomer was present.

The relationships between the standard entropy of approximation and the standard entropy of molecularity and standard entropy of rotational restraint are

$$\Delta S^{\circ}_{\mathrm{approx}} = \Delta S^{\circ}_{\mathrm{molec}} - \Delta S^{\circ}_{\mathrm{rot}} \qquad (5\text{–}58)$$

The magnitudes of each of these terms can be discussed in turn.

The standard entropy of molecularity is the decrease in standard entropy that should accompany the change of an intermolecular reaction to a rigidly oriented intramolecular reaction.128 In the specific case of a bimolecular reaction, the two independent reactants have six translational and six rotational degrees of freedom, but the one molecule, formed by the association of the two others, should have only three translational and three rotational degrees of freedom. The standard entropy change associated with the loss of the three

ene illustrates, an intramolecular reaction, because it usually involves a larger and more flexible molecule than any of the reactants in an intermolecular reaction, can never realize all of this favorable standard entropy of molecularity. Major factors in decreasing the portion of the standard entropy of molecularity that an intramolecular reaction will enjoy are the internal rotations within the intramolecular reactant itself. These rotations decrease the probability that the necessary juxtaposition of reactants will occur. For example, in the case of phenyl succinate (Equation 5–53) the dihedral angles around three carbon–carbon single bonds must be appropriate if the carboxyl oxygen is to be placed adjacent to the acyl carbon. It has been estimated128,130 from the results of thermodynamic and kinetic measurements from a number of intramolecular reactions that the standard entropy of rotational restraint decreases by about 20 J K⁻¹ mol⁻¹ for every bond that lies between the two atoms participating directly in the reaction and about which free rotation can occur.

When two similar intramolecular associations are compared, for which it is assumed that differences between their standard enthalpies of formation are negligible126

$$\Delta\Delta S^{\circ} = R \ln\left(\frac{K_{\mathrm{eq1}}}{K_{\mathrm{eq2}}}\right) \qquad (5\text{–}59)$$
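How Equations 5–57 and 5–58 combine can be seen in a minimal numerical sketch. It uses the estimate quoted in the text of about –20 J K⁻¹ mol⁻¹ of rotational restraint for each freely rotating bond between the reacting atoms. The standard entropy of molecularity of –180 J K⁻¹ mol⁻¹ is an assumption of the sketch, chosen so that two residual rotors reproduce the value of about –140 J K⁻¹ mol⁻¹ for ΔS°approx that appears later in this section; the function names are likewise illustrative.

```python
import math

R = 8.314             # gas constant, J K^-1 mol^-1
T = 298.0             # K
DS_MOLEC = -180.0     # assumed standard entropy of molecularity, J K^-1 mol^-1
DS_PER_ROTOR = -20.0  # rotational restraint per freely rotating bond (from the text)

def delta_s_approx(n_rotors, ds_molec=DS_MOLEC, ds_per_rotor=DS_PER_ROTOR):
    """Equation 5-58: dS_approx = dS_molec - dS_rot, with dS_rot = n * ds_per_rotor."""
    ds_rot = n_rotors * ds_per_rotor
    return ds_molec - ds_rot

def intra_over_inter(ds_approx):
    """Equation 5-57 rearranged: K_eq,intra / K_eq,inter = exp(-dS_approx / R)."""
    return math.exp(-ds_approx / R)

for n in range(6):
    ds = delta_s_approx(n)
    print(f"rotors = {n}  dS_approx = {ds:7.1f} J/K/mol  "
          f"T*dS_approx = {T * ds / 1000.0:6.1f} kJ/mol  "
          f"K_intra/K_inter = {intra_over_inter(ds):.1e}")
```

Each additional freely rotating bond lowers the expected enhancement by a factor of exp(20/R), roughly 11, which is why rigid frameworks show the largest effects.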
ciation become involved in a five-membered or six-membered ring in the product. A four-membered ring is usually too strained, because of the normal bond angles of commonly encountered molecules, to provide any favorable approximation. A seven-membered ring, if there is free rotation about every bond, has too small a value of ΔS°rot to exhibit a ΔS°approx small enough to overcome the strain of the ring and have a noticeable effect on equilibrium. For example, even in the intramolecular nucleophilic catalysis of phenyl ester hydrolysis, a series of reactions unusually prone to intramolecular catalysis, phenyl adipate would show a rate of phenolate release due to nucleophilic catalysis only 4-fold greater than that for the same reaction of phenyl acetate in 1.0 M sodium acetate. Large, rigid molecules in which the two atoms that must react are more than six atoms apart yet close enough to collide have been synthesized,131,132 but proteins and nucleic acids are the ultimate examples.

The magnitude of the actual difference in the change in standard entropy between a given intramolecular reaction and the corresponding intermolecular reaction will be less than the magnitude of the standard entropy of approximation because vibrational degrees of freedom, unavailable to the reactants in the intermolecular reaction, are available to the necessarily larger reactant in the intramolecular reaction and because steric effects that do not apply to the intermolecular reaction are often unavoidable consequences of designing the intramolecular reactant. For example, the intramolecular rates of lactonization for a series of bicyclic γ-hydroxycarboxylic acids decrease as the strain energies of the rigid five-membered rings of the tetrahedral intermediate (see Equation 5–54) increase,130 even though in each case the hydroxy group and the acyl carbon are positioned rigidly in the same orientation and at about the same distance from each other.

If the magnitude of ΔΔS°, the actual difference between the standard entropy changes in the reactions, must be less than the magnitude of ΔS°approx, then

$$T\Delta S^{\circ}_{\mathrm{approx}} < -RT \ln\left(\frac{K_{\mathrm{eq,intra}}}{K_{\mathrm{eq,inter}}}\right) + \Delta\Delta H^{\circ} \qquad (5\text{–}60)$$

where Keq,intra and Keq,inter are the intramolecular and intermolecular association equilibrium constants. If the differences in the actual standard enthalpies of formation, ΔΔH°, are known, the estimates of the upper limits for ΔS°approx can incorporate them. If they are unknown, they can be assumed to be zero, for the sake of argument, but such an assumption can be misleading.

In relating the change observed in an equilibrium constant to the entropy of approximation, the difference in standard enthalpy change resulting from the chemical strategy used to accomplish the approximation, ΔΔH°, complicates the interpretation (Equation 5–60). In several reactions displaying large increases in the rate of the reaction or the yield of the product due to covalent approximation of the reactants, the major effect of the approximation is on the standard enthalpy change of the reaction rather than the standard entropy change.29 For example,133 2,2,3,3-tetramethylsuccinanilide at pH 5 displays a rate of aniline release 1200 times greater than that of succinanilide itself. This increase in rate, however, which is equivalent to a change in the standard free energy of activation of –18 kJ mol⁻¹, is accompanied by a change in the standard enthalpy of activation, ΔΔH°‡, of –25 kJ mol⁻¹. Therefore, in this case the standard entropy of activation actually decreases as the rate of the reaction is enhanced by approximation. It is difficult, however, to interpret such observed changes in the thermodynamic parameters of activation because they are usually dominated by solvent effects that mask the underlying effects of approximation on the rates or equilibrium constants.130 In the case of the intramolecular catalysis manifested in the alkaline hydrolyses of endo-6-hydroxybicyclo[2.2.1]heptane-endo-2-carboxamides (Equation 5–54), it has been concluded that the rate enhancement of 1 × 10⁶ (TΔS°approx < –34 kJ mol⁻¹) “results almost entirely from the entropy effect”, probably because there is no strain involved in the formation of the additional five-membered ring of the tetrahedral intermediate.129

When the upper limits of TΔS°approx are calculated from the relative rates between the intramolecular and bimolecular nucleophilic catalysis of the hydrolysis of phenyl esters, on the assumption that ΔΔH° is equal to 0,125 they are all equal to or greater than –38 kJ mol⁻¹ (Table 5–5). The upper limit of TΔS°approx calculated from the increase in the equilibrium constant for the formation of the tetrahedral intermediates in the hydrolyses of endo-6-hydroxybicyclo[2.2.1]heptane-endo-2-carboxamides is –34 kJ mol⁻¹. If it is assumed that the fused rings retain two rotational axes, which is a generous assumption, ΔS°approx (Equation 5–58) should be about –140 J K⁻¹ mol⁻¹ and TΔS°approx about –40 kJ mol⁻¹ when units of corrected volume fraction are used.

At least three points are illustrated by this exercise. First, the largest intramolecular increases in rate or degree of association yet measured, with the educational exceptions of the cases involving severe compressive steric effects in the transition states or products, are of a magnitude less than that expected simply for the transformation of an intermolecular reaction into a fully constrained intramolecular reaction. Second, the maximum decrease in standard free energy of association to be expected when a bimolecular association such as the formation of a hydrogen bond is turned into an intramolecular association is about –55 kJ mol⁻¹, which would produce an increase in its association equilibrium constant, when the units are corrected volume fraction, of 5 × 10⁹. Third, the larger the
ular hydrogen bonds in aqueous solution can be addressed. The standard enthalpy change for a hydrogen bond forming in water should be quite small, but possibly of a negative value (Equation 5–49). The competition of water molecules for donor and acceptor seems to contribute an entropic effect the magnitude of which is –38 J K⁻¹ mol⁻¹, which is R ln 100, when concentrations of donor and acceptor are expressed in units of molarity. In an intramolecular association, the consequent elimination of the standard entropy of approximation should be able to compensate for the entropic deficit caused by the presence of the water.

Evidence for the existence of intramolecular hydrogen bonds within solutes dissolved in aqueous solution has been reported. The extensive lore surrounding involvement of hydrogen bonds in the equilibrium acid–base behavior of the monoanions of dicarboxylic acids is by and large equivocal,29 but it has been noted45 that decreases in the rates of the reactions of the acidic hydrogens in the monoanions of salicylates (5–17) with hydroxide ion

[Structure 5–17]

suggest that they contain intramolecular hydrogen bonds the standard free energies of formation of which are around –15 kJ mol⁻¹. A series of compounds capable of forming intramolecular hydrogen bonds either between the pyrrole nitrogen–hydrogen bond on imidazole as donor (pKa = 15) and a carboxylate as acceptor (pKa = 5) or between the pyridinyl lone pair on imidazole (pKa = 7.5) as acceptor and the nitrogen–hydrogen bond on an ammonium cation as donor (pKa = 10) has been described (Table 5–6). As the standard entropy of approximation was decreased by confining the juxtaposed donor and acceptor more severely, or as the difference in pKa was decreased, the equilibrium constant for the intramolecular hydrogen bond

$$K_{\mathrm{AHB}} = \frac{[\mathrm{B{\cdot}HA}]}{[\mathrm{B{\cdot}} + \mathrm{HA}]} \qquad (5\text{–}61)$$

increased in magnitude (Table 5–6).

Table 5–6

hydrogen-bonded species    ΔpKaᵃ    Kintra,AHBᵇ
[structure]                10       <1ᶜ
[structure]                10       1.5ᶜ
[structure]                13       13ᶜ
[structure]                3        100ᵈ

ᵃ Difference in pKa between intermolecular equivalents of donor and acceptor.
ᵇ Value for Kintra,AHB is for the ratio of the concentration of the hydrogen-bonded species (see column 1) to the concentration of the same tautomer not hydrogen-bonded. The equilibrium constants Kintra,AHB for the formation of the noted intramolecular hydrogen bonds in aqueous solution were estimated from values of the N1–N3 tautomeric equilibrium constants (Equation 2–31) for the respective 4-substituted imidazoles determined by ¹⁵N nuclear magnetic resonance.134 It was assumed that a difference between the value of the tautomeric equilibrium constant for a 4-substituted imidazole in which a hydrogen bond can form and the value of the tautomeric equilibrium constant for a similar compound in which a hydrogen bond cannot form is due to the formation of the noted hydrogen bond. It was also assumed that the value of the tautomeric equilibrium constant for the form of the hydrogen-bonding species in which the bond is not formed is equal to that of the reference compound and that the observed excess of one of the two tautomers represents entirely the hydrogen-bonded form.
ᵈ Estimated from the difference between the second macroscopic acid dissociation constants of cis- and trans-urocanic acid and the N1–N3 tautomeric ratios for neutral cis-urocanic acid.109

The question that these results raise is whether or not the hydrogen bond in a conformation such as an α helix, a hairpin of β structure, or a β turn can be made favorable by sufficient standard entropy of approximation. Certain cyclic hexapeptides are rigid enough to enforce the conformation of a β turn in which an intramolecular hydrogen bond is formed between the acyl oxygen of one of the six amino acids and the amido nitrogen–hydrogen of the amino acid three positions to the amino-terminal side of it (Figure 4–16D),135 and the conformation of the β turn can be varied by changing the sequence of the cyclic peptide. In fact, the first tight turn
to be observed was in such a cyclic hexapeptide.136,137 An α helix or β structure, however, cannot be encompassed so easily.

If it is assumed that either an α helix or a hairpin of β structure has already been initiated, could the standard free energy of formation for the next hydrogen bond (Figure 5–19) during the propagation have a negative value? In Figure 5–19, the next donor and acceptor in each structure are marked with asterisks. Because each of these reactions occurs in aqueous solution, the standard enthalpy of formation for such a hydrogen bond is close to 0 (Equation 5–49). The value for the standard free energy of formation for the hydrogen bond will be determined in part by the difference between the unfavorable competition of the water and the favorable elimination of entropy of approximation that the structures provide. For the α helix, there are two bonds about which rotation can occur between donor and acceptor;138 for the hairpin of β structure, there are four. It should, however, be remembered that when the problem is stated in these terms, the difficulties involved in the initiation of either of these structures are ignored, and these can be even more formidable.138

Figure 5–19: Intramolecular formation of a hydrogen bond to elongate an α helix or a β hairpin. (A) To add the next hydrogen bond in an elongating α helix, the acyl oxygen marked with the asterisk must combine with the amido nitrogen–hydrogen marked with the other asterisk. The two bonds about which rotation can occur between the last acyl group fixed in the α helix and the acyl oxygen in question have been highlighted with arrows. (B) To add the next hydrogen bond in an elongating antiparallel β hairpin, the acyl oxygen marked with the asterisk must combine with the amido nitrogen–hydrogen marked with the other asterisk.

Recently, several short peptides (8–16 aa in length) have been shown to form antiparallel β structure in aqueous solution by folding back on themselves to form a hairpin.139,140 These hairpins of antiparallel β structure have marginal stability at room temperature and the pairing across the sheet is as yet unpredictable. Short stretches of parallel β structure (encompassing three or four pairs of amino acids) have been observed in aqueous solution in molecules in which two short peptides are coupled at their carboxy-terminal ends to two properly spaced positions in a somewhat rigid molecular template that assists in properly orienting them.141 These results suggest that approximation and the cooperative formation of the hydrogen bonds between the two strands in these parallel and antiparallel β structures can overcome the competition of the water for donors and acceptors, but only barely.

All short, linear peptides examined so far fail to form α helices when they are dissolved at room temperature in water at neutral pH, even if they have the same amino acid sequence as an α helix in a crystallographic molecular model.142 This is almost certainly due to the difficulty of forming the first few hydrogen bonds required to initiate the α helix rather than the formation of the hydrogen bonds that fall in line after it has been initiated (as in Figure 5–19). When short peptides are attached to rigid structural templates that provide properly oriented acceptors for hydrogen bonds and thereby promote initiation, those peptides display a considerable fraction of α helix even at room temperature.143

It was noted that when a cyanogen bromide fragment containing the first 12 amino acids of ribonuclease, KETAAAKFERQHHse (where Hse is homoserine), was dissolved in 33 mM Na₂SO₄ between pH 4 and 5, an equilibrium existed between the structureless form of the peptide and an α-helical form. At 0 °C, 15% of the peptide was in the α-helical form at equilibrium.142 From this original observation, it was eventually144–146 discovered that when the peptide acetyl-AAQAAAAQAAAAQAAY-α-amide is dissolved at 0 °C in 1.0 M NaCl, about 50% of it is α-helical and 50% is structureless at equilibrium.147 This peptide, however, even with as peculiar a sequence as it has, exhibits significant α-helical content only at low temperature. No other simple peptide examined so far displays a significantly higher amount of α helix at equilibrium.144,148,149 Considerably higher α-helical content, however, is displayed even at room temperature by continuous segments of polyalanine (4–19 alanines in length) when they are appropriately isolated from the charged amino acids at the two ends of these hybrid molecules that are required to dissolve the polyalanine in
Intramolecular and Intermolecular Processes: Molecularity and Approximation 229
water.150 Polyalanine, however, is considerably different from the variable and almost unbiased amino acid sequences of the α helices in proteins. All of the peptides displaying α-helical structure, because of the marginal stability of those α helices and the peculiar sequences required, reemphasize the difficulty of overcoming the competition of the molecules of water for donors and acceptors.

The existence of marginally stable synthetic α helices, albeit at 0 °C, has provided a biochemically relevant framework on which to examine the advantages provided by such a structure and its resulting standard entropy of approximation to the formation of intramolecular hydrogen bonds. Because the side chains protrude from an α helix at intervals of 99°,151–153 the side chain four amino acids to the amino terminus of any position in an α helix lies almost directly (396°) below the side chain at that position (Figure 4–17). A hydrogen bond will form between a donor and an acceptor placed synthetically at the i and i + 4 positions of an α-helical peptide.144 For example, in the peptide acetyl-AAQAAEAQAKAAQAAY-α-amide, a hydrogen bond can form between the glutamate and the lysine when the two are held rigidly above and below each other in the α-helical conformation of the peptide. The free energy of formation of this hydrogen bond can be assessed by measuring the differences between the equilibrium constant at 0 °C for the formation of the α helix of this peptide and those for the α helices of various controls in which the hydrogen bond is unable to form.

A series of such measurements has been made (Table 5–7). None of these free energies of formation is remarkable, again presumably because the standard entropy of approximation in such a situation barely overcomes the competition for donors and acceptors from the water. An important point that should be reiterated is that these small negative free energies of formation do not state that the hydrogen bond is a net contributor to the stability of the α helix; in fact, each of the α helices that contains hydrogen bonds is less stable than an α helix in which the donor and acceptor are replaced by alanine.147 Rather, it is the formation of the α helix itself that provides the standard entropy of approximation necessary to make the standard free energy of formation of each of these intramolecular hydrogen bonds less than 0.

Table 5–7: Standard Free Energies of Formation of Hydrogen Bonds within an α Helixᵃ

donor/acceptor                               standard free energy of formation (kJ mol⁻¹)
lysinium cation/glutamic acid147,149         –1.1
lysinium cation/glutamate anion147,149       –1.4
lysinium cation/aspartic acid149             –0.9
lysinium cation/aspartate anion149           –1.1
histidinium cation/glutamic acid149          –0.6
histidinium cation/glutamate anion149        –1.1
histidinium cation/aspartic acid154          –2.4
histidinium cation/aspartate anion154        –3.1
glutamine/aspartic acid155                   –1.8
glutamine/aspartate anion155                 –4.1

ᵃ Standard free energy of formation for a hydrogen bond between the i and i + 4 positions of an α-helical peptide.

The difference in standard free energy of formation between the hydrogen bond of a lysinium cation and a glutamate anion and the hydrogen bond of a lysinium cation and a glutamic acid is –0.3 kJ mol⁻¹; that between the hydrogen bond of a lysinium cation and an aspartate anion and the hydrogen bond of a lysinium cation and an aspartic acid is –0.2 kJ mol⁻¹; that between the hydrogen bond of a histidinium cation and a glutamate anion and the hydrogen bond of a histidinium cation and a glutamic acid is –0.5 kJ mol⁻¹; and that between the hydrogen bond of a histidinium cation and an aspartate anion and the hydrogen bond of a histidinium cation and an aspartic acid is –0.7 kJ mol⁻¹ (Table 5–7). Because in each case the carboxylate is more basic than the carboxylic acid, and hence a better acceptor, Equation 5–51 predicts that these differences should be –3.5, –3.5, –6.5, and –6.5 kJ mol⁻¹, respectively. The fact that the actual differences are significantly less negative than the expected differences is further evidence for the fact that an ion pair is unstable in aqueous solution relative to the separated ions because the conversion of the monocationic hydrogen bond into an ion pair actually destabilizes it. This conclusion is reinforced by the fact that the difference in standard free energy of formation between the hydrogen bond of a glutamine and an aspartate anion and the hydrogen bond of a glutamine and an aspartic acid, neither of which is an ion pair, is –2.3 kJ mol⁻¹, even though glutamine is a much weaker acid than either lysinium cation or histidinium cation (Table 2–2).

One could imagine that the standard base pairs between adenine and uracil or between guanine and cytosine might form in water because the formation of the second hydrogen bond or the second and third hydrogen bonds in the respective complexes would be aided by standard entropy of approximation gained by the formation of the first. Such a cooperative enhancement of hydrogen bond strength has been observed in complexes between glutaric acid and a cyclic tetraresorcinol in CHCl₃. In these complexes the two carboxylic groups of the diacid form hydrogen bonds with the phenolic hydroxyls of resorcinols on opposite sides of the ring.156 The advantage, however, to the formation of the second hydrogen bond, a cooperative but intermolecular situation, was only –10 kJ mol⁻¹, and the hydrogen-bonded complex could be observed only in aprotic solvents such as CHCl₃. In a similar fashion, standard and nonstandard base pairs between two nucleic acid bases will form in organic solvents or within micelles, but they do not form
230 Noncovalent Forces
in water.157 In fact, the normal bases in DNA itself can be replaced by isosteres that cannot form any hydrogen bonds, and the complementary base pairs form as efficiently from steric complementarity as from base pairing.158–161

The hydrogen bonds in DNA contribute to the specificity of the pairing of its bases, but in a negative sense. For example, in order to compensate partially for the strongly unfavorable act of removing the two donors and the one acceptor of guanine and the two acceptors and one donor of cytosine from water, the three hydrogen bonds of the base pair must be formed in a double helix. When only the thymine in the base pair with adenine is replaced by a 2,4-difluoro-5-methylphenyl group so that the two hydrogen bonds cannot form, the base pair is less stable162 by about 7 kJ mol⁻¹.

There are, however, situations in which hydrogen bonds to nucleic acid bases can be sufficiently assisted by standard entropy of approximation so as to be marginally stable in aqueous solution. The complex between 9-ethyladenine and a specifically designed host

acceptors to the same side of the monomer if not in proper alignment. This stacking of the bases results from the hydrophobic effect. It must contribute significantly to this association by roughly aligning the bases before the dimerization occurs. All of these observations demonstrate that the hydrogen bonds between the base pairs of a double-stranded nucleic acid do not contribute significantly to its stability.

Suggested Reading

Page, M.I., & Jencks, W. (1971) Entropic contributions to rate accelerations in enzymic and intramolecular reactions and the chelate effect, Proc. Natl. Acad. Sci. U.S.A. 68, 1678–1683.

Problem 5–10:

(A) Draw the structure of a hydrogen bond between the propionate anion and a proton on N3 of neutral 4-methylimidazole (see Equation 2–31 for numbering). Include all lone pairs of electrons in your drawing.
do not have significant numbers of donors and acceptors of hydrogen bonds. An ionic solute is held in water by large, negative standard enthalpies of hydration. Solutes that have donors and acceptors of hydrogen bonds are held in water by the hydrogen bonds they form with it. Solutes that neither are ions nor have donors and acceptors of hydrogen bonds are expelled from liquid water. This expulsion is the hydrophobic effect.

The nature of the hydrophobic effect has been succinctly described by G.S. Hartley:168

    The antipathy of the paraffin chain for water is, however, frequently misunderstood. There is no question of actual repulsion between individual water molecules and paraffin chains, nor is there any very strong attraction of paraffin chains for one another. There is, however, a very strong attraction of water molecules for one another in comparison with which the paraffin–paraffin or paraffin–water attractions are very slight.

Aside from the overuse of “very”, it is clear from this description that the term hydrophobic is misleading, if its etymology is examined closely.169 The oil does not dislike the water. In fact, measurements of interfacial energies suggest that the oil prefers the water to itself.170 Rather, water ejects the oil because water molecules have a greater liking for other water molecules.

The hydrophobic effect upon a hydrophobic solute A can be represented formally by the transfer of the solute from water to another solvent j.171,172

$$\mathrm{A\,(H_2O)} \rightleftharpoons \mathrm{A\,(solvent}\ j\mathrm{)} \tag{5–62}$$

It has been proposed169 that the hydrophobic effect can also be represented by the formation of a macroscopic interface between an immiscible phase of hydrocarbon and water. There are, however, significant differences between the physical properties of such a macroscopic interface and those of the microscopic layer of hydration surrounding an isolated molecule of hydrocarbon dissolved in water.173 Because, upon the folding of a molecule of protein, individual side chains of the amino acids are transferred from the water to the interior of the folded structure, the transfer of a molecule of solute from water to another phase (Equation 5–62) seems to be the more appropriate process to examine for the present purposes.

In the case of the failure of oil and water to mix, solute A in Equation 5–62 is a molecule of oil and solvent j is the liquid oil itself. Because, in general, pure phases of the different solutes A in a comparison may in themselves have unique peculiarities, the transfer of a solute from water to a nonpolar solvent should be studied in a systematic fashion by choosing a common solvent for all of the transfers.174,175 The hydrophobic effect can be quantified24,176 by measuring a standard free energy for this transfer (Equation 5–18). The standard free energy of transfer of solute A between water and solvent j, $\Delta G^{\circ}_{\mathrm{A,\,H_2O}\rightarrow j}$, is the change in standard free energy that results only from the change in solvation of solute A between the other solvent and the water experienced when solute A, dissolved in water at standard state, is transferred from the water into that other solvent at standard state.

As expected from everyday experience, the standard free energies of transfer of hydrophobic solutes between water and an organic solvent such as benzene, carbon tetrachloride, or the liquid solute itself are negative (Table 5–8). It is this negative change in standard free energy that produces the hydrophobic effect. The hydrophobic effect is the only noncovalent force in aqueous solution that proceeds with a net negative change in standard free energy, and it is thought to provide all of the driving force for the folding of polypeptides, the association of ligands with a protein, and the formation of interfaces between subunits in oligomeric proteins.

The explicit reason for the negative standard free energies of transfer at physiological temperatures is that the standard entropies of transfer, multiplied by the temperature, are larger than the standard enthalpies of transfer (Table 5–8). At 25 °C, the standard enthalpy of transfer, which in most cases is positive and thus unfavorable, is overcome by a much larger positive and thus favorable standard entropy of transfer. This peculiarity has led to the maxim that the hydrophobic effect is entropy-driven, but this is a misleading view. Edsall and Scatchard180 noted that the incremental standard entropies of solution for –CH₂– groups in water had anomalously large negative values, but they also pointed out that because of the large changes in standard molal heat capacity, these incremental standard entropies of solution would become less and less significant as the temperature was raised (Figure 5–20).181 As the temperature increases, the standard enthalpy of transfer becomes more and more exothermic. At high enough temperatures the standard entropy of transfer passes through zero and becomes negative. As a result, at intermediate temperatures the reaction changes from an entropically driven process to an enthalpically driven process, and at high temperatures the standard entropy of transfer is actually unfavorable even though the transfer itself remains favorable because the standard free energy of transfer does not vary significantly with temperature. This behavior illustrates the fact, first noted by Edsall, that the most characteristic feature of the hydrophobic effect is not its change in standard entropy but its change in standard heat capacity.182 The anomalously large, positive incremental change in standard molal heat capacity of solution for solutes in water remains the most reliable signature of the hydrophobic effect.181

The changes in the standard thermodynamic state functions, such as standard entropy, standard enthalpy, and standard heat capacity, that are associated with the hydrophobic effect (Table 5–8) have been assigned to changes in the thermodynamic properties of the water surrounding the solute as it leaves the aqueous phase and changes in the thermodynamic properties of the nonpolar solvent as it enters; in other words, to differences in
ᵃ Values were calculated from the thermodynamic behavior of the partition coefficients (Equation 5–17). Original published values for the partition coefficients were in units of mole fraction at infinite dilution. These units were converted to units of molarity by dividing them by the appropriate molar volumes of the respective solvents. The partition coefficients in units of molarity were then converted to free energies of transfer and entropies of transfer for units of corrected volume fraction (Equation 5–18). All values are for a temperature of 25 °C.
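As a rough illustration of the first two conversions described in this footnote, the sketch below converts a partition coefficient from units of mole fraction to units of molarity and then to a standard free energy of transfer through ΔG° = –RT ln K. The function names, the example partition coefficient, and the molar volumes (approximate values for water and benzene at 25 °C) are assumptions made for illustration and are not values from Table 5–8; the further correction to units of volume fraction (Equation 5–18) is not attempted here.

```python
import math

R = 8.314462618e-3   # gas constant, kJ mol^-1 K^-1
T = 298.15           # 25 degrees C in kelvin


def molarity_partition_coefficient(k_mole_fraction, v_water, v_solvent):
    """Convert K (solvent/water) from mole-fraction units to molarity units.

    At infinite dilution the molarity of a solute is its mole fraction
    divided by the molar volume of the solvent, so each concentration in
    the ratio is divided by the molar volume of its own solvent.
    """
    return k_mole_fraction * v_water / v_solvent


def free_energy_of_transfer(k):
    """Standard free energy of transfer (kJ mol^-1) for water -> solvent."""
    return -R * T * math.log(k)


# Hypothetical inputs, chosen only for illustration.
K_X = 1.0e4          # assumed mole-fraction partition coefficient
V_WATER = 0.01807    # L mol^-1, approximate molar volume of water
V_BENZENE = 0.0894   # L mol^-1, approximate molar volume of benzene

k_c = molarity_partition_coefficient(K_X, V_WATER, V_BENZENE)
dg_x = free_energy_of_transfer(K_X)
dg_c = free_energy_of_transfer(k_c)
print(f"K (molarity) = {k_c:.0f}")
print(f"dG (mole fraction) = {dg_x:.1f} kJ/mol; dG (molarity) = {dg_c:.1f} kJ/mol")
```

Because the molar volume of water is smaller than that of most organic solvents, the change of units makes the free energy of transfer less negative by RT ln(V_solvent/V_water), whatever the solute.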
[Figure 5–20 plot: two panels, Benzene (left) and Pentane (right). Ordinate: standard state functions of transfer (kJ mol⁻¹), –40 to 40. Abscissa: temperature (K), 300 to 400. Curves: TΔS°, ΔH°, and ΔG°.]
Figure 5–20: Dependence on temperature (Kelvin) of the changes in standard free energy (ΔG°), standard enthalpy (ΔH°), and standard entropy multiplied by temperature (TΔS°) for the transfer of benzene between water and benzene (left panel) and the transfer of pentane between water and pentane (right panel).181 The activities are expressed in units of mole fraction rather than as corrected volume fraction as in Table 5–8; and the three state functions, free energy, enthalpy, and entropy, are expressed in units of kilojoules mole⁻¹. The lines were calculated from the values of the thermodynamic state functions measured in the range from 0 to 40 °C and either the assumption that the observed behavior of ΔC°p can be fit by an analytic function of temperature and that analytic function can be extrapolated beyond the range of measurement (solid lines) or the assumption that ΔC°p is independent of temperature and has a value that is the mean of the observed values (dashed lines). The values of ΔC°p over the range of measurements seem to decrease somewhat with increasing temperature and this deviation is the basis for considering adjusted values of ΔC°p for changes in temperature. The trends are the same regardless of the assumption. Reprinted with permission from ref 181. Copyright 1988 Academic Press.
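The dashed lines in Figure 5–20 rest on the assumption that ΔC°p is independent of temperature. A minimal sketch of that calculation follows: with constant ΔC°p, ΔH° varies linearly with temperature and ΔS° varies with the logarithm of temperature. The reference values below are hypothetical, chosen only to have the signs typical of the transfer of a hydrocarbon out of water at 25 °C; they are not the data of ref 181, and the function names are likewise assumptions.

```python
import math

T0 = 298.15    # reference temperature (K)
DH0 = 2.0      # assumed standard enthalpy of transfer at T0 (kJ mol^-1)
DS0 = 0.090    # assumed standard entropy of transfer at T0 (kJ mol^-1 K^-1)
DCP = -0.400   # assumed constant change in heat capacity (kJ mol^-1 K^-1)


def enthalpy(t):
    """dH(T) = dH(T0) + dCp (T - T0)."""
    return DH0 + DCP * (t - T0)


def entropy(t):
    """dS(T) = dS(T0) + dCp ln(T / T0)."""
    return DS0 + DCP * math.log(t / T0)


def free_energy(t):
    """dG(T) = dH(T) - T dS(T)."""
    return enthalpy(t) - t * entropy(t)


# Temperatures at which the enthalpy and the entropy of transfer pass through zero.
T_H = T0 - DH0 / DCP
T_S = T0 * math.exp(-DS0 / DCP)

for t in (275.0, 300.0, 325.0, 350.0, 375.0, 400.0):
    print(f"{t:5.0f} K  dH = {enthalpy(t):7.2f}  TdS = {t * entropy(t):7.2f}  "
          f"dG = {free_energy(t):7.2f} kJ/mol")
print(f"dH = 0 at {T_H:.1f} K; dS = 0 at {T_S:.1f} K")
```

With these assumed inputs the transfer is entropically driven at the reference temperature, ΔH° changes sign a few kelvin above it, and ΔS° changes sign near the boiling point of water, while ΔG° stays negative throughout, which is the pattern described for Figure 5–20.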
The Hydrophobic Effect 233
the solvation of the solute by the two solvents. The reason that these changes of the thermodynamic state functions are assigned to the respective solvents rather than the solutes is that the solutes in most cases are too small, as in the case of methane, ethane, or propane, or too rigid, as in the case of benzene, to account internally for the significant changes that are observed.

There are several observations which suggest that a more rigid hydrogen-bonded lattice, similar to the lattice in ice Ih, surrounds hydrophobic solutes when they are dissolved in water.183,184 Macroscopic solids known as clathrates form spontaneously when hydrocarbons are mixed with pure water at proper molar ratios. Clathrates are solid, crystalline hydrates that are composed of isolated individual molecules of hydrocarbon encased in rigid hydrogen-bonded networks of water molecules. The thermodynamic parameters associated with these solids are of a magnitude sufficient to lead to the conclusion that similar rigid networks of water molecules should also surround hydrophobic solutes when they are present at dilute concentration.185 Whether or not clathrates are of relevance to the hydrophobic effect, their very existence has subconsciously influenced our views of the process. The unusually large changes in standard heat capacity (Table 5–8)182 associated with the hydrophobic effect are believed to result from an increase in the order of the water surrounding the solute.186 It should be recalled that it is the gradual melting of the hydrogen-bonded lattice that is supposed to be responsible for the anomalously high heat capacity of liquid water itself, and increasing the amount or the degree of structure of this lattice should produce even greater capacity for melting and, hence, greater heat capacities. The partial molar volume of a hydrophobic solute in water is usually about 13 cm³ mol⁻¹ less than that of the same solute in other solvents,23 and this difference has been thought to result from the efficient packing of the solute within a cage of hydrogen-bonded waters that resembles the networks in clathrates. It has already been noted, however, that ice Ih, and presumably liquid water, contain a large amount of vacant space that a solute could occupy (Figure 5–2), and this occupation by itself could explain the smaller values for partial molar volume. The increase in the dielectric relaxation time187 and the anomalously large increase in the viscosity29 observed when a hydrophobic solute is added to water and the expansion of the structure of water around hydrophobic solutes detected by neutron scattering,184 however, also indicate that the shell of water that forms around the solute is more rigid and structured than the water in the bulk solvent.

If it is the case, however, that the water surrounding a hydrophobic solute is more structured and held within a more rigid hydrogen-bonded lattice than water in the bulk phase, there should be compensatory thermodynamic changes29,186 associated with this increase in structure. Specifically, the standard enthalpy of the aqueous solution of hydrocarbon should be less than the standard enthalpy of liquid water because the hydrogen bonds in this more structured cage should be stronger. If a noncovalent chemical transformation occurs in any solution, the change in standard entropy observed is usually compensated by a change in standard enthalpy. This observation can be stated mathematically7 as

$$\Delta H^{\circ}(\alpha) = \Delta H' + T_c\,\Delta S^{\circ}(\alpha) \tag{5–63}$$

where α refers to any noncovalent process and ΔH′ and Tc are parameters peculiar to that process. Many noncovalent processes occurring in water188 satisfy this relationship with Tc = 280 ± 10 K.*

In the particular case of the hydrophobic effect, it has been noted29 that the decrease in standard entropy associated with the formation of a rigid cage of hydration as the solute enters water is not accompanied (Table 5–8) by the decrease in standard enthalpy (Equation 5–63) to be expected from such compensation.17 The argument is that the missing enthalpy in this reaction is the enthalpy that was required to crack open the lattice of the liquid water to form a cavity for the hydrophobic solute. Because standard entropy and standard enthalpy should compensate almost completely in the formation of the more rigid shell of hydration and have little effect on the overall reaction because of this cancellation, it is actually this positive enthalpy of opening the lattice to form a cavity or, conversely, the negative enthalpy realized upon collapsing the cavity that produces the hydrophobic effect.

The positive enthalpy required to open the lattice could result from the fact that some of the hydrogen bonds of the fluid lattice within liquid water must be broken irretrievably when a cavity is formed for a hydrophobic solute. The empty donors and acceptors of such broken hydrogen bonds can be observed in molecular dynamics simulations of aqueous solutions of hydrophobic solutes.56 Water, however, is presumably adept at rearranging around small nonpolar solutes to form cages like those formed in the clathrates and in the process retaining as many hydrogen bonds as there would be in the absence of those solutes, and other simulations indicate that it is only when all of the dimensions of a nonpolar solute are more than twice the diameter of a molecule of water that significant numbers of hydrogen bonds are lost upon the formation of the cavity.189 Nevertheless, in this view, the magnitude of the expulsion of hydrophobic solutes from aqueous solution should depend on the number of hydrogen bonds that must be broken to form the cavity.

If this picture of the driving force propelling the

* This value for Tc means that values of changes in standard enthalpy or changes in standard entropy for most processes occurring entirely in aqueous solution are monotonously uninformative because they register mainly compensatory changes in the structure of the solvent.
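The footnote on Tc follows directly from combining Equation 5–63 with the definition of the standard free energy. The short derivation below is a sketch in the notation of Equation 5–63; the only added symbol is ΔG°(α), the change in standard free energy for the same process α at temperature T.

```latex
\begin{align*}
\Delta G^{\circ}(\alpha) &= \Delta H^{\circ}(\alpha) - T\,\Delta S^{\circ}(\alpha) \\
                         &= \Delta H' + T_c\,\Delta S^{\circ}(\alpha) - T\,\Delta S^{\circ}(\alpha) \\
                         &= \Delta H' + (T_c - T)\,\Delta S^{\circ}(\alpha)
\end{align*}
```

At 25 °C and Tc = 280 K, the factor (Tc – T) is only about –18 K, so even a change in standard entropy of 100 J K⁻¹ mol⁻¹ alters the standard free energy by less than 2 kJ mol⁻¹; most of the observed ΔH° and TΔS° cancels.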
for standard free energy of transfer into benzene were available and benzene has behavior similar to decane.

der Waals forces and more than overcomes them. As the polarities of the solvents increase, the slopes of the lines in Figure 5–22 become less negative. This effect might be explained by noting that as the solvents become more polar, more standard free energy is required to form a cavity within them, and this change is deducted from the favorable standard free energy of interaction between solute and solvent. In this view, water would simply be the extreme example of the difficulty in forming a cavity.

The solutes chosen for the free energies of transfer from the gas phase to the various solvents in Figure 5–22 were all acyclic alkanes. Further insight into the hydrophobic effect is gained when the free energies of transfer from the gas phase into hexadecane and from the gas phase into water are plotted for linear alkanes, branched alkanes, cyclic alkanes, alkenes, alkadienes, cyclic alkenes, alkynes, cyclic alkadienes, saturated arenes, and unsaturated arenes (Figure 5–23)175,196 as a

* The offsets of the lines for aqueous solutions in Figure 5–23 suggest that it is only the degree of unsaturation that is of consequence in the solvation experienced by hydrocarbons in water, but the offsets of the lines for solutions in hexadecane suggest that, unlike in water where they are equivalent to only one degree of unsaturation, the solvation of a ring in hexadecane is equivalent to the solvation of two degrees of unsaturation.

–17 kJ mol⁻¹ – 1.7 kJ (mol hydrogen–carbon bond)⁻¹    (5–65)

This inclusive, exothermic standard enthalpy of transfer must arise from the establishment of van der Waals interactions between water and the alkane during its entry. If so, this also provides evidence that water does participate in van der Waals interactions with hydrocarbon.

In Figure 5–23, the lines for the transfer of the linear alkanes from the gas phase into water and from the gas phase into hexadecane intersect on the ordinate at +4 kJ mol⁻¹, in agreement with the intersection of all of the lines in Figure 5–22 at +5 kJ mol⁻¹. In Figure 5–23, this
Figure 5–23: Standard free energies of transfer from the gas phase to water (upper set of lines) and from the gas phase to hexadecane (lower set of lines) as a function of the number of hydrogen–carbon bonds in a hydrocarbon. It was found that when the values for the partition coefficients for transfer from hexadecane to the gas phase calculated with Equation 1–10 from the mobility of linear alkanes
[Figure 5–23 plot: ordinate in kJ mol⁻¹, –40 to 40; abscissa: hydrogen–carbon bonds, 0 to 25.]
intersection corresponds to the standard free energy of transfer of a linear alkane with no hydrogen–carbon bonds from the gas phase to either condensed phase, which is the transfer of nothing. It is reassuring that the transfer of nothing from water to hexadecane proceeds with no change in standard free energy, but the fact that the standard free energy of transfer of nothing from the gas phase to either of these condensed phases is +4 kJ mol⁻¹ suggests that, not surprisingly, there remains some difference in standard free energy between the gas phase and either condensed phase unaccounted for by the choices of standard state. If, as has been stated,197 an additional term equal to RT (+2.5 kJ mol⁻¹) must be added to all of the standard free energies of transfer, this correction would only increase this difference.

As the amount of unsaturation in the sets of hydrocarbons increases, the standard free energies of transfer at the respective intersections decrease in value. Each of these latter intersections corresponds to the free energy of transfer of the unsaturated hydrocarbon in that set with no hydrogen–carbon bonds, which would be the particular carbon–carbon double bonds alone. These intersections decrease monotonically in value as the number of carbon–carbon double bonds increases because van der Waals interactions are realized when each of these carbon–carbon double bonds is transferred from the gas phase to either water or hexadecane.

The slope of the line in Figure 5–23 correlating the standard free energies of transfer from the gas phase to water for the linear alkanes is +1.45 kJ (mol hydrogen–carbon bond)⁻¹, that for branched alkanes is +1.48 kJ (mol hydrogen–carbon bond)⁻¹, and that for arenes is +1.50 kJ (mol hydrogen–carbon bond)⁻¹. These values, which are for hydrocarbons related to the side chains of the amino acids, define the magnitude of the active exclusion of hydrogen–carbon bonds from water.

It is the separate solvations dissected in Figure 5–23, the one accomplished by hexadecane and the one accomplished by water, that together further illustrate the unique contribution of hydrogen–carbon bonds to
the hydrophobic effect. When any two hydrocarbons are compared that have the same number of hydrogen–carbon bonds, the more unsaturated one will be the one with the larger surface area. As the degree of unsaturation and hence the surface area increases at a constant number of hydrogen–carbon bonds, the standard free energy of solvation exerted by the hexadecane becomes more negative. As the degree of unsaturation and hence the surface area increases at a constant number of hydrogen–carbon bonds, the standard free energy of solvation exerted by the water becomes more negative. As the number of hydrogen–carbon bonds and hence the surface area increases at a constant degree of unsaturation, the standard free energy of solvation exerted by the hexadecane becomes more negative. In distinct contrast to these three trends, however, as the number of hydrogen–carbon bonds increases at a constant degree of unsaturation, the standard free energy of solvation exerted by the water becomes more positive. It is only the water that responds to an increase in the surface area of the solute by rejecting it more and more strongly but only when that increase in the surface area is accomplished by adding hydrogen–carbon bonds.

When the standard free energies of transfer from the gas phase to water for an even larger set of organic solutes than those displayed in Figure 5–23 are examined,198 small differences in the standard free energy of transfer (mole hydrogen–carbon bond)⁻¹ for different types of hydrogen–carbon bond become apparent, and these differences have been quantified by defining a value for the contribution of each type to the overall standard free energy. For pairs of molecules otherwise identical except that one contains –CH₂CH₂– and the other –CH(CH₃)–, the free energies of transfer from gas to water differ by +0.4 kJ mol⁻¹ (15%), and for pairs of molecules, otherwise identical except that one contains –CH₂CH₂CH₂– and the other contains –C(CH₃)₂–, the standard free energies of transfer from gas to water differ by +0.6 kJ mol⁻¹ (15%). These differences might be taken as evidence that branched alkanes are more hydrophobic than unbranched alkanes except for the fact that when standard free energies of transfer from water to hexadecane (Figure 5–21) are examined instead of those from gas to water, branched alkanes (‡ in Figure 5–21) are less hydrophobic than unbranched. These opposite conclusions bring into focus the unfortunate fact that currently there are two ways of defining the hydrophobic effect: one stressing only solvation by

surroundings, by definition, do not solvate it; only the former contribution is expressed, and the hydrophobic effect is only the exclusion of the hydrogen–carbon bonds from water. If, however, the hydrogen–carbon bond is transferred to a condensed phase, such as hexadecane, half of the magnitude of the hydrophobic effect is its solvation by the new solvent. The more recent habit of equating the hydrophobic effect only with the transfer from gas to water26,198,199 avoids dealing with this half of the hydrophobic effect as it was defined traditionally.171,177 Both views, however, the one emphasizing solvation and the other transfer, persist.19,198

It is hard to argue that transfer from water to gas is relevant to biochemical events such as the folding of a protein or the association of a substrate or inhibitor with an enzyme. In such instances the hydrogen–carbon bonds are transferred from water into a condensed phase that bears no resemblance to the gas phase. But the condensed phase in such situations does not resemble hexadecane either; rather, it is the interior of the irregular solid that is the native protein itself. There is no reason to assume that the interior of a molecule of protein behaves as if it were an isotropic solvent.

These considerations bring the argument back to the van der Waals forces between the hydrogen–carbon bond and its new surroundings. Regardless of whether or not water engages in van der Waals interactions with a hydrogen–carbon bond that are of the same magnitude as those of the new surroundings, the results in Figure 5–23 suggest that water behaves as though it does not engage in van der Waals interactions with a hydrogen–carbon bond at all. Once the hydrogen–carbon bond has been expelled from water, the standard free energy of its transfer is significantly affected by the standard free energy of the van der Waals forces between it and its new surroundings. As a result, and ironically, it is these van der Waals forces that determine much of the magnitude of the hydrophobic effect in any particular circumstance, and it is the magnitude of the van der Waals force felt by a hydrophobic functional group in the interior of the folded molecule of protein that significantly affects the strength of the hydrophobic effect it is able to exert during the folding of the polypeptide.

When a polypeptide is present in water in its unfolded state, the hydrophobic hydrogen–carbon bonds scattered along its length are unstable relative to any state in which they are in contact with each other and out of contact with water. This is the hydrophobic effect that drives the process of folding. Because ions and
water, and the other, transfer from water to hydro- donors and acceptors of hydrogen bonds are more stable
carbon. in the hydrated state than in any state in which they are
There are two significant contributions to the isolated from water, even if they are fully joined in ion
hydrophobic effect (Figure 5–23): the active exclusion of pairs and hydrogen bonds, they cannot provide net
hydrogen–carbon bonds from water and the solvation of favorable standard free energy to the process of folding.
those hydrogen–carbon bonds by the new surroundings The hydrophobic effect is the only noncovalent force that
in which they find themselves. If a hydrogen–carbon provides net favorable standard free energy to drive the
bond is transferred from water to the gas phase, the new folding of a polypeptide.
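The two contributions named in this discussion, expulsion of the hydrogen–carbon bond from water and its solvation by the new surroundings, can be combined through a thermodynamic cycle, because standard free energy is a state function: transfer from water to a condensed phase equals transfer from water to the gas plus transfer from the gas to the condensed phase. The following is a minimal Python sketch of that bookkeeping. The function name and the numerical values are illustrative assumptions, not data taken from Figure 5–23.

```python
def transfer_water_to_phase(dg_water_to_gas, dg_gas_to_phase):
    """Standard free energy of transfer, water -> condensed phase (kJ/mol).

    Free energy is a state function, so the direct transfer equals the sum
    of the two legs of the cycle: water -> gas, then gas -> condensed phase.
    """
    return dg_water_to_gas + dg_gas_to_phase


# Hypothetical per-bond values (kJ/mol), chosen only to illustrate the signs:
# expulsion from water is favorable, and so is solvation by the new phase.
dg_expulsion = -0.5  # water -> gas (assumed value)
dg_solvation = -1.0  # gas -> condensed phase (assumed value)

total = transfer_water_to_phase(dg_expulsion, dg_solvation)
print(f"water -> condensed phase: {total:+.1f} kJ/mol")
```

Setting the second leg to zero leaves only the expulsion term, which corresponds to the definition of the hydrophobic effect that stresses solvation by water alone; a nonzero second leg corresponds to the definition based on transfer from water to hydrocarbon.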
(G) Determine $\Delta S^{\circ}_{\mathrm{HC,H_2O\to alc}}$ in the equation $\Delta S^{\circ}_{\mathrm{alc,H_2O\to alc}} = \Delta S^{\circ}_{\mathrm{alc}} + n\,\Delta S^{\circ}_{\mathrm{HC,H_2O\to alc}}$ and $\Delta H^{\circ}_{\mathrm{HC,H_2O\to alc}}$ in the equation $\Delta H^{\circ}_{\mathrm{alc,H_2O\to alc}} = \Delta H^{\circ}_{\mathrm{alc}} + n\,\Delta H^{\circ}_{\mathrm{HC,H_2O\to alc}}$ from the values in Table 5–8.
(H) Is ΔH° or ΔS° the major contributor to the hydrophobic effect on a hydrogen–carbon bond at 25 °C?
Problem 5–14: Calculate the standard enthalpy of transfer and the standard entropy of transfer of n-butane from water to liquid n-butane at 50, 70 and 90 °C. Recall that
$$d\,\Delta H = \Delta C_p\,dT$$
and
$$d\,\Delta S = \frac{\Delta C_p}{T}\,dT$$
and assume that all partial volumes and $\Delta C^{\circ}_p$ are invariant with temperature over the range 20–100 °C.
Problem 5–15: The partition coefficients of N-methylindole and 3-methylindole
[Structures of 3-methylindole and N-methylindole]
between water and cyclohexane were examined201 to investigate the effect of the hydrogen-bond donor in tryptophan on its partition between water and the anhydrous interior of a protein. The coefficients for partition from water to cyclohexane in the following table are expressed in units of molarity and have been extrapolated to infinite dilution.
                     partition coefficient (M M–1)
temperature (K)      3-methylindole      N-methylindole
288                  19                  300
298                  19                  290
308                  20                  270
318                  20                  260
328                  19                  230
$$\ln\left\{\frac{\phi_{A,\mathrm{C_6H_{12}}}}{\phi_{A,\mathrm{H_2O}}}\exp\left(\frac{\bar{V}_{A,\mathrm{H_2O}}}{\bar{V}_{\mathrm{H_2O}}}-\frac{\bar{V}_{A,\mathrm{C_6H_{12}}}}{\bar{V}_{\mathrm{C_6H_{12}}}}\right)\right\} = \ln\left(\frac{[A]_{\mathrm{C_6H_{12}}}}{[A]_{\mathrm{H_2O}}}\right)+\ln\left(\frac{\bar{V}_{A,\mathrm{H_2O}}}{\bar{V}_{A,\mathrm{C_6H_{12}}}}\right)+\left(\frac{\bar{V}_{A,\mathrm{H_2O}}}{\bar{V}_{\mathrm{H_2O}}}-\frac{\bar{V}_{A,\mathrm{C_6H_{12}}}}{\bar{V}_{\mathrm{C_6H_{12}}}}\right)$$
if $(\bar{V}_{A,\mathrm{H_2O}}/\bar{V}_{\mathrm{H_2O}})$, $(\bar{V}_{A,\mathrm{C_6H_{12}}}/\bar{V}_{\mathrm{C_6H_{12}}})$, and $(\bar{V}_{A,\mathrm{H_2O}}/\bar{V}_{A,\mathrm{C_6H_{12}}})$ do not change with temperature
$$\left(\frac{\partial \ln K}{\partial T^{-1}}\right)_P = \left(\frac{\partial \ln\left([A]_{\mathrm{C_6H_{12}}}/[A]_{\mathrm{H_2O}}\right)}{\partial T^{-1}}\right)_P$$
(A) Calculate the standard enthalpy of transfer for each compound from the behavior of its partition coefficient as a function of the temperature.
(B) If a hydrogen bond were lost every time a 3-methylindole was transferred from water to cyclohexane and no hydrogen bond were lost every time N-methylindole was transferred from water to cyclohexane, which standard enthalpy of transfer should be more negative?
(C) When a molecule of 3-methylindole leaves water for cyclohexane, what is the net change in hydrogen bonding for the entire system?
(D) Why isn’t the standard enthalpy change for the transfer of 3-methylindole more negative than that for the transfer of N-methylindole?
Problem 5–16: The N-acetyl-α-amides of a series of amino acids were synthesized, and the partition of each of them between 1-octanol and water was assayed. The measured distribution coefficients for the reaction
N-acetyl-α-amide of amino acid (H2O) ⇌ N-acetyl-α-amide of amino acid (1-octanol)
in units of (moles of solute) (liter of water)–1 (moles of solute)–1 (liter of 1-octanol) at 20 °C are presented in the following table.202
N-acetylamide of amino acid    [mol of solute (L of 1-octanol)–1] / [mol of solute (L of water)–1]
isoleucine                     0.93
leucine                        0.75
methionine                     0.25
valine                         0.24
alanine                        0.030
threonine                      0.027
serine                         0.0135
glutamine                      0.0089
asparagine                     0.0038
(A) Estimate the partial molar volumes of the N-acetylamides in water by using the algorithm of Traube.
Because partial molar volumes in 1-octanol are unavailable, assume that they are equal to those in water + 13 cm3 mol–1. In all cases, the final concentrations of the model compounds in each phase were less than 10 mM.
(B) Calculate the standard free energy of transfer in kilojoules mole–1 for each of these model compounds from water into 1-octanol, $\Delta G^{\circ}_{A,\mathrm{H_2O\to octanol}}$.
(C) Plot $\Delta G^{\circ}_{A,\mathrm{H_2O\to octanol}}$ against the number of hydrogen–carbon bonds in each N-acetylamide.
(D) Draw a line across your plot with a slope of –2.8 kJ (mol hydrogen–carbon bonds)–1 that passes through the four points for leucine, isoleucine, valine, and alanine on your plot. Why did you do this?
(E) Why don’t the points for glutamine and asparagine lie 52 kJ mol–1 above the line and the points for serine and threonine lie 26 kJ mol–1 above the line?
Hydropathy
As ionic interactions, hydrogen bonds, and the hydrophobic effect are considered in turn, it becomes clear that the functional groups participating in each of these processes (ionic groups, donors and acceptors of hydrogen bonds, and hydrogen–carbon bonds) experience strong favorable and unfavorable interactions with water. Ionic solutes display large negative standard enthalpies of hydration that dominate their behavior in water. When these ions are withdrawn from water, large investments of free energy must be made to strip the shells of hydration from them. Solutes with donors and acceptors form hydrogen bonds with water molecules that have significant negative free energies of formation because of the high molar concentration of the water in [...] investments must be made to counter these free energies. Hydrophobic solutes leave aqueous solution with a preference for almost any other condensed or uncondensed phase because when they are dissolved in water, they cannot form any net favorable interactions with it. When they are withdrawn from water into another phase, significant favorable changes in free energy are realized. Each of these particular outcomes arises from the respective changes in the structure of water that accompany the transfer of the functional groups between water and a nonaqueous phase.
Viewed in this light, few solutes elicit indifferent responses from water. Solutes are either hydrophilic, demonstrating a compatibility with water, or hydrophobic, demonstrating an incompatibility with water. These strong responses together are hydropathy. Hydropathy is the continuous spectrum from compatibility to incompatibility with water, at one end of which is hydrophilicity, and at the other, hydrophobicity.
It was suggested by Hine and Mookerjee196 that the hydropathy of a solute could be judged from its standard free energy of transfer between water and the gas. Assembling values from the tables of Hine and Mookerjee196 and providing several previously unmeasured values, Wolfenden, Andersson, Cullis, and Southgate199,203 have tabulated the standard free energies of transfer between water and the gas for model compounds of the side chains of the amino acids in which the α carbon has been replaced by hydrogen (Table 5–9). These values reflect the magnitudes of the standard free energies realized when the various functional groups present in the amino acids are removed from water at pH 7. As previously noted, the hydrocarbons among the side chains are expelled spontaneously from water with standard free energies of transfer between about –5 and –20 kJ mol–1.
The hydroxyl groups on ethanol and methanol increase their respective standard free energies of transfer from water to the gas phase by +27 kJ mol–1 relative to alkanes of the same number of hydrogen–carbon bonds. In part, these unfavorable incremental standard free energies of transfer arise from the fact that a net of one hydrogen bond is lost to the system when a hydroxyl group is removed from water into the gas phase. Methanethiol, however, has a standard free energy of transfer only +10 kJ mol–1 greater than an alkane with the same number of hydrogen–carbon bonds, while ethyl methyl sulfide has a standard free energy of transfer +12 kJ mol–1 greater than an alkane with the same number of hydrogen–carbon bonds. A comparison of these two values with those for ethanol and methanol suggests that it is the sulfur that is hydrophilic, not the potential hydrogen-bond donor on the methanethiol.
Propionamide and acetamide have standard free energies of transfer +44 kJ mol–1 greater than alkanes of
Table 5–9: Standard Free Energies of Transfer of Model Compounds for the Amino Acids between Water and the Gas Phase at 25 °C and pH 7ᵃ
ᵃ The values for the standard free energies of transfer from water to the gas phase for the various model compounds were obtained from several tables.196,199,203 They were usually presented as the transfer of the compound between the standard state of the real gas at infinitely low partial pressure with concentration expressed in atmospheres and the standard state of the infinitely dilute solution with concentration expressed in molarity. The units of concentration were changed to moles liter–1 for the gas and corrected volume fraction (Equation 5–5) for the solution. The partial molar volumes of the solutes26 were calculated24 for each solute from the formulas developed by Traube.23 The standard free energies of transfer from water to the gas phase were calculated with Equation 5–21.
ᵇ Values for the pKa of the various amino acids in a polypeptide (Table 2–2) were used to correct the standard free energies of transfer of the neutral compounds199,203 for the standard free energy of neutralization required at pH 7 (Equation 5–66).
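Footnote b refers to the correction for neutralization at pH 7 (Equation 5–66, stated later in this section as the standard free energy of transfer of the neutral form minus RT ln αA, where αA is the un-ionized fraction). The following Python sketch shows how such a correction could be evaluated. The function names, the use of the Henderson–Hasselbalch relation to obtain the un-ionized fraction, and the choice of 298.15 K are assumptions made for illustration; the values of pKa in Table 2–2 are not reproduced here.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def fraction_unionized(pka, ph=7.0, neutral_form="acid"):
    """Fraction of a titratable compound present as the neutral species.

    neutral_form="acid": the neutral species is the conjugate acid
    (for example a carboxylic acid).
    neutral_form="base": the neutral species is the conjugate base
    (for example an amine).
    """
    if neutral_form == "acid":
        return 1.0 / (1.0 + 10.0 ** (ph - pka))
    if neutral_form == "base":
        return 1.0 / (1.0 + 10.0 ** (pka - ph))
    raise ValueError("neutral_form must be 'acid' or 'base'")


def neutralization_correction(alpha, temperature=298.15):
    """The term -RT ln(alpha) of Equation 5-66, in kJ/mol."""
    return -R * temperature * math.log(alpha)


def dg_transfer_total(dg_neutral, alpha, temperature=298.15):
    """Free energy of transfer for the total solute (neutral plus charged)."""
    return dg_neutral + neutralization_correction(alpha, temperature)


# The text quotes un-ionized fractions from 0.8 down to 1e-6.
for alpha in (0.8, 1e-6):
    print(f"alpha = {alpha:g}: correction = "
          f"{neutralization_correction(alpha):+.2f} kJ/mol")
```

Under these assumptions the correction is small when most of the solute is already neutral and grows by RT ln 10 (about 5.7 kJ mol–1 at 25 °C) for every tenfold decrease in the un-ionized fraction.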
the same number of hydrogen–carbon bonds. An argument can be made that these large positive standard free energies of transfer for the primary amides arise simply because each of them has two acceptors and two donors so that a net loss to the system of two hydrogen bonds occurs upon their transfer to the gas phase. Consequently, their standard free energies of transfer relative to alkanes of the same number of hydrogen–carbon bonds should be about twice those of ethanol and methanol, which they are. The standard free energies of transfer for acetamide or propionamide are about +12 kJ mol–1 greater than those for neutral acetic acid or neutral propionic acid, respectively. The transfer of the carboxylic acid on either of these two acids involves the net loss of only one hydrogen bond to the solution. Although all of these explanations seem reasonable, they leave unexplained the fact that the standard free energy of transfer for N-methylacetamide, the model for the peptide bond, is actually greater than that for propionamide, which has the same number of hydrogen–carbon bonds, even though a net loss of only one hydrogen bond occurs when the former is removed from water. It was suggested after the fact that in the case of the amides the acceptors may be more important than the donors,204 and subsequently spectroscopic evidence consistent with this suggestion was reported, demonstrating that each of the two acceptors on the acyl oxygen of N-methylacetamide interacts with water with about twice the standard enthalpy as that for the interaction of the single donor.116,205,206
The transfers of the heterocyclic side chains are dominated by their donors and acceptors of hydrogen bonds. 3-Methylindole and 4-cresol have standard free energies of transfer +18 and +21 kJ mol–1, respectively, greater than an arene with the same number of hydrogen–carbon bonds. These increments arise from the presence of the pyrrole donor and the hydroxyl, respectively, that form hydrogen bonds with the donors and acceptors of water that must be broken during transfer. The standard free energy of transfer for neutral 4-methylimidazole is +39 kJ mol–1 greater than that of an arene with the same number of hydrogen–carbon bonds, in part because its acceptor (pKa = 7.5) is much stronger than that of either 4-cresol (pKa = 10.2) or 3-methylindole (pKa = –2).
In the case of the amino acids that are charged at pH 7, such as glutamic acid, aspartic acid, histidine, lysine, and arginine, the tabulated partition coefficients for transfer of the model compounds between water and the gas are for acidic solutions or basic solutions in which those model compounds are dissolved entirely as the neutral conjugate acids or neutral conjugate bases, respectively. At pH 7 only a fraction of the actual amino acid will be present as the neutral species, and this will decrease the value of the partition coefficient for transfer from water to the gas. This decrease in the partition coefficient can be incorporated into the standard free energy of transfer with the formula203
$$\Delta G^{\circ}_{A_{\mathrm{TOT}},\mathrm{H_2O\to g}} = \Delta G^{\circ}_{A^{0},\mathrm{H_2O\to g}} - RT\ln\alpha_A \qquad (5\text{–}66)$$
where $A^{0}$ refers to the un-ionized form of the model compound, $A_{\mathrm{TOT}}$ is the sum of the neutral form and the charged form, and $\alpha_A$ is the fraction that is un-ionized at pH 7. The values for $\alpha_A$ were calculated with the values of pKa for the amino acids in a polypeptide (Table 2–2) rather than with the values of pKa for the model compounds themselves. The fraction of un-ionized species, $\alpha_A$, varies from 0.8 for histidine to 10⁻⁶ for arginine. Even the standard free energy necessary to neutralize N-methylguanidinium and transfer the neutral compound into the gas
(+73 kJ mol–1) is still less than the standard free energy that would be required to transfer it into the gas as a cation (Figure 5–8). Therefore, all of the tabulated values should refer to the most likely reactions, namely, neutralization followed by transfer. For example, the only reasonable reaction for the transfer of n-butylammonium ion, dissolved in H2O at pH 7, to the gas would be
[Reaction scheme; only the fragment "NH3 (aq)" is recoverable from the source.]
solutes removed from water, however, are transferred into a new environment, the interior of the protein. The standard free energies of transfer into this new environment are the second half of the reaction, but the standard free energies of transfer into the interior of a protein from the gas cannot be predicted with any certainty because the interior of a protein is a heterogeneous solid.
Presumably, noncovalent forces with negative standard free energies of formation arise as the amino acids and the polypeptide backbone are packed into the interior. The standard free energy of transfer for the removal
of polypeptide, or a small solute from water to the interior of a protein should be similar to the standard free energy of transfer for a model compound of that amino acid or segment of polypeptide between water and a solvent the properties of which resemble those of the interior of a protein. Scales of hydropathy for the side chains of the amino acids based on this proposal have been presented. They differ in the personal preferences of their proponents for the type of model compounds and the particular solvent chosen as the basis of the scale.
The first of these was the scale of hydrophobicity proposed by Nozaki and Tanford,207 which, as its name implies, was confined to only one end of the spectrum. It relied on the solubilities of the zwitterionic amino acids in ethanol that had been previously tabulated by Cohn and Edsall.24 By subtracting the standard free energy of transfer for glycine between water and ethanol from the standard free energies of transfer for hydrophobic amino acids between water and ethanol, they estimated the standard free energies of transfer for the side chains alone between water and ethanol. The implication in formulating a scale of this type is that the interior of a protein resembled ethanol in its interaction with hydrophobic amino acids, and this may not be far from the truth because all nonaqueous solvents display similar standard free energies of solvation for hydrophobic solutes (Figure 5–22).
Since this first scale was proposed, at least 35 others have appeared,189 and they have usually been expanded spectra including all 20 of the amino acids, hydrophilic as well as hydrophobic. Most have been based on standard free energies of transfer. The original description of the hydrophobic effect was based on observations of abnormal decreases in the surface tension of water that result when hydrophobic solutes display a preference for the surface of an aqueous solution rather than its interior,7,208 and a scale of hydropathy based on the change in the surface tension of an aqueous solution with the change in concentration of the different amino acids has been presented.209 The scale of hydrophobicity based on the solubilities of the amino acids in ethanol has been expanded to include uncharged, hydrophilic amino acids.210 The standard free energies of transfer of the model compounds for the amino acids between water and the gas (Table 5–9) have also been used to create scales of hydropathies.198,199,203 The standard free energies of transfer of various solutes between water and 1-octanol have been proposed as the parameters for a general scale for the hydrophobic effect.211 The standard free energies of transfer of the N-acetyl-α-amides of each of the amino acids (Figure 2–1) between water and 1-octanol have been determined, and they have been used to construct a scale of hydropathies.202 It has been proposed that N-cyclohexyl-2-pyrrolidone would be a better solvent to use as reference for standard free energies of transfer into the interior of a protein, and standard free energies of transfer for the amino acids between water and N-cyclohexyl-2-pyrrolidone have been measured and used to construct a scale of hydropathies.212
In competition with these scales of hydropathy based on standard free energies of transfer are scales derived from the locations of the various amino acid side chains in crystallographic molecular models of native proteins. The logic in this case is that the purpose of all of these scales is to estimate contributions due to changes in solvation during the folding of a polypeptide and the degree with which particular amino acids are buried in the interior or exposed to the solvent should directly indicate how hydrophobic or hydrophilic, respectively, they are. In these computations, the surface area of each amino acid that is accessible to water213,214 in a set of crystallographic molecular models is individually determined. These individual accessibilities to water are then grouped by amino acid, and average accessibilities for each amino acid are calculated. The uncertainty in these calculations is in the calculation of these averages, and the three scales of hydropathy based on the accessible surface area in molecular models of folded polypeptides215–217 are not equivalent, even though they are based on similar raw data.
Finally, there are the scales of hydropathies for the amino acids that are based on mixtures of the pure scales discussed so far. In one case,218 a scale based on the accessible surface area of amino acids in crystallographic models was modified by a theoretical calculation of the standard free energy required to break hydrogen bonds and neutralize charge. In another,26 the standard free energies of transfer between water and the gas and a tabulation of accessible surface areas were combined with personal preference to produce a scale of hydropathies. In a third case,219 a consensus scale of hydropathies was inferred from two scales based on standard free energy of transfer and three scales based on accessible surface area in folded polypeptides. In a fourth case,220 a correlation between accessible surface area and the hydrophobic effect, the standard free energy required to neutralize charged amino acids (Equation 5–66), and semi-empirical estimates of the standard free energy for withdrawing each individual hydrogen-bond donor and acceptor from water were combined to obtain a scale of estimated standard free energies of transfer for each of the amino acids, when located in an α helix, from water to a phase of hydrocarbon.
At low resolution all of these scales are similar to each other. The amino acids the side chains of which are alkanes, namely, leucine, isoleucine, and valine, are the most hydrophobic amino acids; the charged amino acids the pKa of which is farthest from pH 7, arginine, lysine, glutamate, and aspartate, are the most hydrophilic; and neutral but polar amino acids such as serine and threonine reside in the middle; but the details of the ranking and the relative magnitudes of the parameters are dramatically different.189 At the moment, each of these attempts to estimate the standard free energy of transfer for each of the amino acids between water and the interior of a protein has its particular proponents, some
more forceful than others, and there is no unambiguous way to choose among them or assess whether any of them is realistic or unrealistic.
The usual criterion for the reliability of each scale is to demonstrate either that it correlates with the distribution of amino acids between the surface and the interior of a protein, if it is based on standard free energy of transfer (Figure 5–24),199,202,216 or that it correlates with standard free energies of transfer, if it is based on the distribution of amino acids between the surface and the interior.216 None of these correlations suggest that any one of the scales is more realistic than any of the others.
[Figure 5–24 plot not reproduced; horizontal axis: ΔG°H2O→octanol (kJ mol–1); points labeled by amino acid.]
Figure 5–24: Correlation between the standard free energies of transfer of N-acetyl-α-amides of the amino acids between octanol and water with the degree to which the amino acids are buried in the interior of a molecule of protein.202 The partition coefficients for the distribution of the N-acetyl-α-amides of the 20 amino acids between water at pH 7 and octanol at room temperature were measured (concentrations in molarity) and standard free energies of transfer, $\Delta G^{\circ}_{\mathrm{aa,H_2O\to octanol}}$, were calculated. Each of the 5220 amino acids in the crystallographic molecular models of 22 proteins was identified as either buried (less than 0.2 nm2 of accessible surface area) or accessible to water (greater than 0.2 nm2 of accessible surface area).216 For each type of amino acid a partition ratio [number buried (number accessible)–1] was calculated, and from this partition ratio, a standard free energy of transfer, $\Delta G^{\circ}_{\mathrm{aa,H_2O\to interior}}$, was calculated. Adapted with permission from ref 202. Copyright 1983 Elsevier.
Suggested Reading
Kyte, J., & Doolittle, R.F. (1982) A simple method for displaying the hydropathic character of a protein, J. Mol. Biol. 157, 105–132.
Problem 5–17: Consider a saturated solution of the solute A in solvent j. In this case, the solution of A is in equilibrium with solid A and
$$\mu_{A,\mathrm{sat},j} = \mu_{A,\mathrm{solid}}$$
$$\mu^{\circ}_{A,j} + RT\ln\left\{\phi_{A,\mathrm{sat},j}\exp\left(1-\bar{V}_{A,j}/\bar{V}_{j}\right)\right\} = \mu^{\circ}_{A,\mathrm{solid}}$$
If one wishes to compare two different solvents and their effects on A
$$\mu^{\circ}_{A,1} + RT\ln\left\{\phi_{A,\mathrm{sat},1}\exp\left(1-\bar{V}_{A,1}/\bar{V}_{1}\right)\right\} = \mu^{\circ}_{A,\mathrm{solid}}$$
$$\mu^{\circ}_{A,2} + RT\ln\left\{\phi_{A,\mathrm{sat},2}\exp\left(1-\bar{V}_{A,2}/\bar{V}_{2}\right)\right\} = \mu^{\circ}_{A,\mathrm{solid}}$$
$$\mu^{\circ}_{A,2} - \mu^{\circ}_{A,1} = \Delta G^{\circ}_{A,1\to 2}$$
$$\Delta G^{\circ}_{A,1\to 2} = RT\ln\left\{\frac{[A]_{\mathrm{sat},1}}{[A]_{\mathrm{sat},2}}\exp\left(\frac{\bar{V}_{A,2}}{\bar{V}_{2}}-\frac{\bar{V}_{A,1}}{\bar{V}_{1}}\right)\right\}$$
where $\mu^{\circ}_{A,1}$ is the chemical potential of A in solvent 1 at a concentration of 1 corrected volume fraction, $\phi_{A,\mathrm{sat},2}$ is the concentration of A in a saturated solution in solvent 2 in units of volume fraction, and $\Delta G^{\circ}_{A,1\to 2}$ is the standard free energy change when A is transferred from solvent 1 to solvent 2. The quantity $\Delta G^{\circ}_{A,1\to 2}$ is a measure of the change in standard free energy for the following type of reaction: [...] protein.
The following data24,207 are for 25 °C
               concn at saturation [g (100 g of solvent)–1]    partial molar volume
amino acid     H2O          EtOH                               (mL mol–1)
glycine        25.16        0.00382                            43.5
leucine        2.17         0.01960                            108.5
(A) Calculate $\phi_{A,\mathrm{sat},j}\exp[(1-\bar{V}_{A,j}/\bar{V}_{j})]$ for these four situations. Assume $\bar{V}_{A,\mathrm{H_2O}} = \bar{V}_{A,\mathrm{EtOH}}$.
(B) Calculate $\Delta G^{\circ}_{A,\mathrm{H_2O\to EtOH}}$ for glycine and leucine.
(C) Use the value you have for glycine to subtract away the contribution of –OOCCH2NH3+ to the solubility of leucine. The remainder is an estimate of the standard free energy of transfer of the leucine side chain from H2O to ethanol.
(D) Draw the structure of the glutamine side chain and divide it into hydrophobic or hydrogen-bonding regions. Label each region on your drawing and indicate all hydrogen-bond donors and acceptors with D or A, respectively.
(E) Estimate the $\Delta G^{\circ}_{\mathrm{glutamine,H_2O\to ethanol}}$ contributed only by the hydrogen–carbon bonds of the side chain.
(F) The solubility of glutamine (concentration at saturation) in water is 4.6 g (100 g of H2O)–1, the solubility of glutamine in ethanol is 4.59 × 10⁻⁴ g (100 g of ethanol)–1, and the partial molar volume of glutamine in water is 96.5 mL mol–1. Calculate $\Delta G^{\circ}_{\mathrm{transfer,H_2O\to ethanol}}$ of the glutamine side chain.
(G) Estimate $\Delta G^{\circ}_{\mathrm{transfer,H_2O\to ethanol}}$ for the –CONH2 functional group of glutamine.
Problem 5–18: Consider the following table:
A   0.25     G   0.16     P  –0.07
R  –1.76     H  –0.40     S  –0.26
N  –0.64     I   0.73     T  –0.18
D  –0.72     L   0.53     W   0.37
C   0.04     K  –1.10     Y   0.02
E  –0.62     M   0.26     V   0.54
Q  –0.69     F   0.61
(A) What are the letters and what is the intention of assigning these numbers to these letters?
(B) What common property is shared by the letters with the positive numbers?
(C) What common property is shared by the letters with the negative numbers?
(D) Divide the letters with the positive numbers into two groups based on differences in chemical properties. Why are the numbers in one of these groups less positive than the numbers in the other?
(E) On what types of measurements could the numbers assigned to the letters be based?
(F) Draw the interactions with water that are one of the reasons that R has a value of –1.76. There are two reasons that R has such a low value: the interactions you have just drawn and another of its properties. What are these two reasons?
References
1. Del Bene, J., & Pople, J.A. (1970) J. Chem. Phys. 52, 4858–4866.
2. Hankins, D., Moskowitz, J.W., & Stillinger, F.H., Jr. (1970) J. Chem. Phys. 53, 4544–4554.
3. Symons, M.C.R. (1972) Nature 239, 257–259.
4. Eisenberg, D., & Kauzmann, W. (1969) The Structure and Properties of Water, Clarendon Press, Oxford, England.
5. Dyke, T.R., & Muenter, J.S. (1974) J. Chem. Phys. 60, 2929–2930.
6. Dyke, T.R., Mack, K.M., & Muenter, J.S. (1977) J. Chem. Phys. 66, 498–510.
7. Edsall, J.T., & McKenzie, H.A. (1978) Adv. Biophys. 10, 137–207.
8. Frank, H.S. (1970) Science 169, 635–641.
9. Narten, A.H., & Levy, H.A. (1971) J. Chem. Phys. 55, 2263–2269.
10. Morgan, J., & Warren, B.E. (1938) J. Chem. Phys. 6, 666–673.
11. Narten, A.H., & Levy, H.A. (1969) Science 165, 447–454.
12. Narten, A.H., Danford, M.D., & Levy, H.A. (1967) Discuss. Faraday Soc. 43, 97–107.
13. Lavrov, V.V. (1947) Zh. Sakh. Promsti. 17, 1027–1034.
14. Poirier, J.P., Sotin, C., & Peyronneau, J. (1981) Nature 292, 225–227.
15. Wittebort, R.J., Usha, M.G., Ruben, D.J., Wemmer, D.E., & Pines, A. (1988) J. Am. Chem. Soc. 110, 5668–5671.
16. Walrafen, G.E. (1968) J. Chem. Phys. 48, 244–251.
17. Silverstein, K.A.T., Haymet, A.D.J., & Dill, K.A. (2000) J. Am. Chem. Soc. 122, 8037–8041.
18. Sharp, K.A., Nicholls, A., Friedman, R., & Honig, B. (1991) Biochemistry 30, 9686–9697.
19. Sharp, K.A., Nicholls, A., Fine, R.F., & Honig, B. (1991) Science 252, 106–109.
20. Deyoung, L.R., & Dill, K.A. (1990) J. Phys. Chem. 94, 801–809.
21. Huggins, M.L. (1941) J. Chem. Phys. 9, 440.
22. Flory, P.J. (1941) J. Chem. Phys. 9, 660–661.
23. Traube, J. (1899) Samml. Chem. Chem. Tech. Vortr. 4, 255–332.
24. Cohn, E.J., & Edsall, J.T. (1943) Proteins, Amino Acids and Peptides as Ions and Dipolar Ions, pp 157–161, Reinhold, New York.
25. McAuliffe, C. (1966) J. Phys. Chem. 70, 1267–1275.
26. Kyte, J., & Doolittle, R.F. (1982) J. Mol. Biol. 157, 105–132.
27. Moore, W. (1972) Physical Chemistry, 4th ed., pp 890–894, Prentice-Hall, Englewood Cliffs, NJ.
28. Pauling, L. (1960) The Nature of the Chemical Bond and the Structure of Molecules and Crystals, 3rd ed., p 449, Cornell University Press, Ithaca, NY.
29. Jencks, W.P. (1969) Catalysis in Chemistry and Enzymology, McGraw-Hill, New York.
30. Meot-Ner, M. (1984) J. Am. Chem. Soc. 106, 1265–1272.
31. Benoit, R.L., & Lam, S.Y. (1974) J. Am. Chem. Soc. 96, 7385–7390.
32. Parsegian, A. (1969) Nature 221, 844–846.
33. Stokes, R.H. (1964) J. Am. Chem. Soc. 86, 979–982.
34. Noyes, R.M. (1962) J. Am. Chem. Soc. 84, 513–522.
35. Cox, B.G., & Parker, A.J. (1973) J. Am. Chem. Soc. 95, 6879–6884.
36. Leikin, S., Parsegian, V.A., Rau, D.C., & Rand, R.P. (1993) Annu. Rev. Phys. Chem. 44, 369–395.
37. Rand, R.P., Fuller, N., Parsegian, V.A., & Rau, D.C. (1988) Biochemistry 27, 7711–7722.
38. Tanford, C. (1954) J. Am. Chem. Soc. 76, 945–946.
39. Likhodi, O., & Chalikian, T.V. (2000) J. Am. Chem. Soc. 122, 7860–7868.
40. Wimley, W.C., Gawrisch, K., Creamer, T.P., & White, S.H. (1996) Proc. Natl. Acad. Sci. U.S.A. 93, 2985–2990.
41. O’Shea, E.K., Lumb, K.J., & Kim, P.S. (1993) Curr. Biol. 3, 658–667.
42. Moore, W.J. (1972) Physical Chemistry, 4th ed., pp 420–476, Prentice-Hall, Englewood Cliffs, NJ.
43. Wyman, J., Jr. (1936) Chem. Rev. 19, 213–239.
44. Hine, J. (1972) J. Am. Chem. Soc. 94, 5766–5771.
45. Eigen, M. (1964) Angew. Chem., Int. Ed. Engl. 3, 1–19.
46. Pimentel, G.C., & McClellan, A.L. (1960) The Hydrogen Bond, W.H. Freeman, San Francisco, CA.
47. Taylor, R., Kennard, O., & Versichel, W. (1983) J. Am. Chem. Soc. 105, 5761–5766.
48. Taylor, R., Kennard, O., & Versichel, W. (1984) J. Am. Chem. Soc. 106, 244–248.
49. Gorbitz, C.H., & Etter, M.C. (1992) J. Am. Chem. Soc. 114, 627–631.
50. Thanki, N., Thornton, J.M., & Goodfellow, J.M. (1988) J. Mol. Biol. 202, 637–657.
51. Kuhn, L.P., Wires, R.A., Ruoff, W., & Kwart, H. (1969) J. Am. Chem. Soc. 91, 4790–4793.
52. Donohue, J. (1969) J. Mol. Biol. 45, 231–235.
53. Cox, R.A., Druet, L.M., Klausner, A.E., Modro, T.A., Wan, P., & Yates, K. (1981) Can. J. Chem. 59, 1568–1573.
54. McCormack, A.C., McDonnell, C.M., O’Ferrall, R.A.M., O’Donoghue, A.C., & Rao, S.N. (2002) J. Am. Chem. Soc. 124, 8575–8583.
55. Kresge, A.J., Chen, H.J., Hakka, L.E., & Kouba, J.E. (1971) J. Am. Chem. Soc. 93, 6174–6181.
56. Ravishanker, G., Mehrotra, P.K., Mezei, M., & Beveridge, D.L. (1984) J. Am. Chem. Soc. 106, 4102–4108.
57. Suzuki, S., Green, P.G., Bumgarner, R.E., Dasgupta, S., Goddard, W.A., & Blake, G.A. (1992) Science 257, 942–944.
58. Atwood, J.L., Hamada, F., Robinson, K.D., Orr, G.W., & Vincent, R.L. (1991) Nature 349, 683–684.
59. Allen, F.H., Howard, J.A.K., Hoy, V.J., Desiraju, G.R., Reddy, D.S., & Wilson, C.C. (1996) J. Am. Chem. Soc. 118, 4081–4084.
76. Markley, J.L., & Westler, W.M. (1996) Biochemistry 35, 11092–11097.
77. Edison, A.S., Weinhold, F., & Markley, J.L. (1995) J. Am. Chem. Soc. 117, 9619–9624.
78. Kreevoy, M.M., & Liang, T.M. (1980) J. Am. Chem. Soc. 102, 3315–3322.
79. Chiang, Y., Kresge, A.J., & More O’Ferrall, R.A. (1980) J. Chem. Soc., Perkin Trans. 2, 1832–1839.
80. Perrin, C.L., & Nielson, J.B. (1997) Annu. Rev. Phys. Chem. 48, 511–544.
81. Baltzer, L., & Bergman, N.A. (1982) J. Chem. Soc., Perkin Trans. 2, 313–319.
82. Taft, R.W., Gurka, D., Joris, L., Schleyer, P., & Rakshys, J.W. (1969) J. Am. Chem. Soc. 91, 4801–4808.
83. Arnett, E.M., Mitchell, E.J., & Murty, T.S.S.R. (1974) J. Am. Chem. Soc. 96, 3875–3891.
84. Stymne, B., Stymne, H., & Wettermark, G. (1973) J. Am. Chem. Soc. 95, 3490–3494.
85. Arnett, E.M. (1963) Prog. Phys. Org. Chem. 1, 223–403.
86. Rubin, J., Senkowski, B.Z., & Panson, G.S. (1964) J. Phys. Chem. 68, 1601–1602.
87. Shan, S.O., Loh, S., & Herschlag, D. (1996) Science 272, 97–101.
88. Arnett, E.M. (1963) Prog. Phys. Org. Chem. 1, 223–403.
89. Gordy, W., & Stanford, S.C. (1941) J. Chem. Phys. 9, 204–214.
90. Tobin, J.B., Whitt, S.A., Cassidy, C.S., & Frey, P.A. (1995) Biochemistry 34, 6919–6924.
91. Singh, U.C., & Kollman, P.A. (1985) J. Chem. Phys. 83, 4033–4040.
60. Levitt, M., & Perutz, M.F. (1988) J. Mol. Biol. 201, 92. Cotton, F.A., Fair, C.K., Lewis, G.E., Mott, G.N., Ross,
751–754. F.K., Schultz, A.J., & Williams, J.M. (1984) J. Am. Chem.
61. Tuckerman, M.E., Marx, D., Klein, M.L., & Parrinello, Soc. 106, 5319–5323.
M. (1997) Science 275, 817–820. 93. Harrowfield, J.M., Sharma, R.P., Skelton, B.W., & White,
62. Perrin, C.L., & Thoburn, J.D. (1989) J. Am. Chem. Soc. A.H. (1998) Aust. J. Chem. 51, 785–793.
111, 8010–8012. 94. Hughes, D.L., & Truter, M.R. (1979) J. Chem. Soc.,
63. Perrin, C.L., & Nielson, J.B. (1997) J. Am. Chem. Soc. 119, Dalton Trans., 520–527.
12734–12741. 95. Kanters, J.A., Ter Horst, E.H., & Grech, E. (1992) Acta
64. Perrin, C.L. (1994) Science 266, 1665–1668. Crystallogr., Sect. C 48, 1345–1347.
65. Gilli, P., Bertolasi, V., Ferretti, V., & Gilli, G. (1994) J. Am. 96. Aleksandrov, G.G., Struchkov, Y.T., Kalinin, A.E.,
Chem. Soc. 116, 909–915. Shcherbakov, A.A., Barykina, L.R., & Karaulova, E.N.
66. Ichikawa, M. (1981) Chem. Phys. Lett. 79, 583–587. (1980) Kristallografiya 25, 481–487.
67. Steiner, T., & Saenger, W. (1994) Acta Crystallogr., Sect. 97. Jones, P.G., & Ahrens, B. (1998) Eur. J. Org. Chem.,
B 50, 348–357. 1687–1688.
68. Berglund, B., & Vaughan, R.W. (1980) J. Chem. Phys. 73, 98. Hashimoto, M., & Iwamoto, T. (1991) J. Coord. Chem.
2037–2043. 23, 269–276.
69. Altman, L.J., Laungani, P., Gunnarsson, G., 99. Rivas, J.C.M., & Brammer, L. (1998) Acta Crystallogr.,
Wennerstrom, H., & Forsen, S. (1978) J. Am. Chem. Soc. Sect. C 54, 1799–1802.
100, 8264–8265. 100. Therrien, B., & Beauchamp, A.L. (1993) Acta
70. Hsu, B., & Schlemper, E.O. (1980) Acta Crystallogr., Crystallogr., Sect. C 49, 1303–1307.
Sect. B 36, 3017–3023. 101. McAdam, A., Currie, M., & Speakman, J.C. (1971) J.
71. Madsen, D., Flensburg, C., & Larsen, S. (1998) J. Phys. Chem. Soc., A, 1994–1997.
Chem. A 102, 2177–2188. 102. Pei, X.F., Greig, N.H., Flippenanderson, J.L., Bi, S., &
72. Hussain, M.S., Schlemper, E.O., & Fair, C.K. (1980) Acta Brossi, A. (1994) Helv. Chim. Acta 77, 1412–1422.
Crystallogr., Sect. B 36, 1104–1108. 103. Gupta, M.P., & Ashok, J. (1978) Cryst. Struct. Commun.
73. Jones, R.D.G., & Power, L.F. (1976) Acta Crystallogr., 7, 171–174.
Sect B 32, 1801–1806. 104. Schwartz, A., Madan, P.B., Mohacsi, E., Obrien, J.P.,
74. Iijima, K., Ohnogi, A., & Shibata, S. (1987) J. Mol. Struct. Todaro, L.J., & Coffen, D.L. (1992) J. Org. Chem. 57,
156, 111–118. 851–856.
75. Wozniak, K., He, H.Y., Klinowski, J., Barr, T.L., & Milart, 105. Amstutz, R., Enz, A., Marzi, M., Boelsterli, J., &
P. (1996) J. Phys. Chem. 100, 11420–11426. Walkinshaw, M. (1990) Helv. Chim. Acta 73, 739–753.
248 Noncovalent Forces
106. Philippopoulos, A.I., Bau, R., Poilblanc, R., & 137. Venkatachalam, C.M. (1968) Biopolymers 6, 1425–1436.
Hadjiliadis, N. (1998) Inorg. Chem. 37, 4822–4827. 138. Zimm, B.H., & Bragg, J.K. (1959) J. Chem. Phys. 31,
107. Schwartz, B., & Drueckhammer, D.G. (1995) J. Am. 526–535.
Chem. Soc. 117, 11902–11905. 139. Gellman, S.H. (1998) Curr. Opin. Chem. Biol. 2,
108. Kato, Y., Toledo, L.M., & Rebek, J. (1996) J. Am. Chem. 717–725.
Soc. 118, 8575–8579. 140. Searle, M.S., Griffiths-Jones, S.R., & Skinner-Smith, H.
109. Ash, E.L., Sudmeier, J.L., De Fabo, E.C., & Bachovchin, (1999) J. Am. Chem. Soc. 121, 11615–11620.
W.W. (1997) Science 278, 1128–1132. 141. Fisk, J.D., & Gellman, S.H. (2001) J. Am. Chem. Soc. 123,
110. Lin, J., & Frey, P.A. (2000) J. Am. Chem. Soc. 122, 343–344.
11258–11259. 142. Bierzynski, A., Kim, P.S., & Baldwin, R.L. (1982) Proc.
111. Isaacs, E.D., Shukla, A., Platzman, P.M., Hamann, D.R., Natl. Acad. Sci. U.S.A. 79, 2470–2474.
Barbiellini, B., & Tulk, C.A. (1999) Phys. Rev. Lett. 82, 143. Austin, R.E., Maplestone, R.A., Sefler, A.M., Liu, K.,
600–603. Hruzewicz, W.N., Liu, C.W., Cho, H.S., Wemmer, D.E.,
112. Martin, T.W., & Derewenda, Z.S. (1999) Nat. Struct. & Bartlett, P.A. (1997) J. Am. Chem. Soc. 119, 6461–
Biol. 6, 403–406. 6472.
113. Ghanty, T.K., Staroverov, V.N., Koren, P.R., & Davidson, 144. Marqusee, S., & Baldwin, R.L. (1987) Proc. Natl. Acad.
E.R. (2000) J. Am. Chem. Soc. 122, 1210–1214. Sci. U.S.A. 84, 8898–8902.
114. Stahl, N., & Jencks, W.P. (1986) J. Am. Chem. Soc. 108, 145. Merutka, G., & Stellwagen, E. (1989) Biochemistry 28,
4196–4205. 352–357.
115. Pauling, L., & Pressman, D. (1945) J. Am. Chem. Soc. 67, 146. Merutka, G., & Stellwagen, E. (1990) Biochemistry 29,
1003–1012. 894–898.
116. Klotz, I.M., & Franzen, J.S. (1962) J. Am. Chem. Soc. 84, 147. Scholtz, J.M., Qian, H., Robbins, V.H., & Baldwin, R.L.
3461–3466. (1993) Biochemistry 32, 9668–9676.
117. Kresheck, G.C., & Klotz, I.M. (1969) Biochemistry 8, 148. Zhou, N.E., Kay, C.M., Sykes, B.D., & Hodges, R.S.
8–12. (1993) Biochemistry 32, 6190–6197.
118. Klotz, I.M., & Farnham, S.B. (1968) Biochemistry 7, 149. Smith, J.S., & Scholtz, J.M. (1998) Biochemistry 37,
3879–3882. 33–40.
119. Roseman, M.A. (1988) J. Mol. Biol. 201, 621–623. 150. Miller, J.S., Kennedy, R.J., & Kemp, D.S. (2001)
120. Kyte, J. (2003) Biophys. Chem. 100, 193–203. Biochemistry 40, 305–309.
121. Hine, J.S. (1962) Physical Organic Chemistry, 2nd ed., 151. Artymiuk, P.J., & Blake, C.C. (1981) J. Mol. Biol. 152,
pp 81–103, McGraw-Hill, New York. 737–762.
122. Kresge, A.J., & Chiang, Y. (1973) J. Phys. Chem. 77, 152. Oefner, C., & Suck, D. (1986) J. Mol. Biol. 192, 605–632.
822–825. 153. Pauling, L., Corey, R.B., & Branson, H.R. (1951) Proc.
123. Doig, A.J., & Williams, D.H. (1992) J. Am. Chem. Soc. Natl. Acad. Sci. U.S.A. 37, 205–211.
114, 338–343. 154. Huyghues-Despointes, B.M., & Baldwin, R.L. (1997)
124. Badger, R.M., & Rubalcava, H. (1954) Proc. Natl. Acad. Biochemistry 36, 1965–1970.
Sci. U.S.A. 40, 12–17. 155. Huyghues-Despointes, B.M., Klingler, T.M., & Baldwin,
125. Bruice, T.C., & Pandit, U.K. (1960) J. Am. Chem. Soc. 82, R.L. (1995) Biochemistry 34, 13267–13271.
5858–5865. 156. Tanaka, Y., Kato, Y., & Aoyama, Y. (1990) J. Am. Chem.
126. Bruice, T.C. (1970) in The Enzymes: Kinetics and Soc. 112, 2807–2808.
Mechanism, 3rd ed. (Boyer, P. D., Ed.) Vol. II, pp 157. Nowick, J.S., Chen, J.S., & Noronha, G. (1993) J. Am.
217–279, Academic Press, New York. Chem. Soc. 115, 7636–7644.
127. Bruice, T.C., & Turner, A. (1970) J. Am. Chem. Soc. 92, 158. Morales, J.C., & Kool, E.T. (1998) Nat. Struct. Biol. 5,
3422–3428. 950–954.
128. Page, M.I., & Jencks, W.P. (1971) Proc. Natl. Acad. Sci. 159. Ogawa, A.K., Wu, Y., McMinn, D.L., Liu, J., Schultz, P.G.,
U.S.A. 68, 1678–1683. & Romesberg, F.E. (2000) J. Am. Chem. Soc. 122,
129. Morris, J.J., & Page, M.I. (1980) J. Chem. Soc., Perkin 3274–3287.
Trans. 2, 679–684. 160. Wu, Y., Ogawa, A.K., Berger, M., McMinn, D.L., Schultz,
130. Page, M.I., & Jencks, W.P. (1987) Gazz. Chim. Ital. 117, P.G., & Romesberg, F.E. (2000) J. Am. Chem. Soc. 122,
455–460. 7621–7632.
131. Tadayoni, B.M., Parris, K., & Rebek, J., Jr. (1989) J. Am. 161. Guckian, K.M., Krugh, T.R., & Kool, E.T. (2000) J. Am.
Chem. Soc. 111, 4503–4505. Chem. Soc. 122, 6841–6847.
132. Tadayoni, B.M., Huff, J., & Rebek, J., Jr. (1991) J. Am. 162. Dzantiev, L., Alekseyev, Y.O., Morales, J.C., Kool, E.T.,
Chem. Soc. 113, 2247–2253. & Romano, L.J. (2001) Biochemistry 40, 3215–3221.
133. Higuchi, T., Eberson, L., & Herd, A.K. (1966) J. Am. 163. Kato, Y., Conn, M.M., & Rebek, J., Jr. (1995) Proc. Natl.
Chem. Soc. 88, 3805–3808. Acad. Sci. U.S.A. 92, 1208–1212.
134. Roberts, J.D., Chun, Y., Flanagan, C., & Birdseye, T.R. 164. Krugh, T.R., & Young, M.A. (1975) Biochem. Biophys.
(1982) J. Am. Chem. Soc. 104, 3945–3949. Res. Commun. 62, 1025–1031.
135. Etzkorn, F.A., Guo, T., Lipton, M.A., Goldberg, S.D., & 165. Ogasawara, N., & Inoue, Y. (1976) J. Am. Chem. Soc. 98,
Bartlett, P.A. (1994) J. Am. Chem. Soc. 116, 7054–7060.
10412–10425. 166. Lyu, P.C., Marky, L.A., & Kallenbach, N.R. (1989) J. Am.
136. Karle, I., & Karle, J. (1963) Acta Crystallogr. 16, 969–975. Chem. Soc. 111, 2733–2734.
References 249
167. Melville, H. (1979) Moby-Dick or, The Whale, Chapter 191. McMeekin, T.L., Cohn, E.J., & Weare, J.H. (1935) J. Am.
84, University of California Press, Berkeley, CA. Chem. Soc. 57, 626–633.
168. Hartley, G.S. (1936) in Aqueous Solutions of Paraffin- 192. Tanford, C. (1973) The Hydrophobic Effect: Formation
Chain Salts pp viii, Hermann & Cie., Paris, as quoted in of Micelles and Biological Membranes, Wiley, New
Tanford, C. (1973) The Hydrophobic Effect: Formation York,.
of Micelles and Biological Membranes, p viii, John Wiley 193. Pace, C.N. (1992) J. Mol. Biol. 226, 29–35.
and Sons, New York. 194. Cramer, R.D. (1977) J. Am. Chem. Soc. 99, 5408–5412.
169. Hildebrand, J.H. (1979) Proc. Natl. Acad. Sci. U.S.A. 76, 195. Abraham, M.H. (1979) J. Am. Chem. Soc. 101,
194. 5477–5484.
170. Tanford, C. (1979) Proc. Natl. Acad. Sci. U.S.A. 76, 196. Hine, J., & Mookerjee, P.K. (1975) J. Org. Chem. 40,
4175–4176. 292–298.
171. Kauzmann, W. (1959) Adv. Protein Chem. 14, 1–63. 197. Makhatadze, G.I., & Privalov, P.L. (1993) J. Mol. Biol.
172. Kauzmann, W. (1954) in The Mechanism of Enzyme 232, 639–659.
Action (McElroy, W. D., & Glass, B., Eds.) pp 70–120, 198. Privalov, P.L., & Makhatadze, G.I. (1993) J. Mol. Biol.
Johns Hopkins Press, Baltimore, MD. 232, 660–679.
173. Scatena, L.F., Brown, M.G., & Richmond, G.L. (2001) 199. Wolfenden, R., Andersson, L., Cullis, P.M., & Southgate,
Science 292, 908–912. C.C. (1981) Biochemistry 20, 849–855.
174. Abraham, M.H., Grellier, P.L., & McGill, R.A. (1987) J. 200. Kinoshita, K., Ishikawa, H., & Shinoda, K. (1958) Bull.
Chem. Soc. Perkin Trans. 2, 797. Chem. Soc. Jpn. 31, 1081–1082.
175. Abraham, M.H. (1993) Chem. Soc. Rev. 22, 73–83. 201. Wimley, W.C., & White, S.H. (1992) Biochemistry 31,
176. Cohn, E.J., McMeekin, T.L., Edsall, J.T., & Weare, J.H. 12813–12818.
(1934) J. Am. Chem. Soc. 56, 2270–2282. 202. Fauchere, J.L., & Pliska, V. (1983) Eur. J. Med. Chem.,
177. Tanford, C. (1980) The Hydrophobic Effect: Formation Chim. Ther. 18, 369–375.
of Micelles and Biological Membranes, 2nd ed., Wiley, 203. Wolfenden, R.V., Cullis, P.M., & Southgate, C.C. (1979)
New York. Science 206, 575–577.
178. Franks, F., & Reid, D.S. (1973) in Water: A 204. Wolfenden, R. (1978) Biochemistry 17, 201–204.
Comprehensive Treatise, Water in Crystalline Hydrates: 205. Wang, Y., Purrello, R., Georgiou, S., & Spiro, T.G. (1991)
Volume 2, Aqueous Solutions of Simple Non-electrolytes J. Am. Chem. Soc. 113, 6368–6377.
(Franks, F., Ed.) pp 323–380, Plenum Press, New York. 206. Davies, M., Evans, J.C., & Jones, R.L. (1955) Trans.
179. Arnett, E.M., Kover, W.B., & Carter, J.V. (1969) J. Am. Faraday Soc. 51, 761–774.
Chem. Soc. 91, 4028–4034. 207. Nozaki, Y., & Tanford, C. (1971) J. Biol. Chem. 246,
180. Edsall, J.T., & Scatchard, G. (1943) in Proteins, Amino 2211–2217.
Acids and Peptides as Ions and Dipolar Ions (Edsall, J. 208. Traube, J. (1891) Justus Liebigs Ann. Chem. 265, 27.
T., & Cohn, E. J., Eds.) pp 177–195, Reinhold, New York. 209. Bull, H.B., & Breese, K. (1974) Arch. Biochem. Biophys.
181. Privalov, P.L., & Gill, S.J. (1988) Adv. Protein Chem. 39, 161, 665–670.
191–234. 210. Segrest, J.P., & Feldmann, R.J. (1974) J. Mol. Biol. 87,
182. Edsall, J.T. (1935) J. Am. Chem. Soc. 57, 1506–1507. 853–858.
183. Frank, H.S., & Evans, M.W. (1945) J. Chem. Phys. 13, 211. Hansch, C.H., & Leo, A. (1979) Substituent Constants for
507–532. Correlation Analysis in Chemistry and Biology, Wiley,
184. Pertsemlidis, A., Saxena, A.M., Soper, A.K., Head- New York.
Gordon, T., & Glaeser, R.M. (1996) Proc. Natl. Acad. Sci. 212. Lawson, E.Q., Sadler, A.J., Harmatz, D., Brandau, D.T.,
U.S.A. 93, 10769–10774. Micanovic, R., MacElroy, R.D., & Middaugh, C.R. (1984)
185. Hafemann, D.R., & Miller, S.L. (1969) J. Phys. Chem. 73, J. Biol. Chem. 259, 2910–2912.
1392–1397. 213. Hermann, R.B. (1972) J. Phys. Chem. 76, 2754–2759.
186. Graziano, G., & Barone, G. (1996) J. Am. Chem. Soc. 118, 214. Lee, B., & Richards, F.M. (1971) J. Mol. Biol. 55, 379–400.
1831–1835. 215. Chothia, C. (1976) J. Mol. Biol. 105, 1–14.
187. Haggis, G.H., Hasted, J.B., & Buchanan, T.J. (1952) J. 216. Janin, J. (1979) Nature 277, 491–492.
Chem. Phys. 20, 1452–1465. 217. Guy, H.R. (1985) Biophys. J. 47, 61–70.
188. Lumry, R., & Biltonen, R. (1969) in The Structure and 218. von Heijne, G., & Blomberg, C. (1979) Eur. J. Biochem.
Stability of Biological Macromolecules (Timasheff, S.N., 97, 175–181.
& Fasman, G.D., Eds.) pp 65–212, Marcel Dekker, New 219. Eisenberg, D., Weiss, R.M., Terwilliger, T.C., & Wilcox,
York. W. (1982) Faraday Symp. Chem. Soc., 109–120.
189. Southall, N.T., Dill, K.A., & Haymet, A.D.J. (2002) J. 220. Engelman, D.M., Steitz, T.A., & Goldman, A. (1986)
Phys. Chem., B 106, 521–533. Annu. Rev. Biophys. Biophys. Chem. 15, 321–353.
190. Abraham, M.H., Chadha, H.S., Whiting, G.S., &
Mitchell, R.C. (1994) J. Pharm. Sci. 83, 1085–1100.
Chapter 6
Atomic Details
Figure 6–1: cis Peptide bonds in the crystallographic molecular model (Bragg spacing ≥ 0.20 nm) of lectin IV of Griffonia simplicifolia.2 By chance, this protein happens to have three cis peptide bonds (highlighted with black bonds and atoms) near each other. Only one of these contains a proline (Proline 87). In the other two, the position normally taken by the proline in the cis bond is occupied by a glycine (Glycine 222) and an aspartate (Aspartate 89), respectively. In this figure and those that follow, the amino acids are numbered according to their positions in the amino acid sequences in the data base published by the Swiss Institute of Bioinformatics (us.expasy.org). These numbers often differ from those of the preliminary amino acid sequences used by the crystallographers to construct the crystallographic molecular models. This drawing was produced with MolScript.573
[Structural drawings of the trans and cis configurations of a peptide bond] (6–1)
Figure 6–3: Definitions of the dihedral angles φ and ψ and the steric effects of rotation. (A) Pattern in which the bonds with the dihedral angles φ and ψ are distributed along a polypeptide. (B) View down the bond between Cα and the amido nitrogen N1 that precedes it along the polypeptide. The dihedral angle φ is defined as that between the bond connecting N1 and C1 and the bond connecting Cα and C2 (Figure 6–2). Its sign is determined by the right-hand rule. Note that the direction of the arrow is irrelevant to the assignment of the sign of the angle. In the configuration shown, angle φ is +260° (–100°). This dihedral angle is in the most sterically free range for angle φ (–45° to –180°) because in this range the hydrogen on Cα can slip under the acyl oxygen O1. (C) View down the same bond as in panel B but with angle φ at +90°, produced from the conformation in panel B by rotation about only the bond on the axis of the view. When angle φ is +60°, the acyl oxygen, O1, sits between the carbon of the next peptide bond at C2 and the first carbon of the side chain, Cβ. This would be the value of angle φ in a left-handed α helix. (D) The same conformation presented in panel B is viewed along the bond between Cα and the acyl carbon C2. The eyes indicate the views interconverting panels B and D. The dihedral angle ψ is defined as that between the bond connecting Cα and N1 and the bond connecting C2 and N2 (Figure 6–2). Its sign is determined by the right-hand rule. The configuration shown (ψ = +105°) is in the most sterically free range for angle ψ (+15° to +190°) because the hydrogen on Cα can slip below H2. (E) View down the same bond as in panel D but with angle ψ at +285° (–75°). When angle ψ is +300° (–60°), H2 lies between the first carbon of the side chain, Cβ, and the nitrogen of the amino-terminal peptide bond, N1. This is near the value of angle ψ (–39°) in a right-handed α helix. (F) Steric effect between H2 and either N1 or H1 that occurs when angle ψ is near 0°.
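The definitions in Figure 6–3 can be turned into a short calculation. The following is a minimal sketch, not a method taken from the text: the function names dihedral_degrees and wrap_degrees are hypothetical, and the sign convention is assumed to be the usual one in which a clockwise rotation of the far bond, viewed down the central bond, is positive. Angle φ would be obtained from the coordinates of C1, N1, Cα, and C2, and angle ψ from those of N1, Cα, C2, and N2.

```python
import math


def _sub(a, b):
    """Vector from point b to point a."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the bond p1-p2.

    Assumed convention: positive when the far bond (p2-p3) is rotated
    clockwise from the near bond (p1-p0), looking from p1 toward p2.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal to the plane of p0, p1, p2
    n2 = _cross(b2, b3)   # normal to the plane of p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def wrap_degrees(angle):
    """Express any angle in the interval (-180, +180] degrees."""
    wrapped = (angle + 180.0) % 360.0 - 180.0
    return 180.0 if wrapped == -180.0 else wrapped


# The paired values quoted in the caption are the same angles:
print(wrap_degrees(260.0), wrap_degrees(285.0), wrap_degrees(300.0))
```

The wrapping step reproduces the equivalences stated in the caption, for example +260° and –100°.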
(–70°) and +320° (–40°), hydrogen H2 on the amido nitrogen N2 can fit between amido nitrogen N1 and the side chain with little difficulty (Figure 6–3E).

All of these steric effects can be summarized in a Ramachandran plot (Figure 6–4A).12 Using your model, you should verify the noted boundaries on the plot.

Refinements of crystallographic molecular models by use of Equation 4–15 for calculation of the function q usually do not constrain the values for dihedral angles φ and ψ. Even though they are not constrained, however, their values converge upon the allowed regions in a Ramachandran plot during the refinement. For example, although many of the values for dihedral angles φ and ψ for the various amino acids along the polypeptide in the initial molecular model of deoxyribonuclease I were scattered beyond the allowed regions in a Ramachandran plot before refinement was performed (Figure 6–5A), they clustered within the enclosures after the refinement had been completed (Figure 6–5B).8 Because this convergence was not enforced by the choice of (ds,q2 – dc,q2) in Equation 4–15, its occurrence can be used as evidence that the refined structure is closer to reality than the unrefined.

When the dihedral angles for the amino acids in eight crystallographic molecular models, all built from data sets to Bragg spacings of less than 0.12 nm, are plotted on a Ramachandran plot (Figure 6–4B),6 the points themselves define what should be the actual regions of lowest energy. It might have been the case that the three clusters of open squares in Figure 6–4B are more the result of preferences enforced by secondary structures than the steric effects first pointed out by Ramachandran (Figure 6–4A). When, however, dihedral angles are plotted for amino acids not involved in secondary structures, from a much larger collection of crystallographic molecular models (402) but from data sets gathered to minimum Bragg spacings of only 0.2 nm or less, the distribution still shows the same three clusters with the same shapes and extents.13,14 Consequently, the extent and magnitude of the actual steric effects in the backbone of a polypeptide are delineated in the distribution of the points in a plot such as that in Figure 6–4B.

With the exception of the region where dihedral angle φ is between –70° and –180° and dihedral angle ψ is between 30° and 110°, which should be sterically unhindered anyway, almost all of the amino acids found
[Figure 6–4: panels A and B, each a plot of ψ (degrees) against φ (degrees) from –180 to 180. In panel A, the boundaries are labeled with the clashing pairs of atoms (O1–O2, O1–C2, O1–Cβ, O1–H2, H1–H2, N1–H2, and Cβ–H2).]
Figure 6–4: Ramachandran plot. (A) Diagram illustrating the steric effects producing the Ramachandran plot.12 The two dimensions of the plot are the dihedral angles ψ and φ (Figure 6–3). Boundaries are drawn between allowed and forbidden regions obtained from a molecular model in which each atom is a hard sphere of the appropriate van der Waals radius. The clashing atoms are identified on the forbidden side of the boundary with the same numbering system as in Figures 6–2 and 6–3. There are only four allowed regions: the large region including the values for parallel (P) and antiparallel (A) β sheet, the region including the values for right-handed α helix (α), the region including the values for left-handed α helix (L), and a small triangle at φ = +60° and ψ = +180°. The clashes can be understood by referring to Figure 6–3. For example, if φ = –100° and ψ = +105° (Figure 6–3B,D) and angle φ is increased to –45°, O1 clashes with C2; if angle ψ is decreased to +20°, N1 clashes with H2. If φ = –60° and ψ = –60° and angle φ is decreased to –185°, O1 runs into Cβ; if angle ψ is decreased to –70°, H2 runs into Cβ. Adapted with permission from ref 12. Copyright 1977 Journal of Biological Chemistry. (B) Dihedral angles ψ and φ from eight crystallographic molecular models of high accuracy.6 The crystallographic molecular models and the minimum Bragg spacings of their data sets were cytochrome c6 (0.12 nm), cutinase (0.10 nm), lysozyme (0.0925 nm), a fragment of protein G (0.11 nm), ribonuclease Sa (0.12 nm), repressor of primer protein (0.11 nm), rubredoxin from Desulfovibrio vulgaris (0.092 nm), and rubredoxin from Clostridium pasteurianum (0.11 nm). The numbers in the points indicate the respective models. Triangles are glycines, and squares are amino acids other than glycine.
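The hard-sphere criterion behind the boundaries in Figure 6–4A can be sketched as a simple test: two nonbonded atoms clash when the distance between their centers is less than the sum of their van der Waals radii. The sketch below is only illustrative. The function name clashes is hypothetical, and the radii are Bondi values chosen as an assumption; the radii actually used in ref 12 may differ and would shift the boundaries.

```python
import math

# Van der Waals radii in nanometers (Bondi values, assumed for illustration;
# not necessarily the radii used to draw Figure 6-4A).
VDW_RADIUS_NM = {"H": 0.120, "C": 0.170, "N": 0.155, "O": 0.152}


def clashes(element_a, xyz_a, element_b, xyz_b, radii=VDW_RADIUS_NM):
    """Return True if two hard-sphere atoms overlap.

    Coordinates are in nanometers. The atoms overlap when the distance
    between their centers is less than the sum of their radii.
    """
    distance = math.dist(xyz_a, xyz_b)
    return distance < radii[element_a] + radii[element_b]


# An oxygen and a carbon 0.30 nm apart overlap with these radii
# (0.152 + 0.170 = 0.322 nm); at 0.35 nm they do not.
print(clashes("O", (0.0, 0.0, 0.0), "C", (0.30, 0.0, 0.0)))
print(clashes("O", (0.0, 0.0, 0.0), "C", (0.35, 0.0, 0.0)))
```

Scanning φ and ψ over a grid, rebuilding the coordinates at each pair of angles, and applying such a test to every pair of nonbonded atoms is one way in which a diagram of this kind could be generated.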
[Figure 6–5: panels A and B, each a plot of ψ (degrees) against φ (degrees) from –180 to 180.]
Figure 6–5: Effect of refinement on the values of φ and ψ for the amino acids in bovine deoxyribonuclease I.8 Each × in one of the diagrams represents the values for the dihedral angles φ and ψ of one of the amino acids in a crystallographic molecular model of the protein. The boundaries in the Ramachandran plot are defined by the steric effects represented diagrammatically in Figure 6–4A. Unbroken lines surround regions of no hindrance; broken lines, regions of little hindrance. (A) Unrefined, initial molecular model. (B) Refined, final molecular model. Glycines are denoted by open circles, cystines by filled squares, and all other amino acids by ×. Reprinted with permission from ref 8. Copyright 1986 Academic Press.
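A comparison such as that in Figure 6–5 can be made quantitative by tallying how many pairs of dihedral angles fall inside a chosen enclosure. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the enclosure is simply the rectangle formed by the most sterically free ranges quoted in the caption to Figure 6–3 (φ from –180° to –45° and ψ from +15° to +190°), not the curved boundaries of Figure 6–4A. The only angles used are ones quoted in the text.

```python
def in_free_range(phi, psi):
    """True if (phi, psi), in degrees, lies in the rectangle of the most
    sterically free ranges from the caption to Figure 6-3:
    phi from -180 to -45 and psi from +15 to +190 (that is, to -170).
    Modular arithmetic handles the wrap from +180 to -180.
    """
    phi_ok = (phi + 180.0) % 360.0 <= 135.0   # -180 .. -45
    psi_ok = (psi - 15.0) % 360.0 <= 175.0    # +15 .. +190
    return phi_ok and psi_ok


def fraction_in_free_range(angles):
    """Fraction of (phi, psi) pairs that fall inside the rectangle."""
    if not angles:
        return 0.0
    inside = sum(1 for phi, psi in angles if in_free_range(phi, psi))
    return inside / len(angles)


# Angles quoted in the text: the conformation of Figure 6-3B,D, the
# second example in the caption to Figure 6-4, and the mean for
# right-handed alpha helices.
quoted = [(-100.0, 105.0), (-60.0, -60.0), (-66.0, -39.0)]
print([in_free_range(phi, psi) for phi, psi in quoted])
print(fraction_in_free_range(quoted))
```

The helical values fall outside this particular rectangle because they belong to a separate enclosure; a fuller tally would test each enclosure in turn.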
outside of the three clusters of open squares in Figure 6–4B are glycines. In addition to being able to reside in cramped locations, glycine lacks a β carbon, and all of the steric collisions involving the β carbon (Figures 6–3 and 6–4A) are irrelevant. Therefore, the dihedral angles around a glycine have a larger compass. In particular, the regions of the Ramachandran plot in which dihedral angle φ lies between +70° and +180° or dihedral angle ψ lies between –70° and –180° represent areas where either O1 or H2, respectively, clash with the side chain of any amino acid other than glycine. In Figure 6–4B, the points for glycine (△) and the points for all other amino acids (□) define distinct regions on the plot. In fact, most of the glycines in crystallographic molecular models have dihedral angles φ and ψ outside of the traditional enclosures on a Ramachandran plot defined by the dihedral angles φ and ψ of the other amino acids.15,16 This fact suggests that glycine is selected for situations in which such otherwise unpermitted dihedral angles are unavoidable.

It has been noted that when an amino acid other than glycine has angles φ and ψ outside of the enclosures, that amino acid is usually involved intimately in the function of the protein.4 For example, Serine 120 is the nucleophile in the active site of cutinase, and Alanine 30 is in the center of the crucial tight turn between the two α helices in the repressor of the primer protein.6

The region of the Ramachandran plot bounded by values of dihedral angle φ between –140° and –60° and values of dihedral angle ψ between –20° and +20° was originally predicted to be disallowed because when these dihedral angles are within these boundaries, either atom N1 or atom H1 should be overlapping atom H2 (Figure 6–4A). Nevertheless, the dihedral angles of many amino acids fall within this supposedly disallowed region (Figure 6–4B). One way the overlap between either atom N1 or atom H1 and atom H2 can be prevented is to increase the bond angle N1–Cα–C2 beyond the usual 109.5° of a carbon hybridized sp³. In crystallographic molecular models this angle is observed to be wider17 than expected, with a mean of 112° and deviations up to 120°. In addition, this widening is dependent on the values for the dihedral angles φ and ψ. The angle N1–Cα–C2 is equal to 109° for β structure, the dihedral angles ψ and φ of which fall in the largest unhindered area of the plot, but is wider by 3° in α helices,18 the dihedral angles ψ and φ of which fall within the lower left cluster in Figure 6–4B immediately adjacent to the supposedly disallowed region. All of these results suggest that the existence of so many amino acids the dihedral angles φ and ψ of which fall within a region of the Ramachandran plot originally predicted to be disallowed results from a widening of this bond angle to accommodate the steric effect. The same argument would apply to the glycines the dihedral angles φ and ψ of which fall within the boundaries +60° to +110° and –20° to +20°, respectively, also previously thought to be disallowed.

In a set of eight crystallographic molecular models, all built from data sets to Bragg spacings of less than 0.12 nm,6 the amino acids within right-handed α helices (Figure 6–6)19 have dihedral angles of φ = –66° ± 13° and ψ = –39° ± 10° (Figure 6–3E), and these values fall within one of the enclosures in a Ramachandran plot (Figure
[Figure 6–7: panels A and B. Labels in the drawings include Ser(n), χ1, the positions (n – 4), (n – 3), (n – 2), (n – 1), and (n + 1), and the amino acids P25, A26, A29, R33, and V36.]

Figure 6–7: Occupation of the second hydrogen-bond acceptor … on Proline 25, Alanine 26, Alanine 29, and Arginine 33, in addition … with MolScript.573

… were a serine, its hydroxyl would take the place of one of the waters in the respective hydrogen bonds to acyl oxygens on Proline 25 or Alanine 26. Such intramolecular hydrogen bonds are quite frequently encountered in α helices. In the molecular model of myoglobin, a pro- …

Asparagine can also participate in such an intrahelical …

molecular models are kinked.23 The most common cause of an abrupt kink in an α helix is a proline. For example, Proline 183 in the middle of an α helix 30 amino acids long in citrate (Si)-synthase31 causes the α helix to bend abruptly by 40°. The mean value for the angles of the abrupt kinks produced in α helices by prolines is 26°.23 In proteins where such a kink is found naturally, the presumption is that it serves the purpose of fitting the α helix properly into the overall structure. When a proline is inserted into an otherwise straight α helix by site-directed mutation, the α helix, if it tolerates the substitution, displays a kink with a much smaller angle, and the protein becomes significantly less stable.32

There are also examples of local distortions in α helices that seem to be caused by the incompatibility of an undistorted α helix with the surrounding structure of the protein. If the α helix is too short, a gap develops in which molecules of water or donors and acceptors from side chains occupy the acceptors and donors broken by opening the gap;33,34 if the α helix is too long, one or more of its amino acids is pushed out of the structure as an aneurysm or loop.35,36

At the amino-terminal end of an α helix there are unoccupied nitrogen–hydrogen donors, and at the carboxy-terminal end, unoccupied acyl oxygen acceptors (Figure 6–6). Because each peptide bond has two acceptors for hydrogen bonding but only one donor and because the side chains of the amino acids also have an excess of acceptors over donors, a solution of protein contains more acceptors than donors of hydrogen bonds. Consequently, when a donor remains unoccupied in the native structure, there was a loss of one hydrogen bond from the solution upon folding. Therefore, it comes as no surprise that the donors at the amino-terminal end of an α helix are occupied, or capped,26,37–40 but the acyl oxygens at the carboxy-terminal end are often capped as well.
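The bookkeeping of donors and acceptors at the ends of an α helix can be sketched in a few lines. The sketch assumes an idealized α helix in which the nitrogen–hydrogen of amino acid i donates to the acyl oxygen of amino acid i – 4 and in which no amino acid is a proline; the function name helix_hydrogen_bonds and the numbering of the example are hypothetical.

```python
def helix_hydrogen_bonds(first, last):
    """Backbone hydrogen bonds within an idealized alpha helix.

    Assumes every N-H of amino acid i (first + 4 <= i <= last) donates to
    the acyl oxygen of amino acid i - 4, and that no position is a proline.
    Returns (bonds, free_donors, free_acceptors), where bonds is a list of
    (donor position, acceptor position) pairs.
    """
    positions = range(first, last + 1)
    bonds = [(i, i - 4) for i in range(first + 4, last + 1)]
    donors_used = {donor for donor, _ in bonds}
    acceptors_used = {acceptor for _, acceptor in bonds}
    free_donors = [i for i in positions if i not in donors_used]
    free_acceptors = [i for i in positions if i not in acceptors_used]
    return bonds, free_donors, free_acceptors


# A hypothetical helix of 12 amino acids numbered 1 through 12.
bonds, free_donors, free_acceptors = helix_hydrogen_bonds(1, 12)
print(len(bonds))        # hydrogen bonds within the helix
print(free_donors)       # N-H donors left open at the amino-terminal end
print(free_acceptors)    # acyl oxygens left open at the carboxy-terminal end
```

In this idealized count the open donors are the ones that a capping side chain or water must occupy if no hydrogen bond is to be lost upon folding.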
About half of the time, the side chain of an amino acid such as asparagine, serine, threonine, or aspartate in the position immediately before the beginning of the α helix, the N-cap position, provides one or more of the necessary acceptors to occupy the open donors at the amino-terminal end (Figure 6–58). When such an amino acid is replaced by site-directed mutation with one that is of the same size or smaller but that cannot provide an acceptor, the resulting protein is less stable41,42 because a hydrogen bond is lost to the solution upon its folding that is not lost when the wild-type protein folds.42 Proline often (10%) occurs at the first position in an α helix37 because it does not have a nitrogen–hydrogen donor that requires capping. About a third of the time, the amino acid immediately after the end of an α helix is a glycine, which can readily (Figure 6–4B) adopt the necessary dihedral angles φ and ψ (+70° and +20°, respectively) that permit the amido nitrogen–hydrogen of the next amino acid to occupy the open acceptor of the first unoccupied acyl oxygen (that on the amino acid 98 in Figure 6–6) at the carboxy-terminal end of the helix.43

The dipole moment of an isolated peptide bond is estimated to be 3.5 D,44 and the peptide bonds in an α helix are held with their dipoles almost parallel to the axis (Figure 6–6) so that the positive poles point to the amino-terminal end of the α helix and the negative poles to the carboxy-terminal end. Such an arrangement of dipoles creates an electrostatic field of the respective polarity, the magnitude of which is 1 V at 0.3 nm and 0.5 V at 0.5 nm from each end of the α helix, if the α helix is greater than 10 amino acids long and located in a medium of relative permittivity equal to 2.44 These voltages would produce electrostatic potentials equal to about 100 and 50 kJ mol⁻¹, respectively, for a univalent ion. Although these electrostatic potentials are less than twice those that would be felt at the same distances from two adjacent, isolated peptide bonds, it is thought that the amplification produced by aligning the peptide bonds in an α helix is significant.

Experimental observations equivocally consistent with this idea have been presented. For example, the … anionic conjugate base,16 but it is usually argued that the acidity of Glutamate 35 must be weakened rather than strengthened so that it will be protonated when substrate binds to the enzyme.

It is unfortunate that the original calculations of the magnitude of the electric field generated by an α helix assumed that it existed in a uniform dielectric with a relative permittivity of 2 and no account of the relative permittivity of the medium surrounding it was taken. For example, electrostatic potentials of only 2.5 kJ mol⁻¹ have been observed for univalent elementary charges positioned at the amino-terminal ends of α helices in water (ε = 78 at 25 °C),47 but even these small potentials disappear when the ionic strength of the solution is increased. Later calculations48 have incorporated the contribution of the dielectric surrounding the α helix, and it was found that if the protein was approximated by a solid sphere of relative permittivity 3.5 in a solvent of relative permittivity 80 (water), even when the α helix was completely within the sphere of low relative permittivity, the electric field around the α helix was dramatically less than the electric field in a uniform dielectric with a relative permittivity of 3.5. Furthermore, if the ends of the α helix were at the surface of the sphere, in contact with the solvent, the electric field decreased even further to negligible levels. This effect of the dielectric may explain why the apparent electrostatic potentials exerted on aspartates positioned by site-directed mutation at the amino-terminal ends of the two α helices on the surface of T4 lysozyme were only about –2 kJ mol⁻¹.49 If the relative permittivity of the interior of a protein is greater than 3.5, the magnitude of the electric field would decrease accordingly in inverse proportion. Finally, the solution around a molecule of protein always contains electrolytes that would further diminish the electric field.47 For all of these reasons, electrostatic free energies of significant magnitude are probably not exerted by an α helix within a protein, although the possibility is often discussed.

When an α helix traverses the surface of a protein as a continuous rod, its face directed toward the protein is hydrophobic and its face directed toward the solution is
upfield shift (+0.4 ppm) in the absorption in a nuclear hydrophilic. The a helix in Figure 6–7B has such an ori-
magnetic resonance spectrum for the proton in a hydro- entation with the surface formed by Leucine 28,
gen bond between an amide and an acyl oxygen in an Tryptophan 30, Phenylalanine 31, and Threonine 35
a helix relative to one in random meander has been facing the protein and the opposite surface facing the
attributed to the a-helical dipole,45 but an unexplained solvent, as indicated by the locations for molecules of
downfield shift of the same magnitude is found in a water. This asymmetry of hydropathy is sometimes
b sheet. The location of a sulfate ion at the intersection of reflected in the amino acid sequence of the protein and
the amino-terminal ends of three a helices in sulfate can be identified by constructing a helical wheel.50,51
binding protein suggests that the positive ends of the Around a circle, successive amino acids in the sequence
dipoles of these a helices stabilize the anion,46 but the are placed at 100 ∞ intervals (Figure 6–8). This represents
amino-terminal ends of these helices could simply be the view down an a helix (Figure 6–46), much as a
providing the properly oriented amido donors that Newman projection represents the view down a
occupy several of the many s lone pairs of electrons on carbon–carbon bond. Any asymmetry in the distribution
the sulfate (2–28). The location of Glutamate 35 of of hydropathy is easily observed. If a segment of amino
lysozyme adjacent to the amino-terminal end of an acid sequence in a polypeptide, when placed upon a hel-
a helix suggests that this arrangement would stabilize its ical wheel, reveals such an asymmetric pattern of
Secondary Structure of the Polypeptide Backbone 259
Figure 6–8: Two segments of amino acid sequence displayed on helical wheels. (A) Sequence from Lysine 60 to Proline 77 (KKVADALTNAVAHVDDMP) in the α polypeptide of human hemoglobin. In the crystallographic molecular model of hemoglobin, this sequence is an α helix running across the surface of the protein. In the diagram, the amino terminus is the lysine at the 10:30 position and the sequence is read at 100° intervals. (B) Amphipathic helical sequence from Proline 455 to Lysine 472 (PDVKSAIEGVKYIAEHMK) from the α polypeptide of acetylcholine receptor. The lines in both panels divide the hydrophilic and hydrophobic surfaces of these two amphipathic α helices.
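The construction of a helical wheel, with successive amino acids placed at 100° intervals, can be sketched in a few lines of Python. The sketch is only an illustration under stated assumptions: the function names (wheel_angles, hydrophobic_moment) are hypothetical, the first residue is placed arbitrarily at 0° rather than at the 10:30 position used in Figure 6–8, the hydropathy values are those of the Kyte–Doolittle index, and summarizing the asymmetry as the length of a vector sum (a hydrophobic moment) is one common choice that the text itself does not prescribe.

```python
import math

# Kyte-Doolittle hydropathy index (assumed scale; any hydropathy scale could be substituted).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def wheel_angles(sequence, step_deg=100.0):
    """Return the angular position (degrees, 0 to 360) of each residue on the wheel.

    The first residue is placed at 0 degrees (an arbitrary choice) and each
    successive residue is advanced by step_deg, 100 degrees for an alpha helix.
    """
    return [(i * step_deg) % 360.0 for i in range(len(sequence))]


def hydrophobic_moment(sequence, step_deg=100.0, scale=KD):
    """Length of the vector sum of the hydropathies placed at their wheel angles.

    A large value indicates that hydrophobic and hydrophilic side chains are
    segregated onto opposite faces of the wheel.
    """
    x = y = 0.0
    for residue, angle in zip(sequence, wheel_angles(sequence, step_deg)):
        h = scale[residue]
        x += h * math.cos(math.radians(angle))
        y += h * math.sin(math.radians(angle))
    return math.hypot(x, y)


if __name__ == "__main__":
    # The two segments displayed in Figure 6-8.
    segments = {
        "hemoglobin alpha 60-77": "KKVADALTNAVAHVDDMP",
        "acetylcholine receptor alpha 455-472": "PDVKSAIEGVKYIAEHMK",
    }
    for name, seq in segments.items():
        positions = ", ".join(
            f"{res}{angle:.0f}" for res, angle in zip(seq, wheel_angles(seq))
        )
        print(name)
        print("  wheel positions (residue, degrees):", positions)
        print("  hydrophobic moment at 100 degrees:", round(hydrophobic_moment(seq), 2))
```

Because 18 steps of 100° amount to exactly five turns, the nineteenth residue of a segment would fall back on the position of the first, which is why segments of about 18 amino acids fill a wheel without overlap.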
hydropathy, as do those in Figure 6–8, this pattern is evidence that that segment is an α helix in the folded polypeptide. Such α helices are referred to as amphipathic α helices. An amphipathic α helix is an α helix that is enriched in hydrophobic side chains on one of its sides and enriched in hydrophilic side chains on the other.
A single-stranded amphipathic α-helical peptide in a mixed solvent of trifluoroethanol and water, unattached to a protein, displays an intrinsic curvature with the hydrophobic amino acids on the concave face and the hydrophilic on the convex.⁵² In an α helix running across the surface of a protein, the same orientation of curvature is often observed.²³ Whether this curvature is due to the fact that such an α helix is amphipathic,⁵² or to the fact that the acyl oxygens of its peptide bonds on the face exposed to the solvent form hydrogen bonds with water,²³ or to the fact that such curvature simply allows it to adhere more closely to the underlying structure, or to more than one of these reasons is unclear.
It has been proposed that certain amino acids or short sequences of amino acids may impose upon a folding polypeptide biases toward the formation of particular secondary structures at locations where they reside in the native structure of a protein. One hears terms such as “helix-forming” or “helix-breaking” amino acids.⁵³ Originally, these distinctions were based on the observed preferences of homopolymers of the various amino acids to assume α helices or sheets of β structure or to remain structureless at various temperatures, ionic strengths, concentrations of cosolvents, and values of pH.⁵⁴ The propensities of the various amino acids to favor an α-helical conformation have also been examined, either by placing each of them in turn in the center of an α helix in a native protein by site-directed mutation⁵⁵⁻⁵⁷ or by incorporating them in turn into a position in the center of a peptide that assumes an α-helical conformation in water⁵⁸⁻⁶⁰ and then measuring the changes in stability to that protein or to that unsupported α helix that result.
Although there are significant differences in the various scales that result from these measurements, all agree that, of the 20 amino acids, alanine has the greatest propensity to stabilize an α helix and glycine the least and that the differences in free energy of stabilization between these extremes are about 4 kJ mol⁻¹. These preferences presumably explain how antifreeze peptide 3 from the winter flounder, in which 23 of the 37 amino acids are alanines, can naturally assume the conformation of a long, unbroken, unsupported α helix,⁶¹ but in an illustration of the unpredictability of the structure of proteins, all of the alanines in the alanine-rich regions of spider dragline silk are found in β sheets.⁶²
The propensities of the 17 primary amino acids other than alanine, glycine, and proline to stabilize or destabilize an α helix are much less obvious. If the values for their helical propensities in eight different scales⁶³ are averaged, the difference between the mean values for any two of them is rarely as large as the standard deviation of the value for either. Consequently, with the possible exception of methionine and leucine (both at –2.8 ± 0.5 kJ mol⁻¹), assigning the rest a value halfway between that of glycine (arbitrarily set at 0 kJ mol⁻¹) and that of alanine (–3.6 kJ mol⁻¹) would be as statistically significant as assigning them each an individual value.
There are several other types of helical structures that occur rarely in crystallographic molecular models of native proteins. The polyproline helix, of which there is an example 13 aa long in benzoylformate decarboxylase,⁶⁴ has dihedral angles φ and ψ of –75° and 145°, respectively,¹¹,⁶⁵ which places it within the largest allowed region of the Ramachandran plot (Figure 6–4B). It is a much more extended structure than an α helix,
260 Atomic Details
having a rise of 0.31 nm aa⁻¹ and 3.0 aa turn⁻¹, and it is a left-handed helix rather than a right-handed one. As their name suggests, polyproline helices in crystallographic molecular models of proteins usually contain a high frequency (25–70%) of proline.⁶⁶ Even though they contain no internal hydrogen bonds and usually occur in situations where they are supported by surrounding structures, there are sequences of amino acids in naturally occurring proteins that form unsupported polyproline helices.⁶⁷ The π helix, of which there is an example 13 aa long in arachidonate 15-lipoxygenase,⁶⁸ is a wider, squatter version of the α helix in which the hydrogen bonds are between the acyl oxygen of amino acid i and the nitrogen–hydrogen bond of amino acid i + 5 rather than amino acid i + 4.
The values for the dihedral angles φ and ψ found in ideal β structure (φ = –130°, ψ = +120°) lie within one of the two largest allowed regions of the Ramachandran plot (Î and R in Figure 6–4A). These dihedral angles place the hydrogen on the α carbon under the preceding acyl oxygen, O1 (Figure 6–3B), and under the next amido hydrogen, H2 (Figure 6–3D), respectively. This is the least hindered of all the conformations, and β structure experiences no serious steric problems around its α carbons. Because β structure is usually found in the most deeply buried regions of a protein, its polypeptide backbone usually displays the least thermal motion⁶⁹ even though its dihedral angles φ and ψ are the least sterically constrained. Nevertheless, it is obvious from an examination of the polypeptide backbones of the proteins presented in Chapter 4 that, because of the size of this region, β structure is far more pliant and unpredictable than an α helix, and efforts to define regular patterns have been less informative than time spent looking at different crystallographic molecular models. The original β-pleated sheets (Figure 4–16B,C) have turned out to be highly idealized. There are, however, several notable structural features of β structure.
When a number of β strands do form a sheet, the sheet usually has a negative, left-handed twist to its surface (Figure 6–9).⁷⁰ This is supposed to arise from the fact that the enclosure on the Ramachandran plot in which the dihedral angles φ and ψ for parallel and antiparallel β sheets reside has more open area for smaller values of dihedral angle φ and larger values of dihedral angle ψ beyond the values of these two dihedral angles that would give a flat sheet. Deviations tend to be biased toward these smaller values of dihedral angle φ and larger values of dihedral angle ψ, and this bias creates the twist in the sheet.⁷¹ It may simply be the case, however, that twisted β sheets have surfaces against which other segments of secondary structure, such as α helices, can be more efficiently packed and that packing efficiency dictates the hand and magnitude of the twist because β sheets almost as flat and regular as the idealized version (Figure 4–16B,C) have been observed.⁷²
Another feature of β structure is the β bulge.⁷³ In this arrangement one of the amino acids is skipped in the regular pattern of hydrogen bonding between two antiparallel strands. The hydrogen bond that would have incorporated the nitrogen–hydrogen bond of the skipped amide incorporates the nitrogen–hydrogen bond of the next amide instead. This causes the β structure to bulge at the location of the skipped amino acid (Figure 6–10),⁷³ and the bulge is located where the strands change direction. This change in direction can take two forms. If the β structure remains as a sheet in roughly the same plane, the β bulge puts a bend in the structure. A β bulge, however, also can occur at a location where a large sheet of β structure folds over upon itself to form a sandwich of two opposed β sheets.
As with α helices that contain gaps where a turn is pulled apart, a β sheet can contain a gap between two strands. In such a gap, the donors and acceptors that have been pulled apart from each other are occupied by acceptors and donors on the side chains of their amino acids or by ordered molecules of water filling the gap.⁷⁴
Most β structure is buried in the middle of a protein, but even in a small protein such as fatty-acid-binding protein from Escherichia coli,⁷⁵ that is only a sandwich of two β sheets, there is only a very weak amphipathic pattern of alternating hydrophobic and hydrophilic amino acids along the β strands. Consequently, β structure cannot be identified in an amino acid sequence.
There are three cylindrical arrays formed from β structure: a β barrel (Figure 6–11)⁷⁶ of 4–12 strands,⁷⁷ a β helix (Figure 6–12),⁷⁸,⁷⁹ and a β propeller (Figure 6–13)⁸⁰ with 6–8 blades.⁸¹,⁸² In a β barrel, the hydrogen bonds between the β strands are perpendicular to the axis of the cylinder and perpendicular to its radius; in a β helix, they are parallel to the axis of the cylinder and perpendicular to its radius; and in a β propeller, they are parallel to the radius of the cylinder and perpendicular to its axis. Therefore, each of the three orthogonal axes of a cylinder is represented.
The most common type of β barrel has eight β strands (Figure 6–11). Usually β barrels are of eight strands or fewer so that the core can be tightly packed with side chains, but there is a β barrel of 11 strands through the core of which runs an α helix.⁸³,⁸⁴ The β strands in a β barrel reside in a surface that can be approximated quite closely by a twisted hyperboloid.⁸⁵ A hyperboloid is an ellipsoidal cylinder that is narrowest at its center and gradually and continuously widens away from its center in both directions (notice the flare to the hyperboloid in Figure 6–11). In a β barrel of eight strands, the strands are tilted⁷⁷,⁸⁶ with respect to the axis of the hyperboloid by a mean angle of –34° to –47° (the mean angle of tilt in Figure 6–11 is –34°), but in β barrels of less than eight strands, the angle of tilt gradually increases to –43° to –59° when there are only five.⁷⁷ As in a normal β sheet (Figure 6–9), the sheet that forms the hyperboloid has a negative twist and the mean angles of twist between adjacent strands are between –21° and –30°
in barrels of eight strands (the mean angle of twist in Figure 6–11 is –25°), but this angle increases to between –28° and –44° in β barrels of five strands. β Barrels can be constructed from parallel β strands of polypeptide of identical sequence,⁸⁷,⁸⁸ from an antiparallel β sheet wrapped into a cylinder,⁸³,⁸⁴,⁸⁹ or from two identical sheets of parallel β strands arranged antiparallel to each other,⁹⁰ but the most common arrangement is parallel β strands of nonidentical sequence. In such parallel β barrels the strands are often distributed around the barrel in the order in which they occur in the sequence of the polypeptide, and the carboxy-terminal end of one strand is connected to the amino-terminal end of the next by an α helix. Such β barrels are designated (αβ)ₙ, where n is the number of β strands.
Figure 6–9 (beginning of caption not recovered): … molecular model (PDB filename 20HX; Bragg spacing ≥ 0.18 nm) of alcohol dehydrogenase.⁷⁰ The 12-stranded β sheet is composed of six parallel strands from each of the subunits of the dimer joined in an antiparallel orientation. The two identical series of numbers are those for the respective amino acid sequences of the two identical polypeptides comprising the protein. This drawing was produced with MolScript.⁵⁷³
The β helix displayed in Figure 6–12 is one in which there are three β sheets running up the tube at roughly 60° angles to each other. This configuration seems to be the most common type, but there are β helices in which only two β sheets run up the tube on opposite sides and the two sheets are flattened against each other.⁹¹,⁹² Extrusions of random meander (amino acids 167–175 in Figure 6–12) are common features of β helices. There is also an example of a hybrid structure in which each of the β strands in one of the three β sheets in a β helix is replaced by an α helix.⁹³
A third regular structure, in addition to α helices and β structure, universally encountered in the crystallographic molecular models of proteins is the β turn. A β turn is any structure that has a hydrogen bond between the acyl oxygen of the first amino acid in the turn and the amido nitrogen–hydrogen of the fourth amino acid in the turn (Figure 6–14).⁹⁴ Usually such a hydrogen bond
… type II is represented by the β turn in Figure 4–15D with its second acyl oxygen out of the page. Each of these two fundamental types could also be built with molecular models in such a way that each of their four dihedral angles, φ₂, ψ₂, φ₃, and ψ₃, had the opposite sign, respec- …
Table 6–1: Frequency and Dihedral Angles of the Most Common Types of β Turns
ᵃ These are the frequencies in which these types of β turn occur in 59 crystallographic molecular models built from data sets gathered to Bragg spacing of <0.2 nm.⁹⁶
ᵇ Mean and standard deviations of the dihedral angles for amino acids i + 1 and i + 2 in the β turns from the crystallographic molecular models of lysozyme,²² α-lytic protease,⁹⁷ deoxyribonuclease I,⁸ and penicillopepsin.⁹⁸ Values from crystallographic molecular models built from data sets gathered to even narrower Bragg spacing⁹⁹,¹⁰⁰ fall within these ranges.
ᶜ Does not include segments judged to be 3₁₀ helix. If these had been included, the frequency of β turns of type I would rise to 50%.
… in crystallographic molecular models of proteins assigned as 3₁₀ helix²³ are four or less amino acids in length, so most instances of 3₁₀ helix could as easily be assigned as β turns of type I. Usually, however, they are not classified as such, and if not they are assigned as either 3₁₀ helix or β turns of type III, depending on the preferences of the crystallographer. For every segment of amino acids assigned as 3₁₀ helix or β turn of type III instead of β turn of type I, there are about 4.5 β turns of type I²³,⁹⁶ so the confusion is not a major one.
In crystallographic molecular models, β turns are designated both by the existence of a hydrogen bond between the acyl oxygen on the first amino acid and the amido nitrogen–hydrogen on the fourth amino acid and by the proximity of the α carbons of the first and fourth amino acids. In general, these two α carbons are 0.5–0.6 nm apart.¹⁰⁵ Those configurations designated by these rules as β turns can be grouped into the categories proposed by Venkatachalam (Table 6–1) as well as several other minor categories.⁹⁶ It was only after refined crystallographic molecular models became available that the clear tendency of these structures to fall into specific categories became apparent, because in unrefined structures the orientation of the polypeptide backbone could not be defined with sufficient accuracy.
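The second of the two criteria, the proximity of the α carbons of the first and fourth amino acids, lends itself to a short sketch. The Python below is a hypothetical illustration, not the procedure used in the cited surveys: the function name beta_turn_candidates and its parameters are assumptions, the coordinates are taken to be in nanometers, and the hydrogen bond between the first acyl oxygen and the fourth amido nitrogen–hydrogen, which the text also requires, is not checked here.

```python
import math


def beta_turn_candidates(ca_coords_nm, d_min=0.5, d_max=0.6):
    """Flag positions i where the alpha carbons of residues i and i + 3 are close.

    ca_coords_nm: sequence of (x, y, z) alpha-carbon coordinates in nanometers,
                  one per residue, in sequence order.
    d_min, d_max: distance window in nanometers; the defaults are the
                  0.5-0.6 nm range quoted in the text.

    Returns a list of (index_of_first_residue, distance_nm) tuples. These are
    only candidates, because the hydrogen-bond criterion is not tested.
    """
    hits = []
    for i in range(len(ca_coords_nm) - 3):
        distance = math.dist(ca_coords_nm[i], ca_coords_nm[i + 3])
        if d_min <= distance <= d_max:
            hits.append((i, distance))
    return hits


if __name__ == "__main__":
    # Made-up coordinates: four alpha carbons forming a tight reversal.
    turn = [(0.0, 0.0, 0.0), (0.1, 0.35, 0.0), (0.45, 0.35, 0.0), (0.55, 0.0, 0.0)]
    # Made-up coordinates: six alpha carbons in a straight, extended line.
    extended = [(0.38 * k, 0.0, 0.0) for k in range(6)]
    print(beta_turn_candidates(turn))
    print(beta_turn_candidates(extended))
```

Keeping the distance window as parameters makes it easy to widen the range when working with less refined models, in which, as noted above, the backbone is defined less accurately.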
β Turns of type I are the most common (Table 6–1). The dihedral angles at both of the α carbons in β turns of type I fall in the enclosure on the Ramachandran plot between dihedral angles of φ = –50° and –130° and ψ = 20° and –30° (Figure 6–4B). This is the region in which the two successive amides are squeezed against each other (Figure 6–3F). Presumably the return on the investment of energy necessary to squeeze them against each other and widen the tetrahedral bond at the α carbon is the efficient reversal of the direction of the polypeptide. It is probably the case that β turns of type I and segments of 3₁₀ helix²³ account for most of the amino acids that fall in this well-populated region of the Ramachandran plot.
The values for the dihedral angles φ₂ and ψ₂ for β turns of type II fall in the largest enclosure on the Ramachandran plot, but those for the dihedral angles φ₃ and ψ₃ fall in a region that can be occupied only by an amino acid without a β carbon (Table 6–1, Figure 6–4A), so only glycine should occupy the third position in a β turn of type II. Although this is usually the case (74%),⁹⁶ there are exceptions, about half of which are asparagines such as Asparagine 69 in α-lytic endopeptidase.⁹⁷
The mirror image conformations, in which the polypeptide backbone mirrors the respective basic β turn but the amino acids remain, of necessity, L-amino acids, are rare. The third amino acid in a β turn of type IA and the second amino acid in a β turn of type IIA should be a glycine, but again a few exceptions have been observed, such as Cysteine 170 in deoxyribonuclease I.⁸ It has been noted¹⁰⁶,¹⁰⁷ that when an antiparallel β hairpin reverses itself in the tightest possible β turn, where the hydrogen bond of the β turn is also the last hydrogen bond between the tines of the hairpin, the β turn is usually type IA or type IIA.
Several minor classes of β turn have been defined. β Turns of types VIA and VIB with dihedral angles φ and ψ of (–60° ± 30°, 120° ± 30°, –90° ± 30°, 0° ± 30°) and
… much less tightly clustered than those for types I and II.¹⁰⁸
Figure 6–13: β Propeller within the crystallographic molecular model (Bragg spacing ≥ 0.19 nm) of methanol dehydrogenase from Methylophilus methylotrophus W3A1.⁸⁰,⁵⁶⁹ This β propeller is composed of eight blades, each of which is an antiparallel β sheet. The left-handed twists of the β sheets forming each blade have been incorporated into the propeller. Each blade is composed of four antiparallel β strands that occur consecutively in the sequence; in each, the amino-terminal strand is toward the center and the carboxy-terminal strand is toward the outer edge. The order in which the blades occur around the propeller is also the same order in which they occur in the amino acid sequence with the exception of the outermost β strand of the last blade, which is the amino-terminal β strand of the polypeptide. With the exception of the sixth and eighth blades, each blade is presented in its entirety, its polypeptide unbroken. The missing segments of polypeptide connecting the blades and forming connections within blades six and eight are mostly long stretches of random meander. This drawing was produced with MolScript.⁵⁷³
Aside from the requirement that glycine occupy certain positions of a β turn for steric reasons, there are some clear preferences⁹⁶,¹⁰³,¹⁰⁹ for other amino acids. Because β turns are almost always at the surface of a protein, they contain hydrophilic amino acids more frequently than hydrophobic amino acids. About 25% of all β turns have proline at their second position (Figure 6–55). About 30% of all β turns of type I have either aspartate, asparagine, or cysteine at their third position. Each of these three amino acids has a hydrogen-bond acceptor that is properly situated to accept a hydrogen bond from the amido nitrogen–hydrogen of the amino acid in the next position just beyond the β turn.¹¹⁰
A γ turn is another type of turn that occurs rarely in crystallographic molecular models of proteins.¹¹¹,¹¹² A γ turn has a hydrogen bond between the nitrogen–hydrogen of the amide of the first amino acid in the turn and the acyl oxygen of the third amino acid in the turn, causing the dihedral angles φ and ψ of the central amino acid of the three to be around 80° and –60°, respectively, which is presumably why such structures are so rare (Figure 6–4B).
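Values of φ and ψ such as those quoted for the various turns are computed from the coordinates of four successive backbone atoms. The following is a minimal Python sketch of that calculation, assuming the usual convention in which φ is the dihedral angle C(i–1)–N(i)–Cα(i)–C(i) and ψ is N(i)–Cα(i)–C(i)–N(i+1), with a positive sign for a clockwise rotation of the far bond when viewed down the central bond. The function name dihedral_deg and the helper functions are hypothetical, and the coordinates in the usage lines are made up for illustration.

```python
import math


def _sub(p, q):
    """Vector p - q."""
    return tuple(a - b for a, b in zip(p, q))


def _dot(p, q):
    """Scalar product of two 3-vectors."""
    return sum(a * b for a, b in zip(p, q))


def _cross(p, q):
    """Vector product of two 3-vectors."""
    return (
        p[1] * q[2] - p[2] * q[1],
        p[2] * q[0] - p[0] * q[2],
        p[0] * q[1] - p[1] * q[0],
    )


def dihedral_deg(a, b, c, d):
    """Dihedral angle (degrees, -180 to +180) defined by four points a-b-c-d.

    The angle is that between the plane containing a, b, c and the plane
    containing b, c, d, measured about the central bond b-c.
    """
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1 = _cross(b1, b2)  # normal to the plane a-b-c
    n2 = _cross(b2, b3)  # normal to the plane b-c-d
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up points: the far atom is rotated a quarter turn about the central bond.
    print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
    # Made-up points: a fully extended (trans) arrangement.
    print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

Using atan2 rather than the arccosine of the angle between the two normals keeps the sign of the angle, which is what distinguishes, for example, a φ of +80° from one of –80°.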
gen–hydrogen of Glycine 20 in
Figure 6–15: Bond angles for the hydrogen bonds in regular structures formed by a folded polypeptide. (A) Bond angles at the amido nitrogen–hydrogen. Two angles are defined, the angle γ′ within the plane of the amide and the angle β′ out of the plane of the amide.²² When γ′ is 0°, the acyl oxygen is in the plane that is normal to the plane of the amide and that contains the nitrogen–hydrogen bond. When β′ is 0°, the acyl oxygen is in the plane of the amide. (B) Distribution of these angles. The plot is for all of these bond angles for the hydrogen bonds between the peptide bonds in the crystallographic molecular model of α-lytic endopeptidase.⁹⁷ Symbols are (3) β structure, (×) α helix, and (*) random meander. Reprinted with permission from ref 97. Copyright 1985 Academic Press. (C) Bond angles at the carbon–oxygen bond of the amide. Two angles are defined, the angle γ within the plane of the amide and the angle β out of the plane of the amide. Angles are defined relative to the axis of the carbon–oxygen bond and the plane of the amide. (D–G) Distribution of these angles. These angles at each hydrogen bond involving the peptide bonds in the crystallographic molecular models of 15 proteins¹¹³ are plotted for hydrogen bonds in α helices (D), β turns (E), parallel β structure (F), and antiparallel β structure (G). Each mark is for the angles β and γ of one of the hydrogen bonds included in the set. Reprinted with permission from ref 113. Copyright 1984 Pergamon Press.
Stereochemistry of the Side Chains 267
Figure 6–17: Histograms and scatter plots of the distributions of the values for the dihedral angles χ₁ and χ₂ for the first two carbon–carbon bonds of the side chains of amino acids in the crystallographic molecular models of five proteins: penicillopepsin, streptogrisin A from Streptomyces griseus, streptogrisin B from S. griseus, the third domain of the ovomucoid inhibitor, and α-lytic endopeptidase from Lysobacter enzymogenes.⁹⁸ The abbreviation of each side chain appears in the upper left-hand corner of the panel. Serine, threonine, and valine had no observable dihedral angles χ₂, so in these instances frequency is plotted as a function of the only value of the dihedral angle χ₁. Leucine, isoleucine, asparagine, and glutamine had observable dihedral angles χ₁ and χ₂, and in these cases each mark (Y for leucine, + for isoleucine, Y for asparagine, and + for glutamine) represents the value of these two angles for one of these side chains in these molecular models. Because of symmetry, the values for χ₂ for the aromatic amino acids tyrosine, phenylalanine, and tryptophan (listed together as aromatics) fall only between 0° and 180°. Reprinted with permission from ref 98. Copyright 1983 Academic Press.
… assess the steric bulk of the three substituents on the α carbon because in each of the three staggered conformations (two of which are displayed in Figure 6–18), one of these three substituents must reside between the two methyl groups, a most hindered location. Because the smallest functional group should occupy this position most frequently, the distribution of the dihedral angles χ₁ of the valines in molecular models (Figure 6–17) states that the hydrogen on the α carbon is smaller (73% of χ₁ are within 30° of 175°)* than the nitrogen of the preceding amide (20% are within 30° of –64°), which is smaller than the acyl carbon of the following amide (6% are within 30° of 63°). This behavior is completely consistent with the assessment of steric bulk based on preferences of various substituents on cyclohexane for equatorial over axial locations. The increase in free energy¹²⁵ for placing an acetoxy group in an axial location rather than an equatorial location is 2.9 kJ mol⁻¹, but the increase in free energy for placing a methoxycarbonyl group in an axial location rather than an equatorial location is 5.4 kJ mol⁻¹.
* Values of dihedral angles χ₁ designated as within 30° of the angle at the maximum of the distribution are from the tabulation derived from an analysis of 240 crystallographic molecular models built from data sets all to Bragg spacing less than or equal to 0.17 nm.¹²²
Isoleucine (Figure 6–18) reinforces these preferences by showing a similar distribution¹²⁶ of analogous stereochemical conformations (76% within 30° of –64°, 14% within 30° of 61°, and 10% within 30° of –173°, respectively). It has been suggested that isoleucine (+ in Figure 6–17) has different preferences from leucine (Y in Figure 6–17) for the dihedral angles χ₁ and χ₂ because these two geometric isomers should be able to satisfy in turn different steric requirements.¹²⁷
Threonine is isosteric with valine, but the designation of the dihedral angle χ₁ of threonine is 240° out of phase with that of the dihedral angle χ₁ of valine because of the precedence of the (S)-oxygen over the (R)-methyl group (Figure 6–18). The conformation of threonine (χ₁ within 30° of 59°, 49% of all threonines) in which the two substituents on Cβ surround the nitrogen of the preceding amide (Figure 6–18) is about 4 times more frequent than the analogous conformation of valine (χ₁ within 30° of –64°, 20% of all valines) relative to the respective conformations (43% and 73%) in which hydrogen is surrounded. The most likely explanation for this difference is the fact that a hydroxyl group is significantly smaller than a methyl group. Another possibility, however, is that
… are preferred (71% of these side chains).¹²¹ This orientation (Figure 6–21) places the polypeptide most distant from the two ortho substituents of the rings and also avoids eclipse. A significant fraction (29%) of these side chains, however, have values for χ₂ outside of these ranges. The aromatic rings are large, bulky substituents and each of these outliers is pushed out of the ideal range by unavoidable steric clashes with atoms from the backbone or other side chains.¹²¹ In spite of their bulk, however, the symmetric rings of tyrosine and phenylalanine have been observed to flip over slowly and continuously.¹²⁹
It has also been observed in refined maps of neutron scattering density from crystallography by neutron diffraction that the hydrogens on all methyl groups are staggered.¹³⁰ Although this seems to be the expected result because methyl groups in proteins should be free to rotate and assume freely a staggered conformation, there are indications that packing in the interior of a protein is so tight that even methyl groups are confined.¹³¹
The dihedral angles χ₂ for the hydroxyl groups of serines and threonines (6–6), although they define the position only of a hydrogen, can also be observed by neutron diffraction.¹³² There is a strong tendency for the hydroxyl to be staggered (χ₂ near 60°, 180°, and –60°) with the trans conformation (χ₂ near 180°) slightly preferred over either gauche conformation. The location of an acceptor forming a hydrogen bond with the proton often seems to dictate the dihedral angle assumed by the hydroxyl. Because of conjugation, the oxygen–hydrogen bond of the hydroxyl of tyrosine is within the plane of the ring (2–23).
The distribution of the values of the dihedral angle χ₂ for asparagine (6–7) has two maxima at –21° and +32° (82% within 30° of these two maxima) that define two respective …
… the value of the dihedral angle χ₁ for that asparagine, the type of secondary structure to which it belongs, and the type of hydrogen bond formed between it and the backbone. Asparagine 34 in Figure 6–7B serves as an example of one of these choices. Aspartate shows the same preference for values of the dihedral angle χ₂. The majority (82%) of the dihedral angles χ₂ for aspartates are within 60° of 0°.¹²²
Only glutamine, glutamate, methionine, lysine, and arginine have carbon–carbon bonds with dihedral angles χ₃. Both glutamine and glutamate show the same preferences for dihedral angles χ₃ near 0° that are shown by asparagine and aspartate for the analogous dihedral angle χ₂.¹²² The trans conformation with dihedral angle χ₃ within 30° of 180° is the preferred (66%) conformation for lysine, as expected.¹²²
Methionines are usually buried and confined to one or two overall conformations in crystallographic molecular models, so the dihedral angles χ₁ (Cα–Cβ), χ₂ (Cβ–Cγ), and χ₃ (Cγ–S) are usually fixed. The normal preferences for dihedral angles χ₁ (59% within 30° of –67°) and χ₂ (55% within 30° of 178°)¹²² are observed, but the value for angle χ₃ (6–8) has a significantly higher frequency for the two gauche conformers (39% within 30° of –72° and 32% within 30° of 75°).¹²²,¹³¹ It has been pointed out that because the two carbon–sulfur bonds (0.18 nm) are longer than two carbon–carbon bonds (0.15 nm), the steric clashes within methionine in such gauche conformations should be less severe and the dihedral angles χ₃ should be less confined.¹³⁴ It seems that this unexpected preference for the gauche conformations arises from the fact that when methionine assumes this conformation, it is more compact.
Because the amido nitrogen is planar, it occupies a position in the puckered cyclopentyl ring of a proline (Equation 6–1) at which eclipse would occur if it were occupied by a methylene. As a result, only the Cγ-exo and Cγ-endo conformations of proline (6–2) can be significantly populated,¹³⁵ but it is difficult to dis-
classes,122,133 the membership of which is determined by tinguish crystallographically between these two confor-
Stereochemistry of the Side Chains 271
in equilibrium with each other.136
Cystine is an amino acid under peculiar steric constraints (Figure 6–19).137,138 The distribution of the two dihedral angles χ1 of cystine26,121 shows the same order and frequencies of preferences (56% within 20° of –65°, 24% within 20° of –175°, and 12% within 20° of 64°)121 as those of any other amino acid with only one uncomplicated substituent on the β carbon. The disulfide itself, because it is a dithioperoxide, is electronically required to have a dihedral angle χ3 along the sulfur–sulfur bond similar to the dihedral angle of hydrogen peroxide, which is 94° or –94°. If the dihedral angle χ3 in a cystine were exactly 90° or –90°, the four lone pairs, two on each sulfur, would be as far from being parallel to each other as is possible, and this orientation would be the most stable electronically:139

[Structure 6–9: view of a disulfide with the two Cβ substituents at a dihedral angle of 90° and the lone pairs marked on the sulfurs]

The angles observed for the dihedral angles of cystines in crystallographic molecular models are 97° ± 15° and –86° ± 11° (indistinguishable from +90° and –90°), with little preference for the positive over the negative.140 There are instances in which these two discrete, alternative conformations are both populated significantly by the same cystine (Figure 6–40B).141,142 The dihedral angle χ3 of 97° or –86° in a cystine is peculiar enough to
or decreasing, respectively, the dihedral angles χ2 of Cysteines 217 and 224 or Cysteines 233 and 240 or all of them together.

If both of the dihedral angles χ2 were 180°, the angle preferred by other amino acids with a single substituent on the β carbon, then the two α carbons in a cystine would be about 0.9 nm apart, a rather distant connection. Because the distances between the two α carbons of cystines in native proteins fall between 0.45 and 0.7 nm,103 the dihedral angles χ2 assume many other values, and rarely 180°.

So far, the dihedral angles χ have been considered independently. Usually, however, the individual rotamers of a side chain are tabulated.121,122 A rotamer is a rotational conformation of a side chain in which each carbon–carbon bond assumes a dihedral angle χ within a particular range. The range is within a certain number of degrees, for example, within 20°121 or within 30°,122 of one of the maxima for the distribution, which are usually close to the staggered dihedral angles of 60° (gauche+), 180° (trans), or –60° (gauche–). For example, there are nine rotamers of isoleucine, which are, in order of their frequency, g–t, g–g–, g+t, tt, tg+, g–g+, g+g+, g+g–, and tg–. Such tabulations of rotamers emphasize the dependence of one dihedral angle on the adjacent dihedral angles. There is, however, no agreement as to the statistical method that should be used to determine rotamers, their variances, or their distributions.

All of the stereochemical observations discussed so far are either consistent with the behavior of small molecules or otherwise make sense. Some of this agreement is probably illusory. During refinement, constraints on dihedral angles are imposed either advertently or inadvertently, and the fact that they are near ideal values in the final crystallographic molecular model may not reflect reality. Careful corrections using properly calculated omit maps should eliminate this bias, and crystallographic molecular models derived from data sets gathered to narrow Bragg spacing, for which few constraints need to be imposed during refinement, can avoid this problem entirely.

If there are only a few conformations that are preferred for each side chain, then there is far less flexibility involved in the folding of a protein than there seems to be at first glance. Conformations of the folded protein that demand dihedral angles to assume values other than those of lowest energy, which are normally the conformations most heavily populated in the unfolded polypeptide, require that extra energy be spent to occupy those conformations. It turns out that there is not much extra energy to go around.

Lovell, S.C., Word, J.M., Richardson, J.S., & Richardson, D.C. (2000) The penultimate rotamer library, Proteins: Struct., Funct., Genet. 40, 389–408.

Problem 6–2: Turn the alanine into a valine in your space-filling molecular model from Problem 6–1 by replacing two of the hydrogens on the β carbon with methyl groups. Rotate around the appropriate bonds until the dihedral angles φ and ψ have the mean values for an amino acid in parallel β structure.

(A) What atoms run into the two methyl groups on the side chain of the valine as rotation occurs around the bond between the α and β carbons? Use the numbering system of Figure 6–2.
(B) What is the value for the dihedral angle χ1 that has the most sterically favorable disposition of the side chain in β structure?

Rotate around the appropriate bonds until the dihedral angles φ and ψ in your model have the mean values for a right-handed α helix.

(C) What atoms run into the two methyl groups on the side chain of the valine as you rotate around the bond between the α and β carbons? Again, use the numbering system of Figure 6–2.
(D) What is the value for the dihedral angle χ1 that has the most sterically favorable disposition of the side chain in a right-handed α helix?

The theoretical values for the dihedral angles φ and ψ for a left-handed α helix should be 65° and 40°, respectively. Rotate around the appropriate bonds until the dihedral angles φ and ψ in your model have these values.

(E) What atoms run into the two methyl groups on the side chain of the valine as you rotate around the bond between the α and β carbons?
(F) What is the value for the dihedral angle χ1 that has the most sterically favorable disposition of the valine side chain in a left-handed α helix?
(G) On the basis of these observations, why is a left-handed α helix unstable relative to a right-handed α helix?

Problem 6–3: Draw Newman projections to explain why the dihedral angles χ2 for leucine and isoleucine display such a strong preference for 180° (Figure 6–17).
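The definition of a rotamer given earlier in this section (each dihedral angle χ lying within a set tolerance of 60°, 180°, or –60°) can be illustrated with a minimal sketch. It assumes idealized staggered centers rather than the observed maxima of any published distribution, and the function names and the default tolerance of 30° are illustrative choices, not part of the text.

```python
# Idealized staggered centers named in the text; real libraries use observed maxima.
IDEAL = {"g+": 60.0, "t": 180.0, "g-": -60.0}


def wrap_angle(deg):
    """Map any angle in degrees onto the interval (-180, 180]."""
    wrapped = (deg + 180.0) % 360.0 - 180.0
    return 180.0 if wrapped == -180.0 else wrapped


def classify_chi(chi, tolerance=30.0):
    """Return 'g+', 't', or 'g-' if chi lies within tolerance of a center, else None."""
    best_label, best_diff = None, None
    for label, ideal in IDEAL.items():
        diff = abs(wrap_angle(chi - ideal))  # angular distance, handles the 180/-180 seam
        if best_diff is None or diff < best_diff:
            best_label, best_diff = label, diff
    return best_label if best_diff <= tolerance else None


def rotamer(chis, tolerance=30.0):
    """Name a rotamer from its list of chi angles, or None if any angle is an outlier."""
    labels = [classify_chi(chi, tolerance) for chi in chis]
    if any(label is None for label in labels):
        return None
    return "".join(labels)


print(rotamer([-65.0, 170.0]))   # chi1 gauche-, chi2 trans
print(rotamer([120.0, 180.0]))   # chi1 is eclipsed, so no rotamer is assigned
```

Tightening `tolerance` to 20° corresponds to the narrower of the two ranges cited in the text; side chains outside every range are reported as unassigned rather than forced into the nearest class.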
its interior, which is more or less withdrawn from the water. This bias reflects their hydropathy.

The accessible surface area of a molecule of protein can be estimated by asking a digital computer to perform a calculation equivalent to rolling a sphere of a particular size, the probe, over the surface of a space-filling crystallographic molecular model (Figure 4–17E) of that protein.144 The center of the spherical probe will trace a surface, and the area of that surface, removed from that of the surface of the protein by a distance equal to the radius of the sphere, is defined to be the surface area of the protein accessible to the probe. Each portion of the irregular surface defined by the center of the probe can be assigned to a particular atom in the crystallographic molecular model by noting with which atom the probe was in contact when that portion was being created.

The surface of a molecule of protein is not smooth, but highly irregular, covered with cracks, crevasses, cavities, and ridges (Figure 4–17E).144,145 One way to demonstrate this fact is to vary the radius of the probe (Figure 6–20). When the probe is large (≥1.5 nm in the example chosen) the crystallographic molecular model is indistinguishable from a hard sphere (a sphere of radius 4.55 nm in the example chosen), but as the radius of the probe is decreased, much more surface area becomes accessible (the difference between the points and the curve in Figure 6–20) as the probe becomes small enough to enter the irregularities of the surface. The radius usually chosen for the probe,144 in an attempt to mimic a molecule of water, is 0.15 nm (the arrow in Figure 6–20). The choice is arbitrary, especially because locations to which only a single molecule of water can gain access experience diminished effects of the solvation arising from the bulk properties of water.

Usually a particular amino acid in a crystallographic molecular model is designated as buried if less than a certain amount of its surface area is accessible. It has been shown145 that when the radius of the probe is set at 0.15 nm

$n^{1/3} - n_{B}^{1/3} = k$ (6–3)

where n is the total number of amino acids in a protein, nB is the number designated as buried by a particular rule, for example, every amino acid with an accessible surface area less than 0.2 nm², and k is a constant. This equation states that, in a globular protein, the amino acids defined by a rule as buried are found within a roughly spherical solid of radius rB that is smaller than the roughly spherical solid of radius r containing all of the amino acids:145

[Structure 6–11: an inner sphere of radius rB containing the nB buried amino acids inside an outer sphere of radius r; the shell between them, of thickness d, contains the remaining n – nB amino acids]
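The rolling-probe definition of accessible surface area can be approximated numerically. The sketch below is not the procedure of reference 144; it uses a point-sampling approach in the style of Shrake and Rupley, in which test points are placed on a sphere of radius (atomic radius + probe radius) around each atom and a point counts as accessible if it lies outside every other expanded sphere. The atomic radius of 0.17 nm in the example and the number of sample points are assumptions made only for illustration.

```python
import math


def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-spiral construction)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        phi = golden * i
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def accessible_area(atoms, probe=0.15, n_points=960):
    """Per-atom accessible area in nm^2.

    atoms: list of (x, y, z, radius) tuples, all in nm.
    probe: probe radius in nm (0.15 nm is the value the text uses for water).
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # surface traced by the center of the probe
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            # A point is inaccessible if the probe center would overlap another atom.
            blocked = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms)
                if j != i
            )
            if not blocked:
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas


# Two hypothetical atoms 0.2 nm apart: each shields part of the other's surface.
print(accessible_area([(0.0, 0.0, 0.0, 0.17), (0.2, 0.0, 0.0, 0.17)]))
```

Because each surface point is counted for the atom whose expanded sphere it lies on, the per-atom areas can be summed over the atoms of one amino acid to apply a burial rule such as the 0.2 nm² threshold mentioned in the text.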
Table 6–2: Removal of Amino Acids from Water in Molecular Models of Proteins
fraction buriedb
hydrophobic
Ile 1.80 0.18 0.60 0.76 0.90
Val 1.60 0.18 0.54 0.74 0.88
Phe 2.20 0.14 0.50 0.69 0.89
Leu 1.80 0.16 0.45 0.71 0.87
Met 2.05 0.11 0.40 0.66 0.83
Ala 1.15 0.20 0.38 0.63 0.78
Trp 2.60 0.04 0.27 0.62 0.88
apathetic
Ser 1.20 0.08 0.22 0.46 0.62
Thr 1.45 0.08 0.23 0.41 0.60
His 1.95 0.02 0.17 0.44 0.78
Tyr 2.30 0.03 0.15 0.34 0.74
hydrophilic
Glu 1.85 0.03 0.18 0.24 0.74
Asp 1.50 0.04 0.15 0.27 0.67
Asn 1.60 0.03 0.12 0.30 0.61
Gln 1.90 0.01 0.07 0.23 0.61
Lys 2.10 0 0.03 0.05 0.60
Arg 2.40 0 0.01 0.10 0.51
a For entire amino acid, both its side chain and its contribution to the backbone, in the tripeptide Gly-X-Gly.146,147
b Fraction of the total number of that amino acid in a series of crystallographic molecular models that are buried by the noted criterion.
c Reference 148.
d Reference 145.
amino acids are buried is very low and the statistics become unreliable. When the rule is relaxed, more amino acids are scored as buried, and discriminations become more dependable.

Half of the amino acids, when they are in an unfolded polypeptide and freely accessible to a probe of 0.15 nm, have total accessible surface areas between 1.5 and 2.0 nm² and therefore are of similar size (Figure 4–14). Small amino acids such as alanine are probably buried more often simply because they are easier to surround, and large amino acids such as tryptophan are harder to surround completely and bury, especially in the smaller proteins. These stereochemical problems must contribute to the observed distributions. Nevertheless, it has already been noted that the frequencies with which the various amino acids are buried are correlated149 with the free energies of transfer for their model compounds from water to the gas phase (Table 5–9) and also with many of the other scales of hydropathy (Figure 5–24). This correlation is actually established by the three main groups of side chains (Table 6–2): those that are hydrophobic, those that are apathetic, and those that are hydrophilic (note the three clusters in Figure 5–24). Within each of these groups, however, there is no significant correlation between extent of burial and any scale of hydropathy derived from free energies of transfer. Presumably, the reason for the lack of correlation within the main groups is that stereochemically and energetically protein folding is not a transfer between solvents. With this in mind, it still can be stated that if an amino acid is hydrophobic, it is more likely to be buried, and if it is hydrophilic, it is more likely to remain in contact with the water in the folded polypeptide.

The results of the hydrophobic effect are most readily appreciated by examining the internal core of a crystallographic molecular model (Figure 6–21).150 This region is enriched in definitively hydrophobic amino acids such as leucine, isoleucine, valine, and phenylalanine. An even more dramatic example of a hydrophobic core is the center of a β helix, which is completely walled off from the water by the backbone of the helix and which is composed exclusively of aliphatic side chains.151

Each of the hydrogen–carbon bonds on the side chains that are removed from water provided favorable hydrophobic free energy to drive the folding of the polypeptide. This fact can be verified by performing site-directed mutation. So that no adverse steric effects are encountered, either an isoleucine found in the core of a crystallographic molecular model of the protein is shortened by converting it to a valine or an alanine, a leucine or a valine in the core is shortened by converting it to an alanine,152–156 or a position in the core next to a cavity is chosen and the mutants designed so that they expand into the cavity.157 The change in the standard free energy of folding produced by the various mutations is then
Hydropathy of the Side Chains 275
to the standard free energy of folding, with most of the values clustered155,156,158–161 between –2.5 and –3.5 kJ (mol of hydrogen–carbon bond)⁻¹. These values encompass* and are indistinguishable from the value of –2.8 kJ (mol of hydrogen–carbon bond)⁻¹ for the transfer of hydrocarbon from water to liquid hydrocarbon (Figure 5–21).154 The interior of a protein is enriched in hydrophobic amino acids because this is the only way to obtain the free energy necessary to drive its folding.
When a hydrophilic amino acid such as lysine or glutamate is introduced into the interior of a molecule of protein by site-directed mutation, the protein becomes significantly less stable. For example, when Methionine 102 and Leucine 133 in lysozyme from bacteriophage T4, which are both buried in its interior, were replaced in turn with a lysine and an aspartate, respectively, the protein became less stable by 29 and 24 kJ mol⁻¹. The region surrounding the new lysine at position 102 became much more mobile to permit limited access of the lysine to the solvent, and its pKa was found to be 6.5, a shift indicating that the neutral conjugate base had become more stable than it would have been if it were fully exposed to the water.163 When Valine 16 of ribonuclease T1 from Aspergillus oryzae is replaced with its isostere threonine,
directed.164 Often, however, when valine is replaced by
(<4 kJ mol⁻¹).165
When a hydrophobic amino acid ends up fully exposed on the surface, this exposure is neither energet-
cular orbitals,168 but just as often aromatic and aliphatic amino acids intermingle. One interesting aspect of the cluster of aromatic amino acids in type I cohesin domain (Figure 6–21) is that it illustrates the tendency of two phenyl rings, in isolation from water, to form a complex in which the planes of the two rings are at around 90° to each other.169 This orientation is commonly observed in the crystallographic molecular models of proteins, and theoretical calculations for benzene dimers in the gas phase suggest that it is the energetically favored arrangement.170

One of the most notable features of the accessibilities of the amino acids in the native structure of a protein is that, in the folded polypeptide, the accessible surface areas of all types are less than they were in the unfolded polypeptide (Table 6–2). The accessible surface areas tabulated are for the entire amino acid in a polypeptide, both side-chain and backbone segments. Usually, the backbone is buried before the side chain, so a significant portion of the mean fraction of surface buried for each type of amino acid is accounted for by this fragment common to all of the amino acids. This backbone portion, however, cannot account for more than 0.6–0.7 nm² of buried surface area because the accessible surface area of glycine in a polypeptide148 is only 0.75 nm². Therefore even the most accessible side chains, arginine, lysine, and glutamine (with a mean buried surface of 1.2 nm²), have more than 0.5, 0.55, and 0.45 nm² of the surface area of their side chains buried, respectively. The regions of each of these hydrophilic side chains that are buried are usually their hydrogen–carbon bonds. For example, in the crystallographic molecular model of the complex between the Ha-ras oncogene product p21 and its substrate, the amino group of Lysine 117 is engaged in several hydrogen bonds, but its butyl group is fully buried just as would be the side chain of a leucine.171

There should be a normal hydrophobic effect associated with the removal of the butyl group of lysine, the propyl group of arginine, and the ethyl groups of glutamine and glutamate from exposure to water even though these side chains in their entirety are among the most hydrophilic on all of the scales of hydropathy. To assess the hydrophobic effect that was contributed by burying these portions of each of these amino acids, as well as all of the others, the contribution of each atom in an amino acid to its overall hydropathy should be extracted. From these atomic parameters, the free energy of transfer for only those portions of each amino acid that are actually buried could be calculated. It was noted172 that free energies of transfer for individual solutes between water and the gas could be dissected into the individual contributions of each covalent bond that they contained. A similar dissection has since been performed upon the set of free energies of transfer for the N-acetyl-α-amides of the amino acids between water and 1-octanol.173 In this latter dissection, a series of parameters based on individual atoms, rather than covalent bonds, has been presented that can reproduce the original scale of hydropathy with acceptable precision. Presumably every scale of hydropathy presently in use can be so dissected.

Free energies of transfer for model compounds of tryptophan or tyrosine between water and a solvent such as 1-octanol (Figure 5–24)174 or ethanol,175 as opposed to free energies of transfer between water and the gas phase (Table 5–9), have always suggested that tryptophan and tyrosine should be more hydrophobic than they seem to be when they are found in a protein (Table 6–2). The explanation for this is probably that solvents such as ethanol and 1-octanol are able to form hydrogen bonds with the one donor on the indole and the donor and acceptor on the phenol, making them more soluble in these solvents than they would be in a hydrocarbon and making them seem more hydrophobic than they are. This would be consistent with the observation that it is the hydroxyl on tyrosine that usually remains in contact with the water in crystallographic molecular models.17 Because of the requirement that the nitrogen–hydrogen bond in the indole of tryptophan also be hydrogen-bonded, the portion of the side chain containing this bond is also usually in contact with the water. But indole is large and the rest of it is usually buried. As a result, it is only in the last column of Table 6–2 that the hydrophobicity of tryptophan is manifest.

The neutral, but hydrophilic, amino acids glutamine and asparagine are straightforward examples of the effect of the hydrogen bonds formed with water in the unfolded polypeptide on the location of that amino acid in the folded polypeptide. Complete withdrawal of the two hydrogen-bond donors on glutamine or asparagine from water during folding would result in a net disappearance of two hydrogen bonds from the solution. The difficulty of simultaneously regaining both of these lost hydrogen bonds in the interior of the protein seems to be great enough that the primary amides in the side chains of most of the glutamines and asparagines in a protein end up in the folded polypeptide fully exposed to the aqueous phase.126

It might be supposed that buried hydrogen bonds between side chains on different segments of secondary structure would be important factors because these would be capable of organizing significant regions of the protein.148 Of the rarely buried hydrogen bonds between side chains,40 however, only about 20% are the type that connect different segments of secondary structure; the other 80% connect donors and acceptors within the same α helix, β sheet, or β turn.148 The steric requirements of packing the secondary structures efficiently and avoiding empty space are far more important than hydrogen bonds in positioning the segments of secondary structure and organizing the overall structure of the protein, and the few buried hydrogen bonds that do occur between segments of secondary structure are probably adventitious. It is the interdigitation of the side
Packing of the Side Chains 277
chains protruding from β sheets and α helices that orients these secondary structures.

Suggested Reading

Chothia, C. (1976) The nature of the accessible and buried surfaces in proteins, J. Mol. Biol. 105, 1–14.

Packing of the Side Chains

sum of their van der Waals radii, and examining the details of the packing of side chains in the interior of a crystallographic molecular model of a protein is one way to estimate values for van der Waals radii.183 The spheres defined by the van der Waals radii define the van der Waals surface of a molecule:*

[Drawing: surviving labels are "water", "0.15 nm", and "solvent-accessible surface"]

a Values are averaged from various tabulations179–183 and expressed to nearest 5 pm, which may overstate their accuracy.179

* Reprinted with permission from ref 184. Copyright 1996 Academic Press.
278 Atomic Details
and φ.131 Even though it might seem to be the case from an examination of Figure 6–4B, not all of the glycines in a protein, however, are handling unavoidable steric problems because many can be replaced with larger amino acids by site-directed mutation without affecting the function of the protein.185

Another consequence of the economy in arranging the side chains of the amino acids is that the volume of a molecule of protein is quite small relative to its molecular mass. The volume occupied by the atoms in a molecule of protein can be calculated by summing all of its individual atomic volumes defined by the van der Waals surface, and the actual volume of the molecule can be calculated from its partial specific volume and its molecular weight. From this calculation it is learned that 75% of the volume of a molecule of protein is occupied by atoms.180,184,186 By comparison, in most organic liquids only about 45% of the volume is occupied by atoms and in water only 36%, but in a solid of hexagonally packed spheres 75% of the volume would be filled by atoms.186 In anhydrous crystals of small organic molecules, 70–80% of the volume is filled by atoms.180

The high density of the packing in the interior of a molecule of protein is also reflected in its compressibility. The compressibility of the interior of a molecule of protein has been estimated from two different interpretations of the available experimental measurements184,187 to be about 20 Gbar⁻¹. This value is intermediate between those for liquids (CCl₄, 100 Gbar⁻¹; C₁₀H₂₂, 105 Gbar⁻¹; H₂O, 46 Gbar⁻¹) and solids (ice Ih, 12 Gbar⁻¹; quartz, 2.7 Gbar⁻¹; NaCl, 4 Gbar⁻¹).

At first glance, all of these results seem incompatible with the observation that the partial specific volume of a protein (usually 0.72–0.75 mL g⁻¹) can be calculated* quite accurately from the sum of the molar volumes of its constituents188,189 because each protein has a unique structure. The accuracy of this calculation suggests that each structure, although it is unique, incorporates the requirement that its volume be as small as possible. The minimization of molecular volume is an important noncovalent force in the folding of a molecule of protein, and it dictates many of the features of the structure. This noncovalent force minimizing the empty space within a molecule of protein can be considered to be a consequence of the hydrophobic effect, if the hydrophobic effect is defined as the tendency of water to minimize the volume of the cavity occupied by any solute.

Although there are a few proteins in which the folded polypeptide forms a knot,191,192 the packing of every other protein appears to result from the consecutive layering of one element of secondary structure upon another, much as one would fold a cloth or a hinged rod. α Helices pack upon α helices, β sheets pack upon β sheets, and α helices pack upon β sheets. In all of these situations the secondary structures take up orientations with respect to each other that permit the side chains that protrude from each of them to interdigitate (Figure 6–22).193,194 This interdigitation is the reason that there is very little vacant space in the interior of a molecule of a protein. If it can be assumed that the configuration of minimum volume is the preferred configuration in the condensed phase, then these interdigitations promote the achievement of such a minimum volume. In order to form as many interdigitations as possible, the individual segments of secondary structure are required to assume preferred orientations with respect to each other. Viewed in this perspective, packing is a structural force just as the formation of hydrogen bonds between buried donors and acceptors on the side chains of the amino acids would be a structural force, but packing is more important.

The orientation between two α helices, two sheets of β structure, or an α helix and a sheet of β structure can be assigned an angle Ω.195,196 The angle Ω between two α helices is the angle between their two axes (Figure 6–23).197 The sign on Ω is given by the right-hand rule. Consequently, the angle Ω in Figure 6–23 has a negative sign. Because the pattern in which the amino acids protrude from an α helix has a 2-fold rotational axis of pseudosymmetry at each position in the α helix (focus on position i at the right of Figure 6–23), the axis of the α helix has no direction associated with it and a value of Ω = –50° is equivalent to a value of Ω = +130°. The angle Ω between two sheets of β structure is the angle between the direction of the parallel or antiparallel strands in one sheet and the direction of the strands in the other sheet (Figure 6–24).195 The right-hand rule determines the sign of angle Ω, and the angle Ω in Figure 6–24B is, therefore, negative. No distinction is made between parallel and antiparallel relationships of the strands or the amino- and carboxy-terminal ends of a given strand of β structure because all combinations of these distinct stereochemistries produce almost the same pattern in which the side chains are distributed across the face of the sheet (Figure 4–16B,C). The angle Ω between an α helix and a sheet of β structure is the angle between the axis of the α helix and the direction of the β strands (Figure 6–25D).195

The most frequently observed angle Ω between two

* This calculation does not treat the constituents as independent solutes in free solution. In fact, if each side chain were an independent solute, each of their partial molar volumes would include a covolume,190 which is a volume that arises simply because a particular constellation of atoms is an independent molecule dissolved in a given solvent. These covolumes are substantial. For water189 the covolume of a solute is 14 cm³ mol⁻¹, and for organic solvents190 it is 25 cm³ mol⁻¹. Therefore, the sum of the partial molar volumes of the components of a protein, were they each separate molecules in solution, would be significantly greater than its actual partial molar volume. To the extent that its covolume arises from the fact that a solute is in free solution in a given solvent, the fact that the partial molar volume of a protein is the sum of the atomic volumes of its substituents with no added covolume states that those substituents are not in free solution. This of course is true; they are economically packed into a solid.
angle between the two helices is –50° (Figure 6–23). An example of the interdigitation that occurs in such situations is found between two adjacent α helices in the molecular model of bovine carboxypeptidase A (Figure 6–27).197

There are a number of other values for the angle Ω between two α helices that promote less favorable interdigitations of the side chains, and examples of all of them have been observed.197 Because several possibilities exist and because α helices can tighten or loosen to accommodate different angles close to the ideal values, the distribution of angle Ω between –90° and +90° is fairly uniform197 with the exception of the striking and sharp peak at –50°. For example, in the crystallographic molecular model of cytidine deaminase, two α helices in the interior cross at an angle Ω of 90°, but the side chains protruding from them at the interface do not pack together well.198 In the distribution of angles Ω, however, there is another preferred angle represented by a broad maximum in the distribution at +20°.199 This angle Ω defines
Figure 6–23: Use of superimposed helical nets196 to describe the contacts at an interface between two α helices.197 (Top) The angle between two adjacent α helices, i and j, is defined as the angle Ω between the two axes; its sign is determined by the right-hand rule. (Right) α Helix i in a vertical orientation is numbered out from its center. The central amino acid is given the designation i; those below are designated by negative integers, and those above, by positive integers. The relative orientations in which amino acids i – 7, i – 4, i – 3, i, i + 3, i + 4, and i + 7 are distributed can be projected onto a plane tangential to the position of amino acid i. These seven projected points define a unique lattice, or helical net. For α helix i the lattice is face-up, in the orientation of the original α helix. (Left) α Helix j also defines the same lattice as that defined by α helix i, but, because α helix j is to be opposed to α helix i, face to face, the helical net for α helix j is flipped over, face-down. (Bottom) The two helical nets, the one for α helix i face-up and the one for α helix j face-down, are then opposed and rotated with respect to each other until maximum interdigitation of the lattice points is achieved. The angles at which maximum interdigitation occurs in the helical nets will be the angles at which maximum interdigitation occurs between the amino acids of the two α helices. Adapted with permission from ref 197. Copyright 1981 Academic Press.
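The definition of the angle Ω given in the caption above can be made concrete with a short calculation. The following is a minimal sketch in Python, not the procedure used in ref 197. It assumes that each helix axis is supplied as a point on the axis and a direction vector (the names `p_i`, `u_i`, `p_j`, `u_j`, and the function `interaxial_angle` are hypothetical), and it assumes that the sign is that of a right-handed rotation about the line of closest approach running from one axis to the other. Whether this sign convention agrees with the one drawn in Figure 6–23 should be checked against the figure itself.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    norm = math.sqrt(_dot(a, a))
    return tuple(x / norm for x in a)


def interaxial_angle(p_i, u_i, p_j, u_j):
    """Signed angle (degrees) between two directed axes.

    p_i, p_j: any point on axis i and on axis j (hypothetical inputs).
    u_i, u_j: direction vectors of the two axes (need not be unit length).
    The sign convention is an assumption: positive for a right-handed
    rotation about the common perpendicular pointing from axis i to axis j.
    """
    u_i, u_j = _unit(u_i), _unit(u_j)
    normal = _cross(u_i, u_j)              # parallel to the common perpendicular
    sin_mag = math.sqrt(_dot(normal, normal))
    cos_val = _dot(u_i, u_j)
    # Which way along the common perpendicular does axis j lie from axis i?
    side = _dot(_sub(p_j, p_i), normal)
    # If the axes intersect or are parallel (side == 0), the sign is
    # undefined; this sketch arbitrarily reports a non-negative angle.
    sign = 1.0 if side >= 0 else -1.0
    return math.degrees(math.atan2(sign * sin_mag, cos_val))
```

The result does not depend on which helix is called i and which is called j, because exchanging them reverses both the cross product and the vector between the axes. Because the axes are treated as directed lines, the value falls between –180° and +180°, so an antiparallel pair reports an angle near ±180° rather than near 0°.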
every position in one of the sequences. Although the actual twist of a coiled coil should reflect a tradeoff between energy required to tighten the α helix and energy required to bend the α helix into the supercoil, in the coiled coil of tropomyosin the twist is –3.4° to –3.9° for each position in the sequence,50,202,203 and in the one from general control protein GCN4 (Figure 6–29)204–206 it is –3.6° to –3.9°, values that seem almost too close to the expected one.
The original coiled coil of α helices predicted from these geometric arguments contained two parallel α helices. The coiled coils formed by tropomyosin and general control protein GCN4 are coiled coils of two parallel α helices of identical sequence. There are, however, examples of coiled coils of two parallel α helices of nonidentical sequence,207 two antiparallel α helices of nonidentical sequence,208 three parallel α helices of identical sequence,209 three parallel α helices of nonidentical sequence,200,210 three antiparallel α helices of identical sequence,211 three antiparallel α helices of nonidentical sequence,212 four parallel α helices of identical sequence,205,213,214 four antiparallel α helices of nonidentical sequence,215,216 five parallel α helices of identical sequence,217 five antiparallel α helices of nonidentical sequence,218 and 12 antiparallel α helices of nonidentical sequence producing a cylinder with a hollow center.219 There is even an example of a coiled coil of four antiparallel α helices that coils around another copy of itself to form a coiled coil of coiled coils (Figure 6–30).220
That both parallel and antiparallel arrangements are observed must follow from the two facts that an α helix has a pseudo-2-fold axis of symmetry with
Figure 6–24: Opposition of two sheets of β structure.195 (A) The rectangles represent two β sheets, the twists of which cause their corners to be either below the plane of the rectangle (D for down) or above the plane of the rectangle (U for up). The average direction in which the β strands are oriented within the sheet defines a vector for the direction of the complete sheet. (B) The angle Ω between two opposed sheets of β structure is the angle between the two vectors in the two respective sheets that are parallel to the strands of polypeptide composing the sheets. (C–F) Examples of opposed sheets of β structure from crystallographic molecular models. The positions of the α carbons in the molecular model of the individual strands of polypeptide are designated by closed circles for the upper sheet and open circles for the lower sheet. The α carbons for a given strand are connected by line segments. The crystallographic molecular models from which these four examples were drawn are (C) immunoglobulin fragment VREI, (D) superoxide dismutase, (E) prealbumin, and (F) concanavalin A. Reprinted with permission from ref 195. Copyright 1977 National Academy of Sciences.
Figure 6–25: Packing of an α helix on a β sheet.195 (A) View down an α helix of the orientations of the amino acids on one face. (B) The usually observed twist of a β sheet (Figure 6–9) with the corners designated as up or down as in Figure 6–24. (C) An α helix sitting on the surface of a twisted β sheet. The amino acids on the lower face of the α helix are in black and designated by number. The displacements of the four corners of the twisted β sheet are designated by letters. (D) The angle Ω between the α helix and the β sheet is the angle between the axis of the α helix and the vector parallel to the strands of polypeptide in the β sheet. (E) The three relationships that are possible between the straight axis of the α helix (H) and the various curvatures in a twisted β sheet (S). The curvature encountered by the α helix is determined by the value of angle Ω. Reprinted with permission from ref 195. Copyright 1977 National Academy of Sciences.
Packing of the Side Chains
281
282 Atomic Details
respect to the emergence of its side chains and that the packing of these side chains governs the existence of a coiled coil. The twist in a coiled coil of three α helices is –3.0° to –4.0° for each position in the sequence of one of its α helices; that in one of four α helices, –1.9° to –3.0°; and that in one of five α helices, –2.6°,209,210,213,214,217 but the situation seems to be much less constrained for those with four and five α helices, for which there are examples of coiled coils with right-handed supercoiling.221
Crick calculated the diffraction pattern of X-radiation expected from a macroscopic fiber constructed of aligned coiled coils of α helices and was able to explain why the meridional reflection in the pattern that would normally arise from the pitch of 0.54 nm for an untwisted α helix should shorten to a pitch of 0.51 nm when the α helix becomes twisted into a coiled coil. A prominent meridional reflection, representing a repeat of 0.51 nm, had been observed previously222 in the diffraction patterns of fibers of keratin, myosin, and fibrinogen, and it is now known that such a reflection is indicative of the coiled coils of α helices in these proteins. The infrared spectra of coiled coils of α helices are also characteristic.223
The sequence of the polypeptides in any coiled coil of α helices can be divided into successive units, or heptads, each seven amino acids in length. The first and fourth amino acid in each heptad (positions a and d in Figure 6–28) are the most deeply buried amino acids in the interface between the two or more α helices in the coiled coil (Figure 6–29). These most deeply buried locations are isolated from the water surrounding the coiled coil, and the side chains sequestered there are usually hydrophobic amino acids such as leucine, valine, isoleucine, alanine, phenylalanine, tyrosine, and methionine.50 The hydrophobic amino acid can also be a cystine as in the antiparallel coiled coil of α helices in carboxypeptidase C from Saccharomyces cerevisiae (Figure 6–19).138 An α helix that has a heptad repeat of hydrophobic amino acids is an amphipathic α helix (Figure 6–8). There are a few interesting exceptions to the rule that coiled coils are formed from amphipathic α helices, such as the chloride ion chelated by five symmetrically arrayed glutamines in the center of the coiled coil of five parallel α helices in extracellular matrix protein COMP.217
Figure 6–28: Interaction between α helices in a coiled coil.50,200 (Top) Alignment of two tightened α helices with 3.5 amino acids for each turn rather than 3.6. Amino acids from amino-terminal to carboxy-terminal are designated as a, b, c, d, e, f, and g; the view is end-on looking from amino-terminal to carboxy-terminal amino acid. Every seven amino acids the orientations would repeat, and this would place the amino acid after amino acid g precisely below amino acid a and so forth. (Middle) The two α helices in the top panel are cut along two respective lines normal to the plane of the page and passing through amino acids f and f′ and then flattened, one against the other. The two resulting planes are then turned together –90° about a vertical axis so that the gray positions end up above the white. This view illustrates the interdigitations of amino acids a and d. (Bottom) Three tightened α helices running parallel to each other. In this arrangement also, amino acids a and d can interdigitate.
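The bookkeeping of heptad positions described in the text can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the toy sequence in the usage note, and the scoring rule (the fraction of positions a and d occupied by one of the hydrophobic amino acids listed in the text) are assumptions introduced here, not a published method for detecting coiled coils.

```python
# One-letter codes for leucine, valine, isoleucine, alanine,
# phenylalanine, tyrosine, and methionine (the list given in the text).
HYDROPHOBIC = set("LVIAFYM")
HEPTAD = "abcdefg"


def label_heptads(seq, offset=0):
    """Assign a heptad position (a-g) to every amino acid in seq.

    offset is the index into 'abcdefg' given to the first amino acid;
    it is a free parameter because the register is not known in advance.
    """
    return [HEPTAD[(i + offset) % 7] for i in range(len(seq))]


def core_score(seq, offset):
    """Fraction of positions a and d occupied by hydrophobic amino acids."""
    labels = label_heptads(seq, offset)
    core = [res for res, pos in zip(seq, labels) if pos in "ad"]
    if not core:
        return 0.0
    return sum(res in HYDROPHOBIC for res in core) / len(core)


def best_register(seq):
    """Return the offset (0-6) that places the most hydrophobic
    amino acids at positions a and d (assumed scoring rule)."""
    return max(range(7), key=lambda off: core_score(seq, off))
```

For a made-up sequence such as `"EK" + "LKKLEEK" * 3`, `best_register` selects the register in which every leucine falls at position a or d. Real sequences contain exceptions at these positions, so the score is best read as a rough guide to the register rather than as proof that a coiled coil is present.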
Figure 6–30: Coiled coil of two parallel coiled coils of four antiparallel α helices comprising the complete crystallographic molecular model (Bragg spacing ≥ 0.20 nm) of the ligand binding portion of the methyl-accepting chemotaxis protein II from Salmonella typhimurium.220 The complete protein is built from two copies of a folded polypeptide 553 amino acids long. The portion of the protein containing the two identical segments of polypeptide between Glycine 26 and Arginine 188 was expressed as a fragment, purified, and crystallized. The drawing is of the crystallographic molecular model of this fragment of the entire protein. The numbering is for that of the complete protein. Each of the two polypeptides forms a structure built around a coiled coil of four antiparallel α helices. These two coiled coils are in turn coiled around each other in parallel.
and hydrophilic peripherally, such as glutamate, lysine, arginine, and glutamine,208 that can provide hydrogen–carbon bonds to cover the hydrophobic side chains in the core before they enter the water fully.
In the crystallographic molecular model of the coiled coil from transcription factor GCN4, the hydrophobic amino acids of the heptad repeats interdigitate along the interface between the two α helices to form the hydrophobic core of the structure and to produce the supercoil. They and those that flank them pack so closely together and so efficiently that there is almost no vacant space in the core of the structure. The two identical α helices are parallel to each other and packed in precise register so that each central hydrophobic side chain packs against its twin from the other α helix. The
them lies upon the axis of symmetry of the coiled coil, the
two possible hydrogen bonds between the respective
Figure 6–32: Packing of the amino acids at the interface between two opposed β sheets in the crystallographic molecular model of prealbumin.235 A tracing of the α carbons of the strands in this structure is presented in Figure 6–24E. This is represented diagrammatically in the upper inset, where the locations of the horizontal sections through the structure are designated by their positions in nanometers relative to the central section. The sections are normal to the two β sheets, and the strands run approximately normal to the planes of the sections. The amino acids in the upper sheet (in which the four strands run parallel, antiparallel, parallel, antiparallel) are enclosed in solid lines; those in the lower sheet (in which the four strands run parallel, antiparallel, antiparallel, parallel) are enclosed in broken lines. The straight lines indicate the orientation of the interface, which twists in a right-handed sense as the sections proceed through the structure. Reprinted with permission from ref 235. Copyright 1981 National Academy of Sciences.
layer of 85, 135, 182, and 234 is sandwiched between that of 51, 112, 161, and 212 and that of 53, 114, 163, and 214 in Figure 6–11). In the top layer of the β barrel in Figure 6–11, the hydrophobic portions of the side chains pack against the layer below, and the four hydrophilic atoms, the nitrogen and the three oxygens, are pointed upward out of the end of the barrel.
In β barrels of five or six strands, the radii of the hyperboloids are small enough that the cylinder can remain circular while the side chains tightly fill the central cavity, but in β barrels of eight strands, the hyperboloid is usually flattened into an ellipse to pack the side chains in each layer as tightly together as possible.77 In ribonucleoside-diphosphate reductase from E. coli, a β sheet of five parallel strands antiparallel to a second β sheet of five parallel strands together form a β barrel of ten strands, but it is flattened so that the two respective sheets are opposed to each other across the minor axis of the ellipse.90 This structure is drifting in the direction of a sandwich of two β sheets at an angle Ω equal to 90°. Again, an interesting set of exceptions is that of β barrels in which a cavity is
Figure 6–34: Packing of amino acids at the interface between three alternately orthogonal sheets of β structure in the crystallographic molecular model of penicillopepsin.238 In the center of the figure, the α carbon atoms of the molecular model between amino acids 16 and 123 are connected by line segments to provide a tracing of the polypeptide. The three-layered sandwich is viewed from above and consists of a three-stranded sheet of β structure (parallel, antiparallel, parallel) on top of a four-stranded sheet of β structure (parallel, parallel, antiparallel, antiparallel) on top of a second four-stranded sheet of β structure (parallel, antiparallel, parallel, antiparallel). The indicated sections 0.4 nm apart were cut through the three-layered sandwich in a space-filling representation of the molecular model. The planes of the sections were horizontal and normal to the page as indicated. The packing between the sheets can be viewed in respective cross sections arrayed in counterclockwise order from top to bottom. Amino acids are numbered, and all amino acids in a given pleated sheet are in either solid outline or broken outline. Amino acids in hatched outline are not in the sheets of β structure. In the cross sections, the strands of the sheets at the top and the bottom run parallel to the page while the sheet in the middle runs perpendicular to the page. Reprinted with permission from ref 238. Copyright 1982 American Chemical Society.
required for the function of the protein, such as the cavity in the middle of the β barrel of retinol-binding protein in which the retinol is bound.241 In this β barrel, the ligand provides enough extra hydrophobic mass that the barrel can remain circular even though it has eight strands. The β barrel of red fluorescent protein from Discosoma is circular even though it is composed of 11 strands because there is an α helix running through its center.242
An α helix lying upon a sheet of β structure usually has its axis almost parallel to the strands of the sheet because the α helix is straight, the sheet is twisted, and a straight rod can contact a twisted surface only when it is either parallel or perpendicular to the axis of the twist (Figure 6–25).195 The angle Ω observed243 between α helices and adjacent sheets of β structure is usually around 0°, and almost all values fall between –20° and +10°. The exceptions are usually instances in which the angle Ω is close to 90°.244,245 In one of these instances, a long β sheet of four strands wraps around an α helix245 as one’s four fingers would wrap around a cylindrical rod 3–4 cm in diameter. This grip is yet another example of the elasticity of β structure.
The interface (Figure 6–35)243 between three of the α helices and one of the β sheets in lactate dehydrogenase illustrates the fit between an α helix and a twisted β sheet in a parallel orientation. Note that the side chains from the α helices lie upon the gaps between the side chains in the sheet of β structure. Because a sheet of β structure twists appropriately, the α helices lying across its surface parallel to its strands are aligned next to each other with angles Ω of about –40° between adjacent pairs even when they cleave tightly to the surface of the sheet.243 This value for angle Ω is sufficiently close to the –50° that produces the most frequently observed type of interdigitation between two α helices. Therefore, both the interfaces between the α helices and the sheet of β structure and the interfaces among the α helices themselves can exist simultaneously in almost optimum orientations. It is also possible, however, that the twist of such a sheet of β structure arises from the requirement that the α helices upon it be positioned at the proper angles to maximize the interdigitation of their amino acids. For all of these reasons, a twisted β sheet sandwiched between two layers of
Figure 6–35: Schematic formation of the interface between three α helices and a sheet of β structure found in the crystallographic molecular model of L-lactate dehydrogenase.243 The sheet of β structure is presented in the bottom left of the figure with its strands running vertically in the y direction and the axis of the right-handed twist parallel to the x-axis. Side chains of the amino acids on the upper face of the sheet are identified and numbered by the amino acid sequence of the protein. The three α helices that will form the interface with the sheet of β structure are presented in the upper left of the figure with the face that will participate in the final interface directed upward. The axes of the three helices (αG, αB, and αC) are almost parallel to the vertical y-axis. Side chains that will participate in the interface are identified and numbered. When the three α helices are rotated 180° around the x-axis and placed upon the sheet of β structure as they are in the molecular model, the interface is produced by the interdigitation of the highlighted amino acids from the sheet and the highlighted amino acids from three α helices, respectively. It is these interdigitations that position the three α helices upon the sheet of β structure. Adapted with permission from ref 243. Copyright 1980 Academic Press.
parallel α helices is one of the most common tertiary structures.
This arrangement is also the one assumed by the coating on a β barrel. The most common type of β barrel is one in which all of the strands run consecutively and in parallel (Figure 6–11), and usually within the polypeptide connecting the amino-terminal end of one strand to the carboxy-terminal end of the next (for example, the connection between Cysteine 54 and Alanine 83 in Figure 6–11) there will be an α helix running along the outer surface of the β barrel parallel to the β strands of the β barrel. Consequently, the strands occur consecutively around the barrel, and the underlying repeating pattern is β strand, α helix, β strand, α helix and so forth with extraneous segments of other secondary structure thrown in at random. Even in the hybrid β barrel in ribonucleoside-diphosphate reductase, the five parallel β strands in each of the two antiparallel β sheets forming the barrel occur consecutively and are each connected to the next by a segment of polypeptide containing an α helix running across the outside of the barrel.246 There is a peculiar variant of this α-helically wrapped β barrel in which each of the strands of β structure in the central barrel is replaced by an α helix to form an α-helically wrapped α-helical barrel.247
It is useful to imagine the interior of a molecule of
protein created by all of these arrangements as a three-dimensional jigsaw puzzle because this is an image that emphasizes the interdigitations among the side chains of the amino acids driving the various orientations of the secondary structures. The pieces of this puzzle, however, are neither inelastic nor invariant,248 and there can be flaws in its mosaic.
The elasticity of the packing of the amino acids in the interior of a protein is most readily demonstrated by performing site-directed mutation. When Alanine 129 in lysozyme from bacteriophage T4 is replaced with leucine, the stability of the protein decreases249 by 6 kJ mol–1, but its structure is affected only in the vicinity of the mutation. There it expands, most notably at Leucine 121, in response to the increase in the size of the side chain at position 129 (Figure 6–36).250 When Valine 30, located between two β sheets in the core of human transthyretin, is replaced with methionine, the β sheets move apart by 0.1 nm to accommodate the consequent steric effect.251 The usual response to mutations such as these that increase the volume of matter in the hydrophobic core of a protein is that the structure expands in response to the local increases in volume and the stability of the protein decreases,252 occasionally catastrophically.253 The strain of the increase in size can also be accommodated by a conformational change of an adjacent side chain to a significantly different rotamer.249
There is never only one invariant arrangement of side chains that can solve the problem of filling the space between segments of secondary structure in the hydrophobic core of a protein. For example, in the plas-
the mosaic of the puzzle,257 some large enough to bind random hydrophobic ligands.258 When larger amino acids are replaced by smaller ones through a site-directed mutation, an unnatural cavity is formed, and the contraction of the structure surrounding the artificially created cavity256,259–261 again illustrates the elasticity of the puzzle. Although this contraction is usually incomplete, leaving a definite cavity where one was not present before, when Isoleucine 29 in lysozyme from bacteriophage T4 is replaced with alanine, the structure surrounding the site of the mutation collapses to such a degree that no discernible cavity remains.256 When an unnatural cavity is formed by site-directed mutation, the stability of the protein usually decreases.252,259 It has been proposed that this instability produced upon the mutation of a larger amino acid to a smaller suggests that naturally occurring cavities in proteins destabilize their structure,127 but it is not possible to extrapolate from the effects resulting from artificial changes performed by site-directed mutation to the effects of changes produced by natural selection.
It has been argued that because the ability of a particular polypeptide to form a particular tertiary structure is not drastically affected by extensive replacement of amino acids in its core by site-directed mutation248,262 and because the volume in the interior of a molecule of protein can be filled with a number of different arrangements of the normally available hydrophobic side chains,263 the packing of the amino acids cannot dictate the tertiary structure that results when the polypeptide folds. Such arguments, however, ignore the fact that it is the overall pattern in which the side chains emerge from the secondary structures, not the identity of those side chains, that dictates the values of the angles Ω and hence the tertiary structure. The fact that the details of the packing beyond these dictations display such tolerance is not remarkable because it has long been known that evolution by natural selection frequently performs similar replacements. From a consideration of the logic of the interdigitations that are observed in naturally occurring proteins, it can be concluded that it is such interactions among the side chains that produce the relative orientations assumed by the secondary structures and that these orientations are crucial to creating the tertiary structure of a protein.
Suggested Reading
Chothia, C., Levitt, M., & Richardson, D. (1977) Structure of proteins: packing of α-helices and pleated sheets, Proc. Natl. Acad. Sci. U.S.A. 74, 4130–4134.
Word, J.M., Lovell, S.C., LaBean, T.H., Taylor, H.C., Zalis, M.E., Presley, B.K., Richardson, J.S., & Richardson, D.C. (1999) Visualizing and quantifying molecular goodness-of-fit: small-probe contact dots with explicit hydrogen atoms, J. Mol. Biol. 285, 1711–1733.
Baldwin, E., Xu, J., Hajiseyedjavadi, O., Baase, W.A., & Matthews, B.W. (1996) Thermodynamic and structural compensation in “size-switch” core repacking variants of bacteriophage T4 lysozyme, J. Mol. Biol. 259, 542–559.
Problem 6–4: The following is a segment of amino acid sequence from the coiled coil region of human epidermal keratin:264
T A A E N E F V T L K K D V D A A Y M N K
V E L Q A K A D T L T D E I N F L R A L Y
D A E L S Q M Q T
(A) Write out this sequence in the same format50 as the following diagram of a portion of the amino acid sequence from the coiled coil of α-tropomyosin:
Asp Lys Asp Glu Glu Lys Ala
Gln Ser Gly Lys
Asp Leu Asp
Glu Lys Glu Ala Lys Lys Asp
Leu Leu Thr Tyr Ala Ala Ala
Leu Leu Leu Leu Leu Ala Val
Glu Gln Glu Ser Gln Glu Glu
Val Lys Asp Lys Glu Thr Ala
Asp Lys Asp Glu Glu Lys Ala
In your diagram place the appropriate amino acids from the sequence of human epidermal keratin along the center line as was done in the diagram of the sequence of α-tropomyosin.
(B) What is the role of the amino acids placed along the center line?
(C) Circle the two amino acids in your diagram that do not seem to fit this role.
(D) How may they be excused?
Problem 6–5: In the coiled coil of α helices shown in Figure 6–29, why do Lysine 263 and Glutamate 268 and Lysine 275 and Glutamate 270 form hydrogen bonds?
Problem 6–6: The drawings on the next page of three crystallographic molecular models265–267 illustrate aspects of the packing between segments of secondary structure. These drawings were produced with MolScript.573 Discuss each molecular model separately and describe the points illustrated by each in turn.
Water
About 40–70% of the volume of a crystal of protein is occupied by water.268 It fills the large vacant spaces among the folded polypeptides. The majority of the molecules of water in a crystal of protein are liquid and disordered over the time required to collect a data set. Regardless of the degree of refinement or the minimum Bragg spacing, the regions containing this disordered water remain featureless and have a mean electron density similar to that of liquid water.269 These regions of the unit cell are treated as solids of uniform electron density that have the shape of the disordered regions peculiar to the particular crystal. These irregular solids can be used for refining phases by solvent flattening and are always incorporated as such into the molecular model of the unit cell from the first cycle of the refinement because when these water-filled lacunas are added explicitly to the molecular model, the R-factor decreases significantly.269
During a refinement, maps of difference electron density are frequently calculated. In addition to the large
spaces filled with disordered water, small discrete peaks of positive electron density become regularly recurring features of these maps. Because no reasonable rearrangement of the atoms of the molecular model of the protein is able to erase these features and because they are unaccompanied by adjacent peaks of negative electron density indicative of a misalignment of the molecular model, these peaks are assumed to represent either individual molecules of water or individual molecules of solutes from the solution in which the crystals were formed, which is usually a concentrated solution of ammonium sulfate, poly(ethylene glycol), or a smaller glycol. Sulfate is an ion with a large number of electrons, and any peaks of electron density representing sulfate can usually be recognized with little difficulty.30 Molecules of glycols or other polyols are also easily recognized. The ammonium cation, however, is indistinguishable in its electron density from a molecule of water, but proteins at neutral pH rarely bind many cations so it is usually assumed that the smaller isolated
Figure 6–37: Molecules of water buried in the interior of crystallographic molecular models of proteins. (A) A single peak of positive electron density (designated by the cross) assigned to the location for a molecule of water, unconnected to any other peak of positive density, within the interior of the crystallographic molecular model (Bragg spacing ≥ 0.20 nm) of deoxyribonuclease I.8 The map of difference electron density is for (2Fo – Fc) with αc, so the peak is prominent. The molecular skeleton is the refined molecular model itself. The molecule of water is hydrogen-bonded to the amido nitrogen–hydrogens of Isoleucine 193 and Isoleucine 211 and the acyl oxygens of the same two amino acids. Reprinted with permission from ref 8. Copyright 1986 Academic Press. (B) Three peaks of positive electron density buried deeply within the crystallographic molecular model (Bragg spacing ≥ 0.18 nm) of ribonuclease U2 from Ustilago sphaerogena.270 The peaks were assigned as locations for molecules of water at an early cycle in the refinement, but the electron density presented is an omit map. The three molecules of water were omitted from the final molecular model, which after the omission was submitted to five additional cycles of refinement. The map of difference electron density is for Fo – Fc, with Fc and αc calculated from the model produced by these five cycles. These three locations for molecules of water are completely cut off from the bulk water by the protein and participate in hydrogen bonds only with the backbone and side chains of the surrounding amino acids.
water. The reason for the generous range is that the mag-
have been included in αc. These more solid peaks are only
lution.
A portion of the peaks of electron density assigned
as representing molecules of water in a map are observed in the same locations in maps of electron density for the same protein in different crystals or for the same protein from a different species, and such locations are considered to be conserved.272 It is assumed that a conserved location is occupied consistently, in particular, when the protein is in solution rather than in the crystal. Molecules of water in the map that are not conserved are assumed to be peculiar to that protein in that crystal, and such locations may or may not be occupied to a significant extent when the protein is free in solution.
of protein, not the molecules of water themselves. There are also locations for molecules of water on flexible portions of the protein that change their conformations so widely that the locations for those molecules of water cannot appear in the map of electron density.
One of the more unexpected observations has been the discovery of molecules of water buried in the interior of proteins with no direct contact with the solvent. These occur as single forlorn molecules (Figures 6–37A and 6–39);8,280 or as small clusters of two or more mole-
There are different degrees of conservation. Peaks
representing molecules of water may be found at the
same locations in two different molecules of the protein
in the same unit cell. For example, 25 positions in thiore-
doxin from E. coli,273 46 positions in cytochrome b562
from E. coli,99 and 26 positions in b-lactamase from
E. coli274 are occupied by molecules of water in both of
the crystallographic molecular models of the respective
protein in the same unit cell. Peaks representing mole-
cules of water may be found at the same locations in
the same molecule of protein in different crystals. For
example, 30 positions for molecules of water in ribonu-
clease T1 are conserved in four different crystals of the
protein.275 They may be found at the same locations in
crystallographic molecular models of the same protein
from different species. For example, a string of five mol-
ecules of water is found at the same locations in the inte-
riors of crystallographic molecular models of both
cytochrome f from Phormidium laminosum and
cytochrome f from Brassica rapa.276 They may even be
found at the same locations in different but related pro-
teins. For example, two positions for molecules of water
are found at the same locations in crystallographic
molecular models of ferredoxin–NADP+ reductase,
phthalate-dioxygenase reductase and a fragment of
nitrate reductase (NADH).277
The molecules of ordered water included in the
final refined molecular model surround the molecule of
protein, fill deep clefts in its surface, and are found in its
interior (Figure 6–38).98 They represent locations in the
actual molecule of protein in the crystal that are consis-
Figure 6–38: Locations for molecules of water in
the crystallographic molecular model (Bragg
spacing ≥ 0.18 nm) of penicillopepsin.98 At
various cycles of the refinement of this crystallo-
graphic molecular model, it was decided that
certain members of the array of as yet un-
assigned peaks of positive density in the map of
difference electron density were locations for
molecules of water, and a molecule of water was
placed at each of these positions in the model.
The 319 unique locations for molecules of water
so positioned are designated in the figure with
open circles (oxygen atoms) relative to the
polypeptide backbone without the side chains.
The drawing of the crystallographic molecular
model is presented in the same orientation as
that in Figure 4–17. This drawing was produced
result of the fact that they surround a hydrophobic side chain.291 If such networks around hydrophobic amino acids are present in the crystal but are not pinned to the same locations in each unit cell or if they are rearranging continuously while the data set is being gathered, they would not be seen in the crystallographic molecular model. If the charged functional groups on the surface of a protein are surrounded by spheres or semispheres of hydration, as the paradigm associated with the hydration of spherical ions suggests (Figure 5–9), the molecules of water in these shells of hydration are not pinned, because no indication of their presence is seen in the maps of difference electron density. Charged functional groups on the side chains of the amino acids, however, often have one or two molecules of water forming hydrogen bonds to their donors and acceptors. All of these points reemphasize the fact that only specific locations occupied by molecules of water for long periods of time appear as distinct features in maps of electron density.
Of the water molecules that are attached directly to the molecule of protein in a refined crystallographic molecular model, about two-thirds make only one hydrogen bond to the protein269,294 and one-third make two or more hydrogen bonds. The mean number of hydrogen bonds between molecules of water in this first layer of hydration and the protein is 1.7.98 The average distance between one of these waters and a donor nitrogen is 0.29 ± 0.02 nm and between one of these waters and an acceptor oxygen is 0.29 ± 0.02 nm.98,269 Of these water molecules directly bound to protein, 42% act as donors to acyl oxygens of the polypeptide backbone, 16% act as acceptors from nitrogen–hydrogen bonds of the polypeptide backbone, and 42% are in hydrogen bonds to donors and acceptors on the side chains.8,98,269 In the crystallographic molecular model of lysozyme,269 of the waters bound to functional groups of side chains on the surface of the protein, 24% were donors to carboxylates, 13% were donors to primary amides, 13% were acceptors for primary amides, 14% were acceptors for primary alkyl ammoniums, 14% were acceptors for guanidiniums, 14% were hydrogen-bonded to alkyl hydroxyls, and 6% were hydrogen-bonded to phenolic hydroxyls.
The disordered side chains in a crystallographic molecular model are usually at its surface in the most accessible locations. Because they are at the surface, they are usually hydrophilic and are probably even more hydrated than the ordered side chains. Any molecules of water bound to these disordered side chains are never seen in the crystallographic molecular model. Therefore, the mean number of waters bound to each side chain is probably an underestimate of the actual values.
Nevertheless, of the side chains to which bound water can be assigned in lysozyme,269 aspartic acids have a mean of 2.0 waters; lysines, a mean of 1.8 waters; asparagines, a mean of 1.6 waters; glutamic acids, a mean of 1.5 waters; threonines, a mean of 1.2 waters; tyrosines, a mean of 1.0 water; serines, a mean of 0.7 water; and glutamines, a mean of 0.7 water bound to their side chains. More than 50% of the arginines are disordered, but those that can be observed have 1.5 waters bound.
When a set of 16 crystallographic molecular models from data sets gathered to Bragg spacings of 0.17 nm or less were examined,295 side chains that had two or three heteroatoms that can participate in hydrogen bonds (aspartate, arginine, glutamate, histidine, and glutamine) frequently (>70%)295 had one or more waters bound to them, while those with only one (tyrosine, tryptophan, threonine, and lysine) less frequently had water bound to them (60–70%). Asparagine (61%) and serine (51%) fall out of these ranges, probably because they are often hydrogen-bonded to the backbone.
It is the molecules of water hydrogen-bonded to the donors and acceptors of these side chains that produce the sharp maximum at around 0.29 nm in the solvent distribution function around a molecule of protein.296 The hydrophobic side chains (methionine, alanine, phenylalanine, isoleucine, leucine, and valine) much less frequently (10–30%) have fixed locations for molecules of water adjacent to them, but these side chains are usually buried in the interior of the protein. Those hydrophobic groups that do have fixed locations for molecules of water adjacent to them are surrounded by a layer of water in which the centers of the oxygen atoms are about 0.4 nm from the centers of the carbon atoms of the side chains,296 as expected from their van der Waals radii (Table 6–3).
There are a number of physical measurements which register the fact that each molecule of protein in solution has water bound to it in an irregular network creating a shell of hydration. The molecules of water in this shell of hydration differ from the molecules of water in the bulk of the solution away from the molecule of protein in several of their physical properties. For example, neutron scattering has revealed that the layer of hydration immediately adjacent to the surface of a molecule of protein has a density about 10% greater than that of the bulk water.297 From a dissection of the compressibility of this layer of hydration, it has been concluded that it contains extensive hydrogen-bonded networks184 similar to those observed in crystallographic molecular models.286,291 While it is unable to distinguish the water in this layer from water in bulk solution because its rate of relaxation is too fast, nuclear magnetic resonance is able to detect the buried waters in a molecule of protein because their rates of relaxation are so much slower.298
Each of the molecules of water surrounding a molecule of protein at a given instant is in a different situation (Figure 6–38), and the relationship between each one of these molecules of water and the molecule of protein depends upon the respective situation. It is the contribution of each one of these molecules of water to the statistical behavior that produces the value of the physical property measuring the hydration of the protein, δH2O
[grams of H2O (gram of protein)–1]. Each contribution will be a unique function of the situation of the respective molecule of water, and the physical measurement will be only an average over all of these contributions. Mathematically, this heterogeneity in the situations of the waters participating in the shell of hydration can be expressed as a weighted mean:299
$$\delta_{\mathrm{H_2O}} = \frac{M_{\mathrm{H_2O}}}{M_{\mathrm{p}}} \sum_{i=1}^{n} w_i \qquad (6\text{–}4)$$
where MH2O is the molar mass of water (18.0 g mol–1), Mp is the molar mass of the protein, and the sum is over a set of statistical weights wi.
There are two ways to think of the meaning of the statistical weights wi. It can be assumed that there are n sites for the binding of water molecules around the protein, the positions of which move through the solution in lock step with the protein. The statistical weight wi for a given site is then the occupancy of that site, which is the fraction of the time that the site is occupied by a molecule of water.299 It is also possible to consider all n molecules of water in the vicinity of the protein at a given instant. The statistical weight wi in this case expresses the degree of influence the molecule of protein has over the behavior of water molecule i. When wi = 1, the location occupied by water molecule i is fixed, as if covalently, to the molecule of protein. When wi = 0, the water molecule i is uninfluenced in its behavior by the presence of the molecule of protein.
Under no circumstances should the layer of hydration surrounding a molecule of protein be pictured as a uniform layer clearly distinguished from the water in the bulk of the solution by some discontinuous boundary. Rather, the layer of hydration gradually fades from fixed defined locations for molecules of water adjacent to the surface of the molecule of protein to molecules of water distant from the surface that are only marginally affected by its presence.
It is almost always the case that physical measurements of hydration yield a simple number, δH2O, the grams of water bound for every gram of protein (Table 6–4). It is not surprising that this number varies with the method used to obtain it, as more or less of the molecules of water surrounding the protein differ more or less from the water in the bulk solvent in the particular behavior measured by the particular procedure.
The self-diffusion of water decreases when protein is added to the solution,310 and this decrease can be explained if it is assumed that the water of hydration, being less mobile than the water in the bulk phase, does not participate significantly in self-diffusion. With this assumption, the amount of water bound to the protein can be calculated.
The relative permittivity of a solution of protein decreases discontinuously when the frequency of the alternating electric field used to measure that relative permittivity becomes greater than the ability of the molecules of protein to reorient in response to its alterations,311 and another discontinuous decrease is observed when the frequency becomes greater than the ability of the water in the bulk solvent to reorient. Between these two extremes, there is a third dielectric relaxation that is assigned301 to the waters of hydration bound to the protein. These waters have dielectric relaxations 10–100-fold slower than the waters in the bulk solution. From the spectrum of these dielectric relaxations, the concentration of these relatively immobilized molecules of water and hence the amount of water bound to the protein can be calculated. These molecules of water, however, are not fixed to the protein, or they would be required to rotate with it, and their dielectric relaxation would be indistinguishable from that of the protein itself.
Solid powders of dry protein always have water incorporated in them and the amount of this water of hydration can be chemically determined. A more systematic approach is to equilibrate the dry powder, either as a precipitate, as a microcrystalline solid, or as visible crystals, with air of a certain relative humidity. It has been proposed that air at 90% relative humidity is the appropriate choice.303 Below this value the powders tend to become glasses,303 and above this value they become hygroscopic. The amount of water bound by a solid powder of a given protein at 90% relative humidity can be taken as its hydration.
If it is assumed that the water hydrating a protein is entirely unable to dissolve salting-out solutes that are otherwise freely soluble in water,304 the negative of the preferential solvation of a particular protein in a particular solution (Equation 1–57) can be multiplied by the molarity of the water in that solution and the molar mass of water to obtain a value for the grams of H2O (gram of protein)–1 in the layer of hydration. To perform measurements of preferential solvation, a solution of the protein is usually brought into equilibrium with a solution containing only water and the salting-out solute, for example, glucose,304 lactose,304 or sucrose305 (Table 6–4). It is also possible to equilibrate crystals of a protein with solutions of salting-out solutes and from the dependence of the density of the crystal on the density of the solution to determine the amount of water in the crystal that excludes the solute.299,302,306
When a solution of protein is frozen, the water of hydration freezes below the freezing point of the water in the bulk solution. Not until the temperature is lowered to below 180 K does it all become frozen.312 For example, at –3 °C, 0.51 g of water (g of protein)–1; at –5 °C, 0.46 g of water (g of protein)–1; and at –7 °C, 0.41 g of water (g of protein)–1 remained unfrozen in a solution of ovalbumin.313 Unfrozen water is more mobile than frozen water, and the two can be distinguished by nuclear magnetic resonance.307,312 The amount of unfrozen water in a
Table 6–4: Hydration of Proteins(a)
protein | self-diffusion300 of H₂¹⁸O | dielectric relaxation301 | solid at RH = 90%302,303 | sugar299,304,305 | (NH4)2SO4 299,302,306 | NMR frozen solution307 | diffusion308 | viscosity308 | scattering of X-radiation at small angles309
(a) All units are grams of water (gram of protein)–1.
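The units of this table can be related to counts of molecules of water with a few lines of arithmetic. The following Python sketch evaluates the weighted mean of Equation 6–4 for a set of statistical weights and converts a hydration in grams of water (gram of protein)–1 into moles of water (mole of amino acid)–1. Only the molar mass of water (18.0 g mol–1) and the form of Equation 6–4 come from the text; the function names, the mean residue molar mass of 110 g mol–1, and the example weights are illustrative assumptions.

```python
M_WATER = 18.0             # molar mass of water, g mol^-1 (value given in the text)
MEAN_RESIDUE_MASS = 110.0  # ASSUMPTION: typical mean molar mass of one amino acid
                           # residue, g mol^-1; not a value taken from the text


def hydration_from_weights(weights, protein_molar_mass):
    """Equation 6-4: (M_H2O / M_p) * sum(w_i), in g water (g protein)^-1."""
    weights = list(weights)
    if any(not 0.0 <= w <= 1.0 for w in weights):
        raise ValueError("each statistical weight w_i must lie between 0 and 1")
    return M_WATER * sum(weights) / protein_molar_mass


def waters_per_residue(hydration, mean_residue_mass=MEAN_RESIDUE_MASS):
    """Convert g water (g protein)^-1 into mol water (mol amino acid)^-1."""
    return hydration * mean_residue_mass / M_WATER


def waters_per_protein(hydration, n_residues, mean_residue_mass=MEAN_RESIDUE_MASS):
    """Molecules of bound water for each molecule of protein."""
    return waters_per_residue(hydration, mean_residue_mass) * n_residues


if __name__ == "__main__":
    n_residues = 129  # length of lysozyme, as quoted in the text
    molar_mass = n_residues * MEAN_RESIDUE_MASS  # approximate, from the assumed residue mass

    # HYPOTHETICAL weights: 100 tightly held waters and 200 half-influenced waters.
    weights = [1.0] * 100 + [0.5] * 200
    delta = hydration_from_weights(weights, molar_mass)
    print(f"weighted-mean hydration: {delta:.2f} g water per g protein")

    # Convert a mid-range tabulated hydration (0.3 g per g) into counts of water.
    print(f"waters per amino acid:   {waters_per_residue(0.3):.1f}")
    print(f"waters per protein:      {waters_per_protein(0.3, n_residues):.0f}")
```

Under the assumed residue mass, a hydration of 0.3 g g–1 corresponds to about 1.8 mol of water (mol of amino acid)–1, or roughly 240 molecules of water for a protein of 129 amino acids, which falls within the ranges quoted in the text.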
frozen solution of protein at 238 K (–35 °C) has been designated as water of hydration.
Upper limits on the amount of water that migrates with a molecule of protein through the solution can be calculated from the frictional coefficient.308 The radius of a hard sphere the same volume as a molecule of protein can be calculated from its molar mass and partial specific volume, and the radius of the sphere that would have the same frictional coefficient as the molecule of protein can be calculated with Equation 1–66. The latter sphere is always larger than the former. If it is assumed that the entire difference in volume is water forced to move with the molecule of protein, an upper limit to the amount of bound water can be calculated (Table 6–4). It is an upper limit because molecules of protein are not spheres and a particle with the same volume as a given sphere but a different shape will always have a larger frictional coefficient than that sphere. How much of the difference between the two radii is due to hydration and how much to differences in shape has never been ascertained unambiguously for any protein. The numbers tabulated are not intended to be estimates of hydration, but upper limits of the hydration.
It is also possible to estimate the hydration of a protein from the scattering at small angles of X-radiation from a solution of that protein as a function of the angle of that scattered radiation.309
There are several remarkable features of this tabulation (Table 6–4). The values for bound water are all similar, and each technique produces values that, although they do not agree, are in the same range (0.2–0.4 g g–1), which is about 2 mol of water (mol of amino acid)–1. There seems to be no significant difference in the amount of bound water for every gram of protein over a 5-fold range in size of the proteins, between ribonuclease (naa = 124) and serum albumin (naa = 581). For a small protein such as lysozyme (naa = 129) or dihydrofolate reductase (naa = 162), these results indicate that there should be 200–300 molecules of bound water for every molecule of protein. In a crystal of lysozyme, 140 molecules of water had locations that were sufficiently distinct to be incorporated into the refined molecular model.269 In a crystal of dihydrofolate reductase, 264 molecules of ordered water had sufficiently distinct locations to be incorporated in the refined molecular model.27 Whether these ordered molecules of water bear any relation to the bound water detected by the physical measurements is uncertain.
There is a highly significant correlation between the accessible surface area of a crystallographic molecular model of a protein and the total number of amino acids it contains, regardless of whether it is a monomer or an oligomer.314 As a result of this correlation, the mean accessible surface area for each amino acid falls gradually and monotonically from 0.53 nm2 (amino acid)–1 when the protein contains 100 amino acids to 0.30 nm2 (amino acid)–1 when the protein contains 2000 amino acids. From the definition of accessible surface area (6–12) it follows that a molecule of water, held by hydrogen bonds at 0.28 nm from its nearest neighbors, can cover about 0.07 nm2 of accessible surface area if waters are assumed to pack in hexagonal array or 0.09 nm2 if they are in a tetrahedral lattice (Figure 5–2). This means that there are about 7 waters (amino acid)–1 immediately adjacent to the surface of a protein containing 100 amino acids and 4 waters (amino acid)–1 immediately adjacent to the surface of a protein containing 2000 amino acids. These limits would be equivalent to 1.2 and 0.7 g of water (g of protein)–1, respectively. For the proteins gathered in Table 6–4, which all contain less than 600 amino acids, the span would be 1.2–0.9 g of water (g of protein)–1. Therefore, the water of hydration determined by physical measurements is considerably less than the amount of water required to cover the surface of a molecule of protein with a continuous rigidly fixed layer.
Part of the reason for this discrepancy may be that, as with the networks of water covering the surface of a molecule of protein in a crystallographic molecular model, the layer of hydration is patchy and discontinuous286,291 but the heterogeneity that must exist among the waters of hydration is probably most of the reason. Some waters at the surface are held tightly (wi ≅ 1.0), but most are only loosely influenced by the protein (wi < 1) and contribute only partially to the weighted mean (Equation 6–4). Therefore it is not surprising that the weighted mean is less than the limit calculated by simply counting every immediately adjacent molecule of water and presuming it to be fixed to the protein. The range over which the amount of immediately adjacent molecules of water (0.9–1.2 g g–1) varies among the proteins of the size of those contributing to Table 6–4 is narrow, and this fact explains why all of the proteins seem to have about the same degree of hydration, within the variation of the measurements.
The molar concentration, and hence the thermodynamic activity, of the water in the bulk phase of a solution of protein can be changed without changing its concentration in the layer of hydration by adding a salting-out solute such as sucrose, triethylene glycol, dioxane, stachyose, or poly(ethylene glycol) that is excluded from the layer of hydration.315–317 Because the water in the layer of hydration is in rapid equilibrium with the water in the bulk phase, changing the activity of the water in the bulk phase changes its activity in the layer of hydration, and this change affects any chemical reaction in which the amount of hydration of the protein changes. As one might expect, the binding of substrates to an enzyme,316,318 the binding of a protein to DNA,315 a large conformational change of a protein,317,318 or the binding of one protein to another causes significant changes in hydration. From the magnitude of the effect of changing the concentration of water on the dissociation constant for these reactions, the number of molecules of water removed from or added to the layer of hydration during
the reaction can be estimated. These range from 9 molecules of water for binding of a substrate to a hydrated active site316 to 60 molecules of water for a significant conformational change.317,318 In the latter transformation, a portion of the water detected as leaving the shell of hydration is thought to be molecules beyond the first layer.
Suggested Reading
Blake, C.C.F., Pulford, W.C.A., & Artymiuk, P.J. (1983) X-ray studies of water in crystals of lysozyme, J. Mol. Biol. 167, 693–723.
Problem 6–7: Assign the hydrogens to donors and acceptors in Figures 6–37B and 6–40A,B.
Ionic Interactions
Almost all of the charged amino acids (glutamate, aspartate, histidinium, lysinium, and arginine) are found on the surface of the crystallographic molecular model of a protein, so that they retain their hydration. Aside from the few that have roles as acids and bases in the function of the protein, the reason that these charged amino acids are present on the surface of a protein is to permit it to dissolve in water at high concentrations. For example, the concentration of hemoglobin in an erythrocyte is 0.3 g mL–1.
The distribution of these elementary charges on the surface of a molecule of protein seems to be random with little regard for the signs of the elementary charges and no attempt to compensate the charges. Patches of positive charge and patches of negative charge are as common as regions where the charges are evenly divided. Changing these distributions seems to have little effect on the stability of the protein.319 The lysozyme from bacteriophage T4 has an excess of nine elementary positive charges over elementary negative charges at neutral pH. When lysines on its surface were changed to glutamates by site-directed mutation to produce a number of single, double, triple, and quadruple mutants in which the net charge number decreased from +9 to +7, +5, +3, and +1, respectively, the mean change in the free energy of folding of the protein was +3.3 ± 2.9 kJ mol–1.320 Consequently, the protein decreased slightly in stability rather than increasing in stability as its excess charge was neutralized, and the magnitudes of the individual decreases in stability showed no correlation with the magnitude of the decrease in charge. Increases in stability of the same magnitude (–4 to –8 kJ mol–1), however, have been observed upon neutralizing imbalances of charge on the surfaces of ubiquitin321 and the subunit-binding domain of dihydrolipoyllysine-residue acetyltransferase from Bacillus stearothermophilus.322 All of these experiments were performed at an ionic strength of 0.05 M, so the differences in stability observed would probably have been even smaller at the physiological ionic strength of 0.15 M.
There are experimental results suggesting that the charge of the amino acids on the surface of a protein may electrostatically increase the rate of association323,324 or increase the equilibrium constant for association325–328 of ligands that bear an opposite charge. These effects, however, are rarely more than a factor of 2 (–1.7 kJ mol–1) at physiological ionic strengths, and often the ionic strength of the solution must be lowered to observe them at all,326 so they can be of little consequence biologically.
The acid dissociation constants of the individual acid–bases of the side chains on the surface of a protein are shifted by the elementary charges on the amino acids in their vicinity. For example, it has been shown that the pKa of Histidine 64 in subtilisin BPN′ decreases by 0.26 unit when Aspartate 99 is mutated to a serine329 because when the elementary negative charge of Aspartate 99 is no longer in its vicinity, the stability of the histidinium ion decreases relative to that of the neutral histidine. All such pairwise interactions are tautomeric because if one side chain shifts the pKa of another, then the other side chain must shift the pKa of the first. If the negative charge of Aspartate 99 shifts the pKa of Histidine 64, the positive charge of Histidine 64 must shift the pKa of Aspartate 99. A large constellation of such tautomeric interactions determines the individual acid titrations of the side chains on the surface of a protein.
The acid–base titration curve of a native protein (Figure 1–11) is the summation of the individual titrations of the accessible acid–bases on its surface. For every 100 amino acids, a normal protein contains about five aspartic acids, six glutamic acids, two histidines, three tyrosines, and six lysines.330 The aspartic acids (pKa = 4.0) and glutamic acids (pKa = 4.4) account for most of the dissociation of protons between pH 2 and 5.5. The lysines (pKa = 10.4) and tyrosines (pKa = 9.8) account for most of the dissociation of protons between pH 8 and 12. These are the two major features of the titration curve of a protein because these four amino acids account for the majority (90%) of the acid–bases present in the protein. The histidines account for most of the small amount of dissociation that occurs between pH 5.5 and 8.
As the pH is decreased below the isoelectric point, a protein gains net positive charge number as each proton associates, and as the pH is increased above the isoelectric point, the protein gains net negative charge number as each proton dissociates. This change of net charge number with decreases and increases of pH causes the addition of each successive proton or the removal of each successive proton, respectively, to be more difficult. The reason for this is that the gathering of net charge number on a molecule, even one as large as a protein, is an unfavorable reaction relative to dispersing those elementary charges evenly throughout the solution. Because tautomeric interactions are themselves electrostatic, the effect of the resulting charge on the
overall acid dissociation of the molecule of protein is simply the summation of all the tautomeric interactions among all the side chains, the individual titrations of which produce the complete titration curve.
That the electrostatic work of creating this charge shifts the observed titration curve is easily demonstrated by changing the ionic strength (Figure 1–11). An increase in ionic strength shrinks the layer of counterions around each individual, charged amino acid in the protein (Equation 1–71) and causes them to exert a decreased effective electrostatic charge in their influence on neighboring acid–bases undergoing titration. This in turn decreases the electrostatic work that must be performed to create charge on the neighboring acid–bases and shifts the titration curve closer to the curve that would have been seen if each acid–base titrated only according to its intrinsic pKa. This electrostatic shielding due to increased ionic strength produces a steepening of the titration curve for the protein both below and above its isoelectric point (Figure 1–11).
It is possible to correct roughly331 for the electrostatic work involved in creating charge on the molecule of protein by assuming that in a given region of the titration curve, for example, between pH 2 and 5.5, only one type of acid–base is titrating and all of the members of this set have the same intrinsic pKa, pKa,int. Then it is assumed that the charge on the molecule of protein, Qi, is proportional to the mean net proton charge number, Z̄H,i, and that pKa,int, which is proportional to a free energy, is shifted arithmetically by the electrostatic work, which is a free energy and which should be proportional to Z̄H,i.
The values of the intrinsic acid dissociation constants obtained by these corrections for electrostatic work agree with expectation (Table 2–2) to a certain extent. The values of pKa,int for the carboxyl groups in several proteins the titration curves of which between pH 2 and 5.5 have been analyzed in this way331 are between 4.0 and 4.8. The titration of tyrosine side chains in a native protein can be followed independently by using the large difference in ultraviolet absorbance between the phenol and the phenolate anion to calculate fA and fHA.332 The values of pKa,int, corrected for electrostatic work, for tyrosines in several proteins331 are between 9.4 and 10.8. The contribution of tyrosine to the titration curve between pH 8 and 12 can then be deducted from the overall curve, and values of pKa,int for the lysines in these same proteins can be calculated. They lie between 9.8 and 10.4.
The titration curves of proteins usually fail to meet expectations in one key aspect. There are usually too few acid–bases contributing to the titration.331 The deficit is most easily noticed in the case of histidine and tyrosine. The number of moles of protons dissociating from a mole of protein between pH 5.5 and 8 is often less than the moles of histidine in a mole of that protein. This deficit can be explained by assuming that the values of pKa for one or more of the histidines have been lowered and that their titrations have become buried in those of the large number of carboxylates. The moles of tyrosine the ultraviolet absorption of which displays the expected shift between pH 9 and 11 upon formation of the phenolate anion is often less than the total moles of tyrosine present in a mole of the protein. For example, only four of the six tyrosines in ribonuclease can be titrated331,333 and only two of the four tyrosine side chains of chymotrypsinogen can be titrated331 within accessible ranges of pH. With most proteins, values of pH greater than 11 are inaccessible, so it can be said only that each of these missing tyrosines has a pKa greater than 11.
Both the decreases in the values of pKa for the histidines and the increases in the values of pKa for the tyrosines implied or demonstrated by these results are reasonable. If these shifts in pKa are due to burying the side chains in the interior of the protein, even though they remain accessible to the solvent and capable of acid–base reactions, their neutral forms should become more stable relative to their charged forms. In most cases, the missing acid–bases in a titration curve are assumed to be buried in the interior of the folded polypeptide. Such buried acid–bases can be seen in the crystallographic molecular models of proteins. For example, in the crystallographic molecular model of ribonuclease, Tyrosine 25 is almost completely buried (the solvent accessibility of its phenolic oxygen is only 0.02) and Tyrosine 97 is completely buried.334 It has been assumed that these are the two tyrosines in the native protein that do not participate in acid–base titrations.
These examples of buried tyrosines or histidines are special cases of the fact that polar amino acids are found in the interior of a protein, even ones that are normally charged. For example, Arginine 30 in the crystallographic molecular model of xylose isomerase from Arthrobacter strain B3728 is completely surrounded by both the backbone and mostly carbon–hydrogens of other side chains (Figure 6–41).112 It does not form an ion pair with any anionic side chain. Instead, its five donors form hydrogen bonds with four acyl oxygens from the backbone and a molecule of water. Because one of the hydrogen bonds is to a molecule of water, it cannot be determined whether or not the guanidino group is positively charged. Aspartate 76 is buried in the interior of ribonuclease T1 of A. oryzae335 and a cluster of three glutamates, two histidines, and an aspartate is buried in the interior of the iron-free form of the R2 protein of ribonucleoside-diphosphate reductase.336
The amino acids that are charged at neutral pH are either the anionic conjugate bases of neutral acids, for example, carboxylates, or the cationic conjugate acids of neutral bases, for example, ammonium cations. It may be the case that when such an amino acid is buried, it is buried as the neutral acid or the neutral base, respectively. This conclusion follows from the fact that the farther the pKa of a normally charged amino acid is from 7.0, the less likely it is to be buried (Table 6–2). Such a buried amino acid usually participates in a set of hydrogen
302 Atomic Details
teric but has a different pattern of donors and acceptors, such as an asparagine for an aspartate, usually produces a less stable protein.335 From examining closely the con-
Ionic Interactions 303
unknown on which atom the proton resides. An estimate of the effect of the relative permittivity on the location of the proton in an ionized hydrogen bond has been made,343 and it was concluded that the relative permittivity of the surroundings would have to be less than that of CCl₄ (εr = 2.2) before the shift of the proton from an ammonium cation to a carboxylate anion in the ionized hydrogen bond between them

    R–COO⁻ ··· H–NH₂⁺–R′  ⇌  R–COOH ··· NH₂–R′        (6–5)
among Arginine 31, Glutamate 36, and Arginine 40 in the Arc repressor were replaced with hydrophobic interactions among a methionine, a tyrosine, and a leucine, respectively, the mutant that resulted was –16 kJ mol⁻¹ more stable than the wild-type protein.348 Why buried ionized hydrogen bonds uninvolved in the function of the protein have not been eliminated in this way by evolution by natural selection is unknown. Buried, ionized hydrogen bonds, however, are rare; most ionized hydrogen bonds are found on the surfaces of crystallographic molecular models of proteins where they can be stabilized by the solvation of the water. Even then they represent a minority of the ionized hydrogen bonds that potentially could form. Most of the fortuitously juxtaposed, oppositely charged side chains on the surface of a crystallographic molecular model do not participate in hydrogen bonds319 “even though in most cases there is no steric reason why they cannot.”349 It is the competition of the donors and acceptors of the water that prevents it.

The frequency with which ionized hydrogen bonds are observed in crystallographic molecular models (Table 6–5) is no greater than the probability that they would occur at random. Only hydrogen bonds between two side chains are considered in the tabulation, and the probability that a certain hydrogen bond will form at random is calculated from the frequency with which the amino acids occur in the usual protein and the number of equivalent donors or equivalent acceptors present on each amino acid. If anything, ionized hydrogen bonds are observed less frequently than predicted by this calculation of probability. This may be due to the fact that both charged donors and charged acceptors will tend to be more exposed to the water and less likely to form hydrogen bonds. A reciprocal argument could be invoked to explain the fact that hydrogen bonds between hydroxyl groups are more frequent than expected (Table 6–5), because amino acids bearing hydroxyl groups are often buried (Table 6–2). Nevertheless, with few exceptions, the frequencies with which each of the particular hydrogen bonds are observed are about those expected from the probability that the respective donor and acceptor would encounter each other at random, regardless of charge.

An example of the interchangeability of charged and uncharged donors and acceptors of hydrogen bonds occurs in phycobiliproteins, where an ionized hydrogen bond between an arginine and an aspartic acid in one species is replaced isochorically by hydrogen bonds between two glutamines in another:352

    [Equation 6–6: structural drawing of an arginine–aspartate ionized hydrogen bond replaced, through natural selection, by a hydrogen-bonded pair of glutamines]
Suggested Reading
Gibbs, M.R., Moody, P.C.E., & Leslie, A.G.W. (1990) Crystal struc-
[Table 6–5: frequencies with which hydrogen bonds between particular side-chain donors and acceptors are observed in crystallographic molecular models,a compared with the probabilities that they would occur at random.b The donors and acceptors were presented as structural drawings; the drawings and the alignment of the numerical entries are not recoverable here.]
a From tables in refs 8, 22, 97, and 98.
b Probability that the hydrogen bond would occur at random, calculated only from frequencies of functional groups350,351 in proteins and their respective number of donors or acceptors, assuming no preferences for type of hydrogen bond.
c Probability on the same scale as the others but not included for normalization.
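
The random expectation described in footnote b can be illustrated with a minimal Python sketch. It assumes, as a simplification, that the chance of a donor–acceptor pairing is proportional to the product of (frequency of the donor group × its number of donors) and (frequency of the acceptor group × its number of acceptors), normalized over all pairings. The function name, the group frequencies, and the acceptor counts below are hypothetical values chosen only for illustration; they are not the data behind Table 6–5.

```python
from itertools import product


def random_pair_probabilities(groups):
    """Expected probability of each donor-acceptor pairing at random.

    groups maps a functional-group name to a tuple of
    (relative frequency, number of donors, number of acceptors).
    No preference for any type of hydrogen bond is assumed.
    """
    donor_weight = {g: f * d for g, (f, d, a) in groups.items() if d > 0}
    acceptor_weight = {g: f * a for g, (f, d, a) in groups.items() if a > 0}
    total = sum(donor_weight.values()) * sum(acceptor_weight.values())
    return {
        (don, acc): donor_weight[don] * acceptor_weight[acc] / total
        for don, acc in product(donor_weight, acceptor_weight)
    }


# Hypothetical frequencies and donor/acceptor counts (illustration only).
example_groups = {
    "guanidinium": (0.05, 5, 0),  # five N-H donors, no acceptors
    "carboxylate": (0.11, 0, 4),  # assumed four lone-pair acceptors
    "hydroxyl": (0.13, 1, 2),     # one O-H donor, two lone pairs
    "amide": (0.08, 2, 2),        # two N-H donors, two lone pairs
}

probabilities = random_pair_probabilities(example_groups)
for pair, p in sorted(probabilities.items(), key=lambda item: -item[1]):
    print(f"{pair[0]:>12} -> {pair[1]:<12} {p:.3f}")
```

Comparing such expected values with observed counts is the kind of test the text describes: an observed frequency well below the random expectation would indicate that a pairing is disfavored.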
(A) The side chain of which of the 20 amino acids descends from the top left of the figure into its center?

(B) Draw the structures of all of the hydrogen bonds made by the donors and acceptors on this side chain. In your drawing include the σ lone pairs of the acceptors and the hydrogens of the donors as in the drawings in Table 6–5. Indicate clearly the chemical identity of each donor and acceptor in your drawing by including enough of its structure that there is no doubt as to what functional group it is and by labeling it.

(C) Is the side chain charged or neutral? How can you be sure?

(D) In a solution of protein, is there an excess of donors or acceptors for hydrogen bonds? On the basis of this consideration, why should all donors of hydrogen bonds find a partner? Do all of the donors on the amino acid side chain in the center of the figure find acceptors?

(E) Because the figure is for a region of electron density from the center of the molecule of protein, what is the most unexpected feature of the arrangement? What seems to permit this unexpected arrangement?

Hydrogen Bonds

Although they are less frequent than the hydrogen bonds between the amido nitrogen–hydrogens and the acyl oxygens of the backbone producing the secondary structure of a protein, hydrogen bonds between the donors and acceptors on the side chains of the amino acids are common features of crystallographic molecular models. The stereochemistry of such hydrogen bonds is as expected.354 The various acceptors to the nitrogen–hydrogen bonds that are donors on the side chains of glutamine, asparagine, arginine, histidine, and tryptophan are located preferentially at positions to which the sp² nitrogen–hydrogen bonds of the donors are pointed. The donors to the oxygens that are acceptors on glutamate, aspartate, asparagine, and glutamine show some preference for the positions at 120° to the carbon–oxygen double bond to which the sp² lone pairs of electrons on the oxygens are pointed, but there is much more flexibility to their locations as they pivot around these lone pairs (Figure 5–10D). The distribution of hydrogen-bond donors and acceptors around the hydroxyl oxygens of serines and threonines is even more flexible, but there are noticeable preferences for the two positions at dihedral angles χ2 of 80° and 280°. The donors and acceptors to the phenolic oxygen of tyrosine, however, have a strong preference to be in the plane of the ring at angles of 120° to the carbon–oxygen bond, as expected from the sp² hybridization of the oxygen. The three nitrogen–hydrogen bonds on lysine are almost always occupied by three respective acceptors arranged around the ammonium ion at angles near the 109° expected from its sp³ hybridization15,355,356 but not located at any preferred dihedral angles χ4.354

It is fairly common (17% of the tryptophans, 9% of the tyrosines, 6% of the phenylalanines, and 1% of the histidines in crystallographic molecular models from data sets with minimum Bragg spacings less than 0.17 nm) for a nitrogen–hydrogen, an oxygen–hydrogen, or a sulfur–hydrogen bond to be directed towards the π cloud of an aromatic side chain with its hydrogen close enough (< 0.3 nm) to conclude that a hydrogen bond has been formed.357 Most frequently, these hydrogen bonds are between the π clouds of tyrosine and tryptophan acting as acceptors and the ammonium nitrogen–hydrogens of lysines.358,359 When the nitrogen–hydrogen bonds are themselves attached to a π system, however, as with glutamine, asparagine, arginine, and histidine, their π cloud is often stacked on the π cloud of the aromatic side chain. In such instances, the nitrogen–hydrogen bonds point away from the aromatic ring,359,360 and none can form a hydrogen bond with it. There are exceptions, however, such as Glutamine 96 in human HLA class I histocompatibility antigen A-2 (Figure 6–22) and Glutamine 399 in ribulose-bisphosphate carboxylase (Figure 6–44B).

As is the case with the backbone of the polypeptide, when the donors of hydrogen bonds on the side chains of the amino acids are removed from the water and stripped of their hydrogen bonds with the solvent, there would be a considerable increase of enthalpy if they did not find new partners in the interior of the protein. Most if not all of them do.

One of the remarkable features of the buried hydrogen bonds that result from this energetic imperative is that they tend to be clustered. For example, of the 54 side chains in myoglobin that form hydrogen bonds with atoms in the protein other than bound water, 16 participate in eight closed pairs and nine participate in three closed triplets, but 29 participate in larger clusters.30 These clusters often incorporate buried water. Examples of such clusters occur in deoxyribonuclease I (Figure 6–44A)8 and in ribulose-bisphosphate carboxylase (Figure 6–44B).361 In these clusters, charged amino acids participate as donors and acceptors of hydrogen bonds as readily as uncharged amino acids and there is no obvious balancing between positive and negative charges (Figure 6–44A).

Clusters of hydrogen bonds serve to orient functionally important amino acids. For example, a “complex network of hydrogen bonds” serves to orient the six histidines responsible for chelating the copper and the zinc in superoxide dismutase.341 Histidine 57 in the active site of chymotrypsin is oriented by a hydrogen bond to Aspartate 102, which in turn is oriented by three other hydrogen bonds, one to each of its three remaining acceptors.17 Histidine 31 in deoxyribonuclease I is functionally important and is held in position by the cluster in which it participates (Figure 6–44A), as is Histidine 325 in ribulose-bisphosphate carboxylase (Figure 6–44B). The hydrogen bond in the case of deoxyribonuclease I forces the dihedral angle χ2 of Histidine 31 to assume an unfavorable value when it is positioned properly. Carboxylic acids, histidines, and arginines are most susceptible to such pinning because they have donors and acceptors at two or more separate locations on their side chains, and they are rigid structures because of their π molecular orbital systems. These features make them easily immobilized.

Just as an accounting of the concentrations of all of
insignificant.

Figure 6–44: Examples of clusters of hydrogen bonds among side chains. (A) A large cluster of partially buried hydrogen bonds in the active site of the crystallographic molecular model (Bragg spacing ≥ 0.20 nm) of bovine deoxyribonuclease I.8 Segments of the folded polypeptide from Phenylalanine 6 to Arginine 9, Isoleucine 59 to Arginine 63, Tyrosine 76 to Tyrosine 80, Serine 110 to Glutamate 112, Alanine 132 to Serine 135, Methionine 166 to Phenylalanine 169, Alanine 232 to Arginine 235, and Serine 250 to Proline 254 are drawn. Only those side chains involved in the hydrogen-bonding are drawn. Locations of molecules of water are drawn as unbonded oxygen atoms (open circles). This drawing was produced with MolScript.573 (B) Completely buried string of hydrogen bonds in the center of the β barrel of parallel consecutive strands in the crystallographic molecular model (Bragg spacing ≥ 0.16 nm) of ribulose-bisphosphate carboxylase.361,572 The β barrel has eight β strands and each inserts two amino acids, i and i + 2, into the core. The drawing presents the eight consecutively arranged strands of the β barrel from Leucine 167 to Threonine 171, Phenylalanine 197 to N-(Carboxy)lysine 199, Glycine 235 to Tyrosine 237, Isoleucine 262 to Aspartate 266, Leucine 288 to Histidine 292, Histidine 323 to Histidine 325, Leucine 373 to Alanine 376, and Serine 396 to Glutamine 399, as well as a segment from Valine 155 to Leucine 160 capping the barrel. Only hydrogen bonds between side chains in the core of the β barrel are drawn. These are a string from Glutamate 156 through Histidine 323 and Histidine 290 to Histidine 325 and hydrogen bonds between the two amido nitrogen–hydrogens on Glutamine 399 and the phenolic oxygen of Tyrosine 237 and the π system of Histidine 290. Consequently, all six of the hydrogen bonds together form a linked cluster. This drawing was produced with MolScript.573
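
The grouping of hydrogen-bonded side chains into closed pairs, closed triplets, and larger clusters can be sketched as a search for connected components in a graph whose nodes are side chains (or waters) and whose edges are hydrogen bonds. The following Python sketch is only an illustration of that idea under this assumption; the function name and the residue labels are hypothetical and are not taken from any of the crystallographic molecular models discussed here.

```python
from collections import defaultdict


def hydrogen_bond_clusters(bonds):
    """Group residues into clusters linked by hydrogen bonds.

    bonds is an iterable of (donor, acceptor) residue labels.
    Returns a list of sets, largest cluster first.
    """
    neighbors = defaultdict(set)
    for donor, acceptor in bonds:
        neighbors[donor].add(acceptor)
        neighbors[acceptor].add(donor)

    seen = set()
    clusters = []
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first walk over everything reachable from this residue.
        stack, members = [start], set()
        while stack:
            residue = stack.pop()
            if residue in members:
                continue
            members.add(residue)
            stack.extend(neighbors[residue] - members)
        seen |= members
        clusters.append(members)
    return sorted(clusters, key=len, reverse=True)


# Hypothetical hydrogen bonds: one pair, one triplet, one four-membered string.
example_bonds = [
    ("A1", "B2"),
    ("C3", "D4"), ("D4", "E5"),
    ("F6", "G7"), ("G7", "H8"), ("H8", "I9"),
]

clusters = hydrogen_bond_clusters(example_bonds)
cluster_sizes = [len(c) for c in clusters]
print(cluster_sizes)
```

A closed pair then corresponds to a component of two residues and a closed triplet to a component of three; anything larger counts as a cluster in the sense used for myoglobin above.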
almost all of the acceptors within a molecule of protein have acid dissociation constants associated with their σ lone pairs of electrons that are not appreciably different from that of the lone pairs of electrons on water (pKa = –1.7). Consequently, the enthalpy of formation of that hydrogen bond will be close to 0 (Equation 5–49). Because both the competition of the water for this donor (Equation 5–45) and the entropies of approximation involved in forming the regular structures of the polypeptide backbone (Figure 5–19) or a hydrogen bond between two side chains are entropic terms, they can be combined into the larger question of the change in standard entropy accompanying the folding of the polypeptide.

If upon folding, however, a donor for a hydrogen bond, such as the nitrogen–hydrogen of an amide or the oxygen–hydrogen of a hydroxyl group, finds itself sequestered within the structure without an acceptor, the number of hydrogen bonds in the solution decreases by one. This unsatisfactory sequestration would produce a change in standard enthalpy* of +15 to +20 kJ mol⁻¹ (Table 5–2) and would consequently squander a considerable portion of the net free energy available for folding. Consequently, such a loss must be avoided, and it is likely that every nitrogen–hydrogen bond and oxygen–hydrogen bond of a folded polypeptide participates as a donor in a hydrogen bond, either with water, with an acyl oxygen of a peptide bond, or with a lone pair of electrons on a side chain. It comes as no surprise that there are few6,362 if any113,363 unoccupied donors of hydrogen bonds in crystallographic molecular models.

* This change in standard enthalpy is not to be confused with the dissociation of a hydrogen bond in the reaction described in Equation 5–22. In this situation, in which the dissociated donor and acceptor are not sequestered from the solvent, equiergonic hydrogen bonds are formed between the dissociated donor and the dissociated acceptor with surrounding molecules of water, there is no net decrease in the concentration of hydrogen bonds in the solution, and the change in standard enthalpy is 0.

If, upon folding, an acceptor such as a lone pair of electrons on the acyl oxygen of an amide or on the oxygen of a phenol or alcohol finds itself without a donor, there is not much of a penalty. For example, if as many as half of the excess of acceptors over donors in the polypeptide were to become sequestered unoccupied, the increase in the free energy of formation of the hydrogen bonds in the solution would be less than –RT ln 0.5 or 1.7 kJ (mol of folded polypeptide)⁻¹. It comes as no surprise that there are quite a few unoccupied acceptors of hydrogen bonds in crystallographic molecular models. The most obvious examples of unoccupied acceptors are the second lone pairs of electrons on the acyl oxygens in a β sheet buried in the center of a molecule of protein (Figure 6–9). The fact that only a fraction of the acyl oxygens on either the backbone or on the side chains end up with two donors (Figure 6–7) is inconsequential because many acceptors were vacant before folding occurred anyway. All of these considerations should be kept in mind when the standard free energy of formation for a hydrogen bond is being assessed experimentally, because it is often the case that these differences in importance between donor and acceptor affect the results of the experiment.

The necessity that the donor of a hydrogen bond retain an acceptor is particularly relevant when the indole of tryptophan is considered. The side chain of tryptophan is remarkably soluble in ethanol,175 which has twice as many acceptors of hydrogen bonds as donors. Likewise, during partition between water and 1-octanol, the side chain of tryptophan has the greatest preference for 1-octanol of all the amino acids (Figure 5–24). As the indole contains only a donor, a net of one hydrogen bond is created every time it is dissolved in ethanol. When it is transferred from water to 1-octanol, a net of one hydrogen bond is also created because a solution of indole in water has more donors than acceptors and 1-octanol has more acceptors than donors. When indole is transferred, empty donors disappear from water and empty acceptors disappear in the alcohol. Consequently, the side chain of tryptophan is significantly more hydrophilic364 than is indicated by its solubility in ethanol or its transfer between water and 1-octanol, two proposed measurements of its hydrophobicity.174,175 Because a solution of protein has, as does ethanol or 1-octanol, more acceptors than donors, similar imbalances of donors and acceptors have a major effect on the distribution of amino acids between the surface and the interior of a molecule of protein or the coupling of the donors and acceptors of hydrogen bonds withdrawn from water during the process of folding the polypeptide.

One example of the strong tendency of tryptophan to retain its hydrogen bond with water occurs in the structure of the Bence-Jones protein Rhe. A tryptophan in the center of the crystallographic molecular model of this protein, though completely buried, is engaged in a hydrogen bond with a buried molecule of water sitting next to its indole nitrogen (Figure 6–39).280 This molecule of water is trapped in the interior during the folding of the polypeptide, and its two donors are hydrogen-bonded in turn to two acyl oxygens from the backbone. In γ-II crystallin, two of the tryptophans are also hydrogen-bonded to buried molecules of water.168 Usually, however, tryptophan retains the hydrogen bond to the nitrogen–hydrogen bond of its indole in less dramatic ways. For example, all of the donors in the indoles of the tryptophans of chymotrypsin retain hydrogen bonds with the solvent or another acceptor in the interior.17 The other two tryptophans in γ-II crystallin form hydrogen bonds with acyl oxygens.168 In deoxyribonuclease I, all of the tryptophans, though mostly buried, retain contact with the solvent at their nitrogen–hydrogen bonds.8 The indole nitrogen–hydrogen bond of Tryptophan 21 in the lipoyl domain of dihydrolipoyllysine-residue acetyltransferase from B. stearothermophilus, which does not fully exchange with ²H₂O in the solvent over 3 years, is well buried in the core of the protein but hydrogen-bonded to the acyl oxygen of Proline 61.365

There are also other anecdotal instances in which the requirement that donors must be occupied seems to be expressed. Arginine is one of the best examples. When it is partially buried, all of the five hydrogen-bond donors on the guanidinium are provided with acceptors (Figure 6–41).366 In the binding site on trypsin with which both arginine and lysine associate normally, there is a constellation of acceptors that can occupy in turn the five donors on the former and the three donors on the latter even though the dispositions of those donors do not overlap. Consequently, there are empty acceptors in each complex but never empty donors.367 When Tyrosine 385 in 4-hydroxybenzoate 3-monooxygenase is mutated to a phenylalanine, it creates an empty acceptor on Tyrosine 201 and nothing happens, but when Tyrosine 201 is mutated to phenylalanine, it creates an empty donor on Tyrosine 385 and a molecule of water is found in the crystallographic molecular model sitting where the hydroxyl of Tyrosine 201 used to be and occupying that donor.368 When an unoccupied hydrogen-bond donor in the complex between a peptide and penicillopepsin is replaced with a methylene group, the inhibitor binds 400 times more tightly.369 “The pH dependence of chromate binding and the extremely low affinity of phosphate are attributable mainly to the lack of hydrogen bond acceptors in the binding site” of sulfate-binding protein from Salmonella typhimurium.370

The difference in the importance of a donor and that of an acceptor affects the magnitude of the free energy of formation of a hydrogen bond in a protein, but so does its location in the structure. One of the unexpected observations resulting from an examination of crystallographic molecular models is the high frequency with which hydrogen bonds between donors and acceptors, each from the protein itself, occur on the surface of the folded polypeptide.168 Because of the strong hydration of ions or the high relative permittivity of liquid water or both of these factors, ionized hydrogen bonds between monovalent anions such as formates or acetates and monovalent cations such as alkyl ammoniums, imidazoliums, or guanidiniums have negligible standard free energies of formation in aqueous solution.371 Consequently, ionized hydrogen bonds on the surface of a molecule of protein should be unstable, but so should neutral hydrogen bonds because the competition for the donors and acceptors by the water surrounding them should prevent them from forming.

Many of the hydrogen bonds found on the surface of a crystallographic molecular model are artifacts of the constraints applied during refinement. If potential energies that favor rather than disfavor the formation of an ion pair or a hydrogen bond are incorporated advertently or inadvertently into the procedure for refinement, ionized and un-ionized hydrogen bonds will form during the refinement rather than in the actual protein. Such fantastical hydrogen bonds on the surface of a crystallographic molecular model tend to appear and disappear as further refinement is performed and as the data set is extended to narrower Bragg spacing. For example, a set of 12 hydrogen bonds on the surface of myoglobin between pairs of amino acid side chains in which both of the partners have been conserved by natural selection throughout all myoglobin sequences had been identified in a refined molecular model of the protein.30 When the Bragg spacing of the data set was decreased and the refinement significantly improved,372 seven of these hydrogen bonds, four of which had been between oppositely charged side chains, were no longer present in the crystallographic molecular model.* All assignments of hydrogen bonds between two amino acid side chains on the surface of a protein should be regarded with skepticism unless properly calculated omit maps clearly indicate their existence.†

Nevertheless, hydrogen bonds, both ionized and un-ionized, probably do exist on the surface of a protein. When they do, there are probably particular reasons for their existence. Steric effects of neighboring amino acids and the backbone of the polypeptide can bring a donor and acceptor together in an orientation such that entropy of approximation sufficient to overcome solvation of ions and competition by water is realized. It is also possible that these hydrogen bonds are simply the random result of the participation of all of the donors and acceptors on the surface of the molecule of protein in the hydrogen-bonded network of the water surrounding it (Figure 6–38). In this case, these hydrogen bonds would be only the fortuitous outcome of the fact that the positions of these donors and acceptors in the larger lattice happen to be adjacent to each other. This hydrogen-bonded network of waters and donors and acceptors from the protein itself should be a rather fluid structure. The crystallographic molecular model represents only the structure of lowest energy in a constantly fluctuating environment. One observation, however, suggesting that some of these hydrogen bonds on the surface of the crystallographic molecular model are real is that they have negative standard free energies of formation.

The standard free energies of formation for hydrogen bonds seen in the crystallographic molecular models of proteins have been estimated by site-directed mutation. It is not possible to make an accurate estimate of the standard free energy of formation of such a hydrogen bond by mutating only one member of the pair.373 A single mutation will always have steric, hydrophobic, and electrostatic effects associated with it that are unrelated to the

* C. Chothia and A.M. Lesk, personal communication.
† This is yet another instance in which omit maps must be used to position atoms correctly and eliminate the artifacts inherent in the constraints applied during refinement by simulated annealing.
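
Two of the figures quoted in this section, the penalty of –RT ln 0.5 for sequestering half of the excess acceptors and the 400-fold tighter binding of the penicillopepsin inhibitor, are both instances of converting a ratio into a free energy. The Python sketch below shows that conversion. The temperature of 298.15 K is an assumption (the text does not state one), and the function name is chosen here only for illustration.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ mol^-1 K^-1


def free_energy_from_ratio(ratio, temperature=298.15):
    """Return -RT ln(ratio) in kJ mol^-1.

    ratio may be a fraction of sites (as in -RT ln 0.5) or a ratio of
    association constants; temperature is in kelvin (assumed 298.15 K).
    """
    return -GAS_CONSTANT * temperature * math.log(ratio)


# Half of the excess acceptors left unoccupied: a small positive penalty.
penalty_half_unoccupied = free_energy_from_ratio(0.5)

# A 400-fold increase in association constant: a negative change.
change_400_fold = free_energy_from_ratio(400.0)

print(f"-RT ln 0.5 = {penalty_half_unoccupied:+.1f} kJ/mol")
print(f"-RT ln 400 = {change_400_fold:+.1f} kJ/mol")
```

At the assumed temperature the first value reproduces the 1.7 kJ mol⁻¹ quoted above, and the second puts the 400-fold change in affinity at roughly –15 kJ mol⁻¹, which is of the same order as the +15 to +20 kJ mol⁻¹ enthalpic cost cited for a sequestered donor without an acceptor.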
mation of a particular hydrogen bond in a protein obtained by a double-mutant cycle is not completely free of contributions from interactions with neighboring amino acids. When all of the amino acids in a cluster of hydrogen bonds surrounding the hydrogen bond between Arginine 218 (TEM) and Aspartate 49 (BLIP) in the complex between β-lactamase TEM-1 from E. coli and its inhibitor protein BLIP were mutated to alanine, the standard free energy of formation of that hydrogen bond increased from –9 to +1 kJ mol⁻¹.381 Similar increases of 4–6 kJ mol⁻¹ were observed when the amino acids surrounding three other hydrogen bonds in the same cluster were mutated to alanine. Consequently, the standard free energies of formation listed in Table 6–6 may be only lower limits of the value for that hydrogen bond in the absence of assistance from its surroundings.

Approximation is probably the greatest contributor to the stability of a buried hydrogen bond between two side chains. Following formation of the secondary structure and the alignment of secondary structures by packing, the donor and acceptor of a buried hydrogen bond should be efficiently aligned and a considerable amount of entropy of approximation should have been realized, yet the free energies of formation of such buried hydrogen bonds are less than –30 kJ mol⁻¹ in the most advantageous circumstances (Table 6–6). The wide variability in the standard free energies of formation could reflect wide differences in the success with which donor and acceptor are aligned given all of the steric problems of the interior of a protein.

There are other experimental observations suggesting that approximation is not so successful as it should be. In a series of tight complexes (dissociation constants less than 750 nM) between thermolysin and a set of ligands that bind to its active site, when the respective nitrogen–hydrogens of the phosphonamidates in the ligands, which each form a hydrogen bond with the acyl oxygen of Alanine 113 in the crystallographic molecular model of the complex,382 were replaced with methylenes, the association constants for the ligands remained the same.383 When corrected for the removal of the two hydrogen–carbon bonds of the methylene from water, the standard free energy of formation for this buried, rigidly aligned hydrogen bond is –6 kJ mol⁻¹, well within the range of those for buried hydrogen bonds in Table 6–6 but not anywhere near the value predicted from the entropy of approximation that must be realized. Even more surprising is that when the amido nitrogen–hydrogen of a hydrogen bond in the middle of an α helix of T4 lysozyme was replaced with an ester oxygen, the free energy of folding of the protein increased384 by 7 kJ mol⁻¹, but that increase was indistinguishable from the increase expected for the enthalpy of formation of the hydrogen bond between the acyl oxygen of the ester, relative to the acyl oxygen of the unmutated amide, and the nitrogen–hydrogen with which it forms a hydrogen bond.385 In other words, these latter experiments suggest that the free energy of formation of a hydrogen bond in the middle of an α helix in a protein is indistinguishable from 0, even though considerable entropy of approximation is realized. All of these results emphasize that it is difficult to form a hydrogen bond in an aqueous solution.

Even though there are hydrogen bonds in a molecule of protein that do have negative standard free energies of formation, it is the structure of the protein that approximates the donor and the acceptor, causing their hydrogen bond to become stable. It is this approximation that overcomes, in many cases meagerly, the otherwise overwhelming competition of the water for the donors and acceptors. The folding of the protein that approximates the donor and acceptor in such a hydrogen bond is driven entirely by the hydrophobic effect. It is only after the hydrophobic effect has collapsed the random coil, withdrawn the donors and acceptors from water, and excluded water from the interior that the hydrogen bonds of the α helices and β structure are able to form. It is only when the hydrophobic effect, expressed as the minimization of the internal volume of the protein, has locked the secondary structures into the tertiary structure, that donors and acceptors of hydrogen bonds between side chains are brought close enough together and are constrained sufficiently that they can form otherwise unfavorable hydrogen bonds. It is only after all of this prelude that the observed hydrogen bond has a lower standard free energy of formation than the hydrated, separated donor and acceptor had in the unfolded polypeptide.

It is the case that such favorable free energy of formation adds to the stability of the protein, but this is an illusory contribution. The amino acid sequence of the protein and hence the location and identity of each side chain in its structure is the result of evolution by natural selection. The hydrogen-bonded pair of side chains that currently occupies a particular location in the structure could have been chosen because it was the constellation of atoms that sterically filled that particular location in the structure most effectively relative to all of the other possibilities that were tried, not because it is a hydrogen bond. It has a favorable free energy of formation because the two side chains that were chosen for these other reasons happened to end up with a donor and an acceptor adjacent to each other. The hydrogen-bonded pair is not necessarily the most energetically favorable pair of side chains that could have occupied that position. In fact, even though it was not so astute a process as evolution by natural selection that determined the choice of the replacements, it is sometimes the case that the double mutant in a double-mutant cycle or even one of the single mutants is as stable as the wild type containing the hydrogen bond.

The relationship between the strength of a hydrogen bond and the difference in pKa between donor and acceptor (Figure 5–14) has been verified in the context of a molecule of protein. As the pKa of Tyrosine 27 in micrococcal nuclease from Staphylococcus aureus, which forms a hydrogen bond with Glutamate 10, was lowered by substituting various fluorinated tyrosines, the free energy of folding of the protein, presumably reflecting the decreases in the free energy of formation of the hydrogen bond, decreased386 by 2.0 kJ mol⁻¹ (unit of pKa)⁻¹. This value for the Brønsted coefficient is near that observed (Equation 5–37) for a hydrogen bond in CCl₄ [1.3 kJ mol⁻¹ (unit of pKa)⁻¹]. This increase in strength, as the increase

where [AL9B] is the total concentration of hydrogen bonds, both deuterated and protonated; and i0,AHB is the intensity of the absorption in H₂O. The normalized intensity of the absorption of the proton in the hydrogen bond as a function of xH₂O is fit by nonlinear least squares to Equation 6–11 to obtain f.

In this way, the fractionation factors for the protons within the hydrogen bonds of the secondary structure of a protein can be measured. There are results suggesting
in the acidity of the phenolic side chain matches its pKa that a significant portion of these protons have fraction-
more closely with that of the glutamate, suggests that, as ation factors less than 1. For example, 13 of the 87 amino
in other situations, the hydrogen bond between an acid acids in the phosphocarrier protein HPr from Bacillus
and its conjugate base should be a strong one. subtilis394 and 36 of the 231 amino acids in micrococcal
In crystallographic molecular models there are nuclease from S. aureus393 have been reported to have
examples of hydrogen bonds between an acid and its amido protons with fractionation factors less than 0.80,
conjugate base. In turkey troponin C, Glutamate 57 and six of the 76 amino acids in ubiquitin have fraction-
forms a geometrically ideal hydrogen bond with ation factors less than 0.90.396 There is some uncertainty
Glutamate 88 in which it is unknown on which carboxy- to these measurements because it is quite difficult to
late the proton resides.24 There is no experimental indi- equilibrate all of the protons in the hydrogen bonds of
cation, however, that such hydrogen bonds are the secondary structure of a protein with deuterons in
unusually stable. The histidinium ion in the hydrogen the solution,340 and an unequilibrated position would
bond between Histidine 24 and Histidine 119 in sperm appear artifactually to have a low fractionation factor
whale myoglobin has a pKa of 6.0.387 If this were a partic- (Equation 5–31). In more recent studies of the fractiona-
ularly stable interaction, the pKa of the acid dissociation tion factors of protons in streptococcal protein G397 and
that eliminates it should have been much higher (Table the SH3 domain of proto-oncogene protein-tyrosine
2–2). The hydrogen bond388 between Lysine 206 and kinase from Gallus gallus,396 none of the protons in the
Lysine 296 of human transferrin, although necessarily nitrogen–hydrogens of the backbone had fractionation
lowering the values of pKa for the lysines participating in factors less than 0.9. Nevertheless, it is thought to be the
it,389 has been shown to destabilize the protein.390–392 case that some of the protons in the hydrogen bonds of
No evidence has been presented that hydrogen the secondary structure of many proteins have abnor-
bonds between acids and their conjugate bases in pro- mally low fractionation factors.396
teins display properties associated with low-barrier There also seems to be a correlation between the
hydrogen bonds, but hydrogen bonds displaying one fractionation factor of a proton and the length of the
such property, a low fractionation factor (Equation hydrogen bond that it occupies in a crystallographic
5–31), have been identified in proteins. The fractionation molecular model of a protein.397 In crystallographic
factor f for a proton in a hydrogen bond in a protein is molecular models built from data sets to Bragg spacing of
measured by following the fraction, ƒAHB, of the hydrogen less than 0.1 nm, the maps of electron density are accu-
bond of interest that remains undeuterated, AH9B, as a rate enough that the bond lengths of the hydrogen bonds
function of the mole fraction xH2O of undeuterated water are of sufficient reliability to identify those that are
in a series of mixtures of H2O and D2O abnormally short,398–400 and there are usually a few
abnormally short hydrogen bonds (0.26–0.28 nm) among
[ H2O ] [ L2O9HOL ] those between amido nitrogen–hydrogens and acyl oxy-
x H2O = @ gens of the backbone.398 It is thought that such shortened
[ H2O ] + [ D2O ] [ L2O9HOL ] + [ L2O9DOL ] hydrogen bonds are the ones that display low fractiona-
(6–10) tion factors and therefore are low-barrier hydrogen
bonds.397
where L again stands for either H or D. It is not possible, however, for these short low-bar-
A physical property that monitors the concentra- rier hydrogen bonds to be strong hydrogen bonds396
tion of the undeuterated hydrogen bond, such as the because the difference in pKa between the nitrogen–
intensity (iAHB) of the absorption of its proton in a nuclear hydrogen (pKa = 16) and the oxygen (pKa = –0.5) is so large
magnetic resonance spectrum is monitored.393,394 and any decrease in polarity would only widen the dif-
Equations 5–31 and 6–10 can be combined to give395 ference. Whenever a polymer as long and heterogeneous
as a molecule of protein is folded into a unique confor-
iAHB [ AH9B ] xH O mation, it is hard to believe, in spite of evolution by nat-
2
= ? f AHB @ ural selection, that all of the steric problems can be
i 0,AHB [ AL9B ] f ( 1 – x H2O ) + x H2O
solved. There must be some places in the structure that
(6–11) are tight fits. When such a tight fit occurs at a hydrogen
bond between an amido nitrogen–hydrogen and acyl oxygen of the backbone, the hydrogen bond shortens to relieve the strain, much as the hydrogen bond in hydrogen maleate monoanion shortens in response to the steric compression. This shortened hydrogen bond must be weaker than the unshortened bond because there is repulsion energy in the compressed case that would be relieved on relaxation to the normal distance. This shorter but weaker hydrogen bond has a smaller fractionation factor because this property is determined only by the degree of overlap of the wells of potential energy confining the proton on the donor and the acceptor. It is a low-barrier hydrogen bond not because the strength of the bond has brought donor and acceptor together but because the contraction of the distance is imposed by the rest of the framework. It has also been concluded from studies of complexes between proteins and small ligands that there is no correlation between the length of a hydrogen bond and its strength.401

Suggested Reading
Horovitz, A., Serrano, L., Avron, B., Bycroft, M., & Fersht, A. (1990) Strength and co-operativity of contributions of surface salt bridges to protein stability, J. Mol. Biol. 216, 1031–1044.

Problem 6–9: Aspartate 12 and Arginine 16 are located in an α helix on the surface of a mutant of the ribonuclease from B. amyloliquefaciens. The arginine and the aspartate do not form an ionized hydrogen bond in the crystallographic molecular model of the enzyme from the closely related species, Bacillus intermedius, even

(B) The uncertainty in your calculated values was estimated by the authors to be ±0.2 kJ mol⁻¹. Is the electrostatic interaction significantly different from zero at physiological ionic strength? Is this surprising?
(C) What conclusion would you have reached had only Arginine 16 been mutated?

Problem 6–10:
(A) Write out the amino acid sequence of the protein in the drawing below of a crystallographic molecular model. This drawing was produced with MolScript.573
(B) List the pairs of cysteines that participate in the cystines.
(C) What structural feature of a cystine is illustrated by the model?
(D) Identify the participants in a small hydrophobic cluster.
(E) List as a pair the donor and the acceptor of each of the hydrogen bonds in the model by the letter and number of its respective amino acid and by the respective designation defined in Figure 4–14 for the atom participating in the hydrogen bond.
(F) Which hydrogen bonds are probably artifacts of the procedure used to refine the molecular model?
presence of threonine.375

Problem 6–11: The two stereo drawings to the right represent portions of the crystallographic molecular models of two proteins.274,366 These drawings were produced with MolScript.573
Each of the hydrogen bonds in each stereo drawing is numbered in the figure. Draw the chemical structure of
each of your drawings with the number for the hydrogen
[Figure for Problem 6–11: two stereo drawings with numbered hydrogen bonds; residue labels include P161 and S147]
do not overlap entirely. Consequently they are a helical staircase but one with narrow treads.
There are two helical grooves, the major groove and the minor groove (facing the viewer in the upper half and lower half, respectively, of Figure 3–9), the former wider than the latter. It is in these grooves that the narrow treads of the stairs are found. Each pair of bases projects a characteristic pattern of donors and acceptors into each groove. The pair of adenine and thymine projects a methyl group, an acceptor, a donor, and an acceptor into the major groove and two acceptors into the minor groove; and the pair of guanine and cytosine projects an acceptor, an acceptor, and a donor into the major groove and an acceptor, a donor, and an acceptor into the minor groove. The order and orientation of these donors and acceptors differ between a guanine–cytosine pair (6–13) and a cytosine–guanine pair (6–14) and between an adenine–thymine pair (6–15) and a thymine–adenine pair (6–16):404

[Structures 6–13 to 6–16: the guanine–cytosine, cytosine–guanine, adenine–thymine, and thymine–adenine base pairs]

groove and those along the backbone, all provide keys for the recognition of the double-helical DNA by the protein.
There are two levels of recognition on which proteins operate in binding to DNA. Certain proteins are required by their function to recognize any segment of double-helical DNA regardless of its sequence. Examples of such proteins are histones that form chromatin from DNA, the RecA protein that catalyzes recombination, helicase and DNA-directed DNA polymerase that are components of the system replicating DNA, and DNA topoisomerase that passes one segment of DNA through another. These proteins recognize only the overall shape of a molecule of DNA and the acceptors along its phosphodiester backbone. Other proteins are required by their function to recognize specific sequences of double-stranded DNA and bind tightly to them. Examples of such proteins are repressors that shut off certain genes, transcription factors that initiate transcription at certain genes, and activators that increase the rates of transcription of certain genes. Many of these latter proteins are

then run along the double helix until they reach their targets, and proteins of this type must perform both levels of recognition. Such proteins demonstrate that the ability to recognize specific sequences is a special case of the ability to recognize DNA in general.
One property of the proteins that recognize DNA is
between DNA and the regulatory protein Cro from bacteriophage 434, many of the donors to the phosphoryl oxygens are amido nitrogen–hydrogens from the polypeptide backbone,409 while in that between DNA and the regulatory protein Cro from bacteriophage λ, a tyrosine, a threonine, an asparagine, a glutamine, and two amido nitrogen–hydrogens from the backbone provide donors to phosphoryl oxygens (Figure 6–45).410 In the complex between topoisomerase I and DNA, hydrogen-bond donors to the phosphoryl oxygens are provided by an asparagine and a histidine among others.411 And in three successive phosphodiesters in the complex between deoxyribonuclease I and DNA, an arginine, two histidines, an aspartic acid, an asparagine, a tyrosine, and a threonine provide the donors.412
Donors to phosphoryl oxygens are also provided by molecules of water that are incorporated into the complex and form bridges to donors and acceptors on the protein.413 In the complex between the repressor from bacteriophage λ and DNA, five amido nitrogen–hydrogens, two lysines, two tyrosines, five asparagines, two glutamines, and 11 waters together provide all of the donors for the 10 phosphodiesters in contact with the protein.414,415
Some proteins that recognize only DNA and not specific sequences within it use the regularity of its
6–15, and 6–16) are the most legible. The protein usually

bound to it and were subsequently incorporated into the complex.405 These waters then bridge donors and acceptors on the protein and donors and acceptors on the DNA. They are as much a key for recognition of the DNA by the protein as the bases themselves. One dramatic indication of this fact is that in crystallographic molecular models of

[Structure 6–17]

or at one of its η nitrogens and its ε nitrogen (Figure 6–47),418,423 Arginine also can span two bases, offering a donor to an acceptor on each. Lysine, with its three donors, is often found occupying a single acceptor on a base or spanning two bases.429,430 In fact, it is in providing donors to acceptors for the neutral bases, rather than for the negatively charged phosphoryl oxygens, that arginine and lysine seem to be most frequently employed.
Another common hydrogen bond is that between a glutamine or an asparagine and adenine (Figure 6–45)414,422,431

[Structure 6–18]

major groove.

into the major groove (Figure 3–9) and are used as keys
ture of the double helix.425 This strategy of recognizing preexisting, sequence-dependent peculiarities in the structure of the DNA, however, is difficult to separate experimentally from a strategy of recognizing sequence-dependent differences in the resistance of the DNA to distortion by the protein because only rarely405 is the preexisting structure of the segment of DNA found in the crystallographic molecular model of the complex known.
When proteins bind specifically to a segment of

that the problem of recognizing a specific sequence would simply require reading enough of the pattern of donors and acceptors in the major groove and minor groove and methyl groups from thymine in the major groove to make an unequivocal identification of the sequence. Although there are a few instances in which side chains on the protein are able to form hydrogen bonds to every donor and acceptor in the major groove (Figure 6–47),423 and usually many of these features are
recognized by the protein either directly or through
intervening molecules of water, it is often the case that
fewer are recognized than would be necessary to make a
positive identification.425 Consequently other strategies
must be used to make a positive identification.
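The notion of reading a sequence from the pattern of donors and acceptors can be made concrete with a small sketch. It encodes the patterns listed in the text for the pair of adenine and thymine and the pair of guanine and cytosine, and it assumes, as a deliberate simplification, that exchanging the two bases of a pair simply reverses the order in which the pattern is read across the groove. The names LISTED, flipped, and distinct are hypothetical, and the model ignores the actual positions of the atoms, so two patterns that look identical here are not necessarily indistinguishable to a protein.

```python
# Patterns as listed in the text, read across the base pair.
# "M" = methyl group, "A" = acceptor, "D" = donor of a hydrogen bond.
LISTED = {
    # pair: (major groove, minor groove)
    "adenine/thymine": (("M", "A", "D", "A"), ("A", "A")),
    "guanine/cytosine": (("A", "A", "D"), ("A", "D", "A")),
}


def flipped(pattern):
    """Assumed model: exchanging the two bases reverses the reading order."""
    return tuple(reversed(pattern))


def signatures():
    """Return the groove patterns for both orientations of each pair."""
    result = {}
    for pair, (major, minor) in LISTED.items():
        result[pair + " (as listed)"] = {"major": major, "minor": minor}
        result[pair + " (exchanged)"] = {
            "major": flipped(major),
            "minor": flipped(minor),
        }
    return result


def distinct(groove):
    """Count how many different patterns the four orientations present."""
    return len({entry[groove] for entry in signatures().values()})


if __name__ == "__main__":
    for groove in ("major", "minor"):
        print(groove, "groove:", distinct(groove), "distinct patterns")
```

In this simplified encoding the four orientations present four different patterns in the major groove but only two in the minor groove, which is one way to see why the major groove carries most of the legible information.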
The most obvious of these is the use of packing to recognize shape, just as in the center of a molecule of protein the dense packing of the side chains of the amino acids is used to position the secondary structures. For the crystallographic molecular models of complexes between a protein and its complementary DNA, calculations of atomic volumes “performed in the presence and absence of water molecules, showed that protein atoms buried at the interface with DNA are on average as closely packed as in the protein interior. Water molecules contribute to the close packing, thereby mediating shape complementarity.”406 This close packing means that the shape of the surface of the protein fits tightly into the shape of the surface of the DNA and its water, particularly in the major groove.405 As the shape of the surface of the DNA and water in the major groove represents its sequence, it is the shape of the surface of the protein as much as anything else that reads the sequence of the DNA.

[Figure: stereo drawing; labels K71, T82, G77 and bases A –7, A5, C –6, G6, T7, T –5]

works of hydrogen bonds (Figure 6–47), but the hydrophobic hydrogen–carbon bonds of the protein also
round a double-helical molecule of DNA to perform their functions, and they recognize the double helix in part by its fit to the hole. In the empty state, the ring of protein around the hole is continuous but always contains at least one interface through which the polypeptide does not pass. It is at such an interface that the ring of protein splits apart to allow the DNA to enter the hole and then closes back around it.467,468
There are also proteins that bind to single-stranded DNA. Unlike double-helical DNA, the structure of single-stranded DNA is undefined, but when it is bound by one

a particular distortion of a particular sequence is significantly less positive than the standard free energies for the same distortion of other sequences, then when a complex is formed that requires this distortion, the free energy of formation of the complex will be more negative when the easily distorted sequence is bound.
There are experimental observations indicating that a base pair between adenine and thymine is more flexible than one between guanine and cytosine and that this susceptibility to distortion can be used to recognize this base pair.459 In the complex between the repressor
protein CI of bacteriophage 434 and its complementary
DNA (Figure 6–50), the central six pairs of bases are not
part of the sequences on either side that are indispensa-
ble for the recognition, but their sequence also deter-
mines the magnitude of the dissociation constant
between protein and DNA.449 When they are
adenine–thymine pairs rather than guanine–cytosine
pairs, the free energy of formation of the complex is more
negative. In the crystallographic molecular model of the
complex, this region of the DNA is significantly distorted
by the protein in a manner that seems as though it should
be more readily tolerated by adenine–thymine pairs than
it would be by guanine–cytosine pairs.449,454 The DNA
mismatch repair protein MutS from E. coli seems to take
advantage of the instability of double-helical DNA at a
position where the bases are mismatched to introduce a
kink at such a location.460,461 The uncomplexed segment
of DNA recognized by the trp repressor of E. coli is
already distorted in the direction in which it will be dis-
torted by the complex but is further distorted when the
complex forms. It is thought that the partial distortion of
the uncomplexed DNA demonstrates the susceptibility
of this sequence to the ultimate distortion to which it will
be submitted.405 It is also thought that the decrease in
free energy of formation observed when the N6 anilino
group of an adenine in the segment of DNA recognized
by EcoRI site-specific deoxyribonuclease is deleted
results from an increase in the ease with which this seg-
ment can be distorted by the protein.462
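The thermodynamic argument in this passage can be illustrated with a short numerical sketch. It assumes that the standard free energy of formation of a complex can be split into an intrinsic term and a term for distorting the DNA, and it converts the result into a dissociation constant through K_d = exp(ΔG°/RT) with a standard state of 1 M. The function names and every numerical value below are hypothetical and chosen only for illustration; none is taken from the studies cited.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def dissociation_constant(dg_formation, temperature=298.15):
    """Dissociation constant (M) from a standard free energy of formation (kJ/mol)."""
    return math.exp(dg_formation / (R * temperature))


def kd_ratio(ddg, temperature=298.15):
    """Ratio of two dissociation constants whose free energies differ by ddg (kJ/mol)."""
    return math.exp(ddg / (R * temperature))


# Hypothetical values (kJ/mol): the same intrinsic binding term for both
# sequences, but a smaller cost for distorting the more flexible one.
DG_INTRINSIC = -60.0
DG_DISTORT_FLEXIBLE = 10.0  # assumed, e.g. adenine-thymine pairs
DG_DISTORT_STIFF = 15.7     # assumed, e.g. guanine-cytosine pairs

kd_flexible = dissociation_constant(DG_INTRINSIC + DG_DISTORT_FLEXIBLE)
kd_stiff = dissociation_constant(DG_INTRINSIC + DG_DISTORT_STIFF)

if __name__ == "__main__":
    print("K_d, flexible sequence:", kd_flexible)
    print("K_d, stiff sequence:   ", kd_stiff)
    print("ratio:", kd_flexible / kd_stiff)
```

With these assumed numbers, a difference of 5.7 kJ mol⁻¹ in the free energy of distortion corresponds to roughly a tenfold difference in dissociation constant at 25 °C, so a modest difference in the ease of distortion is enough to contribute noticeably to specificity.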
Just as the DNA is often distorted upon forming a
Figure 6–50: Gradual bend produced in DNA as it [...] complete crystallographic molecular model (Bragg [...] DNA-binding portion (Serine 1 to Arginine 69) of the [...]

Figure 6–51: Stacking of bases in single-stranded DNA between each other and against aromatic amino acids in the protein.469 The crystallographic molecular model (Bragg spacing ≥ 0.24 nm) is that of a complex between the DNA-binding portion (Lysine 183 to Glutamine 420) of the DNA-binding subunit (616 aa) of human replication protein A and octadeoxycytidine (drawn with thicker lines). The entire molecule of single-stranded DNA in the complex is displayed as well as the side chains of the four aromatic amino acids (drawn with narrower line segments) participating in the stacks, numbered by their positions in the sequence of the protein. This drawing was produced with MolScript.573
DNA and proteins that recognize DNA nonspecifically.
The hydrogen bonds to the donors and acceptors of the
individual bases in the particular complex that was crys-
tallized, however, are thought to arise only from the
requirement that these must be occupied somehow to
avoid losing hydrogen bonds from the solution upon
association of the DNA with the protein. There are suffi-
cient acceptors and donors of sufficient flexibility on the
protein in these locations to satisfy any sequence of
bases, as replication protein A is required to do.
The novel feature of this complex is the interaction between π systems of amino acid side chains on the protein and the π systems of the bases that are no longer enclosed within the core of a double helix. Phenylalanines 238 and 269 sandwich one stacked pair of cytosines, and Tryptophan 361 and Phenylalanine 386 sandwich another (Figure 6–51).469 Both Phenylalanine 238 and Tryptophan
double-helical hairpin and its loop is one of the basic structures formed by RNA.
Double-helical hairpins of RNA can be as long as 50 or more pairs of bases, but they are usually interrupted one or more times with bulges at which there is a mismatch of the bases on the two strands that face each other. The mismatch causes an interruption in the double helix. A bulge can be as small as one or two extra unmatched bases that protrude out of the double helix on one of the strands while the strand on the other side contains no mismatched base. Uracil 59 and Cytosine 60, found between the 12th (Guanine 53 and Cytosine 61) and the 13th (5-Methyluracil 54 and 1-Methyladenine 58) pairs of bases of the horizontal double-helical hairpin in Figure 6–52, form such a small bulge immediately before the loop of three bases (Pseudouracil 55 to Guanine 57) following the 13th base pair. Bulges can also occur across from each other on both strands of double-helical RNA. The number of bases on one strand of such a bulge can be the same as or different from the number of bases on the other strand, and the two strands are usually independent of each other until they rejoin in the double helix at the other end of the bulge. An inconsequential bulge of one mismatched pair of bases occurs at Guanine 4 and Uracil 69 in the horizontal double-helical hairpin in Figure 6–52.
The most interesting bulges, however, are the larger ones. For example, the entire lower portion (Uracil 8 to Cytosine 48) of the transfer RNA in Figure 6–52 is a bulge out of the horizontal double-helical hairpin. It protrudes between the seventh pair (Uracil 7 and Adenine 66) and the eighth pair (5-Methylcytosine 49 and Guanine 65) of bases while the opposite strand of the horizontal double

[Figure 6–52: drawing of a transfer RNA; nucleotide position labels 1, 7, 11, 19, 26, 31, 38, 43, 53, 60, 64, 70]

Unlike DNA, which is rarely unassociated with protein, there are species of RNA such as transfer RNA and messenger RNA that spend at least a part of their lives free in solution. Unlike DNA, in which the proteins with which it is associated change dramatically as it is transferred from storage, to transcription, to replication, and to recombination, complexes between RNA and protein, such as ribosomes and the small nuclear ribonucleoprotein particles that form spliceosomes, often have fixed structures that remain essentially unchanged during their lifetimes. Such ribonucleoproteins are biologically distinguished from proteins only by the fact that they almost always operate on other molecules of RNA.
Many of the atomic details of the association
between proteins and RNA are indistinguishable from those for the association of proteins and DNA. There are hydrogen bonds formed to the acceptors on the phosphoryl oxygens and the donors and acceptors on the bases, and molecules of water are participants in these networks of hydrogen bonds.481–485 The main difference is that many of the bases are not in pairs, and in those bases all of their donors and acceptors of hydrogen bonds are available for recognition by acceptors and donors on both the side chains and the polypeptide backbone of the protein. Hydrophobic contacts are also made, often with the exposed π systems of the bases, which are accessible in the regions of the RNA that are not double-helical.486 There are even instances of α helices lying in grooves of the RNA.486
When a large globular molecule of RNA such as a transfer RNA is bound in a transient complex by a protein such as an aminoacyl-tRNA ligase, the complex is reminiscent of one between a protein and double-helical DNA in that the surface of the protein and the surface of the RNA in the interface fit together as cast in mold.487,488 In the more permanent complexes, however, the RNA and protein are more intimately intertwined. For example, in the U1 small nuclear ribonucleoprotein particle, a representative component of the spliceosome, 10 separate proteins form a complex with one molecule of RNA in which some individual proteins and multimeric complexes of other proteins associate with different segments of the RNA.489 The RNA contains four double-helical hairpins in an open, extended structure, and the proteins bind to the ends of individual hairpins490 or to the double-helical portions emerging from the center of the molecule of RNA.489 In such a small nuclear ribonucleoprotein particle, only 20% of the mass is RNA, and the RNA is a loose scaffold that ties together the proteins, which are responsible for most of the structure of the particle.
The ultimate complex between protein and RNA is a ribosome. A ribosome is a ribonucleoprotein that is by mass about two-thirds RNA and one-third protein. It contains three different molecules of RNA, about 3000, 1500, and 120 bases long, respectively, and about 50 different molecules of protein, totalling about 7200 aa, the largest about 350 aa long, the smallest about 50 aa long.* There are two different subunits comprising a ribosome. The 50S subunit contains the largest and smallest molecules of RNA and 30 of the proteins; the 30S subunit contains the RNA 1500 bases long and 20 of the proteins. Yonath and her associates have obtained crystals of the 50S subunit from Haloarcula marismortui491,492 and the 30S subunit from Thermus thermophilus,493 and Yusupov and his associates have obtained crystals of the intact ribosome from T. thermophilus494–496 and the 30S subunit from T. thermophilus.495,497 All of these crystals have proven satisfactory for crystallographic studies, and crystallographic molecular models have been obtained from data sets gathered from them.498–503 These crystallographic molecular models provide the atomic details of the structure of a ribosome as well as insight into its ability to translate messenger RNA into protein.504

* The uncertainty reflects both experimental ambiguity and differences among species.

As the distribution of mass suggests, the basic structural element of a ribosome is the RNA. The 4600 bases of the three molecules of RNA form a globular structure with which the proteins associate. The RNA, although it is 60 times larger, is reminiscent of a molecule of tRNA and displays all of its characteristic features: double-helical hairpins, loops, bulges, and random meander. One of the few novel features is that many of the hairpins are so long that they form smoothly curved double helixes that wrap around other curved double-helical hairpins in structures reminiscent of coiled coils.
For the most part, the various proteins are found associated with the outer surface of the much larger globular RNA. Many of the proteins are entirely globular, but some of them have long segments of polypeptide, either interior loops or segments at their carboxy-terminal or amino-terminal ends, emerging from their globular portions and meandering widely through the RNA. Some of the globular portions of these proteins sit upon the surface of the RNA, others are buried within it, but all are subordinate to it both structurally and functionally.
The RNA is responsible for the ability of a ribosome to translate messenger RNA into protein. The RNA of the 30S subunit aligns the codon of the messenger RNA with the anticodon of the transfer RNA,505 and the RNA of the 50S subunit appears to catalyze506 the formation of the peptide bond from aminoacyl transfer RNA and the peptidyl tRNA.507
There is a set of small modules of protein known as zinc fingers that recognize specific sequences of double-helical DNA mainly by forming bonds within the major groove (Figure 6–53).508 Each of the many different zinc fingers is capable of recognizing the specific sequence of a segment of DNA 3–4 bases in length, and each recognizes a different sequence.509,510 Sets of these modules are strung together within the same polypeptide and together recognize longer specific sequences in DNA. Four of the five zinc fingers in the zinc finger protein GLI1 from Homo sapiens (Figure 6–53) together recognize a segment of DNA 14 bp long by binding consecutively to segments 3–4 bp in length.508
Transcription factor IIIA has nine successive zinc fingers sequentially joined together within a segment of its overall sequence.511,512 These nine zinc fingers together associate with a segment of DNA 55 bp long but directly recognize sequences only in a segment 11 bp long beginning 8 bp from one end of the overall segment, a segment 10 bp long beginning 9 bp from the other end, and a segment 3 bp long in the middle. The first three zinc fingers each associate with overlapping sequences 4 bp long in the segment 10 bp long, the fifth zinc finger
associates with the sequence 3 bp long in the middle, in the sequences and the functional groups of each by
and the last three zinc fingers associate with the segment their names and their numbers (2–9, 2–10, 2–11, and
11 bp long.513 Side chains from the various fingers asso- 2–12 and Figure 4–14).
ciate with the phosphoribosyl backbone outside of the
three segments the sequences of which are recognized.
The fourth and the sixth zinc fingers do not enter the
major groove and consequently do not recognize and
bind to sequences of base pairs.
Each zinc finger is a segment of polypeptide about 30 amino acids long.
Ordinarily a segment of polypeptide this short would be unable to form a
specific structure because the small size would prevent the folded protein
from removing a sufficient number of hydrogen–carbon bonds from contact
with the water to provide enough of a hydrophobic effect to overcome the
change in standard entropy required for folding.514 The most common
solution to this problem is that a small protein or small module of protein
will contain several cystines in its core, the cross-links of which provide
sufficient rigidity to the structure to overcome this deficit in standard
free energy. The zinc finger solves the problem in a similar way, but
instead of using cystines, it uses a Zn2+ cation that forms four covalent
bonds with two cysteines and two histidines in the sequence of the module
(Figure 6–53), cross-linking the four amino acids together:
[Structure 6–19: Zn2+ bonded to two sulfurs and two imidazole nitrogens]

Figure 6–53: A zinc finger bound to the major groove of the double-helical
DNA that it recognizes.508 The crystallographic molecular model (Bragg
spacing ≥ 0.26 nm) is that of the complex […] the fifth zinc finger
(Proline 361 to Glycine 388) and the four pairs of bases [d(GACC) paired
with d(GGTC)] that it recognizes in addition to one pair of bases on each
side (dG·dC and dA·dT, respectively) are included in the figure. The view
is down the axis of the B conformation of the DNA. The DNA is in the bottom
of the figure and the zinc finger in the top. The polypeptide is numbered
according to the complete amino acid sequence of the zinc finger protein
GLI1. Side chains from the protein read the donors and acceptors projecting
into the major groove of the DNA, but none of the responsible hydrogen
bonds has been drawn. In the protein crystallized, the Zn2+ had been
replaced by Co2+. The Co2+ is the gray sphere near the top of the finger
forming four tetrahedrally arranged covalent bonds (dashed lines) with
Cysteines 364 and 369 and Histidines 382 and 387. Although its van der
Waals radius is about 10% shorter than that of Co2+ and it is a somewhat
softer acid, Zn2+ would have formed covalent bonds with the same four
ligands to produce the identical tetrahedral geometry. This drawing was
produced with MolScript.573

Suggested Reading
Shakked, Z., Guzikevich-Guerstein, G., Frolow, F., Rabinovich, D.,
Joachimiak, A., & Sigler, P.B. (1994) Determinants of repressor/operator
recognition from the structure of the trp operator binding site, Nature
368, 469–473.
Ban, N., Nissen, P., Hansen, J., Moore, P.B., & Steitz, T.A. (2000) The
complete atomic structure of the large ribosomal subunit at 2.4 Å
resolution, Science 289, 905–920.
Metalloproteins

As does a zinc finger, many proteins incorporate one or more metallic
cations into their structure. Aside from cations such as lithium, sodium,
potassium, rubidium, magnesium, and calcium that are dissolved in the
cytoplasm or the extracellular solution and bind adventitiously and
randomly over the surface of a protein, there is a set of metallic cations
that participate as specific and necessary structural and functional
elements of metalloproteins. These are the cations of sodium, potassium,
magnesium, calcium, vanadium, manganese, iron, cobalt, nickel, copper,
zinc, molybdenum, and tungsten. The nontransition metals, sodium,
potassium, magnesium, and calcium, and zinc, a transition metal inactive in
oxidation–reduction, occur exclusively in their most common oxidation
states: Na+, K+, Mg2+, Ca2+, and Zn2+, respectively. Because of the
availability of two or more readily accessible oxidation levels, the other
transition metals, for example, iron or copper, are often used as
one-electron carriers, and in this role alternate between oxidation levels,
for example, Fe2+ and Fe3+ or Cu+ and Cu2+. In other situations transition
metals, such as the Ni2+ in urease or the Fe2+ in myoglobin (Figure 4–18),
fill roles in the active sites of enzymes in which no changes in oxidation
level are required and in fact are to be avoided.

Eventually, the role of a metallic cation in maintaining the structure of a
protein must be distinguished from its role as a catalytic functional group
in its active site. Aspartate carbamoyltransferase from E. coli is a
protein constructed from two different folded polypeptides, the regulatory
subunit (naa = 152) and the catalytic subunit (naa = 310). In the
crystallographic molecular model of aspartate carbamoyltransferase, a Zn2+
is tetrahedrally coordinated to the four sulfurs of Cysteines 109, 114,
137, and 140 in the regulatory subunit.515 This Zn2+ forms a tetrahedral,
covalent complex with the structure

[Structure 6–20: Zn2+ bonded to four sulfurs, net charge 2–]

that resembles closely structures observed in polynuclear clusters that
form between Zn2+ and small mercaptans. When the zinc is displaced from the
thiols by organic mercurials, the two subunits separate from each other but
can be reassociated516,517 by reincorporating the Zn2+. In the
crystallographic molecular model, the Zn2+ is located adjacent to the
boundary between the regulatory subunit and its neighboring catalytic
subunit but distant from both the active site of the enzyme located on the
catalytic subunit and sites for binding ligands on the regulatory subunits.
Therefore, the role of the Zn2+ is entirely structural. Its complexation
with the thiols creates and stabilizes the proper structure at the surface
of a regulatory subunit. Only when this stable structure is formed can the
properly folded regulatory subunit associate with a complementary structure
on the surface of a catalytic subunit, just as only when the proper
structure of a zinc finger is formed by the binding of the Zn2+ can it
associate with the proper site on DNA (Figure 6–53). A metallic cation
fulfills such a structural role because the bonds it forms either
covalently or ionically with lone pairs of electrons on bases within the
protein are strong ones, especially when the protein itself assists in
orienting the bases advantageously.

In the case of aspartate carbamoyltransferase, removal of the Zn2+ from the
protein produces catalytic subunits with full enzymatic activity. In most
instances, however, the removal of a metallic cation from a protein leads
to loss of function, and separating the effect of the cation on the
structure of a protein, which itself is responsible for that function, from
a direct effect at an active site, in which a metallic cation is often a
catalytic group, is difficult. For example, when mammalian liver arginase,
which is formed from four identical folded polypeptides, is treated with
the chelating agent N,N,N′,N′-tetracarboxymethyl-1,2-diaminoethane (5–1),
it loses all of its enzymatic activity.518 At the same time, however, it
dissociates into individual folded polypeptides. When Mn2+ is added to the
inactive protein, enzymatic activity is regained, but the individual folded
polypeptides reassociate. It was possible that the dissociation of the
tetramer was responsible for the inactivation and that the Mn2+ required
for activity was necessary to retain the proper structure of the protein
rather than as a catalytic group in the active site. If, however, a
crystallographic molecular model of the protein is available it is possible
to determine, as was the case with both a zinc finger and aspartate
carbamoyltransferase, whether or not the metal is distant from sites
involved in the function of the protein and consequently performs purely a
structural role. In the case of arginase, for example, it has been shown
crystallographically that Mn2+ cations form a binuclear cluster within the
active site intimately involved in the catalysis performed by the
enzyme.519

The metallic cations incorporated into proteins in aqueous solution,
because they are themselves Lewis acids, are at all times surrounded by
Lewis bases. The strongest Lewis bases present in biological fluids are the
lone pairs of electrons on oxygens, nitrogens, and sulfurs and the chloride
ion. The proton is also a Lewis acid, and in biological fluids every acidic
proton is usually surrounded by lone pairs of electrons on oxygens,
nitrogens, or sulfurs. The proton is so small, however, that it can
accommodate directly only two Lewis bases at a time in one hydrogen bond.
Because metallic cations have core electrons, they are larger than a proton
and can accommodate more Lewis bases simultaneously. The metallic cations
incorporated into proteins are always surrounded by pairs of electrons from
oxygens, nitrogens, sulfurs, or halides. The atoms providing the lone pairs
of electrons surrounding a metallic cation are its ligands, and the number
of ligands surrounding the cation is its coordination number. The complexes
formed between metallic cations and proteins are tetracoordinate,
pentacoordinate, hexacoordinate, heptacoordinate, octacoordinate, and
nonacoordinate.

The preferences of a metallic cation for a particular type of lone pair of
electrons are usually discussed in terms of the hardness or softness of the
Lewis acid and the Lewis base.520 The rule is that hard acids prefer hard
bases and soft acids prefer soft bases. For the divalent metal ions of
importance to the structure of proteins, the series of hardness is Mg2+ >
Ca2+ > Mn2+ > Fe2+ > Co2+ > Ni2+ > Cu2+, Zn2+. For the commonly encountered
bases, the series of hardness is lone pairs on oxygen > lone pairs on
chloride > lone pairs on nitrogen > lone pairs on sulfur. These rankings,
for example, are consistent with the fact that calcium ion has a strong
preference for oxygen ligands while zinc ion has a preference for thiol
ligands. They also explain why the ligands on metallothionein, a protein
responsible for chelating and thus removing from solution soft, toxic heavy
metal cations, are entirely the thiols of cysteine side chains in the
protein.

The Lewis bases surrounding a metallic cation in solution are attached to
it by bonds the characteristics of which span the spectrum between ionic
and covalent. The bonds between hard cationic Lewis acids and hard Lewis
bases are usually ionic, and those between soft cationic Lewis acids and
soft Lewis bases are usually covalent. The calcium dication is an example
of a hard, purely ionic metallic cation. In biological solutions, its
ligands are invariably oxygen atoms,521 the hardest of bases, and those
oxygens are held by ionic bonds. The zinc dication in the crystallographic
molecular model of aspartate carbamoyltransferase (6–20) is a good example
of a metallic cation participating in covalent bonds. In this arrangement,
a soft metallic cation, Zn2+, is bonding covalently to four soft bases,
(RS–)4. Soft bases such as sulfur or even nitrogen are rarely found as
ligands to hard metallic cations such as Na+, K+, Mg2+, and Ca2+, but one
way that a protein reconciles the steric difficulty of arranging ligands
precisely enough to form unhindered covalent bonds with soft metallic
cations such as Fe2+, Cu2+, and Zn2+ is to use hard bases such as oxygen or
nitrogen as ligands. The bonds formed by these harder ligands, because they
have more ionic character, are more flexible in their angles. Regardless of
whether the bonds are ionic or covalent, their lengths are usually governed
by the size of the ion, which is reflected in its ionic radius (Table 6–7).

Table 6–7: Ionic Radii and Lengths of Bonds to Ligands of Metallic Cations
Found as Structural Elements in Proteins
ionic radius(a) (nm)  0.102 0.138 0.072 0.100 0.083 0.061 0.069 0.073 0.074
bond length(b) (nm)   0.22–0.34 0.20–0.21 0.22–0.26 0.20–0.23 0.19–0.23 0.20–0.23 0.20–0.23
(a) Ionic radii for the hexacoordinated metallic cation522 and for the
dications of transition metals to permit direct comparison.
(b) Values for the metallic cations in crystallographic molecular models of
proteins from the references cited in the text.

The major structural difference between ionic bonds and covalent bonds is
the directional properties of the arrangements of the ligands. Ionic bonds
are created by the electrostatic forces between the metallic cation and an
anion or a dipole on a ligand. If the forces holding the ligands are
entirely ionic, the number and orientation of the Lewis bases around the
cation are determined solely by steric considerations. In the case of Ca2+,
the number and orientation of the oxygens surrounding the dication depend
entirely on the size and shape of the functional groups that provide
them.523 When ligands are bonded ionically, the larger the cation or the
smaller the bases, the more ligands will be gathered. Covalent bonds result
from the overlap of atomic orbitals to form bonding molecular orbitals.
Because the degree of overlap determines the strength of the bond and
because the degree of overlap depends on the bond lengths and bond angles,
covalent bonds are characterized by specific bond lengths and bond angles.
Because zinc is a d10 transition metal, its 3d shell is filled. As a
result, the covalent bonds between Zn2+ and sulfur in the regulatory
subunit of aspartate carbamoyltransferase are formed from sp3 hybrid
orbitals on the zinc, and this produces the usual tetrahedral arrangement.
A similar tetrahedral disposition is assumed by the four Lewis bases around
the Zn2+ in a zinc finger (Figure 6–53). Covalent bonds position the
participating atoms in strict geometric orientations while ionic bonds are
completely malleable, resembling pigs at a trough.

The fact that covalent bonds involving metals are so rigid is reflected in
the practice during crystallographic refinement of considering them as
fixed geometrically as the bonds involving carbon, nitrogen, and oxygen.
For example, in the initial crystallographic molecular model of aspartate
carbamoyltransferase built directly from the unrefined map of electron
density, the arrangement of the four sulfurs around the Zn2+ was restrained
to a tetrahedral geometry, just as an sp3 carbon would have been, and this
geometry was retained in all the subsequent refinement.524 This practice
can be dangerous, however, particularly if the ligands to the Zn2+ are
harder, less covalent bases.525 In such intermediate cases, various
mixtures of ionic and covalent behavior are observed. The main indication
of such deviations from covalent behavior is the loss of directional
ligation.

The monovalent cations of sodium and potassium are hard Lewis acids and are
almost always surrounded by lone pairs from oxygen in any situation, as
they are when they are dissolved in water. There is, however, one
crystallographic molecular model in which a Na+ has the π system of a
tryptophan as one of its ligands.526,527 Because cytoplasm has a high
concentration of K+ and a low concentration of Na+, it is K+ that is almost
invariably incorporated as a structural metallic cation in cytoplasmic
proteins. There are examples, however, of cytoplasmic proteins
incorporating Na+ during their crystallization, in one case at a site
formed by five acyl oxygens from the backbone and the oxygen of a
carboxylate528 and in another at a site formed from two acyl oxygens from
the backbone and a molecule of water forming three hydrogen bonds with
groups on the protein.529 Whether or not these sites are occupied by Na+
when the proteins are in the cytoplasm is unknown. Extracytoplasmic
proteins, such as α-thrombin, however, do seem to incorporate Na+.530
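The two series of hardness stated earlier in this section are purely
ordinal, and they can be recorded as a simple lookup. The following Python
sketch is only an illustration of that bookkeeping. The names are
hypothetical, the ranks carry no quantitative meaning, and Cu2+ and Zn2+
are placed at the same level because the series in the text does not rank
them relative to each other.

```python
# Hardness series as stated in the text, hardest first.
# Cations in the same inner list are not ranked relative to each other.
ACID_SERIES = [["Mg2+"], ["Ca2+"], ["Mn2+"], ["Fe2+"], ["Co2+"], ["Ni2+"],
               ["Cu2+", "Zn2+"]]
BASE_SERIES = ["oxygen", "chloride", "nitrogen", "sulfur"]


def acid_rank(cation):
    """Return 0 for the hardest level; larger numbers are softer."""
    for level, group in enumerate(ACID_SERIES):
        if cation in group:
            return level
    raise KeyError(f"{cation} is not in the series given in the text")


def base_rank(base):
    """Return 0 for the hardest base; larger numbers are softer."""
    return BASE_SERIES.index(base)


def order_by_hardness(cations):
    """Sort cations from hardest to softest (ties keep their input order)."""
    return sorted(cations, key=acid_rank)


print(order_by_hardness(["Zn2+", "Ca2+", "Fe2+", "Mg2+"]))
print(acid_rank("Ca2+") < acid_rank("Zn2+"))       # Ca2+ is the harder acid
print(base_rank("oxygen") < base_rank("sulfur"))   # oxygen is the harder base
```

The comparisons mirror the examples in the text: calcium, near the hard end
of the series, is paired with oxygen, and zinc, at the soft end, is paired
with sulfur.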
In structural sites for K+,531–534 the complex can be anywhere from
tetracoordinate to heptacoordinate. The ion is large enough (Table 6–7) to
support seven oxygens easily, but the final number of ligands is dictated
by the […]

[…]phoribosylaminoimidazolecarboxamide formyltransferase from G. gallus,
the cation is liganded by the hydroxyl groups of two serines, the
carboxylate oxygen of one aspartate, and three acyl oxygens from the
backbone (Figure 6–54).535 The eight ligands from the protein to the two
potassium cations bound within the selectivity filter […]

[…] and the hardness of both the cation and the oxygens, the bonding is
ionic, the geometry of the ligands around the […]
that binds Ca2+ with a dissociation constant538 of 8 × 10^–8 M is typical
of a complex between a Ca2+ and protein. It is representative of such
complexes because the Ca2+ is surrounded by molecules of water, one of
which is positioned by the molecule of protein, by acyl oxygens from the
backbone and by the carboxylate of an aspartate that provides
simultaneously two oxygens as a bidentate ligand. The charge number on the
Ca2+ in this site is not matched by that of its ligands, as is also the
case in the heptacoordinate site for Ca2+ in α-lactalbumin from Papio
cynocephalus, in which the ligands are the carboxylates of three
aspartates, each a monodentate ligand to the Ca2+, as well as two molecules
of water and two acyl oxygens from the backbone.539 There are also
structural Ca2+ that are more completely surrounded by the protein, such as
the heptacoordinate site in thermitase,540 which is surrounded by three
acyl oxygens from the backbone, an acyl oxygen of the side chain of an
asparagine, a single oxygen of the side chain of an aspartate, and the two
carboxylate oxygens of another aspartate acting as a bidentate ligand.

In these complexes, the distance between the Ca2+ and the heteroatoms of
the ligands is between 0.22 and 0.26 nm (Table 6–7). The position taken by
the Ca2+ relative to an acyl oxygen of the backbone is remarkably similar
to that of an amido nitrogen–hydrogen in a hydrogen bond to such an acyl
oxygen (Figure 5–11) with a broad distribution of angle β from 140° to 180°
and a strong tendency to lie in the plane of the peptide.541

These complexes are usually specific for Ca2+. The specificity is provided
by the distribution of the oxygens within the protein and the donors and
acceptors of hydro-

[…]ular model (Bragg spacing ≥ 0.15 nm) of […] album Limber.537 Segments of
the folded polypeptide from Glutamate 174 to Valine 180 and Leucine 199 to
[…]

[…] metal and the ligands are shorter and much less variable (0.20–0.21 nm)
than those for calcium (0.22–0.26 nm). In a structural role, Mg2+ is often
associated with phosphoryl oxygens, as for example those on nucleic acid.
When bound to protein or nucleic acid, Mg2+ usually retains several of its
waters, often all of them.485 For example, in the complex between Mg2+ and
inorganic diphosphatase from E. coli, the Mg2+ retains all six of its
octahedrally arrayed molecules of water, each of which forms one to three
hydrogen bonds with donors and acceptors on the protein.543

It appears from crystallographic and spectral544 observations that Mg2+,
when bound to a protein, prefers oxygen ligands exclusively, in particular
the oxygens of phosphates and carboxylates, but the dication of manganese,
Mn2+, a softer metallic ion that is about the same size as Mg2+ and that
gathers its ligands just as tightly (Table 6–7), forms complexes with both
nitrogen and oxygen bases such as ammonia, imidazole, 1,2-diaminoethane,
water, alcohols, carboxylates, the carbonyl oxygens of ketones and
aldehydes, and the acyl oxygens of amides. Consistent with its degree of
hardness, oxygen bases and nitrogen bases are roughly equivalent in their
affinities for Mn2+. In aqueous solution, the hexaammine complex is
observed only at concentrations of ammonia greater than 2 M, but
hexaimidazole salts can be crystallized from anhydrous ethanol. Because of
its small size and degree of hardness, all of the complexes between Mn2+
and such unhindered hard and intermediate bases are hexacoordinate and
octahedral, reminiscent of directional covalent bonds; and in these
complexes, mixtures of various ligands around manganese can occur. For
example, each of the species [Mn(OH2)n(NH3)6–n]2+ with 0 < n < 6 is
observed in mixtures of ammonia and water. When Mn2+ is bound to a protein,
it is complexed octahedrally by Lewis bases both from amino acids on the
protein and from molecules of water (Figure 6–56).2
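The family of mixed complexes [Mn(OH2)n(NH3)6–n]2+ mentioned above can be
written out term by term. The following Python sketch simply enumerates the
formulas for 0 < n < 6, keeping the coordination number fixed at six. The
function names are hypothetical, and the sketch is only bookkeeping on the
formula; it says nothing about the relative abundance of the species.

```python
def _group(ligand, count):
    """Format one ligand group, omitting the subscript when the count is 1."""
    return f"({ligand})" if count == 1 else f"({ligand}){count}"


def mixed_aqua_ammine_species(metal="Mn", charge="2+", total=6):
    """List the formulas [M(OH2)n(NH3)total-n] for 0 < n < total."""
    species = []
    for n in range(1, total):  # strictly between 0 and total, as in the text
        formula = f"[{metal}{_group('OH2', n)}{_group('NH3', total - n)}]{charge}"
        species.append(formula)
    return species


for formula in mixed_aqua_ammine_species():
    print(formula)
```

Each formula has six ligands in total, consistent with the statement that
these complexes of Mn2+ are hexacoordinate whatever the mixture of ligands.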
Iron, when it is found in a protein, is almost always in a coenzyme such as
a heme (Figure 4–18) or an iron–sulfur cluster:

[Structure 6–21: iron–sulfur cluster]

[…] thiols coordinating the metallic cation and holding it within the
protein.554

Copper exists in biochemical situations as the kinetically stable cations
Cu+ and Cu2+. It is usually used as a one-electron carrier, often in
reactions involving oxygen activation such as those catalyzed by
monooxygenases. Although it is a soft metallic cation, Cu2+ can form a
number of coordination complexes with lone pairs from oxygen, nitrogen, and
sulfur. The variety of these complexes defies categorization. They range
from dicoordinate to octacoordinate. Even in the more common
tetracoordinate and hexacoordinate stereochemistries, the terms used to
describe the variations, such as square planar, compressed tetrahedral,
elongated tetragonal octahedral, and trigonal octahedral, indicate that the
arrangement of many of the Lewis bases around copper, as with calcium, is
governed mainly by steric effects among the ligands, rather than by
covalent bonding. Examples of complexes between Cu2+ and simple biochemical
ligands are [Cu(NH3)5]2+

[Structure 6–22: [Cu(NH3)5]2+]

in which four of the nitrogens are equivalent and the fifth forms a longer
bond to Cu2+, and [Cu(imidazole)4(OH2)2]2+ and [Cu(formate)2(OH2)2]2+,
which are both elongated tetragonal octahedral structures. Simple thiols
such as mercaptans reduce Cu2+ to Cu+ and form complex polymeric structures
with Cu+ in which the coppers are multidentate and the thiols are
bidentate.

The azurins and the plastocyanins are related metalloproteins involved in
one-electron transfers in which the single copper passes reversibly from
Cu2+ to Cu+ to carry the electron. In crystallographic molecular models of
these proteins,555,556 the copper is coordinated to two histidines, a
methionine, and a cysteine with no particular geometric regularity. In the
apoprotein,* the two nitrogens and the two sulfurs that surround the copper
in the holoprotein assume the same orientations even though the copper is
not present.557 Consequently, unlike the situation in a zinc finger in
which the Zn2+ dictates the structure assumed by the protein, azurins and
plastocyanins are large enough proteins that they dictate the
stereochemistry of the ligation.

* The metal-free form of a metalloprotein is the apoprotein, and the form
of the metalloprotein when it contains the metal is the holoprotein.

The most versatile metallic dication performing structural roles in
proteins is that of zinc. Its versatility in this role arises from its
ability to form both tetracoordinate and pentacoordinate complexes with
Lewis bases and its ability, even though it is one of the softest metallic
cations, to form complexes with lone pairs from oxygen and nitrogen, as
well as sulfur. Often a mixture of two or three of these rather different
bases forms the site on a metalloprotein for the Zn2+. When it is bound by
four sulfurs, which are soft bases complementary to the soft Zn2+, the
bonds are covalent and tetrahedral. The complex between Zn2+ and the harder
base ammonia is tetracoordinate [Zn(NH3)4]2+ and tetrahedral, probably
because it is also covalent. This tetrahedral covalent form of Zn2+ is the
most common and is observed when Zn2+ forms complexes with
1,2-diaminoethane, cyclic lactams, and imidazole. As the ligands become
harder, however, geometries become more variable. For example, the complex
[Zn(OH2)6]2+ between Zn2+ and water, a hard base, is hexacoordinate and
octahedral; but as protons are removed, it eventually decreases its
coordination to four, as [Zn(OH)3(OH2)]–, as a result of electrostatic
repulsion. An ionic, octahedral complex forms with carboxylates:

[Structure 6–23: octahedral complex of Zn2+ with carboxylates]

Zinc dication forms a pentacoordinate complex with, among other ligands,
8-aminoquinazoline,558 in which the four nitrogens from two
aminoquinazolines and a molecule of water are the five Lewis bases that
generate the complex [Zn(N2C9H8)2(OH2)]2+.

A pentacoordinate complex is formed by the structural Zn2+ in the
periplasmic zinc-binding protein TroA of Treponema pallidum. In this
complex with the protein, the Zn2+ is surrounded by the two oxygens of the
carboxylate of Aspartate 257 as a bidentate ligand and three imidazolyl
nitrogens from Histidine 46, Histidine 111, and Histidine 177 in an
irregular arrangement (Figure 6–57).559 Most structural sites for zinc,
however, are tetracoordinate, resembling the one in a zinc finger (Figure
6–53). For example, in the structural sites for zinc in
UTP-hexose-1-phosphate uridylyltransferase from E. coli and alanine-tRNA
ligase from E. coli, the cations are also surrounded by two histidines and
two cysteines in a tetrahedral array.547,560

Figure 6–57: Site for the structural Zn2+ in the crystallographic molecular
model (Bragg spacing ≥ 0.18 nm) of periplasmic zinc-binding protein TroA
from Treponema pallidum.559 In this case, unlike that of the Mn2+ binding
site in lectin IV (Figure 6–56), the four ligands to the Zn2+ are provided
by side chains distant from each other in the amino acid sequence.
Consequently, four short segments of the folded polypeptide (Valine 43 to
Valine 47, Phenylalanine 108 to Valine 112, Alanine 176 to Alanine 179, and
Aspartate 257 to Alanine 258) are drawn as well as two others (Glutamate
226 to Serine 227 and Leucine 69 to Leucine 71) that provide acceptors for
hydrogen bonds buttressing the ligands to the metal. The Zn2+ is the gray
sphere in the center of the cluster of ligands. A molecule of water, not a
ligand to the metal, is drawn as an open circle. The amino acid sequence is
numbered according to that of the posttranslationally modified version of
the protein. This drawing was produced with MolScript.573

There are a number of proteins that contain modules resembling zinc fingers
in that they bind to specific sequences in DNA and are also zinc
metalloproteins. Some of them have complexes that resemble the one in the
regulatory subunit of aspartate carbamoyltransferase (4–48) because the
zinc forms covalent bonds with four cysteines.438,561 Others have a site
formed from three cysteines and only one histidine.562 Others contain
clusters formed from two Zn2+ and the sulfurs from six cysteines

[Structure 6–24: cluster of two Zn2+ and six sulfurs, net charge 2–]

Suggested Reading
Lee, Y.H., Deka, R.K., Norgard, M.V., Radolf, J.D., & Hasemann, C.A. (1999)
Treponema pallidum TroA is a periplasmic zinc-binding protein with a
helical backbone, Nat. Struct. Biol. 6, 628–633.

Problem 6–14: The drawing in the figure on the next page is of a site for
the binding of Mg2+ within the crystallographic molecular model of
inorganic pyrophosphatase from E. coli.543 This drawing was produced with
MolScript.573

Identify the donor and acceptor for each of the hydrogen bonds.
Figure 6–58: Site for the structural Fe2+ in the crys- […] MolScript.573

[Figure for Problem 6–14]

References
1. Pal, D., & Chakrabarti, P. (1999) J. Mol. Biol. 294, 271–288.
2. Delbaere, L.T., Vandonselaar, M., Prasad, L., Quail, J.W., Wilson, K.S.,
& Dauter, Z. (1993) J. Mol. Biol. 230, 950–965.
3. Brandts, J.F., Halvorson, H.R., & Brennan, M. (1975) Biochemistry 14,
4953–4963.
4. Herzberg, O., & Moult, J. (1991) Proteins: Struct., Funct., Genet. 11,
223–229.
5. Blake, C.C., Ghosh, M., Harlos, K., Avezoux, A., & Anthony, C. (1994)
Nat. Struct. Biol. 1, 102–105.
6. Wilson, K.S., Butterworth, S., Dauter, Z., Lamzin, V.S., Walsh, M.,
Wodak, S., Pontius, J., Richelle, J., Vaguine, A., Sander, C., Hooft,
R.W.W., Vriend, G., Thornton, J.M., Laskowski, R.A., MacArthur, M.W.,
Dodson, E.J., Murshudov, G., Oldfield, T.J., Kaptien, R., & Rullmann,
13. Peti, W., Hennig, M., Smith, L.J., & Schwalbe, H. (2000) J. Am. Chem.
Soc. 122, 12017–12018.
14. Smith, L.J., Bolin, K.A., Schwalbe, H., MacArthur, M.W., Thornton,
J.M., & Dobson, C.M. (1996) J. Mol. Biol. 255, 494–506.
15. Vrielink, A., Lloyd, L.F., & Blow, D.M. (1991) J. Mol. Biol. 219,
533–554.
16. Strynadka, N.C., & James, M.N. (1991) J. Mol. Biol. 220, 401–424.
17. Birktoft, J.J., & Blow, D.M. (1972) J. Mol. Biol. 68, 187–240.
18. Esposito, L., Vitagliano, L., Sica, F., Sorrentino, G.,
334 Atomic Details
Zagari, A., & Mazzarella, L. (2000) J. Mol. Biol. 297, 47. Lockhart, D.J., & Kim, P.S. (1993) Science 260, 198–
713–732. 202.
19. Takano, T. (1984) in Methods and Applications in 48. Rogers, N.K., & Sternberg, M.J. (1984) J. Mol. Biol. 174,
Crystallographic Computing: Papers presented at the 527–542.
International Summer School on Crystallographic 49. Nicholson, H., Becktel, W.J., & Matthews, B.W. (1988)
Computing, held at Kyoto, Japan (Ashida, T., & Hall, S. Nature 336, 651–656.
R., Eds.) p 262, Clarendon Press, Oxford, U.K. 50. McLachlan, A.D., & Stewart, M. (1975) J. Mol. Biol. 98,
20. Matthews, B.W., Weaver, L.H., & Kester, W.R. (1974) J. 293–304.
Biol. Chem. 249, 8030–8044. 51. Chatton, B., Walter, P., Ebel, J.P., Lacroute, F., &
21. Pauling, L., Corey, R.B., & Branson, H.R. (1951) Proc. Fasiolo, F. (1988) J. Biol. Chem. 263, 52–57.
Natl. Acad. Sci. U.S.A. 37, 205–211. 52. Zhou, N.E., Zhu, B.Y., Sykes, B.D., & Hodges, R.S. (1992)
22. Artymiuk, P.J., & Blake, C.C. (1981) J. Mol. Biol. 152, J. Am. Chem. Soc. 114, 4320–4326.
737–762. 53. Chou, P.Y., & Fasman, G.D. (1978) Adv. Enzymol. Relat.
23. Barlow, D.J., & Thornton, J.M. (1988) J. Mol. Biol. 201, Areas Mol. Biol. 47, 45–148.
601–619. 54. Fasman, G.D. (1967) Poly-a-Amino Acids; Protein
24. Herzberg, O., & James, M.N. (1988) J. Mol. Biol. 203, Models for Conformational Studies, Marcel Dekker,
761–779. New York.
25. Watson, H.C. (1969) Prog. Stereochem. 4, 299–333. 55. Myers, J.K., Pace, C.N., & Scholtz, J.M. (1997)
26. Gray, T.M., & Matthews, B.W. (1984) J. Mol. Biol. 175, Biochemistry 36, 10923–10929.
75–81. 56. Serrano, L., Neira, J.L., Sancho, J., & Fersht, A.R. (1992)
27. Bolin, J.T., Filman, D.J., Matthews, D.A., Hamlin, R.C., Nature 356, 453–455.
& Kraut, J. (1982) J. Biol. Chem. 257, 13650–13662. 57. Blaber, M., Zhang, X.J., Lindstrom, J.D., Pepiot, S.D.,
28. Manas, E.S., Getahun, Z., Wright, W.W., DeGrado, W.F., Baase, W.A., & Matthews, B.W. (1994) J. Mol. Biol. 235,
& Vanderkooi, J.M. (2000) J. Am. Chem. Soc. 122, 600–624.
9883–9890. 58. Myers, J.K., Pace, C.N., & Scholtz, J.M. (1997) Proc. Natl.
29. Blundell, T., Barlow, D., Borkakoti, N., & Thornton, J. Acad. Sci. U.S.A. 94, 2833–2837.
(1983) Nature 306, 281–283. 59. O’Neil, K.T., & DeGrado, W.F. (1990) Science 250,
30. Takano, T. (1977) J. Mol. Biol. 110, 537–568. 646–651.
31. Remington, S., Wiegand, G., & Huber, R. (1982) J. Mol. 60. Chakrabartty, A., Schellman, J.A., & Baldwin, R.L.
Biol. 158, 111–152. (1991) Nature 351, 586–588.
32. Sauer, U.H., San, D.P., & Matthews, B.W. (1992) J. Biol. 61. Yang, D.S., Sax, M., Chakrabartty, A., & Hew, C.L. (1988)
Chem. 267, 2393–2399. Nature 333, 232–237.
33. Pflugrath, J.W., & Quiocho, F.A. (1988) J. Mol. Biol. 200, 62. Spek, E.J., Wu, H.C., & Kallenbach, N.R. (1997) J. Am.
163–180. Chem. Soc. 119, 5053–5054.
34. Hasemann, C.A., Ravichandran, K.G., Peterson, J.A., & 63. Blaber, M., Zhang, X.J., & Matthews, B.W. (1993)
Deisenhofer, J. (1994) J. Mol. Biol. 236, 1169–1185. Science 260, 1637–1640.
35. Heinz, D.W., Baase, W.A., Zhang, X.J., Blaber, M., 64. Hasson, M.S., Muscate, A., McLeish, M.J., Polovnikova,
Dahlquist, F.W., & Matthews, B.W. (1994) J. Mol. Biol. L.S., Gerlt, J.A., Kenyon, G.L., Petsko, G.A., & Ringe, D.
236, 869–886. (1998) Biochemistry 37, 9918–9930.
36. Keefe, L.J., Sondek, J., Shortle, D., & Lattman, E.E. 65. Kelly, M.A., Chellgren, B.W., Rucker, A.L., Troutman,
(1993) Proc. Natl. Acad. Sci. U.S.A. 90, 3275–3279. J.M., Fried, M.G., Miller, A.F., & Creamer, T.P. (2001)
37. Richardson, J.S., & Richardson, D.C. (1988) Science 240, Biochemistry 40, 14376–14383.
1648–1652. 66. Romao, M.J., Turk, D., Gomis-Rueth, F.X., Huber, R.,
38. Presta, L.G., & Rose, G.D. (1988) Science 240, Schumacher, G., Moellering, H., & Ruessmann, L.
1632–1641. (1992) J. Mol. Biol. 226, 1111–1130.
39. Harper, E.T., & Rose, G.D. (1993) Biochemistry 32, 67. Ma, K., Kan, L.-s., & Wang, K. (2001) Biochemistry 40,
7605–7609. 3427–3438.
40. Bordo, D., & Argos, P. (1994) J. Mol. Biol. 243, 504–519. 68. Boyington, J.C., Gaffney, B.J., & Amzel, L.M. (1993)
41. Serrano, L., Sancho, J., Hirshberg, M., & Fersht, A.R. Science 260, 1482–1486.
(1992) J. Mol. Biol. 227, 544–559. 69. Dickinson, C.D., Veerapandian, B., Dai, X.P., Hamlin,
42. Bell, J.A., Becktel, W.J., Sauer, U., Baase, W.A., & R.C., Xuong, N.H., Ruoslahti, E., & Ely, K.R. (1994) J.
Matthews, B.W. (1992) Biochemistry 31, 3590–3596. Mol. Biol. 236, 1079–1092.
43. Schellman, C. (1980) in Protein Folding: Proceedings of 70. Eklund, H., Nordstrom, B., Zeppezauer, E.,
the 28th Conference of the German Biochemical Society Soderlund, G., Ohlsson, I., Boiwe, T., Soderberg,
(Jaenicke, R., Ed.) p 53, Elsevier/North-Holland B.O., Tapia, O., Branden, C.I., & Akeson, A. (1976) J.
Biomedical Press, Amsterdam. Mol. Biol. 102, 27–59.
44. Hol, W.G., van Duijnen, P.T., & Berendsen, H.J. (1978) 71. Chothia, C. (1973) J. Mol. Biol. 75, 295–302.
Nature 273, 443–446. 72. Herzberg, O. (1991) J. Mol. Biol. 217, 701–719.
45. Wishart, D.S., Sykes, B.D., & Richards, F.M. (1991) J. 73. Richardson, J.S., Getzoff, E.D., & Richardson, D.C.
Mol. Biol. 222, 311–333. (1978) Proc. Natl. Acad. Sci. U.S.A. 75, 2574–2578.
46. Aqvist, J., Luecke, H., Quiocho, F.A., & Warshel, A. 74. Scapin, G., Gordon, J.I., & Sacchettini, J.C. (1992) J. Biol.
(1991) Proc. Natl. Acad. Sci. U.S.A. 88, 2026–2030. Chem. 267, 4253–4269.
75. Sacchettini, J.C., Gordon, J.I., & Banaszak, L.J. (1989) J. Mol. Biol. 208, 327–339.
76. Wilmanns, M., Priestle, J.P., Niermann, T., & Jansonius, J.N. (1992) J. Mol. Biol. 223, 477–507.
77. Murzin, A.G., Lesk, A.M., & Chothia, C. (1994) J. Mol. Biol. 236, 1382–1400.
78. Yoder, M.D., Keen, N.T., & Jurnak, F. (1993) Science 260, 1503–1507.
79. Beaman, T.W., Binder, D.A., Blanchard, J.S., & Roderick, S.L. (1997) Biochemistry 36, 489–494.
80. Xia, Z., Dai, W., Zhang, Y., White, S.A., Boyd, G.D., & Mathews, F.S. (1996) J. Mol. Biol. 259, 480–501.
81. Varghese, J.N., & Colman, P.M. (1991) J. Mol. Biol. 221, 473–486.
82. Crennell, S.J., Garman, E.F., Philippon, C., Vasella, A., Laver, W.G., Vimr, E.R., & Taylor, G.L. (1996) J. Mol. Biol. 259, 264–280.
83. Ormo, M., Cubitt, A.B., Kallio, K., Gross, L.A., Tsien, R.Y., & Remington, S.J. (1996) Science 273, 1392–1395.
84. Hopf, M., Gohring, W., Ries, A., Timpl, R., & Hohenester, E. (2001) Nat. Struct. Biol. 8, 634–640.
85. Novotny, J., Bruccoleri, R.E., & Newell, J. (1984) J. Mol. Biol. 177, 567–573.
86. Lasters, I., Wodak, S.J., Alard, P., & van Cutsem, E. (1988) Proc. Natl. Acad. Sci. U.S.A. 85, 3338–3342.
87. Musil, D., Bode, W., Huber, R., Laskowski, M., Jr., Lin, T.Y., & Ardelt, W. (1991) J. Mol. Biol. 220, 739–755.
88. Arnold, E., & Rossmann, M.G. (1990) J. Mol. Biol. 211, 763–801.
89. Koronakis, V., Sharff, A., Koronakis, E., Luisi, B., & Hughes, C. (2000) Nature 405, 914–919.
90. Uhlin, U., & Eklund, H. (1996) J. Mol. Biol. 262, 358–369.
91. Baumann, U. (1994) J. Mol. Biol. 242, 244–251.
92. Steinbacher, S., Seckler, R., Miller, S., Steipe, B., Huber, R., & Reinemer, P. (1994) Science 265, 383–386.
93. Wu, H., Maciejewski, M.W., Marintchev, A., Benashski, S.E., Mullen, G.P., & King, S.M. (2000) Nat. Struct. Biol. 7, 575–579.
94. Teeter, M.M., Roe, S.M., & Heo, N.H. (1993) J. Mol. Biol. 230, 292–311.
95. Venkatachalam, C.M. (1968) Biopolymers 6, 1425–1436.
96. Wilmot, C.M., & Thornton, J.M. (1988) J. Mol. Biol. 203, 221–232.
97. Fujinaga, M., Delbaere, L.T., Brayer, G.D., & James, M.N. (1985) J. Mol. Biol. 184, 479–502.
98. James, M.N., & Sielecki, A.R. (1983) J. Mol. Biol. 163, 299–361.
99. Hamada, K., Bethge, P.H., & Mathews, F.S. (1995) J. Mol. Biol. 247, 947–962.
100. Louie, G.V., & Brayer, G.D. (1990) J. Mol. Biol. 214, 527–555.
101. Bragg, L., Kendrew, J.C., & Perutz, M.F. (1950) Proc. R. Soc. London, A 203, 321–357.
102. Taylor, H.S. (1941) Proc. Am. Philos. Soc. 85, 1–12.
103. Richardson, J.S. (1981) Adv. Protein Chem. 34, 167–339.
104. Pavone, V., Di Blasio, B., Santini, A., Benedetti, E., Pedone, C., Toniolo, C., & Crisma, M. (1990) J. Mol. Biol. 214, 633–635.
105. Dijkstra, B.W., Kalk, K.H., Hol, W.G., & Drenth, J. (1981) J. Mol. Biol. 147, 97–123.
106. Sibanda, B.L., & Thornton, J.M. (1985) Nature 316, 170–174.
107. Efimov, A.V. (1986) Mol. Biol. (USSR) 20, 250–260.
108. Volz, K., & Matsumura, P. (1991) J. Biol. Chem. 266, 15511–15519.
109. Chou, P.Y., & Fasman, G.D. (1977) J. Mol. Biol. 115, 135–175.
110. Moore, S.A., & James, M.N. (1995) J. Mol. Biol. 249, 195–214.
111. Milner-White, E., Ross, B.M., Ismail, R., Belhadj-Mostefa, K., & Poet, R. (1988) J. Mol. Biol. 204, 777–782.
112. Henrick, K., Collyer, C.A., & Blow, D.M. (1989) J. Mol. Biol. 208, 129–157.
113. Baker, E.N., & Hubbard, R.E. (1984) Prog. Biophys. Mol. Biol. 44, 97–179.
114. Merritt, E.A., Kuhn, P., Sarfaty, S., Erbe, J.L., Holmes, R.K., & Hol, W.G. (1998) J. Mol. Biol. 282, 1043–1059.
115. Martinez-Oyanedel, J., Choe, H.W., Heinemann, U., & Saenger, W. (1991) J. Mol. Biol. 222, 335–352.
116. Karplus, P.A., & Schulz, G.E. (1987) J. Mol. Biol. 195, 701–729.
117. Borgstahl, G.E., Rogers, P.H., & Arnone, A. (1994) J. Mol. Biol. 236, 817–830.
118. Goddette, D.W., Paech, C., Yang, S.S., Mielenz, J.R., Bystroff, C., Wilke, M.E., & Fletterick, R.J. (1992) J. Mol. Biol. 228, 580–595.
119. Wlodawer, A., Svensson, L.A., Sjolin, L., & Gilliland, G.L. (1988) Biochemistry 27, 2705–2717.
120. Ponder, J.W., & Richards, F.M. (1987) J. Mol. Biol. 193, 775–791.
121. Schrauber, H., Eisenhaber, F., & Argos, P. (1993) J. Mol. Biol. 230, 592–612.
122. Lovell, S.C., Word, J.M., Richardson, J.S., & Richardson, D.C. (2000) Proteins: Struct., Funct., Genet. 40, 389–408.
123. Dunbrack, R.L., Jr. (2002) Curr. Opin. Struct. Biol. 12, 431–440.
124. Wilson, M.A., & Brunger, A.T. (2000) J. Mol. Biol. 301, 1237–1256.
125. Roberts, J.D., & Caserio, M.C. (1977) Basic Principles of Organic Chemistry, 2nd ed., p 457, W.A. Benjamin, Menlo Park, CA.
126. Janin, J., & Wodak, S. (1978) J. Mol. Biol. 125, 357–386.
127. Russell, R.J., Ferguson, J.M., Hough, D.W., Danson, M.J., & Taylor, G.L. (1997) Biochemistry 36, 9983–9994.
128. Kyte, J. (1995) Structure in Protein Chemistry, 1st ed., p 212, Garland Publishing, Inc., New York.
129. Wang, J.F., Hinck, A.P., Loh, S.N., & Markley, J.L. (1990) Biochemistry 29, 4242–4253.
130. Kossiakoff, A.A., & Shteyn, S. (1984) Nature 311, 582–583.
131. Word, J.M., Lovell, S.C., LaBean, T.H., Taylor, H.C., Zalis, M.E., Presley, B.K., Richardson, J.S., & Richardson, D.C. (1999) J. Mol. Biol. 285, 1711–1733.
132. Kossiakoff, A., Shpungin, J., & Sintchak, M. (1990) Proc. Natl. Acad. Sci. U.S.A. 87, 4468–4472.
133. Lovell, S.C., Word, J.M., Richardson, J.S., & Richardson, D.C. (1999) Proc. Natl. Acad. Sci. U.S.A. 96, 400–405.
134. Gellman, S.H. (1991) Biochemistry 30, 6633–6636.
135. Nemethy, G., Gibson, K.D., Palmer, K.A., Yoon, C.N., Paterlini, G., Zagari, A., Rumsey, S., & Scheraga, H.A. (1992) J. Phys. Chem. 96, 6472–6484.
136. Kelly, J.A., & Kuzin, A.P. (1995) J. Mol. Biol. 254, 223–236.
137. Sorensen, S.B., Raaschou-Nielsen, M., Mortensen, U.H., Remington, S.J., & Breddam, K. (1995) J. Am. Chem. Soc. 117, 5944–5950.
138. Endrizzi, J.A., Breddam, K., & Remington, S.J. (1994) Biochemistry 33, 11106–11120.
139. Nelsen, S.F., Teasley, M.F., Bloodworth, A.J., & Eggelte, H.J. (1985) J. Org. Chem. 50, 3299–3302.
140. Morris, A.L., MacArthur, M.W., Hutchinson, E.G., & Thornton, J.M. (1992) Proteins: Struct., Funct., Genet. 12, 345–364.
141. Czapinska, H., Otlewski, J., Krzywda, S., Sheldrick, G.M., & Jaskolski, M. (2000) J. Mol. Biol. 295, 1237–1249.
142. Webba da Silva, M., Sham, S., Gorst, C.M., Calzolai, L., Brereton, P.S., Adams, M.W.W., & La Mar, G.N. (2001) Biochemistry 40, 12575–12583.
143. Kuwajima, K., Ikeguchi, M., Sugawara, T., Hiraoka, Y., & Sugai, S. (1990) Biochemistry 29, 8240–8249.
144. Lee, B., & Richards, F.M. (1971) J. Mol. Biol. 55, 379–400.
145. Janin, J. (1979) Nature 277, 491–492.
146. Miller, S., Janin, J., Lesk, A.M., & Chothia, C. (1987) J. Mol. Biol. 196, 641–656.
147. Chothia, C. (1974) Nature 248, 338–339.
148. Chothia, C. (1976) J. Mol. Biol. 105, 1–12.
149. Wolfenden, R.V., Cullis, P.M., & Southgate, C.C. (1979) Science 206, 575–577.
150. Tavares, G.A., Beguin, P., & Alzari, P.M. (1997) J. Mol. Biol. 273, 701–713.
151. van Raaij, M.J., Schoehn, G., Burda, M.R., & Miller, S. (2001) J. Mol. Biol. 314, 1137–1146.
152. Eriksson, A.E., Baase, W.A., Zhang, X.J., Heinz, D.W., Blaber, M., Baldwin, E.P., & Matthews, B.W. (1992) Science 255, 178–183.
153. Otzen, D.E., Rheinnecker, M., & Fersht, A.R. (1995) Biochemistry 34, 13051–13058.
154. Pace, C.N. (1992) J. Mol. Biol. 226, 29–35.
155. Pace, C.N. (2001) Biochemistry 40, 310–313.
156. Holder, J.B., Bennett, A.F., Chen, J., Spencer, D.S., Byrne, M.P., & Stites, W.E. (2001) Biochemistry 40, 13998–14003.
157. Mendel, D., Ellman, J.A., Chang, Z., Veenstra, D.L., Kollman, P.A., & Schultz, P.G. (1992) Science 256, 1798–1802.
158. Shortle, D., Stites, W.E., & Meeker, A.K. (1990) Biochemistry 29, 8033–8041.
159. Kellis, J.T., Jr., Nyberg, K., & Fersht, A.R. (1989) Biochemistry 28, 4914–4922.
160. Kellis, J.T., Jr., Nyberg, K., Sali, D., & Fersht, A.R. (1988) Nature 333, 784–786.
161. Jackson, S.E., Moracci, M., elMasry, N., Johnson, C.M., & Fersht, A.R. (1993) Biochemistry 32, 11259–11269.
162. Southall, N.T., Dill, K.A., & Haymet, A.D.J. (2002) J. Phys. Chem., B 106, 521–533.
163. Dao-pin, S., Anderson, D.E., Baase, W.A., Dahlquist, F.W., & Matthews, B.W. (1991) Biochemistry 30, 11521–11529.
164. De Vos, S., Backmann, J., Prevost, M., Steyaert, J., & Loris, R. (2001) Biochemistry 40, 10140–10149.
165. Takano, K., Yamagata, Y., & Yutani, K. (2001) Biochemistry 40, 4853–4858.
166. Schwehm, J.M., Kristyanne, E.S., Biggers, C.C., & Stites, W.E. (1998) Biochemistry 37, 6939–6948.
167. Cornish, V.W., Kaplan, M.I., Veenstra, D.L., Kollman, P.A., & Schultz, P.G. (1994) Biochemistry 33, 12022–12031.
168. Wistow, G., Turnell, B., Summers, L., Slingsby, C., Moss, D., Miller, L., Lindley, P., & Blundell, T. (1983) J. Mol. Biol. 170, 175–202.
169. Burley, S.K., & Petsko, G.A. (1985) Science 229, 23–28.
170. Jorgensen, W.L., & Severance, D.L. (1990) J. Am. Chem. Soc. 112, 4768–4774.
171. Pai, E.F., Kabsch, W., Krengel, U., Holmes, K.C., John, J., & Wittinghofer, A. (1989) Nature 341, 209–214.
172. Hine, J., & Mookerjee, P.K. (1975) J. Org. Chem. 40, 292–298.
173. Eisenberg, D., & McLachlan, A.D. (1986) Nature 319, 199–203.
174. Fauchere, J.L., & Pliska, V. (1983) Eur. J. Med. Chem. Chim. Ther. 18, 369–375.
175. Nozaki, Y., & Tanford, C. (1971) J. Biol. Chem. 246, 2211–2217.
176. McPhalen, C.A., & James, M.N. (1987) Biochemistry 26, 261–269.
177. Hyde, C.C., Ahmed, S.A., Padlan, E.A., Miles, E.W., & Davies, D.R. (1988) J. Biol. Chem. 263, 17857–17871.
178. Choinowski, T., Hauser, H., & Piontek, K. (2000) Biochemistry 39, 1897–1902.
179. Bondi, A. (1964) J. Phys. Chem. 68, 441–451.
180. Richards, F.M. (1974) J. Mol. Biol. 82, 1–14.
181. Chothia, C. (1975) Nature 254, 304–308.
182. Li, A.J., & Nussinov, R. (1998) Proteins: Struct., Funct., Genet. 32, 111–127.
183. Tsai, J., Taylor, R., Chothia, C., & Gerstein, M. (1999) J. Mol. Biol. 290, 253–266.
184. Chalikian, T.V., Totrov, M., Abagyan, R., & Breslauer, K.J. (1996) J. Mol. Biol. 260, 588–603.
185. Jung, K., Jung, H., Colacurcio, P., & Kaback, H.R. (1995) Biochemistry 34, 1030–1039.
186. Klapper, M.H. (1971) Biochim. Biophys. Acta 229, 557–566.
187. Priev, A., Almagor, A., Yedgar, S., & Gavish, B. (1996) Biochemistry 35, 2061–2066.
188. Cohn, E.J., McMeekin, T.L., Edsall, J.T., & Blanchard, M.H. (1934) J. Am. Chem. Soc. 56, 784–794.
189. Cohn, E.J., & Edsall, J.T. (1943) Proteins, Amino Acids and Peptides as Ions and Dipolar Ions, Reinhold, New York.
190. Traube, J. (1899) Samml. Chem. Chem. Tech. Vortr. 4, 255–331.
191. Takusagawa, F., & Kamitori, S. (1996) J. Am. Chem. Soc. 118, 8945–8946.
192. Taylor, W.R. (2000) Nature 406, 916–919.
193. Saper, M.A., Bjorkman, P.J., & Wiley, D.C. (1991) J. Mol. Biol. 219, 277–319.
194. Stewart-Jones, G.B., McMichael, A.J., Bell, J.I., Stuart, D.I., & Jones, E.Y. (2003) Nat. Immunol. 4, 657–663.
195. Chothia, C., Levitt, M., & Richardson, D. (1977) Proc. Natl. Acad. Sci. U.S.A. 74, 4130–4134.
196. Klug, A., Crick, F.H.C., & Wyckoff, H.W. (1958) Acta Crystallogr. 11, 199–213.
197. Chothia, C., Levitt, M., & Richardson, D. (1981) J. Mol. Biol. 145, 215–250.
198. Betts, L., Xiang, S., Short, S.A., Wolfenden, R., & Carter, C.W., Jr. (1994) J. Mol. Biol. 235, 635–656.
199. Chothia, C., & Finkelstein, A.V. (1990) Annu. Rev. Biochem. 59, 1007–1039.
200. Doolittle, R.F., Goldbaum, D.M., & Doolittle, L.R. (1978) J. Mol. Biol. 120, 311–325.
201. Crick, F.H.C. (1953) Acta Crystallogr. 6, 689–697.
202. Whitby, F.G., Kent, H., Stewart, F., Stewart, M., Xie, X., Hatch, V., Cohen, C., & Phillips, G.N., Jr. (1992) J. Mol. Biol. 227, 441–452.
203. Brown, J.H., Kim, K.H., Jun, G., Greenfield, N.J., Dominguez, R., Volkmann, N., Hitchcock-DeGregori, S.E., & Cohen, C. (2001) Proc. Natl. Acad. Sci. U.S.A. 98, 8496–8501.
204. O’Shea, E.K., Klemm, J.D., Kim, P.S., & Alber, T. (1991) Science 254, 539–544.
205. Harbury, P.B., Zhang, T., Kim, P.S., & Alber, T. (1993) Science 262, 1401–1407.
206. Phillips, G.N., Jr. (1992) Proteins: Struct., Funct., Genet. 14, 425–429.
207. Chen, L., Glover, J.N., Hogan, P.G., Rao, A., & Harrison, S.C. (1998) Nature 392, 42–48.
208. Cusack, S., Berthet-Colominas, C., Hartlein, M., Nassar, N., & Leberman, R. (1990) Nature 347, 249–255.
209. Harbury, P.B., Kim, P.S., & Alber, T. (1994) Nature 371, 80–83.
210. Shu, W., Liu, J., Ji, H., & Lu, M. (2000) J. Mol. Biol. 299, 1101–1112.
211. Lovejoy, B., Choe, S., Cascio, D., McRorie, D.K., DeGrado, W.F., & Eisenberg, D. (1993) Science 259, 1288–1293.
212. Pascual, J., Pfuhl, M., Walther, D., Saraste, M., & Nilges, M. (1997) J. Mol. Biol. 273, 740–751.
213. Tarbouriech, N., Curran, J., Ruigrok, R.W., & Burmeister, W.P. (2000) Nat. Struct. Biol. 7, 777–781.
214. Bowman, G.D., Nodelman, I.M., Levy, O., Lin, S.L., Tian, P., Zamb, T.J., Udem, S.A., Venkataraghavan, B., & Schutt, C.E. (2000) J. Mol. Biol. 304, 861–871.
215. Ultsch, M.H., Somers, W., Kossiakoff, A.A., & de Vos, A.M. (1994) J. Mol. Biol. 236, 286–299.
216. Vlassi, M., Steif, C., Weber, P., Tsernoglou, D., Wilson, K.S., Hinz, H.J., & Kokkinidis, M. (1994) Nat. Struct. Biol. 1, 706–716.
217. Malashkevich, V.N., Kammerer, R.A., Efimov, V.P., Schulthess, T., & Engel, J. (1996) Science 274, 761–765.
218. Weaver, T.M., Levitt, D.G., Donnelly, M.I., Stevens, P.P., & Banaszak, L.J. (1995) Nat. Struct. Biol. 2, 654–662.
219. Calladine, C.R., Sharff, A., & Luisi, B. (2001) J. Mol. Biol. 305, 603–618.
220. Milburn, M.V., Prive, G.G., Milligan, D.L., Scott, W.G., Yeh, J., Jancarik, J., Koshland, D.E., Jr., & Kim, S.H. (1991) Science 254, 1342–1347.
221. Stetefeld, J., Jenny, M., Schulthess, T., Landwehr, R., Engel, J., & Kammerer, R.A. (2000) Nat. Struct. Biol. 7, 772–776.
222. Bailey, K., Astbury, W.T., & Ruddall, K.M. (1943) Nature 151, 716–717.
223. Heimburg, T., Schunemann, J., Weber, K., & Geisler, N. (1999) Biochemistry 38, 12727–12734.
224. Hecht, M.H., Richardson, J.S., Richardson, D.C., & Ogden, R.C. (1990) Science 249, 884–891.
225. Monera, O.D., Kay, C.M., & Hodges, R.S. (1994) Biochemistry 33, 3862–3871.
226. Varughese, K.I., Skinner, M.M., Whiteley, J.M., Matthews, D.A., & Xuong, N.H. (1992) Proc. Natl. Acad. Sci. U.S.A. 89, 6080–6084.
227. Nordlund, P., & Eklund, H. (1993) J. Mol. Biol. 232, 123–164.
228. Li, J.D., Carroll, J., & Ellar, D.J. (1991) Nature 353, 815–821.
229. Li, H., Dunn, J.J., Luft, B.J., & Lawson, C.L. (1997) Proc. Natl. Acad. Sci. U.S.A. 94, 3584–3589.
230. Huang, X., Nakagawa, T., Tamura, A., Link, K., Koide, A., & Koide, S. (2001) J. Mol. Biol. 308, 367–375.
231. Spadaccini, R., Crescenzi, O., Tancredi, T., De Casamassimi, N., Saviano, G., Scognamiglio, R., Di Donato, A., & Temussi, P.A. (2001) J. Mol. Biol. 305, 505–514.
232. Koide, S., Huang, X., Link, K., Koide, A., Bu, Z., & Engelman, D.M. (2000) Nature 403, 456–460.
233. Cohen, F.E., Sternberg, M.J.E., & Taylor, W.R. (1981) J. Mol. Biol. 148, 253–272.
234. Liao, D.I., Kapadia, G., Reddy, P., Saier, M.H., Jr., Reizer, J., & Herzberg, O. (1991) Biochemistry 30, 9583–9594.
235. Chothia, C., & Janin, J. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 4146–4150.
236. Matthews, D.A., Appelt, K., & Oatley, S.J. (1989) J. Mol. Biol. 205, 449–454.
237. Hrabal, R., Chen, Z., James, S., Bennett, H.P., & Ni, F. (1996) Nat. Struct. Biol. 3, 747–752.
238. Chothia, C., & Janin, J. (1982) Biochemistry 21, 3955–3965.
239. Xu, Z., Bernlohr, D.A., & Banaszak, L.J. (1992) Biochemistry 31, 3484–3492.
240. Jones, T.A., Bergfors, T., Sedzik, J., & Unge, T. (1988) EMBO J. 7, 1597–1604.
241. Cowan, S.W., Newcomer, M.E., & Jones, T.A. (1990) Proteins: Struct., Funct., Genet. 8, 44–61.
242. Yarbrough, D., Wachter, R.M., Kallio, K., Matz, M.V., & Remington, S.J. (2001) Proc. Natl. Acad. Sci. U.S.A. 98, 462–467.
243. Janin, J., & Chothia, C. (1980) J. Mol. Biol. 143, 95–128.
244. Rabijns, A., De Bondt, H.L., & De Ranter, C. (1997) Nat. Struct. Biol. 4, 357–360.
245. Beamer, L.J., Carroll, S.F., & Eisenberg, D. (1997) Science 276, 1861–1864.
246. Uhlin, U., & Eklund, H. (1994) Nature 370, 533–539.
247. Van Petegem, F., Contreras, H., Contreras, R., & Van Beeumen, J. (2001) J. Mol. Biol. 312, 157–165.
248. Gassner, N.C., Baase, W.A., & Matthews, B.W. (1996) Proc. Natl. Acad. Sci. U.S.A. 93, 12155–12158.
249. Liu, R., Baase, W.A., & Matthews, B.W. (2000) J. Mol. Biol. 295, 127–145.
250. Baldwin, E., Xu, J., Hajiseyedjavadi, O., Baase, W.A., & Matthews, B.W. (1996) J. Mol. Biol. 259, 542–559.
251. Hamilton, J.A., Steinrauf, L.K., Braden, B.C., Liepnieks, J., Benson, M.D., Holmgren, G., Sandgren, O., & Steen, L. (1993) J. Biol. Chem. 268, 2416–2424.
252. Lim, W.A., Farruggio, D.C., & Sauer, R.T. (1992) Biochemistry 31, 4324–4333.
253. Martensson, L.G., Jonsson, B.H., Andersson, M., Kihlgren, A., Bergenhem, N., & Carlsson, U. (1992) Biochim. Biophys. Acta 1118, 179–186.
254. Collyer, C.A., Guss, J.M., Sugimura, Y., Yoshizaki, F., & Freeman, H.C. (1990) J. Mol. Biol. 211, 617–632.
255. Connolly, M.L. (1983) Science 221, 709–713.
256. Xu, J., Baase, W.A., Baldwin, E., & Matthews, B.W. (1998) Protein Sci. 7, 158–177.
257. Bruns, C.M., & Karplus, P.A. (1995) J. Mol. Biol. 247, 125–145.
258. Morton, A., Baase, W.A., & Matthews, B.W. (1995) Biochemistry 34, 8564–8575.
259. Buckle, A.M., Cramer, P., & Fersht, A.R. (1996) Biochemistry 35, 4298–4305.
260. Varadarajan, R., & Richards, F.M. (1992) Biochemistry 31, 12315–12327.
261. McRee, D.E., Redford, S.M., Getzoff, E.D., Lepock, J.R., Hallewell, R.A., & Tainer, J.A. (1990) J. Biol. Chem. 265, 14234–14241.
262. Lim, W.A., & Sauer, R.T. (1989) Nature 339, 31–36.
263. Behe, M.J., Lattman, E.E., & Rose, G.D. (1991) Proc. Natl. Acad. Sci. U.S.A. 88, 4195–4199.
264. Hanukoglu, I., & Fuchs, E. (1983) Cell 33, 915–924.
265. Contreras-Martel, C., Martinez-Oyanedel, J., Bunster, M., Legrand, P., Piras, C., Vernede, X., & Fontecilla-Camps, J.C. (2001) Acta Crystallogr., D 57, 52–60.
266. Charron, C., Kadri, A., Robert, M.C., Giege, R., & Lorber, B. (2002) Acta Crystallogr., D 58, 2060–2065.
267. Bjorkman, A.J., Binnie, R.A., Zhang, H., Cole, L.B., Hermodson, M.A., & Mowbray, S.L. (1994) J. Biol. Chem. 269, 30206–30211.
268. Matthews, B.W. (1977) in The Proteins, 3rd Ed. (Neurath, H., & Hill, R.L., Eds.) Vol. III, pp 404–590, Academic Press, New York.
269. Blake, C.C., Pulford, W.C., & Artymiuk, P.J. (1983) J. Mol. Biol. 167, 693–723.
270. Noguchi, S., Satow, Y., Uchida, T., Sasaki, C., & Matsuzaki, T. (1995) Biochemistry 34, 15583–15591.
271. Levitt, M., & Park, B.H. (1993) Structure 1, 223–226.
272. Tanaka, N., Arai, J., Inokuchi, N., Koyama, T., Ohgi, K., Irie, M., & Nakamura, K.T. (2000) J. Mol. Biol. 298, 859–873.
273. Katti, S.K., LeMaster, D.M., & Eklund, H. (1990) J. Mol. Biol. 212, 167–184.
274. Knox, J.R., & Moews, P.C. (1991) J. Mol. Biol. 220, 435–455.
275. Malin, R., Zielenkiewicz, P., & Saenger, W. (1991) J. Biol. Chem. 266, 4848–4852.
276. Carrell, C.J., Schlarb, B.G., Bendall, D.S., Howe, C.J., Cramer, W.A., & Smith, J.L. (1999) Biochemistry 38, 9590–9599.
277. Lu, G., Lindqvist, Y., Schneider, G., Dwivedi, U., & Campbell, W. (1995) J. Mol. Biol. 248, 931–948.
278. Otting, G., Liepinsh, E., & Wuthrich, K. (1991) Science 254, 974–980.
279. Denisov, V.P., & Halle, B. (1995) J. Mol. Biol. 245, 682–697.
280. Furey, W., Wang, B.C., Yoo, C.S., & Sax, M. (1983) J. Mol. Biol. 167, 661–692.
281. Ramaswamy, S., Eklund, H., & Plapp, B.V. (1994) Biochemistry 33, 5230–5237.
282. Benning, M.M., Smith, A.F., Wells, M.A., & Holden, H.M. (1992) J. Mol. Biol. 228, 208–219.
283. van Aalten, D.M., Synstad, B., Brurberg, M.B., Hough, E., Riise, B.W., Eijsink, V.G., & Wierenga, R.K. (2000) Proc. Natl. Acad. Sci. U.S.A. 97, 5842–5847.
284. Denisov, V.P., Peters, J., Horlein, H.D., & Halle, B. (1996) Nat. Struct. Biol. 3, 505–509.
285. Leslie, A.G., Moody, P.C., & Shaw, W.V. (1988) Proc. Natl. Acad. Sci. U.S.A. 85, 4133–4137.
286. Housset, D., Habersetzer-Rochat, C., Astier, J.P., & Fontecilla-Camps, J.C. (1994) J. Mol. Biol. 238, 88–103.
287. Fontecilla-Camps, J.C., Habersetzer-Rochat, C., & Rochat, H. (1988) Proc. Natl. Acad. Sci. U.S.A. 85, 7443–7447.
288. Yu, B., Blaber, M., Gronenborn, A.M., Clore, G.M., & Caspar, D.L. (1999) Proc. Natl. Acad. Sci. U.S.A. 96, 103–108.
289. Ernst, J.A., Clubb, R.T., Zhou, H.X., Gronenborn, A.M., & Clore, G.M. (1995) Science 267, 1813–1817.
290. Schrag, J.D., & Cygler, M. (1993) J. Mol. Biol. 230, 575–591.
291. Nakasako, M. (1999) J. Mol. Biol. 289, 547–564.
292. Teeter, M.M. (1984) Proc. Natl. Acad. Sci. U.S.A. 81, 6014–6018.
293. Kielkopf, C.L., Ding, S., Kuhn, P., & Rees, D.C. (2000) J. Mol. Biol. 296, 787–801.
294. Watenpaugh, K.D., Margulis, T.N., Sieker, L.C., & Jensen, L.H. (1978) J. Mol. Biol. 122, 175–190.
295. Thanki, N., Thornton, J.M., & Goodfellow, J.M. (1988) J. Mol. Biol. 202, 637–657.
296. Jiang, J.S., & Brunger, A.T. (1994) J. Mol. Biol. 243, 100–115.
297. Svergun, D.I., Richard, S., Koch, M.H., Sayers, Z., Kuprin, S., & Zaccai, G. (1998) Proc. Natl. Acad. Sci. U.S.A. 95, 2267–2272.
298. Denisov, V.P., & Halle, B. (1994) J. Am. Chem. Soc. 116, 10324–10325.
299. Scanlon, W.J., & Eisenberg, D. (1975) J. Mol. Biol. 98, 485–502.
300. Fisher, H.F. (1965) Biochim. Biophys. Acta 109, 544–550.
301. Grant, E.H., Mitton, B.G., South, G.P., & Sheppard, R.J. (1974) Biochem. J. 139, 375–380.
302. McMeekin, T.L., Groves, M.L., & Hipp, N.J. (1954) J. Am. Chem. Soc. 76, 407–413.
303. Bull, H.B., & Breese, K. (1968) Arch. Biochem. Biophys. 128, 488–496.
304. Arakawa, T., & Timasheff, S.N. (1982) Biochemistry 21, 6536–6544.
305. Lee, J.C., & Timasheff, S.N. (1981) J. Biol. Chem. 256, 7193–7201.
306. McClure, R.J., & Craven, B.M. (1974) J. Mol. Biol. 83, 551–555.
307. Kuntz, I.D., Brassfield, T.S., Law, G.D., & Purcell, G.V. (1969) Science 163, 1329–1331.
308. Tanford, C. (1961) Physical Chemistry of Macromolecules, John Wiley, New York.
309. Kumosinski, T.F., & Pessen, H. (1982) Arch. Biochem. Biophys. 219, 89–100.
310. Wang, J.H. (1954) J. Am. Chem. Soc. 76, 4755–4763.
311. Oncley, J.L. (1943) in Proteins, Amino Acids, and Peptides (Cohn, E.J., & Edsall, J.T., Eds.) pp 543–568, Reinhold, New York.
312. Usha, M.G., & Wittebort, R.J. (1989) J. Mol. Biol. 208, 669–678.
313. Bull, H.B., & Breese, K. (1968) Arch. Biochem. Biophys. 128, 497–502.
314. Miller, S., Lesk, A.M., Janin, J., & Chothia, C. (1987) Nature 328, 834–836.
315. Vossen, K.M., Wolz, R., Daugherty, M.A., & Fried, M.G. (1997) Biochemistry 36, 11640–11647.
316. Dzingeleski, G.D., & Wolfenden, R. (1993) Biochemistry 32, 9143–9147.
317. Colombo, M.F., Rau, D.C., & Parsegian, V.A. (1992) Science 256, 655–659.
318. Rand, R.P., Fuller, N.L., Butko, P., Francis, G., & Nicholls, P. (1993) Biochemistry 32, 5925–5929.
319. Sun, D.P., Sauer, U., Nicholson, H., & Matthews, B.W. (1991) Biochemistry 30, 7142–7153.
320. Sun, D.P., Soderlind, E., Baase, W.A., Wozniak, J.A., Sauer, U., & Matthews, B.W. (1991) J. Mol. Biol. 221, 873–887.
321. Loladze, V.V., Ibarra-Molero, B., Sanchez-Ruiz, J.M., & Makhatadze, G.I. (1999) Biochemistry 38, 16419–16423.
322. Spector, S., Wang, M., Carp, S.A., Robblee, J., Hendsch, Z.S., Fairman, R., Tidor, B., & Raleigh, D.P. (2000) Biochemistry 39, 872–879.
323. Escobar, L., Root, M.J., & MacKinnon, R. (1993) Biochemistry 32, 6982–6987.
324. Imoto, K., Busch, C., Sakmann, B., Mishina, M., Konno, T., Nakai, J., Bujo, H., Mori, Y., Fukuda, K., & Numa, S. (1988) Nature 335, 645–648.
325. Stocker, M., & Miller, C. (1994) Proc. Natl. Acad. Sci. U.S.A. 91, 9509–9513.
326. Rodgers, K.K., Pochapsky, T.C., & Sligar, S.G. (1988) Science 240, 1657–1659.
327. Stayton, P.S., & Sligar, S.G. (1990) Biochemistry 29, 7381–7386.
328. Fujimori, K., Sorenson, M., Herzberg, O., Moult, J., & Reinach, F.C. (1990) Nature 345, 182–184.
329. Russell, A.J., Thomas, P.G., & Fersht, A.R. (1987) J. Mol. Biol. 193, 803–813.
330. Doolittle, R.F. (1981) Science 214, 149–159.
331. Tanford, C. (1962) Adv. Protein Chem. 17, 69–165.
332. Crammer, J.L., & Neuberger, A. (1943) Biochem. J. 37, 302–310.
333. Lenstra, J.A., Bolscher, B.G., Beintema, J.J., & Kaptein, R. (1979) Eur. J. Biochem. 98, 385–397.
334. Matthew, J.B., & Richards, F.M. (1982) Biochemistry 21, 4989–4999.
335. Giletto, A., & Pace, C.N. (1999) Biochemistry 38, 13379–13384.
336. Aberg, A., Nordlund, P., & Eklund, H. (1993) Nature 361, 276–278.
337. Hecht, H.J., Kalisz, H.M., Hendle, J., Schmid, R.D., & Schomburg, D. (1993) J. Mol. Biol. 229, 153–172.
338. Gibbs, M.R., Moody, P.C., & Leslie, A.G. (1990) Biochemistry 29, 11261–11265.
339. Leslie, A.G. (1990) J. Mol. Biol. 213, 167–186.
340. Cheng, X.D., & Schoenborn, B.P. (1991) J. Mol. Biol. 220, 381–399.
341. Tainer, J.A., Getzoff, E.D., Beem, K.M., Richardson, J.S., & Richardson, D.C. (1982) J. Mol. Biol. 160, 181–217.
342. James, M.N., & Sielecki, A.R. (1986) Nature 319, 33–38.
343. Zheng, Y.J., & Ornstein, R.L. (1996) J. Am. Chem. Soc. 118, 11237–11243.
344. Simonson, T., & Brooks, C.L. (1996) J. Am. Chem. Soc. 118, 8452–8458.
345. Parsegian, A. (1969) Nature 221, 844–846.
346. Tissot, A.C., Vuilleumier, S., & Fersht, A.R. (1996) Biochemistry 35, 6786–6794.
347. Hendsch, Z.S., & Tidor, B. (1994) Protein Sci. 3, 211–226.
348. Waldburger, C.D., Jonsson, T., & Sauer, R.T. (1996) Proc. Natl. Acad. Sci. U.S.A. 93, 2629–2634.
349. Baker, E.N. (1988) J. Mol. Biol. 203, 1071–1095.
350. Baud, F., & Karlin, S. (1999) Proc. Natl. Acad. Sci. U.S.A. 96, 12494–12499.
351. Johnson, M.S., & Overington, J.P. (1993) J. Mol. Biol. 233, 716–738.
352. Schirmer, T., Huber, R., Schneider, M., Bode, W., Miller, M., & Hackert, M.L. (1986) J. Mol. Biol. 188, 651–676.
353. Warshel, A. (1987) Nature 330, 15–16.
354. Ippolito, J.A., Alexander, R.S., & Christianson, D.W. (1990) J. Mol. Biol. 215, 457–471.
355. Yao, N., Trakhanov, S., & Quiocho, F.A. (1994) Biochemistry 33, 4769–4779.
356. Murray-Rust, J., Leiper, J., McAlister, M., Phelan, J., Tilley, S., Santa Maria, J., Vallance, P., & McDonald, N. (2001) Nat. Struct. Biol. 8, 679–683.
357. Steiner, T., & Koellner, G. (2001) J. Mol. Biol. 305, 535–557.
358. Burley, S.K., & Petsko, G.A. (1986) FEBS Lett. 203, 139–143.
359. Gallivan, J.P., & Dougherty, D.A. (1999) Proc. Natl. Acad. Sci. U.S.A. 96, 9459–9464.
360. Mitchell, J.B., Nandi, C.L., McDonald, I.K., Thornton, J.M., & Price, S.L. (1994) J. Mol. Biol. 239, 315–331.
361. Knight, S., Andersson, I., & Branden, C.I. (1990) J. Mol. Biol. 215, 113–160.
362. McDonald, I.K., & Thornton, J.M. (1994) J. Mol. Biol. 238, 777–793.
363. Chattopadhyaya, R., Meador, W.E., Means, A.R., & Quiocho, F.A. (1992) J. Mol. Biol. 228, 1177–1192.
364. Kyte, J., & Doolittle, R.F. (1982) J. Mol. Biol. 157, 105–132.
365. Dardel, F., Davis, A.L., Laue, E.D., & Perham, R.N. (1993) J. Mol. Biol. 229, 1037–1048.
366. Ludwig, M.L., Metzger, A.L., Pattridge, K.A., & Stallings, W.C. (1991) J. Mol. Biol. 219, 335–358.
367. Bode, W., Walter, J., Huber, R., Wenzel, H.R., & Tschesche, H. (1984) Eur. J. Biochem. 144, 185–190.
368. Lah, M.S., Palfey, B.A., Schreuder, H.A., & Ludwig, M.L. (1994) Biochemistry 33, 1555–1564.
369. Khan, A.R., Parrish, J.C., Fraser, M.E., Smith, W.W., Bartlett, P.A., & James, M.N. (1998) Biochemistry 37, 16839–16845.
370. Jacobson, B.L., & Quiocho, F.A. (1988) J. Mol. Biol. 204, 783–787.
371. Tanford, C. (1954) J. Am. Chem. Soc. 76, 945–946.
372. Phillips, S.E. (1980) J. Mol. Biol. 142, 531–554.
373. Myers, J.K., & Pace, C.N. (1996) Biophys. J. 71, 2033–2039.
374. Carter, P.J., Winter, G., Wilkinson, A.J., & Fersht, A.R. (1984) Cell 38, 835–840.
375. Serrano, L., Horovitz, A., Avron, B., Bycroft, M., & 403. Slupphaug, G., Mol, C.D., Kavli, B., Arvai, A.S., Krokan,
Fersht, A.R. (1990) Biochemistry 29, 9343–9352. H.E., & Tainer, J.A. (1996) Nature 384, 87–92.
376. Heinz, D.W., Baase, W.A., & Matthews, B.W. (1992) 404. Horton, J.R., Nastri, H.G., Riggs, P.D., & Cheng, X.
Proc. Natl. Acad. Sci. U.S.A. 89, 3751–3755. (1998) J. Mol. Biol. 284, 1491–1504.
377. Horovitz, A., Serrano, L., Avron, B., Bycroft, M., & 405. Shakked, Z., Guzikevich-Guerstein, G., Frolow, F.,
Fersht, A.R. (1990) J. Mol. Biol. 216, 1031–1044. Rabinovich, D., Joachimiak, A., & Sigler, P.B. (1994)
378. Strop, P., & Mayo, S.L. (2000) Biochemistry 39, Nature 368, 469–473.
1251–1255. 406. Nadassy, K., Wodak, S.J., & Janin, J. (1999) Biochemistry
379. Myers, J.K., & Oas, T.G. (1999) Biochemistry 38, 38, 1999–2017.
6761–6768. 407. Huai, Q., Colandene, J.D., Topal, M.D., & Ke, H. (2001)
380. Schreiber, G., & Fersht, A.R. (1995) J. Mol. Biol. 248, Nat. Struct. Biol. 8, 665–669.
478–486. 408. Pelletier, H., Sawaya, M.R., Kumar, A., Wilson, S.H., &
381. Albeck, S., Unger, R., & Schreiber, G. (2000) J. Mol. Biol. Kraut, J. (1994) Science 264, 1891–1903.
298, 503–520. 409. Mondragon, A., & Harrison, S.C. (1991) J. Mol. Biol. 219,
382. Tronrud, D.E., Holden, H.M., & Matthews, B.W. (1987) 321–334.
Science 235, 571–574. 410. Albright, R.A., & Matthews, B.W. (1998) J. Mol. Biol. 280,
383. Morgan, B.P., Scholtz, J.M., Ballinger, M.D., Zipkin, 137–151.
I.D., & Bartlett, P.A. (1991) J. Am. Chem. Soc. 113, 411. Redinbo, M.R., Stewart, L., Kuhn, P., Champoux, J.J., &
297–307. Hol, W.G. (1998) Science 279, 1504–1513.
384. Koh, J.T., Cornish, V.W., & Schultz, P.G. (1997) 412. Weston, S.A., Lahm, A., & Suck, D. (1992) J. Mol. Biol.
Biochemistry 36, 11314–11322. 226, 1237–1256.
385. Chapman, E., Thorson, J.S., & Schultz, P.G. (1997) J. 413. Kostrewa, D., & Winkler, F.K. (1995) Biochemistry 34,
Am. Chem. Soc. 119, 7151–7152. 683–696.
386. Thorson, J.S., Chapman, E., Murphy, E.C., Schultz, 414. Jordan, S.R., & Pabo, C.O. (1988) Science 242, 893–899.
P.G., & Judice, J.K. (1995) J. Am. Chem. Soc. 117, 415. Beamer, L.J., & Pabo, C.O. (1992) J. Mol. Biol. 227,
1157–1158. 177–196.
387. Hennig, M., & Geierstanger, B.H. (1999) J. Am. Chem. 416. Arents, G., & Moudrianakis, E.N. (1993) Proc. Natl.
Soc. 121, 5123–5126. Acad. Sci. U.S.A. 90, 10489–10493.
388. Kurokawa, H., Mikami, B., & Hirose, M. (1995) J. Mol. 417. Glover, J.N., & Harrison, S.C. (1995) Nature 373,
Biol. 254, 196–207. 257–261.
389. Dewan, J.C., Mikami, B., Hirose, M., & Sacchettini, J.C. 418. Mo, Y., Vaessen, B., Johnston, K., & Marmorstein, R.
(1993) Biochemistry 32, 11963–11968. (2000) Nat. Struct. Biol. 7, 292–297.
390. He, Q.Y., Mason, A.B., Tam, B.M., MacGillivray, R.T., & 419. Mo, Y., Ho, W., Johnston, K., & Marmorstein, R. (2001)
Woodworth, R.C. (1999) Biochemistry 38, 9704–9711. J. Mol. Biol. 314, 495–506.
391. MacGillivray, R.T., Bewley, M.C., Smith, C.A., He, Q.Y., 420. Somers, W.S., & Phillips, S.E. (1992) Nature 359,
Mason, A.B., Woodworth, R.C., & Baker, E.N. (2000) 387–393.
Biochemistry 39, 1211–1216. 421. Kamada, K., Horiuchi, T., Ohsumi, K., Shimamoto, N.,
392. Nurizzo, D., Baker, H.M., He, Q.Y., MacGillivray, R.T., & Morikawa, K. (1996) Nature 383, 598–603.
Mason, A.B., Woodworth, R.C., & Baker, E.N. (2001) 422. Raumann, B.E., Rould, M.A., Pabo, C.O., & Sauer, R.T.
Biochemistry 40, 1616–1623. (1994) Nature 367, 754–757.
393. Loh, S.N., & Markley, J.L. (1994) Biochemistry 33, 423. Deibert, M., Grazulis, S., Sasnauskas, G., Siksnys, V., &
1029–1036. Huber, R. (2000) Nat. Struct. Biol. 7, 792–799.
394. Bowers, P.M., & Klevit, R.E. (1996) Nat. Struct. Biol. 3, 424. Pavletich, N.P., & Pabo, C.O. (1991) Science 252,
522–531. 809–817.
395. Loh, S.N., & Markley, J.L. (1993) in Techniques in 425. Otwinowski, Z., Schevitz, R.W., Zhang, R.G., Lawson,
Protein Chemistry IV (Angeletti, R. H., Ed.) pp 517–524, C.L., Joachimiak, A., Marmorstein, R.Q., Luisi, B.F., &
Academic Press, San Diego, CA. Sigler, P.B. (1988) Nature 335, 321–329.
396. Bowers, P.M., & Klevit, R.E. (2000) J. Am. Chem. Soc. 426. Batchelor, A.H., Piper, D.E., de la Brousse, F.C.,
122, 1030–1033. McKnight, S.L., & Wolberger, C. (1998) Science 279,
397. Khare, D., Alexander, P., & Orban, J. (1999) 1037–1041.
Biochemistry 38, 3918–3925. 427. Muller, C.W., Rey, F.A., Sodeoka, M., Verdine, G.L., &
398. Dauter, Z., Wilson, K.S., Sieker, L.C., Meyer, J., & Harrison, S.C. (1995) Nature 373, 311–317.
Moulis, J.M. (1997) Biochemistry 36, 16065–16073. 428. Ghosh, G., van Duyne, G., Ghosh, S., & Sigler, P.B.
399. Kuhn, P., Knapp, M., Soltis, S.M., Ganshaw, G., Thoene, (1995) Nature 373, 303–310.
M., & Bott, R. (1998) Biochemistry 37, 13446–13452. 429. Hegde, R.S., Grossman, S.R., Laimins, L.A., & Sigler,
400. Morales, R., Chron, M.H., Hudry-Clergeon, G., Petillot, P.B. (1992) Nature 359, 505–512.
Y., Norager, S., Medina, M., & Frey, M. (1999) 430. Clarke, N.D., Beamer, L.J., Goldberg, H.R., Berkower,
Biochemistry 38, 15764–15773. C., & Pabo, C.O. (1991) Science 254, 267–270.
401. Schwartz, B., Drueckhammer, D.G., Usher, K.C., & 431. Kissinger, C.R., Liu, B.S., Martin-Blanco, E., Kornberg,
Remington, S.J. (1995) Biochemistry 34, 15459–15466. T.B., & Pabo, C.O. (1990) Cell 63, 579–590.
402. Klimasauskas, S., Kumar, S., Roberts, R.J., & Cheng, X. 432. Fairall, L., Schwabe, J.W., Chapman, L., Finch, J.T., &
(1994) Cell 76, 357–369. Rhodes, D. (1993) Nature 366, 483–487.
References 341
433. Wuttke, D.S., Foster, M.P., Case, D.A., Gottesfeld, J.M., B.A., & Jen-Jacobson, L. (1993) Proc. Natl. Acad. Sci.
& Wright, P.E. (1997) J. Mol. Biol. 273, 183–206. U.S.A. 90, 7548–7552.
434. Kwon, H.J., Bennik, M.H., Demple, B., & Ellenberger, T. 463. King, D.A., Zhang, L., Guarente, L., & Marmorstein, R.
(2000) Nat. Struct. Biol. 7, 424–430. (1999) Nat. Struct. Biol. 6, 64–71.
435. Keller, W., Konig, P., & Richmond, T.J. (1995) J. Mol. 464. Lima, C.D., Wang, J.C., & Mondragon, A. (1994) Nature
Biol. 254, 657–667. 367, 138–146.
436. Hovde, S., Abate-Shen, C., & Geiger, J.H. (2001) 465. Walker, J.R., Corpina, R.A., & Goldberg, J. (2001) Nature
Biochemistry 40, 12013–12021. 412, 607–614.
437. Reddy, C.K., Das, A., & Jayaram, B. (2001) J. Mol. Biol. 466. Moarefi, I., Jeruzalmi, D., Turner, J., O’Donnell, M., &
314, 619–632. Kuriyan, J. (2000) J. Mol. Biol. 296, 1215–1223.
438. Luisi, B.F., Xu, W.X., Otwinowski, Z., Freedman, L.P., 467. Soumillion, P., Sexton, D.J., & Benkovic, S.J. (1998)
Yamamoto, K.R., & Sigler, P.B. (1991) Nature 352, Biochemistry 37, 1819–1827.
497–505. 468. Latham, G.J., Dong, F., Pietroni, P., Dozono, J.M.,
439. Meinke, G., & Sigler, P.B. (1999) Nat. Struct. Biol. 6, Bacheller, D.J., & von Hippel, P.H. (1999) Proc. Natl.
471–477. Acad. Sci. U.S.A. 96, 12448–12453.
440. Newman, M., Strzelecka, T., Dorner, L.F., Schildkraut, 469. Bochkarev, A., Pfuetzner, R.A., Edwards, A.M., &
I., & Aggarwal, A.K. (1995) Science 269, 656–663. Frappier, L. (1997) Nature 385, 176–181.
441. Bell, C.E., & Lewis, M. (2000) Nat. Struct. Biol. 7, 470. Raghunathan, S., Kozlov, A.G., Lohman, T.M., &
209–214. Waksman, G. (2000) Nat. Struct. Biol. 7, 648–652.
442. Wojciak, J.M., Iwahara, J., & Clubb, R.T. (2001) Nat. 471. Classen, S., Ruggles, J.A., & Schultz, S.C. (2001) J. Mol.
Struct. Biol. 8, 84–90. Biol. 314, 1113–1125.
443. Pellegrini, L., Tan, S., & Richmond, T.J. (1995) Nature 472. Shi, H., & Moore, P.B. (2000) RNA 6, 1091–1105.
376, 490–498. 473. Quigley, G.J., & Rich, A. (1976) Science 194, 796–806.
444. Santelli, E., & Richmond, T.J. (2000) J. Mol. Biol. 297, 474. Quigley, G.J., Wang, A.H., Seeman, N.C., Suddath, F.L.,
437–449. Rich, A., Sussman, J.L., & Kim, S.H. (1975) Proc. Natl.
445. Kosa, P.F., Ghosh, G., DeDecker, B.S., & Sigler, P.B. Acad. Sci. U.S.A. 72, 4866–4870.
(1997) Proc. Natl. Acad. Sci. U.S.A. 94, 6042–6047. 475. Kim, S.H., Suddath, F.L., Quigley, G.J., McPherson, A.,
446. Kim, J.L., & Burley, S.K. (1994) Nat. Struct. Biol. 1, Sussman, J.L., Wang, A.H., Seeman, N.C., & Rich, A.
638–653. (1974) Science 185, 435–440.
447. Starich, M.R., Wikstrom, M., Schumacher, S., Arst, 476. Kim, S.H., Sussman, J.L., Suddath, F.L., Quigley, G.J.,
H.N., Jr., Gronenborn, A.M., & Clore, G.M. (1998) J. Mol. McPherson, A., Wang, A.H., Seeman, N.C., & Rich, A.
Biol. 277, 621–634. (1974) Proc. Natl. Acad. Sci. U. S. A. 71, 4970–4974.
448. Cho, Y., Gorina, S., Jeffrey, P.D., & Pavletich, N.P. (1994) 477. Suddath, F.L., Quigley, G.J., McPherson, A., Sneden, D.,
Science 265, 346–355. Kim, J.J., Kim, S.H., & Rich, A. (1974) Nature 248, 20–
449. Aggarwal, A.K., Rodgers, D.W., Drottar, M., Ptashne, 24.
M., & Harrison, S.C. (1988) Science 242, 899–907. 478. Bjork, G.R. (1995) in tRNA: Structure, Biosynthesis, and
450. Schumacher, M.A., Choi, K.Y., Zalkin, H., & Brennan, Function (Söll, D., & RajBhandary, U.L., Eds.), ASM
R.G. (1994) Science 266, 763–770. Press, Washington, DC.
451. Kim, J.L., Nikolov, D.B., & Burley, S.K. (1993) Nature 479. Bjork, G.R., Ericson, J.U., Gustafsson, C.E., Hagervall,
365, 520–527. T.G., Jonsson, Y.H., & Wikstrom, P.M. (1987) Annu. Rev.
452. Kim, Y., Geiger, J.H., Hahn, S., & Sigler, P.B. (1993) Biochem. 56, 263–287.
Nature 365, 512–520. 480. Ennifar, E., Nikulin, A., Tishchenko, S., Serganov, A.,
453. Schultz, S.C., Shields, G.C., & Steitz, T.A. (1991) Science Nevskaya, N., Garber, M., Ehresmann, B., Ehresmann,
253, 1001–1007. C., Nikonov, S., & Dumas, P. (2000) J. Mol. Biol. 304,
454. Shimon, L.J., & Harrison, S.C. (1993) J. Mol. Biol. 232, 35–42.
826–838. 481. Oubridge, C., Ito, N., Evans, P.R., Teo, C.H., & Nagai, K.
455. Mol, C.D., Izumi, T., Mitra, S., & Tainer, J.A. (2000) (1994) Nature 372, 432–438.
Nature 403, 451–456. 482. Rould, M.A., Perona, J.J., & Steitz, T.A. (1991) Nature
456. Li, T., Stark, M.R., Johnson, A.D., & Wolberger, C. (1995) 352, 213–218.
Science 270, 262–269. 483. Antson, A.A., Dodson, E.J., Dodson, G., Greaves, R.B.,
457. Luger, K., Mader, A.W., Richmond, R.K., Sargent, D.F., Chen, X., & Gollnick, P. (1999) Nature 401, 235–242.
& Richmond, T.J. (1997) Nature 389, 251–260. 484. Valegard, K., Murray, J.B., Stonehouse, N.J., van den
458. Tan, S., & Richmond, T.J. (1998) Nature 391, 660– Worm, S., Stockley, P.G., & Liljas, L. (1997) J. Mol. Biol.
666. 270, 724–738.
459. Sierk, M.L., Zhao, Q., & Rastinejad, F. (2001) 485. Batey, R.T., Rambo, R.P., Lucast, L., Rha, B., & Doudna,
Biochemistry 40, 12833–12843. J.A. (2000) Science 287, 1232–1239.
460. Obmolova, G., Ban, C., Hsieh, P., & Yang, W. (2000) 486. Biou, V., Yaremchuk, A., Tukalo, M., & Cusack, S. (1994)
Nature 407, 703–710. Science 263, 1404–1410.
461. Lamers, M.H., Perrakis, A., Enzlin, J.H., Winterwerp, 487. Frugier, M., Soll, D., Giege, R., & Florentz, C. (1994)
H.H., de Wind, N., & Sixma, T.K. (2000) Nature 407, Biochemistry 33, 9912–9921.
711–717. 488. Rould, M.A., Perona, J.J., Soll, D., & Steitz, T.A. (1989)
462. Lesser, D.R., Kurpiewski, M.R., Waters, T., Connolly, Science 246, 1135–1142.
342 Atomic Details
489. Stark, H., Dube, P., Luhrmann, R., & Kastner, B. (2001) 511. Miller, J., McLachlan, A.D., & Klug, A. (1985) EMBO J. 4,
Nature 409, 539–542. 1609–1614.
490. Price, S.R., Evans, P.R., & Nagai, K. (1998) Nature 394, 512. Brown, R.S., Sander, C., & Argos, P. (1985) FEBS Lett.
645–650. 186, 271–274.
491. Shevack, A., Gewitz, H.S., Hennemann, B., Yonath, A., 513. Nolte, R.T., Conlin, R.M., Harrison, S.C., & Brown, R.S.
& Wittmann, H.G. (1985) FEBS Lett. 184, 68–71. (1998) Proc. Natl. Acad. Sci. U.S.A. 95, 2938–2943.
492. von Bohlen, K., Makowski, I., Hansen, H.A., Bartels, H., 514. Dill, K.A., Alonso, D.O., & Hutchinson, K. (1989)
Berkovitch-Yellin, Z., Zaytzev-Bashan, A., Meyer, S., Biochemistry 28, 5439–5449.
Paulke, C., Franceschi, F., & Yonath, A. (1991) J. Mol. 515. Honzatko, R.B., Crawford, J.L., Monaco, H.L., Ladner,
Biol. 222, 11–15. J.E., Ewards, B.F.P., Evans, D.R., Warren, S.G., Wiley,
493. Yonath, A., Glotz, C., Gewitz, H.S., Bartels, K.S., von D.C., Ladner, R.C., & Lipscomb, W.N. (1982) J. Mol.
Bohlen, K., Makowski, I., & Wittmann, H.G. (1988) J. Biol. 160, 219–263.
Mol. Biol. 203, 831–834. 516. Rosenbusch, J.P., & Weber, K. (1971) Proc. Natl. Acad.
494. Karpova, E.A., Serdiuk, I.N., Tarkhovskii, I.S., Orlova, Sci. U.S.A. 68, 1019–1023.
E.V., & Boroviagin, V.L. (1986) Dokl. Akad. Nauk SSSR 517. Nelbach, M.E., Pigiet, V.P., Jr., Gerhart, J.C., &
289, 1263–1266. Schachman, H.K. (1972) Biochemistry 11, 315–327.
495. Trakhanov, S.D., Yusupov, M.M., Agalarov, S.C., 518. Carvajal, N., Venegas, A., Oestreicher, G., & Plaza, M.
Garber, M.B., Ryazantsev, S.N., Tischenko, S.V., & (1971) Biochim. Biophys. Acta 250, 437–442.
Shirokov, V.A. (1987) FEBS Lett. 220, 319–322. 519. Kim, N.N., Cox, J.D., Baggio, R.F., Emig, F.A., Mistry,
496. Trakhanov, S., Yusupov, M., Shirokov, V., Garber, M., S.K., Harper, S.L., Speicher, D.W., Morris, S.M., Jr., Ash,
Mitschler, A., Ruff, M., Thierry, J.C., & Moras, D. (1989) D.E., Traish, A., & Christianson, D.W. (2001)
J. Mol. Biol. 209, 327–328. Biochemistry 40, 2678–2688.
497. Yusupov, M.M., Trakhanov, S.D., Barynin, V.V., 520. Pearson, R.G. (1966) Science 151, 1721–1727.
Boroviagin, V.L., Garber, M.B., Sedelnikova, S.E., 521. Martin, R.B. (1984) in Metal Ions in Biological Systems:
Selivanova, O.M., Tishchenko, S.V., Shirokov, V.A., & Volume 17, Calcium and Its Role in Biology (Sigel, H.,
Edintsov, I.M. (1987) Dokl. Akad. Nauk SSSR 292, Ed.) pp 1–50, Marcel Dekker, New York.
1271–1274. 522. Lide, D.R. (1998) CRC Handbook of Chemistry and
498. Wimberly, B.T., Brodersen, D.E., Clemons, W.M., Jr., Physics, 79th Ed., CRC Press, Boca Raton, FL.
Morgan-Warren, R.J., Carter, A.P., Vonrhein, C., Hartsch, 523. Einspahr, H., & Bugg, C.E. (1984) in Metal Ions in
T., & Ramakrishnan, V. (2000) Nature 407, 327–339. Biological Systems: Volume 17, Calcium and Its Role in
499. Yusupov, M.M., Yusupova, G.Z., Baucom, A., Biology (Sigel, H., Ed.) pp 51–97, Marcel Dekker, New
Lieberman, K., Earnest, T.N., Cate, J.H., & Noller, H.F. York.
(2001) Science 292, 883–896. 524. Kim, K.H., Pan, Z., Honzatko, R.B., Ke, H.M., &
500. Ban, N., Nissen, P., Hansen, J., Moore, P.B., & Steitz, Lipscomb, W.N. (1987) J. Mol. Biol. 196, 853–875.
T.A. (2000) Science 289, 905–920. 525. Holmes, M.A., & Matthews, B.W. (1981) Biochemistry
501. Harms, J., Schluenzen, F., Zarivach, R., Bashan, A., Gat, 20, 6912–6920.
S., Agmon, I., Bartels, H., Franceschi, F., & Yonath, A. 526. Wouters, J. (1998) Protein Sci. 7, 2472–2475.
(2001) Cell 107, 679–688. 527. De Wall, S.L., Meadows, E.S., Barbour, L.J., & Gokel,
502. Schluenzen, F., Tocilj, A., Zarivach, R., Harms, J., G.W. (2000) Proc. Natl. Acad. Sci. U.S.A. 97, 6271–6276.
Gluehmann, M., Janell, D., Bashan, A., Bartels, H., 528. DeLaBarre, B., Thompson, P.R., Wright, G.D., &
Agmon, I., Franceschi, F., & Yonath, A. (2000) Cell 102, Berghuis, A.M. (2000) Nat. Struct. Biol. 7, 238–244.
615–623. 529. Mueller, U., Perl, D., Schmid, F.X., & Heinemann, U.
503. Pioletti, M., Schlunzen, F., Harms, J., Zarivach, R., (2000) J. Mol. Biol. 297, 975–988.
Gluhmann, M., Avila, H., Bashan, A., Bartels, H., 530. Wells, C.M., & Di Cera, E. (1992) Biochemistry 31,
Auerbach, T., Jacobi, C., Hartsch, T., Yonath, A., & 11721–11730.
Franceschi, F. (2001) EMBO J. 20, 1829–1839. 531. Rhee, S., Parris, K.D., Ahmed, S.A., Miles, E.W., &
504. Ramakrishnan, V. (2002) Cell 108, 557–572. Davies, D.R. (1996) Biochemistry 35, 4211–4221.
505. Ogle, J.M., Brodersen, D.E., Clemons, W.M., Jr., Tarry, 532. Toney, M.D., Hohenester, E., Cowan, S.W., &
M.J., Carter, A.P., & Ramakrishnan, V. (2001) Science Jansonius, J.N. (1993) Science 261, 756–759.
292, 897–902. 533. Isupov, M.N., Antson, A.A., Dodson, E.J., Dodson, G.G.,
506. Thompson, J., Kim, D.F., O’Connor, M., Lieberman, Dementieva, I.S., Zakomirdina, L.N., Wilson, K.S.,
K.R., Bayfield, M.A., Gregory, S.T., Green, R., Noller, Dauter, Z., Lebedev, A.A., & Harutyunyan, E.H. (1998)
H.F., & Dahlberg, A.E. (2001) Proc. Natl. Acad. Sci. J. Mol. Biol. 276, 603–623.
U.S.A. 98, 9002–9007. 534. Larsen, T.M., Laughlin, L.T., Holden, H.M., Rayment, I.,
507. Nissen, P., Hansen, J., Ban, N., Moore, P.B., & Steitz, & Reed, G.H. (1994) Biochemistry 33, 6301–6309.
T.A. (2000) Science 289, 920–930. 535. Greasley, S.E., Horton, P., Ramcharan, J., Beardsley,
508. Pavletich, N.P., & Pabo, C.O. (1993) Science 261, G.P., Benkovic, S.J., & Wilson, I.A. (2001) Nat. Struct.
1701–1707. Biol. 8, 402–406.
509. Choo, Y., & Klug, A. (1994) Proc. Natl. Acad. Sci. U.S.A. 536. Zhou, Y., Morais-Cabral, J.H., Kaufman, A., &
91, 11168–11172. MacKinnon, R. (2001) Nature 414, 43–48.
510. Choo, Y., & Klug, A. (1994) Proc. Natl. Acad. Sci. U.S.A. 537. Betzel, C., Pal, G.P., & Saenger, W. (1988) Eur. J.
91, 11163–11167. Biochem. 178, 155–171.
References 343
538. Bajorath, J., Hinrichs, W., & Saenger, W. (1988) Eur. J. 555. Guss, J.M., & Freeman, H.C. (1983) J. Mol. Biol. 169,
Biochem. 176, 441–447. 521–563.
539. Acharya, K.R., Stuart, D.I., Walker, N.P., Lewis, M., & 556. Norris, G.E., Anderson, B.F., & Baker, E.N. (1983) J. Mol.
Phillips, D.C. (1989) J. Mol. Biol. 208, 99–127. Biol. 165, 501–521.
540. Teplyakov, A.V., Kuranova, I.P., Harutyunyan, E.H., 557. Garrett, T.P.J., Clingeleffer, D.J., Guss, J.M., Rogers, S.J.,
Vainshtein, B.K., Frommel, C., Hohne, W.E., & Wilson, & Freeman, H.C. (1984) J. Biol. Chem. 259, 2822–
K.S. (1990) J. Mol. Biol. 214, 261–279. 2825.
541. Chakrabarti, P. (1990) Biochemistry 29, 651–658. 558. Kerr, M.C., Preston, H.S., Ammon, H.L., Huheey, J.E., &
542. Snyder, E.E., Buoscio, B.W., & Falke, J.J. (1990) Stewart, J.M. (1981) J. Coord. Chem. 11, 111–115.
Biochemistry 29, 3937–3943. 559. Lee, Y.H., Deka, R.K., Norgard, M.V., Radolf, J.D., &
543. Kankare, J., Salminen, T., Lahti, R., Cooperman, B.S., Hasemann, C.A. (1999) Nat. Struct. Biol. 6, 628–
Baykov, A.A., & Goldman, A. (1996) Biochemistry 35, 633.
4670–4677. 560. Miller, W.T., Hill, K.A., & Schimmel, P. (1991)
544. Ray, W.J., Jr., & Multani, J.S. (1972) Biochemistry 11, Biochemistry 30, 6970–6976.
2805–2812. 561. Qian, X., Gozani, S.N., Yoon, H., Jeon, C.J., Agarwal, K.,
545. Kuo, C.F., McRee, D.E., Fisher, C.L., O’Handley, S.F., & Weiss, M.A. (1993) Biochemistry 32, 9944–9959.
Cunningham, R.P., & Tainer, J.A. (1992) Science 258, 562. Green, L.M., & Berg, J.M. (1989) Proc. Natl. Acad. Sci.
434–440. U.S.A. 86, 4047–4051.
546. Guan, Y., Manuel, R.C., Arvai, A.S., Parikh, S.S., Mol, 563. Kraulis, P.J., Raine, A.R., Gadhavi, P.L., & Laue, E.D.
C.D., Miller, J.H., Lloyd, S., & Tainer, J.A. (1998) Nat. (1992) Nature 356, 448–450.
Struct. Biol. 5, 1058–1064. 564. Pan, T., & Coleman, J.E. (1990) Proc. Natl. Acad. Sci.
547. Wedekind, J.E., Frey, P.A., & Rayment, I. (1995) U.S.A. 87, 2077–2081.
Biochemistry 34, 11049–11061. 565. Swaminathan, K., Flynn, P., Reece, R.J., &
548. Nagashima, S., Nakasako, M., Dohmae, N., Tsujimura, Marmorstein, R. (1997) Nat. Struct. Biol. 4, 751–759.
M., Takio, K., Odaka, M., Yohda, M., Kamiya, N., & 566. Benning, M.M., Kuo, J.M., Raushel, F.M., & Holden,
Endo, I. (1998) Nat. Struct. Biol. 5, 347–351. H.M. (1995) Biochemistry 34, 7973–7978.
549. Jabri, E., Carr, M.B., Hausinger, R.P., & Karplus, P.A. 567. White, A., Ding, X., vanderSpek, J.C., Murphy, J.R., &
(1995) Science 268, 998–1004. Ringe, D. (1998) Nature 394, 502–506.
550. Teixeira, M., Moura, I., Xavier, A.V., Dervartanian, D.V., 568. Tao, X., & Murphy, J.R. (1992) J. Biol. Chem. 267,
Legall, J., Peck, H.D., Jr., Huynh, B.H., & Moura, J.J.G. 21761–21764.
(1983) Eur. J. Biochem. 130, 481–484. 569. Zheng, Y.J., Xia, Z., Chen, Z., Mathews, F.S., & Bruice,
551. Ragsdale, S.W., Clark, J.E., Ljungdahl, L.G., Lundie, L.L., T.C. (2001) Proc. Natl. Acad. Sci. U.S.A. 98, 432–
& Drake, H.L. (1983) J. Biol. Chem. 258, 2364–2369. 434.
552. Ellefson, W.L., Whitman, W.B., & Wolfe, R.S. (1982) 570. Skarzynski, T., Moody, P.C., & Wonacott, A.J. (1987) J.
Proc. Natl. Acad. Sci. U.S.A. 79, 3707–3710. Mol. Biol. 193, 171–187.
553. Hamilton, C.L., Scott, R.A., & Johnson, M.K. (1989) J. 571. Nicholson, H., Anderson, D.E., Dao-pin, S., &
Biol. Chem. 264, 11605–11613. Matthews, B.W. (1991) Biochemistry 30, 9816–9828.
554. Chan, M.K., Mukund, S., Kletzin, A., Adams, M.W., & 572. Andersson, I. (1996) J. Mol. Biol. 259, 160–174.
Rees, D.C. (1995) Science 267, 1463–1469. 573. Kraulis, P.J. (1991) J. Applied Crystallog. 24, 946–950.
Chapter 7
Evolution
Although it is mutations in the DNA that produce the diversity upon which natural selection operates, it is within the proteins encoded by that DNA that most of the diversity is expressed. Consequently, natural selection accepts or rejects mutated proteins, not mutated genes. The two genes encoding the two calmodulins in Arbacia punctulata, which arose from the duplication of a single gene, differ in nucleotide sequence from each other at 45 out of 393 positions, but in the two calmodulins themselves, which unlike the genes have been continuously scrutinized by natural selection, only two of those 45 differences have been permitted to change the amino acid sequence.1

It is within the proteins existing today that the history of evolution by natural selection can be read. The later episodes of this history are read by comparing the amino acid sequences of the same protein from different species. Two new species arise from one ancestral species as soon as subpopulations of that ancestral species become so different from each other that two individuals of different sex, one from each of the subpopulations, are no longer able to breed successfully. Even when two closely related species that have only recently diverged from their common ancestor are compared, the amino acid sequences of the same respective proteins from each species will often differ at one or more positions. For example, myoglobin from domestic sheep differs in amino acid sequence at three of its 143 positions from myoglobin of domestic goats. Even the amino acid sequences of the myoglobins from human and chimpanzee differ at one of their 153 positions.

The reason for this divergence of amino acid sequence is that once speciation has occurred and interbreeding becomes impossible, two versions of the same protein are established. These two versions begin to evolve in isolation from each other, and mutations occur at random in the respective genes encoding each version. Once in a while one of these mutations in one version that produces an acceptable change in amino acid sequence of the protein is fixed by genetic drift or natural selection independent of any fixation occurring in the other version, and slowly the amino acid sequences of the encoded proteins become different, one position at a time. Because the geologic instant at which the two species were established from one common ancestral species coincides with the instant at which the two versions of the same protein began to evolve separately, the amino acid sequences retain the history of the speciation of organisms. This history can be reconstructed by comparing the amino acid sequences of the same protein from an array of different species.

Even in the most advantageous instances in which amino acid sequences are compared to each other, connections can usually be made only as far back as the common ancestors of prokaryotes and eukaryotes. What has been found, however, is that the tertiary structure of a particular protein, when viewed in crystallographic molecular models from distantly related species, changes less rapidly than its amino acid sequence during evolution by natural selection. Because of this, comparisons of crystallographic molecular models permit one to look back in evolutionary history to the time at which the individual proteins themselves were diverging from common ancestors: to the time, for example, when L-lactate dehydrogenase and glyceraldehyde-3-phosphate dehydrogenase or triose-phosphate isomerase and indole-3-glycerol-phosphate synthase diverged from their common ancestor. Through such comparisons, the speciation of proteins can be traced. Because amino acid sequences change more rapidly than tertiary structures, only a few of the pedigrees of proteins, those that diverged recently in geologic time or those in which mutations are fixed slowly, can be traced by comparing amino acid sequences. Most of our insight into the speciation of organisms has come from comparisons of amino acid sequences, but most of our insight into the speciation of proteins has come from comparisons of tertiary structures.

From the comparisons that can be made among the tertiary structures that are now available, it has become clear that the larger proteins often, if not always, have arisen during evolution by the chance fusion of two genes encoding smaller proteins, each of which could fold independently and each of which usually had an independent function prior to the fusion. As a consequence of such fusions, larger and larger proteins appeared. If a particular fusion produced a protein that was not impaired functionally, the new gene for the larger protein may have been fixed in the population by genetic drift; or, if the fusion produced a protein with advantageous features, the new gene for the larger protein may have been fixed in the population by natural selection. The history of these fusions can be observed in the existing domains from which these larger proteins
are constructed. The domains of a protein are discrete regions in the tertiary structure of that protein which arose from separate, previously independent proteins that were fused together, one after the other, to produce the present protein. Because a polypeptide shorter than about 70 amino acids usually cannot fold spontaneously to form a tertiary structure, domains, when defined in this way, are usually larger than this. They appear in the crystallographic molecular model as independently folded regions. Because they are the fundamental units in the evolution of proteins, domains must be identified by a set of conservative, objective criteria, if our description of the evolutionary history of a set of proteins is to be accurate.

It may be possible, by examining enough crystallographic molecular models, to trace the ancestry of the proteins that presently exist, in a sense to derive a molecular phylogeny of the proteins. Because most of the existing proteins were produced by fusion of smaller units, this molecular phylogeny of the proteins must be based on a reconstruction of two processes. First, the family trees of the individual, ancestrally related domains from different proteins must be reconstructed. In almost every instance these family trees must be based on patterns in which the secondary structures are arranged to form the tertiary structures of the domains being compared because similarity in amino acid sequence has been completely lost. Second, the separate events that produced each of the fusions of the independent domains to produce the larger chimeric proteins must also be reconstructed.

Although the most interesting question may be how the large array of existing proteins arose from a much smaller array of smaller proteins present in the distant past, it should be stressed that new proteins are continuously being made by this process of fusion of different pieces. We know this because, in some instances, domains that have homologous amino acid sequences can be found in otherwise completely different proteins. Because similarity in amino acid sequence usually disappears quite rapidly over geologic time, these domains must have been separately incorporated into their respective proteins fairly recently; the greater the similarity of amino acid sequence among them, the more recently the separate fusions must have occurred.

Molecular Phylogeny from Amino Acid Sequence

The amino acid sequences of a set of related polypeptides retain a record of the history of their evolution by natural selection. That record provides information about the speciation of organisms, the specialization of tissues, and the conversion of older proteins into newer ones. This evolutionary history is read from aligned amino acid sequences.

As the amino acid sequences of the same protein from different species have become available, it has usually been found that they are similar enough to be readily aligned with each other. An alignment of two or more amino acid sequences is a display in which positions that are thought to be directly related to each other from the respective sequences are aligned directly above and below each other. The decision that the aligned positions are related is based on the fact that they are occupied by the same amino acid or the fact that they are each surrounded by similar sequences of amino acids. The cytochromes c from human, corn, and yeast can be readily aligned (Figure 7–1A).2,3 There is no uncertainty about the alignment even though the three proteins are from distantly related species.

The fact that, in most instances, the amino acid sequences of the three respective proteins responsible for a particular function in humans, yeast, and corn can be readily aligned, as can the three cytochromes c, is the strongest evidence for the fact that these three species share a common ancestor. Consequently, each of the proteins that are responsible for a particular function and the amino acid sequences of which can be aligned also share a common ancestor. Any two proteins that have descended from a common ancestor are homologues of each other.

In the distant past, when only the common ancestral species was present, all of the individuals in the population of that ancestral species contained, for all practical purposes, a cytochrome c the amino acid sequence of which was the same, just as all individuals of an extant species contain a cytochrome c with the same sequence. As natural selection operated upon the genetic

Figure 7–1: Alignment of amino acid sequences of cytochromes c and replacements observed at each of the positions in the common sequence. (A) Alignment of ungapped amino acid sequences. The three amino acid sequences below the numerical scale are the aligned amino acid sequences of the cytochromes c from Homo sapiens, Zea mays, and Saccharomyces cerevisiae. The amino acid sequence of cytochrome c from Thunnus alalunga is immediately above the numerical scale, which is based on this latter sequence. Above each position in this top sequence is a list of the other amino acids found in this position in a collection of 40 cytochromes c from various eukaryotes.2,3 Letters below the horizontal lines in each of the columns are variations found among cytochromes c of animals, and letters above the horizontal lines are the additional variations found in fungi and plants, more distantly related eukaryotes. (B) Insertion of gaps for the purpose of alignment. The two aligned amino acid sequences are those for cytochromes c from T. alalunga and Paracoccus denitrificans. Each set of dashes represents a gap that must be made in one of the sequences to align it reasonably with the other sequence. You should convince yourself that the gaps are inescapable. When the two sequences are aligned in this way, the size of each gap is determined by the number of extra amino acids in the sequence that does not have a gap. (C) Gaps visualized as insertions. Instead of introducing gaps to permit the alignment shown in panel B, the extra amino acids in each intervening segment are shown as loops. This presents a more realistic picture of the situation but is a significantly more awkward method for displaying an alignment.
A
[Columns of replacement amino acids printed above the tuna sequence: positions not recoverable.]
tuna            GDVAKGKKTFVQKCAQCHTVENGGKHKVGPNLWGLFGRKTGQAEGYSYTD
                        10        20        30        40        50
human           GDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTA
corn    ASFSEAPPGNPKAGEKIFKTKCAQCHTVEKGAGHKQGPNLNGLFGRQSGTTAGYSYSA
yeast      TEFKAGSAKKGATLFKTRCLQCHTVEKGGPHKVGPNLHGIFGRHSGQAEGYSYTD

tuna    ANKSKGIVWNNDTLMEYLENPKKYIPGTKMIFAGIKKKGERQDLVAYLKSATS
                60        70        80        90       100
human   ANKNKGIIWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKATNE
corn    ANKNKAVVWEENTLYDYLLNPKKYIPGTKMVFPGLXKPQERADLIAYLKEATA
yeast   ANIKKNVLWDENNMSEYLTNPKKYIPGTKMAFGGLKKEKDRNDLITYLKKACE

B
                            10        20              30        40        50
T. alalunga         GDVAKGKKTFVQKCAQCHTV------ENGGKHKVGPNLWGLFGRKTGQAEGYSYTDANKS-----KGIV
P. denitrificans  QDGDAAKGEKEF-NKCKACHMIQAPDGTDIIKGGKTGPNLYGVVGRKIASEEGFKYGEGILEVAEKNPDLT

                  60        70                80        90       100
T. alalunga       WNNDTLMEYLENPKKYI--------PGTKMIFAGIKKKGERQDLVAYLKSATS
P. denitrificans  WTEADLIEYVTDPKPWLVKMTDDKGAKTKMTFKMGK---NQADVVAFLAQNSPDAGGDGEAA

C
[Loop diagram of the alignment in panel B, with the extra amino acids of each gap drawn as loops: layout not recoverable.]
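The alignment in Figure 7–1A lends itself to a small computational check. The following Python sketch counts the positions at which two aligned sequences are identical and tallies, for each position, the amino acids that replace the one in the tuna sequence, which is the kind of column-by-column list printed above the tuna sequence in the figure. It is only an illustration under stated assumptions: the names ALIGNED, GAP, count_identities, and column_variants are hypothetical and do not come from the text, only positions 1 to 50 of the four sequences in panel A are used (the collection of 40 cytochromes c is not available here), and the amino-terminal extensions of the corn and yeast sequences have been trimmed so that every sequence starts at position 1 of the tuna numbering.

```python
"""Count identities and tally replacements in the alignment of Figure 7-1A."""

# Positions 1-50 (tuna numbering) transcribed from Figure 7-1A. The N-terminal
# extensions of corn (8 residues) and yeast (5 residues) are trimmed here so
# that the same index refers to the same aligned position in every sequence.
ALIGNED = {
    "tuna":  "GDVAKGKKTFVQKCAQCHTVENGGKHKVGPNLWGLFGRKTGQAEGYSYTD",
    "human": "GDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTA",
    "corn":  "GNPKAGEKIFKTKCAQCHTVEKGAGHKQGPNLNGLFGRQSGTTAGYSYSA",
    "yeast": "GSAKKGATLFKTRCLQCHTVEKGGPHKVGPNLHGIFGRHSGQAEGYSYTD",
}

GAP = "-"  # assumed gap character, as used for the dashes in Figure 7-1B


def count_identities(a, b):
    """Return (identical, compared) over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    identical = compared = 0
    for x, y in zip(a, b):
        if x == GAP or y == GAP:
            continue  # a gapped column has no pair of residues to compare
        compared += 1
        if x == y:
            identical += 1
    return identical, compared


def column_variants(aligned, reference):
    """Map each 1-based position to the residues that differ from the reference."""
    ref = aligned[reference]
    variants = {}
    for i, ref_residue in enumerate(ref, start=1):
        column = {seq[i - 1] for seq in aligned.values()}
        variants[i] = column - {ref_residue, GAP}
    return variants


if __name__ == "__main__":
    names = list(ALIGNED)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            same, total = count_identities(ALIGNED[first], ALIGNED[second])
            print(f"{first}/{second}: {same} of {total} positions identical")
    variants = column_variants(ALIGNED, "tuna")
    invariant = [pos for pos, others in variants.items() if not others]
    print("invariant positions among the four sequences:", invariant)
```

Because count_identities skips any column that contains a gap, the same function can be applied to a gapped alignment such as the one in panel B, provided the two strings are first padded to equal length.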
variation present within the population of that ancestral species, varieties arose that occupied different ecological niches. These varieties eventually diverged sufficiently to become separate species. At that point, the genes for cytochrome c in these two new species became disconnected, and the amino acid sequences encoded by those genes from that time forth were altered independently and continuously by mutation, genetic drift, and natural selection. As a result of a long series of such disconnections, the distantly related species Homo sapiens, Zea mays, and Saccharomyces cerevisiae eventually appeared. The differences and similarities between the three extant amino acid sequences for the three respective cytochromes c are the accumulated result of the individual steps in this process of speciation. An underlying assumption of this description is that a function performed only by the protein encoded by a certain gene remains the exclusive property of the product of that gene as it passes from species to species. Although this is usually the case, there are isolated examples in which the amino acid sequence of a protein from one species seems to be unrelated to that of the protein performing the same role in another species and must be the result of convergent rather than divergent evolution.4

Evolution by natural selection is usually viewed from its optimistic side. Natural selection operates on the variation inherent in any large population of a given species of organisms to shift the distribution of its assembled abilities gradually in a direction that makes that species or its descendant species more successful. Beneficial traits are patiently nurtured and multiplied. The major portion of the variation upon which natural selection operates to achieve this progress is variation in the sequences of the proteins within the population of a given species.

It is unlikely, however, that more than a small number of the differences seen when two aligned amino acid sequences are compared (Figure 7–1A) reflect improvements in the ability of the individuals of that

population of its species, or became fixed, by genetic drift. When one views aligned sequences of the same protein from different species, one is examining the record of this gradual increase in entropy.

This increase in entropy, however, is biased. From examining aligned amino acid sequences of the same protein from many species, it is clear that each position in the underlying sequence that gives the protein its unique character is under a different degree of negative selective pressure. Mutations can occur with equal frequency at any position in the sequence of the DNA encoding the sequence of a protein. Each of these individual mutations is assessed by natural selection, and the majority5 disappear almost immediately because they adversely affect the function of the protein or are otherwise deleterious. For example, in the human population, there are many mutant forms of hemoglobin that bind oxygen improperly or are unstable proteins.7 These represent deleterious mutations that survive for a limited time before disappearing from the population. These mutant forms can be contrasted with fetal hemoglobin that has been fixed in the human population because it is a stable protein and has beneficial properties. The most deleterious mutation is one that kills the individual in which it arises before that individual has had an opportunity to mate or otherwise reproduce after the mutation occurs. The more critical a particular amino acid in the sequence of the protein is to its function, the less prone will that position be to substitution over time. For this reason, the aligned amino acid sequences of the same protein from an array of species evaluate the scope of the intolerance to variation expressed at each position in the sequence of the protein.

This record of intolerance can be read from examining consecutively each position in the aligned sequences of a large collection of the same proteins from different species. Above the numerical scale in Figure 7–1A, the sequence of cytochrome c from Thunnus alalunga is presented, and above each of its positions in a column of letters are tallied the amino acids found there
species to survive relative to that of individuals of other in the cytochromes c from 40 other eukaryotes.2,3 The hor-
species or their common ancestors. There is little evi- izontal lines in each of these columns of letters separate
dence that the cytochrome c from either H. sapiens or amino acids found in the sequences of cytochromes c
Z. mays is an improved version of the cytochrome c that from animals, of which far more are available, from amino
was used by their common ancestor or that any of the acids found in the sequences of cytochromes c from fungi
proteins the amino acid sequences of which are being and plants, which represent more distant relationships.
presently compared are improved versions. The majority A similar record of intolerance is observed when the
of the differences that accumulate in the sequences of amino acid sequence of a particular protein is mutated at
the same protein in two lineages, following their diver- random and the resulting mutants are selected for their
gence from their common ancestor, are neutral replace- ability to function properly.8
ments.5,6 A neutral replacement is a change of one This intolerance to substitution is most strongly
amino acid for another that is harmless enough that the manifested at an invariant position. An invariant posi-
biological function of the protein does not deteriorate tion in a protein is a position at which no replacement
sufficiently to cause the elimination of the replacement has been made over the history encompassed by the
by natural selection. These neutral replacements arise aligned amino acid sequences. A few of the positions in
from mutations in the DNA encoding the protein. Each the aligned sequences of the cytochromes c have
that is now in existence began as a mutation in the remained absolutely invariant, for example, Cysteine 14,
genome of one individual and then spread through the Cysteine 17, Histidine 18, and Methionine 80 because
these are functionally irreplaceable and consequently define a cytochrome c. Some positions such as those occupied by Glycine 6, Glycine 34, Glycine 41, Glycine 77, and Glycine 84 are invariant among the eukaryotes but are replaced in bacterial cytochromes c. Several of these glycines are examples of the fact that glycines with angles φ and ψ outside the boundaries on a Ramachandran plot (Figure 6–4B) are difficult to replace.9 Nevertheless, their eventual replacement demonstrates that a designation of invariant is always provisional. As more and more amino acid sequences of the same protein from different species become available, the number of invariant positions usually decreases.10

The fact that any designation of invariant is necessarily based on a limited set of amino acid sequences may explain why site-directed mutation of amino acids at apparently invariant positions often has little effect on the function of a protein.11–15 For example, site-directed mutation of five of the 15 highly conserved amino acids in lathosterol oxidase had little effect on its function.16 Consequently, the intuition that an invariant position must be structurally or functionally important is unreliable. The situation is even more confusing when a position known to be functionally critical nevertheless displays several replacements even among closely related species.17

Many of the changes accumulating over time seem to be conservative replacements. A conservative replacement is a replacement at a position in which only similar amino acids, either in size or in chemical properties, can be tolerated. For example, only valine, isoleucine, phenylalanine, and leucine, each of the side chains of which is a hydrocarbon, seem to occur in position 35 of eukaryotic cytochrome c (Figure 7–1A). Either glycine or alanine, the side chains of which are small, seems to be necessary in position 29. Either serine, threonine, or glutamine, the side chains of which are polar but uncharged, seems to be necessary in position 42.

It was once thought that each position in the amino acid sequence of a protein could be assigned unambiguously to one of a few categories, for example, invariant, conservative, physicochemically constant, and variable.18 When it became possible, however, to compare the amino acid sequences of the same protein from a large number of distantly related species, the majority of the replacements observed could not be easily explained. This fact suggests that even the specific designations just presented may themselves be rationalizations of more subtle processes that are not understood. A close examination of the actual results, however, sequence position by sequence position (Figure 7–1), does produce an intuitive feeling for the play of evolution.

In addition to the capacity of a particular amino acid to be tolerated at a particular position in the sequence of a protein, the nature of the genetic code itself also affects the patterns in which replacements in the sequence occur during evolution. Because there are three bases coding for each amino acid and mutation occurs one base at a time, replacements requiring one base change should be more common than those requiring two or more.5 There are, however, some interesting apparent exceptions to this generalization that occur even in the comparisons of the various eukaryotic cytochromes c (Figure 7–1). For example, at position 31, only asparagine and alanine are found; at position 72, only lysine and serine; at position 45, only lysine and glycine; and at position 19, only glycine and threonine. Each of these four replacements would require that two bases of the respective codon be mutated consecutively. Although these are unlikely events, the constraints on the occupation of these positions seem to have been severe enough to confine the replacements among the eukaryotes to these choices. In the short term, however, the difficulty of changing more than one base to effect a replacement is more acute. When the detailed history of the mutational events that have occurred during the recent evolution of artiodactyl fibrinopeptides5 was examined, it was observed that replacements requiring the mutation of only one base were far more frequent than those requiring two consecutive mutations. It is altogether likely that, in circumstances where two consecutive mutations seem to have occurred, the amino acid sequence of the protein displaying the intermediate single mutation, although it exists, has not yet been determined.

One approach to examining quantitatively the progress of natural selection has been to calculate a mutation probability for every pair of possible replacements.2 This was accomplished by reconstructing a probable sequence of events in the evolution of 10 different groups of closely related sequences. Sequences for common ancestors were predicted from alignments, and all of the replacements that should have occurred following the divergence of the progeny from that ancestor were tabulated to provide the basis for the calculation of probabilities. The results of this study were presented as mutation probabilities. A mutation probability is the probability that a certain replacement will occur during a time long enough for a particular number of replacements to accumulate for every 100 amino acids of the sequence.

Values for mutation probabilities over a period of time long enough for two replacements for every 100 amino acids (Table 7–1) register changes that occur over the short term. Almost all of the replacements with the highest mutation probabilities over this period require only one base change to occur and are also remarkably conservative. For example, the 12 most frequent replacements do not involve any change in charge number or even polarity. Replacements involving alanine are the most frequent by a considerable margin, an observation suggesting that a truncation to the β carbon is the most readily tolerated change. A large number of replacements are not tolerated well at all (mutational probabil-
amino acid sequence that is missing a segment of amino acids present in the other amino acid sequence with which the first is being aligned. For example, it is necessary to insert three gaps of 6, 5, and 8 spaces in length into the amino acid sequence of cytochrome c from T. alalunga and two gaps of 1 and 3 spaces in length into the amino acid sequence of cytochrome c-550 from Paracoccus denitrificans in order to achieve the most reasonable alignment of these two proteins (Figure 7–1B). On either side of each gap, the alignments are convincing enough to justify the insertions of the gaps required to bring those alignments into register. It must be kept in mind that in the actual polypeptide there is no gap; rather it is the other polypeptide, the one with the ungapped sequence, that has additional amino acids at that point (Figure 7–1C). The use of a gap is simply a convenient method for displaying the alignment of the sequences.

When two sequences are aligned, their similarity is usually quantified by stating their percentage of identity and the gap percentage. The percentage of identity is the percentage of the average number of positions in the two aligned sequences that are occupied by the same amino acid. The gap percentage is the number of gaps that had to be inserted for every 100 amino acids in the alignment. For example, in the alignment of the cytochromes c from T. alalunga and P. denitrificans (Figure 7–1B), an average of 110.5 positions from the two sequences are aligned, there are 38 identities for a percentage of identity of 34%, and there are 5 gaps for a gap percentage of 4.5.*

The alignments of the cytochromes c in Figure 7–1 are so obvious that they can be performed unassisted. Even for the cytochromes c from T. alalunga and P. denitrificans, with only 34% identity and 4.5 gap percent, the sequences are easily aligned by eye. The amino acid sequence of each protein changes, however, at a different rate during evolution. Although there are proteins that change more slowly than cytochrome c, such as histone H4 (20 times more slowly), calmodulin (4 times more slowly), α tubulin (2 times more slowly), ubiquitin (2 times more slowly), and protein phosphatase 2A (2 times more slowly),24 most proteins change more rapidly than cytochrome c. As more and more time has passed following the divergence of the amino acid sequences of two proteins from that of their common ancestor, the percentage of identity decreases and the gap percentage increases until it becomes difficult to align them. Appropriately programmed digital computers are used to align such distantly related sequences.25

The computational alignment of two distantly related amino acid sequences is accomplished by constructing a matrix.26 If one sequence A has p amino acids, arranged in the order $a_1 a_2 a_3 \ldots a_p$, and the other sequence B has q amino acids, arranged in the order $b_1 b_2 b_3 \ldots b_q$, the product of these two vectors is a matrix C, the coefficients of which, $c_{ij}$, are equal to $a_i \times b_j$, where $a_i$ and $b_j$ are particular amino acids. For example, in the alignment of the cytochromes c from T. alalunga and P. denitrificans (Figure 7–1B), $a_9 \times b_{11}$ would be Thr × Glu. The numerical value assigned to a particular position $c_{ij}$ in the matrix representing $a_i \times b_j$ depends on the schemes chosen to weight the comparisons.

The simplest scheme is to decide that when $a_i = b_j$, $c_{ij} = a_i \times b_j = 1$, and when $a_i \neq b_j$, $c_{ij} = a_i \times b_j = 0$. This produces a matrix the coefficients of which, $c_{ij}$, are either 1 or 0. When the amino acid in position $a_i$ in the first sequence is the same as the amino acid in position $b_j$ in the second sequence, $c_{ij} = 1$; when they are different, regardless of the difference, $c_{ij} = 0$. Such a matrix, spread upon a two-dimensional field, can be represented diagrammatically by placing a dot on every position with a score of 1 (Figure 7–2).27 In such a dot matrix, the alignment is represented by diagonal strings of dots. In the dot matrices comparing the amino acid sequence of the cytochrome c of human with those of monkey and fish in Figure 7–2, the diagonals are obvious and unbroken. In the matrix comparing those of human and bacterium, the alignment is a set of at least three diagonal segments that can be picked out by eye if the figure is tilted and viewed along the diagonal direction. The offsets between the diagonal segments are the gaps in the alignments.

There are, however, 231 different outcomes† for $a_i \times b_j$ if one assumes symmetry, namely, that Glu × Thr = Thr × Glu, if one treats cysteine and cystine as separate amino acids, and if one treats each of the 21 types of identity as a unique result. It has always seemed that some of these 231 outcomes are more probable and that recognition of this probability with the proper weighting scheme might enhance the ability to align distantly related sequences. Each of the more than 18 available weighting schemes28 is a table of numbers assigned to the 231 possible identities and replacements. Each of these entries reflects the author’s view of the probability that such an outcome is the result of evolution by natural selection.

The ultimate goal in aligning two amino acid sequences is to decide whether position $a_i$ in sequence A and position $b_j$ in sequence B arose from the same position in the sequence of a common ancestor or position $a_i$

* There is no agreement as to the length of amino acid sequence to be used in calculating the percentage of identity. The most common choice is the length of the shorter sequence. The justification for this choice is that the inserts in the longer protein cannot be compared to anything and should therefore be discounted. This choice, however, in a self-contradiction, ignores the inserts in the smaller protein. Probably, the best choice would be to use the length of the common amino acid sequence in which a gap appears in neither protein. The problem with this choice is that it would inflate both percentages and would be misleading in the absence of universal agreement. The choice made in the present calculations was to use the mean length of the two sequences being compared, which produces somewhat smaller percentages than any of the other choices.

† $[(21 \times 20)/2] + 21$.
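The percentage of identity and the gap percentage defined above can be computed directly from a gapped alignment. The following is a minimal sketch, not the author's own procedure: the function names, the use of "-" as the gap character, and the idea of passing the alignment as two equal-length rows are assumptions made for illustration. Each unbroken run of gap characters is counted as one gap, and the mean ungapped length of the two sequences is the denominator, following the convention stated in the footnote.

```python
import re


def alignment_percentages(aligned_a: str, aligned_b: str, gap: str = "-"):
    """Return (percentage of identity, gap percentage) for two gapped rows.

    Both rows must have the same length; `gap` marks an inserted space.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    # An identity is a column in which both rows carry the same amino acid.
    identities = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    # Each unbroken run of gap characters counts as one gap, whatever its length.
    gap_run = re.escape(gap) + "+"
    gaps = len(re.findall(gap_run, aligned_a)) + len(re.findall(gap_run, aligned_b))
    # Mean ungapped length of the two sequences (the footnote's convention).
    mean_length = (len(aligned_a.replace(gap, "")) + len(aligned_b.replace(gap, ""))) / 2
    return percentages_from_counts(identities, gaps, mean_length)


def percentages_from_counts(identities: int, gaps: int, mean_length: float):
    """Same calculation when only the tallies are known."""
    return 100 * identities / mean_length, 100 * gaps / mean_length


# Tallies quoted in the text for T. alalunga and P. denitrificans (Figure 7-1B).
identity, gap_pct = percentages_from_counts(38, 5, 110.5)
print(f"{identity:.0f}% identity, {gap_pct:.1f} gap percent")  # 34% identity, 4.5 gap percent
```

Choosing a different denominator, such as the length of the shorter sequence, would only require changing `mean_length`; the tallies of identities and gaps are unaffected.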
Figure 7–2: Dot matrices27 for the amino acid sequences of the cytochromes c from Macaca mulatta, T. alalunga, and Rhodospirillum
rubrum, each compared to the amino acid sequence of human cytochrome c. The sequence of human cytochrome c is the vertical vector (top
to bottom, amino to carboxy terminus), and the respective sequences with which it is compared are the horizontal vectors (left to right, amino
to carboxy terminus). A dot is placed in the matrix when the amino acids at those two positions, horizontal and vertical, are the same.
Reprinted with permission from ref 27. Copyright 1970 Springer-Verlag.
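A dot matrix of the kind shown in Figure 7–2 can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the characters used for dots and blanks, and the two short toy sequences are hypothetical and are not taken from the figure. One sequence is laid out vertically and the other horizontally, and a dot is placed wherever the two amino acids are the same.

```python
def dot_matrix(vertical: str, horizontal: str, dot: str = "*", blank: str = "."):
    """Return one text row per residue of `vertical`.

    Position j of row i holds `dot` when vertical[i] == horizontal[j],
    which corresponds to a matrix coefficient of 1, and `blank` otherwise.
    """
    return ["".join(dot if v == h else blank for h in horizontal) for v in vertical]


# Toy sequences (hypothetical); they differ at one position only,
# so the main diagonal is broken at a single point.
for row in dot_matrix("GDVEKGKK", "GDVAKGKK"):
    print(row)
```

Related sequences show up as runs of dots along a diagonal; scattered dots off the diagonal are chance identities, which become more frequent as the sequences grow longer.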
in sequence A and position $b_j$ in sequence B are unrelated to each other either because protein A and protein B do not share a sufficiently recent common ancestor that they can be aligned or because the two sequences are misaligned. If the two amino acid sequences are unrelated, $a_i \times b_j$ is governed solely by chance. If the respective positions are descended from the same position in an ancestral sequence, then $a_i \times b_j$ should retain some of the biases enforced by natural selection, and these biases, if they can be quantified, should be considered while a decision is reached. If a particular replacement has a higher probability of occurring as a result of evolutionary change than it does of occurring as a result of random change, then whenever that particular replacement is encountered, those two positions have a higher probability of being evolutionarily related than of being unrelated. For example, every $a_i \times b_j$ where $a_i$ and $b_j$ are interconvertible by only one base change should have a higher probability of being evolutionarily related than those $a_i \times b_j$ where $a_i$ and $b_j$ are interconvertible only by two or three base changes.29 Every $a_i \times b_j$ where $a_i$ and $b_j$ are similar in size or chemical properties should have a higher probability of being evolutionarily related than those in which they are dissimilar.30 The mutation probability (Table 7–1)2 can also be used to weight $a_i \times b_j$.

The net effect of any one of the different weighting schemes, or some combination of them, is to assign a number to every coefficient $c_{ij}$ of the matrix. The magnitude of this number is thought to quantify either the effect of natural selection relative to chance on the particular replacement $a_i \times b_j$ or the probability that the particular replacement $a_i \times b_j$ would arise as the result of evolution rather than chance. Consequently, logarithms of these probabilities are used as entries in the matrix so that the summation to be performed will represent products of probabilities.2 It is also possible to incorporate weights into a dot matrix by assigning a dot to any $c_{ij}$ the weight of which exceeds a certain threshold.31

When the matrix has been constructed to the taste of the practitioner, the alignment can be performed.26 Any alignment of the two sequences A and B can be represented as a set of consecutive diagonal segments running through the matrix, for example, the three diagonal segments in the dot matrix comparing the cytochromes c of humans and R. rubrum (Figure 7–2). To be included in the alignment, the end of one of these diagonal segments must be in line with, above, or to the left of the beginning of the next diagonal segment when the diagonals run from top left to bottom right as in Figure 7–2. Each discontinuity requiring a negative vertical shift or a positive horizontal shift to connect the previous diagonal to the next diagonal represents a gap in one of the sequences being aligned. Associated with each individual alignment is an alignment score

$$AS = \sum_{i}\sum_{j} c_{ij} - \sum_{k} P_k \qquad (7\text{–}1)$$

where the respective sums are over all $c_{ij}$ intersected by the diagonal segments and over all gaps $k$ that must be
inserted and $P_k$ is a penalty assessed for creating the gap $k$. A computer can be programmed to find the path of diagonal segments through the matrix that has the largest alignment score,26 and this path produces the most appropriate alignment of the two sequences dictated by the choice of weighting scheme and gap penalty.

The penalty assessed for each gap is an estimate of the logarithm of the probability of a gap the length of gap $k$ appearing during evolution by natural selection on the same numerical scale used to assign the logarithms of the probabilities to each $c_{ij}$. It is possible to optimize such a gap penalty for the particular weighting scheme being used.28 For example, it has been shown that when values of 1 are assigned to identities and values of 0 are assigned to nonidentities, the appropriate gap penalty is

$$P = 1.2 + 0.23\,l \qquad (7\text{–}2)$$

where $l$ is the length of the gap.

The most important responsibility of an investigator who performs such a computation and produces an alignment with the maximum alignment score is to provide an assessment of its statistical significance. The accepted criterion for this assessment is a statistical evaluation of a set of alignments produced from randomly jumbled sequences of the same length and amino acid composition as the actual amino acid sequences.2 First the two actual sequences are aligned, and a maximum alignment score for the optimum alignment is calculated. Then each of the two actual sequences is randomly jumbled a number of times to produce for each a set of nonsense sequences that have the same amino acid composition and length as the actual amino acid sequence from which they were generated. This produces two sets of randomly jumbled amino acid sequences, one derived from each of the two actual sequences. All of the different combinations of one jumbled sequence from one of these two sets and one jumbled sequence from the other set are aligned by the same algorithm that was used to align the two actual, unjumbled sequences, and a large number of maximum alignment scores for the nonsense sequences is gathered in this way. The mean and standard deviation of the alignment scores of this collection of randomly jumbled nonsense sequences are calculated by the usual statistical formulas.

The number of standard deviations that the alignment score for the two actual amino acid sequences lies above the mean for the maximum alignment scores for the jumbled sequences is a measure of the confidence that can be assigned to the decision that the two actual sequences share a common ancestor and to the decision that the alignment has juxtaposed positions in the sequence that have evolved independently from the same position in the ancestral sequence. For example, when human β2 microglobulin was aligned with the κ-constant region of human immunoglobulin, the maximum alignment score was 3.5 standard deviations larger than the mean of the alignments of the jumbled sequences (Table 7–2). Unfortunately, there is no accepted level of statistical significance above which an alignment is judged to be real. Consequently, each person is left to make her own decision. Two sequences of amino acids are considered to be homologous to each other once the decision has been made that their alignment is statistically significant.

As the data bases from which candidates for alignment are drawn become larger and larger, the risk that the alignment of the amino acid sequences of two unrelated proteins will nevertheless be judged to be statistically significant becomes greater. For example, if the lengths of the sequences are disregarded, in a data base containing 100,000 amino acid sequences, each of the amino acid sequences should be able to be aligned with two or three other unrelated sequences in the data base with alignment scores that are at least 4 standard deviations greater than the means of the jumbles.

There is a frequently encountered sleight of hand that is practiced in the alignment of amino acid sequences and that violates the rules of statistics. This trick is to align two sequences and then select only the regions in which there is a higher frequency of coincidence for the statistical test. Because the sample has been preselected, it usually shows a higher frequency of coincidences than occurs when jumbled sequences of the same small regions are compared. Ordinarily, statistical evaluation of an alignment of two amino acid sequences shorter than those of complete, naturally occurring, and logically defensible domains within the native protein should not be accepted without the closest scrutiny.

At the present time, statistically significant alignments can be made only between two amino acid sequences that have a percentage of identity of 15% or greater upon alignment.28,29,32 If a set of three or four amino acid sequences can be assembled, however, that are from a set of proteins that share some structural or functional feature, it is often possible to demonstrate with high statistical confidence that the members of this set all share the same common ancestor even when pairwise comparisons between the members of the set fail to demonstrate convincing homology.33,34 In these instances, the statistical significance only becomes convincing when the whole set is aligned together. Such methods for multiple alignment can detect with statistical significance many more correct relationships between distantly related proteins than can pairwise alignments.35

Although they also identify statistically significant36,37 relationships among proteins, computational procedures that rapidly search large banks of amino acid sequences should be distinguished from the computational procedure for aligning two sequences by using a complete matrix. Banks of the currently available
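The search for the path with the largest alignment score (Equation 7–1), the gap penalty of Equation 7–2, and the test against jumbled sequences can be sketched together as follows. This is a hedged illustration rather than the program used in the cited studies: it assumes the simplest weighting scheme (1 for an identity, 0 otherwise), it assumes that gaps at the ends of the sequences are penalized like internal gaps, and it uses a standard three-state dynamic program for length-dependent gap penalties. All function names, the number of jumbles, and the toy sequences are hypothetical.

```python
import itertools
import random
import statistics

NEG = float("-inf")


def alignment_score(a, b, gap_open=1.2, gap_extend=0.23):
    """Largest AS for a global alignment of a and b.

    Identity weights: c_ij = 1 if a_i == b_j, else 0.
    A gap of length l costs gap_open + gap_extend * l (Equation 7-2).
    """
    n, m = len(a), len(b)
    first = gap_open + gap_extend  # cost of the first position of a new gap
    # M: a[i-1] paired with b[j-1]; X: a[i-1] opposite a gap; Y: b[j-1] opposite a gap
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + gap_extend * i)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + gap_extend * j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c_ij = 1.0 if a[i - 1] == b[j - 1] else 0.0
            M[i][j] = c_ij + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - first, Y[i - 1][j] - first, X[i - 1][j] - gap_extend)
            Y[i][j] = max(M[i][j - 1] - first, X[i][j - 1] - first, Y[i][j - 1] - gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])


def jumble(seq, rng):
    """Random rearrangement with the same length and composition as seq."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def standard_deviations_above_jumbles(a, b, n_jumbles=5, seed=0):
    """Number of standard deviations by which the actual score exceeds the
    mean score of all pairings of jumbled versions of a and b."""
    rng = random.Random(seed)
    jumbled_a = [jumble(a, rng) for _ in range(n_jumbles)]
    jumbled_b = [jumble(b, rng) for _ in range(n_jumbles)]
    scores = [alignment_score(x, y) for x, y in itertools.product(jumbled_a, jumbled_b)]
    mean, sd = statistics.mean(scores), statistics.stdev(scores)
    if sd == 0:
        raise ValueError("jumbled scores show no spread; use more jumbles")
    return (alignment_score(a, b) - mean) / sd


# Toy example: one gap of length 1 costs 1.2 + 0.23 = 1.43, so 3 identities give 1.57.
print(round(alignment_score("ACDE", "ACE"), 2))
```

In practice far more than five jumbles of each sequence would be used, and the same weighting scheme and gap penalty must be applied to the jumbled sequences as to the actual ones, as the text requires.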
amino acid sequences, such as the Swiss-Prot Sequence Database (www.expasy.ch) and the Protein Sequence Database of the Protein Information Resource (http://pir.georgetown.edu), contain the sequences of hundreds of thousands of proteins. When the sequence of a new protein becomes available, it is not possible to attempt a complete computational alignment between it and each of the proteins in such large collections. Consequently, other strategies have been developed to search such a bank and rapidly find as many candidates for a relationship as possible. Each of these candidates can then be aligned by the standard matrix method with the new amino acid sequence to validate statistically significant relationships.

The methods that are used to search the banks take advantage of the fact that evolution operates unevenly over a sequence of amino acids. Because segments of the sequence in the core of the structure of a protein or in functionally important locations change far less rapidly than regions on the surface,38 in distantly related amino acid sequences, identities and conservative replacements tend to be clustered. For example, in the amino acid sequences of the group of proteins containing the ATP-binding cassette, the sequence –SGCGKST–, or limited variations of it in which the first serine is replaced by a proline or a threonine, the cysteine is replaced by a serine, the second serine is replaced by a threonine or a glycine, or the threonine is replaced by a serine or a glutamine, appears in all of the members even though the rest of the amino acid sequences show low percentages of identity.39 Consequently, searching a bank for short segments of amino acid sequence that have a high degree of similarity with short segments in the new amino acid sequence has a significant probability of locating relatives and has the advantage that it can be done rapidly.

The bank of amino acid sequences is searched for short segments of sequence that are similar to any of the segments of a certain length in the newly sequenced protein. The similarity is quantified by summing the weights given to each identity and each replacement in the aligned segments by use of the weighting scheme preferred by the investigator. If the score for the match is above a certain predetermined threshold, the amino acid sequence in the bank containing this segment is judged to be a candidate for a relationship.

The three most widely used algorithms for searching banks of amino acid sequences differ only in how they find the segments. In the BLAST algorithm,40 every sequence four amino acids in length that appears in the complete sequence of the protein is tabulated. The amino acid sequences in the bank are then searched for segments identical or highly similar to one of the tabulated segments. When such a match is found, the alignment is extended in both directions to find a longer segment that has a high degree of similarity to the corresponding segment of the sequence being matched. It is the score for this longer segment, based on the identities and replacements it contains, that must exceed the final threshold. In the FASTA algorithm,41 regions within the new amino acid sequence and regions within an amino acid sequence in the bank with the highest density of identities are located. Ten of these regions are then trimmed until the portion of each giving the highest score is identified. All of these regions with scores above a threshold are joined, and the gaps resulting from the joining are penalized. It is the score for this rough alignment of a portion of the protein that must exceed the final threshold. The SSEARCH algorithm32,42 searches directly each sequence in the bank for the segment that has the highest score when aligned with a segment of the new sequence. Because each of these procedures focuses only on short segments of the sequences being searched, each misses some statistically significant matches, but the advantage gained is that they are rapid enough that such searches of the large extant banks of sequences can be performed in a reasonable amount of time.

Computer-assisted searches of banks and complete computational alignments have permitted the relatives of a newly sequenced protein to be located so that it can be joined with a known group. The most obvious successes occur when the new amino acid sequence is identical to one that already is known, because this identification often demonstrates that the same protein has two different, unsuspected, and unconnected functions.43 For example, it has been discovered that the protein responsible for the function of neuroleukin, autocrine motility factor, maturation factor, and myofibril-bound serine endopeptidase inhibitor is glucose-6-phosphate isomerase.44 It also happens that the amino acid sequence of a protein of known function can be matched with known amino acid sequences of proteins with unknown function, and such an alignment gives a strong indication of the identity of that hitherto unknown function.45,46 When a new genome is sequenced, one of the banks is searched for matches to the new amino acid sequences of each of the previously unidentified proteins it contains. For example, when the genome of Archaeoglobus fulgidus had been sequenced, it was found to encode 1797 proteins that could be matched with amino acid sequences already known while it encoded only 639 proteins that could not be matched.47

The aligned amino acid sequences of polypeptides have been used to provide information about the speciation of organisms. The sequences of the same protein from a set of different species, for example, the sequences of the cytochromes c from different eukaryotic species, serve as the data on which such studies are based. The goal of the exercise is to construct a phylogenetic tree (Figure 7–3)48 that displays the evolutionary history of the species bearing that protein. The lengths of the limbs in the tree are estimates of the evolutionary distances between any two present-day species and their common ancestor, represented by the node at which the
Figure 7–3: Phylogenetic tree48 for the cytochromes c from 53 species of eukaryotes. Each of the possible 1378 pairs of sequences was aligned, and the minimal mutational distance between each pair was tabulated. These numerical values were then adjusted statistically for mutations that would not have left a trace to obtain estimates of evolutionary distances. The magnitudes of these corrections are indicated in the upper right corner. These estimates of evolutionary distance were used to construct the phylogenetic tree. If one passes along the branches of the tree between any two species, the numbers on the branches that are passed through sum to give the estimated evolutionary distance. For example, the evolutionary distance between carp and dogfish is 67. The noted lengths of the branches, in evolutionary distance, and the positions of the nodes are determined by the most parsimonious sequence of events that satisfies the requirement that the evolutionary distances from the alignments of the sequences of amino acids equal the distances along the branches. In this figure, the nodes connecting marsupials (kangaroo) with the eutheria (the rest of the mammals), reptiles and birds with mammals, amphibians (frog) with amniotes (reptiles, birds, and mammals), fish (tuna, bonito, and carp) with tetrapods (amphibians, reptiles, birds, and mammals), and cartilaginous fish (lamprey and dogfish) with bony fish were placed on the scale of geologic time (millions of years) by using the dates from the fossil record at which the respective divergence from a common ancestor took place. Adapted with permission from ref 48. Copyright 1976 Academic Press.
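
The arithmetic described in this caption can be checked with a short sketch. The Python below counts the pairs among n sequences and sums the branch lengths along the path between two leaves of a tree. The tree, its node names, and its branch lengths are hypothetical values chosen only for illustration; they are not read from Figure 7–3.

```python
def n_pairs(n):
    """Number of distinct pairs among n sequences, (n^2 - n)/2."""
    return (n * n - n) // 2

# Hypothetical tree: each node maps to (parent, length of the branch to that parent).
PARENT = {
    "carp": ("n1", 10.0),
    "tuna": ("n1", 8.0),
    "n1": ("root", 5.0),
    "dogfish": ("root", 30.0),
}

def tree_distance(a, b):
    """Sum of the branch lengths passed through on the path between leaves a and b."""
    # Record the distance from a to each of its ancestors.
    from_a = {a: 0.0}
    node, total = a, 0.0
    while node in PARENT:
        node, length = PARENT[node][0], PARENT[node][1]
        total += length
        from_a[node] = total
    # Climb from b until an ancestor shared with a is reached.
    node, total = b, 0.0
    while node not in from_a:
        node, length = PARENT[node][0], PARENT[node][1]
        total += length
    return total + from_a[node]

print(n_pairs(53))                      # pairs among 53 sequences
print(tree_distance("carp", "dogfish"))
```

In this toy tree the path from carp to dogfish passes through three branches (10 + 5 + 30), which is the same bookkeeping the caption describes for the real tree.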
branches to those two species join. The evolutionary distance between the amino acid sequences of two proteins is the value of any quantity that is thought to be directly proportional to the time that has elapsed since those proteins shared a common ancestor. The first step in constructing a phylogenetic tree is to estimate the evolutionary distances between each of the (n² – n)/2 pairs of sequences in the set of n sequences.
There are a number of problems involved in transforming the differences in the aligned sequences of two proteins into an evolutionary distance.5,49 First, the number of replacements observed in the two aligned sequences is always an underestimate of the number of replacements that have actually occurred since the two proteins diverged from a common ancestor because several successive unregistered replacements at the same site have often taken place. Second, each position in the two aligned sequences varies at a different rate (Figure 7–1A), each type of amino acid has a characteristic susceptibility to replacement (Table 7–1), and each protein has its own characteristic rate of change.24 Third, there are examples of accelerated changes occurring along only one branch of a tree containing species that seem indistinguishable from each other. For example, rat ribonuclease seems to have accumulated replacements at 4 times the rate of its close relatives the ribonucleases from mice, muskrats, and hamsters.50 Such accelerations in the replacement of amino acids are also observed in those members of a related set of proteins that happen to be involved in the battle between a species and one of its pathogens because the weapons of such a battle are often rapid changes in amino acid sequence either on the part of the pathogen to avoid the defenses or on the part of the attacked to reinstate defenses.51 Fourth, the size of the population of a given species or its generation time may affect the rate at which mutations become fixed. This may explain why the sequences of the cytochromes c from three closely related strains of the
bacterial genus Pseudomonas show as much variation in their amino acid sequences (61–78% identity)52 as is shown between mammals and amphibians (82% identity) or between mammals and insects (67% identity). These problems have been addressed with varying success by the two methods used to estimate evolutionary distance. One method relies on the percentage of identity between the two aligned sequences; the other, on the minimal mutational distance between them.
It has been shown, both theoretically53 and by simulation,54 that
$$q = \frac{\ln(1 + 2D)}{2D} \tag{7–3}$$
where q is the fraction of the positions in the alignment occupied by identical, unreplaced amino acids and D is the evolutionary distance. This equation corrects for the variations in rate of replacement both among the different positions in the two aligned sequences and among the different types of amino acids and provides an estimate of evolutionary distance from the percentage of identity.
The minimal mutational distance, however, focuses on the changes that have occurred rather than on the positions that have remained unchanged. In theory, there should be more information in these replacements of one amino acid for another because they are progressive rather than static, but the corrections required to account for the unrecorded changes that have occurred over time are inaccurate enough that the advantages of the greater information are significantly diminished.
To calculate its minimal mutational distance, a pair of aligned amino acid sequences in the set is compared position by position, and the minimum number of mutations that had to be fixed to accomplish each replacement is scored. Because of the redundancy of the genetic code, these individual minimum numbers of mutations are most accurately assessed if the actual codons used for each amino acid are known from the nucleic acid sequence. These individual minimum numbers of mutations are added together to obtain the minimum total number of mutations that had to be fixed to convert either of the two sequences into the other. This sum is the minimal mutational distance between the two proteins.55
The actual number of mutations that were fixed in the two lineages diverging from the common ancestor represented by each of the comparisons between two species is almost always greater than the number calculated, even when the nucleic acid sequences are known, because mutations fixed in the past but then replaced by mutations fixed at the same position at a later date cannot be scored. If the nucleic acid sequence encoding either protein is unknown, mutations to an alternative codon for the same amino acid are also missed. The minimal mutational distances must be corrected statistically48,56 for all of these missing mutations to obtain estimates of evolutionary distances (Figure 7–3).
The major contributors to the minimal mutational distances calculated for each pair of aligned amino acid sequences are the regions of the protein that have experienced the greatest change over time. Unfortunately, these are also the most difficult segments of the amino acid sequences to align convincingly. As a result, the choice of the method used to align the sequences can have a significant effect on the structure of the final tree. With this in mind, a method of progressive alignment of amino acid sequences has been developed to provide the most suitable and internally consistent alignments of a large collection of sequences of the same protein from different species.57 The basis of this method is the assumption from the beginning that all of the amino acid sequences to be aligned share a common ancestor and have diverged from that common ancestor along their own unique lineages. The most closely related sequences are aligned first, and the gaps in these more certain alignments are retained as the more distant alignments are made. This is advantageous because it is the uncertainty in the precise location of the gaps that must be inserted to align distantly related sequences that creates the greatest uncertainty in the final value for the minimal mutational distance. An example of the product of this method is the progressive alignment of the amino acid sequences of 11 globins (Figure 7–4).57 The important feature of these alignments is that the gaps are confined to specific locations rather than being more randomly distributed as would result from simple pairwise alignments.
The tabulated values of evolutionary distances are used to construct a tree the branches of which connect the species being compared (Figure 7–3).48 The tree is arranged so that the connections made produce the most parsimonious sequence of events that can reproduce the observed evolutionary distances. The overall length of the line segments connecting any two present-day species is equal to the evolutionary distance between the two aligned sequences of the proteins from each of them. The branching order in such a tree conveys a historical sequence of the relationships among the species represented, and these historical sequences seem to be reasonable, based on the fossil record and anatomical resemblances.
Usually the phylogenetic trees that are built from the amino acid sequences of only one protein, for example, those of the cytochromes c (Figure 7–3), are unsatisfactory. Often there are sequences of a particular protein available for only a limited number of species. Often the phylogenetic tree based on the amino acid sequences of one protein disagrees with the phylogenetic tree based on the sequences of another.58 There are a number of solutions to this problem. For example, a more comprehensive and detailed phylogenetic tree of the eukaryotes than the one displayed in Figure 7–3 has been built by
Figure 7–4: Multiple alignment of 11 globins by the progressive method.57 The amino acid sequences aligned were the γ polypeptide of human hemoglobin (hghu), the β polypeptide of human hemoglobin (hbhu), the α polypeptide of human hemoglobin (hahu), globin III from Myxine glutinosa (heha), globin I from Petromyzon marinus (hbrl), human myoglobin (myhu), myoglobin from Cerithidea rhizophorarum (mycr), globin II from Lumbricus terrestris (haew), globin I from Tylorrhynchus heterochaetus (hety), leghemoglobin from Phaseolus vulgaris (gpfb), and bacterial hemoglobin from Vitreoscilla stercoraria (hbvs). Reprinted with permission from ref 57. Copyright 1987 Springer-Verlag.
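
The two estimates of evolutionary distance contrasted in this section can be sketched in Python. The first function inverts Equation 7–3 numerically, assuming that the equation has the form q = ln(1 + 2D)/(2D). The second sums, over the positions of an alignment such as the one in Figure 7–4, the minimum number of base changes needed to convert one amino acid into the other under the standard genetic code. The function names, the use of bisection, and the choice to skip any position that contains a gap are assumptions made for illustration.

```python
import math
from itertools import product

def identity_from_distance(d):
    """Fraction of identical positions q at distance D (assumed form of Equation 7-3)."""
    return math.log1p(2.0 * d) / (2.0 * d)

def distance_from_identity(q):
    """Invert q = ln(1 + 2D)/(2D) for D by bisection; q must lie between 0 and 1."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    lo, hi = 1e-12, 1.0
    while identity_from_distance(hi) > q:  # q falls as D grows, so widen the bracket
        hi *= 2.0
    for _ in range(200):                   # fixed number of halvings
        mid = 0.5 * (lo + hi)
        if identity_from_distance(mid) > q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Standard genetic code, codons enumerated in the order T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS.setdefault(aa, []).append(codon)

def min_base_changes(x, y):
    """Fewest base changes that convert any codon for x into any codon for y."""
    return min(sum(b1 != b2 for b1, b2 in zip(c1, c2))
               for c1 in CODONS[x] for c2 in CODONS[y])

def minimal_mutational_distance(seq1, seq2, gap="-"):
    """Sum of minimum base changes over aligned positions; gapped positions are skipped."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    return sum(min_base_changes(x, y) for x, y in zip(seq1, seq2)
               if x != gap and y != gap)

print(distance_from_identity(0.67))
print(minimal_mutational_distance("ADTGEGDFLAEGGGVR", "ADSGEGDFLAEGGGVR"))
```

Because the codons actually used are not known from amino acid sequences alone, the second function takes the most favorable pair of codons at each position, which is why the result is a minimum.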
combining alignments of the amino acid sequences of α-tubulins, β-tubulins, actins, and elongation factors 1α.59 The conflicts between three phylogenetic trees for gnathostomes were resolved by considering the positions in the sequences of gaps, the patterns of alternative splicing, and the distributions of introns in the genomic DNA.58 Nevertheless, disagreements on the branching order of phylogenetic trees, especially the most ancient, persist.60
Because the rate of replacement varies dramatically among the positions in the sequence of a protein (Figure 7–1A), minimal mutational distance changes more rapidly than percentage of identity over the short term, and corrections of minimal mutational distance are also less significant over the short term (Figure 7–3). Consequently, historical sequences based on minimal mutational distance are preferred for examinations of recent speciation. For example, a detailed phylogenetic tree for the order of artiodactyls covering the last 50 million years has been constructed from considerations of minimal mutational distances for aligned fibrinopeptides61 and pancreatic ribonucleases.62 In fact, ribonucleases with the amino acid sequences predicted for common ancestors at the nodes on that tree were produced by site-directed mutation and shown to display the functional traits characteristic of artiodactyl ribonucleases.
The phylogenetic tree, however, in addition to the historical sequence of events, conveys estimates of the evolutionary distances from existing species to common ancestors. These evolutionary distances can be calibrated (Figures 7–3 and 7–5)49 with estimates from the fossil record of the time at which divergence occurred. Eutheria and marsupials diverged from a common ancestor 130 million years ago (mya); mammals and either reptiles or birds, 300 mya; amniotes and amphibians, 365 mya; and tetrapods and fish, 405 mya.49 The respective nodes on the tree should fall at these dates (Figure 7–3). Once the distances are calibrated, the times at which divergences unavailable in the fossil record have occurred can be estimated by extrapolation. To overcome the problems of the different rates of change from one protein to the other and the rapid rate of change of a particular protein within a particular branch of the tree, these calibrations are usually performed with sets of amino acid sequences for as many proteins as possible (Figure 7–5).49 In this way, the final factor converting evolutionary distance to time should be as reliable as possible to permit a realistic extrapolation beyond the fossil record to be performed.
Because the corrections required to convert minimal mutational distances to estimates of evolutionary distance become more significant and less reliable over the long term (Figure 7–3), estimates of evolutionary distance by percentage of identity (Equation 7–3) are preferred for assigning a date to distant common ancestors. For example, it has been estimated from percentage of identity that eukaryotes and archaebacteria diverged from a common ancestor 2.3 billion years ago.49 Estimates, however, from both percentage of identity49 and minimal mutational distance56 agree that deuterostomes and protostomes diverged from a common ancestor 0.7 billion years ago.
At the point at which the lineages of two presently existing species diverged from their common ancestor, the gene for a particular protein carried by the common ancestor became two separate and disconnected genes,
[Figure 7–6: tree diagram. Leaves, from left to right: dogfish muscle A, chicken muscle A, pig muscle A, chicken heart B, pig heart B, mouse testis C, rat testis C. Values printed on the branches and nodes: 89.4, 20.2, 69.3, 71.2, 51.6, 20.2, 43.4, 20.8, 55.1, 24.6, 23.3, 18.3, 23.5, 22.1, 16.3, 28.4, 24.5, 20.3.]
Figure 7–6: Phylogenetic tree of seven isoenzymes of vertebrate L-lactate dehydrogenase.67 The phylogenetic relationship among the seven proteins, namely, isoform A from the muscle of dogfish, isoform A from the muscle of chicken, isoform A from the muscle of pig, isoform B from the heart of chicken, isoform B from the heart of pig, isoform C from the testis of mouse, and isoform C from the testis of rat, is represented by the most parsimonious tree. The number on each leg is the minimal mutational distance required to account for the descent from the common ancestor, and the number in italic type at each node is the average of the minimal mutational distances to its descendants. The minimal mutational distance in any one interval is not an integer because of averaging over all equally most parsimonious solutions for the topological arrangement in which it is a participant. The total number of nucleotide substitutions in the entire set is 366. The count does not include insertions or deletions. The root is arbitrarily placed halfway between the two most distantly related groups. Reprinted with permission from ref 67. Copyright 1983 Journal of Biological Chemistry.
three isoenzymes of L-lactate dehydrogenase have been identified. From this observation, it may be inferred that an individual mammal or bird contains within its genome three discrete genes encoding three discrete L-lactate dehydrogenases. Complete sequences are available for many of these proteins.66 A representative set of these amino acid sequences has been aligned, and a phylogenetic tree of minimal mutational distances has been constructed from these alignments (Figure 7–6).67 The tree suggests that the three isoforms of L-lactate dehydrogenase diverged from their common ancestor before the appearance of the vertebrates. This conclusion has been supported in a more extensive phylogenetic tree constructed from an even larger collection of amino acid sequences of this protein.66 Because each of these isoenzymes, or an appropriate mixture of them, is found in a different tissue, it can be concluded that the natural selection which has produced them in their present guise has operated at the level of the tissue rather than that of the whole organism. To the extent that different tissues are constructed from different isoforms of the same proteins, these tissues can be considered to have evolutionary histories that are independent of the histories of the organisms carrying them.
Just as was the case in the phylogenetic tree of the L-lactate dehydrogenases (Figure 7–6), when the three paralogous folded polypeptides that together form each molecule of mammalian fibrinogen were aligned and a phylogenetic tree was derived, it was found that they also diverged from their common ancestor well before the appearance of vertebrates, but Vertebrata is the only subphylum in which fibrinogen is found. These observations prompted a search for proteins in invertebrates that share a common ancestor with fibrinogen, and one such protein of unknown function was discovered in an echinoderm.68 This result suggests that a protein responsible for one function can evolve from a protein responsible for another function. Another variation on this theme of the transformation of the function of one of the paralogues of the same protein has occurred during the evolution of the isoenzymes of malate dehydrogenase in Trichomonas vaginalis. Alignments of amino acid sequences demonstrate that two of the paralogues of malate dehydrogenase in this species have recently become L-lactate dehydrogenases.69 This shift in functional properties also illustrates the fact that proteins with different functions often share a common ancestor.
The genome of a particular species can encode two or more paralogues of a given protein if the appropriate gene duplications have occurred. It is the potential to evolve into a new protein that distinguishes such a set of paralogues from a single, unduplicated orthologue. As long as there is only one gene for a given protein in the genome, the protein that it encodes is required to perform its designated function. That protein evolves along its lineage as its gene diverges into separate species, but it must remain the same protein. If there are two or more paralogues in each individual of a species, one of those paralogues has the opportunity to become a new protein with a new function. Even though paralogues usually retain the same function and specialize to handle different situations, as in the case of the isoenzymes of L-lactate dehydrogenase, often one of them changes sufficiently to perform another function, as in the case of chymotrypsinogen and haptoglobin (Table 7–2).
One way to demonstrate that a paralogue, freed from the necessity of performing its function by the existence of a sibling, is able to change sufficiently to perform an entirely new function is by alignment of amino acid sequences. In addition to alignments of the same protein from different species and alignments of different isoforms of the same protein, it has been possible to perform statistically significant alignments of the amino acid sequences of different proteins (Table 7–2).25 Many of these connections make sense from a functional standpoint. For example, parvalbumin can be successfully aligned with troponin C (Table 7–2); hemoglobin, with myoglobin (Table 7–2); the coiled coil of human vimentin, with the coiled coil of human lamin A (36%
Table 7–2: Examples of Pairs of Proteins Thought to Share a Common Ancestor on the Basis of an Alignment of Their Amino Acid Sequencesᵃ
ᵃ Alignment was performed on a matrix where a_i × b_j = 1 when a_i = b_j and a_i × b_j = 0 when a_i ≠ b_j, and the gap penalty was 2.5. When a_i = b_j = cysteine, a_i × b_j = 2.0. Reproduced from Doolittle.25
ᵇ Two proteins the amino acid sequences of which were aligned. Number of amino acids in each protein is shown in parentheses.
ᶜ Percentage of the positions in the aligned sequences at which the same amino acid was found in both sequences, based on the length of the shortest.
ᵈ Total number of gaps that had to be introduced to get the best alignment.
ᵉ Distance in standard deviations that the alignment score for the actual sequences was above the mean of the alignment scores of 36 comparisons of jumbled sequences.
Represents only a portion of the entire sequence.
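
The test of significance described in these footnotes can be sketched in Python. The weights (1 for an identity, 2 for a pair of cysteines, 0 otherwise), the gap penalty of 2.5, and the 36 jumbled comparisons come from the footnotes; everything else is an assumption made for illustration, in particular the function names, the fixed random seed, and the choice to charge the penalty for every gapped position, including gaps at the ends.

```python
import random
import statistics

def weight(x, y):
    """Weight for one aligned pair: 1 for an identity, 2 for Cys-Cys, 0 otherwise."""
    if x != y:
        return 0.0
    return 2.0 if x == "C" else 1.0

def alignment_score(a, b, gap_penalty=2.5):
    """Highest global alignment score, found by dynamic programming.

    Assumption: the penalty is charged for each gapped position, end gaps included.
    """
    prev = [-gap_penalty * j for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [-gap_penalty * i]
        for j in range(1, len(b) + 1):
            cur.append(max(prev[j - 1] + weight(a[i - 1], b[j - 1]),  # pair a[i-1] with b[j-1]
                           prev[j] - gap_penalty,                     # gap opposite a[i-1]
                           cur[j - 1] - gap_penalty))                 # gap opposite b[j-1]
        prev = cur
    return prev[-1]

def jumble_significance(a, b, n_jumbles=36, seed=0):
    """Standard deviations by which the real score exceeds the mean jumbled score."""
    rng = random.Random(seed)
    real = alignment_score(a, b)
    jumbled = []
    for _ in range(n_jumbles):
        letters = list(b)
        rng.shuffle(letters)          # same composition, scrambled order
        jumbled.append(alignment_score(a, "".join(letters)))
    spread = statistics.stdev(jumbled)
    if spread == 0:
        raise ValueError("jumbled scores do not vary")
    return (real - statistics.mean(jumbled)) / spread

print(jumble_significance("TFVQKCAQCHTVENGG", "EFNKCKACHMIQAPDGTDIIK"))
```

Jumbling preserves the amino acid composition of the sequence, so the comparison isolates the contribution of the order of the amino acids to the score.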
identity, 1.2 gap percent);70 vacuolar H⁺-transporting two-sector ATPase from Daucus carota, with the β subunit of H⁺-transporting two-sector ATPase from Spinacia oleracea (15 standard deviations above the mean of the jumbles);71 dihydrolipoyllysine acetyltransferase from Escherichia coli, with dihydrolipoyllysine succinyltransferase of E. coli (30% identity, 1.7 gap percent);72 tripeptidyl-peptidase II from H. sapiens, with subtilisin from Bacillus subtilis (34% identity, 5.3 gap percent);73 acetolactate synthase III from E. coli, with tartronate-semialdehyde synthase from E. coli (34% identity, 0.7 gap percent);74 and 4-hydroxy-2-oxoglutarate aldolase from E. coli, with 2-dehydro-3-deoxy-phosphogluconate aldolase from E. coli (45% identity with no gaps).75
The identification of a set of paralogues in the same species by alignment of their sequences often provides clues as to the functions of those members of the set that have not yet been studied.76 For example, the fact that three proteins of unknown function in E. coli were paralogues of methylmalonyl-CoA mutase and acetate CoA-transferase allowed them to be identified as a methylmalonyl-CoA decarboxylase, a succinate–propionate CoA-transferase, and another methylmalonyl-CoA mutase.77
The more interesting though rarer connections, however, are those between functionally unrelated proteins. For example, ovalbumin can be successfully aligned with antithrombin III (Table 7–2); bovine angiogenin, with bovine ribonuclease (33% identity, 3.2 gap percent);78 chicken δ2 crystallin, with human argininosuccinate lyase (69% identity, 0.2 gap percent);79 and glucarate dehydratase from Pseudomonas putida, with mandelate racemase from P. putida (23% identity, 5.6 gap percent).80
All of these alignments demonstrate that the evolution of duplicated proteins is completely analogous to the evolution of species. The fixation of the two forms of a duplicated gene within a population produces two paralogues of the ancestral protein. As the paralogues of the protein evolve independently, they drift slowly from isoforms with the same function to proteins with similar but different functions, just as daughter species drift apart from each other, creating separate genera. Occasionally, a dramatic leap occurs, for example, the one turning argininosuccinate lyase into δ2 crystallin, a change resembling on a small scale the appearance of chordates. Usually, however, the process is one of slow, continuous divergence. The alignment of amino acid sequences gives only hints of the evolving pedigrees. A more complete picture is seen only when the crystallographic molecular models of proteins are superposed.
Suggested Reading
Feng, D.F., Johnson, M.S., & Doolittle, R.F. (1985) Aligning amino acid sequences: comparison of commonly used methods, J. Mol. Evol. 21, 112–125.
Vogt, G., Etzold, T., & Argos, P. (1995) An assessment of amino acid exchange matrixes in aligning protein sequences: the twilight zone revisited, J. Mol. Biol. 249, 816–831.
Problem 7–1: Calculate alignment scores (Equation 7–1) for the five cytochromes c aligned in Figure 7–9 based on the rules that when a_i = b_j, a_i × b_j = 1; when a_i ≠ b_j, a_i × b_j = 0; and P = 1.2 + 0.23l.
Problem 7–2: This exercise will illustrate the method for assessing the validity of a particular alignment of two sequences. Pick a number between 1 and 80 at random and write it on a piece of paper. Turn to Figure 7–1 and the alignment of the two amino acid sequences of the cytochromes c from T. alalunga and P. denitrificans, respectively. Start at the amino acid in the sequence of the cytochrome c from T. alalunga corresponding to the number you picked at random, and write the next 20 amino acids in that sequence across the page. Below this sequence write the corresponding amino acids of the aligned sequence from the cytochrome c of P. denitrificans, as in the figure. Calculate an alignment score by the rules in Problem 7–1 for these two segments of aligned sequences. Take 20 playing cards and to each of them assign one of the 20 amino acids in whichever of the two segments has the fewer gaps. Shuffle the cards well and deal them into a row. Copy out the jumbled sequence dictated by this shuffle on a piece of paper. Align this jumbled sequence with the segment of real amino acid sequence from the other cytochrome c by shifting and gapping until you think the alignment will give the highest alignment score. Record that score. Repeat the process five times. How do the alignment scores of the jumbled sequences compare to the alignment score of the one unjumbled sequence?
Problem 7–3: On the basis of their locations in the crystallographic molecular models of proteins, their structural roles, and their chemical properties, the amino acids can be divided into three categories: hydrophobic, neutral, and hydrophilic. The hydrophobic amino acids are isoleucine (I), valine (V), leucine (L), phenylalanine (F), cystine (C–C), methionine (M), and alanine (A). The neutral amino acids are glycine (G), cysteine (C), threonine (T), tryptophan (W), serine (S), tyrosine (Y), and proline (P). The hydrophilic amino acids are histidine (H), glutamate (E), glutamine (Q), aspartate (D), asparagine (N), lysine (K), and arginine (R).
The following alignment is from Figure 7–1B:
T F V Q K C A Q C H T V - - - - - - E N G G
E F - N K C K A C H M I Q A P D G T D I I K
Because cytochrome c is a cytoplasmic protein, none of the cysteines participates in a cystine.
(A) Construct a 16 × 21 matrix on a sheet of graph paper for the two segments of sequence involved in this alignment using the following rules:
(1) a_i × b_j = 1 for an identity
(2) a_i × b_j = 0.6 for hydrophobic × hydrophobic
(3) a_i × b_j = 0.6 for neutral × neutral
(4) a_i × b_j = 0.6 for hydrophilic × hydrophilic
(5) a_i × b_j = 0.2 for hydrophobic × neutral
(6) a_i × b_j = 0.2 for neutral × hydrophilic
(7) a_i × b_j = 0.0 for hydrophobic × hydrophilic
(B) Trace the alignment presented above through the matrix.
(C) Calculate an alignment score for that trajectory with the gap penalty of Equation 7–2.
(D) What is the most serious difficulty with the rules?
Problem 7–4: From the genetic code, calculate the minimum number of base changes between the amino acid sequences of the γ and β polypeptides of human hemoglobin as they are aligned in Figure 7–4. To do this, make a list containing all of the replacements between the two sequences, find the minimum number of base changes required for each, and add up the individual minimum base changes to obtain the total.
Problem 7–5: The sequences of the fibrinopeptides A and B from a series of primates are given in the table
below.81 Construct a tree of minimal mutational distances. Treat a gap as if it were two base changes.
primate        fibrinopeptide A    fibrinopeptide B
green monkey   ADTGEGDFLAEGGGVR    PCAᵃ-GVNGNEEGLFGGR
human          ADSGEGDFLAEGGGVR    PCA-GVNDNEEGFFSAR
drill          ADTGDGDFITEGGGVR    PCA-GVNGNEEGLFGGR
macaque        ADTGEGDFLAEGGGVR    NEESLFSGR
chimpanzee     ADSGEGDFLAEGGGVR    PCA-GVNDNEEGFFSAR
ᵃ Pyrrolidone-5-carboxylic acid, a cyclized form of glutamine.
Molecular Phylogeny from Tertiary Structure
Just as the amino acid sequences either of two different proteins or of the same protein from two different species can be aligned, so can their tertiary structures be superposed.82 The crystallographic molecular models of two proteins that have tertiary structures so similar to each other that they are thought to share a common ancestor are chosen for comparison. Those pairs of respective α carbon atoms that unambiguously occupy equivalent positions in equivalent strands of secondary structure in the two crystallographic models are identified statistically83 and designated as forming the cores of the structures to be superposed. To superpose these two crystallographic molecular models is to translate and rotate one of them relative to the other until the sum of the squares of the distances between these pairs of equivalenced α carbon atoms in the cores of the two structures is minimized. Because two different crystallographic molecular models are being superposed, the structures never coincide exactly, even if they are of the same protein. The two proteins the crystallographic molecular models of which are being compared are considered to be homologous once a decision has been made by the investigator that the superposition of the two crystallographic molecular models is significant enough to demonstrate that they share a common ancestor.
An example of such a superposition is that between porcine pancreatic elastase and tryptase from rat mast cells (Figure 7–7A).83,84 These proteins are both serine endopeptidases sharing a common enzymatic mechanism, their amino acid sequences are readily aligned (33% identity, 2.5 gap percent), and the superposition confirms the fact that they share a common ancestor. A more distant relationship was validated by the superposition of a portion (amino acids 157–403) of creatinase from P. putida and methionyl aminopeptidase from E. coli (Figure 7–7B).85 Although both of these enzymes catalyze the hydrolysis of an amide and their amino acid sequences can be aligned computationally, their respective substrates are quite different from each other and the alignment is marginal (17% identity, 1.9 gap percent).
The degree to which two superposed structures coincide is usually quantified by the root mean square deviation. The root mean square deviation is the square root of the mean of the values for the squares of the distances between only those pairs of α carbon atoms designated as belonging to the cores. Consequently, both the root mean square deviation and the percentage of the total number of α carbons that were included in the cores from the two crystallographic molecular models must be noted (Table 7–3). For example, in the superposition of porcine pancreatic elastase and tryptase from rat mast cells (Figure 7–7A), 65% of the α carbons were included in the two cores and they aligned with a root mean square deviation of 0.07 nm. When the crystallographic molecular model of 4-α-glucanotransferase from Thermus aquaticus was superposed in turn upon the crystallographic molecular models of nine amylases from bacteria, fungi, and mammals, 270–320 aa (41–65% of the entire amino acid sequences of these proteins) was designated as belonging to the cores, and those amino acids in the cores aligned with root mean square deviations between 0.29 and 0.35 nm.93
An alternative method of quantifying the superposition of two crystallographic molecular models is to note the percentage of the total number of α carbons in the superposition that lie less than a certain distance from their equivalenced partners. For example, in the superposition of creatinase from P. putida and methionyl aminopeptidase from E. coli (Figure 7–7B), 86% of the α carbons lie less than 0.25 nm from their partners.
As two proteins diverge steadily from a common ancestor, first as orthologues in different species, then, following gene duplication, as paralogues of the same protein, then as paralogues with related functions such as adenosine kinase and ribokinase, and finally as paralogues with unrelated functions such as phosphoribosylamine–glycine ligase and glutathione synthase, their structures drift apart from each other by greater and greater deviations (Table 7–3).
Whenever the crystallographic molecular models of two proteins, the amino acid sequences of which can be aligned computationally with statistical significance, have been compared, they could always be unambiguously superposed.94–100 The root mean square deviation of a pair of superposed crystallographic molecular models can be plotted as a function of the percentage of identity between the respective aligned amino acid sequences (Figure 7–8).101,102 When the percentage of identity in just the segments chosen as the core reached around 15%, the range in which statistically meaningful alignments of complete amino acid sequences can no longer be made, the root mean square deviation of the α carbons of the amino acids in the core (50% or greater of the total α carbons) was only 0.2 nm, and the topological similarity between the structures being compared was still unmistakable.
Consequently, structural superpositions are able to establish more distant evolutionary relationships than
Figure 7–7: Superposition of the α carbons of crystallographic molecular models. (A) Superposition of the α-carbon diagrams of […] graphic molecular model were determined to be respectively equivalent and the two models were rotated and translated until the root mean square deviation for these 218 pairs was minimized. The numbering is for the methionyl aminopeptidase.

Table 7–3: Levels of Coincidence for the Superposition of Crystallographic Molecular Modelsᵃ
Columns: proteins superposed; rmsᵇ (nm); α carbonsᶜ (%)
Proteins superposed […]: ribokinase E. coli89; hexokinase H. sapiens, hexokinase S. cerevisiae87
rmsᵇ (nm) [row correspondence lost]: 0.50, 0.38, 0.36, 0.23, 0.24, 0.20, 0.13, 0.11
α carbonsᶜ (%) [row correspondence lost]: 57, 64, 84, 50, 91, 44, 80, 90
ᵃThe two crystallographic molecular models were superposed by use of the respective coordinates of equivalent α carbons in the cores of the two structures. ᵇRoot mean square deviation between atoms in the respective cores that were considered to be equivalent during the superposition. ᶜPercent of the total number of α carbons in the two crystallographic molecular models that were designated as being in the […]
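The superposition described in the text, in which one set of equivalenced α carbons is translated and rotated until the sum of the squares of the distances to their partners is minimized, can be sketched in a few lines. What follows is only an illustration under stated assumptions: it uses the quaternion solution of Horn with a simple shifted power iteration (a technique chosen here for compactness, not necessarily the procedure used in the studies cited), the function names are hypothetical, and the coordinates are invented values in nanometers.

```python
import math


def centroid(points):
    """Mean position of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def superpose(mobile, target, iterations=1000):
    """Rotate and translate `mobile` onto `target` (paired alpha carbons).

    Minimizes the sum of squared distances between the pairs using the
    quaternion method; the best rotation is the eigenvector with the largest
    eigenvalue of a 4 x 4 symmetric matrix, found here by power iteration.
    """
    cm, ct = centroid(mobile), centroid(target)
    P = [tuple(p[k] - cm[k] for k in range(3)) for p in mobile]
    Q = [tuple(q[k] - ct[k] for k in range(3)) for q in target]
    # S[a][b] = sum over pairs of (mobile coordinate a) * (target coordinate b)
    S = [[sum(p[a] * q[b] for p, q in zip(P, Q)) for b in range(3)]
         for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    N = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shifting by a bound on the spectral radius makes every eigenvalue
    # non-negative, so power iteration converges to the largest one.
    shift = max(sum(abs(v) for v in row) for row in N) or 1.0
    q = [1.0, 0.5, 0.25, 0.125]
    for _ in range(iterations):
        q = [sum(N[i][j] * q[j] for j in range(4)) + shift * q[i]
             for i in range(4)]
        norm = math.sqrt(sum(v * v for v in q))
        q = [v / norm for v in q]
    q0, qx, qy, qz = q
    R = [
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz],
    ]
    return [tuple(sum(R[r][k] * p[k] for k in range(3)) + ct[r]
                  for r in range(3)) for p in P]


def rmsd(a, b):
    """Root mean square deviation between paired coordinates."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def fraction_within(a, b, cutoff):
    """Fraction of pairs lying closer together than `cutoff`."""
    return sum(math.dist(p, q) < cutoff for p, q in zip(a, b)) / len(a)


# Invented coordinates (nm): `mobile` is `target` rotated by 90 degrees about
# the z axis and then translated, so a perfect superposition exists.
target = [(0.0, 0.0, 0.0), (0.38, 0.0, 0.0), (0.5, 0.3, 0.1),
          (0.2, 0.6, 0.4), (-0.1, 0.4, 0.7)]
mobile = [(-y + 1.0, x - 2.0, z + 0.5) for x, y, z in target]
fitted = superpose(mobile, target)
print(rmsd(fitted, target), fraction_within(fitted, target, 0.25))
```

With real crystallographic molecular models the two quantities printed at the end correspond to the root mean square deviation of the core and to the fraction of α carbons lying within a chosen distance (0.25 nm in the example in the text) of their partners.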
Molecular Phylogeny from Tertiary Structure
364 Evolution
those established by alignment of amino acid sequences. For example, by superposing crystallographic molecular models it could be shown that adenylyl-sulfate kinase, adenylate kinase, guanylate kinase, and 6-phosphofructo-2-kinase are all members of a large group of kinases that share a common ancestor.103 This ability to recognize distant relationships has permitted the history of the evolution of proteins to be traced just as the alignment of amino acid sequences has permitted the history of the evolution of species to be traced.

From Figure 7–8, it can be concluded that whenever two proteins have amino acid sequences that can be aligned with statistical significance, they will also have superposable tertiary structures. For example, the fact that the amino acid sequences of the three enzymes Ca²⁺-transporting ATPase, Na⁺/K⁺-exchanging ATPase, and H⁺/K⁺-exchanging ATPase can be aligned104 demonstrates that their tertiary structures are superposable.105 This rule is important because far more sequences are available than tertiary structures. If the amino acid sequence of a protein the crystallographic molecular model of which is unavailable can be related to a protein for which a crystallographic molecular model is available through a valid alignment of the two amino acid sequences, reliable conclusions can be drawn about the unknown tertiary structure by a comparison with the known tertiary structure.

Cytochromes c are present in all organisms, they are small proteins, and their structures have changed at a rate such that comparisons between them illustrate many of the facts that can be learned from superposition of tertiary structures. The eukaryotic cytochromes c are indistinguishable from each other in tertiary structure,106 if those from rice and tuna are assumed to represent the evolutionary extremes. Consequently, the eukaryotes can be represented by the crystallographic molecular model of the protein from the tuna (structure C in Figure 7–9).107,108 The other four of the five α carbon drawings of the crystallographic molecular models in Figure 7–9 are those of four bacterial cytochromes c: the one from Chlorobium thiosulfatophilum (Figure 7–9A), the one from Pseudomonas aeruginosa (Figure 7–9B), the one from Rhodospirillum rubrum (Figure 7–9D), and the one from P. denitrificans (Figure 7–9E). A similar comparison of the crystallographic molecular models of four other bacterial cytochromes c with that of a eukaryotic cytochrome c is also available.109

When the drawings of the crystallographic molecular models of the cytochromes c (Figure 7–9) are compared, the reason for the gaps in their aligned sequences (lower part of Figure 7–9) is immediately apparent. They
10 20 30 40 50
A YDAAAGKATYDAS–CAMCH––––––-––––KTGMMGAPKVGDKAAWAPHI––––––––––––––––––
B EDPEVLFKNKGCVACHAI––––-––––DTKMVGPAYKDVAAKFAGQA––––––––––––––––––
C GDVAKGKKTFVQK–CAQCHTV–––-––ENGGKHKVGPNLWGLFGRKTGQAEGYSYTDANKS–––––KG
D EGDAAAGEK––VSKKCLACHTF––––-–DQGGANKVGPNLFGVFENTAAHKDDYAYSESYTEM––KAKG
E QDGDAAKGEKEF––NKCKACHMIQAPDGTDIIKGGKTGPNLYGVVGRKIASEEGFKYGEGILEVAEKNPD
60 70 80 90 100
A –––AKGMNVMVANSIKGYK–––––––GTKGMMPAKGGNPKLTDAGVGNAVAYMVGQSK
B ––GAEAELAQRIKNGSQGV–––––––WGPIPMPPNAVS––––DDEAQTLAKWVLSQK
C IVWNNDTLMEYLENPKKYI––––––––PGTKMIFAGIKKK–––GERQDLVAYLKSATS
D LTWTEANLAAYVKDPKAFVLEKSGDPKAKSKMTFKLTK––––DDEIENVIAYLKTLK
E LTWTEADLIEYVTDPKPWLVKMTDDKGAKTKMTFKMGK––––––NQADVVAFLAQNSPDAGGDGEAA
Figure 7–9: Tertiary structures of five cytochromes c.107,108 Ribbon diagrams with creases at each α carbon were made from the crystallographic molecular models of (A) cytochrome c-555 from Chlorobium thiosulfatophilum, (B) cytochrome c-551 of Pseudomonas aeruginosa, (C) cytochrome c of tuna mitochondria, (D) cytochrome c₂ of R. rubrum, and (E) cytochrome c-550 of P. denitrificans. The hemes, the iron cations of which are each liganded by a methionine and a histidine, and a conserved phenylalanine are also drawn. The sequences of these five cytochromes c are structurally aligned at the bottom of the figure (identified by the respective letters). The alignment of R. rubrum is represented by the dot matrix in Figure 7–2. Filled portions of the ribbon diagrams highlight loops representing insertions into the basic structure. The central structure for tuna cytochrome c is numbered with the same numbers as those used in the alignments. Reprinted with permission from ref 107. Copyright 1982 Academic Press.
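Once sequences have been structurally aligned as in Figure 7–9, the pairings at each aligned position can be counted to give frequencies of conservation, replacement, and deletion for each amino acid. The following is a minimal sketch of such a tabulation; the function name, the use of "-" for a gap, and the short aligned strings are all assumptions made for illustration and are not taken from the studies cited.

```python
from collections import Counter, defaultdict

GAP = "-"  # assumed gap character


def pair_frequencies(alignments):
    """Tabulate, for each amino acid, how often it is paired with each partner.

    `alignments` is an iterable of (seq_a, seq_b) pairs of aligned strings of
    equal length. Every occurrence of an amino acid is counted once with
    whatever occupies the same position in the other sequence (itself, another
    amino acid, or a gap). Returns {amino acid: {partner: percent}}.
    """
    counts = defaultdict(Counter)
    for seq_a, seq_b in alignments:
        if len(seq_a) != len(seq_b):
            raise ValueError("aligned sequences must have the same length")
        for a, b in zip(seq_a, seq_b):
            if a != GAP:
                counts[a][b] += 1
            if b != GAP:
                counts[b][a] += 1
    frequencies = {}
    for aa, partners in counts.items():
        total = sum(partners.values())
        frequencies[aa] = {p: 100.0 * n / total for p, n in partners.items()}
    return frequencies


# Invented toy alignments, for illustration only.
toy = [("LLV", "LIV"), ("L-", "LA")]
freq = pair_frequencies(toy)
for aa in sorted(freq):
    print(aa, freq[aa])
```

In this toy example leucine occurs five times, four times opposite another leucine and once opposite isoleucine, so it is tabulated as conserved 80% of the time and replaced by isoleucine 20% of the time.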
short segment, but the one extra amino acid in tryptase between positions 180 and 188 seems to be more disruptive. If the loops become long enough they can assume structure of their own. The extra amino acids between positions 203 and 266 in methionyl aminopeptidase from Pyrococcus furiosus, relative to methionyl aminopeptidase from E. coli, form an almost spherical knob containing three α helices and some random meander sitting on the surface of the common structure.111 The extra amino acids between positions 180 and 317 in serine-type carboxypeptidase from S. cerevisiae relative to the structures of other members of the group of related hydrolases form a compact globular structure, but a long loop of polypeptide protrudes from it and wanders over the surface of the common structure.112 The events that produced such insertions can be mimicked by inserting segments of synthetic DNA into the gene encoding a protein and expressing the elongated version. When these inserts are placed at a location where they can loop out of the original structure, they cause little alteration in that structure.113

The structural alignment of the five amino acid sequences presented in Figure 7–9 is based on superpositions114,115 of the five crystallographic molecular models. A structural alignment of two amino acid sequences is an alignment in which two respective amino acids that occupy the same positions upon superposition of the two respective crystallographic molecular models also occupy the same position in the alignment.

A structural alignment of amino acid sequences often differs significantly from a computational alignment of the same sequences. For example, a previously performed computational alignment of human basic fibroblast growth factor and human interleukin 1β agreed with the subsequent structural alignment from positions 90 to 144 but was out of register with the structural alignment by at least seven amino acids between positions 20 and 89.116 It has been argued that when they are available, structural alignments of amino acid sequences are more reliable than computational alignments. This is an interesting argument because it assumes that no movement of β strands relative to each other or advancement of the screw of an α helix has occurred during the divergence from a common ancestor. There is some support for this assumption.117

Because the tertiary structures of proteins change more slowly than their amino acid sequences, a structural alignment based on a superposition can be performed between the amino acid sequences of two proteins that diverged from their common ancestor so long ago that a computational alignment would be insignificant. For example, the amino acid sequence of phosphopyruvate hydratase can be structurally aligned with that of mandelate racemase;118 that of P1 nuclease from Penicillium citrinum, with that of phospholipase C from Bacillus cereus;119 and that of aspartate-semialdehyde dehydrogenase, with that of glyceraldehyde-3-phosphate dehydrogenase.120 In such structural alignments between distantly related sequences, there are significantly more gaps than would have been inserted during a computational alignment because gaps are more readily recognized. For example, the structural alignment of coat protein VP1 from Mengo virus and coat protein from human rhinovirus 14 (13% identity) has a 4.9 gap percent, and eight of the 14 gaps are only one amino acid in length.121 When crystallographic molecular models are available for many of the members of a large group of related proteins, structural alignments of the amino acid sequences of those members for which molecular models are available can be combined with multiple computational alignments of the sequences of the rest of the members of the group to produce a statistical template that can be used to search databases for additional as yet unrecognized members of the group.122

It is the ability to perform structural alignments of amino acid sequences based on superpositions that allows the frequencies with which one amino acid is substituted for another in more distantly related proteins to be ascertained.123 In one study, crystallographic molecular models of 235 proteins from 65 different groups were chosen for superposition.124 All of the proteins within each family were superposed, and their amino acid sequences were structurally aligned. For each type of amino acid in the alignments, the number of times that it shared that position with itself or with each of the other amino acids was tabulated. From this tabulation, the frequency of substitution for each amino acid could be calculated (Table 7–4).

The table of the frequency of substitution in the more distantly related proteins (Table 7–4) complements the tabulation of mutation probabilities for closely related proteins (Table 7–1). Over the longer term, there seems to be a stronger preference for maintaining the hydropathy of a position. This enhanced preference probably arises from the fact that enough time has elapsed that significant substitutions have accumulated in the more intolerant sites. In addition to its hydropathy, the size of an amino acid has a significant effect on the frequencies at which specific replacements occur. Small amino acids are usually replaced by small amino acids; the large aromatic amino acids are most often replaced by other aromatic amino acids.

Each amino acid displays a characteristic tolerance to replacement by any amino acid other than itself. The most peculiar amino acids, cystine, tryptophan, and glycine, are the most intolerant of replacement. Cystine is so intolerant that it is deleted more than twice as often as it is replaced by alanine, the most common substitution. Proline is also deleted more often than it is replaced by any other amino acid.

The observed frequencies of substitution listed in Table 7–4 do not reflect the full effects of the steric and chemical properties of each side chain on its capacity to replace another or its tolerance to replacement because
Table 7–4: Frequency with Which Substitutions Occur in Distantly Related Proteinsᵃ

ccyᵇ (0.9)ᶜ  trp 1 (1.6)  gly 4 (8.8)  cysᵍ (1.2)   pro 4 (4.5)  tyr 2 (3.8)  leu 6 (7.6)  phe 2 (3.9)  val 4 (7.3)  his 2 (2.2)  asp 2 (6.0)
ccy 88.0     trp 55.0     gly 53.0     cys 46.0     pro 44.0     tyr 43.0     leu 41.0     phe 40.0     val 37.0     his 36.0     asp 35.0
gap 3.0      phe 8.2      ala 6.8      ala 9.6      gap 7.3      phe 8.1      val 10.9     leu 11.5     ile 11.8     asn 5.7      ser 8.3
ala 1.4      tyr 6.6      gap 6.7      val 7.9      ser 6.8      gap 4.9      ile 8.8      tyr 8.6      leu 11.6     lys 5.5      gap 7.2
phe 1.0      leu 6.2      ser 6.5      thr 4.7      ala 6.0      val 4.7      phe 5.6      ile 5.4      ala 7.1      gln 5.2      asn 6.9
glu 0.9      gapᵉ 2.9     asp 3.1      ser 4.0      gly 4.7      leu 4.4      met 4.4      val 5.4      thr 4.4      gap 4.8      glu 6.9
— —ᵈ         val 2.3      asn 2.8      gly 3.8      lys 4.5      ser 4.0      — —          ala 3.7      ser 3.3      ser 4.8      gly 5.2
val 0.3      ile 2.2      — —          gap 2.8      — —          thr 3.5      gln 1.4      — —          gap 3.2      asp 4.3      ala 5.2
asp 0.2      ser 2.2      tyr 1.1      leu 2.7      tyr 0.8      — —          arg 1.3      asp 1.0      — —          — —          — —
his 0.1      ala 2.1      his 0.9      — —          his 0.7      met 1.2      asn 1.3      glu 0.9      arg 1.0      pro 1.6      leu 1.2
asn 0.1      gly 2.1      phe 0.5      ccy 1.0      trp 0.3      gln 1.2      glu 1.2      gln 0.9      his 0.8      ile 1.5      phe 0.7
trp 0.1      — —          met 0.5      his 0.7      met 0.2      pro 0.9      asp 0.8      arg 0.8      cys 0.6      met 0.9      met 0.4
tyr 0.1      gln 0.6      trp 0.4      glu 0.6      cys 0.2      cys 0.3      his 0.7      ccy 0.4      trp 0.5      trp 0.6      trp 0.4
pro 0.1      cyxᶠ 0.2     cyx 0.3      trp 0.4      ccy 0.0      ccy 0.0      cyx 0.3      cys 0.3      ccy 0.1      cyx 0.3      cyx 0.2

ser 6 (7.4)  thr 4 (6.3)  ile 3 (5.2)  arg 6 (3.7)  lys 2 (5.9)  gln 2 (3.6)  ala 4 (8.4)  glu 2 (5.1)  asn 2 (4.7)  met 1 (1.9)
ser 33.0     thr 32.0     ile 31.0     arg 31.0     lys 30.0     gln 30.0     ala 30.0     glu 29.0     asn 24.0     leu 20.9
thr 10.2     ser 14.1     val 18.1     lys 11.0     gap 6.9      glu 7.3      ser 8.7      asp 8.6      ser 10.9     met 20.6

ᵃStructural alignments were performed of the amino acid sequences within each of 65 groups of proteins. Each of the 65 groups contained a different set of superposable proteins. For each of the 21 amino acids, the frequencies with which it was paired with itself or with each of the other 20 amino acids over all of the structural alignments were calculated.124 The number to the right of each amino acid is the frequency (in percent) with which it was paired with the amino acid at the top of the column. ᵇCystine. ᶜThe number in boldface type at the head of each column is the number of codons that encode that amino acid, and the number in parentheses is the percentage in which it occurs in the amino acid composition of the complete data set (208,000 amino acids). ᵈThe horizontal lines divide the amino acids with the highest frequencies of replacement from those with the lowest. All of the amino acids that are not listed in a given column have frequencies between the highest of the lowest group and the lowest of the highest group. ᵉFrequency with which a gap occupied the aligned position. ᶠFrequency of cysteine plus that of cystine. ᵍCysteine; there are two codons for cysteine and cystine combined. The decision between cysteine and cystine is made posttranslationally.
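The bias introduced by the number of codons can be illustrated with the leucine column of Table 7–4. The sketch below assumes the simplest possible correction, dividing each observed frequency by the number of codons for the amino acid that is paired with leucine; this is an assumption made for illustration, and the correction intended in the text may differ in detail. The frequencies and codon numbers are those listed in Table 7–4; the variable and function names are hypothetical.

```python
# Observed frequencies (percent) in the leucine column of Table 7-4.
observed = {"leu": 41.0, "val": 10.9, "ile": 8.8, "phe": 5.6, "met": 4.4}

# Number of codons for each amino acid (column heads of Table 7-4).
codons = {"leu": 6, "val": 4, "ile": 3, "phe": 2, "met": 1}


def per_codon(frequencies, codon_numbers):
    """Divide each observed frequency by the number of codons (assumed correction)."""
    return {aa: frequencies[aa] / codon_numbers[aa] for aa in frequencies}


corrected = per_codon(observed, codons)
ranking = sorted(corrected, key=corrected.get, reverse=True)
for aa in ranking:
    print(f"{aa}: observed {observed[aa]:.1f}, per codon {corrected[aa]:.2f}")
```

Under this assumption conservation of leucine still ranks first (about 6.8 per codon), methionine becomes the preferred replacement (4.4), and valine, isoleucine, and phenylalanine fall close together (between about 2.7 and 2.9).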
they are biased. The most important of these biases is that of the number of codons for each of the amino acids (boldface numbers next to their abbreviations at the head of each column in Table 7–4). It has been shown6 that the frequency with which an amino acid occurs in the overall population of proteins (number in parentheses next to the number of codons in Table 7–4) is roughly proportional to the number of its codons (compare the numbers in boldface type with those in parentheses). Consequently, the number of codons for an amino acid must significantly affect its frequency of substitution over the long term. For example, if the frequencies of substitution for leucine in Table 7–4 were corrected only for number of codons of each of the amino acids, conservation would still have the highest frequency but by much less of a margin, and the preferred substitution, by almost a factor of 2, would become methionine, with valine, isoleucine, and phenylalanine tied for second place. This result would make sense because methionine has the hydrophobic side chain that is the most similar to that of leucine. In addition to the number of codons, however, other factors, such as the number of base changes for a given substitution and the frequency with which the codons are used by a given species, also affect the frequency of substitution. Because these effects have yet to be quantitatively assessed, no corrections were applied to the directly observed frequencies of substitution in Table 7–4.

Structural alignments also allow the success of computational alignments and the procedures for searching banks of amino acid sequences to be evaluated. To perform such an evaluation, a bank of amino acid sequences of only those proteins for which crystallographic molecular models are available is assembled. The amino acid sequences in the bank can then be distributed into groups within which each member can be superposed on every other member. Consequently, all of the members of one of these groups share a common ancestor. How many of these established relationships can be detected by a particular algorithm operating only on the amino acid sequences?

When computational alignments were evaluated,28 the success with which they aligned two sequences known to share a common ancestor was quantified as the percentage of the positions aligned structurally that were also aligned statistically. It was found that a greater percentage of the positions was correctly matched when weighting schemes were used that assigned values to the aᵢ × bⱼ other than just 1 and 0, but the improvement at best was only 1.25 times (from a 51% rate of success to a 64% rate of success). It made no difference whether the weighting scheme was based on frequencies of identity and replacement from entirely computational alignments (Table 7–1) or from entirely structural alignments, but weighting schemes based on frequencies drawn from larger numbers of alignments performed somewhat better than those drawn from smaller numbers of alignments. Weighting schemes that were not based on frequencies of replacement and identity, however, did almost as well as those that were. Even the best weighting scheme was unable to align sequences (<30% correctly matched positions) when the percentage of identity fell below 15%.

When the procedures used to search databanks were evaluated,32 the success with which they found matches was assessed by comparing coverage to error rate. Coverage is the fraction of the known relatives that had scores above the chosen threshold. Error rate is the percentage of the unrelated amino acid sequences in the bank that had scores above the chosen threshold. As the threshold was raised, the error rate decreased, but so did the coverage. In plots of error rate against coverage, the three currently used algorithms, WU-BLAST2, FASTA, and SSEARCH, all performed equivalently. When a bank containing sets of sequences that were known to be related by superposition but in which none of the members had a percentage of identity greater than 40% was chosen for the searches, at a threshold producing an error rate of 1%, the coverage was only 0.18. In other words, from a bank containing 100,000 amino acid sequences, the search would give 1000 false matches but miss 82% of the real relationships with percentages of identity less than 40%.

There are several different levels of agreement within the superposition of two crystallographic molecular models. In the core of the structure, where changes occur less rapidly, the superposition is usually acceptable (Figure 7–7). When the two proteins are functionally related, the segments of the polypeptide in the core that participate in this common function usually superpose the most precisely.125 For example, the segments between positions 51 and 56, 102 and 108, and 195 and 200 in the superposition of tryptase and elastase (Figure 7–7A) contain the histidine, the aspartate, and the serine, respectively, that participate in their common mechanism. As more and more replacements of amino acids accumulate, the steric effects of these changes cause the backbone to shift to accommodate them. For example, the respective replacement of a serine and a valine in rice cytochrome c with a threonine and an isoleucine, both larger side chains, in tuna cytochrome c causes a displacement further into the solvent of the polypeptide to the exterior of this substitution.126 The significant shifts of secondary structure in the polypeptide between creatinase and methionyl aminopeptidase (Figure 7–7B) within the core of the common structure result from the accumulation of such steric effects. Flexible loops such as the one between positions 32 and 42 in the superposition of tryptase and elastase (Figure 7–7A) often differ dramatically in their disposition, but such differences may reflect only the effects of crystal packing that pins down an otherwise fluctuating structure.127

As one traces the polypeptides through the superposed α carbon diagrams (Figure 7–7), the distance
between the backbones fluctuates as one moves through the core, out through the loops, and back into the core. These fluctuations can be represented graphically in a plot of the distance between the paired α carbons as a function of their position in the amino acid sequence.128

The globins are a group of the same proteins and their isoforms from different species, for which many sequences are available. They include myoglobins, hemoglobins, erythrocruorins, and leghemoglobins. The details of the variations that occur in the tertiary structure of a protein as amino acids are slowly replaced at the toleration of natural selection have been examined by superposing the nine available crystallographic molecular models of different globins117 and using these superpositions to align their amino acid sequences.117 Each globin is formed from eight α helices stacked one upon the others as a bundle of sticks would be in a fire. As the sequences of the globins have varied, the interdigitations of the amino acid side chains situated between the α helices have adjusted to accommodate changes in their size, and this has caused the helices to shift as rigid bodies with respect to each other. These adjustments are necessary because the amino acid side chains between the α helices are tightly packed together and many atomic contacts occur. As the shifting proceeds, accommodating changes in size, the individual pairs of atomic contacts persist between two amino acids at different positions in the amino acid sequence but next to each other in the tertiary structure even though the identities of the amino acids themselves change.

About 60 amino acids out of the 140 in the polypeptide of a typical globin remain in equivalent locations in the nine superposed crystallographic molecular models and account for the core of the native structure. Only half of these are buried; the ones on the surface remain fixed because they are within α helices that are themselves rigid structures. The regions in which the greatest variation in sequence and tertiary structure occurs are the seven loops connecting the eight helices. This is due to their almost exclusive location at the surfaces of the molecular models but may also reflect the changes in the end-to-end distances between the α helices that were required to accommodate the slow shifts of the helices relative to each other as the packing among them has been altered by the substitutions.

These observations suggest that the degree of conservation that is displayed by a position in the sequence of a protein may provide an indication of its location in the tertiary structure. Positions showing the least tolerance to replacement are often located on the interior of the protein and those displaying the greatest tolerance tend to be located on flexible surface loops, but the tendency is not overwhelming.

A crystallographic molecular model of myoglobin from the sperm whale has been prepared,94 and the structural roles of the 82 invariant amino acids among the 24 myoglobins that had been sequenced at the time were tabulated (Table 7–5). This list represents a combination that has been retained since the time that all of these myoglobins shared a common ancestor. Positions marked (Hb) in Table 7–5 are those invariant among mammalian hemoglobins and myoglobins, and they represent amino acids that have been retained for an even longer period of time. Finally, the amino acids that appear at these 82 positions in the nine globins aligned by superposition have been entered into the tabulation. An examination of Table 7–5 reinforces several features of the atomic structure of molecules of protein.

Positions in the sequence that are buried in hydrophobic clusters are the most conserved. Usually three or four members of the group isoleucine, valine, phenylalanine, leucine, methionine, and alanine (Table 7–4) will substitute among themselves in this role, but occasionally only one or two are suitable. For example, only leucine is found at position 2NA and only valine or isoleucine at position 11E in the globins. These two preferences presumably reflect the constraints of the intricate, interlocking stereochemistry in the interior. In two locations, positions 1CD and 4CD, only phenylalanine is found among all the globins, and presumably in these locations the flat disk of the phenyl ring is essential to maintain the structure. The phenylalanine at position 1CD is stacked upon the heme.

There are usually a number of locations in the structure of a protein where difficulties resulting from the packing of the backbone of the polypeptide arise. At position 2C in the globins, a proline seems essential to enforce a sharp turn. When two strands of polypeptide are forced too closely together, these tight locations, such as positions 6B, 8E, 5F, and 7H in the globins, are occupied by glycine, proline, alanine, serine, or threonine (Table 7–5). Both serine and threonine, by forming hydrogen bonds to acyl oxygens (Figure 6–7A), are able to hug the polypeptide. Tight fits can also result from the juxtaposition of a large and bulky amino acid. The amino acid at position 16E is crowded by the tryptophan at position 12A in both hemoglobin and myoglobin.

There are several instances in which side chains cap one end or the other of an α helix; for example, Serine 1A, Threonine 4C, Serine 1E, or Tyrosine 23H. It is often stated (Table 7–5) that this arrangement has the effect of initiating the α helix. The fact that at position 4C other globins lack an amino acid capable of forming a hydrogen bond and still contain the α helix suggests that the assignment of such a purpose in this case is an overstatement. Remarkably, four pairs of participants in ionized hydrogen bonds between side chains on the surface of myoglobin are invariant in the short term. When these particular interactions are examined, however, over all nine of the globins, which represent a much longer history of evolution, all of these hydrogen bonds are found to be dispensable (Table 7–6).

A deeply buried position in the sequence of a folded
polypeptide remains invariably hydrophobic, but a buried location near the surface will occasionally erupt toward the water. For example, at position 65 in cytochrome c (Figure 7–1) an arginine appears at a location usually occupied by hydrophobic amino acids. Presumably the alkane portion of the arginine traverses the hydrophobic region and the guanidinium can push through the surface into the solvent.

Often a hydrophilic location on the exterior is occupied by a hydrophobic amino acid. For example, position 9A in the globins (Table 7–5) is on the exterior of the protein and is usually occupied by hydrophilic amino acids, but in the myoglobins it is occupied by leucine. Because the leucine is solvated equivalently in both the unfolded and folded polypeptide, such a substitution has no effect on the free energy of folding for myoglobin compared to the other globins, and such exchanges are common during evolution. There is, however, a price to be paid for such an exchange because a hydrophobic amino acid that replaces a hydrophilic amino acid on the surface of a protein makes it less soluble. The helical polymers formed by human deoxyhemoglobin S, in which a glutamate on the surface at position 4A has been replaced by a valine, are an example of such a problem.

The globins also provide a particularly informative example of the focused constraints that natural selection places on the gradual shifts in position among segments of secondary structure during evolution. The invariant feature of both the structure and the function of a globin is the heme (Figure 4–18). The only functions of a globin are to provide a fifth ligand to the iron, to make its heme soluble in water, and to prevent its heme from approaching another heme too closely. Through all of the alterations encountered during evolution, the amino acids responsible for surrounding the heme and supporting it within the protein were required by natural selection to maintain these roles. The record of this series of accommodations can be inferred from superposing crystallographic molecular models of present globins so that their hemes are made to coincide rather than their polypeptides. The situation is most graphically illustrated when the α subunit from equine hemoglobin is superposed in this way on leghemoglobin from Lupinus luteus (Figure 7–10).117,130 Over this long period of evolution, the amino acids supporting the heme have shifted their positions relative to it by only small distances. At the same time, however, the ends most distant from the heme of the two α helices in which these functionally critical amino acids reside (E and G in Figure 7–10) have
structural locationᵇ    amino acidᶜ    locationᵈ    roleᵉ

ᵃAdapted from Takano.94 The amino acids listed are those that are invariant in all myoglobins, and the structural roles assigned are those in the crystallographic molecular model (Bragg spacing ≥ 0.2 nm) of myoglobin from Physeter catodon. ᵇPosition in the common crystallographic molecular model of the globins. Capital letters (A–H) indicate which α helix, from amino- to carboxy-terminal, and the numbers indicate the position in the α helix. Double letters refer to turns between the respective helices. The globins are all bundles of eight α helices (Figure 4–18). ᶜAmino acids that are invariant over all myoglobins. ᵈLocation in the crystallographic molecular model of myoglobin: I, internal; E, external; S, surface crevice. ᵉThree-letter amino acid abbreviations given in uppercase letters represent invariant residues in myoglobin; those given in lowercase letters are not invariant. *Amino acids appearing at each of these positions in nine superposed globins117 are noted in one-letter code. Dash indicates deletion. ᵍAmino acids noted with (Hb) are invariant in all mammalian hemoglobins.
2EF D D D D K D D — G
2A P P A G E A A A E B
a Four invariant ionized hydrogen bonds that were present in an earlier refined crystallographic molecular model (Table 7–5)94 and a later refined crystallographic molecular model (Bragg spacing ≥ 0.16 nm)129 of myoglobin were chosen for examination. Each of the four pairs is between the amino acids in boldface type above and below each other in the central positions of the four paired strings of letters. The two amino acids forming each of these four hydrogen bonds in the two crystallographic molecular models, aspartate and lysine, histidine and aspartate, aspartate and arginine, and lysine and glutamate, were conserved among all of the myoglobins. The amino acids occupying each of these eight positions in a structural alignment of eight other globins117 are listed to the right and left of the pair occupying each of the eight positions in myoglobin.
b Code assigned to the positions in the common crystallographic molecular model of the globin class (Table 7–5).
c The amino acid at the respective position in each of the globins is aligned above or below the amino acid at the other position in the same globin.
Figure 7–10: Arrangement of the α helices and contact side chains that form part of the heme pocket in the α subunit of equine hemoglobin (EHbα) and in leghemoglobin II from Lupinus luteus (LgHb).130 The hemes in the two proteins are superposed. The three α helices are designated as B, E, and G, in order of their appearance in the globin molecule. The positions of homologous pairs of side chains (sequence positions in leghemoglobin are primed) that are in contact with the heme are indicated by open circles joined by arrows. The coupling of the shifts at the E–B and B–G helix interfaces keeps the side chains that form the heme pocket in the same relative positions. Reprinted with permission from ref 130, originally from ref 117. Copyright 1980 Academic Press.
shifted significantly in their position, and another α helix that provides no amino acids in contact with the heme (B in Figure 7–10) has shifted even more.
In any protein, a few amino acids that embody its function can be identified. Over evolution, natural selection maintains the relative separations and orientations of these amino acids because if it did not, the protein could no longer be what it is. An extreme example of this fact is found in the group of related enzymes to which phosphopyruvate hydratase, mandelate racemase, galactonate dehydratase, glucarate dehydratase, muconate cycloisomerase, and methylaspartate ammonia-lyase belong. Although each of these enzymes has diverged widely from its distant common ancestor, the positions of the functional groups in the active sites of these proteins that are responsible for the abstraction of the proton α to the respective carboxylate, a function common to the mechanism of each of them, have been conserved.118 The more distant a location within the protein is from such invariant points of reference, however, the more likely its position will drift as mutations accumulate that shift the orientations of the segments of secondary structure within the overall molecular structure of the protein.
An exception to this rule that functional groups are usually the most invariant features of a protein can be seen in a comparison of the crystallographic molecular models of the phospholipase A2 from cobra venom and the phospholipase from bee venom.131 From aligned amino acid sequences and superposed crystallographic molecular models, there is no doubt that these two proteins share a common ancestor. The α helix in the protein from cobra venom that binds to the surface of a biological membrane, in which the reactants for the enzyme are found, is formed by the first 20 amino acids of the polypeptide, but these 20 amino acids are missing from the protein from the bee. The missing tertiary structure necessary for adhering the protein to the membrane is supplied by an additional α helix at the carboxy-terminal end of the polypeptide from the bee that takes the place of the α helix from the amino-terminal end of the polypeptide from the cobra. It was the ability to superpose the crystallographic molecular models of these two proteins that permitted the substitution of one segment of the polypeptide for another in a functional role to be demonstrated.
The inability to superpose two crystallographic molecular models of two proteins can demonstrate an example of convergent evolution. For proteins, convergent evolution is the assumption of the same function by two proteins that do not share a common ancestor. For example, as had been predicted by aligning sequences,4 chorismate mutase from S. cerevisiae cannot be superposed on chorismate mutase from B. subtilis.132 Consequently, it can be concluded that these two unrelated proteins nevertheless evolved so
that they each were able to perform the same function. Other examples of such convergent evolution are the 3-dehydroquinate dehydratases from Salmonella typhimurium and Mycobacterium tuberculosis133 and the Cu,Zn-superoxide dismutase of B. taurus134 and the Fe-superoxide dismutase of M. tuberculosis.135 A particularly interesting example of convergent evolution that was elucidated by superposing crystallographic molecular models is that of the [2Fe–2S] ferredoxin from Clostridium pasteurianum. This protein is completely unrelated to the other [2Fe–2S] ferredoxins and turns out to be a thioredoxin that has been converted into a ferredoxin.136 The more common examples of convergent evolution, however, are those in which two unrelated proteins catalyze similar but not identical functions. For example, although they share the same mechanism of activating molecular oxygen for insertion into a carbon–hydrogen bond and both use a heme to do so, crystallographic molecular models of nitric-oxide synthase and cytochrome P-450 have different, unrelated structures.137
Often the catalytic amino acids in the active sites of examples of convergent evolution are arranged similarly in the two proteins even though the overall structures are completely different. The serine, histidine, and aspartate responsible for the nucleophilic catalysis in the active site of subtilisin are superposable on the serine, histidine, and aspartate in the active site of chymotrypsin even though the two proteins themselves cannot be superposed,138 and the functional groups in the active site of alanine dehydrogenase are similarly arranged to those in L-lactate dehydrogenase.139 The catalytic amino acids, however, around the flavin adenine dinucleotide in the active sites of the two flavoenzymes L-lactate dehydrogenase (cytochrome) and D-amino-acid oxidase are arranged in patterns that are mirror images of each other.140
As the number of crystallographic molecular models has increased, instances have become more common in which two proteins that display no similarity in amino acid sequence nevertheless have segments of their tertiary structure that can be superposed. An example of such a segment of recurring structure is found in the crystallographic molecular models of L-lactate dehydrogenase,82 alcohol dehydrogenase,141 phosphoglycerate kinase,142 and phosphorylase.143 This common segment is 140–200 aa in length and occurs at different locations in the overall sequences of these proteins. It is formed from the amino acids in the sequence between Asparagine 21 and Glycine 162 in isoform A of L-lactate dehydrogenase from Squalus acanthias, between Phenylalanine 207 and Serine 392 in equine phosphoglycerate kinase, between Serine 193 and Phenylalanine 319 in isoform E of equine alcohol dehydrogenase, and between Asparagine 559 and Arginine 713 in the isoform of glycogen phosphorylase from muscle of Oryctolagus cuniculus. All four structures can be superposed.82,143 The superposition of these regions from phosphorylase and L-lactate dehydrogenase is presented in Figure 7–11A.82,143
This particular topological pattern of secondary structures can be identified as a doubly wound, parallel β sheet. It is a sheet of six parallel β strands flanked on both sides by α helices. The basic rhythm of the recurring theme is β strand, α helix, β strand, α helix, β strand, random meander, β strand, α helix, β strand, α helix, β strand. The β strands numbered from amino terminus to carboxy terminus occur in the order 321456 across the sheet (Figure 7–20). You should trace this pattern in Figure 7–11A. The six β strands all run parallel to each other to form a pleated sheet, and the helices arch above or below the sheet to connect the end of one β strand to the next. The complete and concise theme is developed in L-lactate dehydrogenase (Figure 7–11), and there are variations on this theme in the other proteins. For example, in phosphoglycerate kinase, there are two α helices after the second β strand and a long additional loop after the third β strand, and in alcohol dehydrogenase the last α helix is replaced by an additional antiparallel β strand. An interesting variation occurs after the first β strand in the structure from phosphorylase, where a large bulge has appeared in the first β strand that pushes up the loop between the third and fourth strands of β structure (Figure 7–11A).
Flavodoxin shares this pattern but with a more significant variation. In this protein, the second α helix and the third β strand have been deleted (Figure 7–11B). This deletion seems to have been very similar to those seen in cytochrome c (Figure 7–9) in that the loop containing the α helix and the β strand has simply been pinched off from the open end of the common structure. Nevertheless, the superposition of flavodoxin upon the corresponding region from L-lactate dehydrogenase is quite close, even though the sequences of these two superposed polypeptides appear to be completely unrelated. When the sequences are aligned, even with the assistance of the superposition, they have identical amino acids in only 9% of their aligned positions.
The conclusion that has been drawn from these superpositions is that all of these regions from these very different proteins together share a common ancestor. As these structures represent only a portion of each of the presently existing proteins, and as the other portions of the proteins bear no resemblance to each other, this common ancestor must have been a small primordial protein that was combined covalently with other small primordial proteins by gene fusion to produce respectively these larger chimeric proteins. Gene fusion is a process in which genomic DNA is recombined incorrectly so that segments of different genes become fused together rather than, as in the usual process of recombination, allelic segments of the same gene being interchanged in precise alignment. Like gene duplication, gene fusion occurs frequently, but only infrequently will the gene that results from the fusion spread over the entire population of a species by genetic drift and become fixed in its genome. The evidence of early gene fusions that did spread over a primordial population can still be observed in the progeny, not only in their structures but also in the distribution of methionines that are the fossilized remains of the initiation sites of the smaller ancient proteins that were fused together.144 The reason that each of the primordial proteins now resides at a different location in the sequences of the present proteins is that each was combined with different proteins in different orders during these gene fusions. This description of evolutionary history requires that at one time in the distant past each of the segments of these polypeptides now folding to produce each of these superposable regions was not attached to the remainder of the polypeptide to which it is now joined. If this is the case, then each of the doubly wound β-pleated sheets and the other regions in each of these proteins to which they are now attached were at one time separate proteins, and a significant part of the evolution of proteins is a history of the joining together of smaller proteins to produce ever larger proteins.
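The recurring theme of the doubly wound, parallel β sheet described earlier in this section, and the variation on it found in flavodoxin, can be written out as a small data structure. The following Python sketch is only an illustration. The names THEME, SHEET_ORDER, delete_elements, and renumbered_sheet_order are hypothetical, and the assumption that deleting a strand leaves the spatial order of the remaining strands unchanged is a simplification, not a result taken from the crystallographic molecular models.

```python
# Elements of the recurring theme, from amino terminus to carboxy terminus,
# as listed in the text: strand, helix, strand, helix, strand, meander, ...
THEME = ["β1", "α1", "β2", "α2", "β3", "meander",
         "β4", "α3", "β5", "α4", "β6"]

# Spatial order of the six strands across the sheet (3-2-1-4-5-6).
SHEET_ORDER = [3, 2, 1, 4, 5, 6]


def delete_elements(theme, deleted):
    """Return the theme with the named elements removed."""
    return [element for element in theme if element not in deleted]


def renumbered_sheet_order(sheet_order, deleted_strand):
    """Remove one strand and renumber the rest by order in the sequence.

    Assumption (hypothetical simplification): the remaining strands keep
    their relative positions in the sheet.
    """
    remaining = [s for s in sheet_order if s != deleted_strand]
    return [s - 1 if s > deleted_strand else s for s in remaining]


# Flavodoxin: the second α helix and the third β strand are deleted.
flavodoxin_theme = delete_elements(THEME, {"α2", "β3"})
flavodoxin_order = renumbered_sheet_order(SHEET_ORDER, 3)

print(flavodoxin_theme)
print(flavodoxin_order)
```

Under these assumptions, the five strands that remain in flavodoxin would lie in the order 2-1-3-4-5 across the sheet when they are renumbered from the amino terminus.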
Suggested Reading
Rossmann, M.G., Moras, D., & Olsen, K.W. (1974) Chemical and
biological evolution of a nucleotide-binding protein, Nature
250, 194–199.
1 NANTPDRLQQASLPLLSNTNCKK--YWGTKIKD-AMICAG-AS----GVSSCMGDSG
2 gtSYPDVLKCLKAPILSDSSCKS--AYPGQITS-NMFCAG-Yleg--gKDSCQGDSG
3 -gqLAQTLQQAYLPTVDYAICSsssYWGSTVKN-SMVCAG-Gdg---vRSGCQGDSG
4 gKGQPSVLQVVNLPIVERPVCKD--STriRITD-NMFCAGykpdegkRGDACEGDSG
5 dfEFPDEIQCVQLTLLQNtfcAd--AHpdKVTE-SMLCAG-Ylpg--gKDTCMGDSG
6 -dptsytLREVElRIMDEkacVd--YR--yYEykFQVCVGSPT---tLRAAFMGDSG
7 --------GLRSGSVTGlnatvn--ygssgivy-gMIQTN--------vCAQPGDSG
8 --------GTHSGSVTAlnatvn--ygggdvvy-gMIRTN--------vCAEPGDSG
9 --------GYQCGTITAknvtan--ya--egavrgLTQGN--------aCMGRGDSG
10 ---------hGAVQYsgg-------------------rFT-ip----rgvgGRGDSG
The alignment is based on the separate superpositions of crystallographic molecular models of each of the other nine proteins upon the crystallographic molecular model of chymotrypsin, which was chosen as the reference structure for the family. The similarity of the structures to that of chymotrypsin is given by the case and the face of the one-letter code of the amino acids. An uppercase boldface character represents an α carbon that is within a distance of 0.15 nm of the equivalent α carbon in chymotrypsin. An uppercase normal character represents a distance within 0.25 nm, a lowercase boldface character represents a distance within 0.35 nm, and a lowercase normal character represents a distance of greater than 0.35 nm. A dash represents a gap.
(A) On a sheet of graph paper, construct a dot matrix for a comparison of the sequence of protein 3 and the sequence of protein 6 between the positions of the first and the third cysteines in the amino acid sequence of chymotrypsin, which is protein 1.
(B) Trace through the dot matrix the structural alignment of protein 3 and protein 6.
(C) What is the percentage of identity for the alignment in this segment?
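The conventions used in this problem (the coding of distance by case and face, the dot matrix, and the percentage of identity) can each be stated as a short function. The Python sketch below is a minimal illustration and not a solution to the problem. The function names are hypothetical, the boundaries of the distance classes are assumed to be inclusive, and the percentage of identity is assumed to be calculated over only those aligned positions at which both sequences have an amino acid; the text itself does not specify either convention. The two short sequences at the end are invented for the example.

```python
def distance_class(distance_nm):
    """Return (case, face) for the distance between equivalent α carbons."""
    if distance_nm <= 0.15:
        return ("uppercase", "boldface")
    if distance_nm <= 0.25:
        return ("uppercase", "normal")
    if distance_nm <= 0.35:
        return ("lowercase", "boldface")
    return ("lowercase", "normal")


def dot_matrix(seq_a, seq_b, gap="-"):
    """Return a matrix of booleans; True marks identical amino acids.

    Gaps are removed and case is ignored, so that rows and columns
    correspond to the amino acids actually present in each sequence.
    """
    a = seq_a.replace(gap, "").upper()
    b = seq_b.replace(gap, "").upper()
    return [[x == y for y in b] for x in a]


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical pairs among positions lacking a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x.upper(), y.upper())
             for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# Invented toy alignment, not taken from the ten proteins above.
toy_a = "ACD-FG"
toy_b = "ACEHF-"
toy_matrix = dot_matrix(toy_a, toy_b)
toy_identity = percent_identity(toy_a, toy_b)
print(toy_identity)
```

A structural alignment traced through such a dot matrix appears as a path running from upper left to lower right; each step along the diagonal pairs two amino acids, and each horizontal or vertical step is a gap.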
(A) Why are the amino acid sequences of the proteins from cat and chicken so much more similar to each other than are the sequences from the cat and the rat to each other?
(B) Align the amino acid sequences of the proteins from rabbit muscle and cat muscle from Proline 116 to Proline 218. What is the percent identity and how many gaps are there?
The following figure146 is a superposition of the α carbons between Proline 116 and Proline 218 in the crystallographic molecular models of the pyruvate kinases from rabbit muscle and cat muscle.
[Figure: superposed α-carbon traces of the two pyruvate kinases, with residues P116, N145, and P218 labeled]
Domains
Ferredoxin–NADP+ reductase from spinach is a protein composed of a folded polypeptide 314 aa long (Figure 7–12).148,149 In the native enzyme, the polypeptide between Asparagine 162 and Tyrosine 314 assumes a doubly wound, parallel β sheet of five strands (upper right of Figure 7–12) and the polypeptide between Glycine 26 and Glutamate 154 assumes an antiparallel β barrel (lower left of Figure 7–12). Doubly wound, parallel β sheets recur in many different proteins and antiparallel β barrels recur in many different proteins, but the two structures are usually not found in the same protein.
In the crystallographic molecular model of ferredoxin–NADP+ reductase, the doubly wound, parallel β sheet seems to be folded independently from the antiparallel β barrel. Only one strand of polypeptide runs between them.
MolScript.409
sites at which cleavage of a native protein by endopeptidases can occur are exposed, flexible loops of polypeptide on its surface that are rarely situated between domains, and domains are often not connected by such loops.155–157 If a polypeptide is cut at only one or two positions by an endopeptidase when it is folded in its native conformation, this is not evidence that the fragments observed compose separate domains in that native conformation.158
If, however, the protein can be digested and the pieces that result can be separated as biologically active or structurally intact moieties, they are detachable domains. Few examples of such endopeptidolytic detachments have been reported, and among those are the following. A protein anchored to mammalian cellular membranes contains within its single folded polypeptide the two enzymes peptidylglycine monooxygenase and peptidylamidoglycolate lyase. This protein can be digested either during normal cellular processes or by experimental treatment with an endopeptidase to produce two soluble, detached domains, which can be sep-
latter.159 The enzyme sulfite oxidase can be cleaved with
detached domains that can be separated from each other
Figure 7–13: Crystallographic molecular model (Bragg spacing ≥ 0.28 nm) of a murine monoclonal immunoglobulin G.151 The two Fab domains are aligned vertically, one above the other on the right-hand side of the figure, and the Fc domain is to the left. There are four folded polypeptides that together compose an intact molecule of immunoglobulin G. In this particular case, they are two identical polypeptides 214 aa long, the light chains, and two identical polypeptides 444 aa long, the heavy chains. In the figure, one heavy chain and one light chain are drawn with thicker lines; and the other heavy chain and the other light chain are drawn with thinner lines. The light chains are each confined to their respective Fab domains, but the heavy chains each form respectively the other half of one of the Fab domains, and they also associate with each other to form the Fc domain. Each of the three detachable domains, the Fc domain and the two Fab domains, is itself formed by four internally repeating domains each about 110 aa long composed of seven antiparallel β strands. All 12 of these smaller internally repeating domains are superposable, one on the other, and arose from multiple gene duplications. The two most peripheral internally repeating domains in the Fc detachable domain are formed respectively by the carboxy-terminal portions of the two heavy chains and associate directly with each other, as do the four respective pairs of internally repeating domains in the two Fab detachable domains. The two centrally located internally repeating domains in the Fc detachable domain, however, associate with each other through their oligosaccharides, which are drawn with the thickest lines. The two strands of polypeptide connecting each Fab fragment with the Fc fragment are open, flexible, and unstructured; consequently these three detachable domains, each a rigid body, are in constant rearrangement relative to each other when immunoglobulin is in solution. This drawing was produced with MolScript.409
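The count of 12 internally repeating domains given in the caption to Figure 7–13 can be checked against the lengths of the four polypeptides. The Python sketch below is only a piece of arithmetic under a stated assumption: that the number of repeats in a chain can be estimated by dividing its length by the approximate length of 110 aa for one repeat and rounding. The names CHAINS, REPEAT_LENGTH_AA, and repeats_per_chain are hypothetical.

```python
REPEAT_LENGTH_AA = 110  # approximate length of one repeat, from the caption

# Copies and lengths of the polypeptides in one molecule of immunoglobulin G.
CHAINS = {
    "light": {"copies": 2, "length_aa": 214},
    "heavy": {"copies": 2, "length_aa": 444},
}


def repeats_per_chain(length_aa, repeat_length=REPEAT_LENGTH_AA):
    """Estimate the number of repeats in a chain (assumed: simple rounding)."""
    return round(length_aa / repeat_length)


per_chain = {name: repeats_per_chain(chain["length_aa"])
             for name, chain in CHAINS.items()}
total_repeats = sum(chain["copies"] * per_chain[name]
                    for name, chain in CHAINS.items())

# Three detachable domains (two Fab, one Fc), each formed by four repeats.
total_from_detachable_domains = 3 * 4

print(per_chain, total_repeats, total_from_detachable_domains)
```

On this estimate each light chain contributes two repeats and each heavy chain four, which agrees with the three detachable domains of four repeats each described in the caption.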
expressed as separate proteins that exhibit respectively the abilities of the intact native protein to recognize DNA containing its operator and to dimerize in an arabinose-dependent manner.164 Genetic deletion of the carboxy-terminal 154 aa from cystathionine β-synthase (507 aa) actually increases its enzymatic activity.165
The advantage of performing a detachment genetically is that it can be accomplished at the precise boundary and does not require that the polypeptide between the two domains be as extraordinarily available for digestion by an endopeptidase as are the connecting strands
properly detached, each domain of this bifunctional protein from E. coli remains folded and enzymatically active. The cleavage by an endopeptidase within the exposed loop in domain 1, which is responsible for aspartate kinase, causes it to unfold and the unfolded amino-terminal fragment to fall away. Domain 2, which is responsible for homoserine dehydrogenase, remains folded and active after the cleavage. If the point of cleavage by glutamyl endopeptidase were a true boundary between two detachable domains, rather than an adventitious loop of polypeptide on the surface of the aspartate kinase, the aspartate kinase activity, which can readily be expressed by the genetically dissected protein, would have been unaffected. Aspartate kinase–homoserine dehydrogenase, then, is an example of a protein that has domains that cannot be detached from each other at their boundary by cleavage with an endopeptidase.
Proteins such as aspartate kinase–homoserine dehydrogenase belong to a class of proteins known as multienzyme complexes. A multienzyme complex is a protein that, although it is a single, discrete macromolecule, is able to catalyze two or more enzymatic activities. Usually each of the enzymatic activities in one of these multienzyme complexes is expressed by its own unique domain within the folded polypeptide or one of the folded polypeptides that form the protein. Such an enzymatic domain within a larger protein is a domain that is by itself independently responsible for a particular enzymatic activity. The fact that the several enzymatic activities are expressed respectively by several individual proteins in some species yet all of them are expressed by only one protein formed from one folded polypeptide in other species is sufficient evidence of an independent existence to conclude that the multienzyme complex is constructed from enzymatic domains. Proteins constructed from enzymatic domains presumably arose as the result of the fusion of the individual genes that encoded the unfused ancestors of those domains. Many artificial fusions of two genes to produce chimeric proteins have been performed, and the products that result from these artificial fusions seem to be little affected functionally.170,171
A paradigm for a protein containing enzymatic domains is the CAD multienzyme complex in animal tissues that comprises a single folded polypeptide about 2220 aa in length172 responsible for the enzymatic activities of carbamoyl-phosphate synthase (glutamine hydrolysing), aspartate carbamoyltransferase, and dihydroorotase.173 The first enzymatic reaction has two steps, the production of ammonia from the hydrolysis of glutamine at the active site of a glutaminase and the synthesis of carbamoyl phosphate from the resulting ammonia at the active site of a carbamoyl-phosphate synthase (ammonia). Each of the four component enzymatic reactions carried out by the intact complex from animals is carried out by a different discrete protein in E. coli. The amino acid sequences of the four separate bacterial proteins can be aligned with four consecutive regions in the amino acid sequence of the multifunctional protein from Mesocricetus auratus: glutaminase with amino acids 2–355 (40% identity, 1.3 gap percent),172,174 carbamoyl-phosphate synthase (ammonia) with amino acids 397–1440 (40% identity, 1.1 gap percent),172,174 dihydroorotase175 with amino acids 1457–1785 (20% identity, 3.5 gap percent),176 and aspartate carbamoyltransferase with amino acids 1921–2225 (44% identity, 1.6 gap percent).177 These regions are enzymatic domains 1 through 4 in the CAD multienzyme complex, respectively.
The fact that dihydroorotase has sustained so much more replacement suggests that even within the same polypeptide different domains can suffer replacement at different rates. In fact, rates of change can differ so much that one domain in a multienzyme complex can become defunct even as others retain their full function. In the CAD multienzyme complex from yeast, the dihydroorotase domain, although its amino acid sequence is still able to be aligned, has lost the ability to catalyze its enzymatic reaction.175 A similar loss of function has occurred during the evolution of the fructose-2,6-bisphosphate 2-phosphatase domain of yeast 6-phosphofructo-2-kinase, but its enzymatic activity can be restored by mutating Serine 404 to histidine.178
Because the amino acid sequences of the four discrete bacterial proteins responsible for the four enzymatic reactions catalyzed by the CAD multienzyme complex from animals can be aligned with the amino acid sequences of its four enzymatic domains, it follows that the tertiary structure of each domain in the animal protein must be superposable on the tertiary structure of the corresponding bacterial protein. Consequently, each domain in the animal protein must be a compact, independently folded unit, and these units must be strung together consecutively by the continuity of the polypeptide. This conclusion is supported by the fact that the enzymatically active domains responsible respectively for dihydroorotase175,179 and aspartate carbamoyltransferase177 can be detached either genetically or by cleavage of the protein with endopeptidases at the boundaries of the domains. During the digestion with endopeptidases, however, the activity of carbamoyl-phosphate synthase (glutamine-hydrolysing) is lost and can be associated with none of the fragments smaller than 1700 aa in length. In situations such as this, digestion of one domain, for example, the carbamoyl-phosphate synthase domain, at some point on its surface could cause it to unfold and make the polypeptide much more susceptible to cleavage by endopeptidases in a region forming the boundary between that unfolded domain and a neighboring properly folded domain. An example of such a pruning of an unfolded segment of polypeptide from a properly and compactly folded protein by digestion with an endopeptidase occurred during the production of hybrids of different portions of micrococcal nuclease from Staphylococcus.180
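The positions quoted above for the four regions of the CAD multienzyme complex that align with the bacterial proteins can be tabulated to show how the enzymatic domains are strung along the polypeptide. The Python sketch below uses only the residue numbers and percentages of identity given in the text. The names CAD_REGIONS, region_lengths, and unaligned_stretches are hypothetical, and treating the sequence between two aligned regions as a single unaligned stretch is an assumption made for the arithmetic, not a structural claim.

```python
# (enzymatic domain, first residue, last residue, percent identity)
CAD_REGIONS = [
    ("glutaminase", 2, 355, 40),
    ("carbamoyl-phosphate synthase (ammonia)", 397, 1440, 40),
    ("dihydroorotase", 1457, 1785, 20),
    ("aspartate carbamoyltransferase", 1921, 2225, 44),
]


def region_lengths(regions):
    """Length in amino acids of each aligned region (inclusive numbering)."""
    return {name: last - first + 1 for name, first, last, _ in regions}


def unaligned_stretches(regions):
    """Number of amino acids lying between consecutive aligned regions."""
    stretches = []
    for (_, _, prev_last, _), (_, next_first, _, _) in zip(regions, regions[1:]):
        stretches.append(next_first - prev_last - 1)
    return stretches


lengths = region_lengths(CAD_REGIONS)
stretches = unaligned_stretches(CAD_REGIONS)
least_conserved = min(CAD_REGIONS, key=lambda region: region[3])[0]

print(lengths)
print(stretches)
print(least_conserved)
```

The domain with the lowest percentage of identity in this table is dihydroorotase, the same domain singled out in the text as having sustained the most replacement.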
Anthranilate synthase, CTP synthase, phosphoribosylformylglycinamidine synthase, GMP synthase, imidazole glycerol phosphate synthase, glutamine–fructose-6-phosphate transaminase (isomerizing), and aminodeoxychorismate synthase, like the carbamoyl-phosphate synthase (glutamine hydrolysing) incorporated into the CAD multienzyme complex, all contain an enzymatic domain responsible for producing ammonia from glutamine by hydrolysis. The domain can be either a separate folded polypeptide181 or an enzymatic domain in a longer polypeptide.182 Because the sequences of these various enzymatic domains catalyzing the hydrolysis of glutamine are homologous, it follows that they share a common ancestor that in its day was a separate, independent protein, presumably a glutaminase. The offspring of this common ancestor were separately incorporated into the various multienzyme complexes in which they are now found. Although each of these complexes catalyzes a quite different reaction, each uses the ammonia supplied by the respective glutaminase domain as a substrate.
thymidylate synthase catalyze successive reactions in the biosynthesis of chorismic acid, orotidine 5′-phosphate, and thymidine 5′-phosphate, respectively. Aspartate kinase–homoserine dehydrogenase catalyzes the first and third steps in the biosynthesis of homoserine. It is common in prokaryotes to find the enzymes catalyzing the reactions of a metabolic pathway gathered together in an operon. Such a gathering may have preceded the gene fusions producing multienzyme complexes and facilitated those fusions by placing the genes for the ancestors of the enzymatic domains adjacent to each other in the genome.185 There are, however, examples of single enzymatic domains responsible for only one enzymatic reaction but inserted into larger proteins. The domain responsible for protein-tyrosine-phosphatase in eukaryotes is a compact enzymatic domain186 that is fused into a large array of different proteins.
In prokaryotes and plants,187,188 the seven enzymes and the acyl carrier protein responsible for the synthesis of fatty acids from acetyl-SCoA and malonyl-SCoA are discrete proteins that can be separated and individually purified, but in fungi and animals all of these activities are expressed by a single multienzyme complex. In fungi, the complex is constructed from two folded polypeptides that are encoded by different genes and are completely different in sequence from each other.189,190 Their lengths are 1890 and 1980 aa, respectively.189,190 The fatty-acid synthase from animals, however, is constructed from only one polypeptide, 2440 aa in length.191 All seven of the enzymatic activities and the acyl carrier protein are located on the single polypeptide comprising the animal enzyme, and the domains responsible for each have been identified in the amino acid sequence.191–193 Animal fatty-acid synthase has also been dissected both genetically and with the endopeptidase kallikrein193 to produce several detached domains that are able to catalyze the enzymatic reactions assigned to them.
The order in which these enzymatic domains occur on the single polypeptide of the animal fatty-acid synthase is unrelated to the orders in which they appear on the two unique polypeptides from fungi.194 On the basis of this fact, it has been concluded that all or most of the gene fusions that produced the animal protein and the fungal protein, respectively, must have occurred as independent events after the lineages of these two kingdoms diverged from their common ancestor. These separate processes would be ones in which each enzymatic domain has been shuffled into a larger protein. Nevertheless, the individual domains, even though fused in different orders, are still homologous to each other because the amino acid sequence of the one responsible for a given activity in the fungal enzyme can be aligned with that of the one responsible for the same activity in the animal enzyme.194
There are a number of multienzyme complexes that are responsible for the biosynthesis of antibiotics in various fungi and bacteria. For example, the single folded polypeptide 3131 aa in length195 responsible for the biosynthesis of enniatin in Fusarium oxysporum catalyzes all of the enzymatic reactions required to produce this cyclic hexadepsipeptide,196 and the multienzyme complex containing four folded polypeptides (3587, 3587, 1274, and 240 aa) responsible for the biosynthesis of surfactin in B. subtilis catalyzes all of the enzymatic reactions required to produce this cyclic octadepsipeptide.197
The multienzyme complex198 responsible for the biosynthesis of the polyketide 6-deoxyerythronolide B, which is the lactone at the hydroxyl on carbon 13 of the fatty acid
[Structure 7–1: structural drawing of the fatty acid]
is composed of three distinct but related polypeptides, 3200–3600 aa long. The synthesis of the complete molecule of 6-deoxyerythronolide B proceeds by the successive condensation of six molecules of (2S)-methylmalonyl-SCoA onto a molecule of propionyl-SCoA. After each of the six Claisen condensations, the resulting ketone either remains untransformed (carbon 9) or is reduced to the alcohol (carbons 3, 5, 11, 13), which either remains untransformed or is dehydrated and reduced to the alkane (carbon 7). The entire sequence of reactions at each round, from condensation to alkane, requires the successive participation of an acyl carrier protein present as a domain as well as five separate enzymatic domains: an [acyl-carrier-protein] S-acyltransferase, a 3-oxoacyl-[acyl-carrier-protein] synthase, a 3-oxoacyl-[acyl-carrier-protein] reductase, a 3-hydroxyacyl-[acyl-carrier-protein] dehydratase, and an enoyl-[acyl-carrier-protein] reductase. On its three polypeptides erythronolide synthase has 22 enzymatic domains and 6 acyl carrier proteins.
The elongating substrate is passed along the assembly line from the first acyl carrier protein to the sixth acyl carrier protein through six successive stations. At each station, it is operated on by the enzymatic domains assembled at that station.199 After it has been processed at the last station, the product 6-deoxyerythronolide is released from the last acyl carrier protein by an acyl-[acyl-carrier-protein] hydrolase, the last of the 22 enzymatic domains. When one of the domains at a particular station is inactivated or when one or more domains are experimentally added to a particular station, the product of the multienzyme complex changes at the position produced by that station to reflect its altered capacity. When the last station is deleted, a fatty acid lactone shorter by one C3 unit is produced.200 The enzymatic reactions catalyzed by this large multienzyme complex are homologous to those catalyzed by fatty-acid synthase. Fatty-acid synthase, however, because each of its
successive steps passes through the complete sequence of enzymatic reactions to produce the alkane at each stage, uses only one station for all of the reactions rather than the six stations, one for each of its steps, used by erythronolide synthase.

Coenzymatic domains, such as the acyl carrier proteins carrying 4′-phosphopantetheine that are incorporated into fatty-acid synthase and erythronolide synthase, are domains to which coenzymes are covalently attached. The domains carrying acyl carrier proteins are domains because they are found as independent proteins in prokaryotes and plants; the domains carrying lipoic acid that are incorporated into 2-oxo-acid dehydrogenase complexes are domains because they can be detached.201 Domains carrying biotin appear within the longer polypeptides of a number of different biotin-dependent carboxylases.202

A functional domain within a larger protein is a domain that is by itself responsible for a specific function. Enzymatic domains are functional domains, but there are also functional domains that are not enzymatic. Examples of functional domains would be the domains responsible for binding to specific segments of DNA (Figures 6–46, 6–50, and 6–53), each of which is a component of a larger protein responsible for controlling the expression of the gene adjacent to the segment of DNA recognized by the binding domain. Each of the other domains in the larger protein is responsible for a function essential to this control. For example, in each of the proteins that control genes in response to steroid hormones such as estrogen, testosterone, progesterone, cortisone, and aldosterone, one of these other domains is responsible for binding the respective hormone.203 There are many examples of domains such as these that are responsible for binding a ligand and that are part of a larger protein. Examples are the two domains responsible for binding cyclic AMP in cyclic AMP-dependent protein kinase204 and the domain responsible for binding flavin mononucleotide and stabilizing its semiquinone radical in sulfite reductase.205 When the amino-terminal 240 aa of 3-phosphoshikimate 1-carboxyvinyltransferase from E. coli, which form a compact globular domain in the crystallographic molecular model of the intact protein (427 aa),206 was expressed separately, it folded to form a structure capable of binding shikimate 3-phosphate.207

The domains discussed so far are clearly capable of independent existence or are descended from ancestors that were. There are, however, domains in proteins that either cannot be detached or that are not identified with an independent function. These domains often were joined together so long ago that they have become completely dependent on each other both structurally and functionally. Nevertheless, it is possible to conclude that they once did have an independent existence because they recur in a number of extant proteins. Examples of such recurring domains would be domain 1, the antiparallel β barrel,208 and domain 2, the doubly wound parallel β sheet,209 in ferredoxin–NADP+ reductase (Figure 7–12), which are also found as domains in a number of different proteins. A recurring domain is a domain that is folded with a tertiary structure that can be superposed on the tertiary structures of other domains in other proteins of otherwise entirely different structure. A recurring domain is a compact structure used in more than one distinct situation. Because of its recurrence in different surroundings, there is little doubt that a domain of this type had at one time an independent existence.

Pyruvate kinase is one of the more informative examples of a protein built from recurring domains.210 Domain 1 of the protein is an α-helically wound, parallel β barrel that is superposable on the entire folded polypeptide of triose-phosphate isomerase. Inserted into the loop between the third β strand and the third α helix of domain 1 is domain 2, which is a Greek key, antiparallel β barrel.211 Domain 3, which follows the complete elaboration of domain 1 in the sequence of the polypeptide, can be superposed on half of the doubly wound, parallel β sheet found in L-lactate dehydrogenase (Figure 7–11).210 Each of these three structures is a recurring domain. In galactose oxidase from Dactylium dendroides, domain 2 is a β propeller (Figure 6–13) that is flanked by domain 1, an eight-stranded jelly roll, antiparallel β barrel,211 and domain 3, a bundle of seven antiparallel β strands that has the topological arrangement of an immunoglobulin domain.212 All three of these structures are also recurring domains.

Recurring domains occasionally appear to be associated with a particular role. For example, benzoate 4-monooxygenase, glucose oxidase, cholesterol oxidase, and glutathione-disulfide reductase contain a recurring domain about 160 aa in length.213,214 In all of these enzymes, the domain serves to bind tightly an integral flavin coenzyme, and it has been referred to as the “FAD-binding domain”.213

Some secondary structures, such as a β propeller (Figure 6–13)212 or a β helix, are self-contained and usually occur as independent structural entities in a protein. Because such secondary structures recur in many proteins, they could be considered recurring domains, and they usually do seem to be independent isolated units in a larger protein in which they occur.215

An internally repeating domain is a member of a set of consecutive segments within the same polypeptide, each homologous in amino acid sequence to the other members of the set or each folded in a tertiary structure superposable on the tertiary structures of the other members of the set. An example of a set of such internally repeating domains is the 12 domains, four in each of the three detachable domains that compose immunoglobulin G (Figure 7–13). Each of these 12 internally repeating domains is a seven-stranded barrel of antiparallel β strands. Each is superposable on all the others and shares statistically significant similarities in
amino acid sequence with some of the others. The two identical long polypeptides in an immunoglobulin G, the heavy chains, each contain four of these domains; and the two identical short polypeptides, the light chains, each contain two. Polypeptides containing such internally repeating domains are quite common. About 10% of polypeptides 200 amino acids in length contain internally repeating domains but the frequency rises steadily to 80% for polypeptides 2000 amino acids in length.216

Internally duplicated domains are internally repeating domains that occur only twice in the same polypeptide. Internally duplicated domains arise from the internal duplication of a gene encoding a smaller protein. An internal duplication is a gene duplication in which the duplicated genes end up immediately adjacent to each other so that when they are transcribed and translated, the duplicated amino acid sequences remain attached consecutively to each other in the same polypeptide. Because they are the products of internal duplication, the single, unrepeated ancestral amino acid sequence and tertiary structure of each of these duplicated domains must have existed on its own at some time in the past, before the duplication occurred. As with a gene duplication producing two separate proteins, an internal duplication arises in the genome of one individual and then spreads by genetic drift over the whole population. As with gene duplication in general, internal duplications arise often but only rarely spread over the whole population. Following the gene duplication and its spread and fixation by genetic drift, the two internally repeating domains begin to evolve separately.

The two halves of the doubly wound, parallel β sheet as it presently occurs in L-lactate dehydrogenase (Figure 7–11) can be superposed upon each other (Figure 7–15).82 It has been proposed that the complete doubly wound, parallel β sheet arose itself from a gene duplication in which the two segments of polypeptides encoded by the duplicated gene remained consecutively attached to each other and then began to evolve independently but within the same protein. That this did happen is supported by the fact that recurring domains are found in the crystallographic molecular models of pyruvate kinase210 and phosphoglycerate kinase that superpose on only half of the doubly wound, parallel β sheet from L-lactate dehydrogenase.210 The lineages leading to these

Figure 7–15: One half of the doubly wound, parallel β sheet (Figure 7–11) found in the crystallographic molecular model of L-lactate dehydrogenase from S. acanthius (open lines), drawn as an α-carbon diagram, superposed on the other half of the same doubly wound, parallel β sheet (solid lines).82 Reprinted with permission from ref 82. Copyright 1974

[…]

0.21 nm, and the amino acid sequences of the two halves can be aligned structurally with a percentage of identity of 23%.217 A similar superposition and alignment can be performed with the two halves of the α-helically wound, parallel β barrel of imidazole glycerol phosphate synthase from the same bacterium.217,218 Unlike the two halves of the doubly wound, parallel β sheet (Figure 7–15), there are no examples of proteins in which half of an α-helically wound, parallel β barrel is found. Nevertheless, these superpositions and alignments suggest that all α-helically wound, parallel β barrels are also the product of an internal duplication.

The serine endopeptidases,219 thiosulfate sulfurtransferase,220 carbamoyl-phosphate synthase from E. coli,221 methionyl aminopeptidase from E. coli,222 and diaminopimelate epimerase from Haemophilus influenzae223 are each an example of a protein formed entirely
by one internal duplication. In UDP-glucose 6-dehydrogenase from Streptococcus pyogenes, however, the two domains of an internal duplication (123 and 93 aa) are separated from each other by another domain of 177 aa.224 Although the bifunctional enzyme phosphoribosylanthranilate isomerase/indole-3-glycerol-phosphate synthase has two consecutive enzymatic domains that are superposable,225 the two enzymes probably evolved separately from a common ancestor before being fused.

Aside from the alignments of the duplicated amino acid sequences or superpositions of the duplicated tertiary structures, there are other observations supporting the conclusion that duplicated domains at one time had independent existence. The internal duplication in mammalian hexokinase produced two independent enzymatic domains, each containing an active site and each superposable on the other and on the unduplicated enzyme from fungi.87,226 The two internally duplicated domains of ovotransferrin227 can be detached by digestion with endopeptidase, and the resulting amino-terminal domain can be crystallized and shown to have the same structure it had in the intact protein.228 The aspartic endopeptidase from retroviruses is formed from two identical polypeptides,229 but those from eukaryotes are formed from a single polypeptide containing an internal duplication of the structure assumed by each polypeptide in the viral protein.125,230 Each domain of the enzyme from eukaryotes is superposable on one of the folded viral polypeptides. The two halves of porcine aspartic endopeptidase were expressed separately and, when mixed together, folded to produce an enzymatically active protein formed, as is the retroviral enzyme, from two folded polypeptides231 rather than two internal duplications.

Many proteins contain more than two internally repeating domains. Serum albumin232,233 is composed of three internally repeating domains; human interstitial retinol-binding protein, of four;234 gelsolin, of six;235 human placental ribonuclease inhibitor,236 hemocyanin from Octopus dofleini,237 and granulin,238 of seven; and the polymeric globin from Artemia, of nine.239 Such gene multiplications are usually not produced by several historically distinct duplications but arise when an initial gene duplication then catalyzes the further multiplication of the gene during successive rounds of recombination. Sometimes the same protein from different species has different numbers of internally repeating domains. Dihydrolipoyllysine-residue acetyltransferase from E. coli has three consecutive lipoamide domains; the enzyme from rat, two; and the enzyme from S. cerevisiae, one.240 An example of a very recently multiplied gene is the one encoding prepromagainin from Xenopus laevis in which the identical sequence of 46 aa repeats five times241 in the same polypeptide with the replacement of only one amino acid in only one of the repeats. In contrast to such proteins containing amino acid sequences multiplied so recently that they can be readily aligned242 are proteins in which the internally repeating domains diverged so long ago that their secondary structures, although obviously related, have rearranged significantly.243

Nebulin and spectrin are examples of proteins with even larger numbers of internally repeating domains. Nebulin is a long protein (6669 aa) composed almost entirely (the first 6480 aa) of 178 internally repeating domains each 30–32 aa long.244 Spectrin is a protein composed of 38 internally repeating domains consecutively occurring in its two unique folded polypeptides.245–247 The folded domains sit like beads on a wire to create a long, somewhat flexible protein (Figure 7–16A).245,248 Each domain of 106 aa246 is an antiparallel coiled coil of three α helices (Figure 7–16B).249–251 Spectrin offers an excellent example of the absence of a correlation between locations at which cleavages with endopeptidases occur upon the surface of a native protein and the boundaries between its domains. Of the 15 cleavages of native spectrin produced by trypsin,252 only four occur that are even near the boundaries of the domains.245 This fact is not surprising, given that the junctions between the domains are continuous α helices (Figure 7–16B).

The internal repeats in a protein, if the multiplications of the gene that produced them have occurred recently enough, can be recognized on a dot matrix (Figure 7–2) in which the amino acid sequence of the entire protein is compared to itself. Such a dot matrix of a protein with internally repeating domains contains a set of lines parallel to the central diagonal of identity. The distance between the lines is equal to the length of the internally repeating domains, and the number of lines is one less than the number of domains. In the dot matrix for the self-comparison of the amino acid sequence of human interstitial retinol-binding protein,234 there are three lines parallel to the central diagonal and the distances between them are 302–310 aa, and in that for human placental ribonuclease inhibitor,236 there are six lines and the distances between them are 57 aa.

The β propeller (Figure 6–13) is a structure in which four antiparallel β strands form each blade and 6–8 blades form the intact unit. Each blade is superposable on each of the others253 so an argument might be made that each blade represents an internally repeating domain,254,255 but there are no examples of such a small structure having independent existence. There are a number of other repeating structures that are too small to fold on their own256–259 and consequently should not be considered to be domains. Often short repeating sequences such as those in dragline silk from spiders260 or antifreeze proteins from Tenebrio molitor261 have been multiplied to produce a protein that is fibrous or that must conform to a repeating molecular structure such as a crystal of ice but that is not formed from a string of independent globular tertiary structures as is spectrin.
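The self-comparison on a dot matrix described above for internally repeating domains can be illustrated with a short computation. The following is a minimal sketch in Python and is not the procedure used in the studies cited: the repeating unit of 12 aa, the window of 5 aa, and the threshold of 4 identities are hypothetical values chosen for illustration, and the function name repeat_diagonals is an assumption introduced here. Every window of the sequence is compared with every later window, and the dots are counted on each diagonal, where the offset of a diagonal is its distance from the central diagonal of identity.

```python
from collections import Counter


def repeat_diagonals(seq, window=5, min_matches=4):
    """Count dots on each diagonal of a self-comparison dot matrix.

    A dot is scored at (i, j), with j > i, when the windows starting at
    i and j are identical in at least min_matches of window positions.
    The returned Counter maps the offset j - i of a diagonal to the
    number of dots that fall on it.
    """
    dots = Counter()
    last_start = len(seq) - window
    for i in range(last_start + 1):
        for j in range(i + 1, last_start + 1):
            identities = sum(
                a == b for a, b in zip(seq[i:i + window], seq[j:j + window])
            )
            if identities >= min_matches:
                dots[j - i] += 1
    return dots


# Hypothetical sequence: a made-up unit of 12 aa repeated four times.
unit = "MKTAYIAKQRDE"
sequence = unit * 4

diagonals = repeat_diagonals(sequence)
print(sorted(diagonals))  # offsets of the lines: [12, 24, 36]
```

For this made-up sequence of four exact repeats, dots fall only on the diagonals at offsets 12, 24, and 36: three lines, one less than the number of domains, separated by the length of the repeat. With real sequences, in which the repeats have diverged, the window and threshold would have to be relaxed, and scattered dots would be expected off these lines.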
Figure 7–16: Internal repeats in spectrin. (A) Hypothetical model of the isoform of spectrin from human erythrocytes.245 A repeating pattern was detected in the amino acid sequences of the two different polypeptides (naa,α = 2430, naa,β = 2200) composing this protein. The repeating pattern is 106 aa in length and occurs 22 times in the α polypeptide248 and about 18–20 times in the β polypeptide.245 From numerous physical measurements, it was concluded that each repeating domain is a bundle of three α helices and that the bundles are strung together as shown. Reprinted with permission from ref 245. Copyright 1984 Nature. (B) Skeletal representation of the crystallographic molecular model (Bragg spacing ≥ 0.2 nm) of the 16th and 17th internally repeating domains of the α isoform of spectrin from brain of Gallus gallus.249 Complementary DNA encoding the sequence of the protein from Histidine 1772 to Alanine 1982 was expressed in E. coli. The resulting polypeptide folded to form the two domains, each of which is an antiparallel coiled coil of three α helices. The only feature of the structure unforeseen in the proposal of panel A was that the last α helix of a preceding domain would be continuous with the first α helix of the next domain. This drawing was produced with MolScript.409
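The numbers quoted in this caption can be checked against each other with simple arithmetic. The following is a minimal sketch in Python; the function name repeat_coverage is an assumption introduced here, and the calculation assumes, for illustration only, that every repeat is exactly 106 aa long.

```python
REPEAT_LENGTH = 106  # aa in one repeat, as stated in the caption


def repeat_coverage(n_repeats, chain_length, repeat_length=REPEAT_LENGTH):
    """Fraction of a polypeptide accounted for by n_repeats tandem repeats."""
    return n_repeats * repeat_length / chain_length


# Chain lengths and numbers of repeats are those given in the caption.
alpha = repeat_coverage(22, 2430)
beta_low = repeat_coverage(18, 2200)
beta_high = repeat_coverage(20, 2200)

print(f"alpha: {alpha:.0%}; beta: {beta_low:.0%} to {beta_high:.0%}")
# prints "alpha: 96%; beta: 87% to 96%"
```

Under this assumption, the repeats account for nearly all of the length of each of the two polypeptides.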
The proteins discussed so far are simple multiples of internally repeating domains, but other proteins have internally repeating domains attached to themselves and then to other domains. Sometimes one or the other of the odd domains appears to have evolved from the same common ancestor as the internally repeating domains but has undergone a few more additions, deletions, or rearrangements of its secondary structure,262,263 an observation suggesting that two gene duplications occurred at remarkably different times, but usually the odd domain or odd domains are entirely different. For example, pyruvate oxidase from Lactobacillus plantarum has two internally repeating domains, each 200 aa long, coupled to a recurring FAD-binding domain in the same polypeptide.264 The extracellular portion of mannose-6-phosphate receptor has 15 contiguous internally repeating domains coupled to a membrane-spanning segment and an intracellular domain.265

Titin is possibly the longest protein built from a single polypeptide. It is a protein greater than 1 μm in length that is found in vertebrate muscle.266,267 The amino terminus of its polypeptide is embedded in the Z disc, the carboxy terminus of its polypeptide is embedded in the M line, and its elasticity allows it to shorten and lengthen as the muscle contracts and relaxes.268,269 As does spectrin (Figure 7–16), titin achieves its remarkable length by using internal repeats. It contains 244 internally repeating domains, each about 100 aa long, that account for 90% of its amino acid sequence of 26,900 aa. Unlike spectrin, however, these internal repeats come in two types, immunoglobulin repeats270 and fibronectin type III repeats. Also, unlike the internally repeating domains of spectrin, these two types of internally repeating domains are widely recurring within a broad class of mosaic eukaryotic proteins that are cobbled together from a particular collection of promiscuous, modular domains. In the amino acid sequences of these proteins, segments have been observed that can be aligned with other segments within the same protein as well as segments in other proteins.271 These recurring domains can appear many times in the same protein as internally repeating domains272 or they can appear in combination with other different recurring domains. Because the proteins that contain them seem to have resulted from recent, remarkably active domain shuffling and because their amino acid sequences can usually be aligned readily with those of the other members of their type, these domains are usually considered to be members of a unique group and are referred to as modular domains.

A modular domain is a domain that recurs frequently within a group of mosaic eukaryotic proteins containing internal repeats of that domain, mixtures of other modular domains, or both internal repeats and
apart significantly are the products of ancient genetic rearrangements by processes that assembled the proteins common to all existing organisms. Many of the recent rearrangements have been produced by exon shuffling, but whether or not any of the early rearrangements also were produced by exon shuffling is unknown.329–332

When the tertiary structure of ferredoxin–NADP+ reductase is examined (Figure 7–12), it seems possible to divide it reasonably into two domains. It is now known that both of these are recurring domains, but it has been argued that, even if this were not known, a judicious decision that these were distinct domains could still have been made by inspection of the crystallographic molecular model alone. In this sense, these are two structural domains. A structural domain has been defined as a “section of peptide chain that can be enclosed in a compact volume … by a closed surface …, and is character- […] terminal points are the point at which the polypeptide enters the compact volume enclosed by the surface and the point at which it exits. For example, phosphoglycerate kinase is constructed from two structural domains, possessing two terminal points and clearly capable of being enclosed by continuous surfaces surrounding compact volumes.142 This definition does not possess a requirement for evidence of the independent existence

[Figure: drawing with regions labeled catalytic, pleckstrin, EF-hand, and C2; residue numbers from the drawing omitted. Surviving fragments of the caption: … phospholipase Cδ1 from R. norvegicus.274,324,325 The amino-terminal … (Glycine 133 to Aspartate 756) was also expressed separately and crystal- … that the carboxy-terminus of the pleckstrin domain is near the amino terminus of the rest of the molecule. The four EF hands (Table 7–7) comprise positions 133–175, 176–211, 212–245, and 246–282. None has a bound Ca2+ even though the cation was present during crystallization, so they do not assume the paradigmatic structure, but except for the first one, which is missing its first α helix, each has an α helix, a loop, and an α helix. The … that produced no electron density, and the active site is occupied by a mol- … product of the enzymatic reaction, and a Ca2+ cation (gray circle). The C2 …]
The intuitive impression persists, nevertheless, that the tertiary structures of most proteins, as revealed in their crystallographic molecular models, can be divided into two or more autonomous structural domains. It seems to be the case that anyone examining these structures would make the same decision, but there is no way to verify this surmise. Some examples of proteins that are thought to contain structural domains are DNA-directed DNA polymerase I,335 dihydrolipoyl dehydrogenase,336 and the hemagglutinin glycoprotein of influenza virus.337 An interesting example of a structural domain for which there is independent evidence of its existence occurs in catalase from Penicillium vitale. This protein has a structural domain formed from the carboxy-terminal 160 aa in its sequence (amino acids 510–670)338 that is missing entirely from bovine liver catalase,339 even though the two proteins are superposable throughout the other structural domain. Likewise, the carboxy-terminal structural domain found in the two structurally superposable proteins phosphoribosylamine–glycine ligase and biotin carboxylase is missing from the otherwise superposable proteins glutathione synthase and D-alanine–D-alanine ligase.92

As the number of crystallographic molecular models has increased, more and more of the domains that were originally designated structural domains have been found to be recurring domains. It is possible that if the crystallographic molecular models of all of the proteins were known, all structural domains would turn out to be recurring domains, and this latter fact would provide the necessary evidence for their independent existence.

One indication that a structural domain does have independent existence is that its position relative to the rest of the protein shifts when crystallographic molecular models from different crystals of the same protein are compared. Lysozyme from bacteriophage T4 assumes five different structures in five different crystalline environments that differ from each other in the relative positions of its two structural domains.340 Such independently shifting domains that reorient within the same protein over milliseconds or seconds should be distinguished from domains that change their orientations over millennia as related proteins diverge from each other during evolution. As two proteins that are derived from a common ancestor diverge with time, structural domains often shift positions relative to each other even though the internal structures of the domains themselves remain superposable. Such evolutionarily shifting domains can be documented by superposing the crystallographic molecular models of related proteins. When the crystallographic molecular models of NADH peroxidase and glutathione reductase are compared, each of the four structural domains in these two related proteins superposes on its partner, but their relative positions in the two proteins are significantly shifted.341 Significant shifts in the relative positions of the two internally repeating domains of aspartic endopeptidases have been documented by comparisons of crystallographic molecular models of six different members of the group,230 and the two structural domains in the related proteins ferredoxin–NADP+ reductase from B. taurus and thioredoxin-disulfide reductase from E. coli,342 although separately superposable, differ in their relative positions by 66°.

One criterion that is often used as evidence for the independence of structural domains in a protein is that they unfold or fold independently. Separately unfolding domains are two or more regions of a protein that unfold independently of each other. Fibrinogen is a protein constructed from two copies of each of three polypeptides that are combined in such a way that the intact protein contains a central detachable domain, domain E, and two identical peripheral, detachable domains, domains D. The two domains D are attached to domain E by ropes constructed from three-stranded coiled coils of α helices,343–345 and the two domains D can be detached from domain E by cleaving disordered regions in the coiled coils with endopeptidases. When fibrinogen is submitted to differential scanning calorimetry,* two clearly separated transitions can be observed (Figure 7–18).346 These have been assigned to the melting of domains D and E, respectively.346

The melting, or unfolding, of the separately unfolding domains of fibrinogen is an irreversible process under the conditions chosen,346 but the unfolding and the refolding of a protein back to its native structure are often reversible processes, even in a calorimeter. Plasminogen is a protein composed of at least seven domains.300 These are five kringles that repeat consecutively within the entire sequence and two additional segments of polypeptide on each side of this pentuplication. Several of these domains or combinations of these domains can be detached and isolated separately. The reversible unfolding and refolding of five of these detached pieces could be followed by differential scanning calorimetry. These individual measurements could be combined to show that the rather complex, fully reversible calorimetric curve obtained with the intact protein was actually the sum of seven independent transitions.347 It is also possible to observe the independent

* A differential scanning calorimeter is used to measure the difference in the absorption of heat, as the temperature is raised, between a solution containing a protein and an identical solution lacking the protein. Two cells, sample and reference, contain precisely matched coils that introduce identical quantities of heat into each of them and establish a constant rate of temperature increase. The sample cell has an auxiliary coil that provides the additional heat necessary to keep its temperature exactly the same as that of the reference cell. The power supplied to the auxiliary heater is a measure of the excess heat absorbed by the sample, the endothermic heat flow. A protein unfolds, or melts, as the temperature rises, and this transition proceeds with the absorption of heat. This absorption of heat is a convenient way to follow the progress of the unfolding.
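The statement that a calorimetric curve can be the sum of independent transitions can be illustrated numerically. The following is a minimal sketch in Python, assuming that each domain unfolds in a two-state transition with no change in heat capacity, so that its excess heat capacity follows from the van't Hoff relation as C = (ΔH²/RT²)K/(1 + K)², where K = exp[−(ΔH/R)(1/T − 1/Tm)]. The midpoints Tm and enthalpies ΔH are hypothetical values chosen for illustration, not values measured for fibrinogen or plasminogen, and the function names are assumptions introduced here.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def excess_heat_capacity(T, Tm, dH):
    """Excess heat capacity (J mol^-1 K^-1) of one two-state transition.

    T and Tm are in kelvin and dH is in J mol^-1. Any difference in heat
    capacity between the folded and unfolded states is neglected
    (an assumption of this sketch).
    """
    K = math.exp(-(dH / R) * (1.0 / T - 1.0 / Tm))  # unfolding equilibrium
    return (dH ** 2 / (R * T ** 2)) * K / (1.0 + K) ** 2


def calorimetric_curve(T, transitions):
    """Sum of independent transitions, each given as a (Tm, dH) pair."""
    return sum(excess_heat_capacity(T, Tm, dH) for Tm, dH in transitions)


# Hypothetical midpoints (K) and enthalpies (J mol^-1) for two domains.
domains = [(330.0, 400e3), (365.0, 500e3)]

temperatures = [300.0 + 0.5 * i for i in range(181)]  # 300 K to 390 K
curve = [calorimetric_curve(T, domains) for T in temperatures]

# Local maxima of the summed curve mark the separate transitions.
peaks = [
    temperatures[i]
    for i in range(1, len(curve) - 1)
    if curve[i - 1] < curve[i] > curve[i + 1]
]
print(peaks)
```

With these hypothetical values the summed curve has two clearly separated maxima close to the two chosen midpoints. In practice the problem is the inverse one: the observed curve is decomposed into the smallest number of such transitions that will account for it.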
to assume the same structure it had in native thermolysin, that structure is now much less stable. But the results are also consistent with it assuming a completely different conformation and the protein displays no function that would indicate that it is properly folded.

A claim that an independently folding domain has been produced requires an unambiguous demonstration that the domain can refold into the native conformation it assumes in the parent protein. For example, both kringle 4 of plasminogen, detached by digestion with an endopeptidase,360 and kringle 2 of tissue plasminogen activator, detached genetically,361 can be unfolded and refolded. The refolded detached domains bind lysine as they do when they are in their native conformation. The refolded kringle 2 also retains its affinity for plasminogen activator inhibitor 1. When kringle 4 is unfolded, its cystines reduced to cysteines, and then refolded, the fact that it has regained its native conformation is demonstrated by the ability of this refolded structure to enforce the formation of only the properly paired cystines upon its exposure to oxygen.360 In this case, the proper pairing of the cysteines, located at distant positions in the amino acid sequence, is the result of their proper juxtaposition in a properly folded polypeptide.

The segments of polypeptide linking domains together are of various types. Domains can be joined by flexible links such as those connecting the Fc and Fab domains of immunoglobulin G (Figure 7–13). The segments of polypeptide 35, 15, and 75 aa in length connecting the four enzymatic domains of the CAD multienzyme complex are rich in proline and glycine (30%) and the segments 27 and 24 aa in length connecting the three internally repeating lipoyl domains of dihydrolipoyllysine-residue acetyltransferase from E. coli are rich in proline and alanine (73%). All of these links should be unstructured and flexible. The amino acid segment –SKSSKEQKKKQK– connecting the two functional domains of initiation factor IF3 from E. coli has been shown to be randomly disordered in the intact protein in

[…]

domains cause them to be held tightly together (Figure 7–12). It is in proteins in which the domains were joined long ago that such contacts between domains are the most extensive. In the chaperone protein PapD from E. coli, however, it is the random meander of the link between the two domains that forms a hydrophobic core gluing together the two domains.90

As a domain, by definition, is a structure that may be now or has been in the past an independent entity, the various categories (detachable, enzymatic, coenzymatic, functional, recurring, internally repeating, modular, structural, independently unfolding, and independently folding) are simply different ways of identifying members of a large group of fundamental units of protein structure. This group represents all of the smaller units from which the larger proteins that now exist were constructed, and the various domains that now exist in any one protein were at one time unique, unattached, stable, folded polypeptides that were the ancestors of those portions of the entire polypeptide now containing them. These primordial proteins were then internally multiplied or individually fused together during evolution by natural selection.

It is this role as a fundamental unit of evolution that lends luster to the title of domain and elicits the desire to grant it. But the term domain should remain an operational designation, closely tied to the particular evidence presented in each case. Problems can arise when it is applied indiscriminately. In particular, it often happens that when the term is used to describe a region of a protein for a very specific reason, all of the connotations associated with it have a way of attaching themselves to that region. For example, a structural domain subliminally gains the status of an independently folding domain, or an enzymatic domain is assumed to be also a detachable domain. Such confusion should be avoided.

Suggested Reading
Ploegman, J.H., Drenth, G., Kalk, K.H., & Hol, W.G.J. (1978)
solution,362 and in the map of electron density for RNA Structure of bovine liver rhodanese I. Structure determination
recognition motifs from the Sex-lethal protein from at 2.5-Å resolution and a comparison of the conformation and
Drosophila melanogaster, the segment connecting these sequence of its two domains, J. Mol. Biol. 123, 557–594.
modular domains is missing owing to its disorder.281 It is Porter, R.R. (1959) The hydrolysis of rabbit g-globulin and antibod-
such extended, disordered segments that are susceptible ies with crystalline papain, Biochem. J. 73, 119–126.
to endopeptidases when domains are detached by diges- Miller, K.I., Cuff, M.E., Lang, W.F., Varga-Weisz, P., Field, K.G., &
tions. The long segment of 60 aa connecting the two van Holde, K.E. (1998) Sequence of the Octopus dofleini hemo-
modular SH2 domains in human protein-tyrosine kinase cyanin subunit: structural and evolutionary implications, J. Mol.
ZAP-70, however, forms a rigid, antiparallel, two- Biol. 278, 827–842.
stranded coiled coil of a helices.295
Domains can also be joined by short inflexible Problem 7–8: There is a protein in vertebrate liver
links, such as the one in dihydrofolate reductase– responsible for three enzymatic activities: phosphoribo-
thymidylate synthase (Figure 7–14), the interdomain sylamine–glycine ligase, phosphoribosylglycinamide
a helix between two spectrin domains (Figure 7–16B), or formyltransferase, and phosphoribosylformylglycinami-
the two connecting, internally repeating, modular EGF dine cyclo-ligase.363 It is composed of a single polypep-
domains 3, 4, and 5 from murine laminin g1.286 In other tide 1010 aa in length. When the protein was digested
instances, the segment connecting two domains may be with chymotrypsin, two products were produced that
structureless, but extensive contacts between the could be separated from each other. They were com-
Molecular Taxonomy

The proteins observed today have evolved from a much smaller group of less elaborate, primordial proteins, just as the species of organisms observed today have evolved from a much smaller group of less elaborate primordial species. The primordial proteins are now represented by the domains of presently existing proteins. Establishing the evolutionary relationships among these species of domains, however, may be far more difficult than establishing the evolutionary relationships among the species of organisms. Unfortunately, there is no fossil record of proteins. It is also quite clear that the evolutionary divergence that produced most of the proteins that are universally distributed among present living organisms, for example, the metabolic enzymes, occurred before the divergence of the organisms themselves. This follows from the observation that the amino acid sequences of the proteins from all living organisms responsible for one particular biological function are usually able to be aligned or their crystallographic molecular models to be superposed, but proteins from the same organism responsible for two different functions are usually difficult if not impossible to relate to each other. Thus the lineages of these universally distributed proteins have remained almost unbranched since the evolution of the earliest organisms, and the radiation producing these lineages must have occurred before that time. It is also clear, however, from examining amino acid sequences and crystallographic molecular models that more specialized proteins have been arising continuously throughout evolution and are still arising today. These newer proteins are usually members of classes peculiar to a particular kingdom or phylum of organisms, and one of the challenges is to identify their ancestral relationships to the more universally distributed proteins.

It is hoped that, as the number of tertiary structures elucidated by crystallography grows, an anatomical collection of the proteins large enough to form the basis for a comprehensive taxonomy can be assembled.211 When Linnaeus developed his taxonomic system of the organisms, it is possible that he was unaware of the reason for its existence. It was only when taxonomy was connected
to the theory of evolution through natural selection that an exercise in cataloguing became something more profound. At the present time, taxonomy in biology is one of the methods by which evolutionary relationships are established. The desire to establish the evolutionary history of the speciation of proteins has led, in an interesting inversion of history, to the formulation of a taxonomic system of the proteins.

The fundamental unit in a taxonomic system for proteins is the domain. The history of most of the proteins that now exist is that of the random association of domains, much as wildly different species are assembled into parasitic or symbiotic relationships or into ecosystems. Consequently, attempting to formulate a taxonomic system for proteins, just as assembling a taxonomic system for ecosystems, would be inappropriate. It is the domains that are the equivalent of species of organisms. An individual domain is a domain of a particular amino acid sequence in a particular isoform of a particular protein found in a particular species of organism, for example, the doubly wound, parallel β sheet in isoform A of L-lactate dehydrogenase from S. acanthius (Figure 7–11). A species of domains is a population containing all of the individual domains found in the same relative location in the same protein in all of its isoforms in all of the species of organisms in which it is found. A protein is regarded as the same protein as another protein if both of them perform the same function in their respective organisms and their two respective amino acid sequences can be significantly aligned over their entire length or their complete tertiary structures can be superposed. The doubly wound, parallel β sheets in all of the isoforms of L-lactate dehydrogenases from all of the species of organisms constitute a species of domains, as do all of the globins from all of the species of organisms in which they are found.

As in populations of organisms, individual domains of the same species can differ significantly. To add to the confusion, the names of the individual proteins composed from domains of the same species can often be different, for example, cathepsin K from mammals and papain from plants368 or ferredoxin–NADP+ reductase from mammals and thioredoxin-disulfide reductase from bacteria,342 but an examination of their respective functions and an alignment of their respective amino acid sequences or a superposition of their respective crystallographic molecular models establishes they are individuals of the same species. Often the structures of individuals of the same species of domains, such as the single domains constituting the lysozymes from animals and bacteriophage88 or the carbonate dehydratases II from animals and bacteria,369 have drifted apart significantly; but their functions identify them. It is the hundreds of thousands of different species of domains that are hierarchically classified in the taxonomic system. The sequence of the hierarchy for the taxonomic system of domains is species, family, superfamily, common fold, architecture. This can be compared to the sequence of the hierarchy of the taxonomic system of the organisms, which is species, genus, family, order, class, phylum, kingdom.

The central concept upon which the present taxonomic systems370–373 for classifying domains are based is the common fold. Although the exact definitions differ among the systems, two or more species of domains share a common fold if they have the “same major secondary structures in the same arrangement with the same topological connections.” Different species of domains with the same common fold can have “peripheral elements of secondary structure and turn regions that differ in size and conformation.”371 It is within the cores of their structures that the common fold exists, and loops connecting the elements of the common core can differ significantly in their length and structure. For example, the motor domains of kinesin and myosin share a common fold of an eight-stranded β sheet sandwiched between two sets of three α helices, but the loops connecting these elements of secondary structure differ dramatically in length. The short loop of five amino acids between β strand 6 and β strand 7 and the short loop of 11 amino acids between α helix 4 and α helix 5 in kinesin are 221 and 142 aa long, respectively, in myosin.374 Although such insertions have little influence on the selection of the common fold of a domain, they can have a significant effect on the function of the protein, for example, turning a sulfotransferase into a dehydratase.375

It has been estimated that there are fewer than 1000 common folds in existence;370 the estimate varies depending on the stringency with which a particular taxonomic system divides the species of domains into common folds. The number of common folds and the position of the concept of the common fold in the various hierarchies are both quite close to the number and position of the class in the taxonomic hierarchy of living organisms. Homo sapiens belongs to the class Mammalia. The level of the common fold371 is also referred to as the topology level,372 the level of the structure type,211 or the level of structurally unique domains373 in the different taxonomic systems.

Particular species of domains can be selected as representatives of their common fold (Figure 7–19).211 L-Lactate dehydrogenase domain 1 (Figure 7–19F) represents the common fold of doubly wound, parallel β sheets containing six β strands in the order 321456 (Figure 7–20).211 Five representatives from the large set of domains of this common fold376 are listed within the box in Figure 7–20. Domains of the same common fold are often found in proteins with significantly different functions. The catalytic domain of aspartate–tRNA ligase and the domain constituting an entire molecule of asparagine synthase are of the same common fold,377 as well as the domains constituting the entire molecules of tumor necrosis factor and the coat protein of satellite tobacco necrosis virus.378
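
The two hierarchies named in the text can be set side by side in a short sketch. The Python below is illustrative only: the record type DomainClassification and the helper lowest_shared_level are hypothetical names and do not come from any of the taxonomic systems cited. The level names are those listed in the text, and the only classification used is the statement that the motor domains of kinesin and myosin share a common fold. Levels that the text does not specify are left empty rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional

# Levels named in the text, ordered from lowest to highest.
DOMAIN_LEVELS = ("species", "family", "superfamily", "common_fold", "architecture")
ORGANISM_LEVELS = ("species", "genus", "family", "order", "class", "phylum", "kingdom")


@dataclass(frozen=True)
class DomainClassification:
    """Hypothetical record: one label per level; None means 'not stated'."""
    species: str
    family: Optional[str] = None
    superfamily: Optional[str] = None
    common_fold: Optional[str] = None
    architecture: Optional[str] = None


def lowest_shared_level(a: DomainClassification,
                        b: DomainClassification) -> Optional[str]:
    """Return the lowest level at which both records carry the same label."""
    for level in DOMAIN_LEVELS:
        label_a, label_b = getattr(a, level), getattr(b, level)
        if label_a is not None and label_a == label_b:
            return level
    return None


# The fold description is taken from the text; the species labels are
# informal placeholders for the two species of domains.
MOTOR_FOLD = "eight-stranded beta sheet between two sets of three alpha helices"
kinesin = DomainClassification(species="kinesin motor domain", common_fold=MOTOR_FOLD)
myosin = DomainClassification(species="myosin motor domain", common_fold=MOTOR_FOLD)

print(lowest_shared_level(kinesin, myosin))  # common_fold
```

Leaving unspecified levels as None keeps the sketch from asserting a family or superfamily relationship that the text does not make.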
Figure 7–19: A menagerie of representative tertiary structures from eight common folds of domains.211 Each of these cartoons has been drawn from the respective crystallographic molecular model. The flat arrows represent strands of β structure; and the helical ribbons, α helices. (A) Myohemerythrin is an example of an up-down-up-down, antiparallel α-helical bundle. (B) The β subunit of hemoglobin is an example of a Greek key, antiparallel α-helical bundle. This can be seen if, from the amino terminus (indicated by the arrow), the first long, bent α helix, the second short α helix, and the next four long α helices are numbered 1 through 6, respectively, and it is assumed that α helix 1 has drifted 90° away from being parallel to α helix 6. (C) Domain 2 of papain is an example of an up-down-up-down β barrel if the last two short and bent β strands are ignored. (D) Domain 2 of pyruvate kinase is an example of a Greek key, antiparallel β barrel if the short strand of β structure, β strand 3, between β strands 2 and 4 is ignored. The six strands of the Greek key are numbered as in Figure 7–21. (E) Domain 3 from tomato bushy stunt virus is an example of a jelly roll, antiparallel β barrel if the first two, amino-terminal β strands are ignored. The eight strands of the jelly roll are numbered as in Figure 7–21. (F) Domain 1 of L-lactate dehydrogenase is an example of a doubly wound, parallel β sheet. (G) Triose-phosphate isomerase is an example of an α-helically wound, parallel β barrel. (H) Domain 3 of glutathione-disulfide reductase is an example of an open-faced β sandwich. Adapted with permission from ref 211. Copyright 1981 Academic Press.
The next higher level in the taxonomic hierarchy of domains is that of architecture.372 A set of different common folds with the same architecture have the same clearly related spatial arrangement of secondary structures even though they differ in the number of individual elements of secondary structure or differ by one or more adjacent interchanges in the order in which those elements of secondary structure are juxtaposed or in both their number and their order. A collection of particular species of domains, each with a different common fold, can represent the architecture of doubly wound, parallel β sheets (Figure 7–20). This particular architecture,372 however, has become so diverse379,380 that its systematic reorganization into several different architectures is probably required. A systematic census of the members of the architecture of open-faced β sheets (Figure 7–19H) has been taken.381

Whether or not there is a higher level in the hierarchy than architecture is unresolved. For convenience, groupings have been used in which domains are sorted on the basis of whether they are formed entirely from α helices, formed from α helices alternating with β strands, formed from segregated α helices and β sheets, or formed entirely of β structure.372 These groups may have no evolutionary significance. For example, α-helically wound, parallel β barrels (Figure 7–19G) would be in the alternating αβ group, but glucan 1,4-α-glucosidase from Aspergillus awamori is an α-helically wound, parallel α barrel382 that could be more closely related to α-helically wound, parallel β barrels than to any entirely α-helical domain. If so, this would demonstrate that β strands can become α helices, a possibility for which there is evidence on the much smaller scale of single elements of secondary structure.91,383 If such a transformation turns out to be common, higher groups based on topological arrangements rather than type of secondary structure might turn out to be more appropriate.

The level in the taxonomic hierarchy of domains above that of species is that of family. The central criterion on which the level of family is usually based in the classification of domains is that of the function and the structure of the protein containing the domain. A family of domains is a set containing all of the domains of the same common fold that are found at the same position in the complete set of those proteins that have coincident structures and that perform related functions. Two proteins have coincident structures when they are both composed of the same number of domains, and the domains found at the same respective positions in the two proteins have the same common fold. The crystallographic molecular models of proteins with coincident structures are superposable, domain by domain, over their entire length. Each of the consecutive folds can be, but is not necessarily, different. For example, the first and second domains in benzoylformate decarboxylase and the first and second domains in pyruvate decarboxylase all share the same common fold, but the third domains from these two proteins with coincident structures share a different common fold.262,263

Examples of domains in the same family illustrate the classification. The domains constituting the entire molecules of the hydrolases carboxymethylenebutenolidase, alkylhalidase, and carboxypeptidase D384 all belong to the same family. The single domains constituting the entire molecules of the mammalian endopeptidases factor D and trypsin are in the same family,385 as are those constituting the entire molecules of adenosine kinase and ribokinase.89 Phosphoglycerate dehydrogenase, L-2-hydroxyisocaproate dehydrogenase, D-lactate dehydrogenase, erythronate-4-phosphate dehydrogenase, and glycerate dehydrogenase all have coincident structures and catalyze similar reactions.386 All of their domains 1 have one common fold and belong to one family, and all of their domains 2, which have a different common fold from that of the domains 1, belong to another family. The corresponding domains from cyclin-dependent protein kinase 2, MAP protein kinase ERK2, and cyclic-AMP dependent protein kinase are in the same respective families,387,388 as are the corresponding domains from aspartate-semialdehyde dehydrogenase and glyceraldehyde-3-phosphate dehydrogenase.120

The elaborations of the common fold of the domains within the same family can be dramatic. For example, domains 1 of tyrosine phenol-lyase, cystathionine β-lyase, ornithine decarboxylase, aspartate transaminase, phosphoserine transaminase, and adenosylmethionine–8-amino-7-oxononanate transaminase are members of the same family because they share the same common fold in which five β strands form the core, their pyridoxal phosphates are located in the same positions relative to the core, and the reactions they catalyze are of the same type. The elements of the common fold beyond the core, however, have drifted so far apart that they cannot be superposed, and there are a number of additional peripheral elements of secondary structure found in some species in the family but not in other species.389 This example illustrates the ambiguity of the upper limit for the level of family in the hierarchy.

Between the level of a family of domains, which is anchored in the coincident structures of the proteins containing the domains and their related functions, and the level of the common fold, which is anchored in the topological identity of their structures, is a region in the taxonomic hierarchy in which there are, at the moment, no consistent rules. This region is vaguely referred to as the level of the superfamily. Domains are grouped in a superfamily if the proteins that contain them have coincident structures but significantly different functions. For example, thiamine pyridinylase and maltose-binding protein have coincident structures but significantly different functions.91 Although all of the members of the enolase superfamily do share the function of abstracting a proton from a carbon α to a carboxylate, the superfamily has been divided into three families.118,390 Domains
are also grouped in a superfamily if the proteins that contain them have only partially coincident structures. The domains 2 and 3, respectively, from biotin carboxylase, phosphoribosylamine–glycine ligase, synapsin Ia, D-alanine–D-alanine ligase, and glutathione synthase have the same common folds, and all of the domains 1 have the same architecture,92,391 but the five proteins do not have coincident structures. Whether their domains 2 and 3 are of the same respective superfamilies or only of the same respective common folds is a question that illustrates the ambiguity of the upper limit of the level of superfamily in the hierarchy.

There are hundreds of common folds of domains, so a comprehensive discussion of even the architectures into which these common folds are arranged is not possible. There are some common folds and architectures, however, that have regular structural patterns that stand out. Both the β helix (Figure 6–12) and the β propeller (Figure 6–13) define architectures of domains. One architecture contains right-handed β helices; another, left-handed β helices.392 Within the architecture of β propellers, the number of blades determines the common fold to which a particular member belongs.393,394

A parallel β barrel (Figure 6–11) is usually wound completely by α helices (Problem 7–10B), as is the parallel β barrel constituting the entire folded polypeptide of triose-phosphate isomerase (Figure 7–19G). An α helix connects each stave of the barrel to the next. The number of staves in such a regularly α-helically wound, parallel β barrel determines the common fold to which it belongs. The common fold with the largest population is that in which the barrel has eight β strands. There are a large number of enzymes each of whose entire structure395 or the majority of each of whose structure390 is an α-helically wound, parallel β barrel of eight strands. There are often additional elements of secondary structure found in the loops connecting an α helix of the winding to a β strand in an α-helically wound, parallel β barrel,396,397 but these are found at the periphery of the structure and do not disrupt the common fold. α-Helically wound, parallel β barrels can stand alone or have other domains attached to them as in pyruvate kinase, phosphopyruvate hydratase,398 and the R1 protein of ribonucleoside-diphosphate reductase.397

In a number of the common folds of domains (Figure 7–19A–E), the structures observed seem to arise from a reasonable topological operation (Figure 7–21)211 that could explain their creation. Consider a polar curve that doubles back upon itself to form a hairpin. Twist the hairpin thus formed so that it folds into two turns of a right-handed superhelix (Figure 7–21A). Compress this superhelix until its neighboring segments both in front and behind come in contact and then incorporate the segments into the surface of a flattened cylinder (Figure 7–21B). This produces a flattened barrel with eight staves the polarities of which alternate as one proceeds around the structure. This flattened cylinder can be rolled as the tread on a caterpillar tractor to produce eight different barrels that resemble each other but that vary in the juxtapositions of the staves across the center. The connections between the segments, which define the topological relations of the curve, remain unaltered during such rolling.

If the flattened cylinder in any of its guises is cut between the first and second segments in the hairpin (segments 1 and 2 in Figure 7–21) and spread upon a plane, a jelly roll211 (Figure 7–21C) is produced. Shorten the hairpin by removing the two most peripheral segments (cuts Ú in Figure 7–21 to remove segments 1 and 8). A new flattened barrel is created (dotted lines in Figure 7–21B) with six staves that alternate in polarity. If this smaller flattened cylinder is cut between the first and last segments in the hairpin (segments 2 and 7 in Figure 7–21) and spread upon a plane, a Greek key211 (Figure 7–21D) is produced. Shorten the hairpin by removing the two most peripheral segments (cuts ‚ in Figure 7–21 to remove segments 2 and 7). A new flattened barrel is created with four staves that alternate in polarity. If this cylinder is cut between the first and last segments in the hairpin (segments 3 and 6 in Figure 7–21) and flattened upon a plane, an up-down-up-down211 pattern is produced.

The polar curve in the topological exercise can be substituted with a polypeptide either as a strand of β structure or in an α helix, and the staves of the flattened barrel will be either strands of β structure or α helices, respectively. If they are strands of β structure, they are gathered as systematically antiparallel (Figure 7–21B) pleated sheets (Figure 4–16C) into an antiparallel β barrel (Figures 7–12 and 7–13). The β jelly roll is represented by the cohesin domain (Figure 6–21)288 and domain 3 of the coat protein of tomato bushy stunt virus (Figure 7–19E). There are a large number of viral coat proteins each containing a domain of this class.399 There are elaborations on this topological arrangement; for example, each of the spermadhesins from seminal fluid is composed of a single domain that represents a class of domains in which two consecutive antiparallel β strands are added to a jelly roll (Figure 7–21C) at the amino-terminal end of the polypeptide (strands 0 and –1). These additional β strands are inserted into the barrel (Figure 7–21B) between staves 1 and 2.400,401

The β Greek key is represented by domain 2 of pyruvate kinase (Figure 7–19D). There are also elaborations on this theme. The members of the class of immunoglobulin modular domains (Figure 7–13 and Table 7–7) are β Greek keys (Figure 7–21D) in which an antiparallel β strand is added at the carboxy-terminal end of the polypeptide (strand 8), but unlike the strand 8 in a jelly roll, which would be located between strands 2 and 3 of the barrel if strand 1 were deleted (Figure 7–21B), the strand 8 in the immunoglobulin class is found between strands 2 and 7 of the barrel.90,402–405 The class of
[Figure 7–21, panels B–E; drawing not reproduced. Strand orders labeled in the panels: (C) jelly roll, 1 8 3 6 5 4 7 2; (D) Greek key, 2 3 6 5 4 7; (E) 6 5 4 3.]
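
The cutting and rolling operations described for Figure 7–21 can be followed as manipulations of a circular list. The sketch below is a minimal illustration in Python, not a reconstruction of the figure. It assumes that the eight staves occur around the flattened barrel in the order 1, 8, 3, 6, 5, 4, 7, 2, which is the strand order labeled on the jelly roll in Figure 7–21C, and it assumes that consecutive segments of the hairpin run in opposite directions, so that odd and even numbers stand for the two polarities. The function names roll, cut_and_spread, remove_staves, and polarities_alternate are hypothetical.

```python
from typing import List

# Assumed circular order of the eight staves around the flattened barrel
# (read from the strand labels of the jelly roll in Figure 7-21C).
BARREL_8: List[int] = [1, 8, 3, 6, 5, 4, 7, 2]


def roll(barrel: List[int], steps: int) -> List[int]:
    """Roll the barrel like a tread: rotate the circular order by `steps`."""
    k = steps % len(barrel)
    return barrel[k:] + barrel[:k]


def cut_and_spread(barrel: List[int], first: int, last: int) -> List[int]:
    """Cut between two adjacent staves and lay the barrel flat.

    The flat pattern begins with `first` and ends with `last`.
    """
    n = len(barrel)
    i = barrel.index(first)
    if barrel[(i - 1) % n] == last:   # `last` lies just before `first`
        return [barrel[(i + k) % n] for k in range(n)]
    if barrel[(i + 1) % n] == last:   # `last` lies just after `first`
        return [barrel[(i - k) % n] for k in range(n)]
    raise ValueError("the two staves are not adjacent in the barrel")


def remove_staves(barrel: List[int], *staves: int) -> List[int]:
    """Shorten the hairpin by deleting the named peripheral segments."""
    return [s for s in barrel if s not in staves]


def polarities_alternate(barrel: List[int]) -> bool:
    """True if every pair of neighboring staves has opposite parity
    (assumption: odd and even segments run in opposite directions)."""
    n = len(barrel)
    return all((barrel[i] + barrel[(i + 1) % n]) % 2 == 1 for i in range(n))


jelly_roll = cut_and_spread(BARREL_8, 1, 2)
six_staves = remove_staves(BARREL_8, 1, 8)
greek_key = cut_and_spread(six_staves, 2, 7)
four_staves = remove_staves(six_staves, 2, 7)
up_down = cut_and_spread(four_staves, 6, 3)

print(jelly_roll)  # [1, 8, 3, 6, 5, 4, 7, 2]
print(greek_key)   # [2, 3, 6, 5, 4, 7]
print(up_down)     # [6, 5, 4, 3]
```

Rolling the barrel before cutting does not change the flat pattern, because the connections between the segments are unaltered; under these assumptions only the position of the cut and the removal of the peripheral segments distinguish the three patterns.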
polarity opposite to that shown in Figure 7–21B. Because the barrel is generated by a helical conformation of the hairpin, there are two possible twists to the helix, the right-handed one shown in Figure 7–21A and the left-handed one. For example, the six-stranded β barrel forming the common fold in the family of domains containing growth hormones, interleukins, and granulocyte-colony-stimulating factor is a β Greek key in which the superhelix is left-handed.408 Because two polarities are possible and two twists are possible, there are four distinct geometries for the jelly roll, four for the Greek key, and two for the up-down-up-down conformation.

It is possible that the regular structures such as the β helix, the β propeller, the jelly roll, and the Greek key represent evolutionarily efficient topological solutions to the problem of folding a polypeptide. Within an architecture of domains or even among the members of the same common fold, a good deal of variation is observed; either individual elements of secondary structure in the common topological pattern do not superpose very well or extensive peripheral elements of secondary structure are found in one member of a class that are not found in others. If these variations represent the drift in the locations of the elements of secondary structure from those they had in the common ancestor or the insertion of elements following the divergence from a common ancestor, they reflect the degree to which these domains are evolutionarily related to each other, and the taxonomic system for domains parallels the taxonomic system for species. If many of the variations observed, however, state that the domains being compared differ so dramatically because they do not share a common ancestor, they reflect the fact that the structure is a particularly favorable solution to folding a polypeptide that has been exploited many times by convergent evolution and the taxonomic systems for domains overstate their phylogenetic information.

Suggested Reading

Richardson, J.S. (1981) Protein anatomy, Adv. Protein Chem. 34, 167–339.
Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., Swindells, M.B., & Thornton, J.M. (1997) CATH – A hierarchic classification of protein domain structures, Structure 5, 1093–1108.

Problem 7–11: Construct a phylogenetic tree from Figure 7–20.

References

1. Hardy, D.O., Bender, P.K., & Kretsinger, R.H. (1988) J. Mol. Biol. 199, 223–227.
2. Dayhoff, M.O. (1972) Atlas of Protein Sequence and Structure, Vol. 5, National Biomedical Research Foundation, Silver Spring, MD.
3. Dayhoff, M.O. (1978) Atlas of Protein Sequence and Structure, Vol. 5, Suppl. 3, National Biomedical Research Foundation, Silver Spring, MD.
4. Gray, J.V., Golinelli-Pimpaneau, B., & Knowles, J.R. (1990) Biochemistry 29, 376–383.
5. Doolittle, R.F. (1979) in The Proteins (Neurath, H., & Hill, R. L., Eds.) Vol. IV, pp 1–118, Academic Press, New York.
6. King, J.L., & Jukes, T.H. (1969) Science 164, 788–798.
7. Perutz, M.F., & Lehmann, H. (1968) Nature 219, 902–909.
8. Reidhaar-Olson, J.F., & Sauer, R.T. (1988) Science 241, 53–57.
9. Xu, Z., Bernlohr, D.A., & Banaszak, L.J. (1992) Biochemistry 31, 3484–3492.
10. Drinkwater, C.C., Evans, B.A., & Richards, R.I. (1988) J. Biol. Chem. 263, 8565–8568.
11. Pohjanjoki, P., Lahti, R., Goldman, A., & Cooperman, B.S. (1998) Biochemistry 37, 1754–1761.
12. Zhang, M., Van Etten, R.L., & Stauffacher, C.V. (1994) Biochemistry 33, 11097–11105.
13. Kanaya, S., Kohara, A., Miura, Y., Sekiguchi, A., Iwai, S., Inoue, H., Ohtsuka, E., & Ikehara, M. (1990) J. Biol. Chem. 265, 4615–4621.
14. Hege, T., & Baumann, U. (2002) J. Mol. Biol. 314, 181–186.
15. Lavrukhin, O.V., & Lloyd, R.S. (2000) Biochemistry 39, 15266–15271.
16. Taton, M., Husselstein, T., Benveniste, P., & Rahier, A. (2000) Biochemistry 39, 701–711.
17. Laskowski, M., Jr., Kato, I., Ardelt, W., Cook, J., Denton, A., Empie, M.W., Kohr, W.J., Park, S.J., Parks, K., Schatzley, B.L., et al. (1987) Biochemistry 26, 202–221.
18. Margoliash, E., & Schejter, A. (1966) Adv. Protein Chem. 21, 113–286.
19. Forry-Schaudies, S., Maihle, N.J., & Hughes, S.H. (1990) J. Mol. Biol. 211, 321–330.
20. Hendriks, W., Sanders, J., de Leij, L., Ramaekers, F., Bloemendal, H., & de Jong, W.W. (1988) Eur. J. Biochem. 174, 133–137.
21. Noguchi, T., Yamada, K., Inoue, H., Matsuda, T., & Tanaka, T. (1987) J. Biol. Chem. 262, 14366–14371.
22. Weiss, C., Zeng, Y., Huang, J., Sobocka, M.B., & Rushbrook, J.I. (2000) Biochemistry 39, 1807–1816.
23. Haase, G.H., Brune, M., Reinstein, J., Pai, E.F., Pingoud, A., & Wittinghofer, A. (1989) J. Mol. Biol. 207, 151–162.
24. MacKintosh, R.W., Haycox, G., Hardie, D.G., & Cohen, P.T. (1990) FEBS Lett. 276, 156–160.
25. Doolittle, R.F. (1981) Science 214, 149–159.
26. Needleman, S.B., & Wunsch, C.D. (1970) J. Mol. Biol. 48, 443–453.
27. Gibbs, A.J., & McIntyre, G.A. (1970) Eur. J. Biochem. 16, 1–11.
28. Vogt, G., Etzold, T., & Argos, P. (1995) J. Mol. Biol. 249, 816–831.
29. Feng, D.F., Johnson, M.S., & Doolittle, R.F. (1984) J. Mol. Evol. 21, 112–125.
30. McLachlan, A.D. (1972) J. Mol. Biol. 64, 417–437.
31. Staden, R. (1982) Nucleic Acids Res. 10, 2951–2961.
32. Brenner, S.E., Chothia, C., & Hubbard, T.J. (1998) Proc. Natl. Acad. Sci. U.S.A. 95, 6073–6078.
33. Jue, R.A., Woodbury, N.W., & Doolittle, R.F. (1980) J. Mol. Evol. 15, 129–148.
34. Johnson, M.S., & Doolittle, R.F. (1986) J. Mol. Evol. 23, 267–278.
400 Evolution
35. Park, J., Karplus, K., Barrett, C., Hughey, R., Haussler, 64. Monteiro, M.J., & Cleveland, D.W. (1988) J. Mol. Biol.
D., Hubbard, T., & Chothia, C. (1998) J. Mol. Biol. 284, 199, 439–446.
1201–1210. 65. Meyer, U., Benghezal, M., Imhof, I., & Conzelmann, A.
36. Pearson, W.R. (1998) J. Mol. Biol. 276, 71–84. (2000) Biochemistry 39, 3461–3471.
37. Altschul, S.F., & Gish, W. (1996) Methods Enzymol. 266, 66. Crawford, D.L., Constantino, H.R., & Powers, D.A.
460–480. (1989) Mol. Biol. Evol. 6, 369–383.
38. Thayer, M.M., Flaherty, K.M., & McKay, D.B. (1991) J. 67. Li, S.S., Fitch, W.M., Pan, Y.C., & Sharief, F.S. (1983) J.
Biol. Chem. 266, 2864–2871. Biol. Chem. 258, 7029–7032.
39. Hyde, S.C., Emsley, P., Hartshorn, M.J., Mimmack, M.M., 68. Xu, X., & Doolittle, R.F. (1990) Proc. Natl. Acad. Sci.
Gileadi, U., Pearce, S.R., Gallagher, M.P., Gill, D.R., U.S.A. 87, 2097–2101.
Hubbard, R.E., & Higgins, C.F. (1990) Nature 346, 362–365. 69. Wu, G., Fiser, A., ter Kuile, B., Sali, A., & Muller, M.
40. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., & (1999) Proc. Natl. Acad. Sci. U.S.A. 96, 6285–6290.
Lipman, D.J. (1990) J. Mol. Biol. 215, 403–410. 70. Weber, K., Plessmann, U., & Ulrich, W. (1989) EMBO J.
41. Pearson, W.R. (1990) Methods Enzymol. 183, 63–98. 8, 3221–3227.
42. Smith, T.F., & Waterman, M.S. (1981) J. Mol. Biol. 147, 71. Zimniak, L., Dittrich, P., Gogarten, J.P., Kibak, H., &
195–197. Taiz, L. (1988) J. Biol. Chem. 263, 9102–9112.
43. Philpott, C.C., Klausner, R.D., & Rouault, T.A. (1994) 72. Griffin, T.A., Lau, K.S., & Chuang, D.T. (1988) J. Biol.
Proc. Natl. Acad. Sci. U.S.A. 91, 7321–7325. Chem. 263, 14008–14014.
44. Read, J., Pearce, J., Li, X., Muirhead, H., Chirgwin, J., & 73. Tomkinson, B., & Jonsson, A.K. (1991) Biochemistry 30,
Davies, C. (2001) J. Mol. Biol. 309, 447–463. 168–174.
45. Takahashi, S., Kuzuyama, T., Watanabe, H., & Seto, H. 74. Chang, Y.Y., Wang, A.Y., & Cronan, J.E., Jr. (1993) J. Biol.
(1998) Proc. Natl. Acad. Sci. U.S.A. 95, 9879–9884. Chem. 268, 3911–3919.
46. Ziegler, G.A., & Schulz, G.E. (2000) Biochemistry 39, 75. Vlahos, C.J., & Dekker, E.E. (1988) J. Biol. Chem. 263,
10986–10995. 11683–11691.
47. Klenk, H.P., Clayton, R.A., Tomb, J.F., White, O., Nelson, 76. Banfield, M.J., & Brady, R.L. (2000) J. Mol. Biol. 297,
K.E., Ketchum, K.A., Dodson, R.J., Gwinn, M., Hickey, E.K., 1159–1170.
Peterson, J.D., Richardson, D.L., Kerlavage, A.R., Graham, 77. Haller, T., Buckel, T., Retey, J., & Gerlt, J.A. (2000)
D.E., Kyrpides, N.C., Fleischmann, R.D., Quackenbush, Biochemistry 39, 4622–4629.
J., Lee, N.H., Sutton, G.G., Gill, S., Kirkness, E.F., 78. Bond, M.D., & Strydom, D.J. (1989) Biochemistry 28,
Dougherty, B.A., McKenney, K., Adams, M.D., Loftus, B., 6110–6113.
Venter, J.C., et al. (1997) Nature 390, 364–370. 79. Wistow, G., & Piatigorsky, J. (1987) Science 236,
48. Moore, G.W., Goodman, M., Callahan, C., Holmquist, 1554–1556.
R., & Moise, H. (1976) J. Mol. Biol. 105, 15–37. 80. Gulick, A.M., Palmer, D.R., Babbitt, P.C., Gerlt, J.A., &
49. Feng, D.F., Cho, G., & Doolittle, R.F. (1997) Proc. Natl. Rayment, I. (1998) Biochemistry 37, 14358–14368.
Acad. Sci. U.S.A. 94, 13028–13033. 81. Mross, G.A., Doolittle, R.F., & Roberts, B.F. (1970)
50. Lenstra, J.A., & Beintema, J.J. (1979) Eur. J. Biochem. 98, Science 170, 468–470.
399–408. 82. Rossmann, M.G., Moras, D., & Olsen, K.W. (1974)
51. Bishop, J.G., Dean, A.M., & Mitchell-Olds, T. (2000) Nature 250, 194–199.
Proc. Natl. Acad. Sci. U.S.A. 97, 5322–5327. 83. Rossmann, M.G., & Argos, P. (1976) J. Mol. Biol. 105,
52. Ambler, R.P., & Wynn, M. (1973) Biochem. J. 131, 75–95.
485–498. 84. Remington, S.J., Woodbury, R.G., Reynolds, R.A.,
53. Grishin, N.V. (1995) J. Mol. Evol. 41, 675–679. Matthews, B.W., & Neurath, H. (1988) Biochemistry 27,
54. Feng, D.F., & Doolittle, R.F. (1997) J. Mol. Evol. 44, 8097–8105.
361–370. 85. Bazan, J.F., Weaver, L.H., Roderick, S.L., Huber, R., &
55. Fitch, W.M., & Margoliash, E. (1967) Science 155, Matthews, B.W. (1994) Proc. Natl. Acad. Sci. U.S.A. 91,
279–284. 2473–2477.
56. Ayala, F.J., Rzhetsky, A., & Ayala, F.J. (1998) Proc. Natl. 86. Lah, M.S., Dixon, M.M., Pattridge, K.A., Stallings, W.C.,
Acad. Sci. U.S.A. 95, 606–611. Fee, J.A., & Ludwig, M.L. (1995) Biochemistry 34,
57. Feng, D.F., & Doolittle, R.F. (1987) J. Mol. Evol. 25, 1646–1660.
351–360. 87. Aleshin, A.E., Zeng, C., Bourenkov, G.P., Bartunik, H.D.,
58. Venkatesh, B., Erdmann, M.V., & Brenner, S. (2001) Fromm, H.J., & Honzatko, R.B. (1998) Structure 6,
Proc. Natl. Acad. Sci. U.S.A. 98, 11382–11387. 39–50.
59. Baldauf, S.L., Roger, A.J., Wenk-Siefert, I., & Doolittle, 88. Evrard, C., Fastrez, J., & Declercq, J.P. (1998) J. Mol.
W.F. (2000) Science 290, 972–977. Biol. 276, 151–164.
60. Loytynoja, A., & Milinkovitch, M.C. (2001) Proc. Natl. 89. Mathews, I.I., Erion, M.D., & Ealick, S.E. (1998)
Acad. Sci. U.S.A. 98, 10202–10207. Biochemistry 37, 15607–15620.
61. Mross, G.A., & Doolittle, R.F. (1967) Arch. Biochem. 90. Holmgren, A., & Branden, C.I. (1989) Nature 342,
Biophys. 122, 674–684. 248–251.
62. Jermann, T.M., Opitz, J.G., Stackhouse, J., & Benner, 91. Campobasso, N., Costello, C.A., Kinsland, C., Begley,
S.A. (1995) Nature 374, 57–59. T.P., & Ealick, S.E. (1998) Biochemistry 37, 15981–15989.
63. Bradac, J.A., Gruber, C.E., Forry-Schaudies, S., & 92. Wang, W., Kappock, T.J., Stubbe, J., & Ealick, S.E. (1998)
Hughes, S.H. (1989) Mol. Cell Biol. 9, 185–192. Biochemistry 37, 15647–15662.
References 401
93. Przylas, I., Tomoo, K., Terada, Y., Takaha, T., Fujii, K., 121. Krishnaswamy, S., & Rossmann, M.G. (1990) J. Mol.
Saenger, W., & Strater, N. (2000) J. Mol. Biol. 296, Biol. 211, 803–844.
873–886. 122. Al-Lazikani, B., Sheinerman, F.B., & Honig, B. (2001)
94. Takano, T. (1977) J. Mol. Biol. 110, 537–568. Proc. Natl. Acad. Sci. U.S.A. 98, 14796–14801.
95. Poljak, R.J. (1978) CRC Crit. Rev. Biochem. 5, 45–84. 123. Risler, J.L., Delorme, M.O., Delacroix, H., & Henaut, A.
96. Biesecker, G., Harris, J.I., Thierry, J.C., Walker, J.E., & (1988) J. Mol. Biol. 204, 1019–1029.
Wonacott, A.J. (1977) Nature 266, 328–333. 124. Johnson, M.S., & Overington, J.P. (1993) J. Mol. Biol.
97. Kossiakoff, A.A., Chambers, J.L., Kay, L.M., & Stroud, 233, 716–738.
R.M. (1977) Biochemistry 16, 654–664. 125. Sielecki, A.R., Fedorov, A.A., Boodhoo, A., Andreeva,
98. Tang, J., James, M.N., Hsu, I.N., Jenkins, J.A., & N.S., & James, M.N. (1990) J. Mol. Biol. 214, 143–170.
Blundell, T.L. (1978) Nature 271, 618–621. 126. Louie, G.V., & Brayer, G.D. (1990) J. Mol. Biol. 214,
99. Bolin, J.T., Filman, D.J., Matthews, D.A., Hamlin, R.C., 527–555.
& Kraut, J. (1982) J. Biol. Chem. 257, 13650–13662. 127. Bilwes, A., Rees, B., Moras, D., Menez, R., & Menez, A.
100. Hynes, T.R., Randal, M., Kennedy, L.A., Eigenbrot, C., (1994) J. Mol. Biol. 239, 122–136.
& Kossiakoff, A.A. (1990) Biochemistry 29, 10018–10022. 128. Louie, G.V., Hutcheon, W.L., & Brayer, G.D. (1988) J.
101. Chothia, C., & Lesk, A.M. (1986) EMBO J. 5, 823– Mol. Biol. 199, 295–314.
826. 129. Phillips, S.E. (1980) J. Mol. Biol. 142, 531–554.
102. Sielecki, A.R., Hayakawa, K., Fujinaga, M., Murphy, 130. Chothia, C., & Lesk, A.M. (1987) Cold Spring Harbor
M.E., Fraser, M., Muir, A.K., Carilli, C.T., Lewicki, J.A., Symp. Quant. Biol. 52, 399–405.
Baxter, J.D., & James, M.N. (1989) Science 243, 131. Scott, D.L., White, S.P., Otwinowski, Z., Yuan, W., Gelb,
1346–1351. M.H., & Sigler, P.B. (1990) Science 250, 1541–1546.
103. MacRae, I.J., Segel, I.H., & Fisher, A.J. (2000) 132. Xue, Y., Lipscomb, W.N., Graf, R., Schnappauf, G., &
Biochemistry 39, 1613–1621. Braus, G. (1994) Proc. Natl. Acad. Sci. U.S.A. 91,
104. Serrano, R., Kielland-Brandt, M.C., & Fink, G.R. (1986) 10814–10818.
Nature 319, 689–693. 133. Gourley, D.G., Shrive, A.K., Polikarpov, I., Krell, T.,
105. Kyte, J. (1981) Nature 292, 201–204. Coggins, J.R., Hawkins, A.R., Isaacs, N.W., & Sawyer, L.
106. Ochi, H., Hata, Y., Tanaka, N., Kakudo, M., Sakurai, T., (1999) Nat. Struct. Biol. 6, 521–525.
Aihara, S., & Morita, Y. (1983) J. Mol. Biol. 166, 407–418. 134. Tainer, J.A., Getzoff, E.D., Beem, K.M., Richardson, J.S.,
107. Meyer, T.E., & Kamen, M.D. (1982) Adv. Protein Chem. & Richardson, D.C. (1982) J. Mol. Biol. 160, 181–
35, 105–212. 217.
108. Almassy, R.J., & Dickerson, R.E. (1978) Proc. Natl. Acad. 135. Cooper, J.B., McIntyre, K., Badasso, M.O., Wood, S.P.,
Sci. U.S.A. 75, 2674–2678. Zhang, Y., Garbe, T.R., & Young, D. (1995) J. Mol. Biol.
109. Benini, S., Gonzalez, A., Rypniewski, W.R., Wilson, K.S., 246, 531–544.
Van Beeumen, J.J., & Ciurli, S. (2000) Biochemistry 39, 136. Yeh, A.P., Chatelet, C., Soltis, S.M., Kuhn, P., Meyer, J.,
13115–13126. & Rees, D.C. (2000) J. Mol. Biol. 300, 587–595.
110. Goddette, D.W., Paech, C., Yang, S.S., Mielenz, J.R., 137. Crane, B.R., Arvai, A.S., Gachhui, R., Wu, C., Ghosh,
Bystroff, C., Wilke, M.E., & Fletterick, R.J. (1992) J. Mol. D.K., Getzoff, E.D., Stuehr, D.J., & Tainer, J.A. (1997)
Biol. 228, 580–595. Science 278, 425–431.
111. Tahirov, T.H., Oki, H., Tsukihara, T., Ogasahara, K., 138. McPhalen, C.A., & James, M.N. (1988) Biochemistry 27,
Yutani, K., Ogata, K., Izu, Y., Tsunasawa, S., & Kato, I. 6582–6598.
(1998) J. Mol. Biol. 284, 101–124. 139. Baker, P.J., Sawa, Y., Shibata, H., Sedelnikova, S.E., &
112. Endrizzi, J.A., Breddam, K., & Remington, S.J. (1994) Rice, D.W. (1998) Nat. Struct. Biol. 5, 561–567.
Biochemistry 33, 11106–11120. 140. Mattevi, A., Vanoni, M.A., Todone, F., Rizzi, M.,
113. Sagermann, M., Baase, W.A., & Matthews, B.W. (1999) Teplyakov, A., Coda, A., Bolognesi, M., & Curti, B.
Proc. Natl. Acad. Sci. U.S.A. 96, 6078–6083. (1996) Proc. Natl. Acad. Sci. U.S A. 93, 7496–7501.
114. Timkovich, R., & Dickerson, R.E. (1976) J. Biol. Chem. 141. Ohlsson, I., Nordstreom, B., & Breandaen, C.I. (1974) J.
251, 4033–4046. Mol. Biol. 89, 339–354.
115. Chothia, C., & Lesk, A.M. (1985) J. Mol. Biol. 182, 142. Banks, R.D., Blake, C.C., Evans, P.R., Haser, R., Rice,
151–158. D.W., Hardy, G.W., Merrett, M., & Phillips, A.W. (1979)
116. Eriksson, A.E., Cousens, L.S., Weaver, L.H., & Nature 279, 773–777.
Matthews, B.W. (1991) Proc. Natl. Acad. Sci. U.S.A. 88, 143. Fletterick, R.J., Sygusch, J., Semple, H., & Madsen, N.B.
3441–3445. (1976) J. Biol. Chem. 251, 6142–6146.
117. Lesk, A.M., & Chothia, C. (1980) J. Mol. Biol. 136, 144. Kolker, E., & Trifonov, E.N. (1995) Proc. Natl. Acad. Sci.
225–270. U.S.A. 92, 557–560.
118. Babbitt, P.C., Hasson, M.S., Wedekind, J.E., Palmer, 145. Tong, L., Wengler, G., & Rossmann, M.G. (1993) J. Mol.
D.R., Barrett, W.C., Reed, G.H., Rayment, I., Ringe, D., Biol. 230, 228–247.
Kenyon, G.L., & Gerlt, J.A. (1996) Biochemistry 35, 146. Larsen, T.M., Laughlin, L.T., Holden, H.M., Rayment, I.,
16489–16501. & Reed, G.H. (1994) Biochemistry 33, 6301–6309.
119. Volbeda, A., Lahm, A., Sakiyama, F., & Suck, D. (1991) 147. Rossmann, M.G., & Argos, P. (1975) J. Biol. Chem. 250,
EMBO J. 10, 1607–1618. 7525–7532.
120. Hadfield, A., Kryger, G., Ouyang, J., Petsko, G.A., Ringe, 148. Karplus, P.A., Daniels, M.J., & Herriott, J.R. (1991)
D., & Viola, R. (1999) J. Mol. Biol. 289, 991–1002. Science 251, 60–66.
402 Evolution
149. Bruns, C.M., & Karplus, P.A. (1995) J. Mol. Biol. 247, 176. Simmer, J.P., Kelly, R.E., Rinker, A.G., Jr.,
125–145. Zimmermann, B.H., Scully, J.L., Kim, H., & Evans, D.R.
150. Porter, R.R. (1959) Biochem. J. 73, 119–126. (1990) Proc. Natl. Acad. Sci. U.S.A. 87, 174–178.
151. Harris, L.J., Larson, S.B., Hasel, K.W., & McPherson, A. 177. Simmer, J.P., Kelly, R.E., Scully, J.L., Grayson, D.R.,
(1997) Biochemistry 36, 1581–1597. Rinker, A.G., Jr., Bergh, S.T., & Evans, D.R. (1989) Proc.
152. Saphire, E.O., Parren, P.W., Pantophlet, R., Zwick, M.B., Natl. Acad. Sci. U.S.A. 86, 4382–4386.
Morris, G.M., Rudd, P.M., Dwek, R.A., Stanfield, R.L., 178. Kretschmer, M., Langer, C., & Prinz, W. (1993)
Burton, D.R., & Wilson, I.A. (2001) Science 293, Biochemistry 32, 11143–11148.
1155–1159. 179. Mally, M.I., Grayson, D.R., & Evans, D.R. (1981) Proc.
153. Silverton, E.W., Navia, M.A., & Davies, D.R. (1977) Proc. Natl. Acad. Sci. U.S.A. 78, 6647–6651.
Natl. Acad. Sci. U.S.A. 74, 5140–5144. 180. Taniuchi, H., & Anfinsen, C.B. (1971) J. Biol. Chem. 246,
154. Betton, J.M., Desmadril, M., & Yon, J.M. (1989) 2291–2301.
Biochemistry 28, 5421–5428. 181. Guillou, F., Rubino, S.D., Markovitz, R.S., Kinney, D.M.,
155. Yamaguchi, H., Kato, H., Hata, Y., Nishioka, T., Kimura, & Lusty, C.J. (1989) Proc. Natl. Acad. Sci. U.S.A. 86,
A., Oda, J., & Katsube, Y. (1993) J. Mol. Biol. 229, 8304–8308.
1083–1100. 182. Teplyakov, A., Obmolova, G., Badet, B., & Badet-
156. Tomaszek, T.A., Jr., Moore, M.L., Strickler, J.E., Denisot, M.A. (2001) J. Mol. Biol. 313, 1093–1102.
Sanchez, R.L., Dixon, J.S., Metcalf, B.W., Hassell, A., 183. Knighton, D.R., Kan, C.C., Howland, E., Janson, C.A.,
Dreyer, G.B., Brooks, I., Debouck, C., Meek, T.D., & Hostomska, Z., Welsh, K.M., & Matthews, D.A. (1994)
Lewis, M. (1992) Biochemistry 31, 10153–10168. Nat. Struct. Biol. 1, 186–194.
157. Baldwin, E.T., Bhat, T.N., Gulnik, S., Hosur, M.V., 184. Hawkins, A.R., & Smith, M. (1991) Eur. J. Biochem. 196,
Sowder, R.C.N., Cachau, R.E., Collins, J., Silva, A.M., & 717–724.
Erickson, J.W. (1993) Proc. Natl. Acad. Sci. U.S.A. 90, 185. Keesey, J.K., Jr., Bigelis, R., & Fink, G.R. (1979) J. Biol.
6796–6800. Chem. 254, 7427–7433.
158. Rubenstein, D.S., Enghild, J.J., & Pizzo, S.V. (1991) J. 186. Barford, D., Flint, A.J., & Tonks, N.K. (1994) Science 263,
Biol. Chem. 266, 11252–11261. 1397–1404.
159. Husten, E.J., & Eipper, B.A. (1991) J. Biol. Chem. 266, 187. Caughey, I., & Kekwick, R.G. (1982) Eur. J. Biochem.
17004–17010. 123, 553–561.
160. Southerland, W.M., Winge, D.R., & Rajagopalan, K.V. 188. Shimakata, T., & Stumpf, P.K. (1982) Arch. Biochem.
(1978) J. Biol. Chem. 253, 8747–8752. Biophys. 218, 77–91.
161. Appell, K.C., & Low, P.S. (1981) J. Biol. Chem. 256, 189. Mohamed, A.H., Chirala, S.S., Mody, N.H., Huang,
11104–11111. W.Y., & Wakil, S.J. (1988) J. Biol. Chem. 263, 12315–
162. Eberhard, M., Tsai-Pflugfelder, M., Bolewska, K., 12325.
Hommel, U., & Kirschner, K. (1995) Biochemistry 34, 190. Chirala, S.S., Kuziora, M.A., Spector, D.M., & Wakil, S.J.
5419–5428. (1987) J. Biol. Chem. 262, 4231–4240.
163. Wenk, M., Baumgartner, R., Holak, T.A., Huber, R., 191. Holzer, K.P., Liu, W., & Hammes, G.G. (1989) Proc. Natl.
Jaenicke, R., & Mayr, E.M. (1999) J. Mol. Biol. 286, Acad. Sci. U.S.A. 86, 4387–4391.
1533–1545. 192. Joshi, A.K., & Smith, S. (1993) J. Biol. Chem. 268,
164. Bustos, S.A., & Schleif, R.F. (1993) Proc. Natl. Acad. Sci. 22508–22513.
U.S.A. 90, 5638–5642. 193. Chirala, S.S., Huang, W.Y., Jayakumar, A., Sakai, K., &
165. Jhee, K.H., McPhie, P., & Miles, E.W. (2000) Wakil, S.J. (1997) Proc. Natl. Acad. Sci. U.S.A. 94,
Biochemistry 39, 10548–10556. 5588–5593.
166. Leistler, B., & Perham, R.N. (1994) Biochemistry 33, 194. Chang, S.I., & Hammes, G.G. (1989) Proc. Natl. Acad.
2773–2781. Sci. U.S.A. 86, 8373–8376.
167. Sibilli, L., Le Bras, G., Le Bras, G., & Cohen, G.N. (1981) 195. Haese, A., Schubert, M., Herrmann, M., & Zocher, R.
J. Biol. Chem. 256, 10228–10230. (1993) Mol. Microbiol. 7, 905–914.
168. Vaeron, M., Falcoz-Kelly, F., & Cohen, G.N. (1972) Eur. 196. Billich, A., & Zocher, R. (1987) Biochemistry 26, 8417–
J. Biochem. 28, 520–527. 8423.
169. Parsot, C., & Cohen, G.N. (1988) J. Biol. Chem. 263, 197. Weinreb, P.H., Quadri, L.E., Walsh, C.T., & Zuber, P.
14654–14660. (1998) Biochemistry 37, 1575–1584.
170. Rechler, M.M., & Bruni, C.B. (1971) J. Biol. Chem. 246, 198. Pieper, R., Ebert-Khosla, S., Cane, D., & Khosla, C.
1806–1813. (1996) Biochemistry 35, 2054–2060.
171. Chen, H.P., & Marsh, E.N. (1997) Biochemistry 36, 199. Wu, N., Kudo, F., Cane, D.E., & Khosla, C. (2000) J. Am.
14939–14945. Chem. Soc. 122, 4847–4852.
172. Bein, K., Simmer, J.P., & Evans, D.R. (1991) J. Biol. 200. Xue, Y., & Sherman, D.H. (2000) Nature 403, 571–
Chem. 266, 3791–3799. 575.
173. Coleman, P.F., Suttle, D.P., & Stark, G.R. (1977) J. Biol. 201. Berg, A., Vervoort, J., & de Kok, A. (1996) J. Mol. Biol.
Chem. 252, 6379–6385. 261, 432–442.
174. Simmer, J.P., Kelly, R.E., Rinker, A.G., Jr., Scully, J.L., & 202. Lim, F., Morris, C.P., Occhiodoro, F., & Wallace, J.C.
Evans, D.R. (1990) J. Biol. Chem. 265, 10395–10402. (1988) J. Biol. Chem. 263, 11493–11497.
175. Zimmermann, B.H., & Evans, D.R. (1993) Biochemistry 203. Chang, C., Kokontis, J., & Liao, S. (1988) Proc. Natl.
32, 1519–1527. Acad. Sci. U.S.A. 85, 7211–7215.
References 403
204. Ringheim, G.E., & Taylor, S.S. (1990) J. Biol. Chem. 265, 231. Lin, X.L., Lin, Y.Z., Koelsch, G., Gustchina, A.,
4800–4808. Wlodawer, A., & Tang, J. (1992) J. Biol. Chem. 267,
205. Coves, J., Zeghouf, M., Macherel, D., Guigliarelli, B., 17257–17263.
Asso, M., & Fontecave, M. (1997) Biochemistry 36, 232. McLachlan, A.D., & Walker, J.E. (1977) J. Mol. Biol. 112,
5921–5928. 543–558.
206. Stallings, W.C., Abdel-Meguid, S.S., Lim, L.W., Shieh, 233. He, X.M., & Carter, D.C. (1992) Nature 358, 209–215.
H.S., Dayringer, H.E., Leimgruber, N.K., Stegeman, 234. Fong, S.L., & Bridges, C.D. (1988) J. Biol. Chem. 263,
R.A., Anderson, K.S., Sikorski, J.A., Padgette, S.R., & 15330–15334.
Kishore, G.M. (1991) Proc. Natl. Acad. Sci. U.S.A. 88, 235. Pope, B., Maciver, S., & Weeds, A. (1995) Biochemistry
5046–5050. 34, 1583–1588.
207. Stauffer, M.E., Young, J.K., & Evans, J.N. (2001) 236. Lee, F.S., Fox, E.A., Zhou, H.M., Strydom, D.J., & Vallee,
Biochemistry 40, 3951–3957. B.L. (1988) Biochemistry 27, 8545–8553.
208. Yee, V.C., Pedersen, L.C., Le Trong, I., Bishop, P.D., 237. Miller, K.I., Cuff, M.E., Lang, W.F., Varga-Weisz, P.,
Stenkamp, R.E., & Teller, D.C. (1994) Proc. Natl. Acad. Field, K.G., & van Holde, K.E. (1998) J. Mol. Biol. 278,
Sci. U.S.A. 91, 7296–7300. 827–842.
209. Waldrop, G.L., Rayment, I., & Holden, H.M. (1994) 238. Bhandari, V., Palfree, R.G., & Bateman, A. (1992) Proc.
Biochemistry 33, 10249–10256. Natl. Acad. Sci. U.S.A. 89, 1715–1719.
210. Levine, M., Muirhead, H., Stammers, D.K., & Stuart, 239. Manning, A.M., Trotman, C.N., & Tate, W.P. (1990)
D.I. (1978) Nature 271, 626–630. Nature 348, 653–656.
211. Richardson, J.S. (1981) Adv. Protein Chem. 34, 167– 240. Niu, X.D., Browning, K.S., Behal, R.H., & Reed, L.J.
339. (1988) Proc. Natl. Acad. Sci. U.S.A. 85, 7546–7550.
212. Ito, N., Phillips, S.E.V., Yadav, K.D.S., & Knowles, P.F. 241. Terry, A.S., Poulter, L., Williams, D.H., Nutkins, J.C.,
(1994) J. Mol. Biol. 238, 794–814. Giovannini, M.G., Moore, C.H., & Gibson, B.W. (1988)
213. Wierenga, R.K., Drenth, J., & Schulz, G.E. (1983) J. Mol. J. Biol. Chem. 263, 5745–5751.
Biol. 167, 725–739. 242. Pepinsky, R.B., Tizard, R., Mattaliano, R.J., Sinclair,
214. Hecht, H.J., Kalisz, H.M., Hendle, J., Schmid, R.D., & L.K., Miller, G.T., Browning, J.L., Chow, E.P., Burne, C.,
Schomburg, D. (1993) J. Mol. Biol. 229, 153–172. Huang, K.S., Pratt, D., et al. (1988) J. Biol. Chem. 263,
215. Olsen, L.R., & Roderick, S.L. (2001) Biochemistry 40, 10799–10811.
1913–1921. 243. Han, S., Eltis, L.D., Timmis, K.N., Muchmore, S.W., &
216. Marcotte, E.M., Pellegrini, M., Yeates, T.O., & Bolin, J.T. (1995) Science 270, 976–980.
Eisenberg, D. (1999) J. Mol. Biol. 293, 151–160. 244. Labeit, S., & Kolmerer, B. (1995) J. Mol. Biol. 248,
217. Lang, D., Thoma, R., Henn-Sax, M., Sterner, R., & 308–315.
Wilmanns, M. (2000) Science 289, 1546–1550. 245. Speicher, D.W., & Marchesi, V.T. (1984) Nature 311,
218. Hocker, B., Beismann-Driemeyer, S., Hettwer, S., 177–180.
Lustig, A., & Sterner, R. (2001) Nat. Struct. Biol. 8, 32–36. 246. Dubreuil, R.R., Byers, T.J., Sillman, A.L., Bar-Zvi, D.,
219. McLachlan, A.D. (1979) J. Mol. Biol. 128, 49–79. Goldstein, L.S., & Branton, D. (1989) J. Cell Biol. 109,
220. Ploegman, J.H., Drent, G., Kalk, K.H., & Hol, W.G. 2197–2205.
(1978) J. Mol. Biol. 123, 557–594. 247. Winograd, E., Hume, D., & Branton, D. (1991) Proc.
221. Nyunoya, H., & Lusty, C.J. (1983) Proc. Natl. Acad. Sci. Natl. Acad. Sci. U.S.A. 88, 10788–10791.
U.S.A. 80, 4629–4633. 248. Sahr, K.E., Laurila, P., Kotula, L., Scarpa, A.L., Coupal,
222. Roderick, S.L., & Matthews, B.W. (1993) Biochemistry E., Leto, T.L., Linnenbach, A.J., Winkelmann, J.C.,
32, 3907–3912. Speicher, D.W., Marchesi, V.T., Curtis, P.J., & Forget,
223. Cirilli, M., Zheng, R., Scapin, G., & Blanchard, J.S. (1998) B.G. (1990) J. Biol. Chem. 265, 4434–4443.
Biochemistry 37, 16452–16458. 249. Grum, V.L., Li, D., MacDonald, R.I., & Mondragon, A.
224. Campbell, R.E., Mosimann, S.C., van De Rijn, I., (1999) Cell 98, 523–535.
Tanner, M.E., & Strynadka, N.C. (2000) Biochemistry 250. Pascual, J., Pfuhl, M., Walther, D., Saraste, M., & Nilges,
39, 7012–7023. M. (1997) J. Mol. Biol. 273, 740–751.
225. Wilmanns, M., Priestle, J.P., Niermann, T., & Jansonius, 251. Yan, Y., Winograd, E., Viel, A., Cronin, T., Harrison, S.C.,
J.N. (1992) J. Mol. Biol. 223, 477–507. & Branton, D. (1993) Science 262, 2027–2030.
226. Schwab, D.A., & Wilson, J.E. (1989) Proc. Natl. Acad. Sci. 252. Speicher, D.W., Morrow, J.S., Knowles, W.J., &
U.S.A. 86, 2563–2567. Marchesi, V.T. (1982) J. Biol. Chem. 257, 9093–9101.
227. Kurokawa, H., Mikami, B., & Hirose, M. (1995) J. Mol. 253. Renault, L., Nassar, N., Vetter, I., Becker, J., Klebe, C.,
Biol. 254, 196–207. Roth, M., & Wittinghofer, A. (1998) Nature 392, 97–101.
228. Lindley, P.F., Bajaj, M., Evans, R.W., Garratt, R.C., 254. Oubrie, A., Rozeboom, H.J., Kalk, K.H., Duine, J.A., &
Hasnain, S.S., Jhoti, H., Kuser, P., Neu, M., Patel, K., et Dijkstra, B.W. (1999) J. Mol. Biol. 289, 319–333.
al. (1993) Acta Crystallogr., Sect. D: Biol. Crystallogr. 255. Sondek, J., Bohm, A., Lambright, D.G., Hamm, H.E., &
D49, 292–304. Sigler, P.B. (1996) Nature 379, 369–374.
229. Jaskolski, M., Miller, M., Rao, J.K., Leis, J., & Wlodawer, 256. Habazettl, J., Gondol, D., Wiltscheck, R., Otlewski, J.,
A. (1990) Biochemistry 29, 5889–5898. Schleicher, M., & Holak, T.A. (1992) Nature 359,
230. Newman, M., Watson, F., Roychowdhury, P., Jones, H., 855–858.
Badasso, M., Cleasby, A., Wood, S.P., Tickle, I.J., & 257. Onesti, S., Brick, P., & Blow, D.M. (1991) J. Mol. Biol.
Blundell, T.L. (1993) J. Mol. Biol. 230, 260–283. 217, 153–176.
404 Evolution
258. Saper, M.A., Bjorkman, P.J., & Wiley, D.C. (1991) J. Mol. 286. Stetefeld, J., Mayer, U., Timpl, R., & Huber, R. (1996) J.
Biol. 219, 277–319. Mol. Biol. 257, 644–657.
259. Wright, C.S. (1992) J. Biol. Chem. 267, 14345–14352. 287. Tavares, G.A., Beguin, P., & Alzari, P.M. (1997) J. Mol.
260. Prince, J.T., McGrath, K.P., DiGirolamo, C.M., & Biol. 273, 701–713.
Kaplan, D.L. (1995) Biochemistry 34, 10879–10885. 288. Spinelli, S., Fierobe, H.P., Belaich, A., Belaich, J.P.,
261. Liou, Y.C., Thibault, P., Walker, V.K., Davies, P.L., & Henrissat, B., & Cambillau, C. (2000) J. Mol. Biol. 304,
Graham, L.A. (1999) Biochemistry 38, 11415–11424. 189–200.
262. Hasson, M.S., Muscate, A., McLeish, M.J., Polovnikova, 289. Foord, R., Taylor, I.A., Sedgwick, S.G., & Smerdon, S.J.
L.S., Gerlt, J.A., Kenyon, G.L., Petsko, G.A., & Ringe, D. (1999) Nat. Struct. Biol. 6, 157–165.
(1998) Biochemistry 37, 9918–9930. 290. Michaely, P., & Bennett, V. (1993) J. Biol. Chem. 268,
263. Arjunan, P., Umland, T., Dyda, F., Swaminathan, S., 22703–22709.
Furey, W., Sax, M., Farrenkopf, B., Gao, Y., Zhang, D., & 291. Sutton, R.B., Davletov, B.A., Berghuis, A.M., Sudhof,
Jordan, F. (1996) J. Mol. Biol. 256, 590–600. T.C., & Sprang, S.R. (1995) Cell 80, 929–938.
264. Muller, Y.A., Schumacher, G., Rudolph, R., & Schulz, 292. Mikol, V., Baumann, G., Keller, T.H., Manning, U., &
G.E. (1994) J. Mol. Biol. 237, 315–335. Zurini, M.G. (1995) J. Mol. Biol. 246, 344–355.
265. Lobel, P., Dahms, N.M., & Kornfeld, S. (1988) J. Biol. 293. Tong, L., Warren, T.C., King, J., Betageri, R., Rose, J., &
Chem. 263, 2563–2570. Jakes, S. (1996) J. Mol. Biol. 256, 601–610.
266. Wang, K., McClure, J., & Tu, A. (1979) Proc. Natl. Acad. 294. Waksman, G., Kominos, D., Robertson, S.C., Pant, N.,
Sci. U.S.A. 76, 3698–3702. Baltimore, D., Birge, R.B., Cowburn, D., Hanafusa, H.,
267. Pan, K.-M., Damodaran, S., & Greaser, M.L. (1994) Mayer, B.J., Overduin, M., et al. (1992) Nature 358,
Biochemistry 33, 8255–8261. 646–653.
268. Labeit, S., & Kolmerer, B. (1995) Science 270, 293–296. 295. Hatada, M.H., Lu, X., Laird, E.R., Green, J.,
269. Linke, W.A., Ivemeyer, M., Olivieri, N., Kolmerer, B., Morgenstern, J.P., Lou, M., Marr, C.S., Phillips, T.B.,
Ruegg, J.C., & Labeit, S. (1996) J. Mol. Biol. 261, 62–71. Ram, M.K., Theriault, K., Zoller, M.J., & Karas, J.L.
270. Politou, A.S., Gautel, M., Pfuhl, M., Labeit, S., & Pastore, (1995) Nature 377, 32–38.
A. (1994) Biochemistry 33, 4730–4737. 296. Musacchio, A., Saraste, M., & Wilmanns, M. (1994) Nat.
271. Doolittle, R.F. (1985) Trends Biochem. Sci. (Pers. Ed.) 10, Struct. Biol. 1, 546–551.
233–237. 297. Maignan, S., Guilloteau, J.P., Fromage, N., Arnoux, B.,
272. Doolittle, R.F. (1989) in Prediction of Protein Structure Becquart, J., & Ducruix, A. (1995) Science 268, 291–
and the Principles of Protein Conformation (Fasman, G. 293.
D., Ed.) pp 599–623, Plenum, New York. 298. Stec, B., Yamano, A., Whitlow, M., & Teeter, M.M.
273. Kretsinger, R.H., & Nockolds, C.E. (1973) J. Biol. Chem. (1997) Acta Crystallogr., Sect. D, Pt. 2 53, 169–178.
248, 3313–3326. 299. Pennica, D., Holmes, W.E., Kohr, W.J., Harkins, R.N.,
274. Essen, L.O., Perisic, O., Cheung, R., Katan, M., & Vehar, G.A., Ward, C.A., Bennett, W.F., Yelverton, E.,
Williams, R.L. (1996) Nature 380, 595–602. Seeburg, P.H., Heyneker, H.L., Goeddel, D.V., & Collen,
275. Wang, J.H., Yan, Y.W., Garrett, T.P., Liu, J.H., Rodgers, D. (1983) Nature 301, 214–221.
D.W., Garlick, R.L., Tarr, G.E., Husain, Y., Reinherz, 300. Claeys, H., Sottrup-Jensen, L., Zajdel, M., Petersen,
E.L., & Harrison, S.C. (1990) Nature 348, 411–418. T.E., & Magnusson, S. (1976) FEBS Lett. 61, 20–24.
276. Su, X.D., Gastinel, L.N., Vaughn, D.E., Faye, I., Poon, P., 301. Bottomley, M.J., Collard, M.W., Huggenvik, J.I., Liu, Z.,
& Bjorkman, P.J. (1998) Science 281, 991–995. Gibson, T.J., & Sattler, M. (2001) Nat. Struct. Biol. 8,
277. Holden, H.M., Ito, M., Hartshorne, D.J., & Rayment, I. 626–633.
(1992) J. Mol. Biol. 227, 840–851. 302. Timm, D., Salim, K., Gout, I., Guruprasad, L.,
278. Jones, E.Y., Harlos, K., Bottomley, M.J., Robinson, R.C., Waterfield, M., & Blundell, T. (1994) Nat. Struct. Biol. 1,
Driscoll, P.C., Edwards, R.M., Clements, J.M., 782–788.
Dudgeon, T.J., & Stuart, D.I. (1995) Nature 373, 303. Gibson, T.J., Hyvonen, M., Musacchio, A., Saraste, M.,
539–544. & Birney, E. (1994) Trends Biochem. Sci. 19, 349–353.
279. Schneider, R., Schneider-Scherzer, E., Thurnher, M., 304. Petersen, T.E., Thogersen, H.C., Skorstengaard, K.,
Auer, B., & Schweiger, M. (1988) EMBO J. 7, 4151–4156. Vibe-Pedersen, K., Sahl, P., Sottrup-Jensen, L., &
280. Kobe, B., & Deisenhofer, J. (1993) Nature 366, 751–756. Magnusson, S. (1983) Proc. Natl. Acad. Sci. U.S.A. 80,
281. Crowder, S.M., Kanaar, R., Rio, D.C., & Alber, T. (1999) 137–141.
Proc. Natl. Acad. Sci. U.S.A. 96, 4892–4897. 305. Kornblihtt, A.R., Umezawa, K., Vibe-Pedersen, K., &
282. Graves, B.J., Crowther, R.L., Chandran, C., Rumberger, Baralle, F.E. (1985) EMBO J. 4, 1755–1759.
J.M., Li, S., Huang, K.S., Presky, D.H., Familletti, P.C., 306. Baron, M., Norman, D.G., & Campbell, I.D. (1991)
Wolitzky, B.A., & Burns, D.K. (1994) Nature 367, Trends Biochem. Sci. 16, 13–17.
532–538. 307. Pickford, A.R., Smith, S.P., Staunton, D., Boyd, J., &
283. Sasaki, T., Kostka, G., Gohring, W., Wiedemann, H., Campbell, I.D. (2001) EMBO J. 20, 1519–1529.
Mann, K., Chu, M.L., & Timpl, R. (1995) J. Mol. Biol. 245, 308. Gorlich, D., Prehn, S., Laskey, R.A., & Hartmann, E.
241–250. (1994) Cell 79, 767–778.
284. Dahlback, B., Hildebrand, B., & Linse, S. (1990) J. Biol. 309. Conti, E., Uy, M., Leighton, L., Blobel, G., & Kuriyan, J.
Chem. 265, 18481–18489. (1998) Cell 94, 193–204.
285. Huang, L.H., Cheng, H., Pardi, A., Tam, J.P., & Sweeney, 310. Muller, Y.A., Ultsch, M.H., & de Vos, A.M. (1996) J. Mol.
W.V. (1991) Biochemistry 30, 7402–7409. Biol. 256, 144–159.
References 405
311. Streuli, M., Krueger, N.X., Tsai, A.Y., & Saito, H. (1989) Vagin, A.A., Grebenko, A.I., Borisov, V.V., Bartels, K.S.,
Proc. Natl. Acad. Sci. U.S.A. 86, 8698–8702. Fita, I., & Rossmann, M.G. (1986) J. Mol. Biol. 188,
312. Dickinson, C.D., Veerapandian, B., Dai, X.P., Hamlin, 49–61.
R.C., Xuong, N.H., Ruoslahti, E., & Ely, K.R. (1994) J. 339. Murthy, M.R., Reid, T.J.d., Sicignano, A., Tanaka, N., &
Mol. Biol. 236, 1079–1092. Rossmann, M.G. (1981) J. Mol. Biol. 152, 465–499.
313. Tsujishita, Y., & Hurley, J.H. (2000) Nat. Struct. Biol. 7, 340. Faber, H.R., & Matthews, B.W. (1990) Nature 348,
408–414. 263–266.
314. Gomis-Ruth, F.X., Gohlke, U., Betz, M., Knauper, V., 341. Stehle, T., Ahmed, S.A., Claiborne, A., & Schulz, G.E.
Murphy, G., Lopez-Otin, C., & Bode, W. (1996) J. Mol. (1991) J. Mol. Biol. 221, 1325–1344.
Biol. 264, 556–566. 342. Ziegler, G.A., Vonrhein, C., Hanukoglu, I., & Schulz,
315. Faber, H.R., Groom, C.R., Baker, H.M., Morgan, W.T., G.E. (1999) J. Mol. Biol. 289, 981–990.
Smith, A., & Baker, E.N. (1995) Structure 3, 551– 343. Doolittle, R.F., Goldbaum, D.M., & Doolittle, L.R.
559. (1978) J. Mol. Biol. 120, 311–325.
316. Jenne, D., & Stanley, K.K. (1987) Biochemistry 26, 344. Williams, R.C. (1981) J. Mol. Biol. 150, 399–408.
6735–6742. 345. Yang, Z., Kollman, J.M., Pandi, L., & Doolittle, R.F.
317. Pawson, T. (1995) Nature 373, 573–580. (2001) Biochemistry 40, 12515–12523.
318. Engel, J., Efimov, V.P., & Maurer, P. (1994) 346. Donovan, J.W., & Mihalyi, E. (1974) Proc. Natl. Acad.
Development, Suppl. (Evol. Dev. Mech.), 35–42. Sci. U.S.A. 71, 4125–4128.
319. Evdokimov, A.G., Anderson, D.E., Routzahn, K.M., & 347. Novokhatny, V.V., Kudinov, S.A., & Privalov, P.L. (1984)
Waugh, D.S. (2001) J. Mol. Biol. 312, 807–821. J. Mol. Biol. 179, 215–232.
320. Wu, H., Maciejewski, M.W., Marintchev, A., Benashski, 348. Rudolph, R., Siebendritt, R., Nesslauer, G., Sharma,
S.E., Mullen, G.P., & King, S.M. (2000) Nat. Struct. Biol. A.K., & Jaenicke, R. (1990) Proc. Natl. Acad. Sci. U.S.A.
7, 575–579. 87, 4625–4629.
321. Liou, Y.-C., Tocilj, A., Davies, P.L., & Jia, Z. (2000) 349. Betton, J.M., Desmadril, M., Mitraki, A., & Yon, J.M.
Nature 406, 322–325. (1984) Biochemistry 23, 6654–6661.
322. White, C.E., Hunter, M.J., Meininger, D.P., Garrod, S., 350. Tauler, A., Rosenberg, A.H., Colosia, A., Studier, F.W.,
& Komives, E.A. (1996) Proc. Natl. Acad. Sci. U.S.A. 93, & Pilkis, S.J. (1988) Proc. Natl. Acad. Sci. U.S.A. 85,
10177–10182. 6642–6646.
323. Kumar, A., Roach, C., Hirsh, I.S., Turley, S., deWalque, 351. Herold, M., Leistler, B., Hage, A., Luger, K., & Kirschner,
S., Michels, P.A., & Hol, W.G. (2001) J. Mol. Biol. 307, K. (1991) Biochemistry 30, 3612–3620.
271–282. 352. Funk, W.D., MacGillivray, R.T., Mason, A.B., Brown,
324. Essen, L.O., Perisic, O., Katan, M., Wu, Y., Roberts, M.F., S.A., & Woodworth, R.C. (1990) Biochemistry 29,
& Williams, R.L. (1997) Biochemistry 36, 1704–1718. 1654–1660.
325. Ferguson, K.M., Lemmon, M.A., Schlessinger, J., & 353. Maduke, M., Williams, C., & Miller, C. (1998)
Sigler, P.B. (1995) Cell 83, 1037–1046. Biochemistry 37, 1315–1321.
326. Kretsinger, R.H., & Nakayama, S. (1993) J. Mol. Evol. 36, 354. Kay, L.E., Forman-Kay, J.D., McCubbin, W.D., & Kay,
477–488. C.M. (1991) Biochemistry 30, 4323–4333.
327. Kobe, B., & Deisenhofer, J. (1994) Trends Biochem. Sci. 355. Robien, M.A., Clore, G.M., Omichinski, J.G., Perham,
19, 415–421. R.N., Appella, E., Sakaguchi, K., & Gronenborn, A.M.
328. Tornero, P., Mayda, E., Gomez, M.D., Canas, L., (1992) Biochemistry 31, 3463–3471.
Conejero, V., & Vera, P. (1996) Plant J. 10, 315–330. 356. Spraggon, G., Applegate, D., Everse, S.J., Zhang, J.Z.,
329. Logsdon, J.M., Jr., Tyshenko, M.G., Dixon, C., D-Jafari, Veerapandian, L., Redman, C., Doolittle, R.F., &
J., Walker, V.K., & Palmer, J.D. (1995) Proc. Natl. Acad. Grieninger, G. (1998) Proc. Natl. Acad. Sci. U.S.A. 95,
Sci. U.S.A. 92, 8507–8511. 9099–9104.
330. de Souza, S.J., Long, M., Klein, R.J., Roy, S., Lin, S., & 357. Missiakas, D., Betton, J.M., Minard, P., & Yon, J.M.
Gilbert, W. (1998) Proc. Natl. Acad. Sci. U.S.A. 95, (1990) Biochemistry 29, 8683–8689.
5094–5099. 358. Vita, C., Fontana, A., & Jaenicke, R. (1989) Eur. J.
331. Stoltzfus, A., Spencer, D.F., Zuker, M., Logsdon, J.M., Biochem. 183, 513–518.
Jr., & Doolittle, W.F. (1994) Science 265, 202–207. 359. Vita, C., Fontana, A., & Chaiken, I.M. (1985) Eur. J.
332. Rzhetsky, A., Ayala, F.J., Hsu, L.C., Chang, C., & Biochem. 151, 191–196.
Yoshida, A. (1997) Proc. Natl. Acad. Sci. U.S.A. 94, 360. Trexler, M., & Patthy, L. (1983) Proc. Natl. Acad. Sci.
6820–6825. U.S.A. 80, 2457–2461.
333. Wetlaufer, D.B. (1973) Proc. Natl. Acad. Sci. U.S.A. 70, 361. Wilhelm, O.G., Jaskunas, S.R., Vlahos, C.J., & Bang, N.U.
697–701. (1990) J. Biol. Chem. 265, 14606–14611.
334. Rose, G.D. (1979) J. Mol. Biol. 134, 447–470. 362. Moreau, M., de Cock, E., Fortier, P.L., Garcia, C.,
335. Ollis, D.L., Brick, P., Hamlin, R., Xuong, N.G., & Steitz, Albaret, C., Blanquet, S., Lallemand, J.Y., & Dardel, F.
T.A. (1985) Nature 313, 762–766. (1997) J. Mol. Biol. 266, 15–22.
336. Mande, S.S., Sarfaty, S., Allen, M.D., Perham, R.N., & 363. Daubner, S.C., Schrimsher, J.L., Schendel, F.J., Young,
Hol, W.G. (1996) Structure 4, 277–286. M., Henikoff, S., Patterson, D., Stubbe, J., & Benkovic,
337. Wilson, I.A., Skehel, J.J., & Wiley, D.C. (1981) Nature S.J. (1985) Biochemistry 24, 7059–7062.
289, 366–373. 364. Worrall, D.M., & Tubbs, P.K. (1983) Biochem. J. 215,
338. Vainshtein, B.K., Melik-Adamyan, W.R., Barynin, V.V., 153–157.
406 Evolution
365. Weis, W.I., Brunger, A.T., Skehel, J.J., & Wiley, D.C. 387. De Bondt, H.L., Rosenblatt, J., Jancarik, J., Jones, H.D.,
(1990) J. Mol. Biol. 212, 737–761. Morgan, D.O., & Kim, S.H. (1993) Nature 363, 595–602.
366. Banner, D.W., Bloomer, A., Petsko, G.A., Phillips, D.C., 388. Zhang, F., Strand, A., Robbins, D., Cobb, M.H., &
& Wilson, I.A. (1976) Biochem. Biophys. Res. Commun. Goldsmith, E.J. (1994) Nature 367, 704–711.
72, 146–155. 389. Kack, H., Sandmark, J., Gibson, K., Schneider, G., &
367. Beese, L.S., Derbyshire, V., & Steitz, T.A. (1993) Science Lindqvist, Y. (1999) J. Mol. Biol. 291, 857–876.
260, 352–355. 390. Thompson, T.B., Garrett, J.B., Taylor, E.A.,
368. Zhao, B., Janson, C.A., Amegadzie, B.Y., D’Alessio, K., Meganathan, R., Gerlt, J.A., & Rayment, I. (2000)
Griffin, C., Hanning, C.R., Jones, C., Kurdyla, J., Biochemistry 39, 10662–10676.
McQueney, M., Qiu, X., Smith, W.W., & Abdel-Meguid, 391. Thoden, J.B., Firestine, S., Nixon, A., Benkovic, S.J., &
S.S. (1997) Nat. Struct. Biol. 4, 109–111. Holden, H.M. (2000) Biochemistry 39, 8791–8802.
369. Huang, S., Xue, Y., Sauer-Eriksson, E., Chirica, L., 392. Steinbacher, S., Baxa, U., Miller, S., Weintraub, A.,
Lindskog, S., & Jonsson, B.H. (1998) J. Mol. Biol. 283, Seckler, R., & Huber, R. (1996) Proc. Natl. Acad. Sci.
301–310. U.S.A. 93, 10584–10588.
370. Zhang, C., & DeLisi, C. (1998) J. Mol. Biol. 284, 393. Xia, Z., Dai, W., Zhang, Y., White, S.A., Boyd, G.D., &
1301–1305. Mathews, F.S. (1996) J. Mol. Biol. 259, 480–501.
371. Murzin, A.G., Brenner, S.E., Hubbard, T., & Chothia, C. 394. Crennell, S.J., Garman, E.F., Philippon, C., Vasella, A.,
(1995) J. Mol. Biol. 247, 536–540. Laver, W.G., Vimr, E.R., & Taylor, G.L. (1996) J. Mol.
372. Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., Biol. 259, 264–280.
Swindells, M.B., & Thornton, J.M. (1997) Structure 5, 395. Barbosa, J.A., Smith, B.J., DeGori, R., Ooi, H.C.,
1093–1108. Marcuccio, S.M., Campi, E.M., Jackson, W.R.,
373. Holm, L., & Sander, C. (1996) Science 273, 595–603. Brossmer, R., Sommer, M., & Lawrence, M.C. (2000) J.
374. Kull, F.J., Sablin, E.P., Lau, R., Fletterick, R.J., & Vale, Mol. Biol. 303, 405–421.
R.D. (1996) Nature 380, 550–555. 396. Khurana, S., Powers, D.B., Anderson, S., & Blaber, M.
375. Pakhomova, S., Kobayashi, M., Buck, J., & Newcomer, (1998) Proc. Natl. Acad. Sci. U.S.A. 95, 6768–6773.
M.E. (2001) Nat. Struct. Biol. 8, 447–451. 397. Uhlin, U., & Eklund, H. (1994) Nature 370, 533–539.
376. Morais, M.C., Zhang, W., Baker, A.S., Zhang, G., 398. Stec, B., & Lebioda, L. (1990) J. Mol. Biol. 211, 235–
Dunaway-Mariano, D., & Allen, K.N. (2000) 248.
Biochemistry 39, 10385–10396. 399. Chelvanayagam, G., Heringa, J., & Argos, P. (1992) J.
377. Nakatsu, T., Kato, H., & Oda, J. (1998) Nat. Struct. Biol. Mol. Biol. 228, 220–242.
5, 15–19. 400. Romao, M.J., Kolln, I., Dias, J.M., Carvalho, A.L.,
378. Jones, E.Y., Stuart, D.I., & Walker, N.P. (1989) Nature Romero, A., Varela, P.F., Sanz, L., Topfer-Petersen, E.,
338, 225–228. & Calvete, J.J. (1997) J. Mol. Biol. 274, 650–660.
379. Spurlino, J.C., Lu, G.Y., & Quiocho, F.A. (1991) J. Biol. 401. Varela, P.F., Romero, A., Sanz, L., Romao, M.J., Topfer-
Chem. 266, 5202–5219. Petersen, E., & Calvete, J.J. (1997) J. Mol. Biol. 274,
380. Lamzin, V.S., Aleshin, A.E., Strokopytov, B.V., 635–649.
Yukhnevich, M.G., Popov, V.O., Harutyunyan, E.H., & 402. Deisenhofer, J. (1981) Biochemistry 20, 2361–2370.
Wilson, K.S. (1992) Eur. J. Biochem. 206, 441–452. 403. Juy, M., Amit, A.G., Alzari, P.M., Poljak, R.J., Claeyssens,
381. Zhang, C., & Kim, S.H. (2000) J. Mol. Biol. 299, M., Beguin, P., & Aubert, J.P. (1992) Nature 357, 89–91.
1075–1089. 404. Schreuder, H., Tardif, C., Trump-Kallmeyer, S.,
382. Aleshin, A., Golubev, A., Firsov, L.M., & Honzatko, R.B. Soffientini, A., Sarubbi, E., Akeson, A., Bowlin, T.,
(1992) J. Biol. Chem. 267, 19291–19298. Yanofsky, S., & Barrett, R.W. (1997) Nature 386,
383. Cordes, M.H., Burton, R.E., Walsh, N.P., McKnight, C.J., 194–200.
& Sauer, R.T. (2000) Nat. Struct. Biol. 7, 1129–1132. 405. Vigers, G.P., Anderson, L.J., Caffes, P., & Brandhuber,
384. Ollis, D.L., Cheah, E., Cygler, M., Dijkstra, B., Frolow, B.J. (1997) Nature 386, 190–194.
F., Franken, S.M., Harel, M., Remington, S.J., Silman, I., 406. Keitel, T., Simon, O., Borriss, R., & Heinemann, U.
Schrag, J., et al. (1992) Protein Eng. 5, 197–211. (1993) Proc. Natl. Acad. Sci. U.S.A. 90, 5287–5291.
385. Narayana, S.V., Carson, M., el-Kabbani, O., Kilpatrick, 407. Vaughn, D.E., Rodriguez, J., Lazebnik, Y., & Joshua-Tor,
J.M., Moore, D., Chen, X., Bugg, C.E., Volanakis, J.E., & L. (1999) J. Mol. Biol. 293, 439–447.
DeLucas, L.J. (1994) J. Mol. Biol. 235, 695–708. 408. Hill, C.P., Osslund, T.D., & Eisenberg, D. (1993) Proc.
386. Goldberg, J.D., Yoshida, T., & Brick, P. (1994) J. Mol. Natl. Acad. Sci. U.S.A. 90, 5167–5171.
Biol. 236, 1123–1140. 409. Kraulis, P.J. (1991) J. Applied Crystallogr. 24, 946–950.
Chapter 8
Counting Polypeptides
Almost all of the proteins found in a living organism are multimeric proteins. A multimeric protein is a protein containing more than one folded polypeptide. Each of these folded polypeptides was originally synthesized by a ribosome from messenger RNA that encoded a sequence of amino acids of a precise, finite length. These polypeptides folded into defined conformations and were posttranslationally modified. Each of the folded, posttranslationally modified polypeptides in a multimeric protein is one of its subunits. Usually, only a specific and well-defined number of these subunits are gathered together to form the macromolecular complex that is the finished, existing molecule of the multimeric protein. An oligomeric protein is a multimeric protein with a fixed, invariant number of subunits. In a few instances, such as the proteins actin, keratin, and collagen, a large and undefined number of subunits combine to form a polymeric protein, which in theory could continue to polymerize indefinitely. A polymeric protein is a protein with many subunits, the number of which varies from molecule to molecule of that protein. Polymeric proteins are the exception; most proteins are oligomeric.

The stoichiometry of the subunits of a protein is the number of each type of folded, posttranslationally modified polypeptide that are combined to produce the specific structure. At this level of definition, each of the polypeptides is identified only by its length. The length of a polypeptide is the number of amino acids it contains, $n_{\text{aa}}$, an integer that is either dimensionless or has the units of amino acids (molecule of polypeptide)⁻¹ or moles of amino acids (mole of polypeptide)⁻¹. The length of a polypeptide is usually a precisely known quantity because its amino acid sequence is usually available and any posttranslational modifications have usually been defined.

A great deal of effort has been expended in discovering the stoichiometries of the subunits of proteins. The original approach to this information was to determine the molar mass of the intact protein, to separate the individual polypeptides composing the protein, to quantify the mass ratio among the various polypeptides, and to determine the molar mass of each of the separated polypeptides. The measurement of the molar masses of intact proteins was at one time a major area of biophysical research, but this pursuit presently attracts much less attention.

The individual subunits composing an intact native protein are separated and catalogued analytically by electrophoresis on polyacrylamide gels cast in solutions of the detergent sodium dodecyl sulfate. The separation that is effected by these polyacrylamide gels relies on their ability to sieve the unfolded polypeptides. The constituent polypeptides of a protein are separated preparatively by chromatography that depends either on sieving or on ion exchange of the unfolded, dissociated polymers. The separated polypeptides are shown to be homogeneous and unique by peptide mapping.

The major weaknesses of the original approach to defining the stoichiometry of the subunits of a protein were the extreme care with which the initial measurements of the molar mass of the intact protein had to be performed and the unreliability of the assessments of the mass ratios among the constituent polypeptides and of their molar masses. The present approach to defining the stoichiometry of the subunits of a protein avoids these problems. The individual polypeptides composing a protein are still separated and catalogued by electrophoresis and shown to be unique and homogeneous by peptide mapping. The length of each of the constituent polypeptides is assessed either by the electrophoresis itself or, preferably, by sequencing the appropriate cDNA. Any glycosylation is quantified analytically. The number of each polypeptide composing the intact protein is determined by covalently cross-linking the protein to various degrees of completion and identifying the various intermediate covalent complexes and the limit complex.

The different subunits in an oligomeric protein are defined by their lengths and distinguished by assigning them consecutive letters of the Greek alphabet. For example, deoxyhemoglobin at its normal concentrations is constructed from two subunits, α and β, each present in two copies to produce the complex (αβ)₂. Nicotinic acetylcholine receptor has the composition α₂βγδ; L-lactate dehydrogenase, (α₂)₂; DNA-directed RNA polymerase from Escherichia coli, α₂βγδ; and 2-dehydro-3-deoxy-phosphogluconate aldolase, α₃. The grouping of subunits into subsets, for example, the groups of two subunits in L-lactate dehydrogenase, arises from the symmetries in which the subunits are arranged within the intact molecule of an oligomeric protein.

Some oligomeric proteins, when they are dissolved at certain concentrations, are mixtures of two different combinations of subunits in equilibrium with each other. For example, oxygenated hemoglobin is an equilibrium mixture of αβ dimers and (αβ)₂ tetramers. Most oligomeric proteins, however, have a particular composition of subunits that does not vary unless harsh conditions are applied. A solution of a pure oligomeric protein will usually be monodisperse. A monodisperse solution of a particular macromolecule is a solution in which every one of those macromolecules is of the same size and shape, and each is compact and unique and remains dissolved and unassociated with its neighbors.

When the map of electron density from a crystal of an oligomeric protein is examined, the complete molecule can be discerned. It is recognized as a large, independent feature in the map that is formed from several folded tubes of electron density. Although not always essential, it is reassuring to know before the map is examined how many subunits are combined to produce the protein that has been crystallized. This determination can be made by a combination of sequencing, molecular sieving, and cross-linking. It is now so routine to do this that few oligomeric proteins the subunit stoichiometry of which have not already been established are examined crystallographically.

Molar Mass¹,²

The only indubitably reliable method for determining the length and amino acid composition of a polypeptide, and consequently, its precise molar mass, is to sequence it correctly. At the moment, the sequences of a large array of readily available polypeptides are known, and they form a collection of standards each of the lengths of which, $n_{\text{aa}}$, is an exact quantity. It is often, but not always, the case that the amino acid sequence of a newly purified polypeptide is known from the nucleotide sequence of its cDNA before enough of it becomes available to study its physical properties in detail. This situation has inverted the classical strategy of physical measurements in which the molar mass of a protein was one of the ultimate discoveries rather than something precisely known from the beginning. Unfortunately, however, the extensive effort expended in determinations of molar mass, although expended decades ago, still influences the importance attached to molar mass.

The molar mass of a protein, $M_{\text{prot}}$, is the number of grams in a mole of that protein. The molecular mass of a protein is the mass of a single molecule of that protein expressed in relative units that are referred to as either atomic mass units (amu) or daltons (Da). Both an atomic mass unit and a dalton (1.6606 × 10⁻²⁴ g) are 1/12 the mass of carbon isotope 12. Because Avogadro’s number is the number of carbon atoms of isotope 12 in 12 g of carbon atoms of isotope 12, the numerical values of molar mass and molecular mass for the same molecule are the same, but not the units attached to the numbers. Molar mass is the quantity that is determined by the measurements of physical behavior such as osmotic pressure, sedimentation equilibrium, and light scattering because the units on the final quantity are usually grams mole⁻¹.

Because both sedimentation equilibrium and light scattering are alternative measurements of osmotic pressure, all three techniques determine the same fundamental physical property of a protein in a solution, namely, its chemical potential. Osmotic pressure is a colligative property of the solution. A colligative property of a solute is a physical property that is a function only of the moles of independent particles of that solute in a standard volume of the solution. If the solution were monodisperse, if the only osmotically active particles present were individual molecules of the protein, and if the concentration of the protein in grams centimeter⁻³ were known precisely, the measured molar concentration could be used to calculate the molar mass of the protein.

Osmotic pressure is the pressure exerted by impermeant solutes when a solution containing those impermeant solutes is separated by a semipermeable membrane from another solution identical in every way to the first except that it lacks the impermeant solutes. A solute is impermeant to a particular barrier if it cannot pass through that barrier. A semipermeable membrane is a membrane through which all of the components of the two solutions can pass freely except the impermeant solutes. The chemical and physical properties of the membrane define which solutes are permeant and which are impermeant, which are osmotically silent and which are osmotically active, respectively. In the case of experimental measurements of the osmotic pressure of solutions of proteins, a membrane is chosen that is porous to the small molecules in the solution but the pores of which are too small to pass the molecules of protein. This is usually a sheet formed from a polymeric material that has spaces between the strands of polymer wide enough to pass small molecules and ions but too narrow to pass macromolecules.

Pressure is the force that results from the tendency of molecules to expand the confines in which they are contained and fill a larger volume. The pressure they exert is a direct measurement of the chemical potential of a population of molecules. When molecules such as impermeant solutes exert an osmotic pressure, they cannot expand their confines, as can gases, by entering the vacant space above the solution because they are held by intermolecular forces within a condensed phase. The volume of solution in which they are confined, however, can expand if solution from the other side passes across the semipermeable membrane into the solution containing the impermeant solutes. This is the liquid analogy to a balloon expanding to fill more space. Just as the balloon expands until the external pressure is equivalent to the internal pressure of the trapped gas, the solution containing the impermeant molecules expands until an external pressure equal to its osmotic pressure is applied to it. Operationally, the osmotic pressure of a
solution is the external pressure that must be applied to the solution containing the impermeant solute to prevent any expansion in its volume by the net movement of fluid through the semipermeable membrane. In an apparatus for measuring osmotic pressure, the difference in pressure between the chamber containing the protein and the chamber lacking the protein can be maintained and continuously monitored with a pressure transducer.³

It can be shown⁴ that the osmotic pressure exerted by a solution of impermeant solutes is formally equivalent to the pressure exerted on the walls of a container filled with gases that are impermeant to those walls. The molecules of the impermeant solute are formally equivalent to the molecules of the gas, and those of the solvent and permeant solutes are as physically silent as the vacuum in which the gas is suspended. The ideal gas law is

$$P = \frac{n_{\mathrm{m}} RT}{V} \tag{8–1}$$

where $P$ is the pressure (newtons centimeter⁻²) exerted by the gas, $R$ is the gas constant (831.5 N cm K⁻¹ mol⁻¹), $T$ is the temperature (kelvins), $n_{\mathrm{m}}$ is the number of moles of the gas, and $V$ is the volume (centimeters³) in which it is confined. The pressure exerted by a real, nonideal gas, however, is

$$P = RT\left[\frac{n_{\mathrm{m}}}{V} + B\left(\frac{n_{\mathrm{m}}}{V}\right)^{2} + C\left(\frac{n_{\mathrm{m}}}{V}\right)^{3} \ldots\right] \tag{8–2}$$

where $n_{\mathrm{m}}V^{-1}$ is the concentration (moles centimeter⁻³) of the gas and the coefficients $B$, $C$, and so forth are referred to, respectively, as the second virial coefficient, the third virial coefficient, and so forth. The virial coefficients provide the necessary corrections for the behavior of the nonideal gas due to the specific properties that make it nonideal, such as the finite dimensions of the molecules that fill, or exclude, some of the volume of the container, the intermolecular forces between the molecules of the gas, and the tendency of molecules of the gas to dimerize or polymerize. For all of the same reasons,⁴ the osmotic pressure $\Pi$ (newtons centimeter⁻²), exerted by a nonideal, impermeant solute S is

$$\Pi = RT\left([\mathrm{S}] + B[\mathrm{S}]^{2} + C[\mathrm{S}]^{3} \ldots\right) \tag{8–3}$$

In the limit of low concentration of a protein $i$, the terms containing the virial coefficients vanish and

$$\lim_{[\text{protein } i] \to 0} \frac{\Pi}{[\text{protein } i]} = RT \tag{8–4}$$

Equation 8–4 states that, at low enough concentrations of protein, the osmotic pressure observed will be directly proportional to the molar concentration of the protein.

The molar concentration of the protein in the solution cannot be known if the molar mass of the protein is unknown, but the concentration of the protein in the solution must be known in some type of units. From the ultraviolet spectrum of the solution, the concentration (moles centimeter⁻³) of the tryptophan and tyrosine in the protein might be known.⁵ From a colorimetric assay, the concentration (moles centimeter⁻³) of peptide bonds in the solution might be known.⁶ From total amino acid analysis,⁷ the concentration (moles centimeter⁻³) of amino acids in the solution might be known. From a dry weight measurement of the protein, the grams of dry weight of protein in a milliliter of solution (grams centimeter⁻³) might be known. Regardless of the units, the value for the concentration of protein in the units of the quantity measured can be designated as $C_{\text{prot}}$ (units centimeter⁻³). It follows that

$$W\,[\text{protein } i] = C_{\text{prot}} \tag{8–5}$$

where $W$ is a constant of proportionality. The units on $W$ are moles of tryptophan and tyrosine (mole of protein)⁻¹, moles of peptide bonds (mole of protein)⁻¹, moles of amino acids (mole of protein)⁻¹, or grams of dry weight (mole of protein)⁻¹, respectively. It is only coincidental that the last is usually chosen. It is this exercise that defines what a colligative property is and illustrates that osmotic pressure does not measure molar mass directly. The final result can always be traced back to an independent measurement of the concentration of the protein. When Equation 8–5 is incorporated into Equation 8–4

$$\lim_{C_{\text{prot}} \to 0} \frac{\Pi}{C_{\text{prot}}} = \frac{RT}{W} \tag{8–6}$$

and the intercept of $\Pi/C_{\text{prot}}$ as $C_{\text{prot}}$ is decreased to 0 provides the value of $W$ (Figure 8–1).⁸* If the units of the concentration $C_{\text{prot}}$ were grams of protein centimeter⁻³ and the only osmotically active species present were molecules of the protein of interest in a monodisperse solution, $W$ would be the molar mass of the protein in grams mole⁻¹.

The complete equation describing the actual behavior of the osmotic pressure at low concentrations of protein is

* The units of concentration used by the authors for the data in Figures 8–1, 8–2, and 8–5 were grams centimeter⁻³. The established symbol for concentration in units of mass volume⁻¹ is γ.
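The extrapolation described by Equation 8–6 can be sketched numerically. The following Python fragment is only an illustration under stated assumptions: the data are synthetic and noise-free, generated from the two-term form Π/Cprot = RT(1/W + B Cprot); the value of B is an arbitrary assumption; the molar mass used to generate the data is the value of 66,430 g mol⁻¹ given in this chapter for bovine serum albumin; and the function names `fit_line` and `extrapolate_w` are hypothetical. A straight line is fit to Π/Cprot against Cprot, and W is taken from the intercept.

```python
# Gas constant in the units used in the text: N cm K^-1 mol^-1.
R = 831.5


def fit_line(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def extrapolate_w(conc, pressure, temperature):
    """Return (W, B) from osmotic pressures measured at several concentrations.

    conc        -- C_prot in whatever units were measured (here g cm^-3)
    pressure    -- osmotic pressure, N cm^-2
    temperature -- kelvins
    """
    reduced = [p / c for p, c in zip(pressure, conc)]  # Pi / C_prot
    intercept, slope = fit_line(conc, reduced)
    w = R * temperature / intercept   # intercept = RT / W (Equation 8-6)
    b = slope / (R * temperature)     # slope = RT * B (two-term form)
    return w, b


# Hypothetical, noise-free data (NOT measurements), generated from the
# two-term form Pi = RT * (C / W + B * C^2).
T = 298.15          # assumed temperature, kelvins
W_TRUE = 66430.0    # g mol^-1, molar mass of bovine serum albumin from the text
B_TRUE = 9.0e-5     # assumed second virial coefficient, mol cm^3 g^-2
conc = [0.01, 0.02, 0.03, 0.04, 0.05]   # g cm^-3
pressure = [R * T * (c / W_TRUE + B_TRUE * c * c) for c in conc]

w, b = extrapolate_w(conc, pressure, T)
print(f"W = {w:.0f} g mol^-1, B = {b:.2e} mol cm^3 g^-2")
```

Because W carries the units of Cprot per mole of protein, the same function would return moles of peptide bonds (mole of protein)⁻¹ if the concentrations were supplied in moles of peptide bonds centimeter⁻³; it returns a molar mass only when Cprot is in grams centimeter⁻³.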
… concentrations of the protein, bovine serum albumin, are in the range where only the first virial coefficient, $B$, is significant, and the results are presented as if the equation were

$$\frac{\Pi}{C_{\text{prot}}} = RT\left(\frac{1}{W} + B\,C_{\text{prot}}\right) \tag{8–8}$$

[Figure 8–1: vertical axis, Π/γprot (N cm g⁻¹).] … osmotic pressure of the solution was the external pressure that had to be applied to the solution of protein, in excess of the atmospheric pressure on the solution lacking the protein, to prevent its expansion. The excess pressure was measured with a toluene manometer, and millimeters of toluene was converted to newtons centimeter⁻². The concentration of protein, following the measurement, in each of the solutions containing protein was determined by a dry weight analysis that was corrected for the weight of other solutes present. The solutions contained 0.15 M NaCl as supporting electrolyte. The behaviors of the osmotic pressure at pH 7.0 (3) and pH 5.37 (2) are shown. Adapted with permission from ref 8. Copyright 1946 American Chemical Society.

… the two solutions. Suppose that the only electrolyte in the solution is KCl, the solution in the compartment containing the protein is designated α, and the solution in the other compartment is designated β. To preserve electroneutrality

$$[\mathrm{K}^{+}]_{\beta} = [\mathrm{Cl}^{-}]_{\beta} \tag{8–9}$$

…

$$[\mathrm{K}^{+}]_{\alpha}\,[\mathrm{Cl}^{-}]_{\alpha} = [\mathrm{K}^{+}]_{\beta}\,[\mathrm{Cl}^{-}]_{\beta} \tag{8–12}$$

at equilibrium. Combining these equations

$$[\mathrm{K}^{+}]_{\beta} = [\mathrm{K}^{+}]_{\alpha}\left(1 + \frac{Z_{i}\,[\text{protein } i]}{[\mathrm{K}^{+}]_{\alpha}}\right)^{1/2} \tag{8–13}$$

$$[\mathrm{Cl}^{-}]_{\beta} = [\mathrm{Cl}^{-}]_{\alpha}\left(1 - \frac{Z_{i}\,[\text{protein } i]}{[\mathrm{Cl}^{-}]_{\alpha}}\right)^{1/2} \tag{8–14}$$

…

$$\lim_{[\text{protein } i] \to 0} \Pi \cong RT\,[\text{protein } i]\left(1 + \frac{Z_{i}^{2}\,[\text{protein } i]}{4[\mathrm{KCl}]}\right) \tag{8–15}$$

For molecules with large values of $Z_{i}$, such as nucleic acids, this approximation fails badly.

From Equation 8–15 it can be seen that the Donnan effect is expressed in the second virial coefficient, $B$, as demonstrated in Figure 8–1; but when [KCl] > 5$Z_{i}^{2}$[protein $i$], the Donnan effect has less than a 5% effect on the osmotic pressure and as [protein $i$] approaches 0, the Donnan effect also approaches 0. For the measurements presented in Figures 8–2 and 8–5, but not those in Figure 8–1, the addition of 0.1 M KCl would satisfy this inequality.

Another way to consider the imbalances of counterions created by the Donnan effect is to assume that the charge on the impermeant protein creates an electrical potential and the permeant ions redistribute in response to this potential. This Donnan potential complicates measurements performed by sedimentation equilibrium and light scattering as well as osmotic pressure. In the absence of added electrolytes, gradients or discontinuities of electrical potential would form during all three of these procedures as the concentration of the protein varies within the chambers of the apparatuses. These gradients or discontinuities of electrical potential, because they arise from the Donnan effect, can be eliminated in the case of most proteins by adding a simple electrolyte such as KCl to the solution to a concentration of around 0.1 M. One way to understand the effect of adding salt is to imagine that the increase in ionic strength decreases significantly the thickness of the ionic double layer (Equations 1–71 and 1–77) so that it encloses the molecule of protein tightly, effectively neutralizes its charge, and turns it into an apparently neutral macromolecule. The added electrolyte also eliminates any local gradients of electrical potential caused either by separating two phases by a semipermeable membrane,¹ as in measurements of osmotic pressure, or by differential gravitational forces exerted on ions of unlike mass, as in sedimentation equilibrium.¹⁰ In addition to an electrolyte, a buffer may also be added to the solution to maintain the pH.

When a solution of protein is submitted to sedimentation equilibrium, it is placed in a chamber within the strong centrifugal field created by a spinning rotor, and its distribution through the chamber is allowed to reach equilibrium. Because the centrifugal force in this chamber in the rotor is a function only of the radial distance $r$ from the center of rotation, the distribution of the protein at equilibrium is a function only of $r$. At equilibrium, the centrifugal force upon the protein at each point in the solution is equal but opposite in sign to the force of diffusion that arises from the gradient of its concentration. If negligible gradients of electrical potential form because sufficient electrolyte has been added to the solution, the protein will redistribute until the differential of the chemical potential it experiences at each position in the chamber balances the differential of the centrifugal potential that it experiences. For a protein, the equality produced by this balance can be expressed as

$$\left(\frac{\mathrm{d}\Pi}{\mathrm{d}C_{\text{prot}}}\right)\left(\frac{\mathrm{d}C_{\text{prot}}}{\mathrm{d}r}\right) = \omega^{2}\, r\, C_{\text{prot}}\left(\frac{\partial \rho}{\partial C_{\text{prot}}}\right)_{T,P,\mu} \tag{8–16}$$

from which the exact relationship²,¹⁰,¹¹ follows by rearrangement

$$\frac{\mathrm{d}\ln C_{\text{prot}}}{\mathrm{d}r^{2}} = \left(\frac{\omega^{2}}{2}\right)\left(\frac{\partial \rho}{\partial C_{\text{prot}}}\right)_{T,P,\mu}\left(\frac{\mathrm{d}\Pi}{\mathrm{d}C_{\text{prot}}}\right)^{-1} \tag{8–17}$$

where $C_{\text{prot}}$ is the concentration of protein expressed in any units, $r$ is the distance (centimeters) of any point in the chamber from the center of the rotor, $\omega$ is the angular velocity (radians second⁻¹) of the rotor, $\rho$ is the density (grams centimeter⁻³) of the solution of protein, and $(\partial\rho/\partial C_{\text{prot}})_{T,P,\mu}$ is the change in density of the solution as a function only of the change in the concentration of protein. Because at sedimentation equilibrium the chemical potential through the entire solution must be the same, this change in the density of the solution is at constant chemical potential of solvent and all solutes other than the protein, as indicated by the subscript $\mu$.

Because the derivative on the left in Equation 8–17 is that of the natural logarithm of the concentration of protein, it will have the same numerical value regardless of the units chosen to express the concentration of the protein, and the units of concentration cancel on the right. Because

$$\frac{\mathrm{d}\Pi}{\mathrm{d}C_{\text{prot}}} = RT\left(\frac{1}{W} + 2B\,C_{\text{prot}} + 3C\,C_{\text{prot}}^{2} \ldots\right) \tag{8–18}$$

and

$$\lim_{C_{\text{prot}} \to 0}\left(\frac{1}{W} + 2B\,C_{\text{prot}} + 3C\,C_{\text{prot}}^{2} \ldots\right) = \frac{1}{W} \tag{8–19}$$

Equation 8–17 can be combined with Equations 8–18 and 8–19 and rearranged to give²,¹⁰

$$\lim_{C_{\text{prot}} \to 0} \frac{\mathrm{d}\ln C_{\text{prot}}}{\mathrm{d}r^{2}} = \frac{\omega^{2}}{2RT}\, W \left(\frac{\partial \rho}{\partial C_{\text{prot}}}\right)_{T,P,\mu} \tag{8–20}$$

As predicted by Equation 8–20, when a monodisperse solution of a particular protein is submitted to centrifugation and the distribution of that protein is allowed to reach equilibrium, the gradient of concentration that forms is such that a plot of ln $C_{\text{prot}}$ against $r^{2}$ is a straight line (Figure 8–2).¹² From the slope of this plot and a value for $(\partial\rho/\partial C_{\text{prot}})_{T,P,\mu}$, the value of $W$ can be calculated.

The independent measurement of the partial derivative $(\partial\rho/\partial C_{\text{prot}})_{T,P,\mu}$ in Equation 8–20 requires the tabula-
… guanidinium chloride.¹⁶

The units on the term $(\partial\rho/\partial C_{\text{prot}})_{T,P,\mu}$, as is the case …

$$\left(\frac{\partial \rho}{\partial \gamma_{\text{prot}}}\right)_{T,P,\mu} \cong 1 - \bar{v}_{\text{prot}}\,\rho_{\text{sol}} \tag{8–21}$$

is made, where $\gamma_{\text{prot}}$ is the concentration of protein in units of grams centimeter⁻³, $\bar{v}_{\text{prot}}$ is the partial specific volume (centimeters³ gram⁻¹) of the protein in the particular solution chosen, and $\rho_{\text{sol}}$ is the density (grams centimeter⁻³) of the solution in the absence of the protein. Unfortunately, because the decision to use this approximation was made for other reasons, many investigators are unaware that their use of the right-hand term in Equation 8–21 is only an approximation. This lack of …

[Figure: axes labeled γprot (mg cm⁻³) and log γprot.]

… from the slope of the line by less than 0.1% of its value. Consequently, it is not surprising that the molar mass calculated¹² from Figure 8–2, 64,500 g mol⁻¹, agrees quite closely with the actual value of the molar mass of bovine serum albumin, 66,430 g mol⁻¹, a value that was subsequently established by its amino acid sequence.¹⁹ Uncoiled polypeptides or highly charged macromolecules such as DNA, however, have much larger virial coefficients, and sedimentation equilibrium of such species can be significantly affected by those virial coefficients.

At the present time, instruments that register the concentration of protein by its absorbance at 280 nm are
Molar Mass 413
D Abs
in Figure 8–2 demonstrates, because the concentration
of protein is so low, the virial coefficients can usually be
ignored, but it is nevertheless prudent to perform meas- –0.02
urements at several different concentrations of protein
or several different angular velocities of the rotor or both 0.6
to validate their insignificance.20,21
If the virial coefficients can be disregarded so that
the limit can be assumed, Equation 8–20 can be inte-
grated, and because the concentration of protein (Cprot) is
Absorbance at 280 nm
directly proportional to absorbance at 280 nm (A280)
0.4
( )
2
w M prot !r
A 280 = A 0,280 exp
2RT !C prot
(r 2 – r02 )
T, P, m
(8–23)
0.2
where A0,280 is the absorbance the solution has at a refer-
ence position within the cell at which, by definition,
r = r0. This equation is then fit by nonlinear least squares
to the distribution of absorbance as a function of radius
(Figure 8–3).15,22,23 From the fit, a numerical value for the
quantity w2Mprot(!r/!Cprot)/2RT is obtained. While w2, R, 0.0
and T are known precisely, Mprot and (!r/!Cprot)T,P,m, or its 6.7 6.9 7.1
surrogate 7prot (Equation 8–21) may or may not be. Radius (cm)
Usually, independent measurements of either (∂ρ/∂Cprot)T,P,μ or v̄prot are made because the reason for the experiment is to determine Mprot. In the case of the p51 subunit of RNA-directed DNA polymerase from human immunodeficiency virus 1 (Figure 8–3), however, which is a monomer containing a single subunit, its molar mass Mprot (49,660 g mol⁻¹) was already known precisely from its amino acid sequence, and the purpose of the measurement of sedimentation equilibrium was to estimate (∂ρ/∂γprot)T,P,μ (0.225), a quantity that was needed for the later experiments reported. It is also possible to use the sedimentation equilibrium of a protein of known molar mass at different concentrations of another solute to determine a value for the preferential hydration of the protein, (∂mH2O/∂mprot)T,P,μ, in the presence of that solute.24

Figure 8–3: Sedimentation equilibrium of a recombinant form of the p51 subunit (M = 49,660 g mol⁻¹, naa = 426) of RNA-directed DNA polymerase from human immunodeficiency virus 1.15 The optical cell in the rotor was loaded with a solution of the protein with an initial absorbance at 280 nm of 0.29. Centrifugation was performed for 74 h at 12,000 rpm to reach equilibrium. The distribution of absorbance at 280 nm over the cell is plotted in the lower panel as a function of radius (centimeters) from the center of rotation. The line drawn through the points is a nonlinear least-squares fit of Equation 8–23 to the data, using as the reference position r0 the point of highest measured absorbance, A0,280, at the bottom of the optical cell (to the right of the graph). The upper panel is the deviation of the experimental absorbance (ΔAbs) for each point from the value of the fit at that point. The deviations are distributed at random around a value of 0.

From the molar mass of a subunit of a protein, which has been determined precisely by sequencing its constituent polypeptide or polypeptides, and the molar mass of the intact protein, which has been estimated by sedimentation equilibrium, the number of subunits in an intact protein can be assessed. For example, by sedimentation equilibrium, the molar mass of the UvsY recombination protein from bacteriophage T4 has been estimated to be 98.6 ± 3.9 kg mol⁻¹ 25 and the molar mass of each of its identical constituent polypeptides is 15.84 kg mol⁻¹. The molar mass of carbon-monoxide dehydrogenase from Moorella thermoacetica was estimated to be 300 ± 30 kg mol⁻¹ by sedimentation equilibrium;26 the molar masses of its constituent polypeptides, α and β, are 81.73 and 72.92 kg mol⁻¹ respectively; and the measured mass ratio of the two polypeptides in the intact protein is 1.18 g g⁻¹.

There is some confusion between the use in the present instance of a centrifugal field to create a gradient of the molar concentration of a protein and its use to measure the sedimentation coefficient, a hydrodynamic property of an individual molecule of the protein. Because the centrifugal potential at each point in the chamber can be calculated directly, at equilibrium the chemical potential of the solute, and hence its molar concentration at each point, can also be calculated. In contrast to this measurement at equilibrium, a measurement of a hydrodynamic property of a molecule of protein is, as the name implies, a kinetic measurement. In such a kinetic measurement, the rate of movement of the
molecule of protein under an applied force is measured. Free electrophoretic mobility is an example of such a hydrodynamic property. The confusion arises because, in addition to its use in sedimentation equilibrium, centrifugal force can be used to move a molecule of protein through a solution. This use of centrifugal force is unrelated to the use of centrifugal force to create a gradient of concentration. The confusion also arises because the same instrument, an analytical ultracentrifuge, is used to make each of the measurements, even though they are unrelated to each other.

The distribution of concentration as a function of radius at sedimentation equilibrium is often used to obtain a value for the dissociation constant between two oligomeric states of a protein that are in equilibrium with each other. For example, the protein could be an equilibrium mixture of a monomer and a dimer

$$\alpha_{2} \rightleftharpoons 2\alpha \tag{8–24}$$

for which there is a dissociation constant

$$K_{\mathrm{d},\alpha_{2}} = \frac{[\alpha]^{2}}{[\alpha_{2}]} \tag{8–25}$$

To estimate the numerical value of this dissociation constant, the mass-average molar masses can be calculated at selected radii from the respective slopes of the distribution of concentration at those radii at sedimentation equilibrium (Equation 8–20), and a plot of molar mass against concentration of protein can be fit by an equation incorporating the equilibrium between monomer and dimer.22

Usually, however, a value for the dissociation constant is extracted directly from the distribution of the concentration of protein as a function of radius at sedimentation equilibrium. The purpose of creating the centrifugal field is to form predictably a continuous gradient in the molar concentration of the protein. If the protein is involved in an equilibrium between two forms that have different stoichiometries of subunits, the ratio between the molar concentrations of the two forms at a particular position in the sample will be a function of the total concentration of protein, [protein]TOT, at that position. For example, in the case of an equilibrium between monomer and dimer

$$\frac{[\alpha_{2}]}{[\alpha]} = \frac{1}{2}\left(\frac{2[\mathrm{protein}]_{\mathrm{TOT}}}{K_{\mathrm{d},\alpha_{2}}} + \frac{1}{4}\right)^{1/2} - \frac{1}{4} \tag{8–26}$$

where [protein]TOT is taken to be the total molar concentration of subunits, [α] + 2[α2]. Because the molar concentration of particles of protein at any position in the sample is [α] + [α2], the equilibrium between monomer and dimer affects the chemical potential of the protein and hence the balance between chemical potential and centrifugal potential. The result of this effect is that the distribution of the concentration of protein as a function of radius is perturbed by the equilibrium between the two forms of the protein.

The distribution of the concentration of protein in the chamber as a function of the radial distance from the center of rotation can be fit by numerical analysis with an equation incorporating both the centrifugal potential and the perturbation caused by the equilibrium between the two forms of the protein15 to obtain simultaneously estimates of both the molar masses of the two forms and the numerical value of the equilibrium constant. For example, the distribution of the concentration of the p66 subunit of RNA-directed DNA polymerase from human immunodeficiency virus 1 at sedimentation equilibrium is consistent with the existence in the solution of an uncomplicated equilibrium between a monomer and a dimer of the subunit with a dissociation constant of 2 × 10⁻⁵ M,15 that of the subunit of chaperonin GroES is consistent with an uncomplicated equilibrium between monomer and heptamer of the subunit with a dissociation constant of 1 × 10⁻³⁸ M⁶,27 and that of the subunit of CTP synthase from E. coli is consistent with an equilibrium among monomer, dimer, and tetramer of the subunit.28 The dissociation constant (3 μM) for the equilibrium between monomers and dimers of DNA helicase II measured by sedimentation equilibrium29 agreed with that (1.4 μM) estimated by counting monomers and dimers directly in an atomic force microscope.30

The perturbations of the distributions of concentration at sedimentation equilibrium caused by such equilibria are slight, so a detailed analysis of the deviations of the data from the curve that has been fit to them must be made to ensure that those deviations are at random (Figure 8–3) rather than systematic. Better yet, the measurements should be performed at several different concentrations of protein or several different rotor speeds or both to demonstrate that the same value of the measured dissociation constant is consistent with each of the distributions of protein observed under these different conditions.31,32

Sedimentation equilibrium can also be used to rule out the existence of an equilibrium among oligomers and demonstrate that there is only one form of the protein present in the solution33 or that there are two forms of the protein present with different stoichiometries of subunits that are not in equilibrium with each other and that are distributing independently of each other.34 Again, in order to bolster the conclusion that no equilibration among oligomers is occurring, it should be demonstrated that the calculated molar mass or masses are affected neither by changing the concentration of the protein added to the cell nor by changing the speed of the rotor.20,21

Light scattering is a property of any fluid. It arises because a fluid is a collection of molecules undergoing random movements rather than a rigid solid or a uniform continuum of electrons. Scattered light emerges from a
fluid at all angles to the incident direction of a beam of light passing through the fluid. The source of this scattered light is the electrons in the fluid that oscillate in response to the alternating electric field of the light and in turn emit light. The magnitude of the susceptibility of electrons in a molecule to this phenomenon arises from their respective polarizabilities, which are reflected in the refractive index of the molecule. The scattered light is emitted in directions other than that of the incident light.

The emission of scattered light from a solution arises from regional fluctuations in polarizability on a scale smaller than the wavelength of the light. If a fluid were a uniform, unfluctuating distribution of electrons, the scattered light from its constituent electrons would always be canceled by interference and hence no emission would result. The fluctuations producing the net scattering are related to local fluctuations in the concentrations of the components of the solution and hence to the chemical potential of those components. The majority of the electrons in a solution of protein are on molecules of water. Fluctuations in the local concentrations of water are the major contributors to the light scattered from the solution in the absence of the protein. When protein is present, the scattering arising from the molecules of protein in the solution is in addition to this background scattering.

The scattering of a beam of collimated, unpolarized light is measured by placing a detector at an angle θ to the beam of unscattered light passing through the sample (Figure 8–4) and at a distance r (centimeters) from the sample. The incremental scattering, iθ, is the difference in intensity between light scattered by a unit volume of the solution of protein [photons second⁻¹ (centimeter³ of solution)⁻¹] and that scattered by an identical solution (also in photons second⁻¹ centimeter⁻³) not containing protein, and it is reported relative to the intensity of the incident light, I0 (photons second⁻¹), of which it is a very small fraction. It can be shown1,35,36 that

$$\lim_{\theta \to 0} \frac{r^{2} i_{\theta}}{I_{0}\,(1 + \cos^{2}\theta)} = RTK \left(\frac{\partial \tilde{n}}{\partial C_{\mathrm{prot}}}\right)^{2}_{T,P,\mu} C_{\mathrm{prot}} \left(\frac{d\Pi}{dC_{\mathrm{prot}}}\right)^{-1} \tag{8–27}$$

where ñ is the refractive index (dimensionless) of the solution of protein and the optical constant K (moles centimeter⁻⁴) is defined by

$$K \equiv \frac{2\pi^{2}\tilde{n}_{0}^{2}}{\lambda_{0}^{4} N_{\mathrm{A}}} \tag{8–28}$$

where ñ0 is the refractive index of the solution in the absence of the protein, λ0 is the wavelength (centimeters) of the light in a vacuum, and NA is Avogadro’s number (6.022 × 10²³ mol⁻¹). It should be noted that the units on Cprot, the concentration of protein, cancel in Equation 8–27. This cancellation again illustrates that the concentration of protein can be expressed in any units. Equation 8–27 also illustrates that light scattering is a measurement of the partial derivative of the chemical potential and hence the osmotic pressure of the solution of protein. The limit in Equation 8–27 is taken to eliminate any optical interference that might arise if the dimensions of the protein are close to the magnitude of the wavelength of the light.

Figure 8–4: Angular dependence of scattered light. The angular dependence of scattered light is related to a coordinate system in which the x axis is along the beam of incident light, the z axis is parallel to the electric vector of the light from a polarized source, and the origin is the center of the sample. The angle θ is the angle ABC where A is a point on the x axis beyond the sample, AB is along the x axis, B is the origin, and C is the position of the detector. The angle φ is the angle DBC where D is any point on the z axis, B is the origin, and C is the position of the detector. The distance r is the distance from the origin to the detector.

The partial derivative (∂ñ/∂Cprot)T,P,μ is the change in the refractive index of the solution as only the concentration of protein is increased, at constant chemical potential of the other solutes such as electrolytes and buffers. Each of the solutions of protein used to make the determination of (∂ñ/∂Cprot)T,P,μ, as well as the solution used in the determination of the light scattering itself, should be equilibrated by dialysis at the appropriate osmotic pressure against a solution identical except for the protein to obtain a constant chemical potential of the other solutes throughout. A procedure for measuring the required refractive indices at constant chemical potentials of diffusible components has been devised.2,37

At the present time, the light source usually used for measurements of light scattering is a laser; and, if it is not already polarized, the light is passed through a polarizer. The intensity of the scattered light from a source of polarized light has a different angular dependence than that from a source of unpolarized light. The oscillating electric vector of the polarized light defines the z axis of a coordinate system with the sample at the origin (Figure 8–4). As defined, the x,y plane is normal to the oscillating electric vector. If φ is the angle between the z axis and the ray of scattered light entering the detector, then
$$\lim_{\theta \to 0} \frac{r^{2} i_{\theta}}{I_{0}\,(2\sin^{2}\phi)} = RTK \left(\frac{\partial \tilde{n}}{\partial C_{\mathrm{prot}}}\right)^{2}_{T,P,\mu} C_{\mathrm{prot}} \left(\frac{d\Pi}{dC_{\mathrm{prot}}}\right)^{-1} \tag{8–29}$$

where θ is still the angle between the beam of unscattered light emerging from the sample and the ray of scattered light entering the detector. If the detector is confined to the x,y plane (Figure 8–4), the angle φ is always 90° and sin² φ = 1. In this configuration, to perform the necessary extrapolation to θ = 0, the angle θ can be varied over all values without changing the angle φ. When unpolarized light is used, cos² θ changes continuously as the limit of θ → 0 is taken, and this complicates the extrapolation.

It is convenient to define a quantity, Rθ, known as Rayleigh’s ratio (centimeters⁻¹), to eliminate the dimensions of the apparatus from the calculation. For unpolarized light

$$R_{\theta} \equiv \frac{r^{2} i_{\theta}}{I_{0}\,(1 + \cos^{2}\theta)} \tag{8–30}$$

and for polarized light when φ = 90°

$$R_{\theta} \equiv \frac{r^{2} i_{\theta}}{2 I_{0}} \tag{8–31}$$

When Equation 8–27 or 8–29 is combined with Equation 8–30 or 8–31, respectively, as well as Equations 8–18 and 8–19

$$\lim_{\substack{\theta \to 0 \\ C_{\mathrm{prot}} \to 0}} R_{\theta} = K \left(\frac{\partial \tilde{n}}{\partial C_{\mathrm{prot}}}\right)^{2}_{T,P,\mu} C_{\mathrm{prot}}\, W \tag{8–32}$$

and

$$\lim_{\theta \to 0} \frac{C_{\mathrm{prot}}}{R_{\theta}} = \frac{1}{K} \left(\frac{\partial \tilde{n}}{\partial C_{\mathrm{prot}}}\right)^{-2}_{T,P,\mu} \left(\frac{1}{W} + 2B\,C_{\mathrm{prot}} + 3C\,C_{\mathrm{prot}}^{2} + \cdots\right) \tag{8–33}$$

In Figure 8–5, two experiments with bovine serum albumin under different conditions, with and without added electrolyte, are presented. As with the measurements shown in Figure 8–1, it can be seen that the virial coefficient, B, changes appreciably with changes in conditions; in this case it even inverts in sign because the protein is participating in a concentration-dependent oligomerization in the absence of electrolyte. Bovine serum albumin readily self-associates to form adventitious dimers, trimers, and higher oligomers.40 The intercepts, however, again remain the same. It has been shown that under the same conditions with the same protein, the same values of the second virial coefficient are obtained by measurements of either osmotic pressure or light scattering.41 This result demonstrates that the virial coefficient is a property of the solution itself rather than the method of measurement, and it provides further evidence that these techniques are both measuring the same property of the solution. From the extrapolations in Figure 8–5, the molar mass of bovine serum albumin was estimated to be 70,200 g mol⁻¹, which is within 6% of the actual value of 66,430 g mol⁻¹.

Figure 8–5 [ordinate: γprot (R90,u)⁻¹ (g cm⁻²)]: … as a function of the concentration of protein (γprot). Measurements were made in 0.15 M sodium chloride (upper line) or in water (lower line) of solutions prepared by diluting an isoionic solution of albumin into the appropriate solutions. Adapted with permission from ref 39. Copyright 1954 American Chemical Society.

From an examination of Equation 8–32, it is clear again that the units chosen for Cprot, the concentration of protein (units centimeter⁻³), determine the units (units mole⁻¹) of the parameter W. If the concentration of protein is known in grams centimeter⁻³, the units on W will be grams mole⁻¹, or molar mass. The determination of the molar mass, however, will only be as accurate as the measurement of the concentration of protein.

Electrospray mass spectrometry42 is widely used to
determine the molecular masses of proteins. Individual vaporized molecules of protein, each molecule with its own particular charge, are submitted to mass spectrometry (Figure 8–6).43 From the envelope of individual peaks, precise estimates of the mass of the molecule of protein can be calculated. For example, it was possible to show that the molecular mass of the blue copper protein rusticyanin from Thiobacillus ferrooxidans was 16,552 Da, which is within 1 Da of the mass calculated from its amino acid sequence.44

[Figure 8–6: electrospray mass spectrum; ordinate, relative ion abundance.]

Major applications of mass spectrometry are the analysis of posttranslational modification, the verification of the integrity of a preparation of protein, and the assessment of the number of subunits in an oligomer. For example, in the case of rusticyanin the conclusion drawn from the mass spectrometric analysis was not that the molecular mass of the protein was 16,552 Da, which was already known, but that the protein lacked posttranslational modifications. In the case of the L1 metallo-β-lactamase of Stenotrophomonas maltophilia, mass spectrometric analysis demonstrated that the normal posttranslational removal of the 21 amino-terminal amino acids had occurred,45 and in the case of subunit V of ubiquinol–cytochrome-c reductase, mass spectrometric analysis demonstrated that the iron–sulfur cluster was present.46 Electrospray mass spectrometry was used to verify that each member of a set of site-directed mutants of dethiobiotin synthase from E. coli contained the intended amino acid replacement.47 Mass spectrometry can also discover errors in a sequence. For example, in the case of subunit I of bovine ubiquinol–cytochrome-c reductase, electrospray mass spectrometry indicated correctly that the published amino acid sequence of the protein was missing 27% of its amino acids.46 Intact molecules of a protein containing several subunits (Figure 8–6) can be vaporized to obtain the molecular mass of the oligomer,48 and membrane-bound proteins can be vaporized after the phospholipid in which they are normally dissolved has been removed from them.49

Now that the sequences of so many proteins are known, as well as the stoichiometries of their subunits, precise values of molar mass can be calculated for a large array of proteins from their atomic compositions. These can be compared with values determined by osmotic pressure, sedimentation equilibrium, and light scattering before the actual molar masses were known (Table 8–1). By and large, the agreement between the actual values of molar mass and the measured values is quite close, and this in itself validates the methods.

The problem with molar mass, no matter how accurately it can be determined, is that it means very little to most people. Once it was clear that proteins were polymers of amino acids, sometimes posttranslationally modified, the reason behind all determinations of molar mass has been to estimate the number of amino acids
Table 8–1: Comparison of the Actual Molar Masses of Selected Proteins and the Molar Masses Determined by Light Scattering, Osmotic Pressure, and Sedimentation Equilibrium
a The actual molar mass of each protein was calculated from the amino acid sequences of its constituent polypeptides and their stoichiometry in the complex.

Table 8–2: Tabulation of the Mean Grams (Mole of Amino Acid)⁻¹ in a Set of Proteins
a Proteins composed of one or more polypeptides, the sequences of which were available in the Swissprot data bank, were chosen as examples. An attempt was made to include examples of all types of proteins, but extremely unusual proteins such as collagen were avoided. The constituent polypeptides the compositions of which were used are indicated.
b The lengths of the polypeptides chosen for analysis are presented in numbers of amino acids.
c Calculated by dividing the molar mass of the protein portion of the polypeptide or polypeptides by the length or combined lengths, respectively.
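The two calculations described in the footnotes to Tables 8–1 and 8–2 can be sketched in a few lines of Python. The numbers are those quoted earlier in this section for carbon-monoxide dehydrogenase and for the p51 subunit; the function names are hypothetical, and the α2β2 stoichiometry assigned to the dehydrogenase is an assumption made here because it is the simplest one compatible with the measured molar mass of 300 ± 30 kg mol⁻¹.

```python
def complex_molar_mass(subunit_masses, stoichiometry):
    """Molar mass of a complex from subunit molar masses and their counts."""
    return sum(subunit_masses[name] * count
               for name, count in stoichiometry.items())


def mean_residue_mass(molar_mass, lengths):
    """Mean grams per mole of amino acid: molar mass over combined length."""
    return molar_mass / sum(lengths)


# Carbon-monoxide dehydrogenase: subunit molar masses in kg mol^-1 (from text).
codh_subunits = {"alpha": 81.73, "beta": 72.92}
# Assumed stoichiometry alpha2beta2 (an illustration, not stated in the text).
codh_molar_mass = complex_molar_mass(codh_subunits, {"alpha": 2, "beta": 2})

# p51 subunit: 49,660 g mol^-1 and 426 amino acids (from the legend to Figure 8-3).
p51_mean_residue_mass = mean_residue_mass(49_660.0, [426])

print(f"alpha2beta2 complex: {codh_molar_mass:.2f} kg mol^-1")
print(f"p51 mean residue mass: {p51_mean_residue_mass:.1f} g mol^-1")
```

The calculated value for the assumed α2β2 complex, 309.3 kg mol⁻¹, lies within the stated uncertainty of the measurement by sedimentation equilibrium, and the mean for p51 is about 116.6 g (mol of amino acid)⁻¹.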
ments has always relied ultimately upon the ability of the investigator to make an accurate measurement of dry weight. All values of molar mass can be traced back to such a determination. The difficulties involved in measurements of dry weight have been noted,18 and more than anything else, the accuracy of the values in Table 8–1 is a testimony to the careful measurements of this quantity. Accurate dry weight measurement, however, requires more protein than is usually available, and other measures of protein concentration have unfortunately but necessarily supplanted it. Ironically, when the amount of protein is in short supply, the most accurate method for assessing its concentration is quantitative amino acid analysis, which is a measure of moles of amino acids centimeter⁻³, and the use of A280 to follow protein in sedimentation equilibrium actually is a measure of the concentration of tyrosine and tryptophan in the solution.

One of the most peculiar manifestations of the abiding infatuation with molecular mass is the habit of naming proteins on the basis of estimates of the molecular mass performed by electrophoresis of their complexes with dodecyl sulfate. For example, protein p27 from simian retrovirus SRV-1 has an actual molecular mass of 24.73 kDa,73 and protein p56 from murine lymphoma has an actual molecular mass of 57.82 kDa.74 It is unclear what will be done when two-digit numbers run out.

Part of the description of a particular protein is an enumeration of the length of each of the polypeptides from which it is composed and the number of each subunit that it contains. At one time this information could be most conveniently learned by ascertaining both the molar mass of the entire protein and the molar mass of the isolated individual polypeptides. The history of this quest is interesting but beyond the scope of the present discussion. In two celebrated instances, that of aspartate carbamoyltransferase and that of fructose-bisphosphate aldolase,75,76 disagreements arose over the results from such measurements. These particular disagreements coincided with the development of the two techniques that have supplanted almost entirely the earlier methods of determining molar mass that were just described. These newer procedures are both based on the electrophoresis of complexes between polypeptides and dodecyl sulfate upon gels of polyacrylamide. In one procedure, sieving is used to display the different types of polypeptides in the protein and provide estimates of the length of each. In the other procedure, patterns of covalently cross-linked polypeptides separated by electrophoresis are used to count the number of each polypeptide present in the whole protein.

Becerra, S.P., Kumar, A., Lewis, M.S., Widen, S.G., Abbotts, J., Karawya, E.M., Hughes, S.H., Shiloach, J., Wilson, S.H., & Lewis, M.S. (1991) Protein–protein interactions of HIV-1 reverse transcriptase: implication of central and C-terminal regions in subunit binding, Biochemistry 30, 11707–11719.

Problem 8–1: Calculate the molar mass of bovine ribonuclease from its sequence.

Problem 8–2: Calculate the molar masses of these proteins that have the following incremental osmotic pressures at 25 °C.

protein                      lim (γprot → 0) Π/γprot
L-lactate dehydrogenase53    173 dyne cm⁻² (g of protein)⁻¹ L
β-lactoglobulin77            0.722 cm of H2O (g of protein)⁻¹*
bovine serum albumin8        3.57 N cm (g of protein)⁻¹

* These are the centimeters that the level of the solution containing the protein rose above the level of the solution lacking the protein as a result of the expansion of the former at the expense of the latter. This additional layer of solution exerts a pressure because of the force of gravity. The units of centimeters are converted into dynes centimeter⁻² by multiplying by the density of the solution lacking the protein (grams centimeter⁻³) and the gravitational acceleration (980.6 cm s⁻² at sea level, 45° latitude) felt by the excess fluid on top of the solution of protein. The density of the solution has already been converted into the density of water.

Problem 8–3: The isoelectric pH of bovine serum albumin in a solution of 0.15 M NaCl is 5.37.78 From the data in Figure 8–1, calculate the mean net charge number Z̄SA on the serum albumin at pH 7.0 by assuming that the difference in slope between the two lines is due entirely to the Donnan effect.

Problem 8–4: The second virial coefficient, B, for the osmotic pressure of lysozyme from Gallus gallus in solutions of (NH4)2SO4 varies as a function of pH and ionic strength, Ic:3

pH    Ic (M)    second virial coefficient (cm³ μmol g⁻²)
4     1         –140 ± 23
7     1         –198 ± 15
8     1         –307 ± 21
4     3         –396 ± 19
7     3         –423 ± 34
8     3         –446 ± 26
(C) What limit do these measurements place on the isoelectric point of lysozyme?
(D) What limit do these measurements place on the second virial coefficient at the isoelectric pH for lysozyme?
(E) Why does the second virial coefficient have

Problem 8–7: Calculate the molar mass of aspartate kinase I–homoserine dehydrogenase I from the data in this figure (adapted with permission from ref 80; copyright 1968 Springer-Verlag).
The refractive index ñ0 of the solution without the protein was 1.333, the increment of the refractive index with concentration [(∂ñ/∂γprot)T,P,μ] for the protein at λ0 = 436 nm was 0.1883 cm³ g⁻¹, and the temperature was 25 °C.

(A) Determine graphically the limit

$$\lim_{\gamma_{\mathrm{prot}} \to 0} \frac{\gamma_{\mathrm{prot}}}{R_{90}}$$

(B) Assume that the Rayleigh ratio, Rθ, does not vary significantly with variation in θ for this small protein at this long wavelength and that

$$\lim_{\substack{\theta \to 0 \\ \gamma_{\mathrm{prot}} \to 0}} \frac{\gamma_{\mathrm{prot}}}{R_{\theta}} = \lim_{\gamma_{\mathrm{prot}} \to 0} \frac{\gamma_{\mathrm{prot}}}{R_{90}}$$

and use the value for this limit to estimate the molar mass of ovalbumin.

Problem 8–9: Calculate the mean molar mass of an amino acid from the amino acid composition of the set of proteins in Table 7–4.

Electrophoresis on Gels of Polyacrylamide Cast in Solutions of Dodecyl Sulfate

The sodium salt of dodecyl sulfate (H₂₅C₁₂OSO₃⁻) is a detergent widely used commercially to dissolve nonpolar substances in water. It accomplishes this purpose by forming micelles. A micelle of dodecyl sulfate at moderate ionic strength (0.2 M) contains about 100 of the anions in an oblate ellipsoid that is 3 nm across at its minor axis.82,83 All of the anionic sulfates are at the surface of the ellipsoid and the hydrocarbon is in the center. It is the hydrocarbon core of the micelle that dissolves individual molecules of a nonpolar substance, producing the detergent properties. Although dodecyl sulfate must be present at concentrations high enough to form micelles in order to interact with proteins, the complexes that result between the anions of dodecyl sulfate and polypeptides do not seem to involve discrete micelles.

When sodium dodecyl sulfate84 is added to a solution of protein at a concentration greater than its critical micelle concentration* and at a ratio greater than 2 g of dodecyl sulfate (g of protein)⁻¹, all of the polypeptides present in the solution are unfolded and separate from each other as they become coated with the dodecyl sulfate. The amount of dodecyl sulfate coating the unfolded, separated polypeptides at saturation is usually a function only of their total length. Pitt-Rivers and Impiombato85 observed that, within a series of globular, water-soluble proteins, each of the constituent polypeptides would bind 0.54 ± 0.01 molecule of dodecyl sulfate for every amino acid in its sequence. The important point, however, is not the numerical value of this ratio but the fact that it is constant (less than 2% variation) regardless of the protein examined, as long as it is of the usual water-soluble, globular variety and does not have significant segments of its sequence enriched in acidic amino acids and lacking basic amino acids.86 This regularity in the binding of dodecyl sulfate, however, is observed only when all of the cystines in the proteins, if there were any, have been cleaved.85 Usually this is done by disulfide interchange with a small thiol (Figure 3–20). The constant ratio between bound dodecyl sulfate and the number of amino acids presumably results from the fact that proteins displaying this behavior all have similar compositions of amino acids. Proteins with peculiar compositions,87 an excess of charged side chains,86 or an excess of hydrophobic side chains88 behave anomalously.

* The critical micelle concentration is the minimum concentration at which the detergent forms micelles.

The complexes that form between dodecyl sulfate and polypeptides are extended structures and have been variously described as cylindrical rods the length of which is directly proportional to the length of the polypeptide89 or micellar pearls of dodecyl sulfate on a string of the flexible polypeptide.90 No definitive description of their structure is available, but there is no evidence that the dodecyl sulfate in these complexes is present in discrete packets of 100 molecules of detergent, as would be expected if the micelles present in the absence of the protein were simply incorporated intact into a long string upon the unfolded polypeptide.

As with nucleic acids and presumably for the same reasons, the complexes between dodecyl sulfate and those polypeptides that bind a constant ratio of this strongly anionic detergent all display the same free electrophoretic mobility, (–2.62 ± 0.04) × 10⁻⁴ cm² s⁻¹ V⁻¹, regardless of the length of the polypeptide.90 In the case of nucleic acids, the invariance of the free electrophoretic mobility with length results from the uniform distribution of negative charge along the regular polymer, and presumably this is also a necessary condition met by the complexes between dodecyl sulfate and polypeptides. With nucleic acids, however, this is a covalently conferred, intrinsic property of the phosphodiesters of the backbone rather than the fortuitous and less reliable inclination of the polymer to bind a charge-conferring species uniformly along its length. As such, any polypeptide that binds dodecyl sulfate abnormally should have a different free electrophoretic mobility. When the amount of dodecyl sulfate bound to a series of polypeptides was purposely decreased, the magnitudes of their free electrophoretic mobilities also decreased.90

As is the case with nucleic acids,91 native proteins (Figure 1–17), and other macromolecules submitted to electrophoresis on gels of polyacrylamide or other polymeric supports, the electrophoretic mobilities of complexes between dodecyl sulfate and polypeptides (Figure 8–7)92 follow the relationship

$$u_{i} = u^{\circ}_{i} \exp\left(-K_{\mathrm{r},i}\, T_{\mathrm{a}}\right) \tag{8–34}$$

where Ta is the concentration of acrylamide (in percent) from which the gel was cast and the retardation coefficient, Kr,i, is a constant unique to the particular polypeptide i. Because u°i is the free electrophoretic mobility of the complex between dodecyl sulfate and polypeptide i and u° is the same for all complexes between dodecyl sulfate and well-behaved polypeptides, this relationship predicts that the lines in Figure 8–7 should intersect at the axis of the ordinate when Ta is equal to 0, which is almost the case. Because each complex has a unique retardation coefficient, electrophoresis on gels of polyacrylamide can be used to separate these complexes one from the other (Figure 8–8).93

Systems for stacking complexes between dodecyl sulfate and polypeptides have been developed to improve the resolution of the separation. One system releases the complexes from the descending boundary in which they are stacked by using an ascending boundary that increases the concentration of the neutral conjugate base of the cationic acid that is common to all of the solutions.94–96 The other releases them by using an ascending boundary that delivers the neutral conjugate base of a different cationic acid of much higher pKa to jump the pH after the complexes have stacked.97 The latter system is more effective at releasing the smaller polypeptides from the descending boundary than is the former. The former relies heavily on the increase in the concentration of polyacrylamide at the top of the running gel to accomplish the release and fails to do so when the concentration of polyacrylamide in the running gel is decreased below a certain level.

As with nucleic acids, the electrophoretic mobilities of complexes between dodecyl sulfate and polypeptides on gels of polyacrylamide are a regular function of the
[Figure 8–8 image: stained cylindrical gels A, B, and C.]

Figure 8–8: Separation of polypeptides by electrophoresis on gels of polyacrylamide cast in a solution of 0.2% sodium dodecyl sulfate.93 Proteins containing the polypeptides were dissolved in solutions of sodium dodecyl sulfate sufficient to saturate them. They were submitted to electrophoresis on cylindrical gels (0.6 cm × 10 cm) cast from 10% acrylamide in 0.1% sodium dodecyl sulfate and 0.1 M sodium phosphate, pH 7.0. Following electrophoresis, the gels were stained for protein. The polypeptides that were run on gel A were those composing bovine catalase (naa = 506), the mitochondrial isoform of porcine fumarate hydratase (naa = 466), isoform A of fructose-bisphosphate aldolase from muscle of Oryctolagus cuniculus (naa = 361), glyceraldehyde-3-phosphate dehydrogenase from muscle of O. cuniculus (naa = 332), human carbonate dehydratase I (naa = 260), and equine cardiac myoglobin (naa = 153). The polypeptides run on gel B were the same as those run on gel A, but the myoglobin was omitted. The polypeptides run on gel C were catalase, fumarate hydratase, the E isoform of alcohol dehydrogenase from equine liver (naa = 374), glyceraldehyde-3-phosphate dehydrogenase, carbonate dehydratase, and myoglobin. Reprinted with permission from ref 93. Copyright 1969 Journal of Biological Chemistry.

[Figure 8–7 plot: relative mobility (Rf) against percentage of acrylamide (Ta); one line each for my, ch, ld, ov, gd, sa, and ph.]

Figure 8–7: Relative electrophoretic mobilities, Rf, of complexes of polypeptides and dodecyl sulfate as a function of the concentration of acrylamide, Ta, used to cast the gel.92 Various proteins, namely myoglobin (my; 153 aa), chymotrypsinogen (ch; 245 aa), L-lactate dehydrogenase (ld; 331 aa), ovalbumin (ov; 385 aa), glutamate dehydrogenase (gd; 501 aa), bovine serum albumin (sa; 583 aa), and phosphorylase (ph; 842 aa), were dissolved separately in a solution containing a concentration of dodecyl sulfate sufficient to saturate the polypeptides. Each was then submitted to electrophoresis on gels of polyacrylamide cast in a solution of 1% sodium dodecyl sulfate. A series of gels was used for each protein that differed in the percent acrylamide (Ta) from which they were cast. The gels were stained for protein, and the distance migrated by the protein was divided by the distance migrated by a dye, Pyronine-Y, of low molecular weight to obtain the relative electrophoretic mobility (Rf) of each polypeptide at each percent acrylamide. The assumption made was that the mobility of the Pyronine-Y would be unaffected by the percent acrylamide. Reprinted with permission from ref 92. Copyright 1972 Journal of Biological Chemistry.
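As an illustration of how Equation 8–34 is applied to measurements such as those of Figure 8–7, the following Python sketch fits the natural logarithm of the relative mobility against Ta and reports the retardation coefficient and the extrapolated mobility at Ta = 0. It is only a sketch: the function names and the numerical values are hypothetical, and it assumes, as the caption does, that the mobility of the marker dye is unaffected by the percent acrylamide, so that ln Rf is linear in Ta with slope –Kr.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def retardation_coefficient(percent_acrylamide, relative_mobility):
    """Fit ln(Rf) = ln(Rf at T_a = 0) - K_r * T_a; return (K_r, Rf at T_a = 0)."""
    log_rf = [math.log(rf) for rf in relative_mobility]
    slope, intercept = fit_line(percent_acrylamide, log_rf)
    return -slope, math.exp(intercept)


# Hypothetical mobilities generated from Equation 8-34 with
# K_r = 0.30 (percent acrylamide)^-1 and Rf = 1.2 at T_a = 0.
t_a = [2.0, 4.0, 6.0, 8.0]
r_f = [1.2 * math.exp(-0.30 * t) for t in t_a]

k_r, rf_zero = retardation_coefficient(t_a, r_f)
print(f"K_r = {k_r:.3f}, Rf extrapolated to T_a = 0: {rf_zero:.3f}")
```

With real data, the scatter about the fitted line and the spread of the intercepts among different polypeptides indicate how closely the complexes share a common free mobility.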
length of the polypeptides, as long as they have a normal composition of amino acids87 and bind the proper amount of dodecyl sulfate. To understand this property of the electrophoretic separations, the process known as sieving must be understood.

Suggested Reading

Weber, K., & Osborn, M. (1969) The reliability of molecular weight determinations by dodecyl sulfate–polyacrylamide gel electrophoresis, J. Biol. Chem. 244, 4406–4412.

Sieving

Sieving of macromolecules, for example, native proteins, nucleic acids, or complexes between dodecyl sulfate and polypeptides, occurs during both chromatography by molecular exclusion and electrophoresis on polymeric supports. Sieving is the discrimination between macromolecules on the basis of size that is accomplished by a random network of linear polymers. In chromatography by molecular exclusion, the network of polymer forms the beads among which the mobile phase percolates and is the sieve within the beads through which the macromolecule diffuses when it is inside of the stationary phase. In electrophoresis, the network of polymer forms an obstacle course through which the macromolecule must pass as it moves in the direction of the electric field.

Consider a geometric solid of any shape within a network of lines thrown at random through a volume of space completely containing the solid. An equation98 for the probability that none of these lines intersects the solid, P(ni), was derived during the solution of an unrelated topological problem,99 and

$$P(\mathrm{ni}) = \exp\left(-\lambda S/4\right) \tag{8–35}$$

where λ is the density of the lines (centimeters centimeter–3) and S is the surface area of the solid (centimeters2). Assume that a macromolecule is a geometric solid and a network of chemical polymers is a network of lines. When a molecule of protein is submitted to chromatography by molecular exclusion, the fraction of the total volume available to macromolecule i, Kav,i (Equation 1–21), in the stationary phase of randomly arranged linear polymers should be that fraction of the total volume the occupation of which by the macromolecule does not cause any polymer to intersect the macromolecule. In this case,

$$K_{\mathrm{av},i} = \exp\left(-\beta\,T_P\,S_{\mathrm{app},i}\right) \tag{8–36}$$

where TP is the concentration of polymer in percent [grams (100 cubic centimeters)–1], β is a constant of proportionality to convert, among its other roles, the concentration of polymer [grams (100 cubic centimeters)–1] into its linear density (centimeters centimeter–3), and Sapp,i is the apparent surface area (centimeters2) of the macromolecule i.

Because the polymers are not lines but solids themselves, the apparent surface area of the macromolecule is not its real surface area. The apparent surface of macromolecule i, Sapp,i, lies outside its actual surface by a distance equal to the sum of the widths of any tight shells of hydration around either the macromolecule or the polymer and the width of the polymer itself. All of the dimensions that cause the polymer not to be a line and the macromolecule not to be a dry smooth solid object are incorporated into the dimensions of an apparent macromolecule that is larger than the actual macromolecule. When the actual macromolecule collides with the actual polymer, the apparent macromolecule collides with a line in the center of the polymer.

This model predicts that if a series of beaded stationary phases of increasing concentration of polymer is used to separate the same set of standard macromolecules by molecular exclusion chromatography, then

$$K_{\mathrm{av},i} = \exp\left(-K_{r,i}\,T_P\right) \tag{8–37}$$

If this is so, then ln Kav,i should be a linear function of TP. It has been demonstrated that the relationship of Equation 8–37 describes the behavior of both proteins98,100,101 and polysaccharides101 during chromatography by molecular exclusion on gels of both polyacrylamide (Figure 8–9)98,100,102 and linear dextrans.101

If Equation 8–36 describes behavior during chromatography by molecular exclusion, then when TP is fixed, –ln Kav,i should be directly proportional to Sapp,i. Computer programs exist for calculating the accessible surface area of a molecule of a protein (6–12) by rolling a spherical probe over the surface of its crystallographic molecular model.103 The accessible surface area is the surface area traced by the center of the probe. Unfortunately, there is no computer program that performs such a calculation for a cylindrical probe, which would not detect the smaller irregularities of the surface so readily as does a spherical probe (Figure 6–20). One can choose, however, a radius for the spherical probe that is large enough to include the radius of the polymer and the layers of hydration on the polymer and the protein as well as being large enough that the smaller irregularities of the surface of the protein that would not be detected by a cylinder are also not detected by the sphere. When a sphere of the appropriate radius is used as the probe, the values of –ln Kav,i for a series of standard proteins that have been submitted to chromatography by molecular exclusion104 upon cross-linked dextran are found to be directly proportional to the accessible surface areas103 calculated from crystallographic molecular models of those same proteins (Figure 8–10A).
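The proportionality between –ln Kav and the apparent surface area that follows from Equation 8–36 can be made concrete with a short Python sketch. The function names, the value chosen for the product of β and TP, and the surface areas are all hypothetical; the sketch only shows that distribution coefficients generated from Equation 8–36 yield a line through the origin whose slope is β TP.

```python
import math


def fraction_available(beta, t_p, s_app):
    """Equation 8-36: K_av = exp(-beta * T_P * S_app)."""
    return math.exp(-beta * t_p * s_app)


def slope_through_origin(surface_areas, k_av_values):
    """Least-squares slope of -ln(K_av) against S_app for a line forced through 0."""
    ys = [-math.log(k) for k in k_av_values]
    numerator = sum(s * y for s, y in zip(surface_areas, ys))
    denominator = sum(s * s for s in surface_areas)
    return numerator / denominator


# Hypothetical values: beta in (percent nm^2)^-1, T_P in percent, S_app in nm^2.
beta, t_p = 5.0e-4, 5.0
areas = [100.0, 300.0, 600.0]
k_av = [fraction_available(beta, t_p, s) for s in areas]

print(f"recovered beta * T_P = {slope_through_origin(areas, k_av):.4f}")
```

The same function, evaluated at several values of TP for one protein, gives the linear dependence of ln Kav on TP stated by Equation 8–37, with Kr,i equal to β Sapp,i.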
[Figure 8–9 plot: distribution coefficient (logarithmic scale) against percentage of acrylamide (Ta); curves numbered 1 to 11 and labeled cy, rn, ly, my, om, ch, ps, ov, hb, sa, and ig.]

Figure 8–9: Distribution coefficients for a series of globular proteins submitted to chromatography by molecular exclusion on polyacrylamide gels of varying composition.98,100 A series of gels cast from different concentrations of acrylamide, Ta (percent), were separately fragmented to form suspensions of polyacrylamide granules of different porosities. Columns were made from these chromatographic media, and a set of standard globular proteins were submitted to chromatography by molecular exclusion on these columns and their respective elution volumes were used to calculate the respective distribution coefficients, KD (Equation 1–22). The distribution coefficient KD is directly proportional to the distribution coefficient Kav. The values of KD are plotted on a logarithmic scale as a function of Ta. The proteins were, in order of their number of amino acids, (1) cytochrome c (cy; 104 aa), (2) ribonuclease (rn; 124 aa), (3) lysozyme (ly; 129 aa), (4) myoglobin (my; 153 aa), (5) ovomucoid (om; 186 aa), (6) chymotrypsinogen (ch; 245 aa), (7) pepsin A (ps; 327 aa), (8) ovalbumin (ov; 385 aa), (9) hemoglobin (hb; 574 aa), (10) serum albumin (sa; 583 aa), and (11) immunoglobulin G (ig; 1320 aa). Adapted with permission from ref 98. Copyright 1970 National Academy of Sciences.

amino acids a protein contains. It has been noted by Ogston105 that if a set of macromolecules were all spheres of radius Ri and the polymers of the network were infinitely long right cylinders of radius rP, then

$$S_{\mathrm{app},i} = 4\pi\left(r_P + R_i\right)^2 \tag{8–38}$$

Because the partial molar volume of a molecule of protein is a function only of its composition of amino acids18,106 and because the amino acid compositions of most proteins are similar, each of their partial molar volumes should be directly proportional to the number of amino acids each protein contains (Table 8–2). To the extent that a molecule of protein i is a sphere and has the normal composition of amino acids, the number of amino acids it contains, naa,i, should determine its radius by the relationship

$$R_i^{\circ} = \left(\frac{3\,n_{\mathrm{aa},i}\,\bar{V}_{\mathrm{aa}}}{4\pi N_A}\right)^{1/3} \tag{8–39}$$

where V̄aa is the mean partial molar volume of the amino acids in the usual protein (82 cm3 mol–1) and the superscript has been added to R to indicate that this is a sphere equivalent in volume to the volume of the protein, which is never exactly a sphere.

When Equations 8–36, 8–38, and 8–39 are combined

$$\left(\frac{-\ln K_{\mathrm{av},i}}{4\pi\beta\,T_P}\right)^{1/2} = r_P + \left(\frac{3\,n_{\mathrm{aa},i}\,\bar{V}_{\mathrm{aa}}}{4\pi N_A}\right)^{1/3} \tag{8–40}$$
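A short numerical sketch may help to fix the magnitudes in Equations 8–39 and 8–40. The Python below evaluates the radius of the equivalent sphere from the number of amino acids, using the mean partial molar volume of 82 cm3 mol–1 quoted in the text, and then the quantity (–ln Kav)^(1/2) predicted by Equation 8–40. The function names are hypothetical, and the values used for rP and for the product β TP in the example call are arbitrary illustrative numbers, not fitted ones.

```python
import math

AVOGADRO = 6.02214076e23   # mol^-1
V_AA = 82.0                # cm^3 mol^-1, mean partial molar volume from the text
NM3_PER_CM3 = 1.0e21       # nm^3 in one cm^3


def equivalent_radius_nm(n_aa):
    """Equation 8-39: radius (nm) of the sphere equal in volume to the protein."""
    volume_nm3 = n_aa * V_AA / AVOGADRO * NM3_PER_CM3
    return (3.0 * volume_nm3 / (4.0 * math.pi)) ** (1.0 / 3.0)


def sqrt_neg_ln_k_av(n_aa, r_p_nm, beta_t_p):
    """Equation 8-40 rearranged: (-ln K_av)^(1/2) = (4 pi beta T_P)^(1/2) (r_P + R)."""
    return math.sqrt(4.0 * math.pi * beta_t_p) * (r_p_nm + equivalent_radius_nm(n_aa))


# Myoglobin (153 amino acids) as an example; beta_t_p = 0.01 nm^-2 is hypothetical.
print(f"equivalent radius of a 153-residue protein: {equivalent_radius_nm(153):.2f} nm")
print(f"predicted (-ln K_av)^(1/2): {sqrt_neg_ln_k_av(153, 0.9, 0.01):.3f}")
```

Because the radius grows only as the cube root of the number of amino acids, an eightfold increase in length doubles the equivalent radius, which is why naa^(1/3) is the natural abscissa for plots of this kind.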
[Figure 8–10 plots: (A) –ln Kav against accessible surface area (nm2); (B) (–ln Kav)^(1/2) against naa^(1/3); (C) (–ln Kav)^(1/2) against a (nm).]

Figure 8–10: Sieving of globular proteins by molecular exclusion chromatography.104 The proteins, dissolved in 0.1 M KCl at pH 7.5, were submitted to chromatography by molecular exclusion on a column (2.5 cm × 50 cm) of Sephadex G-200. The volumes at which the several proteins eluted from the column were tabulated. The distribution coefficient Kav for each was calculated (Equations 1–20 and 1–21) from its elution volume, Ve, and the void volume of the column, V0, and the included volume of the column, Vi (as determined by the volume at which sucrose eluted). It was assumed that Vi = VH2O, that VT = (1 – fpoly)–1VH2O, and that Wr = 20 mL g–1. The proteins used by Andrews104 were, in order of increasing total number of amino acids, equine cytochrome c (naa = 104), myoglobin from Physeter catodon (naa = 153), bovine chymotrypsinogen (naa = 245), ovalbumin from G. gallus (naa = 385), bovine serum albumin (naa = 583), bovine lactoperoxidase (naa = 612), the cytoplasmic isoform of malate dehydrogenase from porcine heart (naa = 666), bovine transferrin (naa = 685), glyceraldehyde-3-phosphate dehydrogenase from muscle of O. cuniculus (naa = 1328), the A isoform of L-lactate dehydrogenase from muscle of O. cuniculus (naa = 1324), alcohol dehydrogenase from Saccharomyces cerevisiae (naa = 1388), the A isoform of fructose-bisphosphate aldolase from muscle of O. cuniculus (naa = 1452), the mitochondrial isoform of porcine fumarate hydratase (naa = 1864), bovine catalase (naa = 2024), β-galactosidase from E. coli (naa = 4092), equine apoferritin (naa = 4368), and urease from Canavalia ensiformis (naa = 5040). (A) The quantity –ln Kav is plotted as a function of the accessible surface area (6–12; nanometers2) of those molecules of protein for which crystallographic molecular models were available (all of the proteins used by Andrews except lactoperoxidase, alcohol dehydrogenase, and urease). The accessible surface areas were calculated with a spherical probe of radius 1.1 nm (Figure 6–20) by use of the program of Lee and Richards103 as adapted by Dr Ilya Shindyalov of the Protein Data Bank. Of necessity, crystallographic molecular models of proteins from species other than the species providing the protein for the molecular exclusion often had to be used. The access codes of the crystallographic molecular models in the Protein Data Bank that were chosen are 5CYT, 3CYT, 1CYC, 1CRC, 1HRC, 1ABS, 1DXD, 1HJT, 1SWM, 2MBW, 1MCY, 2CGA, 4CHA, 1OVA, 1BJ5, 1UOR, 1BKE, 1AO6, 4MDH, 5MDH, 1DOT, 1OVT, 1CB6, 1CE2, 1GPD, 4GPD, 3GPD, 1LDM, 2LDX, 3LDH, 5LDH, 9LDB, 9LDT, 6ALD, 4ALD, 2ALD, 1FUR, 1YFM, 1DGF, 1DGG, 4BLC, 7CAT, 8CAT, 1BGL, 1BGM, 1FHA, 1AEW, and 1DAT. The closed circles are those for proteins the frictional ratios of which are less than 1.20. (B) The quantity (–ln Kav)^(1/2) is plotted as a function of the cube root of the number of amino acids in each protein. (C) The quantity (–ln Kav)^(1/2) is plotted as a function of the Stokes radius, a, of each protein calculated from its diffusion coefficient by Equation 1–67. In panel A the line was fit to the averages of the surface areas for each protein, but in panels B and C it was fit only to the eight points (closed circles) for the proteins the frictional ratios of which are less than 1.2.
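The way in which a standard line such as that of Figure 8–10B is used to estimate the number of amino acids in a protein can be sketched in Python. The sketch fits (–ln Kav)^(1/2) against naa^(1/3) for a set of standards and inverts the line for an unknown. The function names are hypothetical, and the standards below are synthetic values generated from an exactly linear relationship with an arbitrary slope and intercept; they are not the measurements of Andrews.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_n_aa(standards, k_av_unknown):
    """standards: (n_aa, K_av) pairs for well-behaved globular proteins."""
    xs = [n_aa ** (1.0 / 3.0) for n_aa, _ in standards]
    ys = [math.sqrt(-math.log(k_av)) for _, k_av in standards]
    slope, intercept = fit_line(xs, ys)
    cube_root = (math.sqrt(-math.log(k_av_unknown)) - intercept) / slope
    return cube_root ** 3


def synthetic_k_av(n_aa, slope=0.12, intercept=0.10):
    """Hypothetical K_av lying exactly on a line in the coordinates of panel B."""
    return math.exp(-((intercept + slope * n_aa ** (1.0 / 3.0)) ** 2))


standards = [(n, synthetic_k_av(n)) for n in (104, 245, 583, 1324)]
estimate = estimate_n_aa(standards, synthetic_k_av(385))
print(f"estimated number of amino acids: {estimate:.0f}")
```

With real standards the estimate is only as good as the assumption that the unknown is as nearly spherical as the standards; a protein with a large frictional ratio would be assigned too many amino acids.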
where R° is defined by Equation 8–39. For the most spherical of proteins, values of the frictional ratio of 1.1–1.2 are observed.51 The values of the frictional ratio are always greater than 1 because the water bound to a protein and the irregularities of its surface increase its actual frictional coefficient.

The solid circles in Figure 8–10B are those for all of the proteins chosen by Andrews that happen to have frictional ratios less than 1.2. They are, in ascending order of size (with the frictional ratios in parentheses), cytochrome c (1.09), myoglobin (1.16), chymotrypsinogen (1.12), ovalbumin (1.18), glyceraldehyde-3-phosphate dehydrogenase (phosphorylating) (1.16), L-lactate dehydrogenase (1.17), apoferritin (1.15), and urease (1.18).* The line in Figure 8–10B was drawn through these points because they should be the most ideal examples. It can be seen that several of the points for proteins with larger frictional ratios indicate that they are behaving as if they were larger than they are, which makes sense if their behavior is a function only of their surface area. Further validating the conclusion that it is only the surface area of a molecule of protein that determines its distribution coefficient is the fact that the distribution coefficients of proteins with larger frictional ratios (open circles in Figure 8–10A) show no more deviation from linear behavior than do those of the proteins with frictional ratios less than 1.20 (solid circles in Figure 8–10A) when they are plotted as a function of surface area rather than volume.

It has been argued101,108,109 that, rather than the apparent surface area of a molecule of protein, the fundamental variable in describing its behavior when it is submitted to sieving on chromatography by molecular exclusion is its effective radius, or Stokes radius, a, calculated from its diffusion coefficient (Equation 1–67). When the same values of (–ln Kav)^(1/2) displayed in Figure 8–10B are replotted against the effective radii for the various proteins (Figure 8–10C), no significant improvement is seen. Something can be learned, however, when a line is again drawn through the points for the eight most globular proteins listed above, the properties of which should be least affected by the change to effective radius from naa^(1/3). It can be seen that the use of the effec-

acids contained within a protein of interest.104 This estimation requires that the distribution coefficients, Kav, for a series of uncomplicated standard proteins of known number of amino acids be used to define the line for the chromatographic system chosen for the particular experiment. The estimate for the number of amino acids in the protein of interest is interpolated from the known values for the standards. A standard line for a particular chromatographic column must be established by running standards on that column, because the properties of each commercial batch of chromatographic medium are unique.101 It is also important to run the chromatographic system with a buffer of ionic strength 0.1–0.2 M to eliminate the effect of the dimensions of the ionic double layer around the charged macromolecules on the parameter Kav.110

As a macromolecule moves through a polymeric network during electrophoresis, it is also being sieved. In this case, it must travel through the network in a kinetic process, rather than equilibrating with the internal volume of a bead, but it appears that this distinction is inconsequential. It has been argued98 that one can view the solid matrix of the polymerized gel as an array of screens through which the macromolecule must travel. A random cross section through a random three-dimensional network of lines will provide a distribution of points. The probability that none of these points lies within the randomly placed, random cross section of a geometric solid of any shape is still described by Equation 8–35.98 If those points represent one of the screens in the gel, if the macromolecule can pass only through openings in that screen large enough so that no point forming that screen is found within the cross section of the macromolecule, and if the rate of its movement through the screen is proportional to the probability that openings of the proper size or larger will be encountered, then the mobility of a macromolecule through a gel during electrophoresis should be described by

$$u_i = u_i^{\circ} \exp\left(-\beta\,T_P\,S_{\mathrm{app},i}\right) \tag{8–42}$$
surface areas calculated from crystallographic molecular models of the same proteins (Figure 8–11A).40,111

As already noted, however, the property of a molecule of protein that is usually of interest is not its accessible surface area but the number of amino acids it contains. If it is assumed that a series of proteins resembles a series of spheres and that Equations 8–38 and 8–39 are still valid approximations, then

they do display linear behavior (Figure 8–11B).40,98,111 Again, the intercept with the abscissa is at a negative value as predicted by Equation 8–44, and this intercept yields a value of rP, the mean radius of the polyacrylamide, of 0.9 nm.98 Unlike the behavior of globular proteins on chromatography by molecular exclusion (Figure 8–10B), the proteins with higher frictional ratios (open symbols in Figure 8–11B) do not deviate systematically from linear behavior more significantly than those with

[Figure 8–11 plots: (A) retardation coefficient (Kr) against accessible surface area (nm2); (B) Kr^(1/2) against naa^(1/3).]

Figure 8–11: Relationship between the retardation coefficients Kr measured by electrophoresis and the surface areas or the numbers of amino acids naa for a set of proteins.40,111 A series of proteins were submitted to electrophoresis, each protein on a series of gels cast from solutions of increasing concentrations of acrylamide. The slopes of the lines from plots of the logarithm of the relative mobility against the percent of acrylamide were used to calculate Kr for each protein. The proteins used, in order of increasing number of amino acids, were ovalbumin from G. gallus (naa = 385), porcine α-amylase (naa = 496), bovine serum albumin (naa = 583), human transferrin (naa = 679), ovotransferrin from G. gallus (naa = 686), aspartate kinase–homoserine dehydrogenase from Zea mays (naa = 828), hexokinase from S. cerevisiae (naa = 970), the A isoform of L-lactate dehydrogenase from muscle of O. cuniculus (naa = 1324), the A isoform of fructose-bisphosphate aldolase from muscle of O. cuniculus (naa = 1452), β-amylase from Ipomoea batatas (naa = 1992), bovine catalase (naa = 2024), the M1 isoform of pyruvate kinase from O. cuniculus (naa = 2120), bovine xanthine oxidase (naa = 2662), equine apoferritin (naa = 4272), ribulose-bisphosphate carboxylase from Chlamydomonas reinhardtii (naa = 4904), and urease from C. ensiformis (naa = 5040). (A) The values of the retardation coefficients Kr are plotted as a function of the accessible surface areas (nanometers2) of the proteins calculated as described in Figure 8–10A with a spherical probe of 1.1 nm. Surface areas were calculated for ovalbumin, serum albumin, transferrin, ovotransferrin, L-lactate dehydrogenase, aldolase, pyruvate kinase (1PKM, 1A49), catalase, xanthine oxidase (1FO4), apoferritin, and ribulose-bisphosphate carboxylase (1AA1, 1RCO). (B) The square roots of the retardation coefficients for all of the proteins are plotted as a function of the cube roots of their number of amino acids. In both panels, circles are for the data of Hedrick and Smith111 and squares are for the data of Bryan;40 in panel B, solid symbols are for proteins with frictional ratios less than 1.20.
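The connection between the negative intercept on the abscissa in Figure 8–11B and the radius of the polymer can be illustrated numerically. The sketch below rests on an assumption that should be stated plainly: it takes the relationship plotted in panel B to have the form Kr^(1/2) = (4πβ)^(1/2) (rP + R°), which is what results when Equations 8–34 and 8–42 are compared and Equations 8–38 and 8–39 are then substituted; the equations that the text numbers 8–43 and 8–44 are not reproduced in this passage. Under that assumption Kr^(1/2) falls to zero where R° = –rP. The function names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23   # mol^-1
V_AA = 82.0                # cm^3 mol^-1, mean partial molar volume from the text
NM3_PER_CM3 = 1.0e21       # nm^3 in one cm^3


def radius_per_cube_root_residue_nm():
    """Factor k in R = k * n_aa**(1/3), from Equation 8-39, in nm."""
    return (3.0 * V_AA * NM3_PER_CM3 / (4.0 * math.pi * AVOGADRO)) ** (1.0 / 3.0)


def abscissa_intercept_for(r_p_nm):
    """Value of n_aa**(1/3) at which the assumed line crosses K_r**0.5 = 0."""
    return -r_p_nm / radius_per_cube_root_residue_nm()


def polymer_radius_from_intercept(abscissa_intercept):
    """r_P (nm) recovered from the (negative) intercept on the abscissa."""
    return -abscissa_intercept * radius_per_cube_root_residue_nm()


x0 = abscissa_intercept_for(0.9)   # 0.9 nm is the value of r_P quoted in the text
print(f"k = {radius_per_cube_root_residue_nm():.3f} nm, intercept at n_aa^(1/3) = {x0:.2f}")
```

The intercept obtained for a polymer radius of 0.9 nm lies a few units to the left of the origin on an axis that runs to about 15, which is consistent in sign with the negative intercept the text describes.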
is added, it increases Sapp by the same increment, once the polymer is beyond a certain length. In this case,

$$S_{\mathrm{app},i} = c + d\,n_{\mathrm{aa},i} \tag{8–45}$$

where c incorporates all of the properties of homologous short polymers only a few segments in length.

When proteins are dissolved in solutions of guanidinium chloride, they unfold and their individual polypeptides become separated, random coils.112 A series of these randomly coiled polypeptides, the lengths of which are now precisely known, were submitted to chromatography by molecular exclusion on beaded agarose, and the values of KD (Equation 1–22) were reported.113 Combining Equations 1–21, 1–22, 8–36, and 8–45

$$-\ln K_{D,i} = \ln K_{\mathrm{av},R} + \beta\,T_P\left(c + d\,n_{\mathrm{aa},i}\right) \tag{8–46}$$

where Kav,R is the distribution coefficient of the small reference solute R used to determine the apparent internal volume. This equation predicts that a plot of ln KD,i against naa,i should be linear, and it is (Figure 8–12).113 It has been proposed that the length of a polypeptide, the sequence of which is unavailable, could be estimated from its distribution coefficient by use of such a standard curve.

[Figure 8–12 plot: –ln KD against naa.]

Figure 8–12: Sieving of unfolded, randomly coiled polypeptides on chromatography by molecular exclusion.113 Each of a series of proteins was dissolved in 6 M guanidinium chloride and 0.1 M 2-mercaptoethanol and submitted to chromatography on a column (1.5 cm × 90 cm) of beaded 6% agarose. The elution volume of each polypeptide was used to calculate its distribution coefficient KD. The negative natural logarithm of KD (–ln KD) is plotted as a function of the number of amino acids in the respective sequence, naa. The polypeptides chosen were those composing equine cytochrome c (naa = 104), bovine hemoglobin (naa = 145), bovine β-lactoglobulin (naa = 162), immunoglobulin G light chain from O. cuniculus (naa = 220), the mitochondrial isoform of malate dehydrogenase from liver of Rattus norvegicus (naa = 314), the A isoform of fructose-bisphosphate aldolase from O. cuniculus (naa = 363), ovalbumin from G. gallus (naa = 385), immunoglobulin G heavy chain from O. cuniculus (naa = 450), α-amylase A from Aspergillus oryzae (naa = 478), bovine serum albumin (naa = 583), and human transferrin (naa = 679).

The regular behavior of single-stranded nucleic acids upon electrophoresis is crucial to the strategies for determining their sequences. The relative electrophoretic mobilities of the components in the ladder of single-stranded RNA displayed in Figure 3–13 can be measured from the photograph. Each relative mobility can in turn be related to the relative mobility of one of the components chosen as a standard, for example, the mobility of the one containing 30 nucleotides.114 If Equations 8–34, 8–43, and 8–45 are combined, and if it is remembered that u°i for all unfolded single-stranded nucleic acids is the same, then

$$-\ln\left(\frac{u_i}{u_{30}}\right) = \beta\,T_a\,d\left(n_{b,i} - 30\right) \tag{8–47}$$

This predicts that a plot of ln (ui/u30) against nb,i should be linear, and it is (Figure 8–13) with the exception of the compression at the discontinuity in the figure.

It could be ascertained, because sequencing was being performed,114 that the bands for the components in the ladder representing single-stranded ribonucleic acids of lengths 24–27 had all overlapped, producing this compression. It is usually assumed that a compression results from the ability of the 3′ end of the single-stranded nucleic acid to double back upon itself and form a double-stranded hairpin as soon as the length of the nucleic acid becomes greater than a critical value in the expanding series. As the series approaches the discontinuity, it behaves regularly, because no hairpin is imminent. At the discontinuity and beyond it, the hairpin is present in each component, but it is eventually found far enough in the interior for the series to resume its linear behavior with the same slope it had previously but with a displacement. The displacement indicates that the polymer is behaving as if it were smaller than it actually is, presumably because the surface area of the double-helical hairpin in its interior is smaller than the surface area of the same number of nucleotides in a single-stranded state.
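The analysis of a sequencing ladder by Equation 8–47 can be sketched in Python. The sketch converts the distances migrated by the bands into mobilities relative to the band of 30 nucleotides and fits –ln (u/u30) against the length. The function names are hypothetical, and the distances are synthetic values generated from Equation 8–47 with an arbitrary value of 0.02 for the product β Ta d; they are not measurements taken from Figure 3–13.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ladder_fit(lengths, distances, standard_length=30):
    """Fit -ln(u_i / u_standard) against length; distances stand in for mobilities
    because all bands on one gel have migrated for the same time."""
    d_standard = distances[lengths.index(standard_length)]
    ys = [-math.log(d / d_standard) for d in distances]
    return fit_line(lengths, ys)


# Synthetic ladder: beta * T_a * d = 0.02 per nucleotide, 10 cm for the 30-mer.
lengths = [20, 25, 30, 35, 40, 45]
distances = [10.0 * math.exp(-0.02 * (n - 30)) for n in lengths]

slope, intercept = ladder_fit(lengths, distances)
print(f"slope = {slope:.4f} per nucleotide, intercept = {intercept:.3f}")
```

In a real ladder, a compression of the kind described above would appear as a run of points with nearly equal ordinates, followed by points that return to the original slope on a displaced line.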
[Figure 8–13 plot: –ln (u/u30) against nb.]

Figure 8–13: Sieving of single-stranded ribonucleic acid on electrophoresis in a gel of polyacrylamide. The distance between the origin and the final position of each band on the gel in Figure 3–13 was measured. These distances were each divided by the distance for the band corresponding to the ribonucleic acid 30 bases in length to obtain mobilities relative to this internal standard (u/u30). The negative natural logarithms of these mobilities are plotted as a function of the lengths in bases, nb, of each single-stranded ribonucleotide.

Unfolded polypeptides and unfolded single-stranded nucleic acids are examples of well-defined extended polymers. Complexes between dodecyl sulfate and polypeptides, because they are not chemically defined covalent polymers, are not so well understood. Nevertheless, both the behavior of polypeptides dissolved in solutions of guanidinium chloride (Figure 8–12) and the behavior of single-stranded nucleic acids (Figure 8–13) when they are respectively submitted to sieving suggest that the extended, unfolded complexes that form between dodecyl sulfate and polypeptides, which resemble the former in their unfolded state and the latter in both their distribution of negative charge and unfolded state, should display electrophoretic mobilities correlated with the length of the polypeptides. In fact, it was noted by Shapiro, Viñuela, and Maizel115 that this is the case. The electrophoretic mobility of the complex between dodecyl sulfate and polypeptide i is generally reported as a relative mobility:

$$R_{f,i} = \frac{u_i}{u_{\mathrm{STD}}} \tag{8–48}$$

where uSTD is the mobility of a standard, either a small dye that can be readily followed visually or one of the obvious boundaries on a discontinuous gel. The advantage of the former point of reference is that because the

$$-\ln R_{f,i} = \ln\left(\frac{u_{\mathrm{STD}}^{\circ}}{u_i^{\circ}}\right) - K_{r,\mathrm{STD}}\,T_a + K_{r,i}\,T_a \tag{8–49}$$

$$-\ln R_{f,i} = \ln\left(\frac{u_{\mathrm{STD}}^{\circ}}{u_i^{\circ}}\right) - K_{r,\mathrm{STD}}\,T_a + \beta\,T_a\left(c + d\,n_{\mathrm{aa},i}\right) \tag{8–50}$$

Because u°i should be the same for all complexes between well-behaved polypeptides and dodecyl sulfate90 and u°STD, Kr,STD, and Ta are all constant, this equation predicts that a plot of –ln Rf against naa should be linear. When the natural logarithms of the relative mobilities measured by Weber and Osborn93 are plotted as a function of the now known lengths of these polypeptides, naa, they conform to this expectation (Figure 8–14).93*

At the present time, the method almost universally used to estimate the length of a polypeptide, the sequence of which is not yet known, is to determine the mobility of its complex with dodecyl sulfate upon electrophoresis on polyacrylamide gels. The mobility of the unknown is compared to the mobilities of complexes between dodecyl sulfate and standard polypeptides of known length, usually by plotting the data as in Figure 8–14. It should be realized, however, that the widespread reliance on this method is based on the assumption that the polypeptide of interest binds the same amount of dodecyl sulfate (amino acid)–1 as the standards used. A comparison of Figure 8–13, which describes the behavior of a series of polymers in which the uniformity of the charge distribution is covalently dictated, with Figure 8–14, which describes the behavior of a series of polymers in which the uniformity of charge distribution depends only on a fortuitous consistency in its composition producing a fortuitous consistency in its ability to bind a small electrolyte, emphasizes the drawbacks of this assumption.

* When complexes between dodecyl sulfate and polypeptides are submitted to electrophoresis on polyacrylamide gels in which the complexes are stacked by moving discontinuities,96 the relative mobilities of the standards do not fall on a line when they are plotted as in Figure 8–14. Nevertheless, their mobilities increase monotonically with their length, and the length of an unknown polypeptide can be estimated by interpolation.
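The estimation of the length of a polypeptide from the relative mobility of its complex with dodecyl sulfate can be sketched as an interpolation between bracketing standards in the coordinates of –ln Rf against naa. Interpolating between neighbors, rather than fitting one line to all of the standards, is a choice made here so that the same sketch also serves when the standards increase monotonically but do not fall on a single line. The function name is hypothetical, and the standards are synthetic values placed on an arbitrary line; they are not the measurements of Weber and Osborn.

```python
import math


def estimate_length(standards, rf_unknown):
    """Interpolate n_aa linearly in -ln(Rf) between the two bracketing standards.

    standards: (n_aa, Rf) pairs for polypeptides of known length.
    """
    points = sorted((-math.log(rf), n_aa) for n_aa, rf in standards)
    y = -math.log(rf_unknown)
    for (y_low, n_low), (y_high, n_high) in zip(points, points[1:]):
        if y_low <= y <= y_high:
            return n_low + (n_high - n_low) * (y - y_low) / (y_high - y_low)
    raise ValueError("unknown lies outside the range of the standards")


# Synthetic standards lying on the line -ln Rf = 0.2 + 0.002 * n_aa.
standards = [(n, math.exp(-(0.2 + 0.002 * n))) for n in (150, 300, 450, 600)]

length = estimate_length(standards, math.exp(-1.0))
print(f"estimated length: {length:.0f} amino acids")
```

The function refuses to extrapolate beyond the standards, and, as the text cautions, any estimate it returns is meaningful only if the unknown binds the same amount of dodecyl sulfate per amino acid as the standards.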
Problem 8–11: Glycogen phosphorylase from Oryctolagus cuniculus was dissolved in a solution of sodium dodecyl sulfate sufficient to saturate the protein and submitted to electrophoresis on a polyacrylamide gel cast in a solution of sodium dodecyl sulfate. The relative mobilities of the glycogen phosphorylase and several standard proteins were measured.93

protein                          length of polypeptide   mobility relative
                                 (amino acids)           to marker dye
myosin                           1938                    0.10
β-galactosidase                  1023                    0.16
serum albumin                    583                     0.33
catalase                         506                     0.37
glutamate dehydrogenase          501                     0.43
fumarate hydratase               466                     0.47
fructose-bisphosphate aldolase   363                     0.56
glycogen phosphorylase                                   0.23

Estimate the length of the polypeptide composing glycogen phosphorylase.

Problem 8–12: Estimate the length of the polypeptide that composes porcine pepsin A from the following relative mobilities of complexes between the polypeptides and sodium dodecyl sulfate on electrophoresis on polyacrylamide gels cast in a solution of dodecyl sulfate.93

polypeptide                                          n_aa   distance migrated (cm)
serum albumin                                        583    1.78
immunoglobulin G heavy chain                         450    2.92
D-amino acid oxidase                                 347    4.80
glyceraldehyde-3-phosphate dehydrogenase             332    4.80
aspartate carbamoyltransferase, catalytic polypeptide 310   5.22
carboxypeptidase A                                   309    5.48
carbonate dehydratase I                              260    6.12
pepsin A                                                    5.06

Cataloguing Polypeptides

Rather than the intact oligomeric complexes of subunits formed from properly folded polypeptides that are separated during the electrophoresis of native proteins (Figure 1–19), the components separated when a mixture of different proteins is submitted to electrophoresis in the presence of dodecyl sulfate represent individual, unfolded, unassociated polypeptides. Consequently, such an electrophoretic separation is a catalogue of the polypeptides present in a sample93 rather than a catalogue of the proteins. A graphic example of such a catalogue can be seen in Figure 1–22.

When a purified protein the homogeneity of which has been verified by electrophoresis in its native state is submitted to electrophoresis in the presence of dodecyl sulfate, the pattern observed is a dissection of the protein into the different polypeptides of which it is composed. Usually a protein is composed of only one polypeptide, and that one polypeptide is usually present in the native protein in several copies. There are, however, many proteins that contain two or more different polypeptides, and a comprehensive description of the quaternary structure of such a protein requires that each of its constituent polypeptides be recognized as a unique component of the overall complex.

A catalogue of the polypeptides from which a protein is formed is reliable only if the shortcomings of electrophoresis in the presence of dodecyl sulfate have been recognized and eliminated. First, it has already been noted that, on discontinuous electrophoresis, components of high mobility often fail to escape the descending boundary. Complexes between dodecyl sulfate and short polypeptides are often trapped in this way and are unresolved. Second, it is also the case that, for reasons not well understood, all complexes between dodecyl sulfate and polypeptides less than 100 amino acids in length seem to have the same electrophoretic mobility,118 regardless of the concentration of polyacrylamide. This lower limit below which resolution fails can be lowered to about 25 amino acids in length by adding 8 M urea to the polyacrylamide gel.97,119 Third, because the cleavage of one peptide bond out of the hundreds present in an intact polypeptide always produces two new polypeptides that will be separated from each other and from their parent by electrophoresis in the presence of dodecyl sulfate, any degradation of the native protein by endopeptidases during or before its purification can artifactually multiply the apparent number of polypeptides without significantly altering the native protein or its own electrophoretic mobility. Fourth, endopeptidases often unfold more slowly than other proteins upon exposure to dodecyl sulfate, and they cleave their unfolded neighbors before they in turn succumb. If the purified protein is contaminated with even minute amounts of the endopeptidases that are always present in a homogenate, they can degrade the polypeptides during the preparation of the sample. Because these cleavages of the unfolded polypeptides are produced at random, as opposed to the unique cleavages usually produced during the degradation of a native protein by endopeptidases, they cause polypeptides in the sample to disappear into hundreds of fragments smeared over the field, each present in very low yield. Such an apparent disappearance of a polypeptide or polypeptides can also occur during the purification of the protein rather than during preparation of a sample for electrophoresis. For example, it was once thought that the stoichiometry of the subunits of nicotinic acetylcholine receptor from the electric eel was simpler than that from the electric ray until it was demonstrated that the missing polypeptides appeared
when indigenous endopeptidases, unavoidably present during the purification, were intentionally inactivated.120 Yet the acetylcholine receptor originally purified, even though it had been cut up at random, was still biologically active. Finally, if only a fraction of the disulfides have been reduced, cross-linked, unreduced and un-cross-linked, reduced forms of the same polypeptide will appear as separate components. All of these and other artifacts must be recognized and eliminated121 before the pattern observed upon electrophoresis in the presence of dodecyl sulfate gives a reliable assessment of the different polypeptides present in a protein.

Each of the various components separated by electrophoresis in the presence of dodecyl sulfate may or may not represent a single polypeptide with a unique sequence of amino acids. The majority of the time they do, but it sometimes happens that one of the components represents two different polypeptides the lengths of which are so close to each other that they cannot be resolved. For example, the two subunits with unrelated amino acid sequences and unrelated functions composing the multienzyme complex from Salmonella typhimurium responsible for anthranilate synthase, glutamine amidotransferase, and anthranilate phosphoribosyltransferase happen to be 530 and 520 aa in length and are not resolved by electrophoresis in the presence of dodecyl sulfate. They are, however, cleanly resolved by electrophoresis in 8 M urea, a solution in which they are also unfolded but in which their differences in charge are not swamped by the binding of dodecyl sulfate.122 It is also possible that one or more of the components on the gel represent fragments of a larger component, also seen on the same gel. One way to resolve both of these ambiguities is to perform peptide maps.

A peptide map is a characteristic and reproducible display of the peptides produced when a polypeptide is digested with a specific endopeptidase. The display is usually produced by chromatographically or electrophoretically separating the digest in two dimensions to produce a characteristic pattern or map. Usually, the two dimensions are the respective orthogonal directions on a sheet of chromatographic paper or a thin layer of cellulose on a backing of plastic. Originally, electrophoresis was performed in the first dimension and chromatography in the second. Peptide maps are sensitive methods for assessing the similarity of two polypeptides, demonstrating that one polypeptide is a fragment of another, or revealing that one of the components on a polyacrylamide gel represents two polypeptides that fortuitously have the same electrophoretic mobility.

The most reliable maps are obtained from tryptic digests of a polypeptide because trypsin is the most specific and dependable of the endopeptidases. Polypeptides usually contain about 5 mol % arginine and 7 mol % lysine,123 and for every 100 aa in length, about 11 tryptic peptides should be present in the digest.* Each component on the map should represent a different tryptic peptide. Ideally, the resolution of the map should be high enough that every peptide in the digest appears as a separate, distinguishable component.

* About 5% of the lysines and arginines in a protein are followed by proline, and trypsin is unable to cleave either a lysylproline or an arginylproline peptide bond.

The initial triumph of analytical peptide mapping was in the examination of a mutant hemoglobin.124 It had been proposed that the difference between normal hemoglobin, referred to as hemoglobin A, and hemoglobin S, a hemoglobin producing pathological distortions in erythrocytes, was due to a small difference in the amino acid sequence of the two proteins.125 It was then shown that one and only one of the tryptic peptides on the respective peptide maps of the two proteins displayed an altered mobility (Figure 8–15).126 It was concluded that all of the peptides the mobilities of which were the same between the two maps had identical sequences and the same relative locations in the two intact polypeptides, but that the one peptide the mobility of which was different had a sequence that differed between the two proteins by at least one amino acid. Because it is unlikely that the only two or three changes in the sequence of a polypeptide would occur in the same tryptic peptide, this result alone was substantial evidence that the two proteins differed from each other at only one location in their respective sequences. This was soon shown to be true by complete amino acid sequencing.

A similar strategy was used to evaluate the differences in the amino acid sequences of the different isoforms of actin.127,128 Each member of the set of the isoforms of actin chosen for the experiment, each of which had been isolated from a different species or a different tissue (a total of eight in all), was digested with trypsin, and each peptide map was compared to the peptide map of actin from skeletal muscle of O. cuniculus, the complete amino acid sequence of which was known. In all cases, the majority of the tryptic peptides were distributed over the map in the same pattern as the corresponding peptides on the map from the standard, and this permitted the various maps to be aligned with that of the standard. The peptides occupying the same positions in a pair of maps were assumed to be identical to each other in amino acid sequence. Amino acid analysis was used to verify these identities. Each unique peptide on the maps of the various unknowns was eluted and sequenced. Each of these amino acid sequences could be aligned with one of the tryptic peptides in the sequence of actin from muscle of O. cuniculus, and in this way the amino acid replacements in the sequences of the other actins could be readily established. This set of experiments relied on the fact that, aside from the first six amino acids in each sequence, all of the actins that were being compared show about 95% identity when their
amino acid sequences are aligned. This level of identity was what produced the underlying pattern that permitted the maps to be aligned and, in turn, permitted the ready identification of the peculiar peptides.

When the polypeptide is a long one, there may be so many components on a tryptic peptide map that they begin to overlap. One way to solve this problem is to modify the tyrosine side chains in the protein with radioactive iodine by electrophilic aromatic substitution. Because there are only 3–4 tyrosines for every 100 amino acids in a typical protein,123 only about a third of the tryptic peptides become radioactive, and an autoradiogram* of the map is less cluttered than the entire map itself, but just as unique to the particular polypeptide. Such a map of tyrosine-containing chymotryptic peptides was used to show that the α polypeptides of Na+/K+-exchanging ATPase from liver and kidney, respectively, both polypeptides now known to be 1018 amino acids in length with 24 mol of tyrosine (mol of polypeptide)⁻¹, were very similar if not identical to each other (Figure 8–16).129 Another way of generating a peptide map from a long polypeptide is to digest the complex between it and dodecyl sulfate in a solution of dodecyl sulfate with an endopeptidase.130 Under these conditions, the digestion is severely incomplete because the endopeptidase is rapidly inactivated by the dodecyl sulfate. Nevertheless, a reproducible set of large fragments of the polypeptide, characteristic of both it and the specificity of the endopeptidase used, is produced, and when these large fragments are separated by electrophoresis in the presence of dodecyl sulfate, the pattern of bands on the gel is a fingerprint unique to that polypeptide.

* An autoradiogram is a photographic image of the map on which only radioactive components are registered. It displays the distribution of radioactivity over the field.

Figure 8–16: Peptide maps of tyrosine-containing chymotryptic peptides from the α polypeptide of Na+/K+-exchanging ATPase.129 After Na+/K+-exchanging ATPase was purified from rat liver or rat kidney by immunoadsorption, its α polypeptide was isolated by electrophoresis on polyacrylamide gels in solutions of dodecyl sulfate. Each of the respective purified polypeptides was then chemically modified at its tyrosines by electrophilic aromatic substitution with ¹²⁵I. The radioactive polypeptides were then digested separately with chymotrypsin, and the digests were separated in two dimensions on thin layers of cellulose. Electrophoresis was performed from right to left followed by ascending chromatography with butanol/pyridine/water/acetic acid (65:50:40:10 v/v/v/v) from bottom to top. Peptides containing o-[¹²⁵I]iodotyrosine were identified by placing photographic film over the chromatogram. The images are those of the developed films. (A) Map from α polypeptide of kidney; (B) map from α polypeptide of liver. Adapted with permission from ref 129. Copyright 1986 American Chemical Society.

Peptide mapping is sometimes performed by adsorption chromatography (Figure 3–7) because these separations are more rapidly accomplished,131 but this procedure is not so informative because it involves a separation in only one dimension. It is also possible to separate tryptic peptides from the digest of a polypeptide in one dimension by mass spectrometry following matrix-assisted laser desorption.132,133 The peptide map that results has the disadvantage that often only a portion of the tryptic peptides is present rather than a complete set, so it functions mainly as a fingerprint of the particular polypeptide. Such a map, however, has the advantages
that the resolution is high so one dimension is sufficient, that the molecular mass of each peptide appearing on it is registered, and that the peptides can subsequently be sequenced (Figure 3–8). If the amino acid sequences of two closely related proteins are known, mass spectrometric peptide maps can often tell a sample of one from a sample of the other.134 Another advantage of a mass spectrometric peptide map is that it can be performed on a small amount of protein (1 pmol).

If two polypeptides yield peptide maps similar enough that they are judged to be related, the percentage of identity between their two sequences will be high; if they yield peptide maps that cannot be regarded as similar, they may still have clearly homologous sequences. In two-dimensional mapping of either all the peptides in a digest or just the tyrosine-containing peptides, or in peptide mapping by adsorption chromatography or mass spectrometry, the evaluation of the similarity of two polypeptides is based on comparison of the two patterns in which the peptides are displayed (Figure 8–15). Only if a significant fraction of the peptides on the two maps have the same relative positions on the field and produce a pattern that can be recognized is the judgment made that the two polypeptides are similar to each other. This implicit criterion of similarity requires that a significant fraction of the respective peptides be identical to each other in sequence for the decision to be made that the two polypeptides are related. One difference in the sequence of two otherwise identical peptides is usually sufficient to cause them to have different mobilities (Figure 8–15). Because the mean tryptic peptide is eight amino acids in length, differences between the sequences of two polypeptides at more than 20% of the positions will cause the two maps to be completely different, even though the two polypeptides are similar enough to be unambiguously judged homologous in amino acid sequence. Each of the four polypeptides of nicotinic acetylcholine receptor, although they are all homologous in sequence to each other (averaging 40% identity), yields a completely different peptide map.135 The order in which the three ways of detecting homologies among polypeptides fail as the percentage of identity becomes smaller is peptide mapping before alignment of amino acid sequences and alignment of amino acid sequences before superposition of tertiary structures.

Whenever two or more polypeptides appear upon electrophoresis of a purified protein in the presence of dodecyl sulfate, the possibility that the smaller polypeptide or polypeptides are fragments of the largest should be examined. Such a relationship can be established by peptide mapping. The protein ankyrin from human erythrocytes is present in the cell under physiological conditions as the complete polypeptide and three progressively smaller fragments of that polypeptide. That these four polypeptides represent such a nested set derived by digestion of the largest by cellular endopeptidases could be demonstrated by producing peptide maps of each.136 This was done by separating these polypeptides on polyacrylamide gels in solutions of sodium dodecyl sulfate, iodinating their tyrosines, digesting them with trypsin, and producing tryptic peptide maps. These peptide maps all displayed the same pattern, but the maps from the smaller polypeptides lacked one or two of the peptides present in those from the next larger one. When collagen type XIV is isolated from epidermis of Macaca fascicularis, the purified protein contains two polypeptides that can be separated by electrophoresis in the presence of dodecyl sulfate. The complexes between each of these two polypeptides and dodecyl sulfate were digested separately with glutamyl endopeptidase. In the range of lengths less than that of the shorter of the two polypeptides, the two patterns of fragments that resulted from these partial digestions were identical to each other, a result demonstrating that the shorter of the two polypeptides was itself a fragment of the longer.137

Peptide mapping can be used to determine whether an apparently unique component resolved by electrophoresis in the presence of dodecyl sulfate represents only one polypeptide or two or more polypeptides that fortuitously have the same electrophoretic mobility. The electrophoretic mobility of the complex between dodecyl sulfate and the polypeptide in question can be used to estimate its length. The mole percent of lysine and arginine in the protein can be either determined directly by total amino acid analysis or estimated from the fact that this number is usually about 11 mol %.123 The mole percent of lysine and arginine and the length of the polypeptide can be used to estimate the number of tryptic peptides that should be produced if the component observed on the gel does represent only one unique polypeptide. If the number of peptides observed on the map agrees with this expectation, the component probably represents only one unique polypeptide. If there are about twice as many spots as expected, it must represent two different polypeptides. If the component does represent two or more polypeptides, it should be possible to separate them chromatographically or electrophoretically. Such a separation can usually be performed in 8 M urea, a solvent that unfolds polypeptides and separates them one from the other but that does not interfere with either chromatography by ion exchange or electrophoresis. The two or more separated polypeptides should each give unique peptide maps, the sum of which should be the peptide map of the original mixture.

When phosphoglycerate dehydrogenase was saturated with dodecyl sulfate and submitted to electrophoresis, one component was observed, the electrophoretic mobility of which was that of a polypeptide 360 aa in length. The content of lysine plus arginine in the protein was determined by amino acid analysis to be 9.6 mol %. A tryptic digest of the protein was separated by cation-exchange chromatography, and each of the pools from this first dimension was submitted to
electrophoresis on paper. This two-dimensional peptide map displayed 39–40 major peptides. The content of tryptophan in the protein was 1.0 mol %, and four of the peptides gave a positive test for tryptophan. If all of the polypeptides in this protein are identical, there should have been 36 tryptic peptides, four of which should have contained tryptophan. The agreement between the observed numbers and the expected numbers led to the conclusion that phosphoglycerate dehydrogenase was composed of identical polypeptides.138

Glutamate–tRNA ligase was submitted to electrophoresis in the presence of dodecyl sulfate, and a single component was observed, the mobility of which was that of a polypeptide 500 aa in length. The protein had a content of lysine plus arginine of 12 mol %; tryptophan, 1.0 mol %; arginine, 6.3 mol %; and cysteine, 1.0 mol %. It could be concluded139 that the component observed upon electrophoresis represented only one polypeptide because the tryptic peptide map of the protein displayed 55 peptides, 30 of which gave a positive test for arginine, five of which gave a positive test for tryptophan, and five of which became radioactive after the protein was reduced and carboxymethylated with [¹⁴C]iodoacetic acid (Equation 3–17).

Upon electrophoresis in dodecyl sulfate, the molybdenum–iron protein that is one of the components of nitrogenase gave two bands of stained material of very similar and often indistinguishable electrophoretic mobility, the apparent lengths of which were 540 amino acids. When the protein was reduced, carboxymethylated with [¹⁴C]iodoacetic acid, and submitted to amino acid analysis, its content of ([¹⁴C]carboxymethyl)cysteine was 1.7 mol %. Eleven of the tryptic peptides on a peptide map of the reduced and carboxymethylated protein were radioactive when nine were expected. Instead of passing this off as the result of incomplete digestion or inaccurate values for content of cysteine, the investigators proceeded to show that when the protein was dissolved in urea, to unfold its polypeptides, two polypeptides could be isolated by cation-exchange chromatography on (carboxymethyl)cellulose. Both were submitted to reduction, carboxymethylation, and peptide mapping. Four of the radioactive, cysteine-containing peptides from the map of the total protein were found on the map of one of the polypeptides, and the other seven radioactive peptides from the map of the total protein were found on the map of the other polypeptide, and no overlaps occurred between the two maps. It could be concluded that the molybdenum–iron protein had the subunit stoichiometry αβ.140

A similar situation arose with methylmalonyl-CoA carboxytransferase. When this protein was submitted to electrophoresis in the presence of dodecyl sulfate, a component was present the apparent length of which was 550 amino acids, but under some conditions it would split into two bands of equal intensity. It was found that the native enzyme could be dissociated at pH 9.0 into two proteins that could be separated from each other by molecular exclusion chromatography. Each of these proteins was composed of polypeptides the apparent lengths of which were 550 amino acids.141 Although their complexes with dodecyl sulfate were almost indistinguishable in electrophoretic mobility, the polypeptides in these separated proteins produced completely different tryptic peptide maps.142 It was later shown that they were polypeptides of 519 and 604 aa in length, respectively, with unrelated amino acid sequences.

The use of tryptic peptide mapping to provide evidence for the homogeneity of the polypeptides in a protein relies on the assumption that the trypsin has digested the polypeptide completely. This should be independently demonstrated. For example, initial tryptic digests of the polypeptides composing glucose-6-phosphate isomerase produced only two-thirds to three-fourths as many peptides as had been expected from the assumption that they were all identical. It was found that less base was consumed during the digestion than should have been, and this suggested that the digestion had been incomplete. When the protein was carbamylated on all of its lysines and then digested with trypsin, the quantity of base consumed during the digestion and the number of peptides observed on the map were those expected theoretically.143 A more sensitive measure of complete tryptic digestion is to compare the total content of lysine and arginine in the digest to the amount of lysine and arginine released from the peptides in the digest when a sample is in turn digested with an appropriate carboxypeptidase.

Electron transfer flavoprotein is another protein composed of two polypeptides the lengths of which are very similar. It was originally believed to be a dimer of two identical polypeptides.144 Under certain circumstances, however, two narrowly separated components would appear upon electrophoresis in the presence of dodecyl sulfate, and these were different enough to be separated by preparative electrophoresis. Each of the separated polypeptides was cleaved with cyanogen bromide, and the fragments produced were in turn saturated with dodecyl sulfate and separated in one dimension by electrophoresis in a solution of 0.1% dodecyl sulfate in 8 M urea (Figure 8–17).119,145 The maps produced in this way from each separated polypeptide were unique, and their sum was equal to the map of the intact protein. These results established the fact that the quaternary structure of the electron transfer flavoprotein is αβ and explained the observation that there is only 1 mol of flavin (1.8 mol of polypeptide)⁻¹ in the protein.

Many proteins in addition to the molybdenum–iron protein of nitrogenase, methylmalonyl-CoA carboxytransferase, and electron transfer flavoprotein are composed of two or more different polypeptides. Often there is a functional basis for this arrangement. Many multienzyme complexes, rather than being composed of a string of enzymatic domains each formed from a dif-
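$$\text{methylmalonyl-SCoA} + \text{biotin} \rightleftharpoons \text{propionyl-SCoA} + \text{carboxybiotin} \tag{8–51}$$

$$\text{methylmalonyl-SCoA} + \text{pyruvate} \rightleftharpoons \text{propionyl-SCoA} + \text{oxaloacetate} \tag{8–53}$$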
[Figure: two tryptic peptide maps, each with its origin marked and a chromatography axis.]
Tryptic peptide maps of carbamylated glucose-6-phosphate isomerase from muscle of O. cuniculus.143 Electrophoresis: right map, pyridine/acetic acid/water (520:1.4:1000 by volume), pH 6.2; left map, pyridine/acetic acid/water (7:66:1927 by volume), pH 3.5. Chromatography: descending chromatography with butanol/acetic acid/pyridine/water (15:10:3:12 by volume). Maps were performed on sheets (46 cm × 57 cm) of chromatographic paper, and peptides were located by ninhydrin.
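The expected numbers of peptides quoted for phosphoglycerate dehydrogenase, glutamate–tRNA ligase, and the molybdenum–iron protein follow from simple arithmetic, which the sketch below makes explicit. It assumes that trypsin cleaves after every lysine and arginine, so that k such residues give k + 1 peptides, and that residues present at low mole percent (tryptophan, cysteine) each fall in a different peptide. Both are idealizations, and the function names are hypothetical. The lengths and compositions are the ones given in the text.

```python
def residues(length_aa, mol_percent):
    """Number of residues of one kind in a polypeptide of the given length."""
    return length_aa * mol_percent / 100.0


def expected_tryptic_peptides(length_aa, lys_arg_mol_percent):
    """Peptides expected from complete cleavage after every Lys and Arg."""
    return residues(length_aa, lys_arg_mol_percent) + 1


def spots_ratio(observed, expected):
    """Ratio near 1 suggests one unique polypeptide; near 2 suggests two."""
    return observed / expected


# Phosphoglycerate dehydrogenase: 360 aa, 9.6 mol % Lys + Arg, 1.0 mol % Trp.
pgdh_peptides = expected_tryptic_peptides(360, 9.6)   # 35.56, i.e. about 36
pgdh_trp = residues(360, 1.0)                         # 3.6, i.e. about 4

# Glutamate-tRNA ligase: 500 aa, 12 mol % Lys + Arg; 55 peptides were observed.
ligase_peptides = expected_tryptic_peptides(500, 12)  # 61.0

# Molybdenum-iron protein: apparent length 540 aa, 1.7 mol % cysteine;
# 11 radioactive peptides were observed.
mofe_cys = residues(540, 1.7)                         # about 9

print(round(pgdh_peptides), round(pgdh_trp))
print(ligase_peptides, round(spots_ratio(55, ligase_peptides), 2))
print(round(mofe_cys), round(spots_ratio(11, mofe_cys), 2))
```

The ratio is only a rough guide: incomplete digestion lowers it, and uncleaved lysylproline or arginylproline bonds mean that the simple count slightly overestimates the expected number.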
teine. Their compositions were C1D2S1E3A6I2L2, C1D1E1A1I2K1, C1D2T1G2A1V2F1R1, and C1D1T1E1G1A1V1L1R1.

(A) What are the lengths of the polypeptides composing this enzyme?
(B) How many polypeptides are there in the protein?
(C) How many different types of polypeptides are there in the protein?
(D) What conclusions can you draw from the tryptic peptide map?
(E) What conclusions can you draw from the tryptic peptides containing ([¹⁴C]carboxymethyl)cysteine?

Problem 8–15: A protein has been purified to homogeneity. It has the following properties.126

When the protein is reduced with 2-mercaptoethanol and run on a sodium dodecyl sulfate gel in the presence of standards, the following results are obtained:

protein           n_aa   mobility
β-lactoglobulin   162    0.70
myoglobin         153    0.73
lysozyme          129    0.81
ribonuclease      124    0.82
cytochrome c      104    0.87
protein X                0.77

The amino acid composition of the protein is as follows:

amino acid   mol (100 mol)⁻¹   amino acid   mol (100 mol)⁻¹
G            6.9               Y            2.0
A            12.5              W            0.9
S            5.4               C            1.0
T            5.4               M            1.0
P            5.0               D            8.6
V            10.7              E            5.9
I            0.0               R            2.9
L            12.6              H            6.5
F            5.2               K            7.6

The protein was digested with trypsin, and a peptide map was prepared. It contained 26 well-defined peptides.

The peptide map was stained for various amino acid side chains: five peptides were positive for arginine, 13 peptides were positive for histidine, four peptides were positive for methionine, three peptides were positive for tryptophan, and seven peptides were positive for tyrosine.

The protein was carboxymethylated with [¹⁴C]iodoacetic acid and digested with trypsin. Three tryptic peptides containing radioactive (carboxymethyl)cysteine were isolated by ion-exchange chromatography, and each of these was shown to have a unique composition of amino acids and to contain 1 mol of Cys (mol of peptide)⁻¹.

(A) What is the length of a polypeptide composing this protein?
(B) How many different types of polypeptides compose the protein?
(C) Explain the peptide maps.

Cross-Linking

There arose a disagreement over the number of subunits contained in fructose-bisphosphate aldolase. The physical methods for estimating the molar mass of the native protein and the molar mass of its constituent polypeptides were unable to decide between three and four. It should be noted that everyone had an equal chance of being correct, so the point is not who turned out to be right but that the question could not be resolved simply by arguing over the numbers. What was needed instead was a different kind of experiment, and it was provided.

When the fructose-bisphosphate aldolase in a homogenate from brain of O. cuniculus was submitted to electrophoresis in its native state, five evenly spaced components displaying enzymatic activity were observed (Figure 8–18).76 Penhoet, Kochman, Valentine, and Rutter76 decided that this must be due to the fact that, in the brain, two isoenzymatic polypeptides designated α and γ are translated from two different messenger RNAs continuously and coincidentally. These polypeptides fold separately to form monomeric subunits that then combine at random with subunits of their own kind or of the other isoenzymatic type to produce hybrids of the stoichiometries α₄, α₃γ, α₂γ₂, αγ₃, and γ₄, designated A, I, II, III, and C in Figure 8–18.* The two different subunits, α and γ, differ in the sequences of their polypeptides and hence in their charge. Each hybrid in turn has a different electrophoretic mobility because each has a different mean charge number Ω. The hybrids are capable of forming in the first place because the two different polypeptides are homologous in their sequences, have superposable tertiary structures in their folded state, share a common ancestor, and have not diverged sufficiently from that common ancestor to have lost the ability to combine with each other in the same way that they are required to do with subunits identical to themselves. If this explanation is correct, the number of subunits in any molecule of aldolase can be determined by simply counting the components on the electrophoretic separation. There must be four. A similar hybridization was used to verify that chloramphenicol O-acetyltransferase is a trimer.155

* The proteins with quaternary structures α₄ and γ₄ have been designated the A isoform and the C isoform, respectively, of fructose-bisphosphate aldolase.
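The counting argument for the hybrids can be checked by enumeration. The sketch below is an idealization: it assumes two kinds of subunit that combine purely at random, and the default of equal amounts of each subunit is an assumption made here, not a statement from the text. The function name `hybrids` is hypothetical. An oligomer of n subunits gives n + 1 distinct stoichiometries, so five components imply a tetramer and four imply a trimer.

```python
from math import comb


def hybrids(n, p_alpha=0.5):
    """List (n_alpha, n_gamma, fraction) for every hybrid of an n-subunit oligomer.

    fraction is the binomial probability of each stoichiometry when subunits
    combine at random and alpha makes up p_alpha of the pool (assumed value).
    """
    return [
        (k, n - k, comb(n, k) * p_alpha**k * (1 - p_alpha) ** (n - k))
        for k in range(n, -1, -1)
    ]


tetramer = hybrids(4)  # alpha4, alpha3-gamma, alpha2-gamma2, alpha-gamma3, gamma4
trimer = hybrids(3)

print(len(tetramer))   # 5 components, as observed for aldolase
print(len(trimer))     # 4 components, as expected for a trimer
for n_alpha, n_gamma, fraction in tetramer:
    print(n_alpha, n_gamma, fraction)
```

The number of components depends only on n. The relative intensities depend on how much of each subunit is present, which is why only the count is used to infer the number of subunits.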
There have been a large array of reagents synthesized over the years for cross-linking proteins. They have been designed to react with a broader array of amino acids than just lysine. They have also been designed to form cross-links that can be reversed.160 A reagent that illustrates all of the various elements in the design of such cross-linking reagents is sulfosuccinimidyl 2-(7-azido-4-methylcoumarin-3-acetamido)-ethyl-1,3′-dithiopropionate:161
[Structure 8–3: sulfosuccinimidyl 2-(7-azido-4-methylcoumarin-3-acetamido)-ethyl-1,3′-dithiopropionate; drawing not reproduced]
The ester of N-hydroxysulfosuccinimide is an electrophile that reacts with lysines. The N-hydroxysulfosuccinimide is used as a leaving group, rather than N-hydroxysuccinimide, to increase solubility. The azide is photoactivated to the nitrene that has a broader spectrum of reactivity than a more common electrophile. The two electrophiles react respectively with two nucleophiles on the protein to cross-link them. The disulfide can be reductively cleaved to reverse the cross-linking. The coumarin makes the reagent and hence the products of the cross-linking fluorescent. At the other end of the scale of complexity is the cross-linking of two proteins by the oxidative formation of covalent dimers between a tyrosine on one of the proteins and a tyrosine on the other.162
Cross-linking reagents that contain an activated disulfide that reacts efficiently with cysteines on the surface of a protein are also widely used. A mixed disulfide with 2-thiopyridine, an excellent leaving group, can be used to attach an electrophile to a cysteine on the first protein. For example, N-succinimidyl 3-(2-pyridyldithio)propionate163
[Structure 8–4: N-succinimidyl 3-(2-pyridyldithio)propionate; drawing not reproduced]
reacts by disulfide interchange (Figure 3–20) with the thiol of a cysteine to form a mixed disulfide between that cysteine and the N-succinimidyl 3-thiopropionate. The reagent can be directed to a particular location on the first protein by inserting a cysteine at a particular position in its sequence by site-directed mutation.164 The ester of N-hydroxysuccinimide can then react with a lysine on a second protein, cross-linking it to the first protein. The cross-link is reversible because the mixed disulfide can be reduced by disulfide interchange with a mercaptan such as dithiothreitol (Equation 3–18). This reaction generates a thiol at the location on the second protein that participated in the cross-link. This unmasked thiol can then be used to isolate a peptide from the second protein containing the amino acid that was cross-linked.164 Because the position in the amino acid sequence of the first protein at which the reagent was attached was determined by the site-directed mutation and the position in the second protein with which the electrophile reacted can be identified by isolating a peptide containing the modified amino acid, the cross-linking reaction can be used to identify adjacent amino acids in the complex that forms between the two proteins.
The most commonly used cross-linking reagents are bisimidates and bisaldehydes. An imidoester such as the ones in 8–2 reacts with a primary amine to form an amidine:
[Equation 8–54: reaction scheme, drawing not reproduced. The imidoester and the primary amine form a tetrahedral intermediate (± H+) that loses CH3OH to give the amidine.]
When a molecule of dimethyl suberimidate happens to react at one of its ends with a lysine on one subunit and at its other end with a lysine from another subunit, those two subunits become covalently cross-linked and their polypeptides migrate together upon electrophoresis in the presence of dodecyl sulfate with the mobility of a polypeptide the length of which is equal to the sum of the lengths of the two polypeptides so joined.
The intramolecular cross-linking of an oligomeric protein provides a count of the number of subunits it contains. At concentrations of protein below 10 mM, no significant intermolecular cross-linking between separate molecules of protein occurs when a solution of protein is mixed with a cross-linking agent such as dimethyl suberimidate. Instead, what is observed are the products that result from intramolecular cross-linking among the fixed number of subunits of which the protein is composed.159 When the products of such intramolecular cross-linking are separated on a polyacrylamide gel, a ladder of bands, vaguely reminiscent of the ladders seen with randomly cleaved nucleic acids, is observed (Figure
8–19).165 These ladders, however, end abruptly because no more polypeptides can be cross-linked than are present in the complete protein. A count of the rungs in the ladder provides the number of subunits in the protein. In the case of glycerol kinase, the protein used for the analyses presented in Figure 8–19, it could be concluded that it is composed of four and only four subunits.165 Peptide maps showed that all four of the polypeptides composing the subunits are identical to each other.165
There are three reassurances that should be provided. First, the relative mobility of each of the bands in the ladder should be shown to be a regular function of its number (graph in Figure 8–19).166 This is a reassurance that one of the members of the set is not missing from the pattern. Second, the cross-linking reaction should be forced to completion so that only the highest oligomer is seen (gel C, inset to Figure 8–19). This result provides the reassurance that this oligomer does represent a true limit to the reaction and that the solution contains only one unique multimer of the subunits rather than a mixture of multimers each containing a different number of subunits. Third, the possibility that intermolecular cross-linking is occurring should be ruled out by varying the concentration of protein. The amounts of the components arising from intramolecular cross-linking will not vary with the concentration of protein, but those that arise from intermolecular cross-linking will. Random intermolecular cross-linking also yields a distribution that gradually declines in its amplitude with band number rather than displaying an abrupt limit at a certain unique polymer size. It is this discontinuous behavior that is the logical basis for believing the results.159
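The first of these reassurances can be illustrated with a short sketch. It assumes the usual empirical relation for electrophoresis in dodecyl sulfate, namely that relative mobility is roughly linear in the logarithm of polypeptide length, so that for a ladder of cross-linked oligomers mobility should be roughly linear in the logarithm of the band number. The function names, the tolerance, and any mobilities supplied to the functions are hypothetical and are not taken from Figure 8–19.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ladder_residuals(mobilities):
    """Residuals of mobility against log10(band number).

    The bands are listed fastest first and numbered 1, 2, 3, ... in that order.
    """
    xs = [math.log10(i + 1) for i in range(len(mobilities))]
    slope, intercept = fit_line(xs, mobilities)
    return [m - (slope * x + intercept) for x, m in zip(xs, mobilities)]


def looks_regular(mobilities, tolerance=0.02):
    """True if no band deviates from the fitted line by more than the tolerance.

    The tolerance is an arbitrary illustrative value, not a recommended threshold.
    """
    return max(abs(r) for r in ladder_residuals(mobilities)) <= tolerance
```

If a rung were missing, the bands above it would be assigned numbers that are too low and the points would bend away from the line; this curvature is what the graph is meant to exclude.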
It is in fulfilling the requirement that the reaction be
forced to completion that dimethyl suberimidate usually
fails. The reaction of an imidoester with a primary amine in
aqueous solution is a competition between formation of the
desired amidine and hydrolysis of the imidoester to amide
and ester:
[Reaction scheme, drawing not reproduced: hydroxide adds to the imidoester to give a tetrahedral intermediate, which loses either CH3OH to yield the amide or NH3 to yield the ester.]
regardless of how much reagent is added. Because of this, it is often the case that not all of the subunits can be cross-linked among themselves. The logical chemical solution to this problem would be to use an electrophile at the two ends of the cross-linking reagent that is more selective for a primary amine relative to a hydroxide anion, but this has not been explored systematically. Rather, it has been inadvertently accomplished.
The most versatile cross-linking agent is glutaraldehyde:
[Structure 8–5: glutaraldehyde, OHC–CH2–CH2–CH2–CHO; drawing not reproduced]
Unfortunately, the chemistry of its reactions with proteins has never been elucidated. Presumably as an aliphatic aldehyde it engages in all of the same chemical reactions with lysine that occur after aliphatic aldehydes are produced in collagen by protein-lysine 6-oxidase (Figure 3–19). It is part of the lore surrounding glutaraldehyde that the freshly distilled reagent is far less active; this is meant to suggest that compounds derived from glutaraldehyde itself, such as its aldol and dehydrated aldol, are important to the cross-linking. The dehydrated aldol as well as many of the products resulting from the imine formed upon the reaction of glutaraldehyde with a lysine are α,β-unsaturated aldehydes or imines. It is the greater preference of these α,β-unsaturated aldehydes and imines for reaction with the weak base lysine rather than the strong base hydroxide that produces a higher yield of cross-linked product when glutaraldehyde is used. In situations when the cross-linking reaction with dimethyl suberimidate cannot be forced to completion, cross-linking with glutaraldehyde will usually produce the fully cross-linked protein in high yield. The efficiency of glutaraldehyde has permitted it to be used to provide a quantitative catalogue of the various oligomeric complexes present in a heterodisperse solution of a single, pure protein.
Quantitative cross-linking is cross-linking carried to an extent sufficient to connect covalently every subunit in a macromolecular complex to every other subunit in the same macromolecular complex, either directly or indirectly, but not to any subunit in another macromolecular complex. In order for glutaraldehyde to perform quantitative cross-linking, the formation of intermolecular covalent connections between independent, unassociated oligomeric complexes in the solution must be negligible because every covalent complex must represent only the product of intramolecular cross-linking, and every multimeric complex in the solution must be completely cross-linked within itself so that every one of its constituent subunits is covalently attached to all of the others. Cross-linking with appropriately high concentrations of glutaraldehyde often fulfills these requirements.167–170
This fulfillment can be illustrated with two experiments. In the first experiment, a monodisperse solution containing a homogeneous population of the oligomeric protein L-lactate dehydrogenase, a protein known to be a tetramer composed of four identical subunits, was mixed with glutaraldehyde.167 The reaction was permitted to proceed a short period of time, and the products were examined by electrophoresis on polyacrylamide gels in the presence of dodecyl sulfate. At high enough concentrations of glutaraldehyde, the only component that was observed contained the number of polypeptides, four, known to be present in the protein, all covalently connected to each other (Figure 8–20).167 No larger products, which would have resulted from intermolecular cross-linking, and no smaller products, which would have resulted from incomplete intramolecular cross-linking, were observed. In the second experiment, the receptor for epidermal growth factor was cross-linked. This protein forms an α₂ dimer from two α monomers upon the addition of epidermal growth factor to the solution. Before the epidermal growth factor was added, only the monomer of the receptor was seen on the polyacrylamide gel after the protein had been cross-linked with glutaraldehyde and then dissolved in a solution of dodecyl sulfate. When epidermal growth factor was added and samples were removed at different times and then cross-linked, unfolded in a solution of dodecyl sulfate, and submitted to electrophoresis, the un-cross-linked monomer was seen to be gradually but completely replaced by the covalent cross-linked dimer.170 If any intermolecular cross-linking had been occurring after the glutaraldehyde was added, covalent dimer should have been observed in the absence of epidermal growth factor, but it was not. If incomplete intramolecular cross-linking were occurring after the glutaraldehyde had been added, some un-cross-linked monomer should have remained after completion of the dimerization, but it did not.
There are two possible problems that can affect the outcome of quantitative cross-linking. First, if the protein is cross-linked at a concentration that is significantly lower than its physiological concentration, a naturally occurring oligomer may dissociate; or if it is cross-linked at a concentration significantly higher than its physiological concentration, artifactual aggregation may occur. Second, if the electrophilic cross-linking reagent reacts with nucleophilic side chains on the protein that are involved in its function and if its intact function is required for maintenance of its proper oligomerization, the protein could dissociate or artifactually associate because of its chemical inactivation before it becomes cross-linked. For example, glutaraldehyde inactivates the ATPase of the chaperone protein GroEL within 2 s of its application, and the ATPase activity is required to maintain the correct oligomerization of the protein. Controls were devised to demonstrate that the oligomeric state of the protein nevertheless did not change in the time required for quantitative cross-linking to be accomplished.166
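The definition of quantitative cross-linking given above can be restated as a condition on a graph in which subunits are nodes and covalent cross-links are edges: the groups of subunits connected directly or indirectly must coincide exactly with the noncovalent complexes present before the reagent was added. The sketch below is only an illustration of that definition. The subunit labels, the function names, and the lists of cross-links are hypothetical; in an actual experiment only the sizes of the covalent products are observed on the gel, not the individual links.

```python
def covalent_groups(subunits, crosslinks):
    """Group subunits joined directly or indirectly by cross-links (union-find)."""
    parent = {s: s for s in subunits}

    def find(s):
        # Follow parents to the root, shortening the path along the way.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for first, second in crosslinks:
        parent[find(first)] = find(second)

    groups = {}
    for s in subunits:
        groups.setdefault(find(s), set()).add(s)
    return sorted(groups.values(), key=lambda g: sorted(g))


def is_quantitative(complexes, crosslinks):
    """True if every complex is fully connected and no link joins two complexes."""
    subunits = [s for c in complexes for s in c]
    observed = covalent_groups(subunits, crosslinks)
    expected = sorted((set(c) for c in complexes), key=lambda g: sorted(g))
    return observed == expected
```

A missing link within a tetramer leaves a smaller covalent product, and a single link between two tetramers produces a larger one; either outcome makes `is_quantitative` return `False`, which mirrors the two failures excluded in the experiments with L-lactate dehydrogenase.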
[Figure 8–20: densitometer traces, panels A and B; absorbance at 560 nm plotted against distance migrated (cm); peaks labeled T, D, and M. Image not reproduced.]
Figure 8–20: Cross-linking of L-lactate dehydrogenase.167 Samples of L-lactate dehydrogenase were submitted to cross-linking for 2 min with 0.4 M glutaraldehyde at 20 °C and the reaction was terminated by adding sodium dodecyl sulfate to 20 mM. The samples were then submitted to electrophoresis on polyacrylamide gels, the gels were stained for protein, and the absorbance at 560 nm as a function of the distance migrated is presented. (A) Cross-linking in a solution of native tetramers (20 nM). Equivalent samples were either cross-linked (solid line) or not cross-linked (dashed line). In the former, only fully cross-linked tetramers (T) are observed; in the latter, only un-cross-linked monomers (M). (B) Cross-linking of reassembling L-lactate dehydrogenase after it had been dissociated into subunits. L-Lactate dehydrogenase was dissociated into subunits at pH 2.3 and then diluted into a solution at pH 7.6 (final concentration of subunits 340 nM). After 210 s (dashed line) and 1 h (solid line), samples were removed and complexes (M, monomer; D, dimer; T, tetramer) were catalogued by cross-linking them for 2 min with 0.4 M glutaraldehyde. Reprinted with permission from ref 167. Copyright 1979 Nature.
[Figure 8–21: densitometer trace plotted against mobility; peaks labeled TT, D, and M. Image not reproduced.]
Figure 8–21: Cross-linking of the oligomers in a heterodisperse solution of Na+/K+-exchanging ATPase.169 A suspension of biological membranes containing only Na+/K+-exchanging ATPase was dissolved in a 5 mM solution of the nonionic detergent octa(ethylene glycol) dodecyl ether. After the solution was clarified by centrifugation, glutaraldehyde was added to 8 mM. After 45 min at 22 °C, sodium dodecyl sulfate was added, and the sample was submitted to electrophoresis on a gel cast from 3.6% acrylamide and 0.1% dodecyl sulfate. After the gel was stained, it was scanned for absorbance at 550 nm as a function of the distance migrated. By their mobilities the components were identified as covalently cross-linked αβ monomer (M), (αβ)₂ dimer (D), (αβ)₃ trimer (T), and (αβ)₄ tetramer (T) of the enzyme, which in its native form is a monomer (αβ) of one α subunit and one β subunit in noncovalent association.173 Because the enzyme is a complex of an α subunit and a β subunit, almost no un-cross-linked α polypeptide (α) or β polypeptide (β) remained following the cross-linking. Reprinted with permission from ref 169. Copyright 1982 American Chemical Society.
If all of these requirements have been satisfied, quantitative cross-linking can be used to define the quaternary structure of a protein. For example, it has been used to show that 1-aminocyclopropane-1-carboxylate synthase is a dimer of two identical subunits.171 The
Cross-Linking 445
used to follow the kinetics of the assembly of a protein. When L-lactate dehydrogenase is transferred to a solution at pH 2.3, it dissociates into monomers consisting of single subunits, and when it is transferred back to pH 7.6, it reassociates over an hour to its normal tetrameric state. The concentration of monomer, dimer, and tetramer can be ascertained at any given minute by removing a sample and cross-linking it with glutaraldehyde (Figure 8–20B).167 An extensive study of the kinetics of this stepwise association could be performed in this way.168 The rate and mechanism of the association of the subunits of the catalytic trimer of aspartate carbamoyltransferase were also monitored by quantitative cross-linking,174 as well as the conversion of the (α₇)₂ tetradecamer of GroEL into the (α₇)₂β₇ and β₇(α₇)₂β₇ heterooligomers of GroEL and GroES caused by MgATP.172
Cross-linking has also been used to determine the stoichiometric ratio between two dissimilar subunits. For example, the fact that the α and β polypeptides of succinate–CoA ligase (ADP-forming) from E. coli disappear in concert almost entirely during the formation of a covalent αβ heterodimer and covalent (αβ)₂ heterotetramer (Figure 8–22)175 means that they must be present in equimolar ratio in the protein. A similar result was observed with the two α and β polypeptides composing Na+/K+-exchanging ATPase.176 In either of these cases, if the polypeptides had not been present in equimolar ratio, either one polypeptide would have disappeared from its position on the polyacrylamide gel more rapidly than the other during the formation of a covalent αβ heterodimer or significant amounts of covalent products of the form α₂β or αβ₂ would have appeared in addition to the covalent αβ heterodimer.
The patterns seen in the ladders from partially cross-linked samples of a protein (Figure 8–19) can provide information about the arrangement of the subunits in the oligomer. Succinate–CoA ligase (ADP-forming) from E. coli is a protein composed of two different subunits, 388 and 288 amino acids in length, each present in two copies. These subunits could have been arranged in at least two ways to produce stoichiometries of either (αβ)₂ or α₂β₂. The former designation implies that the association between an α subunit and a β subunit is more intimate than that between either two α subunits or two β subunits; the latter implies the reverse. Examples illustrating this distinction would be either a dimer of two identical subunits, each of which was posttranslationally cleaved to produce a pair of subunits, each an entwined α polypeptide and β polypeptide, or a heterotetramer assembled from a fully folded α₂ dimer and a fully folded β₂ dimer, respectively. Succinate–CoA ligase (ADP-forming) was reacted with enough dimethyl suberimidate to cross-link completely the α and β polypeptides among themselves but not enough to produce a high yield of the completely cross-linked product.175 The covalent αβ heterodimer was by far the major product of this incomplete reaction. A reasonable yield of the covalent heterotetramer, (αβ)₂, was also produced during the reaction, but little or no covalent trimer, covalent αα homodimer, or covalent ββ homodimer was seen (Figure 8–22).
This result demonstrates that the formation of a cross-link between one folded α polypeptide and one folded β polypeptide in native succinate–CoA ligase (ADP-forming) is a far more likely event than the formation of a cross-link between either two α polypeptides or two β polypeptides, and it suggests that α and β polypeptides are more intimately associated with each other than are α polypeptides with α polypeptides or β polypeptides with β polypeptides. This conclusion has been validated by the crystallographic molecular model.177 Therefore, the proper designation for the arrangement of the subunits in the native, un-cross-linked protein is (αβ)₂.
Figure 8–22: Cross-linking of succinate–CoA ligase (ADP-forming) from E. coli.175 A solution of succinate–CoA ligase (1.0 mg mL⁻¹) was cross-linked with dimethyl suberimidate (2.0 mg mL⁻¹) for 30 min. The resulting covalent complexes were dissolved in a solution of sodium dodecyl sulfate and submitted to electrophoresis. (A) Un-cross-linked control showing the α and β polypeptides from which the enzyme is composed. (B) Cross-linked product. The components observed were assigned as the covalent αβ heterodimer, the covalent heterotrimer, and the covalent (αβ)₂ heterotetramer on the basis of their apparent lengths (numbers to the left of each gel) determined by their mobilities on electrophoresis relative to a set of polypeptides of known length. Reprinted with permission from ref 175. Copyright 1975 Journal of Biological Chemistry.
The yield of heterotrimer on the gel in Figure 8–22 is significantly lower than the yield of heterotetramer. A similar disparity among the products of a partial cross-linking reaction was seen when several tetrameric proteins, each composed of four identical subunits, were examined.178 L-Lactate dehydrogenase, pyruvate kinase, fructose-bisphosphate aldolase, fumarate hydratase, and catalase were each partially cross-linked with various dimethyl bisimidates. In each case, the yield of the covalent trimer was 2–6-fold lower than that of either the covalent dimer or the covalent tetramer. This result is reminiscent of the one seen with succinate–CoA ligase (ADP-forming), and its explanation is the same. Within
each of these molecules of protein there must be associations between some pairs of α subunits that are more intimate than the associations between other pairs of α subunits. The results suggest that the proper designation for the arrangement of the subunits in each of these proteins is (α₂)₂. To understand why this is so, the rules governing the evolution of oligomeric proteins must be understood. These rules are based on rotational axes of symmetry.
Suggested Reading
Davies, G.E., & Stark, G.R. (1970) Use of dimethyl suberimidate, a cross-linking agent, in studying the subunit structure of oligomeric proteins, Proc. Natl. Acad. Sci. U.S.A. 66, 651–656.
Day, P.J., Murray, I.A., & Shaw, W.V. (1995) Properties of hybrid active sites in oligomeric proteins: kinetic and ligand binding studies with chloramphenicol acetyltransferase trimers, Biochemistry 34, 6416–6422.
Problem 8–16: Assume that each pure hybrid of fructose-bisphosphate aldolase shown in Figure 8–18 (ref 76) was dissociated completely and reassociated at random during the experiment described in the figure. Refer to the two types of subunits as α and γ.
(A) What is the respective subunit structure of each of the five purified hybrids?
(B) Assume the subunits reassemble at random by the binomial formula a^4 + 4a^3c + 6a^2c^2 + 4ac^3 + c^4, where a is the fraction of the dissociated subunits that are α and c is the fraction that are γ and predict the ratio of components expected from the reassociation of each of the five dissociated hybrids.
Problem 8–17: The following pictures are of polyacrylamide gels cast in 0.1% sodium dodecyl sulfate on which proteins unfolded with dodecyl sulfate were submitted to electrophoresis.159
[Gels A, B, and C: images not reproduced.]
In all three experiments, the native proteins (at concentrations of 0.03–0.2 mM in subunit) were first treated with dimethyl suberimidate (at concentrations of 3–7 mM) before they were unfolded with the dodecyl sulfate. In each experiment, the upper arrow marks the top of the gel, and the lower arrow marks the stained band corresponding to the single polypeptide observed when treatment with dimethyl suberimidate was omitted.
(A) By drawing a graph for each of the gels, A, B, and C, show that none of the products of the reaction between the respective protein and dimethyl suberimidate were overlooked.
(B) What is the stoichiometry of the subunits of the protein run on gel A?
(C) What is the stoichiometry of the subunits of the protein run on gel B?
(D) What is the stoichiometry of the subunits of the protein run on gel C?
Assume in making all of these assignments that proteins A, B, and C are homooligomers.
(E) In making these assignments, you have ignored the minor bands of lower mobility seen in gels A and C. If you are correct in ignoring these bands, what should have happened to these bands if the proteins had been more dilute in concentration when they were treated with the same concentrations of dimethyl suberimidate? Why?
References
1. Tanford, C. (1961) Physical Chemistry of Macromolecules, pp 180–316, Wiley, New York.
2. Eisenberg, H. (1976) Biological Macromolecules and Polyelectrolytes in Solution, Vol. 6, pp 100–153, Clarendon Press, Oxford, England.
3. Moon, Y.U., Anderson, C.O., Blanch, H.W., & Prausnitz, J.M. (2000) Fluid Phase Equilib. 168, 229–239.
4. McMillan, W.G., & Mayer, J.E. (1945) J. Chem. Phys. 13, 276–305.
5. Edelhoch, H. (1967) Biochemistry 6, 1948–1954.
6. Gornall, A.G., Bardawill, C.J., & David, M.M. (1948) J. Biol. Chem. 177, 751–766.
7. Moczydlowski, E.G., & Fortes, P.A. (1981) J. Biol. Chem. 256, 2346–2356.
8. Scatchard, G., Batchelder, A.C., & Brown, A. (1946) J. Am. Chem. Soc. 68, 2320–2331.
9. Van Holde, K.E. (1985) Physical Biochemistry, 2nd ed., Prentice-Hall, Englewood Cliffs, NJ.
10. Eisenberg, H. (1981) Q. Rev. Biophys. 14, 141–172.
11. Eisenberg, H. (1994) Biophys. Chem. 53, 57–68.
12. Schachman, H.K., & Edelstein, S.J. (1966) Biochemistry 5, 2681–2705.
13. Cohen, G., & Eisenberg, H. (1968) Biopolymers 6, 1077–1100.
14. Reisler, E., & Eisenberg, H. (1969) Biochemistry 8, 4572–4578.
15. Becerra, S.P., Kumar, A., Lewis, M.S., Widen, S.G., Abbotts, J., Karawya, E.M., Hughes, S.H., Shiloach, J., Wilson, S.H., & Lewis, M.S. (1991) Biochemistry 30, 11707–11719.
16. Eisenberg, H. (2000) Biophys. Chem. 88, 1–9.
17. Kharakoz, D.P. (1997) Biochemistry 36, 10276–10285.
18. Cohn, E.J., & Edsall, J.T. (1943) Proteins, Amino Acids and Peptides as Ions and Dipolar Ions, pp 374–381, Reinhold Publishing Corporation, New York.
19. Brown, J.R. (1976) Fed. Proc. 35, 2141–2144.
20. Reynolds, C.M., & Poole, L.B. (2001) Biochemistry 40, 3912–3919.
21. Velten, M., Villoutreix, B.O., & Ladjimi, M.M. (2000) Biochemistry 39, 307–315.
22. Golbik, R., Naumann, M., Otto, A., Muller, E., Behlke, J., Reuter, R., Hubner, G., & Kriegel, T.M. (2001) Biochemistry 40, 1083–1090.
23. LaRonde-LeBlanc, N., & Wolberger, C. (2000) Biochemistry 39, 11593–11601.
24. Ebel, C., Eisenberg, H., & Ghirlando, R. (2000) Biophys. J. 78, 385–393.
25. Beernink, H.T., & Morrical, S.W. (1998) Biochemistry 37, 5673–5681.
26. Xia, J., Sinclair, J.F., Baldwin, T.O., & Lindahl, P.A. (1996) Biochemistry 35, 1965–1971.
27. Zondlo, J., Fisher, K.E., Lin, Z., Ducote, K.R., & Eisenstein, E. (1995) Biochemistry 34, 10334–10339.
28. Robertson, J.G. (1995) Biochemistry 34, 7533–7541.
29. Mechanic, L.E., Hall, M.C., & Matson, S.W. (1999) J. Biol. Chem. 274, 12488–12498.
30. Ratcliff, G.C., & Erie, D.A. (2001) J. Am. Chem. Soc. 123, 5632–5635.
31. Titus, G.P., Mueller, H.A., Burgner, J., Rodriguez De Cordoba, S., Penalva, M.A., & Timm, D.E. (2000) Nat. Struct. Biol. 7, 542–546.
32. Lerman, J.C., Robblee, J., Fairman, R., & Hughson, F.M. (2000) Biochemistry 39, 8470–8479.
33. Tennyson, R.B., & Lindsley, J.E. (1997) Biochemistry 36, 6107–6114.
34. Musatov, A., & Robinson, N.C. (1994) Biochemistry 33, 13005–13012.
35. Debye, P. (1947) J. Phys. Chem. 51, 18–32.
36. Einstein, A. (1910) Ann. Phys. (Berlin) 33, 1275–1298.
37. Casassa, E.F., & Eisenberg, H. (1964) Adv. Protein Chem. 19, 287–395.
38. Zimm, B.H.D. (1948) J. Chem. Phys. 16, 1099–1116.
39. Dandliker, W.B. (1954) J. Am. Chem. Soc. 76, 6036–6039.
40. Bryan, J.K. (1977) Anal. Biochem. 78, 513–519.
41. Edsall, J.T., Edelhoch, H., Lontie, R., & Morrison, P.R. (1950) J. Am. Chem. Soc. 72, 4641–4656.
42. Fenn, J.B., Mann, M., Meng, C.K., Wong, S.F., & Whitehouse, C.M. (1989) Science 246, 64–71.
43. Kathmann, E.C., Naylor, S., & Lipsky, J.J. (2000) Biochemistry 39, 11170–11176.
44. Ronk, M., Shively, J.E., Shute, E.A., & Blake, R.C. (1991) Biochemistry 30, 9435–9442.
45. Ullah, J.H., Walsh, T.R., Taylor, I.A., Emery, D.C., Verma, C.S., Gamblin, S.J., & Spencer, J. (1998) J. Mol. Biol. 284, 125–136.
46. Musatov, A., & Robinson, N.C. (1994) Biochemistry 33, 10561–10567.
47. Yang, G., Sandalova, T., Lohman, K., Lindqvist, Y., & Rendina, A.R. (1997) Biochemistry 36, 4751–4760.
48. Fitzgerald, M.C., Chernushevich, I., Standing, K.G., Whitman, C.P., & Kent, S.B. (1996) Proc. Natl. Acad. Sci. U.S.A. 93, 6851–6856.
49. Whitelegge, J.P., le Coutre, J., Lee, J.C., Engel, C.K., Prive, G.G., Faull, K.F., & Kaback, H.R. (1999) Proc. Natl. Acad. Sci. U.S.A. 96, 10695–10698.
50. Dayhoff, M.O. (1978) Atlas of Protein Sequence and Structure, Vol. 5, Suppl. 3, National Biomedical Research Foundation, Silver Spring, MD.
51. Edsall, J.T. (1953) in The Proteins, Chemistry, Biological Activity, and Methods (Neurath, H., & Bailey, K. E., Eds.) Vol. I, pp 549–726, Academic Press, New York.
52. Dayhoff, M.O. (1972) Atlas of Protein Sequence and Structure, Vol. 5, National Biomedical Research Foundation, Silver Spring, MD.
53. Jaenicke, R., & Knof, S. (1968) Eur. J. Biochem. 4, 157–163.
54. Titani, K., Koide, A., Ericsson, L.H., Kumar, S., Hermann, J., Wade, R.D., Walsh, K.A., Neurath, H., & Fischer, E.H. (1978) Biochemistry 17, 5680–5693.
55. Seery, V.L., Fischer, E.H., & Teller, D.C. (1967) Biochemistry 6, 3315–3327.
56. Dandliker, W.B., & Fox, J.B. (1955) J. Biol. Chem. 214, 275–283.
57. Schroeder, W.A., Shelton, J.R., Shelton, J.B., Robberson, B., & Apell, G. (1969) Arch. Biochem. Biophys. 131, 653–655.
58. Samejima, T., & Yang, J.T. (1963) J. Biol. Chem. 238, 3256–3261.
59. Murthy, M.R., Reid, T.J., Sicignano, A., Tanaka, N., & Rossmann, M.G. (1981) J. Mol. Biol. 152, 465–499.
60. Stellwagen, E., & Schachman, H.K. (1962) Biochemistry 1, 1056–1068.
61. Kawahara, K., & Tanford, C. (1966) Biochemistry 5, 1578–1584.
62. Richter, G.W., & Walker, G.F. (1967) Biochemistry 6, 2871–2880.
63. Boyd, D., Vecoli, C., Belcher, D.M., Jain, S.K., & Drysdale, J.W. (1985) J. Biol. Chem. 260, 11755–11761.
64. Gerhart, J.C., & Schachman, H.K. (1965) Biochemistry 4, 1054–1062.
65. Hoover, T.A., Roof, W.D., Foltermann, K.F., O’Donovan, G.A., Bencini, D.A., & Wild, J.R. (1983) Proc. Natl. Acad. Sci. U.S.A. 80, 2462–2466.
66. Hammerstedt, R.H., Meohler, H., Decker, K.A., & Wood, W.A. (1971) J. Biol. Chem. 246, 2069–2074.
67. Falcoz-Kelly, F., Janin, J., Saari, J.C., Vaeron, M., Truffa-Bachi, P., & Cohen, G.N. (1972) Eur. J. Biochem. 28, 507–519.
68. Katinka, M., Cossart, P., Sibilli, L., Saint-Girons, I., Chalvignac, M.A., Le Bras, G., Cohen, G.N., & Yaniv, M. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 5730–5733.
69. Cassman, M., & Schachman, H.K. (1971) Biochemistry 10, 1015–1024.
70. Moon, K., & Smith, E.L. (1973) J. Biol. Chem. 248, 3082–3088.
71. Eisenberg, H., & Tomkins, G.M. (1968) J. Mol. Biol. 31, 37–49.
72. Johnson, M.S., & Overington, J.P. (1993) J. Mol. Biol. 233, 716–738.
73. Power, M.D., Marx, P.A., Bryant, M.L., Gardner, M.B., Barr, P.J., & Luciw, P.A. (1986) Science 231, 1567–1572.
74. Marth, J.D., Peet, R., Krebs, E.G., & Perlmutter, R.M. (1985) Cell 43, 393–404.
75. Rosenbusch, J.P., & Weber, K. (1971) J. Biol. Chem. 246, 1644–1657.
76. Penhoet, E., Kochman, M., Valentine, R., & Rutter, W.J. 111. Hedrick, J.L., & Smith, A.J. (1968) Arch. Biochem.
(1967) Biochemistry 6, 2940–2949. Biophys. 126, 155–164.
77. Bull, H.B., & Currie, B.T. (1946) J. Am. Chem. Soc. 68, 112. Tanford, C. (1968) Adv. Protein Chem. 23, 121–282.
742–745. 113. Fish, W.W., Mann, K.G., & Tanford, C. (1969) J. Biol.
78. Tanford, C., Swanson, S.A., & Shore, W.S. (1955) J. Am. Chem. 244, 4989–4994.
Chem. Soc. 77, 6414–6421. 114. Kumagai, I., Pieler, T., Subramanian, A.R., & Erdmann,
79. Guidotti, G. (1967) J. Biol. Chem. 242, 3694–3703. V.A. (1982) J. Biol. Chem. 257, 12924–12928.
80. Truffa-Bachi, P., Van Rapenbusch, R., Janin, J., Gros, C., 115. Shapiro, A.L., Viñuela, E., & Maizel, J.V. (1967) Biochem.
& Cohen, G.N. (1968) Eur. J. Biochem. 5, 73–80. Biophys. Res. Commun. 28, 815–820.
81. Halwer, M., Nutting, G.C., & Brice, B.A. (1951) J. Am. 116. Lumpkin, O.J. (1982) Biopolymers 21, 2315–2316.
Chem. Soc. 73, 2786–2790. 117. Lumpkin, O.J., Daejardin, P., & Zimm, B.H. (1985)
82. Tanford, C. (1980) The Hydrophobic Effect: Formation Biopolymers 24, 1573–1593.
of Micelles and Biological Membranes, 2nd ed., Wiley- 118. Williams, J.G., & Gratzer, W.B. (1971) J. Chromatogr. 57,
Interscience, New York. 121–125.
83. Mysels, K.J., & Princen, L.H. (1959) J. Phys. Chem. 63, 119. Swank, R.T., & Munkres, K.D. (1971) Anal. Biochem. 39,
1696–1700. 462–477.
84. Burgess, R.R. (1969) J. Biol. Chem. 244, 6168–6176. 120. Lindstrom, J., Cooper, J., & Tzartos, S. (1980)
85. Pitt-Rivers, R., & Impiombato, F.S. (1968) Biochem. J. Biochemistry 19, 1454–1458.
109, 825–830. 121. Weber, K., Pringle, J.R., & Osborn, M. (1972) Methods
86. Montserret, R., McLeish, M.J., Bockmann, A., Enzymol. 26 (Part C), 3–27.
Geourjon, C., & Penin, F. (2000) Biochemistry 39, 122. Henderson, E.J., & Zalkin, H. (1971) J. Biol. Chem. 246,
8362–8373. 6891–6898.
87. Weinreb, P.H., Zhen, W., Poon, A.W., Conway, K.A., & 123. Doolittle, R.F. (1981) Science 214, 149–159.
Lansbury, P.T., Jr. (1996) Biochemistry 35, 124. Ingram, V.M. (1957) Nature 180, 326–328.
13709–13715. 125. Pauling, L., Itano, H.A., Singer, S.J., & Wells, I.C. (1949)
88. Peterson, G.L., & Hokin, L.E. (1981) J. Biol. Chem. 256, Science 110, 543–548.
3751–3761. 126. Ingram, V.M. (1958) Biochim. Biophys. Acta 28,
89. Reynolds, J.A., & Tanford, C. (1970) J. Biol. Chem. 245, 539–545.
5161–5165. 127. Vandekerckhove, J., & Weber, K. (1978) Proc. Natl.
90. Shirahama, K., Tsujii, K., & Takagi, T. (1974) J. Biochem. Acad. Sci. U.S.A. 75, 1106–1110.
(Tokyo) 75, 309–319. 128. Vandekerckhove, J., & Weber, K. (1978) Eur. J. Biochem.
91. Fisher, M.P., & Dingman, C.W. (1971) Biochemistry 10, 90, 451–462.
1895–1899. 129. Hubert, J.J., Schenk, D.B., Skelly, H., & Leffert, H.L.
92. Banker, G.A., & Cotman, C.W. (1972) J. Biol. Chem. 247, (1986) Biochemistry 25, 4156–4163.
5856–5861. 130. Cleveland, D.W., Fischer, S.G., Kirschner, M.W., &
93. Weber, K., & Osborn, M. (1969) J. Biol. Chem. 244, Laemmli, U.K. (1977) J. Biol. Chem. 252, 1102–1106.
4406–4412. 131. Fullmer, C.S., & Wasserman, R.H. (1979) J. Biol. Chem.
94. Davis, B.J. (1964) Ann. N.Y. Acad. Sci. 121, 404–427. 254, 7208–7212.
95. Ornstein, L. (1964) Ann. N.Y. Acad. Sci. 121, 321–349. 132. Shevchenko, A., Wilm, M., Vorm, O., & Mann, M. (1996)
96. Laemmli, U.K. (1970) Nature 227, 680–685. Anal. Chem. 68, 850–858.
97. Kyte, J., & Rodriguez, H. (1983) Anal. Biochem. 133, 133. Zhang, X., Herring, C.J., Romano, P.R., Szczepanowska,
515–522. J., Brzeska, H., Hinnebusch, A.G., & Qin, J. (1998) Anal.
98. Rodbard, D., & Chrambach, A. (1970) Proc. Natl. Acad. Chem. 70, 2050–2059.
Sci. U.S.A. 65, 970–977. 134. Dreger, M., Otto, H., Neubauer, G., Mann, M., & Hucho,
99. Cornfield, J., & Chalkley, H.W. (1951) J. Wash. Acad. Sci. F. (1999) Biochemistry 38, 9426–9434.
41, 226–229. 135. Lindstrom, J., Merlie, J., & Yogeeswaran, G. (1979)
100. Fawcett, J.S., & Morris, C.J.O.R. (1966) Sep. Sci. 1, 9–26. Biochemistry 18, 4465–4470.
101. Laurent, T., & Killander, J. (1964) J. Chromatogr. 14, 136. Luna, E.J., Kidd, G.H., & Branton, D. (1979) J. Biol.
317–330. Chem. 254, 2526–2532.
102. Morris, C.J.O.R. (1966) Protides Biol. Fluids 14, 543–551. 137. Schuppan, D., Cantaluppi, M.C., Becker, J., Veit, A.,
103. Lee, B., & Richards, F.M. (1971) J. Mol. Biol. 55, 379–400. Bunte, T., Troyer, D., Schuppan, F., Schmid, M.,
104. Andrews, P. (1965) Biochem. J. 96, 595–606. Ackermann, R., & Hahn, E.G. (1990) J. Biol. Chem. 265,
105. Ogston, A.G. (1958) Trans. Faraday Soc. 54, 1754–1757. 8823–8832.
106. Cohn, E.J., McMeekin, T.L., Edsall, J.T., & Blanchard, 138. Grant, G.A., & Bradshaw, R.A. (1978) J. Biol. Chem. 253,
M.H. (1934) J. Am. Chem. Soc. 56, 784–794. 2727–2731.
107. Sober, H.A. (1970) CRC Handbook of Biochemistry, CRC 139. Kern, D., Potier, S., Boulanger, Y., & Lapointe, J. (1979)
Press, Cleveland, OH. J. Biol. Chem. 254, 518–524.
108. Ackers, G.K. (1964) Biochemistry 3, 723–730. 140. Lundell, D.J., & Howard, J.B. (1978) J. Biol. Chem. 253,
109. Siegel, L.M., & Monty, K.J. (1966) Biochim. Biophys. 3422–3426.
Acta 112, 346–362. 141. Green, N.M., Valentine, R.C., Wrigley, N.G., Ahmad, F.,
110. Potschka, M., Nave, R., Weber, K., & Geisler, N. (1990) Jacobson, B., & Wood, H.G. (1972) J. Biol. Chem. 247,
Eur. J. Biochem. 190, 503–508. 6284–6298.
References 449
142. Zwolinski, G.K., Bowien, B.U., Harmon, F., & Wood, 160. Lambert, J.M., Jue, R., & Traut, R.R. (1978) Biochemistry
H.G. (1977) Biochemistry 16, 4627–4637. 17, 5406–5416.
143. James, G.T., & Notmann, E.A. (1973) J. Biol. Chem. 248, 161. Thevenin, B.J., Shahrokh, Z., Williard, R.L., Fujimoto,
730–737. E.K., Kang, J.J., Ikemoto, N., & Shohet, S.B. (1992) Eur.
144. Hall, C.L., & Kamin, H. (1975) J. Biol. Chem. 250, J. Biochem. 206, 471–477.
3476–3486. 162. Brown, K.C., Yu, Z., Burlingame, A.L., & Craik, C.S.
145. McKean, M.C., Beckmann, J.D., & Frerman, F.E. (1983) (1998) Biochemistry 37, 4397–4406.
J. Biol. Chem. 258, 1866–1870. 163. Carlsson, J., Drevin, H., & Axen, R. (1978) Biochem. J.
146. Chuang, M., Ahmad, F., Jacobson, B., & Wood, H.G. 173, 723–737.
(1975) Biochemistry 14, 1611–1619. 164. Itoh, Y., Cai, K., & Khorana, H.G. (2001) Proc. Natl.
147. Recsei, P.A., Huynh, Q.K., & Snell, E.E. (1983) Proc. Acad. Sci. U.S.A. 98, 4883–4887.
Natl. Acad. Sci. U.S.A. 80, 973–977. 165. Thorner, J.W., & Paulus, H. (1971) J. Biol. Chem. 246,
148. van Poelje, P.D., & Snell, E.E. (1990) Biochemistry 29, 3885–3894.
132–139. 166. Azem, A., Weiss, C., & Goloubinoff, P. (1998) Methods
149. Baldwin, E.T., Bhat, T.N., Gulnik, S., Hosur, M.V., Enzymol. 290, 253–268.
Sowder, R.C.n., Cachau, R.E., Collins, J., Silva, A.M., & 167. Hermann, R., Rubolph, R., & Jaenicke, R. (1979) Nature
Erickson, J.W. (1993) Proc. Natl. Acad. Sci. U.S.A. 90, 277, 243–245.
6796–6800. 168. Hermann, R., Jaenicke, R., & Rudolph, R. (1981)
150. Olson, T.S., Bamberger, M.J., & Lane, M.D. (1988) J. Biochemistry 20, 5195–5201.
Biol. Chem. 263, 7342–7351. 169. Craig, W.S. (1982) Biochemistry 21, 2667–2674.
151. Guchhait, R.B., Zwergel, E.E., & Lane, M.D. (1974) J. 170. Canals, F. (1992) Biochemistry 31, 4493–4501.
Biol. Chem. 249, 4776–4780. 171. White, M.F., Vasquez, J., Yang, S.F., & Kirsch, J.F. (1994)
152. Guchhait, R.B., Polakis, S.E., Dimroth, P., Stoll, E., Proc. Natl. Acad. Sci. U.S.A. 91, 12428–12432.
Moss, J., & Lane, M.D. (1974) J. Biol. Chem. 249, 172. Azem, A., Kessel, M., & Goloubinoff, P. (1994) Science
6633–6645. 265, 653–656.
153. Song, C.S., & Kim, K.H. (1981) J. Biol. Chem. 256, 173. Craig, W.S. (1982) Biochemistry 21, 5707–5717.
7786–7788. 174. Burns, D.L., & Schachman, H.K. (1982) J. Biol. Chem.
154. Robertson, D.C., Hammerstedt, R.H., & Wood, W.A. 257, 8638–8647.
(1971) J. Biol. Chem. 246, 2073–2081. 175. Teherani, J.A., & Nishimura, J.S. (1975) J. Biol. Chem.
155. Day, P.J., Murray, I.A., & Shaw, W.V. (1995) 250, 3883–3890.
Biochemistry 34, 6416–6422. 176. Craig, W.S., & Kyte, J. (1980) J. Biol. Chem. 255,
156. Meighen, E.A., & Schachman, H.K. (1970) Biochemistry 6262–6269.
9, 1163–1176. 177. Fraser, M.E., James, M.N., Bridger, W.A., & Wolodko,
157. Cooper, E., Couturier, S., & Ballivet, M. (1991) Nature W.T. (1999) J. Mol. Biol. 285, 1633–1653.
350, 235–238. 178. Hucho, F., Meullner, H., & Sund, H. (1975) Eur. J.
158. Singer, S.J. (1959) Nature 183, 1523–1524. Biochem. 59, 79–87.
159. Davies, G.E., & Stark, G.R. (1970) Proc. Natl. Acad. Sci.
U.S.A. 66, 651–656.
Chapter 9
Symmetry
The arrangement in space of the subunits in a multimeric protein is its quaternary structure. Many multimeric proteins are composed of multiple copies of only one particular subunit. These subunits, each necessarily identical to the others when free in solution, combine together to form the final molecule of protein. In such homomultimeric proteins, each of the subunits in the crystallographic molecular model can be formally designated a protomer1 of the final overall structure. A protomer of a multimeric protein is the smallest portion of that protein from copies of which its entire quaternary structure is created. Consequently, all of the protomers in the protein must be the same.

Some multimeric proteins, like the isoenzymatic hybrids of fructose-bisphosphate aldolase (Figure 8–18), are composed of two or more distinct subunits that are, nevertheless, each the offspring of the same common ancestor. Although different in amino acid sequence and in the atomic details of their tertiary structure, these subunits are still related closely enough to participate together to form the complete heteromultimeric protein, much as identical protomers would participate in a homomultimeric protein. In such instances, each of the individual subunits, although actually different, can be considered at low resolution to be one of the indistinguishable protomers forming the overall structure. In contrast to the hybrids formed by the isoenzymatic subunits of fructose-bisphosphate aldolase, other proteins built from such similar but distinct subunits usually incorporate those subunits in unvarying ratios. For example, hemoglobin is always an (αβ)₂ tetramer and acetylcholine receptor is always an α₂βγδ pentamer, even though the α and β subunits of hemoglobin or the α, β, γ, and δ subunits of acetylcholine receptor are respectively homologous to each other and necessarily superposable. These exclusive stoichiometries are established by the distinct atomic interactions between the subunits that take place as the multimer assembles.

Some multimeric proteins contain two or more dissimilar, unrelated subunits. For example, aspartate carbamoyltransferase contains α subunits and β subunits that are unrelated to each other. Even though the subunits composing such a protein are completely different, the final structure produced, when observed as a crystallographic molecular model, can often be divided formally into identical protomers, each containing one copy of each of the different subunits. A particular number of these heterooligomeric protomers arranged in an array unique to that protein makes up its quaternary structure. Because every protomer is identical to all the others, the arrangement of the protomers of such a heteromultimeric protein is formally equivalent to the arrangement of the subunits of a homomultimeric protein.

Each of the multimeric proteins discussed so far can be divided formally into a set of identical protomers or almost identical subunits. Aside from a few peculiar exceptions, the rules that govern the way these protomers or these subunits are arranged in space to produce the complete molecular structure of the entire protein seem to be the same. In an oligomeric protein formed from a fixed number of identical protomers, those protomers are arrayed around rotational axes of symmetry. In an oligomeric protein formed from a fixed number of homologous but nonidentical subunits, those subunits are arrayed around rotational axes of pseudosymmetry. In a polymeric protein with an indefinite number of identical protomers or homologous subunits, those protomers or those subunits are arrayed around a screw axis of symmetry or a screw axis of pseudosymmetry, respectively.

There is also a set of multimeric proteins composed of nonidentical subunits that are assembled haphazardly with no regard, or sometimes a slight regard, for symmetry. In these proteins, the various subunits are associated with each other by interfaces that, other than their lack of symmetry, are almost indistinguishable in their details from those holding together symmetric proteins. Such asymmetric, heteromultimeric proteins are much more likely to assemble and disassemble under different situations, and such alternations in quaternary structure are often involved intimately in their function. There are also members of this set of heteromultimers, however, that have stable quaternary structures. Unlike symmetric homomultimeric proteins and symmetric heteromultimeric proteins, asymmetric heteromultimeric proteins seem to be cobbled together in the absence of any set of rules.

Rotational and Screw Axes of Symmetry

The fundamental symmetry operations that are available to asymmetric objects such as the protomers of a protein when they assemble into a homomultimeric
structure are those around rotational axes and screw axes. If a protein is constructed with rotational symmetry, only a finite number of protomers produce the final structure. If a protein is assembled with screw symmetry, a potentially infinite number of protomers usually can combine to produce a polymer of indefinite length. In a few isolated instances observed so far, a protein, although it is assembled from subunits arranged with screw symmetry, is nevertheless forced to have only a finite number of protomers in the final structure. Such finite structures assembled with screw symmetry are thought to be very rare.

Consider two proteins that illustrate the observation that multimeric proteins constructed from identical protomers are assembled around rotational or screw axes of symmetry. The proteins are malate dehydrogenase from Aquaspirillum arcticum (Figure 9–1A),2 a dimer built from two identical protomers that are each folded polypeptides 329 amino acids in length, and actin (Figure 9–1B,C),3–6 a protein that forms helical polymeric fibers of indefinite length with each fiber built from many identical protomers, each of which is the same folded polypeptide 375 amino acids in length.4

A rotational axis of symmetry is a line about which a structure can be rotated by 360°/n, where n is an integer larger than 1, to superpose upon itself. An exact 2-fold rotational axis of symmetry runs through the center of the α₂ dimer in the crystallographic molecular model of malate dehydrogenase. If the two subunits of malate dehydrogenase, still held together by the same interface, had been distorted intramolecularly by the structure of the protein itself into significantly different conformations or if they had had different sequences but were nevertheless homologous to each other, a rotational axis of pseudosymmetry would superpose one of them upon the other. A rotational axis of pseudosymmetry is a line about which the structure of a protein can be rotated by 360°/n to superpose upon each other subunits with identical sequence but significantly different conformation, subunits of different sequence but the same common fold, or distinct but homologous domains.

The value of n is the fold of the symmetry. The 2-fold rotational axis of symmetry in the crystallographic molecular model of malate dehydrogenase is a line perpendicular to the plane of the page in Figure 9–1A passing through the center of the molecule of protein. If the image of the upper protomer in the figure is rotated 180° about this axis, it superposes exactly on the lower protomer. Because of this rotational axis of symmetry, the two protomers in the protein are indistinguishable.

Through the center of the indefinitely long helical polymer of actin, of which only a segment is drawn in Figure 9–1B, runs a screw axis of symmetry. A screw axis of symmetry is a line, passing through a structure, about which the structure can be rotated by an angle between –180° and 180° and along which the structure can be simultaneously translated to superpose upon itself. In the molecular model of the actin filament, the screw axis of symmetry is a vertical line parallel to the plane of the page. If the image of any protomer is transposed by being rotated –166° (left-handed) around this axis while it is lifted 2.8 nm in a direction parallel to it, it superposes on the next protomer in the helical polymer. Because this operation can be repeated indefinitely, all of the protomers are indistinguishable from each other except by their place in line.

A rotational axis of symmetry (Figure 9–1A) is necessarily 2-fold, 3-fold, 4-fold, and so forth. This requirement arises from the fact that as one of the images of a protomer is being rotated to superpose it on its neighbor, all of the images of its neighbors are also simultaneously being rotated (Figure 9–1A). When the rotation is completed, each of the images must superpose on its respective partner. This can be accomplished only if the protomers are arrayed about the axis at angles to each other that are integral quotients of 360° (360°/n). The integer defines the number of times the rotation can be accomplished before returning to the beginning. The rotational axis of symmetry within malate dehydrogenase is a 2-fold rotational axis of symmetry. After two superpositions, the original locations are regained. For the rotation to superpose the images of all protomers simultaneously each time, no translation along the axis can occur.

Screw axes of symmetry are defined by a rotation through any angle and a translation of any distance, and they are designated as left-handed or right-handed by the same rules as apply to α helices. If the translation is not zero, a screw axis of symmetry produces a helical array. By the principle that the majority rules, right-handed screws are given a positive sign and left-handed screws are given a negative sign. For example, one of the screw axes of symmetry in the actin polymer, the left-handed one, has a rotational angle of –166°. Trivially, it is also possible to generate an actin polymer with a right-handed screw axis of +194° but, less trivially, by two coaxial right-handed screw axes of +28°. A helical array always comes with such a set of different coaxial screw axes. A rotational axis of symmetry is simply a special case of a screw axis of symmetry where the translation is zero and the rotational angle is required to assume values that are integral quotients of 360°.

A screw axis of symmetry, other than the special case of a rotational axis of symmetry, in which translation cannot occur, does not require the angular steps to be integral quotients of 360°. As the image of one of the protomers is rotating and rising, the point of superposition need not occur at any particular angular disposition along the helix produced by the screw axis. There is, however, one requirement that limits the angles and translations permitted a screw axis of symmetry. Its operation is most readily observed by comparing the crystallographic molecular models of the protomer of protocatechuate 3,4-dioxygenase from Pseudomonas putida (Figure

Figure 9–1: α-Carbon diagrams drawn from crystallographic molecular models of malate dehydrogenase from A. arcticum2 and actin from Oryctolagus cuniculus.3 (A) Malate dehydrogenase is constructed from two subunits, one drawn with thicker line segments than the other. The view is down a crystallographic 2-fold rotational axis of symmetry in the space group P2₁2₁2 of the array. This drawing was produced with MolScript.485 (B) A model of the actin filament was constructed by placing individual actin monomers, a crystallographic molecular model for which is available,4 in positions and orientations indicated by the map of electron density calculated from an X-ray fiber diffraction pattern of a gel of oriented actin filaments.3 The orientation and position of the actin monomer that were assigned in this way are in agreement with the orientation and position of the protein encoded by the mreB gene of Escherichia coli, a homologue of actin, within its filament, which serendipitously happens to be present in a crystal of this protein.6 There are eight actin monomers in the segment of the filament displayed in this figure. The circles within each monomer designate the location of a bound Ca²⁺ ion. This drawing was produced with MolScript.485 The atomic coordinates on which the drawing is based were provided by Ken Holmes. (C) A low-resolution molecular model of the actin filament, calculated by image reconstruction of electron micrographs of ordered actin filaments,5 is included, at the same scale, to illustrate more clearly the packing of the monomers along the helix. Reprinted with permission from ref 5. Copyright 1983 Academic Press.
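
The two kinds of operation illustrated in Figure 9–1 can be checked numerically. The following Python sketch is only an illustration and not a procedure taken from the text. It places the axis of symmetry along z, represents a protomer by a single hypothetical reference point 2.5 nm from that axis, and uses the values quoted for actin (–166° and 2.8 nm per protomer). The function names and the reference point are assumptions made for the example. The last check rests on the reading that two successive steps of –166° amount to one right-handed step of +28° with a rise of 5.6 nm.

```python
import math

def screw(point, angle_deg, rise_nm):
    """Rotate a point about the z axis by angle_deg and translate it along z by rise_nm."""
    x, y, z = point
    t = math.radians(angle_deg)
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t),
            z + rise_nm)

def close(p, q, tol=1e-9):
    """True if two points coincide within a small numerical tolerance."""
    return all(abs(a - b) < tol for a, b in zip(p, q))

# Hypothetical reference point on one protomer (illustrative value only).
start = (2.5, 0.0, 0.0)

# Rotational axis: the translation is zero and the angle is 360/n,
# so n operations (here n = 2) return the image to its original location.
p = start
for _ in range(2):
    p = screw(p, 180.0, 0.0)
two_fold_closes = close(p, start)

# Screw axis of the actin filament: -166 degrees and 2.8 nm per protomer.
filament = [start]
for _ in range(7):
    filament.append(screw(filament[-1], -166.0, 2.8))

# The same single step described as a right-handed rotation of +194 degrees.
same_step = close(screw(start, -166.0, 2.8), screw(start, 194.0, 2.8))

# Two successive steps coincide with one step of +28 degrees and 5.6 nm.
two_steps = close(filament[2], screw(start, 28.0, 5.6))

print(two_fold_closes, same_step, two_steps)
```

Eight points are generated to match the eight monomers drawn in Figure 9–1B. Because the rise is not zero, no number of repetitions of the screw operation returns a point to its starting location, which is why the array is open.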

tides of different sequence, but they are clearly homologous to each other because they share a common fold. Through the center of the αβ heterodimer of protocatechuate 3,4-dioxygenase (Figure 9–2A) there runs a screw axis of pseudosymmetry. It is a horizontal line parallel to the plane of the page. If the image of the upper, β subunit is transposed by being rotated +169.2° around this axis as it is simultaneously shifted to the right 0.674 nm in a direction parallel to the axis, it superposes on the lower, α subunit.8

A helical polymer of actin can be constructed by placing one protomer upon an origin, properly oriented with respect to the axis of symmetry; transposing its image around the axis by –166° and along the axis by 2.8 nm; placing another protomer at this next location; and repeating this process indefinitely. Suppose this were attempted with the common subunit for protocatechuate 3,4-dioxygenase. Place an α subunit upon an

[Figure 9–2: protocatechuate 3,4-dioxygenase]

to a dimer.9

In theory, the same requirement restricting protocatechuate 3,4-dioxygenase and hexokinase to be dimers
one or more of these symmetry operations is precluded. An open structure is a structure built upon a screw axis of symmetry to which protomers can be infinitely added by that symmetry operation. An oligomeric protein is a multimeric protein with a closed structure of a fixed number of protomers, and a polymeric protein is a multimeric protein with an open structure of an indefinite number of protomers and of an indefinite length.

Each multimeric protein composed of identical protomers, even a helical polymer of indefinite length, can be considered to be the manifestation of a set of interfaces between those protomers. Each of these interfaces includes all of the points of contact that lie between two protomers in the structure, and each is formed by the association of two complementary faces, one from each of the two protomers. These faces are particular regions on the respective surfaces of the two associating protomers. Because the structures of the protomers are identical to each other, each necessarily possesses on its own surface all of the unique faces forming the interfaces found in the complete molecule. The interface between the two subunits of malate dehydrogenase (Figure 9–1A) is formed from two identical faces, one on the surface of each of its subunits. Note the particularly intimate contact across the interface as the secondary structures, which mimic each other across the axis of symmetry, interdigitate.

Following its biosynthesis, a polypeptide folds to form a structure capable of recognizing and being recognized by other folded polypeptides. If it is to combine with its twins to form a multimeric protein constructed from identical subunits, it must do so in a series of individual steps, and each step must involve the formation of an interface from two complementary faces. The atomic contacts within each of these consecutively formed interfaces are as specific as the atomic contacts throughout the protein. For the same reasons that a folded polypeptide assumes a precise and unique atomic structure, the interface between two subunits has a precise and unique atomic structure.

If, as the result of evolution, a face appears anywhere on the surface of a folded polypeptide and a face complementary to the first appears anywhere else on the surface of the same folded polypeptide, the face on one copy of that folded polypeptide will associate with its complement on another copy of the same folded polypeptide. Any such random association between any two identical asymmetric objects always defines a unique screw axis, an angle of rotation about that screw axis, and a translation along that screw axis that will superpose the image of one of the asymmetric objects upon the other. Either these three parameters are consistent with an open structure or they are consistent with a closed structure. If they are consistent with an open structure, the very fact that the one interface can form means that many others will subsequently form. A series of such interfaces is a helical polymer.

The interfaces that are repeated throughout a multimer composed of identical protomers are the origin of the geometry of the final structure. Because they are created by evolution, their appearance is determined by a completely random process. As time passes, variation in the identity of the amino acids on the surface of a given monomeric protein occurs. At some point, in some organism, in some species, a constellation of amino acids appears on the surface that permits a stable interface to form between two of these identical monomers.

Natural selection operates at this point. It takes little imagination to realize that if the vast majority of multimeric proteins were not closed structures, the cell would rapidly fill with helical polymers and become a solid, inflexible object incapable of the pliability essential to life. The difficulties encountered with helical polymers of hemoglobin S in sickled erythrocytes dramatically illustrate this problem.10 An example of a polymerization that is undesirable for a different reason is that of a tRNA-intron endonuclease from Archaeoglobus fulgidus, which can form a helical polymer that is enzymatically inactive because the active site is sterically blocked by neighboring monomers in the polymer.11 If a monomeric protein were to sustain a series of mutations dictating that it combine with its twins in such a way that a polymeric fiber necessarily results, this set of mutations would probably be eliminated by natural selection. Mutations leading to closed structures, however, aside from lowering the osmotic pressure of the cytoplasm, may be neutral initially, but oligomeric proteins have potentials denied to monomeric proteins, and the appearance of an oligomeric protein during evolution is the first step in the eventual exploitation of these potentials. Nevertheless, if the interface is compatible with a closed structure, it can initially be fixed by genetic drift as a neutral variation. With its fixation within that species, the protein has become an oligomer of identical protomers.

There remains one perplexing fact. As in the case of malate dehydrogenase (Figure 9–1A), the vast majority of homomultimeric proteins that have been examined conclusively are built around rotational axes of symmetry. Often, as in the case of malate dehydrogenase, these rotational axes of symmetry can be proven to be exact; they always seem to be exact. In fact, protocatechuate 3,4-dioxygenase (Figure 9–2) is one of the few exceptions to this rule, and it is not even a homodimer. If rotational axes of symmetry are no more than severely restricted cases of screw axes of symmetry, and if a screw axis of symmetry is compatible with a closed structure, why are multimeric proteins almost always rotationally symmetric?

The main difference between a dimer like malate dehydrogenase and a dimer like protocatechuate 3,4-dioxygenase lies in the respective interfaces defining these structures. In a rotationally symmetric dimer such as malate dehydrogenase, individual interactions between the two protomers come in sets of identical
pairs. The best way to see this is to consider the α helices containing Isoleucines 59 on either side of the rotational axis of symmetry (Figure 9–1A). These α helices and their carboxy-terminal loops of random meander insert into pockets in the opposite protomers. Consider a specific position such as Isoleucine 59 in the segment of the sequence of the lower protomer forming this insertion. The amino acid at this location is making several contacts with amino acids in the pocket in the upper protomer. Suppose a mutation occurred at position 59 that strengthened these interactions by a certain increment. Because the upper protomer was read from the same gene, at sequence position 59 in its α helix the same favorable change would occur automatically so the increment for the increase in stability for the whole interface would be twice that of each individual increment. The same argument could be made for each location in the interface.

In a protein built on a screw axis of symmetry, however, the amino acids at the same sequence positions in the two protomers never interact with the same amino acids from the other protomer across the screw axis of symmetry (Figure 9–2B). A mutation, occurring anywhere in the interface, that adds an increment of stability to the dimer is not duplicated automatically. It follows that as variation proceeds during evolution, the incremental changes that occur within an interface built around a 2-fold rotational axis of symmetry are amplified 2-fold relative to those that occur within each interface around a screw axis. This conclusion is valid whether one of these interfaces has already appeared during evolution or is merely incipient.

The formation of an interface between two identical monomeric proteins, which is the evolutionary event that precedes the appearance of a multimeric protein, is not an all-or-none phenomenon. The chemical reaction in question is

2α ⇌ α₂   (9–1)

Associated with this reaction is a change in standard free energy, and it is this change in standard free energy that determines the extent of the reaction. The numerical value of this change in standard free energy is dictated by the particular interactions that occur among the amino acids within the interface. The particular interactions that occur are the product of evolution by natural selection. Each explicit variation in one of these interactions adds or subtracts an increment of free energy to the overall change. If the increments are automatically doubled, overall decreases in the standard free energy change for the reaction proceed more rapidly over evolutionary time.

Incremental increases in the standard free energy change, however, are also doubled. Although rotationally symmetric dimers should appear more frequently, they should also disappear more frequently, unless they represent advantageous variations. Improvements in a certain protein are retained by natural selection, and their retention is unaffected by the frequency with which retrograde changes arise. Mutations turning the dimer back into a monomer, such as those that can be performed experimentally,12,13 would be eliminated by natural selection if they were disadvantageous. It is possible that oligomerization of a protein has an immediate advantageous effect. If it did, the fact that most oligomeric proteins are built around 2-fold rotational axes of symmetry would be a reflection of the fact that such oligomers arise with a high frequency and of the fact that they are fixed by natural selection because oligomers are advantageous.

Because events were discussed in the opposite order of the normal progression, a summary of the historical and logical sequence seems appropriate. As a result of genetic variation among the individuals in a given species, a constellation of amino acids appears on the surface of a monomeric protein within one of those individuals. The constellation causes molecules of that previously monomeric protein to associate with each other. This association necessarily creates an interface. This interface necessarily forces the two protomers it brings together to be related to each other by a unique screw axis of symmetry. The association between the two or more protomers created by this screw axis of symmetry is tested by natural selection. Occasionally, a helical polymer, which necessarily results from a screw axis that is not closed, is advantageous and is retained. Most of the time, however, the survivors of natural selection are closed oligomeric proteins the interfaces of which dictate rotational axes of symmetry.

Suggested Reading

Monod, J., Wyman, J., & Changeux, J.P. (1965) On the nature of allosteric transitions: a plausible model, J. Mol. Biol. 12, 88–118.

Problem 9–1: Using as your three examples malate dehydrogenase, protocatechuate 3,4-dioxygenase, and filamentous actin, discuss the topics of rotational axes of symmetry, screw axes, open structures, closed structures, interfaces, and helical polymers.

Space Groups

Screw and rotational operations around axes of symmetry occur within crystals of proteins in addition to the translational operations relating the unit cells. These axes of symmetry are the fundamental operations that define the space groups. A space group of identical enantiomeric objects is a potentially infinite array of those objects, the positions and orientations of which are related to each other by screw axes of symmetry, rotational axes of symmetry, and translational operations.
Space Groups 457
For reasons mainly associated with the phenomenon of diffraction, a unit cell is defined solely in terms of translation. A unit cell is the smallest unit from exact copies of which, distributed only by simple translational movements, the entire crystal is created. In contrast, the crystallographic asymmetric unit is the smallest unit from exact copies of which, distributed both by translation and by rotation around axes of symmetry, the entire crystal is created. Crystallographic asymmetric units are usually delineated to include one or more intact subunits of a protein or one or more intact molecules of a protein.

If crystallographic asymmetric units containing one or more subunits or one or more molecules of a protein were always distributed in a crystal so that each of those asymmetric units had exactly the same rotational orientation, all unit cells would be of the same type, P1, and each unit cell would contain only one crystallographic asymmetric unit. Packing the same asymmetric units in different rotational orientations to form a crystal, however, is not forbidden, and strangely shaped enantiomeric objects, such as asymmetric units containing protein, usually are packed with greater efficiency when they can assume different rotational orientations. If these rotational orientations are to be compatible with the infinite regular array that is a crystal, they must be related by particular symmetry operations. Dismissing mirror symmetry, which is irrelevant to enantiomeric objects such as proteins, one is left with axial symmetry.

In a diagram of the simple space group P2 (Figure 9–3), the array of unit cells portrayed represents one of the layers in a three-dimensional crystal. The array of the enantiomeric objects is created by distributing them about rotational axes of symmetry and by translational operations. Within each unit cell, the two identical enantiomeric objects are related by a central 2-fold rotational axis of symmetry perpendicular to the page. Because of the two rotational dispositions, which arise only because of the increased efficiency of packing, the unit cell ends up containing two of the enantiomeric objects rather than one. Each of the 2-fold rotational axes in the center of each of these unit cells is itself a rotational axis of symmetry for the entire array if it is assumed that the array is infinitely propagated in three dimensions. There are also three distinct sets of 2-fold rotational axes of symmetry between the unit cells. It is a feature of axes of symmetry in space groups that more than one distinct set appears at a time, and together these sets of axes of symmetry define the space group.

In crystals, as opposed to individual oligomers and polymers, screw axes of symmetry are required to have rotational angles that are integral quotients of 360°. This arises from the fact that these screw axes of symmetry operate on the entire array in the crystal. As the image of one of the crystallographic asymmetric units from which the crystal is composed is rotating and rising around the screw axis, the images of every other asymmetric unit in the array are rotating and rising around the same axis. At the completion of the operation, all of the images in the array must superpose on identical partners. This can occur only if the rotations of both the screw axes and the rotational axes of symmetry in a space group are 2-fold, 3-fold, 4-fold, or 6-fold. No other rotational multiplicities are compatible with an infinite array of asymmetric objects. Space groups never have 5-fold rotational or screw axes of symmetry because an infinite repetitive array of pentagons cannot be formed. Any translational distance, as long as it is compatible with an unclosed screw axis of symmetry, is compatible with an infinite array. Technically no crystal is infinite, but it always has the potential to be so, and this potential is all that matters.
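
The restriction to 2-fold, 3-fold, 4-fold, and 6-fold axes can be checked numerically. The text argues from the impossibility of an infinite array of pentagons; the sketch below instead relies on a different, standard argument that is an assumption here rather than something stated above: a rotation that maps a lattice onto itself is represented by a matrix of integers in the basis of the lattice, so its trace, 1 + 2 cos(360°/n), must be an integer. The function name is hypothetical and chosen only for illustration.

```python
import math

def is_lattice_compatible(n: int, tol: float = 1e-9) -> bool:
    """Return True if an n-fold rotation could map a lattice onto itself.

    Assumption (not from the text): the trace of the rotation matrix,
    1 + 2*cos(360 degrees / n), must be an integer for this to be possible.
    """
    trace = 1.0 + 2.0 * math.cos(2.0 * math.pi / n)
    return abs(trace - round(trace)) < tol

# Test every multiplicity from 1-fold (identity) to 12-fold.
allowed = [n for n in range(1, 13) if is_lattice_compatible(n)]
print(allowed)
```

Under this criterion only the identity and the 2-fold, 3-fold, 4-fold, and 6-fold rotations survive, which agrees with the list given in the text; the 5-fold case fails because its trace is not an integer.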
The arrangement of asymmetric units in a space
group can be displayed by distributing drawings of an
enantiomeric object representing a crystallographic
asymmetric unit of protein around appropriately posi-
tioned axes of symmetry. It is convenient to use an enan-
tiomeric object familiar to all chemists, as well as one
that is easy to draw, namely, a small enantiomeric mole-
cule. Lactic acid is one of the smallest enantiomeric mol-
ecules. Drawings of distributions of lactic acid in the
space groups C2, P2₁2₁2₁, and P3₁21 (Figures 9–4, 9–5,
and 9–6, respectively; refs 14–18) illustrate certain properties of
space groups, their rotational axes of symmetry, and the
unit cells they create.
Although there are four distinct sets of 2-fold rota-
tional axes in the space group P2 of Figure 9–3, it is con-
sidered to be a simple 2-fold array, designated by the
single integer 2, because all of its axes are parallel to each
other and no set can exist without all of the others. The presence of a rotational axis of symmetry in a space group is indicated by an unadorned integer: 2, 3, 4, or 6. The presence of a screw axis of symmetry is indicated by

Figure 9–3: Packing of identical asymmetric objects, which represent the crystallographic asymmetric units from which the array is formed, in the space group P2, in which they alternately assume two different rotational orientations. The symbol ˙ indicates a 2-fold rotational axis of symmetry perpendicular to the page.
458 Symmetry
[Figure 9–4, panels A–C (image): lactic acid molecules arranged in a unit cell with its top face marked, the diagram of its axes of symmetry, and the packing of deoxyribonuclease I]
Figure 9–4: Space group C2. (A) Molecules of lactic acid arranged in the space group C2. The lattice of space group C2 is monoclinic, with two of its crystallographic angles equal to 90° and one of its crystallographic angles not equal to 90°. The space group is characterized by rows of parallel axes of symmetry. There are four rotational axes and four screw axes for every unit cell. Each molecule of lactic acid represents the crystallographic asymmetric unit. The crystallographic angle that is not equal to 90° is in the top and bottom faces of the unit cell, and the axes of symmetry are all vertical. (B) Diagram of the disposition of the axes of symmetry in panel A. The diagram is of the top side of the top face of the unit cell, and the symbols are for the 2-fold rotational axes of symmetry or the 2-fold screw axes of symmetry viewed end-on. The planes containing the alternating screw and rotational axes of symmetry are parallel to the side faces of the unit cell and pass through the front face at intervals of one-quarter and three-quarters of the unit cell. The screw axes are in the front face, the back face, and halfway between the front and the back face. The rotational axes of symmetry are at one-quarter and three-quarters of the distance between the front and back face. (C) Bovine pancreatic deoxyribonuclease I packed in its crystal in the space group C2 (ref 14). The crystallographic asymmetric unit contains one molecule of the protein. All of the axes of symmetry run horizontally, rather than vertically, and the unit cell has been shifted to its traditional (ref 15) position so that the planes containing the alternating axes of symmetry, which are slanted and normal to the page in panel B, are now parallel to the plane of the page and lie in the front face, the back face, and in the center, halfway between the front and the back face of the unit cell. The half arrows indicate two of the 2-fold screw axes, and the full arrow indicates one of the 2-fold rotational axes. The three marked axes are in the plane passing through the center of the unit cell. The molecules shown are related only by this set of axes of symmetry but are related to their neighbors in the unit cells in front and behind by the other two sets of axes of symmetry in the front and back faces, respectively. The crystallographic angle in the side faces is 91.4°, and the other two crystallographic angles are exactly 90°. Reprinted with permission from ref 14. Copyright 1986 Academic Press.
an integer, 2, 3, 4, or 6, followed by another integer in subscript, for example, the 3₁ screw axes of symmetry in the space group P3₁21 (Figure 9–6). The main integer, n, is the integer by which 360° is divided to obtain the rotational angle of the steps. The integer in subscript, m, determines the fraction, m/n, of the unit cell over which the translation occurs with each rotational step. The translation is always right-handed to the rotation. Consequently, by this convention a 3₁ screw and a 4₁ screw are right-handed, and a 3₂ screw and a 4₃ screw are left-handed.

The designation of the space group in a crystalline array takes the form of a capital letter* followed by one or more numbers. An example would be P2₁2₁2₁, which would mean a primitive lattice made of rectangular parallelepipeds intersected by three orthogonal sets of 2-fold screw axes of symmetry (Figure 9–5). The arrangements of the axes in a particular space group can be learned only by consulting a diagram of that space group (ref 15).

* The capital letter refers to the particular relationship between the underlying lattice and the unit cell for the space group of interest. These relationships are primitive (P), C-face centered (C), A-face centered (A), B-face centered (B), all-face centered (F), body centered (I), rhombohedral (R), or hexagonally centered (H).

The crystallographic asymmetric unit has a volume that is always an integral quotient of the unit cell, and thus its volume is equal to or smaller than that of the unit cell. An integral number of asymmetric units, but not necessarily of intact asymmetric units, composes a unit cell. For example, one whole, two halves, four fourths, and eight eighths gathered from 15 different asymmetric units, each represented by one intact molecule of lactic acid, can together create a unit cell containing a total of four asymmetric units in the space group P2₁2₁2₁ (Figure 9–5). The crystallographic asymmetric unit in Figure 9–4C is one molecule of deoxyribonuclease I, and the unit cell contains a total of four asymmetric units but not four intact molecules of the protein.

The space group imposes certain constraints on the structure of the unit cell. For example, in the space group P2 that produces the array of Figure 9–3, the 2-fold rotational axes of symmetry must be normal to the plane of Figure 9–3 or the superposition cannot occur. Therefore, each unpictured asymmetric unit above and below the plane of the page in the lattice must be perpendicularly aligned with one of the asymmetric units in the plane of the page. This requires that the two angles of the fundamental unit cell aligning the axis normal to the page be precisely 90°. A lattice where two of the angles must be 90° is monoclinic (caption to Figure 4–2). In the space group P2₁2₁2₁ displayed in Figure 9–5, the three necessarily orthogonal sets of screw axes force the unit cell to be a rectangular parallelepiped and the lattice to be orthorhombic. Each space group other than the most primitive, P1, which lacks axial symmetry entirely, enforces one or more of the angles of the unit cell to be 90° or 120° or enforces one or more of the axes of the unit cell to be the same length or enforces requirements on both angles and lengths. These are not coincidental identities but required identities. They are dictated by the symmetry operations and are thus exact quantities. If the angle in the monoclinic crystal of Figure 9–3 were not exactly 90°, the crystal would be filled with fractures and would not be a crystal.

Reality seems to take place in Cartesian space, and there are only 71 space groups into which an infinite array of identical crystallographic asymmetric units, each containing one or more subunits of protein, can be arranged in Cartesian space to produce a crystal. Every crystal of protein has its crystallographic asymmetric units arrayed in one of these space groups. The space group is established as the crystal nucleates and grows in the dish; it cannot be dictated by the investigator. She can only try to change the conditions of crystallization in the hope that another space group will be generated by the process. This is often attempted because the identity of the space group determines how difficult it will be to calculate a map of electron density.

The 71 space groups available to a crystallizing protein are distinguished one from the other by the arrangement in space of their respective screw axes and rotational axes of symmetry. In turn, the space group of the particular crystal of protein that is formed is identified by the investigator from a characteristic pattern created in the data set by its particular arrangement of axes of symmetry. These are patterns in which identities occur in the amplitudes of the reflections. For example, in the oscillation photograph in Figure 4–1B, the fact that the patterns of the intensities of the reflections above and below the equator are mirror images of each other is consistent with the existence of a rotational axis of symmetry in the crystal parallel to the axis of oscillation. The particular pattern of identities within the entire data set identifies the axes of symmetry in the crystal and their arrangement in space, and hence the space group of the crystal.

The packing of deoxyribonuclease I in the space group C2 (Figure 9–4C; ref 14), that of the lectin from Pisum sativum in the space group P2₁2₁2₁ (Figure 9–5E; refs 16, 17), that of telokin from Meleagris gallopavo in the space group P3₂21 (Figure 9–6C; ref 18), that of porin from Rhodobacter capsulatus in the space group R3 (Figure 9–7; ref 19), and that of ferredoxin from Aphanothece sacrum in the space group P4₁ (Figure 9–8; ref 20) illustrate the accommodations of the molecules of proteins to the axes of symmetry defining these five space groups.

There are no rotational axes of symmetry, only screw axes of symmetry, relating the asymmetric units in the space groups P2₁2₁2₁ and P4₁, but in the space groups C2, P3₁21, and R3, pairs of asymmetric units are disposed around 2-fold rotational axes or triplets of asymmetric
[Figure 9–5, panels A–E (image): lactic acid molecules arranged in a unit cell, diagrams of its screw axes of symmetry, and the packing of the lectin from P. sativum]
Figure 9–5: Space group P2₁2₁2₁. (A) Molecules of lactic acid arranged in the space group P2₁2₁2₁. The lattice of space group P2₁2₁2₁ is orthorhombic, with all three crystallographic angles equal to 90°, and the unit cell is rectangular. This results from the fact that there are three sets of 2-fold screw axes of symmetry, each set is necessarily orthogonal to the others, and in each set the screw axes are parallel to one of the crystallographic axes. There are four screw axes for each unit cell in each of the three sets. The molecule of lactic acid represents a crystallographic asymmetric unit, and from any one molecule of lactic acid the entire array can be created by performing the operations of 2-fold screw symmetry in the three orthogonal directions. (B) The four parallel horizontal screw axes of symmetry passing through the left face of the unit cell in panel A from front left to back right. These axes are in the top face, bottom face, and halfway between the top face and bottom face, one-quarter and three-quarters of the way across the unit cell. (C) Vertical 2-fold screw axes of symmetry passing through the top face of the unit cell in panel A. Half of these vertical axes coincide with each vertical column of lactic acids, of which there are two for each unit cell. The other half of these vertical screw axes of symmetry coincide with vertical lines in the center of each vertical face. When rotated 180° about any one of these vertical axes while simultaneously rising half of a unit cell, the lattice superposes on itself. (D) The four parallel horizontal screw axes of symmetry passing through the front face of the unit cell in panel A from front right to back left. These axes are at one-quarter and three-quarters of the distance between top and bottom face and one-quarter and three-quarters of the distance between the side faces. The symbols in panels B–D are the 2-fold screw axes of symmetry seen end-on. (E) The lectin from P. sativum packed in its crystal in the space group P2₁2₁2₁ (refs 16, 17). The asymmetric unit contains one complete molecule of the protein, which is a homodimer of identical subunits related by a local noncrystallographic 2-fold rotational axis of symmetry. The unit cell is positioned in the traditional (ref 15) location with respect to the three orthogonal sets of 2-fold screw axes of symmetry so that the top is shifted one quarter of its width forward and the front is shifted one quarter of its width downward relative to their positions in panels C and D, respectively. The central, complete molecule of protein is represented in thicker lines. It is related to the two molecules drawn with thinner lines above and below it by one of the vertical screw axes of symmetry halfway across the unit cell and one quarter of the way forward from the back face. It is related to the two molecules to its upper right and upper left, respectively, by one of the horizontal screw axes of symmetry parallel to the plane of the page, one quarter of the way down the unit cell, and halfway between the front and back faces. And it is related to the two molecules to its lower right and lower left, respectively, by the two screw axes of symmetry normal to the plane of the page half of the way up from the bottom face and each one quarter of the way in from one of the sides. Reprinted with permission from ref 17. Copyright 1990 Elsevier B.V.
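
The statement in panel A that the entire array can be created from any one molecule by the 2-fold screw operations can be illustrated with a short calculation. This is a minimal sketch, assuming the conventional general positions for P2₁2₁2₁ as they are tabulated in the International Tables (ref 15) and an arbitrary, made-up starting coordinate; the names `OPS`, `apply`, and `orbit` are hypothetical. Coordinates are fractions of the unit cell edges and are wrapped back into one unit cell.

```python
from fractions import Fraction as F

H = F(1, 2)
# Each operation is (signs, shifts) acting on fractional coordinates (x, y, z).
OPS = [
    ((1, 1, 1), (0, 0, 0)),     # x, y, z (identity)
    ((-1, -1, 1), (H, 0, H)),   # -x+1/2, -y, z+1/2 (2-fold screw parallel to c)
    ((-1, 1, -1), (0, H, H)),   # -x, y+1/2, -z+1/2 (2-fold screw parallel to b)
    ((1, -1, -1), (H, H, 0)),   # x+1/2, -y+1/2, -z (2-fold screw parallel to a)
]

def apply(op, xyz):
    """Apply one operation and wrap the result into the unit cell [0, 1)."""
    signs, shifts = op
    return tuple((s * c + t) % 1 for s, c, t in zip(signs, xyz, shifts))

def orbit(xyz):
    """All symmetry-related copies of a point within one unit cell."""
    return {apply(op, xyz) for op in OPS}

start = (F(1, 10), F(1, 5), F(3, 10))   # made-up position of one asymmetric unit
copies = orbit(start)
for position in sorted(copies):
    print([str(c) for c in position])

# Starting from any of the copies regenerates the same set of positions.
closed = all(orbit(p) == copies for p in copies)
print(len(copies), closed)
```

The four positions correspond to the four asymmetric units per unit cell described in the text, and the closure check reflects the fact that any one of them can serve as the starting molecule.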
units are disposed around 3-fold rotational axes of symmetry that are inherent in the space groups. In the space groups C2 and P3₁21, each and every asymmetric unit is related to at least one of its adjacent twins by a particular 2-fold rotational axis of symmetry. This unique relationship establishes a particular pair of twins. This pair is exceptional because the whole lattice can be divided into an array of these pairs, and in each of these pairs the orientation of the two twins to each other is the same. In the space group C2 pictured in Figure 9–4A, such a pair of twins includes any two lactic acid molecules that have their carboxylic acid functional groups opposite the hydroxyls of their neighbors. Every lactic acid molecule in the crystal participates in one and only one such symmetric relationship.

The particular 2-fold rotational axes of symmetry connecting the noted pairs of rotationally symmetric twins are crystallographic axes. A crystallographic axis of symmetry is any one of the axes of symmetry that defines the space group of the crystal. It exists only when the protein is in a crystal. The 2-fold rotational axis of symmetry running through the center of malate dehydrogenase (Figure 9–1) and connecting its twin subunits is a molecular axis of symmetry. A molecular axis of symmetry is an axis of symmetry that exists in the molecule of a protein regardless of whether that molecule is in solution or in a crystal. Crystallographic axes and molecular axes arise under different circumstances. The molecular axes of symmetry are created as the oligomeric protein assembles in the cell, and the crystallographic axes of symmetry are created as the crystal grows in a dish. These two types of axes of symmetry are independent properties. They can coincide, but they are never required to do so. If a molecular rotational axis of symmetry coincides with a crystallographic rotational axis of symmetry, it can be stated that the molecular axis when it is within the crystal is an exact rotational axis of symmetry because a crystallographic axis of symmetry is necessarily an exact rotational axis of symmetry. If a crystallographic axis of symmetry were not exact, crystal growth could not continue because the small equal deviations between the actual rotational operation and an exact rotational operation would add up across the crystal and eventually produce an interruption in the lattice. An exact rotational axis of symmetry in a protein is a rotational axis which, in a crystallographic molecular model of that protein, coincides with a crystallographic axis of symmetry.

The molecular 2-fold rotational axis of symmetry in the middle of each dimer of malate dehydrogenase from A. arcticum (Figure 9–1) coincides with one of the crystallographic 2-fold rotational axes of symmetry in the space group P2₁2₁2 in which it crystallizes (ref 2). Consequently, the molecular axis is exact. The molecular 3-fold rotational axis of symmetry in the middle of each trimer of porin from R. capsulatus coincides with one of the crystallographic 3-fold rotational axes of symmetry in the space group R3 in which it crystallizes (Figure 9–7; ref 19) and is exact.

There is one crystal that contains an educational exception to this rule that a molecular axis of symmetry coinciding with a crystallographic axis of symmetry is exact. Each αβ protomer of the (αβ)₃ trimeric portion of rat mitochondrial H⁺-transporting two-sector ATPase (ref 21) is found in its own asymmetric unit when the protein crystallizes in the space group R32. Each (αβ)₃ trimer, however, has one and only one γ subunit associated with it. Consequently, only one of the asymmetric units in each triplet of asymmetric units containing the entire (αβ)₃ trimer can contain a particular segment of a γ subunit. In fact, the γ subunits are distributed at random among the three so the map of electron density contains
[Figure 9–6, panels A–C (image): lactic acid molecules arranged in three unit cells; the diagram of the axes of symmetry, marked with angles of 120° and with heights of 1/6 and 1/3 of the unit cell; and the packing of telokin]
Figure 9–6: Space group P3₁21. (A) Molecules of lactic acid arranged in the space group P3₁21. The lattice of space group P3₁21 is trigonal, with two crystallographic angles of 90° and one crystallographic angle of 60°. The angle of 60° is in the bottom faces of the three unit cells that are drawn. The space group P3₁21 has a set of right-handed 3-fold screw axes of symmetry. In the drawing, two of the 3-fold screw axes of symmetry of each unit cell coincide with the vertical columns of lactic acid molecules. The third 3-fold screw axis of symmetry of each unit cell coincides with the vertical edge of each unit cell. Because the other crystallographic angle in the bottom face is 120°, counterclockwise rotation of 120° about this latter axis while simultaneously rising one-third of a unit cell causes the three upward columns of lactic acid around this axis to superpose on themselves and the three downward columns of lactic acid around this axis to superpose on themselves. Normal to each of the 3-fold screw axes of symmetry running along each vertical edge of the unit cells and intersecting perpendicularly with these latter vertical axes are 2-fold rotational axes of symmetry arrayed at 60° angles to each other. In the bottom face of the unit cell, the first 2-fold rotational axis of symmetry is the diagonal bisecting the 120° angle. Each successive 2-fold rotational axis of symmetry is 60° counterclockwise to the one below it and one-sixth of the distance up the axis of the unit cell. (B) Diagram (ref 15) of the arrangement of all of the axes of symmetry in space group P3₁21. The view is looking down onto the bottom face of the unit cell containing the 60° angle. The flared triangles denote 3-fold screw axes normal to the page. The full arrows are 2-fold rotational axes of symmetry, and the half arrows are 2-fold screw axes of symmetry parallel to the plane of the page. The fractions indicate how far up the vertical edge or vertical face of the unit cell the axes parallel to the plane of the page are found. Reprinted with permission from ref 15. Copyright 1983 D. Reidel. (C) Telokin from the gizzard of M. gallopavo packed in its crystal (ref 18) in the closely related space group P3₂21. The protein is a monomer of a single folded polypeptide, and the asymmetric unit contains only one of these monomers. The view is from the top so the 3-fold screw axes of symmetry in this view are normal to the plane of the page rather than vertical. They are in the same locations in the unit cell as those in the space group P3₁21 (panel B) but are left-handed rather than right-handed screw axes (hence the 3₂ instead of 3₁). This difference causes the 2-fold screw axes of symmetry and 2-fold rotational axes of symmetry parallel to the plane of the page (panel B) to be encountered in a clockwise succession rather than a counterclockwise succession but at the same locations, angles, and depths. Reprinted with permission from ref 18. Copyright 1992 Elsevier B.V.
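
The notation for screw axes used in this caption, an integer n with a subscript m, can be made concrete with a small calculation. The following is a minimal sketch, assuming a screw axis that lies along the z axis through the origin and a unit cell edge of length 1 along that axis; the function name `screw` and the starting point are hypothetical. Each step rotates the point by 360°/n counterclockwise and raises it by m/n of the unit cell.

```python
import math

def screw(n, m, point, cell_length=1.0):
    """Apply one step of an n-fold screw axis with subscript m along z.

    The point is rotated counterclockwise by 360/n degrees about the z axis
    and translated by m/n of the unit cell edge (cell_length) along z.
    """
    x, y, z = point
    theta = 2.0 * math.pi / n
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z + cell_length * m / n)

start = (1.0, 0.0, 0.0)   # made-up starting position

# Three steps of a 3-fold screw with subscript 1 restore x and y
# and raise the point by exactly one unit cell.
after_three = start
for _ in range(3):
    after_three = screw(3, 1, after_three)
print(after_three)

# One step of a 3-fold screw with subscript 2 rises 2/3 of a unit cell, which,
# after subtracting one whole unit cell, is a descent of 1/3 of a unit cell.
rise = screw(3, 2, start)[2]
print(rise, rise - 1.0)
```

Because positions that differ by a whole unit cell are equivalent in the crystal, a rise of 2/3 per step is indistinguishable from a descent of 1/3 per step, which is one way to see why the subscript 2 corresponds to the opposite handedness from the subscript 1.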
three overlapping, symmetrically displayed copies of the γ subunit, each copy having one third the expected electron density. The actual individual asymmetric units in each molecule of the protein in the crystal are not symmetric within themselves, but the crystal is 3-fold symmetric. The individual molecular rotational axes of pseudosymmetry relating the three αβ protomers are precisely 3-fold, but the symmetry of each molecule is perturbed by the presence of the necessarily asymmetric γ subunit.

The results from several crystallographic experiments serve to illustrate the distinction between crystal symmetry and molecular symmetry and the consequences of their coincidence.

Triose-phosphate isomerase, a dimer composed of two identical subunits, crystallizes in the space group P2₁2₁2₁. The crystallographic asymmetric unit is the α₂ dimer, and the 2-fold molecular rotational axis of symmetry within the dimer cannot coincide with a crystallographic rotational axis of symmetry because there is none (ref 22). Glyceraldehyde-3-phosphate dehydrogenase (phosphorylating), an (α₂)₂ tetramer composed of four identical subunits, also crystallizes in the space group P2₁2₁2₁, and the asymmetric unit is necessarily the entire tetramer. Glutathione peroxidase, however, also an (α₂)₂ tetramer composed of four identical subunits, crystallizes in the space group C2, and the asymmetric unit is the α₂ dimer (ref 23). Consequently, in this instance one of the molecular axes of symmetry coincides with a crystallographic axis of symmetry, and the two α₂ dimers composing the (α₂)₂ tetramer must be related to each other by an exact 2-fold molecular rotational axis of symmetry. The other two molecular 2-fold rotational axes of symmetry orthogonal to the one that coincides cannot also coincide with orthogonal crystallographic 2-fold rotational axes of symmetry because there is none.

Phosphorylase b, a dimer composed of two identical subunits, crystallizes in the space group P4₃2₁2, the asymmetric unit is one subunit (ref 24), and the dimer must be constructed upon an exact 2-fold molecular rotational axis of symmetry. Alcohol dehydrogenase, a dimer composed of two identical subunits, crystallizes in the space group C222₁, and the molecular 2-fold rotational axis of […] crystallographic 2-fold rotational axes of symmetry. (S)-2-Hydroxy-acid oxidase crystallizes in the space group I422. As a result, its molecular 4-fold rotational axis of symmetry and its four molecular 2-fold axes of symmetry coincide with crystallographic rotational axes of symmetry (ref 29) and must be exact. Dihydrolipoyllysine-residue acetyltransferase from Azotobacter vinelandii (ref 30) and dihydrolipoyllysine-residue succinyltransferase from E. coli (ref 31), related oligomers each composed of 24 identical subunits, both crystallize in the space group F432 in which the asymmetric unit is a single subunit, and the three molecular 4-fold rotational axes of symmetry, the

Figure 9–7: Molecules of porin from R. capsulatus packed in the space group R3 of its crystal (ref 19). The lattice of space group R3 is trigonal with two crystallographic angles of 90° and one of 60°. Three-fold rotational axes of symmetry and 3-fold screw axes of symmetry normal to the front face pass along its edges and through its interior. The protein is a trimer of three identical folded polypeptides, and the asymmetric unit contains only one single folded polypeptide of this trimer.
constructed around three orthogonal apparently exact 2-fold molecular rotational axes of symmetry. The molecular 2-fold rotational axis of symmetry of the dimeric lectin from P. sativum does not coincide with a crystallographic rotational axis of symmetry because there is none in its space group P2₁2₁2₁ (Figure 9–5), but when the one folded polypeptide in the crystallographic molecular model is rotated around the molecular axis of symmetry, its α carbons superpose on those of its twin with a root mean square deviation of 0.06 nm (ref 16). The rotational angle around the molecular axis of symmetry producing the smallest root mean square deviation (0.019 nm) of the superposed α carbons of the two subunits in the crystallographic molecular model of formate dehydrogenase from Pseudomonas was 179.9° (ref 37). In similar superpositions, the α carbons of the two subunits of triose-phosphate isomerase from yeast coincided with a root mean square deviation of less than 0.04 nm (ref 38), and those of the two subunits of transketolase from yeast, by 0.024 nm (ref 39). The error in the coordinates for the crystallographic molecular model of inorganic diphosphatase from yeast was estimated to be 0.037 nm; upon superposition of one of its subunits on the other by rotation around the molecular axis of symmetry, the root mean square deviation of all of the atoms in the two polyamide backbones was only 0.038 nm (ref 40). Although there were functional indications that the subunits of ribulose-bisphosphate carboxylase were in different environments in solution, when the different folded polypeptides in the crystallographic molecular model were superposed by rotation around the molecular axes of symmetry, their α carbons coincided with a root mean square deviation of less than 0.02 nm, well within the accuracy of the coordinates themselves, and it was concluded that there was no structural evidence for asymmetry (ref 41).

ing because it had been shown previously that crystallization of the protein causes several of its enzymatic properties, which are uncomplicated in solution, to become asymmetric. Consequently, it was concluded that crystal packing forces caused an otherwise symmetric protein to become remarkably asymmetric.

There are, however, a few observations suggesting that there may be subtle asymmetries in some homooligomers. For example, when the subunits of the crystallographic molecular model of the (α₂)₂ tetramer of L-2-hydroxyisocaproate dehydrogenase were superposed intramolecularly, it was found that two superposed well on each other (root mean square deviation of 0.02 nm) and the other two did also (root mean square deviation of 0.03 nm) but that neither member of one of these pairs superposed well on either member of the other pair (root mean square deviation of 0.13 nm) (ref 45). This asymmetry seemed too large to be due to crystal packing, and it is possible that this protein may be asymmetric in solution.

It is also possible to perform a self-rotation function on the asymmetric unit in the space group of a crystal to detect molecular symmetry before phases are available. The 11-fold molecular rotational axis of symmetry in the trp RNA-binding attenuation protein from Bacillus subtilis, which cannot coincide with any of the permissible crystallographic rotational axes of symmetry, was readily detected within the asymmetric unit of its space group C2 by performing a self-rotation function (ref 46).

Suggested Reading

Hahn, T., Ed. (1983) International Tables for Crystallography, Vol. A: Space-Group Symmetry, D. Reidel, Dordrecht, The Netherlands.
When such superpositions are performed about Problem 9–2: In the space group P3121 portrayed in
molecular rotational axes of symmetry that do not coin- Figure 9–6, every lactic acid molecule is related to one of
cide with crystallographic rotational axes of symmetry, it its neighbors by the same 2-fold rotational axis of sym-
is often found that flexible regions of the crystallographic metry. Draw two lactic acid molecules arranged around
molecular model do not coincide as well as more rigid that specific rotational axis of symmetry.
regions because they respond readily to variations in
their surroundings resulting from differences in crystal
Problem 9–3: The stereodiagram on the next page is
packing.42 For example, most of the a carbons of the five
based on the crystallographic molecular model of the
identical b subunits of heat-labile enterotoxin from
four individual molecules of the protein HPr from the
E. coli superpose to within 0.04 nm upon successive rota-
monosaccharide transport system of Streptococcus fae-
tions about the molecular 5-fold rotational axis of sym-
calis arranged in the tetragonal unit cell.47 Reprinted
metry, but the positions of the a carbons in the flexible
with permission from ref 47. Copyright 1994 Elsevier
loop between Glycine 54 and Serine 60 deviate by
B.V.
0.1–0.2 nm from each other.43 Unlike malate dehydroge-
nase from A. arcticum, cytoplasmic malate dehydroge- On three rectangles representing the front, the side, and
nase from Sus scrofa crystallizes in the space group the bottom, respectively, of the unit cell in the orienta-
P212121 with its a2 dimer in each asymmetric unit.44 The tion of the figure, indicate the positions of any screw axes
rotational angle around the molecular axis of symmetry of symmetry or rotational axes of symmetry passing
producing the superposition with the minimum root through the unit cell. Use symbols for rotational or screw
mean square deviation is 174 ∞ instead of 180 ∞. It was, axes of symmetry like those in the diagrams in Figures
however, concluded that this observation was mislead- 9–3 to 9–8.15
Oligomeric Proteins

A protomer is the smallest portion of a protein from copies of which its entire quaternary structure is created. The protomers of a homooligomeric protein are arranged around rotational axes of symmetry, and the number of protomers and the way in which they are arranged designate the point group to which the oligomer belongs. A point group is the distribution in which a particular number of protomers are arranged about one or more particular rotational axes of symmetry oriented at particular angles to one another and intersecting at a common origin, in which all of the centers of mass of the protomers are equidistant from this origin, and in which all of the symmetrically related positions are occupied. The finite and specific number of the protomers, their equidistance from the origin, and the common intersection of all of the rotational axes of symmetry distinguish the point groups from the space groups as well as from the linear groups, which designate open linear multimers such as helical polymers. Oligomeric proteins have exploited all of the available point groups lacking mirror planes.

A cyclic point group arranges protomers in a circle about only one rotational axis of symmetry, and the different cyclic point groups are distinguished by the fold of the axis. In the simplest of these cyclic point groups, point group 2(C2),* two protomers are arranged around a 2-fold rotational axis of symmetry to form a dimer. Half of all homooligomeric proteins are dimers (Table 9–1) the protomers of which, with few exceptions (Figure 9–2), are arranged with the symmetry of point group 2(C2). Malate dehydrogenase (Figure 9–1A) and κ bungarotoxin from Bungarus multicinctus (Figure 9–9)49 are examples of oligomers of point group 2(C2).

* There are two notations currently in use to identify the individual point groups. Crystallographers use the Hermann–Mauguin notation, 2, 3, 4, …, 222, 322, 422, …, 23, 432, and 532, which will be the notation used here out of parentheses. Spectroscopists use the Schönflies notation, C2, C3, C4, …, D2, D3, D4, …, T, O, and I, which will be the notation used here within parentheses. Chemists other than crystallographers and spectroscopists will use one or the other of these notations depending on who taught them point groups or what book they happened to open.

Table 9–1: Frequency of Homooligomers

number of subunits    percent observed (a)
2                     50
3                     5
4                     35
6                     10
8                     3
10                    1
12                    2

(a) The table of oligomeric stoichiometries published by Darnell and Klotz48 was used to calculate these frequencies. Because this is a selected and incomplete list, some numbers were rounded to the nearest 5%.

κ Bungarotoxin crystallizes in the space group P6 with an α₂ dimer in the crystallographic asymmetric unit. In the crystal, one protomer superposes on the other upon a 178.5° rotation about the molecular axis of symmetry. With the exception of the flexible loops between Cysteine 27 and Proline 36 and between Proline 15 and Glutamine 18, which adjust malleably to the constraints of crystal packing, the α carbons superpose to a root mean square deviation of 0.05 nm around the molecular rotational axis of symmetry, which probably differs from 180° also because of crystal packing. The interdigitations of the side chains forming the interface mimic each other across the axis of symmetry, for example, the hydrophobic cluster containing Isoleucine 20, Cystine 46/58, and Valine 60 from one protomer and Phenylalanine 49 from the other. The axis of symmetry itself runs through the hydrogen bond between the two Glutamines 48. The structure is unquestionably closed; every amino acid in the common sequence of the two identical polypeptides that is enclosed within the interface from one of the protomers is also enclosed from the other.

In larger proteins with more than one domain, it is often the case that the interface forming the dimer connects only one of the domains to its twin in the other protomer, but this limited interface nevertheless dictates the symmetry of point group 2(C2) for the whole dimer.50,51 The domain (117 aa of the 450 aa in the intact protein) forming the entire interface holding together the dimer of glutathione-disulfide reductase52 has been detached genetically and shown to form by itself a dimer.53 In the α₂ dimer of human hexokinase, each of the two subunits contains two internally duplicated domains, each superposable on a complete subunit of hexokinase from yeast, and they are connected by an α-helical segment of six turns. The amino-terminal domain of one subunit forms an interface with the carboxy-terminal domain of the other subunit and vice versa to form a dimer with two well-separated but identical interfaces on either side of the molecular 2-fold rotational axis of symmetry.54

Proteins that associate with double-helical DNA are often dimeric, and such a dimer uses its own 2-fold rotational axis of symmetry to recognize a local 2-fold rotational axis of pseudosymmetry in the double helix of the DNA. Regardless of its sequence, between the two bases in any one of the pairs of bases in a molecule of DNA and running perpendicular to the hydrogen bonds between them is a local 2-fold axis of pseudosymmetry (Figure 3–9). A local rotational axis is an axis of rotation around which superpose upon one another only structural units immediately adjacent to that axis. Because a real molecule of DNA is rarely straight, if the whole molecule of DNA is rotated around one of these local axes by 180°, it roughly superposes on itself in the immediate
immediate vicinity because of its curvature. Because the two bases in the central base pair are never the same, the base pairs on either side of the central base pair are usually not the same, and the DNA is usually curved, this local axis is always pseudosymmetric. Halfway between any consecutive two of these 2-fold rotational axes of pseudosymmetry and at an angle halfway between their
pseudosymmetry.

The existence of this second set of local 2-fold rota-
9–1

The existence of the local 2-fold rotational axes of pseudosymmetry running through the base pairs means that for the sequence
9–2
which contains a split palindrome, rotation about the local 2-fold axis of pseudosymmetry through the central
point group 3(C3) arise during evolution by natural selection is about 10 times lower than the frequency with which dimers with symmetry of point group 2(C2) arise (Table 9–1). In fact, at one time it was thought that such trimeric proteins did not exist, but there are now crystallographic molecular models for a number of them.19,59–66 It is by considering the problem of assembling a trimer relative to that of assembling a dimer that the reason for the scarcity of trimers becomes apparent.

The interface is the feature that evolves and not the oligomer. A dimer built around a 2-fold rotational axis of symmetry is held together by one more or less continuous interface centered on the rotational axis of symmetry. The axis divides the complete interface into two identical halves (Figures 9–1A and 9–9). Each half is the formal equivalent to one of the three identical interfaces distributed around the 3-fold rotational axis of symmetry in a trimer (Figure 9–11). The incremental decreases of free energy associated with favorable mutations are not automatically doubled during the evolution of an interface in a trimer as they are in the evolution of an interface in a dimer. Because termolecular collisions rarely occur, the assembly of an oligomeric protein must proceed through a series of bimolecular steps. A bimolecular collision producing a dimer automatically involves the simultaneous formation of the two halves of its interface and incorporates the free energies of formation of both halves into the immediate product. The first step in the assembly of a trimer, however, is the collision of two monomers to form only one of its three interfaces. This first interface, standing alone, must exist long enough or form often enough for the third protomer to complete the ring, yet it is the evolution of this initial interface that does not benefit from symmetry as does the evolution of the initial and final interface of the dimer. Therefore, trimers should appear less frequently than dimers during evolution. In favor of the symmetric trimer, however, is the fact that, as with the two halves of the interface in a dimer, its three identical interfaces can evolve simultaneously, a fact that magnifies the incremental decrease in free energy change in the overall formation of the complete oligomer for each favorable mutation.

If the 3-fold axis of symmetry of chloramphenicol

Figure 9–11: Three-fold rotational symmetry of point group 3(C3). An α-carbon diagram of chloramphenicol O-acetyltransferase from E. coli is drawn from the crystallographic molecular model of this trimeric protein.58 The three identical subunits are distinguished by the different widths of the line segments. This drawing was produced

phosphate aldolase from E. coli,68 L-ribulose-phosphate 4-epimerase from E. coli,69 and mammalian70 and bacterial71 IMP dehydrogenase; α₅ pentamers with cyclic symmetry of point group 5(C5), such as acetylcholine-binding protein from Lymnaea stagnalis,72 the B subunit of heat-labile enterotoxin from E. coli,43 and human serum amyloid P component;73 α₆ hexamers with cyclic symmetry of point group 6(C6), such as transitional endoplasmic reticulum ATPase from Mus musculus74 and the replicative DNA helicase encoded by the bacterial plasmid RSF1010;75 and α₇ heptamers with cyclic symmetry of point group 7(C7), such as transcriptional activator NTRC1 from Aquifex aeolicus76 and the small nuclear ribonucleoprotein from Pyrobaculum aerophilum,77 are even more rare than trimers with cyclic symmetry of point group 3(C3). Amazingly, however, there is a protein that is an α₁₁ undecamer with symmetry of cyclic point group 11(C11).78

Tetramers are the second most common type of oligomeric protein (Table 9–1). Almost all tetrameric proteins have their protomers arranged in the symmetry of dihedral point group 222(D2) instead of the symmetry of cyclic point group 4(C4). A dihedral point group is a point

* The size of an interface will be presented as the total accessible surface area from its two participants that is buried upon its formation. Consequently, each interface is formally defined as the adhesive interaction between any two subunits that holds only those two subunits together in the complex. This definition ignores any cooperativity that arises in interfaces in which n subunits are held together by a total of n interfaces around an n-fold rotational axis of symmetry when n is greater than 2, such as the stability gained when the third subunit is added to complete a cyclic trimer or the stability realized in the entwined cone of four carboxy termini (Figure 9–12) in the four formal interfaces around the 4-fold rotational axis of symmetry in L-lactate dehydrogenase (cytochrome).
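The defining properties of a cyclic point group n(Cn), namely n protomers related by successive rotations of 360°/n about a single axis, all equidistant from the origin, with every symmetrically related position occupied, can be checked numerically. The following is a minimal sketch rather than a description of any crystallographic program: the n-fold axis is assumed to lie along z, the protomer is reduced to a hypothetical center of mass, and the function names are invented for illustration.

```python
import math

def rotate_about_z(point, angle_deg):
    """Rotate a 3D point about the z axis (assumed to be the n-fold axis)."""
    x, y, z = point
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return (c * x - s * y, s * x + c * y, z)

def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cyclic_oligomer(protomer_center, n):
    """Centers of mass of the n protomers of point group n(Cn)."""
    return [rotate_about_z(protomer_center, 360.0 * k / n) for k in range(n)]

origin = (0.0, 0.0, 0.0)
center = (2.0, 0.5, 0.0)  # hypothetical center of mass of one protomer, nm

for n in (2, 3, 5, 11):
    ring = cyclic_oligomer(center, n)
    radii = [distance(p, origin) for p in ring]
    neighbors = [distance(ring[k], ring[(k + 1) % n]) for k in range(n)]
    # Both spreads should be zero to within rounding error.
    print(n, max(radii) - min(radii), max(neighbors) - min(neighbors))
```

The equal spacing between neighboring centers reflects the fact that every interface in such a ring is identical, and the return of the nth rotation to the starting position is what makes the ring closed.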
Figure 9–13: A dimer of dimers with dihedral symmetry of point group 222(D2). (A) Point group 222(D2). Four spheres are placed at the vertices of
produced with MolScript.485

which are tilted somewhat from being normal to the plane of the page and through which runs the exact horizontal molecular 2-fold rotational axis in the plane of the page (the interfaces 2 in panel A), are less extensive (24 nm² interface⁻¹) and were designated as the interfaces holding together the two dimers to form the tetramer. The interfaces along the exact 2-fold rotational axis of symmetry normal to the plane of the page (interfaces 3 in panel A) are almost nonexistent because of the squashing of the tetrahedron.

Almost all tetramers of dihedral symmetry of point group 222(D2) are constructed along these lines, with an
terface and the ββ interface, through which runs the molecular 2-fold axis of symmetry, are the rudimentary pair, so the structure can be represented as (αβ)₂.

The oxygenated form of hemoglobin in solution participates in the dissociation

$$(\alpha\beta)_2 \;\overset{K_\mathrm{d}}{\rightleftharpoons}\; 2\,\alpha\beta \qquad \text{(9–2)}$$

and the value of the dissociation constant

$$K_\mathrm{d} = \frac{[\alpha\beta]^2}{[(\alpha\beta)_2]} \qquad \text{(9–3)}$$

is 1 mM.85 That the same interface always remains in the dimer follows from the fact that no hybrid dimers of the type α′β or αβ′ are formed under conditions where the reaction of Equation 9–2 is rapidly interconverting tetramers and dimers in a mixture of two hemoglobins, (αβ)₂ and (α′β′)₂, from two species, dog and human, respectively, even though hybrids of the type (αβ)(α′β′) did form readily.86 In this experiment, hemoglobins from the two species were used to permit isoelectrophoretic separation of the hybrids (as in Figure 8–18). As a control, it was shown that the hybrid dimers α′β and αβ′ could be artificially formed and easily separated isoelectrophoretically, from each other and from the parent dimers αβ and α′β′. The α′β and αβ′ hybrids also could not be shuffled by the reaction of Equation 9–2. Therefore one of the two different pairs of interfaces between α and β subunits in the hemoglobin tetramer84 must be much stronger than the other.

A number of other oligomeric proteins also participate in dissociations the equilibrium constants for which are large enough to be measured. In fact, the dimer of interleukin-8 that is observed crystallographically has a dissociation constant so large that the protein is actually a monomer at physiological concentrations.87 Most oligomeric proteins, however, have interfaces so strong that their dissociation does not occur within normal ranges of concentration.

A molecular asymmetric unit is the smallest unit of the structure of an oligomeric molecule that, when submitted to the appropriate symmetry operations, creates the entire structure. The individual subunits of the cyclic oligomers in Figures 9–9 to 9–12 are the molecular asymmetric units of their respective molecules. In the (α₂)₂ tetramer of 2,2-dialkylglycine decarboxylase (pyruvate) (Figure 9–13B), the molecular asymmetric unit is, by inspection, the single folded polypeptide. If the one polypeptide is positioned in space, its image is rotated 180° around any one of the 2-fold rotational axes of symmetry, and another identical folded polypeptide is placed where the image of the first has thus been positioned, one of the three possible dimers in the tetramer is created. If the image of this dimer is then rotated about another of the 2-fold rotational axes of symmetry and another identical dimer is placed where the image of the first has been positioned, then the entire tetramer is created.

If the tetrahedron of Figure 9–13A is squashed flat, the structure becomes a ring of four protomers that vaguely resembles a ring with cyclic symmetry of point group 4(C4), often with a large hole in the middle.88,89 Nevertheless, it is easy to distinguish the former from the latter because its ring has dihedral symmetry of point group 222(D2). The orientation of the protomers in an oligomer with dihedral symmetry alternates up-down-up-down around the ring, and the two orthogonal 2-fold rotational axes of symmetry in the plane of the ring between every pair of subunits, which do not exist in a structure with cyclic symmetry, remain.

As with protocatechuate 3,4-dioxygenase (Figure 9–2), which is a rare example of a dimer the subunits of which are arranged around a screw axis of symmetry, lac repressor from E. coli90 and the lectin from Arachis hypogaea91 are tetramers in each of which a pair of rotationally symmetric dimers of identical subunits is arranged around a screw axis of symmetry.

Ribulose-phosphate 3-epimerase from chloroplasts of Solanum tuberosum (Figure 9–14A)92 is an (α₂)₃ hexamer with the symmetry of dihedral point group 322(D3) (Figure 9–14B), phosphoribulokinase from Rhodobacter sphaeroides (Figure 9–15A)93 is an (α₂)₄ octamer with the symmetry of dihedral point group 422(D4) (Figure 9–15B), and peroxiredoxin from Crithidia fasciculata (Figure 9–16)94 is an (α₂)₅ decamer with the symmetry of dihedral point group 522(D5). The crystallographic molecular model of the (α₂)₃ hexamer of ribulose-phosphate 3-epimerase has a central molecular 3-fold rotational axis of symmetry and three molecular 2-fold rotational axes of symmetry at angles of 60° to each other, each of them orthogonal to the central axis and one of them exact. The crystallographic molecular model of the (α₂)₄ octamer of phosphoribulokinase has a central, exact 4-fold rotational axis of symmetry and four exact 2-fold rotational axes of symmetry at angles of 45° to each other and all of them orthogonal to the central axis. The crystallographic molecular model of the (α₂)₅ decamer of peroxiredoxin has five molecular 2-fold rotational axes of symmetry at angles of 36° to each other, all of them orthogonal to the central molecular 5-fold rotational axis of symmetry.

In the dihedral point groups of odd fold (3-fold, 5-fold, and 7-fold), the interfaces found at the two ends of each 2-fold rotational axis of symmetry, although rotationally symmetric about the axis, are different from each other (Figure 9–14B); in the dihedral point groups of even fold (4-fold and 6-fold), the interfaces found at the two ends of each 2-fold rotational axis of symmetry are the same, but there are two different kinds of 2-fold rotational axes of symmetry that alternate around the central axis (Figure 9–15B). Nevertheless, in both instances, odd
and even, there are only two types of interfaces associated with the 2-fold rotational axes of symmetry regardless of how large the fold (hence the notation n22). A third type of n-fold symmetric interface can occur across the central n-fold axis.

In oligomeric proteins with dihedral symmetry there is usually a set of n strong, identical interfaces distributed around the central axis that connect pairs of subunits into dimers. Examples are the three interfaces approximately normal to the plane of the page each connecting an upper subunit with a lower subunit in Figure 9–14A and the five interfaces at about 2 o’clock, at about 5 o’clock, at 7 o’clock, at about 10 o’clock and at about 12 o’clock in Figure 9–16. As is the case with those in isolated dimers with cyclic symmetry, each of these interfaces has a 2-fold rotational axis of symmetry running through its center. In ribulose-phosphate 3-epimerase, phosphoribulokinase, and peroxiredoxin, these interfaces forming the dimers are more extensive than the interfaces joining the dimers into the hexamer, the octamer, or the decamer, respectively, so the proteins are a trimer of dimers, a tetramer of dimers, and a pentamer of dimers, respectively, all with dihedral symmetry.

There are three configurations in which such dimers can be assembled into rings with dihedral symmetry: eclipsed, staggered, and splayed (Figure 9–17). The dimers in ribulose-phosphate 3-epimerase (Figure 9–14A) are eclipsed; those in phosphoribulokinase (Figure 9–15A) are staggered; and those in peroxiredoxin (Figure 9–16) are splayed. In ribulose-phosphate 3-epimerase the only interfaces holding the three dimers together are the six identical ones, three between the subunits in the upper ring and three between the subunits in the lower ring,92 and in peroxiredoxin the only interfaces holding five dimers together are the five identical ones, each between an upper subunit and a lower subunit in neighboring dimers.94 In phosphoribulokinase, however, the six identical interfaces connecting the three upper subunits to each other and the three lower subunits to each other are as extensive as the five identical interfaces between a lower subunit of one dimer and an upper subunit of a neighboring dimer, but the interfaces holding the dimers
Figure 9–14: A trimer of eclipsed dimers with dihedral symmetry of point group 322(D3). (A) α-Carbon diagram of the homohexamer of ribulose-phosphate 3-epimerase from chloroplasts of S. tuberosum, drawn from the crystallo-
was produced with MolScript.485
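The geometry of the dihedral point groups n22(Dn) described in this section, with n 2-fold axes orthogonal to the central n-fold axis and separated by 60°, 45°, or 36° for n = 3, 4, or 5, follows from combining one n-fold rotation with one orthogonal 2-fold rotation. The sketch below generates the 2n protomer positions from a single hypothetical protomer and checks that every 2-fold axis maps the oligomer onto itself. The central axis is assumed to lie along z and one 2-fold axis along x; the coordinates and function names are illustrative assumptions only.

```python
import math

def rotate_about_z(point, angle_deg):
    """Rotate a point about the central n-fold axis (taken as z)."""
    x, y, z = point
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return (c * x - s * y, s * x + c * y, z)

def twofold_in_plane(point, axis_deg):
    """Rotate a point by 180 degrees about an axis in the xy plane at axis_deg
    from x, using p' = 2(u.p)u - p for the unit vector u along the axis."""
    x, y, z = point
    t = math.radians(axis_deg)
    ux, uy = math.cos(t), math.sin(t)
    d = ux * x + uy * y
    return (2 * d * ux - x, 2 * d * uy - y, -z)

def twofold_axis_angles(n):
    """In-plane directions (degrees) of the n 2-fold axes of point group n22 (Dn)."""
    return [180.0 * k / n for k in range(n)]

def dihedral_oligomer(protomer, n):
    """All 2n positions: an upper ring, and a lower ring made by the 2-fold along x."""
    upper = [rotate_about_z(protomer, 360.0 * k / n) for k in range(n)]
    lower = [twofold_in_plane(p, 0.0) for p in upper]
    return upper + lower

def same_set(points_a, points_b, tol=1e-9):
    """True if every point of A lies within tol of some point of B."""
    return all(any(math.dist(p, q) < tol for q in points_b) for p in points_a)

protomer = (2.0, 0.5, 0.8)  # hypothetical center of mass of one protomer, nm

for n in (3, 4, 5):
    oligomer = dihedral_oligomer(protomer, n)
    angles = twofold_axis_angles(n)
    invariant = all(
        same_set([twofold_in_plane(p, a) for p in oligomer], oligomer)
        for a in angles
    )
    print(n, len(oligomer), angles[1] - angles[0], invariant)
```

The alternation of protomers above and below the plane of the ring, which distinguishes a squashed dihedral ring from a ring with cyclic symmetry, appears here as the change in sign of z produced by each 2-fold rotation.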
family of superoxide dismutases, a fundamental α₂ dimer, which is superposable among the proteins from different species, either stands alone or is arranged in several different ways to form distinct (α₂)₂ tetramers.109 The cytoplasmic ribulose-phosphate 3-epimerases from animals and plants are α₂ dimers rather than (α₂)₃ hexamers, as are those from chloroplasts (Figure 9–14A).92,110 In fact, the dimeric ribulose-phosphate 3-epimerases from the cytoplasms of fungi and animals are more closely related to the dimeric ribulose-phosphate 3-epimerase from a given plant than is the hexameric ribulose-phosphate 3-epimerase from its own chloroplasts. The dihydrodipicolinate synthases from Nicotiana sylvestris and E. coli are tetramers with dihedral symmetry assembled from superposable α₂ dimers, but the dimers face each other in opposite directions in the two proteins.111 Consequently, it seems that most oligomers with dihedral symmetry are (α₂)₂ dimers of dimers, (α₂)₃ trimers of dimers, (α₂)₄ tetramers of dimers, (α₂)₅ pentamers of dimers, and (α₂)₆ hexamers of dimers.

There are of course exceptions to the observation that most proteins with dihedral symmetry are assembled from symmetric dimers. Histidine decarboxylase from Lactobacillus is a dimer of trimers,112 glutamate–ammonia ligase from Salmonella typhimurium is a dimer of hexamers,113 and human serum amyloid P component is a dimer of pentamers.114 There are also two proteins, a chaperonin115 and an intracellular, multifunctional endopeptidase,116 that are [(αβ)₇]₂ dimers of heptamers. The fact that the octamer of IMP dehydrogenase from Tritrichomonas foetus with dihedral symmetry of point group 422(D4) dissociates into two tetramers with a dissociation constant of 1 mM demonstrates that it is a dimer of two tetramers, each of cyclic symmetry.117

In proteins with cyclic symmetry such as dimers118–120 or trimers, tetramers, and pentamers121 as well as proteins with dihedral symmetry,106 related superposable monomers have been assembled by evolution around different interfaces or into different quaternary structures. The malleability of the arrangements of protomers with both dihedral and cyclic symmetry within sets of oligomers of the same family of proteins or even

their folded homologues from E. coli and M. musculus at their peripheries.122

When monomers are transformed by evolution into a homooligomeric ring with cyclic symmetry (Figures 9–9, 9–11, and 9–12) or when homodimers are transformed by evolution into a homooligomeric ring with dihedral symmetry (Figures 9–14A, 9–15A, and 9–16), a face and the complement to that face must be created on the surface of the monomer or the monomer in the dimer, respectively. The face and its complement must be positioned,

Figure 9–16: A pentamer of splayed dimers with dihedral symmetry of point group 522(D5). An α-carbon diagram of peroxire-
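Dissociations of an oligomer into two identical halves, such as the reaction of Equation 9–2 or the dissociation of an octamer into two tetramers, share the form of Equation 9–3, so the proportion of dissociated protein at a given total concentration follows from a quadratic equation. The sketch below works this out for hypothetical concentrations; it assumes ideal behavior, a single equilibrium, and concentrations expressed in the same units as the dissociation constant. The function names are invented for illustration.

```python
import math

def free_half_concentration(total_halves, kd):
    """Free concentration [H] of the half-oligomer for O <=> 2 H, Kd = [H]^2 / [O].

    total_halves is the total concentration of halves, free or assembled:
    total = [H] + 2[O]. It must be in the same units as kd.
    """
    # Substituting [O] = [H]^2 / Kd into the mass balance gives
    # 2[H]^2 / Kd + [H] - total = 0, whose positive root is returned.
    return kd * (math.sqrt(1.0 + 8.0 * total_halves / kd) - 1.0) / 4.0

def fraction_dissociated(total_halves, kd):
    """Fraction of all halves that are free rather than assembled."""
    return free_half_concentration(total_halves, kd) / total_halves

kd = 1.0  # hypothetical dissociation constant, arbitrary concentration units
for total in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(total, round(fraction_dissociated(total, kd), 3))
```

When the total concentration equals the dissociation constant, half of the halves are free; at concentrations far above the dissociation constant the oligomer predominates, and at concentrations far below it the protein is almost entirely dissociated.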
tures with the same amino acid sequence, and the axis has become a 2-fold rotational axis of pseudosymmetry.

There is at least one example of a protein in which a single folded polypeptide has an internal 3-fold rotational axis of pseudosymmetry133,134 and one with an internal 5-fold rotational axis of pseudosymmetry,135 presumably the remains of an ancestral trimer and an ancestral pentamer, respectively. The first and the third of the three domains in the single folded polypeptide composing pyruvate oxidase from Lactobacillus plantarum superpose on each other with a root mean square deviation of 0.19 nm upon rotation of 190° around a rotational axis of pseudosymmetry between them.136 These two halves of the duplicated gene, when unduplicated, encoded the two identical subunits of an α₂ dimer. There is, however, another unrelated domain of about the same size inserted into the polypeptide between the two that nevertheless remain symmetrically arrayed.

There are also examples of proteins in which an early gene duplication, which produced two domains now related by a 2-fold axis of pseudosymmetry bearing witness to that duplication, was then followed by a later gene duplication. This later duplication produced another 2-fold axis of pseudosymmetry relating two copies of the product of the early duplication.137,138 Because the three 2-fold axes of pseudosymmetry in each of these proteins, the two early ones and the one later one, are almost parallel to each other, the original protein must have been an α₂ dimer and after its two identical subunits were fused, the product of the fusion then evolved to become an α₂ dimer, the identical subunits of
perhaps on its way to yet another duplication.

occurred to the ancestor of 5-carboxymethyl-2-hydroxymuconate Δ-isomerase. It is also a trimer with subunits containing internally duplicated domains around local 2-fold rotational axes of pseudosymmetry orthogonal to the central 3-fold rotational axis of symmetry. 4-Oxalocrotonate tautomerase is an enzymatically related protein, the monomer of which is homologous to both of the duplicated halves of the monomer of 5-carboxymethyl-2-hydroxymuconate Δ-isomerase. 4-Oxalocrotonate tautomerase is an α₆ hexamer with dihedral symmetry, the entire structure of which super-
Arg279 O      Arg279 N      0.289
Tyr280 OH     Gln303 OE1    0.336
Gln303 OE1    Tyr280 OH     0.344

a The donor or acceptor in one subunit (subunit A) and its acceptor or donor, respectively, in the other subunit (subunit B) across the interface in the crystallographic molecular model142 of the homodimer of the carboxylesterase from A. fulgidus are tabulated. b The atom performing the donation or the acceptance in each amino acid is in the notation of Figure 4–14. The crystallographic abbreviations for an acyl oxygen of the backbone and an amido nitrogen of the backbone are O and N, respectively. Two dimers together form the crystallographic asymmetric unit so the symmetrically arrayed hydrogen bonds have different lengths.

is required is that hydrogen–carbon bonds be removed from the aqueous phase and sequestered within the
is inconsequential.

Almost all of the interfaces between subunits in
symmetry.151–153 In the crystallographic molecular model of the α₂ dimer of human nitric-oxide synthase, a Zn²⁺ cation sits upon the molecular 2-fold rotational axis, symmetrically bound by the two Cysteines 110 and the two Cysteines 115;154 and in the crystallographic molecular model of the (α₂)₂ tetramer of methionine adenosyltransferase from R. norvegicus, a K⁺ cation sits upon one of the molecular 2-fold rotational axes of symmetry symmetrically bound by the four amido oxygens of the backbone from the two positions 264 and the two positions 265.155

Most of the interfaces between subunits in oligomeric proteins are formed from two complementary faces, each of which is a portion of the surface of its globular, folded polypeptide, and this portion is no more irregular than the usual exposed surface of the usual globular, folded polypeptide. In some instances, for example, glucose oxidase from Aspergillus niger,156 superoxide dismutase from Pseudomonas ovalis,157 chloramphenicol O-acetyltransferase (Figure 9–11), and ribulose-phosphate 3-epimerase (Figure 9–14A), the two faces are almost flat and the resulting interface is almost planar. Often, however, segments of secondary structure or loops between secondary structures will penetrate superficially the subunit across the interface (Figures 9–1A and 9–13B).

In contrast to such classical interfaces, there are interfaces that are formed from regular arrays of secondary structure. The most common examples of this type of interface are those in oligomers that are held together by coiled coils of α helices, as are general control protein GCN4 (Figure 6–29) and methyl-accepting chemotaxis protein II (Figure 6–30). The interface between the two subunits in the α₂ dimer of the variant surface glycoprotein from Trypanosoma brucei is an antiparallel coiled coil of four α helices, each about 50 aa long;158 the interface between the two subunits of translocated intimin receptor is an antiparallel coiled coil of four α helices, each about 20 aa long;159 the interface connecting the three subunits of human mannose-binding

the α₂ dimers of concanavalin A,165 alcohol dehydrogenase (Figure 6–9), κ bungarotoxin (Figure 9–9), and heme-binding protein 23 from R. norvegicus.166 A β sheet of six strands is orthogonally (Figures 6–33 and 6–34) and symmetrically packed against an identical β sheet of six strands from the other subunit in the α₂ homodimer of the lectin from A. hypogaea,167 and a β sheet of six strands is packed in parallel and symmetrically against an identical β sheet of six strands from the other subunit of the α₂ dimer of glucose-fructose oxidoreductase from Zymomonas mobilis.168 In the α₄ cyclic tetramer of each half of the dihedral octamer of dihydroneopterin aldolase from Staphylococcus aureus169 and in the α₂ dimer of each half of the dihedral tetramer of urate oxidase from Aspergillus flavus,121 each subunit contributes four β strands or eight β strands, respectively, to the dramatic antiparallel β barrel of 16 strands in the center of these oligomers. In the pentameric rings within the coat protein of rhinovirus 14, each of the identical subunits contributes one β strand to the parallel β barrel of five strands in the center of the oligomer.170

Subunits in some homooligomeric proteins are joined to each other by structural swapping.171 When the subunit in such an oligomer is in its monomeric form, it is a compact, globular structure. When that monomer combines with another identical monomer, however, one or more of its elements of secondary structure, for example, an amino-terminal α helix,172 a β hairpin,173 two α helices and two strands of β structure,174 or a structural domain,171 takes the place of its twin on the other subunit, and its twin takes its place on its own subunit. Because the two elements of structure that have swapped are identical to each other, each can fit precisely into the cavity vacated by the other. The resulting α₂ dimer is held together by the respective strands of polypeptide connecting each swapped segment with the rest of its subunit. Often it is only these strands of random meander that hold the two subunits together and no formal interface is formed between them.175 Conclusive proof of structural swapping requires a crystallographic molecular
protein is a parallel coiled coil of three a helices, each model of the unswapped monomer and the swapped dimer
20 aa long;160 and the interface between the two a2 dimers so that it can be shown that the swapped segments occupy
of the tetrameric lac repressor from E. coli is an antipar- in the same orientation the same locations in the dimer that
allel coiled coil of four a helices, one from each subunit.90 were occupied by the unswapped segments in their respec-
The central cores holding together the four subunits of tive monomers.171,172,176–178 In the coat protein of bacterio-
the (a2)2 tetramers of fumarate hydratase II from E. coli,161 phage MS2, however, two of the three subunits in a
histidine ammonia-lyase from P. putida,162 and adenylo- homotrimeric substructure have swapped a b hairpin but
succinate lyase from P. aerophilum163 are bundles of 20 in the third subunit that b hairpin occupies the same loca-
antiparallel a helices, five from each subunit and each tion on its own subunit occupied by the swapped b hair-
about 30 aa long. pins on the other two subunits.173
Regular arrays of b structure also are used to con- The requirements for structural swapping have
nect subunits of oligomeric proteins together. In the been examined by site-directed mutation.177,179 Site-
a3 trimer of UDP-N-acetylglucosamine diphosphorylase, directed mutation has also been used to convert an oth-
the interfaces are formed by three identical b helices, one erwise monomeric protein into a structurally swapped
from each subunit that run parallel to each other around dimer.180
the molecular 3-fold rotational axis of symmetry, which Conclusive evidence of structural swapping is avail-
coincides with a crystallographic axis.164 Continuous able for only a few proteins, but there are a number of
b-pleated sheets run from one subunit into the other in oligomeric proteins in which one or more segments of the
polypeptide forming one subunit reaches over to embrace its neighboring subunit just as the same segments from its neighbor symmetrically embrace it. One example of this is the α2 dimer of glucose-6-phosphate isomerase from O. cuniculus;181 another is the α3 trimer of 4-chlorobenzoyl-CoA dehalogenase from Pseudomonas.62 In the symmetric α2 dimer of human interleukin-5, the carboxy-terminal 24 amino acids of each subunit run symmetrically down one side of the other in random meander and then turn to run across the surface of the other in an α helix.182 In the α2 dimer of ADP-ribose diphosphatase from E. coli, the first 57 amino acids of each subunit form a three-stranded antiparallel β sheet that lies upon the surface of the remaining globularly folded 153 amino acids of the other subunit,183 and these 57 amino-terminal amino acids are missing in monomers from the same family. In the (α2)2 tetramer of catalase from Penicillium vitale, the first 13 amino-terminal amino acids of one subunit are threaded through a large loop of 39 amino acids bulging out of the surface of its symmetric twin and vice versa to form a dimer in which each subunit is hooked to the other.184
The ultimate expression of cyclic and dihedral symmetry is erythrocruorin. Erythrocruorin from Lumbricus terrestris is an {[(αβ)(γδ)]3}12ε36 oligomer of 180 folded polypeptides with a total of 29,916 aa arranged with dihedral symmetry of point group 622(D6).185 The core of the protein, which confers the dihedral symmetry, is a hexamer of dimers of trimers. Each of the 36 ε subunits (240 aa) of this core is first assembled into a homotrimer with cyclic symmetry. Along the 3-fold rotational axis of symmetry of the trimer, the amino-terminal 50 aa of each subunit forms a parallel coiled coil of three α helices holding the three subunits together. These coiled coils then combine in pairs to form dimers of trimers. Six of these dimers of trimers assemble around a central 6-fold rotational axis of dihedral symmetry in a splayed array, the interfaces of which are formed between the coiled coils of α helices. This structure displays the 12 globular trimers of the carboxy-terminal domains (200 aa) of the ε subunits directed outward in upper and lower rings of six.
The globin subunits of the protein, each containing a heme, come in four isoforms (α, 151 aa; β, 145 aa; γ, 153 aa; and δ, 142 aa). They first pair as αβ dimers and γδ dimers with cyclic pseudosymmetry of point group 2(C2). An αβ dimer and a γδ dimer then assemble around a molecular 2-fold rotational axis of pseudosymmetry, but their own local 2-fold rotational axes of pseudosymmetry intersect the central 2-fold rotational axis of pseudosymmetry of this tetramer at angles of 54°186 instead of 90°, as they would in a tetramer of dihedral pseudosymmetry. Nevertheless, the structure is closed not because of steric exclusion but because the subunits are not identical. The δ subunits of three copies of this asymmetric tetramer then associate with each other through three identical interfaces around a 3-fold rotational axis of symmetry to produce a symmetric [(αβ)(γδ)]3 trimer of asymmetric tetramers. One of these trimers of tetramers then attaches to each of the 12 globular trimers directed outward from the dihedral core of ε subunits to produce the final molecule containing 36 ε subunits and 144 globin subunits.
The strategy employed to assemble erythrocruorin provides a way of assembling more than 100 folded polypeptides into an enormous structure by a hierarchy of symmetries. It is also possible to accomplish the same goal even more dramatically with hexagonally expanded icosahedral symmetry.
Suggested Reading
Buehner, M., Ford, G.C., Moras, D., Olsen, K.W., & Rossmann, M.G. (1974) Three-dimensional structure of D-glyceraldehyde-3-phosphate dehydrogenase, J. Mol. Biol. 90, 25–49.
Royer, W.E., Jr., Heard, K.S., Harrington, D.J., & Chiancone, E. (1995) The 2.0 Å crystal structure of Scapharca tetrameric hemoglobin: cooperative dimers within an allosteric tetramer, J. Mol. Biol. 253, 168–186.
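The subunit bookkeeping quoted in this section for erythrocruorin (144 globin subunits, 180 folded polypeptides, 29,916 aa) can be checked with a few lines of arithmetic. The sketch below is only an illustration: it uses the chain lengths and the hierarchy of assembly stated in the text, and the names `GLOBIN_LENGTHS`, `EPSILON_LENGTH`, and `erythrocruorin_totals` are hypothetical labels introduced here, with the Greek subunit names spelled out in ASCII.

```python
# Chain lengths (aa) as quoted in the text for the erythrocruorin subunits.
# All identifiers are hypothetical names chosen for this illustration.
GLOBIN_LENGTHS = {"alpha": 151, "beta": 145, "gamma": 153, "delta": 142}
EPSILON_LENGTH = 240

TETRAMERS_PER_TRIMER = 3   # three asymmetric globin tetramers per trimer
TRIMERS_OF_TETRAMERS = 12  # one per globular trimer of epsilon subunits
EPSILON_SUBUNITS = 36      # hexamer of dimers of trimers: 6 * 2 * 3


def erythrocruorin_totals():
    """Return (globin subunits, total polypeptides, total amino acids)."""
    tetramers = TETRAMERS_PER_TRIMER * TRIMERS_OF_TETRAMERS
    globins = tetramers * len(GLOBIN_LENGTHS)  # one of each isoform per tetramer
    polypeptides = globins + EPSILON_SUBUNITS
    amino_acids = (tetramers * sum(GLOBIN_LENGTHS.values())
                   + EPSILON_SUBUNITS * EPSILON_LENGTH)
    return globins, polypeptides, amino_acids


if __name__ == "__main__":
    print(erythrocruorin_totals())
```

The three returned totals agree with the figures given in the text, which is a useful consistency check on the stated stoichiometry.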
Problem 9–4: Make a xerographic copy of the following figure, reprinted with permission from ref 23, copyright 1983 European Journal of Biochemistry. Using a ruler, draw all of the rotational axes of symmetry on one of the two members of the stereo pairs. Use the abbreviations for rotational axes found in the International Tables for Crystallography.15
Problem 9–5: The following diagram is based on the crystallographic molecular model of transketolase.39 This drawing was produced with MolScript.485
(A) How many protomers does the protein contain, what types of axes of symmetry does the structure contain, and what are their locations in the structure?
The following diagram is based on the crystallographic molecular model of glycerate dehydrogenase.187 This drawing was produced with MolScript.485
(B) How many protomers does the protein contain, what types of axes of symmetry does the structure contain, and what are their locations in the structure?
Problem 9–6: The following figure is a drawing of the region of a crystallographic molecular model of glutathione synthase188 that includes a portion of one of the interfaces between the protomers. This drawing was produced with MolScript.485
(A) What type of axis of symmetry runs through the figure?
(B) Describe in detail the location of the axis of symmetry in the portion of the structure presented in the figure by naming the three amino acid side chains in each protomer that are immediately adjacent to that axis of symmetry.
Problem 9–7: A crystallographic molecular model of cytidine deaminase from E. coli has been constructed. The protein is a homodimer formed from two identical folded polypeptide chains, each 294 aa in length. The following figure is a tracing of the α carbons from Glutamate 49 to Alanine 294 in one of the two folded polypeptides in the crystallographic molecular model.189 This drawing of the crystallographic molecular model was produced with MolScript.485
(A) How many domains are there in the portion of the crystallographic molecular model shown in the figure?
(B) What criterion did you use to decide how many domains there are?
(C) By what pseudosymmetry operation are the domains related to each other, and where is the axis of pseudosymmetry located in the figure?
(D) How did this structure arise during evolution?
Below is an alignment of the segment of the sequence of cytidine deaminase from E. coli between Glutamate 49
49 EDALAFALLPLAAACARTPLSNFNVGAIARGVSG
183 GYALTGDALSQAAIAAANRSHMPYSKSPSGVALECKDG
TWYFGANMEFIGATMQQTVHAEQSAISHAWLSGEK--ALAAI
RIFSGSYAENA--AFNPTLPPLQGALILLNLKGYDYPDIQRA
TVN---YTPCG--HCRQFMNELNSGLDLRIHLPGREAHALRD
VLAEKADAPLIQWDATSATLKALGC----HSIDRVLLA 294
YLPDAFGPKDLEIKTLL 177
(E) This alignment is based on the structure shown in the figure above. How was this alignment performed?
(G) There are three axes in the figure: horizontal, vertical, and normal. Designate correctly each of these three axes as 2-, 3-, 4-, or 6-fold axes of symmetry or axes of pseudosymmetry. One of these axes is a crystallographic axis of symmetry. Which is it?
units. Arrange 12 identical subunits around local molecular rotational axes of symmetry, including the global 2-fold rotational axis of symmetry of the dihedral point group, to produce a segment composing one fifth of a cylinder.
and what type of oligomer usually has this type of symmetry?
The amino acid sequence of cytidine deaminase from B. subtilis is
MNRQELITEALKARDMAYAPYSKFQVGAALLTKDGKVYRGCNIE
NAAYSMCNCAERTALFKAVSEGDTEFQMLAVAADTPGPVSPCGA
CRQVISELCTKDVIVVLTNLQGQIKEMTVEELLPGAFSSEDLHD
ERKL
(I) Align this sequence with the sequence of amino acids 49–177 of the protein from E. coli. The most conserved region in these two sequences is PCGXCRQ, which contains amino acids from the active site of cytidine deaminase. Aligning these two regions from the two proteins will give you a start in the alignment. Put in gaps and try to get the best alignment.
(J) For your alignment, what is the percentage of identity and how many gaps are there? In calculating the percentage of identity, assume that the length of the aligned region is the average of the lengths of the two aligned sequences.
(K) Which are more closely related, the two
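The percentage of identity defined in part (J), with the length of the aligned region taken as the average of the two ungapped lengths, can be computed mechanically once an alignment has been written down. The following is a minimal sketch under stated assumptions: gaps are written as `-`, each run of consecutive gap characters is counted as one gap, and the two short strings in the usage example are made-up placeholders rather than the cytidine deaminase sequences. The function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identity using the convention of part (J).

    The denominator is the mean of the two ungapped sequence lengths.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Count columns where both sequences carry the same residue (not a gap).
    identities = sum(1 for x, y in zip(aligned_a, aligned_b)
                     if x == y and x != gap)
    length_a = len(aligned_a.replace(gap, ""))
    length_b = len(aligned_b.replace(gap, ""))
    mean_length = (length_a + length_b) / 2
    return 100.0 * identities / mean_length


def count_gaps(aligned, gap="-"):
    """Count runs of consecutive gap characters (an assumed convention)."""
    runs = 0
    previous_was_gap = False
    for ch in aligned:
        is_gap = ch == gap
        if is_gap and not previous_was_gap:
            runs += 1
        previous_was_gap = is_gap
    return runs


if __name__ == "__main__":
    # Placeholder alignment, not taken from the problem.
    a = "ACDE-FG"
    b = "ACNEWFG"
    print(percent_identity(a, b), count_gaps(a) + count_gaps(b))
```

If each gap character is meant to count as a separate gap, `count_gaps` would need to return `aligned.count(gap)` instead.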
ᵃ A rotational axis of symmetry is a line passing through the center of the oligomeric structure. Because it is a line, it extends in both directions from the center. As a result, each axis of symmetry passes out of the oligomeric structure at two opposite points, and at each of these two points there is a symmetric arrangement of asymmetric objects on the surface of the structure.
ᵇ These are the angles between the immediately adjacent axes of the same fold.
that illustrate the types of rotational axes of symmetry in each of these three point groups and their orientations in space. These are Kepler’s rhombic dodecahedron (Figure 9–21A), the triangular expansion of Kepler’s rhombic dodecahedron (Figure 9–21B), and the triangular expan-
ily asymmetric protomers.
In the tetrahedral point group 23(T), 12 identical protomers are arranged about four isometrically spaced 3-fold rotational axes of symmetry (at 70.53° to each other) and three orthogonal 2-fold rotational axes of symmetry that all intersect at a common origin (Figure 9–21A). When 12 protomers of protein are assembled in the tetrahedral point group 23(T), they do not fit into the neat geometrical boundaries of any polyhedron.
* The unexpanded rhombic triacontahedron of Kepler, although constructed from 30 faces arrayed around rotational axes of symmetry, cannot accommodate 30 asymmetric objects. Consequently, it does not represent a point group.
Figure 9–21: Regular polyhedra that have isometric symmetries:192 (A) the rhombic dodecahedron with tetrahedral symmetry of point group 23(T), (B) the tetracosahedron that is the triangular expansion of the rhombic dodecahedron and that represents octahedral symmetry of point group 432(O), and (C) the hexacontahedron that is the triangular expansion of a rhombic triacontahedron and that has icosahedral symmetry of point group 532(I). Kepler derived the rhombic triacontahedron from the intersection of the dodecahedron and icosahedron by connecting the vertices at the 5-fold and 3-fold axes of symmetry with lines. In each of the figures, rotational axes of symmetry on the circumference are labeled with the number of their fold.
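The angle of 70.53° between the 3-fold rotational axes of point group 23(T), and the angles of about 63° and 42° that this chapter gives for adjacent 5-fold and adjacent 3-fold axes of an icosahedron, can be recovered from standard coordinates for the regular polyhedra. The sketch below is an illustration only. It assumes the textbook vertex coordinates of the tetrahedron, icosahedron, and dodecahedron (which are not given in the text), treats each axis as an undirected line through the origin, and uses the hypothetical helper name `angle_between_axes`.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, used in the standard coordinates


def angle_between_axes(u, v):
    """Acute angle in degrees between two lines through the origin."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # abs() because an axis is a line: a direction and its opposite are the same axis.
    cos_angle = min(1.0, abs(dot) / (norm_u * norm_v))
    return math.degrees(math.acos(cos_angle))


# 3-fold axes of 23(T): lines through two vertices of a tetrahedron.
tetrahedral_3fold = angle_between_axes((1, 1, 1), (1, -1, -1))

# Adjacent 5-fold axes of 532(I): lines through neighboring icosahedron vertices.
icosahedral_5fold = angle_between_axes((0, 1, PHI), (0, -1, PHI))

# Adjacent 3-fold axes of 532(I): lines through neighboring dodecahedron vertices.
icosahedral_3fold = angle_between_axes((1, 1, 1), (0, 1 / PHI, PHI))

if __name__ == "__main__":
    for value in (tetrahedral_3fold, icosahedral_5fold, icosahedral_3fold):
        print(round(value, 2))
```

The three values correspond to arccos(1/3), arctan(2), and arccos(√5/3), respectively.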
other by the other end of that same axis.
Several of the proteins with tetrahedral symmetry are such tetramers of trimers. In dilute solutions of guanidinium chloride, 3-dehydroquinate dehydratase dissociates into its constituent trimers.194 Phaseolin from P. vulgaris is a trimeric protein139 that associates to form a dodecamer195 with tetrahedral symmetry139 below pH 4.5. The self-rotation function of the asymmetric unit of crystals of catabolic ornithine carbamoyltransferase from Pseudomonas aeruginosa has the maxima consistent with an oligomer with tetrahedral symmetry,196 and when the protein is cross-linked the major covalent species are trimer, hexamer, nonamer, and dodecamer, a result consistent with the trimer being the fundamental unit.197
In contrast to these three dodecamers, the dode-
pentamer in the usual way that two identical proteins are associated, which is around a 2-fold rotational axis of symmetry. This particular 2-fold rotational axis of symmetry, however, happened to incline the two pentamers with respect to each other so that their respective 5-fold rotational axes of symmetry both intersected the 2-fold axis of symmetry and formed the angle required to exist between the 5-fold rotational axes of symmetry in an icosahedron, which is 63°. If two interfaces, the one defining the pentamer and the other at the 2-fold rotational axis defining an angle of 63°, are built into faces on a protomer, 60 such protomers will automatically assemble into an icosahedral shell.
In the protein coat of satellite panicum mosaic virus, the interfaces holding the trimers together around the 3-fold axes of symmetry (15.7 nm² interface⁻¹) are more extensive than those holding together the dimers around the 2-fold axes (12.8 nm² interface⁻¹) or the pentamers around the 5-fold axes (10.5 nm² interface⁻¹).201 If the trimer was the fundamental unit from which the protein coat arose, the evolution of two complementary faces on the trimer that formed a dimer of trimers inclining the two 3-fold rotational axes of symmetry at 42° would also automatically create the entire icosahedrally symmetric oligomer of 60 subunits.
Vertebrate ferritin is a tetracosamer the identical subunits of which are arranged with octahedral symmetry (Figure 9–25A),205,206 but ferritin from Listeria innocua is a dodecamer the identical subunits of which are arranged with tetrahedral symmetry.207 These are examples of two different quaternary structures for the same species of protein. The subunits of vertebrate ferritin and the ferritin from L. innocua are both antiparallel coiled coils of four α helices (Figure 9–25B) that are superposable on each other and consequently homologous to each other. In both proteins, these coiled coils of α helices associate side by side in opposite directions to form dimers around 2-fold rotational axes of symmetry, and the interfaces forming the dimers from the two proteins are homologous to each other. The dimers, in turn, in each protein combine in triplets around 3-fold rotational axes of symmetry in which the three identical interfaces around each axis are formed by the two
example, the one containing monomer V) and the creases between the remaining three dimers were made more acute, the space previously occupied by the missing dimer would close up and a new interface could form (one between dimer I/II and the side of monomer IV) now identical to the other two around the once 4-fold but now 3-fold axis of symmetry. If this transformation were performed at each end of each 4-fold axis in vertebrate ferritin, six dimers would be removed, the structure would be converted from a tetracosamer with octahedral
Figure 9–24: Icosahedral symmetry of point group 532(I). An α-carbon diagram of one hemisphere of the protein coat of satellite panicum mosaic virus is drawn from the crystallographic molecular model of the protein.201 The protein crystallized in the space group P4₂32 with a pentamer of subunits, each of 157 aa, in the crystallographic asymmetric unit, the smallest asymmetric unit
symmetry. Individual subunits have been drawn with line segments with one of three different thicknesses or shadings. Only 34 of the 60 subunits are drawn to make the structure easier to visualize. These 34 subunits form
trimers associate with each other around 5-fold rotational axes of symmetry. The respective interfaces at the 2-fold rotational axes of symmetry between these homologous trimers that are arranged around these two different rotational axes of symmetry adapt flexibly to the dispositions of the trimers that are required by those axes.210 In these two different quaternary structures of these homologous acetyltransferases, the octahedral and the icosahedral, the homologous complementary faces on two different, but homologous, trimers must accommodate in the one case a 4-fold rotational axis of symmetry and in the other case a 5-fold rotational axis of symmetry.
Ferritin, small heat shock protein, and dihydrolipoyllysine-residue acetyltransferase, because they each are examples of the same protein having different quaternary structures, reinforce the conclusion that quaternary structure contains no information relevant to the evolution of proteins. The examples of ferritin and dihydrolipoyllysine-residue acetyltransferase also illustrate the ability of two different subunits, formed respectively from closely related polypeptides, to assume similar arrangements around rotational axes of symmetry of different fold. It is this ability of interfaces to adjust flexibly to different rotational axes of symmetry that has been exploited by evolution to increase the size of viral protein coats.
Although there are a few other viruses like satellite panicum mosaic virus that have protein coats assembled from only 60 subunits arranged with 532(I) symmetry,203,214,215 the problem with such protein coats is that they are too small. In order to enclose enough DNA to accomplish a successful subversion of the host, the protein coats usually must be larger. In two instances, this problem has been solved by using an elongated homodimer as a protomer and having these homodimers arrayed as spokes around the 5-fold rotational axes of symmetry. One monomer of the dimer forms the near end of the spoke adjacent to the 5-fold axis; and the other monomer, the far end of the spoke. A local 2-fold rotational axis of pseudosymmetry relating the two elongated monomers is located in the center of the spoke. The interdigitation of these long spokes forms the protein coat. Although each is formed from copies of the same polypeptide, a monomer at the 5-fold hub is required to assume a significantly different shape from a monomer at the periphery of the spoke in order for the interdigitation to succeed and the global 3-fold and 2-fold rotational axes of icosahedral symmetry to be satisfied.216,217 Most viral protein coats, however, are built with a different strategy that takes advantage of quasi-equivalence and pseudosymmetry to provide a general solution to the problem of expanding the size of an icosahedral shell.218,219
Quasi-equivalence218 is the manifestation of the ability of either two or more copies of the same subunit or two or more homologous subunits to adapt flexibly in different situations to the requirements of two rotational axes of symmetry of different fold. The respective homologous subunits in the two different quaternary structures of ferritin or in the two different quaternary structures of dihydrolipoyllysine-residue acetyltransferase are quasi-equivalent to each other, but because in each case they have different amino acid sequences, it is the differences in amino acid sequence that might explain their abilities to assume the different dispositions. Furthermore, the two rotational axes of symmetry of different fold to which they adapt are in different oligomers. In many viral protein coats, however, the several quasi-equivalent subunits are formed from folded polypeptides of the same sequence, and the quasi-equivalent subunits are found together in the same icosahedral shell.
To understand the relationships of such quasi-equivalent subunits to each other in such an oligomer, the local rotational axes within a protomer must be distinguished from the global rotational axes of icosahedral symmetry governing the entire structure. A global rotational axis of symmetry is an axis of symmetry around which a rotation of 360°/n causes the entire oligomer to superpose upon itself. A global rotational axis is thus distinguished from a local rotational axis that operates only on structural units in its immediate vicinity.
The triangle from which the expanded rhombic triacontahedron (Figure 9–21C) is constructed, although formally an asymmetric object because two of its vertices lie at global 3-fold rotational axes of symmetry and one vertex lies at a global 5-fold rotational axis of symmetry and only one of its three edges lies on a global 2-fold rotational axis of symmetry, is nevertheless an equilateral triangle and locally symmetric. Were the equivalent mass of three identical folded polypeptides, related to each other by this local 3-fold rotational axis of pseudosymmetry, to fill this triangle, it would have three times more area than if the equivalent mass of only one folded polypeptide filled it, and the shell could then contain 3.5 times more nucleic acid. This solution requires (Figure 9–21C) that this folded polypeptide be capable of quasi-equivalence because those subunits forming the interfaces around one vertex of the triangle would have to be arrayed around a global 5-fold rotational axis of symmetry (72° for each step; complementary faces at 108°), while those subunits forming the interfaces around the other two vertices would have to be arrayed around local 6-fold rotational axes of pseudosymmetry (60° for each step; complementary faces at 120°) that each coincide with a global 3-fold rotational axis of symmetry. The requirements of this quasi-equivalence would force each of the three subunits arrayed around the rotational axis in the center of each triangle to assume a significantly different conformation, so the local 3-fold rotational axis around which they are arrayed is one of pseudosymmetry.
Quasi-equivalent subunits arrayed around rotational axes of pseudosymmetry cannot each be individual protomers, but the set of all of the quasi-equivalent subunits arrayed around a rotational axis of pseudosymmetry or several rotational axes of pseudosymmetry can be a protomer of the overall quaternary structure. Consequently, it is the three quasi-equivalent subunits arrayed around the local 3-fold rotational axis of pseudosymmetry that would form the protomer of the icosahedral array. The global rotational axes of this icosahedral array would remain true global rotational axes of symmetry for the entire structure because the asymmetry would be confined entirely within each of the protomers of the point group.
Tomato bushy stunt virus has a protein coat with just such an arrangement of subunits (Figure 9–26).220 Each of its 60 identical pseudosymmetric, trimeric protomers is formed from three folded polypeptides, subunits A, B, and C, each of the same sequence 386 aa in length and the tertiary structures of which, when they are in the viral protein coat, are homologous and superposable. The differences in their respective conformations permit each of them to play its required quasi-equivalent role. For example, only subunits C are adjacent to the global 2-fold rotational axes of symmetry. Subunits B and C must alternate around the global 3-fold rotational axes of symmetry to produce local 6-fold rotational axes of pseudosymmetry, while subunits A are distributed around the global 5-fold rotational axis of symmetry. These quasi-equivalent situations produce alterations in the structures of these folded polypeptides that are most obvious at the interfaces among the homotrimers; it is here that the strain of requiring the same protein to adapt to the different rotational axes of symmetry is the strongest.
The packing at the quasi-equivalent interfaces has been described in detail for the protein coat of southern bean mosaic virus,221 which is closely related to tomato bushy stunt virus. The three identical but quasi-equivalently folded polypeptides, each 260 aa in length, are arranged around a local 3-fold rotational axis of pseudosymmetry (Figure 9–27A)221 to create the homotrimeric protomer in which the three subunits adapt to their respective quasi-equivalent environments. The A subunits are arranged around the global icosahedral 5-fold rotational axes of symmetry (Figures 9–21C, 9–26, and 9–27B). Each of the B and C subunits uses the same vertex to form the local 6-fold rotational axis of pseudosymmetry (Figures 9–26 and 9–27C) as that used by the A subunit to conform to the global 5-fold rotational axis of symmetry. Careful inspection of Figure 9–27B,C shows that the two unique defining interfaces are similar but significantly adjusted to accommodate the differences in the angular requirements around these two axes. These adjustments, in turn, create conformational changes throughout each of the individual subunits, causing their overall structures to differ. These differences are most obvious in the angular orientations of both the pleats within the β sheets and the β sheets themselves when the three different conformations are compared.
The protein coats of tomato bushy stunt virus, southern bean mosaic virus, turnip yellow mosaic virus,222 black beetle nodavirus,223 and primate calicivirus,224 among others, are all constructed from 180 identical subunits distributed among 60 identical homotrimers. Because, however, the three subunits in a
trimer (A, B, and C in Figure 9–26) are not in identical environments, they need not be identical in amino acid sequence. In some icosahedral protein coats with trimeric protomers, gene triplication of the nucleic acid encoding the amino acid sequence of the protein forming the coat has occurred to produce three genes. The protein coats of the comoviruses, of which those of cowpea mosaic virus and beanpod mottle virus are examples, are an interesting intermediate case in this process.225 In the protein coat of these viruses, two of the subunits in the heterotrimeric protomer are internally repeating domains on the same polypeptide. This suggests that the general sequence of events has been a gene duplication
Figure 9–26: Arrangement of the 180 subunits in the protein coat of tomato bushy stunt virus.220 Each tile is a single folded polypeptide, and all of the polypeptides are identical to each other in amino acid sequence. Three folded polypeptides, designated A, B, and C, are arrayed around a local 3-fold axis of pseudosymmetry to produce the trimeric protomer of the icosahedral array. The vertex occupied by the A subunit lies at a global icosahedral 5-fold rotational axis of symmetry, and the axes occupied by the B and C subunits lie at global icosahedral 3-fold rotational axes of symmetry that are also local 6-fold rotational axes of pseudosymmetry. Global 2-fold rotational axes of symmetry relate C subunits, and local 2-fold rotational axes of pseudosymmetry relate A and B subunits. Each subunit has a protrusion that runs up its associated 2-fold rotational axis. The diagram was adapted from the crystallographic molecular model of this viral protein coat. Reprinted with permission from ref 220. Copyright 1983 Academic Press.
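The angular requirements quoted above for quasi-equivalence (72° per step with complementary faces at 108° around a 5-fold axis, and 60° per step with complementary faces at 120° around a local 6-fold axis) follow from simple geometry, as does the count of 180 subunits in 60 trimeric protomers. The sketch below is illustrative only. It assumes that "complementary faces at" refers to the interior angle of the regular polygon formed by n subunits around the axis, which is an interpretation rather than a statement from the text, and the function and constant names are hypothetical.

```python
def step_angle(n_fold):
    """Rotation in degrees that relates neighboring subunits around an n-fold axis."""
    return 360.0 / n_fold


def face_angle(n_fold):
    """Interior angle in degrees of a regular n-gon (assumed meaning of
    'complementary faces at' in the text)."""
    return 180.0 - step_angle(n_fold)


ICOSAHEDRAL_PROTOMERS = 60   # protomers in point group 532(I)
SUBUNITS_PER_PROTOMER = 3    # quasi-equivalent subunits A, B, and C

if __name__ == "__main__":
    for n in (5, 6):
        print(n, step_angle(n), face_angle(n))
    print(ICOSAHEDRAL_PROTOMERS * SUBUNITS_PER_PROTOMER)
```

The 12° difference between the two face angles is the adjustment that the same folded polypeptide must absorb when it occupies position A rather than position B or C.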
Figure 9–27: α-Carbon diagrams of the folded polypeptides composing the protein coat of southern bean mosaic virus drawn from the crystallographic molecular model of the entire viral protein coat.221 The entire crystallographic molecular model has 180 identical polypeptides all folded into the same tertiary structure. There are three different environments, however, in which the folded polypeptides are found that can be designated A, B, and C (Figure 9–26). (A) Three folded polypeptides arrayed about the 3-fold rotational axis of pseudosymmetry at position A (at 12 o’clock), position B (at 4 o’clock), and position C (at 8 o’clock), respectively. (B) Two folded polypeptides arrayed about the global 5-fold rotational axis of icosahedral symmetry both in positions A. The lines drawn through the centers of the two subunits meet at an angle of 72° on the global 5-fold rotational axis of symmetry. Because this is an axis of symmetry, the two subunits are exactly superposable, and all interfaces around this axis are identical. (C) Two folded polypeptides arrayed about the global 3-fold rotational axis of icosahedral symmetry, which is a local 6-fold rotational axis of pseudosymmetry, at position C (at 2 o’clock) and position B (at 4 o’clock). The lines drawn through the centers of the two subunits meet at an angle of 60° on the local 6-fold rotational axis of pseudosymmetry. The conformations of the B and C subunits are noticeably different because they must flexibly adjust to different dispositions within the overall icosahedral array. In both panels B and C, the lines meeting at the rotational axes of symmetry pass through the α carbon of Threonine 198 and then through the respective subunit along the same trajectory. These drawings were produced with MolScript.485
that gave rise to two genes producing two separate polypeptides followed by a gene duplication of one of these genes producing a single polypeptide with two internally repeating domains followed by a division of the latter duplicated gene so that it then produced two smaller polypeptides, each containing the complete fold of its ancestral polypeptide. Following the triplication, each of the three genes evolved independently to produce three polypeptides, each of different sequence and each presumably incorporating changes rendering it more successful at occupying its respective quasi-equivalent position in the viral protein coat.
There are crystallographic molecular models for icosahedral protein coats of four viruses that have accomplished the complete evolutionary transition: rhinovirus,226 poliovirus,227 Mengo virus,228 and foot-and-mouth disease virus.229 The protein coats of these viruses are spherical shells (Figure 9–28A,B),227 the surfaces of which are paved with heterotrimeric protomers in icosahedral array (Figure 9–28C).228 That the three different folded polypeptides forming these four viral protein coats have arisen from a gene triplication follows from the fact that, in each case, the three different polypeptides folded in their native conformations are superposable.226–229 If this is a definitive correspondence, it demonstrates that the ancestors of each of these protein coats were constructed from 180 folded polypeptides of identical sequence.
It has been shown that the single folded polypeptides composing the protomers of the protein coats of satellite panicum mosaic virus and satellite tobacco necrosis virus, another small virus the protein coat of which has only 60 subunits,215,230 are superposable on any one of the folded polypeptides forming the trimeric protomer of the protein coat of either tomato bushy stunt virus or southern bean mosaic virus.201,231 It has also been pointed out that, even though the former have 60 subunits and the latter have 180 subunits, the packing of the subunits of the protein coat of satellite panicum mosaic virus and satellite tobacco necrosis virus is similar to the packing of the subunits of the protein coats of both southern bean mosaic virus and tomato bushy stunt virus. When the global 3-fold rotational axis of symmetry in the protein coat of satellite tobacco necrosis virus is aligned with the local 3-fold rotational axis of pseudosymmetry within one of the trimeric protomers of the protein coat from one of the other viruses (Figure 9–29),230 the three global 5-fold rotational axes of symmetry in the protein coat of satellite tobacco necrosis virus coincide with one of the global 5-fold rotational axes of symmetry and two of the local 6-fold rotational axes of pseudosymmetry in the protein coat of the other virus.
It has also been observed232 that when the first 61 amino acids of the protein coat of southern bean mosaic virus are removed, the remainder of the folded polypeptide can assemble to produce a hollow icosahedral shell
Figure 9–28: Space-filling representations of the folded polypeptides assembled into the icosahedral protein coat of poliovirus as drawn
from the crystallographic molecular model of this oligomeric protein.227 (A) View into the central cavity of the viral protein coat into which
the viral RNA is packed. (B) View of the surface of the viral protein coat in which the atoms contributed by each of the three different types
of folded polypeptides, VP1, VP2, and VP3, have been given different shades of gray. Panels A and B reprinted with permission from ref 227.
Copyright 1985 American Association for the Advancement of Science. (C) Diagrammatic representation of the surface of an icosahedral viral
protein coat228 in the same orientation as panel B to illustrate the distribution of the various folded polypeptides around the rotational axes
of icosahedral symmetry and local pseudosymmetry. Panel C reprinted with permission from ref 228. Copyright 1987 American Association
for the Advancement of Science.
angles are close enough, the significant stability of such an edifice, each of whose associations strengthens all of the others, could force the interfaces to rearrange sufficiently to accommodate the icosahedral arrangement. This cooperativity in the construction of the shell, which should resemble the cooperativity among the supports of a building, also permits the interfaces to be weaker than interfaces that must stand alone. Nevertheless, that such a monomer, with such faces, has arisen rarely during evolution would not be a surprising fact.
There is, however, a positive-strand RNA virus, MS2, that infects bacterial hosts. It also has an icosahedral protein coat composed of 60 pseudosymmetric homotrimers, but the folded polypeptides do not appear to be related to the folded polypeptides of the coat proteins of other positive-strand RNA viruses.173
All icosahedral viral protein coats have 60 identical protomers arrayed around the global 5-fold, 3-fold, and 2-fold rotational axes of icosahedral symmetry of point group 532(I). It is the identity and the number of subunits within each protomer that differ among them. The smallest protein coats have only one subunit in each protomer, those with three subunits in each protomer are somewhat larger, and those with more than three are larger still. The number of subunits within a protomer is designated with a capital T. For example, the expanded viral protein coats that have been discussed so far, in which there are three subunits in each protomer (Figures 9–26 and 9–28), have T = 3 icosahedral symmetry. The numbers of subunits found in the protomers of the larger protein coats are those numbers that permit the subunits to assume quasi-equivalent positions around local rotational axes of pseudosymmetry and that at the same time are compatible with the global rotational axes of icosahedral symmetry. For example, in viral protein coats with T = 3 icosahedral symmetry, the fact that a global 5-fold rotational axis of symmetry and two global 3-fold rotational axes of symmetry are arranged equilaterally (Figure 9–21C) produces the local 3-fold rotational axis of pseudosymmetry around which three subunits can be arranged within a protomer while still being compatible with those global axes of symmetry.
The viral protein coats with expanded T = 3 icosahedral symmetry are the simplest cases of a common strategy218,219 used to expand the number of subunits in a protomer. Consider a hexagonal array of cyclic hexamers with their respective 6-fold rotational axes of symmetry normal to the plane of the array (Figure 9–30). Such a hexagonal array of hexamers automatically creates an array of global 6-, 3-, and 2-fold rotational axes of symmetry, also all normal to the plane, that operate on the entire array. In certain combinations, these global axes of symmetry have the same spacing and almost the same fold relative to each other that the global axes of icosahedral symmetry have around one of its protomers. For example, in each of the four nested quadrilaterals in Figure 9–30A, the global 6-, 2-, 3-, and 2-fold rotational axes of hexagonal symmetry at its four vertices have the same relative spacing as the global 5-, 2-, 3-, and 2-fold rotational axes of symmetry around a quadrilateral protomer in an icosahedral array (Figure 9–30, right panel). Consequently, the array of subunits within any one of these boundaries is able to be one of the 60 protomers in an icosahedral array if the subunit at the global 6-fold rotational axis at the top vertex is able to adapt quasi-equivalently to the global 5-fold rotational axis of icosahedral symmetry. Similar compatibilities between the global axes of symmetry at the vertices and on the edges of the nested equilateral triangles of Figure 9–30B and the quadrilaterals of Figure 9–30C–E also allow the arrays of subunits within their boundaries to be one of the protomers in an icosahedral array. The nested sets in Figure 9–30A produce protomers with 4, 9, 16, and 25 subunits (T = 4, T = 9, T = 16, and T = 25); those in Figure 9–30B produce protomers with 3, 12, and 27 subunits (T = 3, T = 12, and T = 27); and the skewed quadrilaterals in Figure 9–30 panels C, D, and E produce protomers with 7, 13, and 19 subunits (T = 7, T = 13, and T = 19), respectively.
In each of these protomers, the three designated global rotational axes of symmetry of the hexagonal array at the vertices and the edges become global axes of icosahedral symmetry with the same fold. The global 6-fold rotational axis of symmetry of the hexagonal array at the undesignated vertex becomes a quasi-equivalent global 5-fold rotational axis of symmetry when the protomer is in the icosahedral array. The other global rotational axes of symmetry of the hexagonal array that fall within and on the edges of the boundaries become local rotational axes of pseudosymmetry in the icosahedral array, or if the protomer is large enough some of them become local rotational axes of symmetry. It was the realization of the fact that such a hexagonal array is compatible with icosahedral symmetry that allowed Fuller to design the geodesic domes,219 which preceded the realization that viral protein coats are geodesic domes.218
The protein coats of Sindbis virus233 and Nudaurelia ω Capensis virus234 each have 240 identical subunits arranged with T = 4 icosahedral symmetry. In the protomer of T = 4 icosahedral symmetry (Figure 9–30A), there is a local 3-fold rotational axis of pseudosymmetry (gray symbol in Figure 9–30A) equidistant from the two 2-fold rotational axes of symmetry and the upper vertex, in a position equivalent to the local 3-fold rotational axis of symmetry in the center of a protomer of T = 3 icosahedral symmetry (gray symbol in Figure 9–30B). This local 3-fold rotational axis of pseudosymmetry is retained in the T = 4 icosahedral shell in each protomer and is the local rotational axis of pseudosymmetry for a trimer of quasi-equivalent subunits, just as is the equivalent local 3-fold rotational axis of pseudosymmetry in a protein coat of T = 3 icosahedral symmetry. Another copy of the same trimer necessarily occupies each position in the shell at which one of the 10 global 3-fold rotational axes
Figure 9–30: Compatibility of a hexagonal array of hexamers with icosahedral symmetry. The positions of the global axes of symmetry
normal to an infinite hexagonal array of homohexamers are indicated in the distributions surrounding the two hexamers in the upper right
of the array. A solid hexagon indicates a global 6-fold rotational axis of symmetry. These rotational axes of symmetry are located in the respec-
tive positions throughout the array. (A) Segments of the hexagonal array containing 4, 9, 16, and 25 subunits, respectively, that can be pro-
tomers of icosahedral symmetry. The quadrilaterals enclosing these four segments and those in C, D, and E are the protomers of an
alternative icosahedral hexacontahedron (drawn to the right of the hexagonal array). In each of the protomers of this hexacontahedron, a
5-fold, a 2-fold, a 3-fold, and a 2-fold rotational axis of global symmetry are consecutively joined by line segments. (B) Segments of the hexag-
onal array containing 27, 12, and 3 subunits, respectively, that can be protomers of icosahedral symmetry that are each enclosed by the
boundaries of the equilateral triangle connecting a global 5-fold and two global 3-fold rotational axes of symmetry as in the icosahedral hexa-
contahedron in Figure 9–21C. The gray global 3-fold rotational axes of symmetry of the hexagonal array in A and B become local 3-fold rota-
tional axes of pseudosymmetry when the protomers for T = 4 and T = 3, respectively, are inserted into an icosahedral array (the
hexacontahedron to the right of the hexagonal array and the hexacontahedron in Figure 9–21C, respectively). (C–E) Skewed segments of the
hexagonal array containing 7 (C), 13 (D), and 19 (E) subunits that also can be protomers of icosahedral symmetry. The 2-fold and 3-fold rotational axes of symmetry noted at the boundary of each of the potential protomers coincide with global 2-fold and 3-fold rotational axes of icosahedral symmetry, and the unmarked 6-fold rotational axis of symmetry becomes a quasi-equivalent 5-fold rotational axis of symmetry, when the protomer is inserted into the icosahedral array.
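The subunit counts listed for the protomers in Figure 9–30 (3, 4, 7, 9, 12, 13, 16, 19, 25, and 27) can be enumerated with the rule of Caspar and Klug, T = h² + hk + k², in which h and k are nonnegative integers counting steps along the two axes of the hexagonal array. That rule is not stated in the text; it is brought in here from the paper by Caspar and Klug listed in the Suggested Reading, and the function names are only illustrative. The sketch in Python below also sorts each value into the three families of the figure, on the assumption that k = 0 corresponds to the nested quadrilaterals of panel A, h = k to the nested triangles of panel B, and every other pair to the skewed quadrilaterals of panels C–E.

```python
def triangulation_numbers(limit):
    """Map each T = h*h + h*k + k*k (T <= limit) to the family of its protomer."""
    families = {}
    h = 1
    while h * h <= limit:
        for k in range(h + 1):  # h >= k >= 0, so (h, k) and (k, h) are not both counted
            t = h * h + h * k + k * k
            if t > limit:
                break  # t only grows with k, so the rest of this row is too large
            if k == 0:
                family = "nested quadrilateral"  # assumed to match Figure 9-30A
            elif k == h:
                family = "nested triangle"       # assumed to match Figure 9-30B
            else:
                family = "skewed quadrilateral"  # assumed to match Figure 9-30C-E
            families.setdefault(t, family)
        h += 1
    return dict(sorted(families.items()))


def subunits_in_coat(t):
    """An icosahedral coat has 60 protomers, each containing T subunits."""
    return 60 * t


if __name__ == "__main__":
    for t, family in triangulation_numbers(27).items():
        print(f"T = {t:2d}  {subunits_in_coat(t):4d} subunits  {family}")
```

For T no greater than 27, the sketch returns every value named in the caption together with T = 1 and T = 21, which the figure does not draw. Multiplying by 60 gives the 180 and 240 subunits quoted for coats with T = 3 and T = 4.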
of symmetry emerges because that position is itself quasi-equivalent to that occupied by a trimer on one of the local 3-fold rotational axes of pseudosymmetry. Consequently, in a coat protein with T = 4 icosahedral symmetry, there are 60 trimers located at the 60 local 3-fold rotational axes of symmetry, one in each protomer, and 20 of the same trimers located on the respective ends of the 10 global 3-fold rotational axes of symmetry.
Because the requirements placed upon the 80 trimers in T = 4 icosahedral symmetry are similar to those placed upon the 60 trimers in T = 3 icosahedral symmetry, there are proteins that can adopt either T = 3 or T = 4 icosahedral symmetry depending on the conditions,235 and within the family of alphaviruses there are protein coats with either 180 subunits (T = 3) or 240 subunits (T = 4).233
The protein coat of bacteriophage P22 (refs 236 and 237) has seven identical subunits (429 aa) arranged with T = 7 icosahedral symmetry (Figure 9–30C) in each of its 60 protomers. The subunits adapt quasi-equivalently to form the pentamers sitting on the global 5-fold rotational axes of symmetry and the hexamers sitting on the local 6-fold rotational axes of pseudosymmetry within each protomer. The mirror image of the arrangement of the rotational axes of symmetry in each of the various skewed quadrilaterals in Figure 9–30C–E does not superpose on itself, so there are right-handed and left-handed versions of each of these quadrilaterals. The coat protein of bacteriophage P22 is a left-handed T = 7 array.
The family of papilloma viruses, simian virus 40, and polyoma viruses have protein coats that are a peculiar variation on T = 7 icosahedral symmetry.238–240 Each of these viral protein coats is assembled from 72 identical pentamers with cyclic symmetry the subunits of which are held together by extensive interfaces. Rather than using the same subunit to form pentamers and hexamers, both the global 5-fold rotational axes of icosahedral symmetry and the local 6-fold rotational axes within the protomer are occupied by copies of the pentamer. Consequently, dramatically different interfaces, on the same outer surfaces of identical folded polypeptides in each pentamer, must be made around the global 2-fold rotational axis of icosahedral symmetry and the local 2- and 3-fold rotational axes of hexagonal symmetry in the spaces between the homopentameric subunits. This is accomplished by forming the interfaces with flexible, structurally swapped strands of polypeptide rather than the usual rigid interfaces of interdigitated secondary structure.
Bluetongue virus216 and reovirus241 have protein coats with T = 13 icosahedral symmetry (Figure 9–30D). In the former, the most extensive interfaces are those holding trimers of subunits together around the global and local 3-fold rotational axes, a fact suggesting that the fundamental unit of the structure is a trimer. The trimers, however, must adapt quasi-equivalently to being arranged about both the global 5-fold rotational axes of symmetry and the local 6-fold rotational axes of symmetry.
Herpes simplex virus242–244 has a protein coat with T = 16 icosahedral symmetry (Figure 9–30A). Pentamers called pentons occupy the global 5-fold rotational axes of symmetry, and hexamers called hexons occupy the local 6-fold rotational axes of symmetry; but both pentons and hexons are formed from five copies and six copies, respectively, of the same folded polypeptide (1374 aa). There are 16 copies of this folded polypeptide in each protomer so each protein coat contains 1,319,040 aa.
Adenovirus245 has a protein coat with T = 25 icosahedral symmetry (Figure 9–30A). In this arrangement each protomer contains four hexons that occupy the local 6-fold rotational axes of pseudosymmetry, but a hexon is not a hexamer, it is a homotrimer. Each of the three identical subunits of one of these homotrimers contains two internally duplicated domains, and six domains, two from each subunit, are arranged around each local 6-fold rotational axis to produce the pseudosymmetrically displayed faces for the interfaces in which each hexon must participate with its six neighbors.245 Unlike those in herpes simplex virus, the pentons in adenovirus, centered on the global 5-fold rotational axes of symmetry, are formed by different folded polypeptides (571 aa) from those (967 aa) forming the hexons.
In all of these viral protein coats based on hexagonal expansion of the basic icosahedral protomer, the problem of closing the protomer arises. A hexagonal array is a potentially infinite array, yet the protomers within the boundaries of the various equilateral triangles and quadrilaterals of Figure 9–30 are finite portions of that infinite array. During the assembly of the protein coat, a mechanism for measuring out the size of that portion is required, and this role seems to be filled by proteins accessory to the subunits of the coat.246
The viral protein coats in which the protomer contains only a few subunits have the appearance of spheres, even though their surfaces are often quite irregular (Figure 9–28). As the number of subunits in the hexagonal array of hexamers becomes greater, there is a tendency for the structure to look polyhedral243,247,248 because of the tendency of the protomers to adopt the plane of the hexagonal array. In some viral protein coats, the same quasi-equivalent interface will hold its two subunits in a plane at one location, fitting into a polyhedral face, and at an angle to each other in another, forming a crease along a polyhedral edge.223 Usually, however, the hexagonally packed protomers are bowed out so that their subunits end up evenly distributed over the surface of an almost spherical oligomeric protein, just as the modular units of a geodesic dome end up producing an almost spherical exploded icosahedron and for the same reason. The joints adjust to distribute uniformly the strain produced by creasing218 the hexagonal array consequent to requiring certain of its 6-fold rotational axes of symmetry to become 5-fold rotational axes of symmetry.219
The protein clathrin forms isometric cagelike structures that assemble around small pinocytotic vesicles as they bud inward from the plasma membrane of an animal cell. The polypeptide is 1600 aa in length, and when folded it produces a tubular protein, 45 nm in length and 2.5 nm in diameter.249 The protomer from which the cages of clathrin are formed is a trimer of these polypeptides, all three joined together at one end around a 3-fold rotational axis of symmetry to produce a triskelion with bent arms.249 Different numbers of these triskelia can assemble to produce intact cages of various shapes between 70 and 200 nm in diameter.250 The wires of the mesh forming these cages are presumed to be formed from two or more intertwined arms of the triskelia.251 Each and every vertex in each and every cage is a junction of three wires, and each vertex must contain the local 3-fold rotational axis of symmetry at the nexus of an individual triskelion. The mesh itself is always formed of pentagons and hexagons of wire producing polyhedra with as many as 32 faces, but most of them are not based on isometric symmetries.250 It is the elongated and flexible nature of the subunit that permits the one protein to generate such a wide variety of oligomeric proteins.
Suggested Reading
Caspar, D.L.D., & Klug, A. (1962) Cold Spring Harbor Symp. Quant. Biol. 27, 1–24.
Rossmann, M.G., Abad-Zapatero, C., Hermodson, M.A., & Erickson, J.W. (1983) Subunit interactions in southern bean mosaic virus, J. Mol. Biol. 166, 37–83.
Izard, T., Aevarsson, A., Allen, M.D., Westphal, A.H., Perham, R.N., de Kok, A., & Hol, W.G. (1999) Principles of quasi-equivalence and Euclidean geometry govern the assembly of cubic and dodecahedral cores of pyruvate dehydrogenase complexes, Proc. Natl. Acad. Sci. U.S.A. 96, 1240–1245.
Problem 9–9: In Figure 9–22 there are four 3-fold rotational axes of symmetry, each of which superposes four different triplets of subunits. For example, one of these axes superposes A, B, and C; J, E, and H; L, D, and G; and K, F, and I.
(A) List the four triplets for each of the other three 3-fold rotational axes of symmetry.
In Figure 9–22 there are three 2-fold rotational axes of symmetry, each of which superposes six different twins of subunits; for example, one of them superposes A and K, B and L, C and J, D and H, E and I, and F and G.
(B) List the six twins for each of the other two 2-fold rotational axes of symmetry.
Problem 9–10: Make a xerographic copy of Figure 9–26. On that xerographic copy designate every global 2-, 3-, 5-, and 6-fold rotational axis of symmetry. Use the standard symbols for this designation. In the same figure designate some of the local 2-, 3-, and 6-fold rotational axes of pseudosymmetry by the symbols P2, P3, and P6, respectively.
Problem 9–11: Kepler derived the rhombic dodecahedron from the intersection of a cube and an octahedron. Draw the intersection of a cube and an octahedron, and connect its vertices to produce a rhombic dodecahedron.
Helical Polymeric Proteins
Helical fibers formed from identical subunits of protein have useful properties, and there are many examples of them. For example, to accomplish its function of hybridizing homologous single strands of DNA, the RecA protein binds to DNA in a long helical polymeric sheath that matches the helical symmetry of the DNA. The helicity of the sheath, however, is built into the protein because it spontaneously forms a sheath with almost the same helical symmetry even in the absence of the DNA.252
Every time an interface is created by evolution between two complementary faces on the surface of copies of the same globular protein, no matter how those two faces are disposed on its surface, a distinct screw axis of symmetry is defined. Most of these screw axes would be open and generate helical polymers of the monomer, so the existing helical polymers must be the few that have escaped elimination by natural selection. The surprising fact is that there are so few helical polymeric proteins.
A single geometric helix can be defined by three parameters: its radius, its hand, and its pitch. The pitch of a helix is the distance it rises for each complete turn. It can also be defined by four parameters: its radius, its hand, a recurring radial angle dividing the helix into equal segments of arc, and the rise for each of these equal segments of arc. In a helical polymeric protein the second definition makes more sense because the successive subunits can be considered to be the repeating segments of arc.
An interface between two identical molecules of a protein can generate several types of helical polymers. The actin helix (Figure 9–1B,C) is an example of the simplest type in which the protomers ascend one step at a time around the screw axis. In the actin polymer, the screw axis of symmetry passes through a corner of each protomer, the helix is not much wider than the protomer (Figure 9–1C), and the radial angle between successive protomers is fairly large (–166°). It has already been noted that the actin helix can be represented as a single helix or as a double helix.
If the generating interface creates a screw axis of symmetry where the radial angle between successive subunits is fairly small, so that there are a number of subunits in one turn of the helix, and where the rise for each subunit is just enough to cause the subunits in each turn of the helix to lie upon the subunits from the turn below it, then a singly threaded cylinder is formed. Because of steric problems, such a generating interface usually creates a screw axis of symmetry that is separated from the subunits themselves, rather than passing through each of them, and the threaded cylinder is hollow inside. An example of such a hollow, singly threaded helical cylinder is tobacco mosaic virus (Figure 9–31).253–256 In tobacco mosaic virus, the angle between successive subunits is 22.03°, and the rise for each subunit is 0.14 nm.255,256 These dimensions bring the subunits in the next turn of the helix into contact with the subunits in the preceding turn, and the pitch of the helix is 2.3 nm, the height of a subunit. Between two successive turns there are interfaces among the protomers. Because of the screw symmetry, each protomer provides at its lower surface the upper faces for the interfaces with the two protomers below it and at its upper surface the lower faces for the interfaces with the two protomers above it. Because every protomer is the same, each of these respective interfaces is the same and repeats along the thread every 22.03°.
The generating helix emphasized in the drawing of tobacco mosaic virus (Figure 9–31B) is a right-handed helix of pitch 2.3 nm. If the structure is examined closely, however, it can be seen that there are sets of helices of steeper pitch running through it. The subunits in tobacco mosaic virus are arranged upon a helical surface lattice257 that contains all of these other sets of helices. The helical surface lattice can be displayed in two dimensions by cutting the cylinder along a line parallel to the central axis and flattening it upon the page (Figure 9–31C). When the helical surface lattice of tobacco mosaic virus is viewed in this format, it can be seen that, in addition to the single right-handed helix of pitch 2.3 nm running through the lattice, there are also sets of 17 parallel right-handed helices and sets of 16 parallel left-handed helices (lower and upper sets of arrows, respectively, in Figure 9–31C).258 Any one of these sets of helices can also define the structure. A helical surface lattice can be uniquely designated by the number of parallel strands of left-handed twist (a negative number) and the number of parallel strands of right-handed twist (a positive number) for any one of the sets of each respective hand.257 For example, the helical surface lattice of tobacco mosaic virus can be designated (–16, 1) or (–16, 17). The surface lattice of the helical polymer of flagellin from Salmonella typhimurium also contains a singly threaded right-handed helix as well as a set of five parallel left-handed helices, a set of six parallel right-handed helices, and a set of 11 parallel left-handed helices.259
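The two definitions of a helix given in this section are interchangeable: the number of subunits in one turn is 360° divided by the radial angle, and the pitch is that number multiplied by the rise for each subunit. A minimal sketch in Python, using the dimensions quoted for tobacco mosaic virus, checks that a radial angle of 22.03° and a rise of 0.14 nm do give a pitch of about 2.3 nm. The function name is illustrative, and treating a positive angle as right-handed and a negative angle as left-handed is an assumption made for the sketch.

```python
def helix_from_subunit_step(angle_deg, rise_nm):
    """Convert (radial angle, rise per subunit) into (subunits per turn, pitch, hand)."""
    if angle_deg == 0:
        raise ValueError("a radial angle of zero does not define a helix")
    subunits_per_turn = 360.0 / abs(angle_deg)
    pitch_nm = subunits_per_turn * rise_nm       # rise accumulated over one full turn
    hand = "right" if angle_deg > 0 else "left"  # sign convention assumed for this sketch
    return subunits_per_turn, pitch_nm, hand


if __name__ == "__main__":
    per_turn, pitch, hand = helix_from_subunit_step(22.03, 0.14)
    print(f"{per_turn:.2f} subunits per turn, pitch {pitch:.2f} nm, {hand}-handed")
```

With these inputs there are about 16.3 subunits in each turn and the pitch is about 2.29 nm, which rounds to the 2.3 nm given for the generating helix.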
Figure 9–31: Helical surface lattice of tobacco mosaic virus. (A) Molecular model derived from a map of electron density calculated from the
helical diffraction pattern of X-radiation (Bragg spacing ≥ 0.29 nm) emerging from an oriented preparation of the viruses.256 The diffraction
from a specimen prepared in 1960 was gathered to Bragg spacing of 0.29 nm in 1982. The timing of this sequence of events illustrates how
rare it is to obtain a well-aligned preparation of a helical polymeric protein. Reprinted with permission from ref 256. Copyright 1989 Elsevier
B.V. (B) Diagrammatic representation of the helical surface lattice of the protein coat of tobacco mosaic virus.254 Individual protomers form
a singly threaded helical screw that is a hollow cylinder. Each protomer has the same orientation and the successive protomers are related
by a rise of 0.14 nm and a rotation of 22° along the helix. Reprinted with permission from ref 254. Copyright 1972 Federation of American
Societies for Experimental Biology. (C) The surface helix of tobacco mosaic virus flattened onto the page.255 The hollow cylinder in panel B
was split along a vertical plane passing through its wall and then flattened onto the page. The horizontal line represents a circle around the
cylinder, the plane of which is normal to the central axis of the cylinder, that has been also split and flattened. It has been added to assist in
counting the numbers of parallel helices in the various sets. The arrows at the upper end of the lattice indicate the set of 16 left-handed
helices, and those at the lower end, the set of 17 right-handed helices that run through the lattice. These are referred to as 16 start and 17 start
arrays, respectively. Reprinted with permission from ref 255. Copyright 1974 Academic Press.
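The 16-start and 17-start arrays named in this caption follow from the same rise and rotation. Stepping from one subunit to the subunit n positions farther along the generating helix rotates by n times the radial angle and rises by n times 0.14 nm; when that rotation is reduced to the range between –180° and +180°, its sign gives the hand of the family of n parallel helices. The sketch in Python below is a geometric illustration only: the function name is hypothetical, the more precise angle of 22.03° quoted in the text is used, and a positive residual rotation is assumed to mean a right-handed helix.

```python
def start_helix(n, angle_deg=22.03, rise_nm=0.14):
    """Residual rotation (degrees), rise (nm), and hand for the n-start family of helices."""
    rotation = (n * angle_deg) % 360.0
    if rotation > 180.0:
        rotation -= 360.0  # express the step as the shorter way around the cylinder
    if rotation == 0.0 or rotation == 180.0:
        hand = "undefined"  # a straight row or a half-turn step has no hand
    else:
        hand = "right" if rotation > 0 else "left"  # sign convention assumed here
    return rotation, n * rise_nm, hand


if __name__ == "__main__":
    for n in (1, 16, 17):
        rotation, rise, hand = start_helix(n)
        print(f"{n:2d}-start: {rotation:+6.2f} deg, rise {rise:.2f} nm, {hand}-handed")
```

With these inputs the 1-start and 17-start families come out right-handed and the 16-start family left-handed, in agreement with the designations (–16, 1) and (–16, 17) given for tobacco mosaic virus.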
A helical surface lattice does not have to have a single shallow helix of one or the other hand acting as the generating operation. The extended tail of the T4 bacteriophage is built from hexamers with cyclic symmetry. The hexamers are stacked upon each other by successive interfaces that cause them to be out of alignment in a right-handed sense by 17°.260 This creates a helical surface lattice that is a hextuply threaded cylinder. The (–6, 6) lattice contains both a set of six parallel helices of shallow pitch with left-handed sense and a set of six parallel helices, of steeper pitch, with a right-handed sense.
The CA protein from type 1 immunodeficiency virus spontaneously forms several different sizes of helical tubes. It forms two different tubes with distinct (–12, 11) and (–11, 10) surface lattices, respectively, that each have a single generating helix of one strand with a left-handed pitch. The same protein, however, also forms tubes with (–10, 13) and (–8, 5) surface lattices.261
The radial angle relating successive subunits to each other in a helix or each of the identical helices in a set of helices in a helical polymer is determined by the interfaces among them. It is these interfaces that produce a helical filament such as actin or a singly or multiply threaded cylinder such as tobacco mosaic virus, flagellin, or the extended tail of the T4 bacteriophage. This radial angle between successive subunits that these interfaces dictate can have any numerical value. Technically, this means that the helix or helices never position a monomer exactly above any monomer below it in the helix. For example, tobacco mosaic virus has 49.02 subunits in three turns (22.03° subunit⁻¹), not 49.256 Therefore, there is no translationally repeating unit in a rigid, biological helical structure. What this means is that a helical polymeric protein has difficulty crystallizing in three dimensions, and crystals of helical polymers suitable for high-resolution crystallographic studies have not been produced. One interesting example, which is almost an exception to this uniform failure, is the protofilament running through a crystal of the protein encoded by the mreB gene in E. coli. This protein is a homologue of actin and forms filaments similar to those formed by actin (Figure 9–1B) but without the helical twist, so a filament can be incorporated into a crystal.6
One approach to determining the structure of a helical polymeric protein is to crystallize the monomer, construct its crystallographic molecular model at atomic resolution, and simultaneously determine the structure of the helical polymer at high enough resolution to position crystallographic models of these monomers in the proper orientation at the locations they occupy in the polymer. Such a strategy has been applied successfully to the polymer of actin (Figure 9–1B).3* The details of the structure of an intact polymer at the resolution required by this strategy can be determined by image reconstruction263 of electron micrographs.
Under appropriate circumstances, an electron microscope is capable of magnifying the image of a protein sufficiently that individual subunits can be distinguished and their shapes almost distinguished. A beam of collimated electrons is impinged upon the sample, and those that pass through it without being deflected sufficiently to leave the beam are then focused by electromagnetic lenses. The contrast in the image is caused by the distribution within the sample of the ability to deflect or scatter electrons from their path. The sample placed in an electron microscope usually has to reside in a chamber under high vacuum. This means that the sample has to be a solid with a low vapor pressure. In the case of molecules of protein, this requires that they be encased in a solid matrix, which also must be a glass to avoid the problems of the diffraction caused by crystalline solids.
To enhance contrast and provide a solid support simultaneously, the glass is often formed by drying a solution of a salt containing a heavy metal such as uranium or tungsten. Either uranyl acetate or sodium phosphotungstate, for example, will form an electron-dense glass when a film of a solution containing it is dried. This electron-dense glass will surround, encase, and support a molecule of protein that had been present in the solution from which the film was made. The glass of the salt of heavy metal encasing the molecule of protein forms a three-dimensional boundary or mold that has the shape of the molecule of protein. It is the dark image of this mold of electron-dense glass encasing the light image of the electron-translucent molecule of protein that is observed in the micrograph. This procedure is known as negative staining.
It is also possible to insert thin layers of amorphous ice, which is a glass, into a cryoelectron microscope, an electron microscope not with cold electrons but with a stage for the sample that is cooled to a very low temperature.264,265 When a molecule of protein is embedded in such a glass (Figure 9–32A),266 the contrast observed results from the fact that the atoms in a molecule of protein are more efficient at deflecting electrons than the molecules of water in amorphous ice. A positive image of the molecule, rather than a negative image, is observed.
The three-dimensional distribution of the ability to deflect electrons within the sample is known as the distribution of scattering density, q(x,y,z). If a map of scattering density can be produced at as high a resolution as possible, details of the structure of either the mold in which the molecule of protein is encased or the molecule of protein itself can be observed. Image reconstruction is any computational method that is used to calculate q(x,y,z) from the images of molecules of proteins observed in electron micrographs.263 In all cases, the electron micrograph is the experimental data submitted to these calculations, and the electron micrograph used in a particular reconstruction must be presented to the reader so that she may appreciate the point of departure (Figure 9–32A).
A molecule of protein has a certain distribution of electron density ρ(x,y,z), and when molecules of protein are arrayed in a crystal they create a periodic, three-dimensional distribution of electron density. This periodic array diffracts X-rays to produce a diffraction pattern that is also periodic. The angular dispositions of the reflections in the diffraction pattern are determined by both the angles among the axes of the fundamental unit cell and its dimensions. The dimensions and axial angles of the unit cell can be calculated from these angular dispositions. The diffraction pattern of the crystal is the Fourier transform of the periodic distribution of electron density it contains. The magnitudes and phases of the maxima in the diffraction pattern can be calculated from the distribution of electron density within the unit cell by digital Fourier transformation. Conversely, the distribution of electron density in the unit cell can be calculated from the amplitudes and phases of the diffraction maxima by digital Fourier transformation.
A helical polymer of protein in its mold of negative stain or amorphous ice has a certain distribution of scattering density, q(x,y,z), which is a periodic function because the helix is periodic. Each protomer of the polymer is a unit cell in this helical array. The computed Fourier transform of this periodic array is also a periodic function. From the spatial disposition of its maxima, the angle between successive unit cells in the helix, the rise for each unit cell, and the number of helical threads in the structure can be calculated. From the amplitudes and phases of the maxima of the Fourier transform, the distribution of scattering density within the unit cell can be calculated.
The depth of focus in an electron microscope is larger than the width of a specimen containing a helical polymer, and all points in the specimen are in focus in the final micrograph. As such, the micrograph represents the projection of the three-dimensional distribution of scattering density onto a two-dimensional surface.263 The Fourier transform, F(X,Y,Z), of any three-dimensional distribution of scattering density is267
[Figure 9–32, panels A–D. Panel C plots phase (–180 to 180 degrees) and amplitude against distance for layer lines (–3, 5), (4, 2), (2, 1), and (0, 0).]
Figure 9–32: Image reconstruction of the helical array of subunits in a filament of actin.266 Filaments of human cytoskeletal β actin were suspended in a buffered aqueous solution. A small sample of the suspension (4 μL) was spread over a grid for electron microscopy and rapidly frozen to obtain a thin sheet of amorphous ice in which the filaments were embedded. (A) An electron micrograph (36000×) of an actin filament embedded in the ice. The arrows mark points of helical crossovers. The electron micrograph was scanned and a Fourier transform of the digitized image was performed. (B) The Fourier transform (Equation 9–5) of the image in panel A is presented graphically on a two-dimensional field such that the brightness of the image is directly proportional to the amplitude of the function at that point. Because the array is helical, the maxima in the Fourier transform are found along layer lines. (C) Distribution of the amplitude and phase of the Fourier transform along several of the layer lines in panel B. Phases (αn,l; degrees) are presented in the upper plot of each pair and amplitude (Fn,l; relative units) in the lower plot. The indexing of the layer lines (n, l) designates the Bessel order (n) and the number of the layer line (l). (D) Reconstructed image in stereo resulting from a Fourier–Bessel transform of the amplitudes and phases along the properly indexed layer lines. Reprinted with permission from ref 206. Copyright 2000 Elsevier B.V.
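The two Fourier relationships used in image reconstruction can be tried out on a toy case. The sketch below is only an illustration, not the procedure used for Figure 9–32: it assumes a hypothetical one-dimensional "scattering density" made by repeating an arbitrary motif, and it uses a plain discrete Fourier transform written with the Python standard library. It shows that the transform of a periodic array has regularly spaced maxima, that the spacing of the maxima returns the repeat of the array, and that the amplitudes and phases together return the density.

```python
import cmath
import math


def dft(values):
    """Discrete Fourier transform: complex coefficients (amplitude and phase)."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * j / n) for j, v in enumerate(values))
        for k in range(n)
    ]


def inverse_dft(coefficients):
    """Inverse transform: rebuilds the density from amplitudes and phases."""
    n = len(coefficients)
    return [
        sum(c * cmath.exp(2j * math.pi * k * j / n) for k, c in enumerate(coefficients)) / n
        for j in range(n)
    ]


# Hypothetical one-dimensional "scattering density": an asymmetric motif
# (standing in for one protomer) repeated along the axis of a polymer.
motif = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0, 0.0, 0.0]   # repeat of 8 samples
density = motif * 6                                  # 48 samples in total

transform = dft(density)
amplitudes = [abs(c) for c in transform]
phases_deg = [math.degrees(cmath.phase(c)) for c in transform]

# The transform of a periodic array is nonzero only at regularly spaced maxima.
maxima = [k for k, a in enumerate(amplitudes) if a > 1e-9]

# The spacing of the maxima gives back the repeat of the array.
spacing = min(k for k in maxima if k > 0)
repeat = len(density) / spacing

# Amplitudes and phases together give back the density itself.
recovered = [c.real for c in inverse_dft(transform)]
```

In a real reconstruction the input is a two-dimensional micrograph and the maxima lie on layer lines, but the logic is the same: positions of maxima report the lattice, and amplitudes with phases report the contents of the unit cell.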
twisted around each other to make a cable. The strands from which ropes of protein are made are polypeptides read from messenger RNA, and for this reason, each strand must have a discrete length. This in turn means that the cable is built from strands of uniform length that are overlapped to provide the necessary tensile strength.
The arrangement of the strands in the molecular rope and the ropes in the cable is elucidated by permitting a macroscopic fiber to diffract X-radiation. A macroscopic fiber, built from billions of the molecular cables, is placed in a beam of X-rays. The cables are all more or less aligned with the axis of the macroscopic fiber. The helical arrays of the strands in the ropes and the helical arrays of the segments of rope in the cable have certain regularly recurring dimensions and angles associated with them that give rise to diffraction. The dimensions and helical parameters of these arrays can be established from the angles at which the reflections emerge from the fiber. For example, a serial set of reflections is produced by a tendon from the tail of a rat when the tendon is placed in a beam of X-rays. This serial set of reflections arises from a helical array that repeats every 67 nm,280,281 and this dimension, along with others, must be incorporated into the model for the complete cable of the collagen from which the tendon is formed.
Collagen is the helical polymeric protein from which is formed the tough, flexible material composing tendons, intercellular matrix, the matrix of bone, and many strong, plastic sheets of various shapes and sizes found in animals. The basic structural element in these macroscopic structures is the fibril of collagen, which is a cylindrical thread of indefinite length, 200–800 nm wide. This thread in turn is formed from molecular cables of collagen, each probably as long as the fibril, packed side by side in register.
The strand from which a molecular cable of colla-
The collagen in a tendon is arranged in a crystalline array the unit cell of which is triclinic (Figure 4–2)294 with a length of 67 nm, equal to only one-fifth of the 335 nm repeat of the cable. Consequently, the cable must be twisted so that its structure repeats translationally every 67 nm. This is accomplished by twisting it by –72° over each 67 nm segment of its length to produce repeating conformations that are translationally superposable on each other (Figure 9–34B).293 Because the regions with only four ropes are the most flexible, the twist is confined to them. The cable is kinked so that as it ascends at each step it shifts from one column of unit cells to the column diagonally adjacent to it. The pentagonal array of the five ropes is compressed in one dimension into a layer of two and a layer of three ropes to permit the pentamer of ropes to pack in a hexagonal array. The helix of each strand is left-handed, the superhelix of the triple helix of the three strands forming the rope is right-handed, and the twist of the cable of five ropes is left-handed. Alternating the hand of the elements in a cable increases its strength.293
As the cables enter these rather contorted arrays, they are aligned by the crystallization in register, and the crystallographic asymmetric units, each containing the equivalent of 67 nm of cable, create a repeating pattern on the surface of the fibril itself. This pattern can be seen in the electron microscope. It appears as alternating thickenings and thinnings along a desiccated fibril that repeat every 67 nm. These thickenings and thinnings are believed to represent the alternation in register of regions within the molecular cables five ropes in thickness and those of four ropes in thickness, respectively (Figure 9–34). Fibrils of collagen also stain positively by chelating heavy metals at specific locations on the surfaces of the ropes where there are high, unbalanced constellations of negative charge. Because the segments of rope are placed in register by the side to side crystallization of the cables, these positions to which heavy metals bind form bands across the fibrils or across sheets of collagen. The pattern of bands is quite reproducible,295 and it repeats every 67 nm.296 From an examination of the patterns in which charged amino acids occur in the amino acid sequences of the polypeptides, it can be shown that the pattern in which these bands occur on the ropes is entirely consistent296 with the triple-helical array of strand and pentahelical array of ropes in the hypothetical model (Figure 9–34B), the Fourier transform of which mimics the reflections in the X-ray diffraction pattern from an oriented tendon.293 All of these correlations provide independent support for this model of the structure of the cable.
Three classes of filaments can be observed within animal cells. Thin filaments, or microfilaments, are filaments of actin the basic structural element of which is the actin helix (Figure 9–1C). Tropomyosin, a fibrous protein that is one continuous coiled coil of two parallel α helices,297 lies in the grooves of the actin helix. Each of these coiled coils of tropomyosin spans seven actins. The globular protein troponin decorates the microfilament at regular intervals. The structure of the thin filament has been elucidated by image reconstruction.298–300 In transmission electron micrographs, thin filaments appear to be about 8 nm wide. Microtubules are hollow cylinders,301 constructed from the globular protein tubulin. They are about 20 nm in width. Intermediate filaments are the third class of filament. In transmission electron micrographs, they appear to be about 10 nm wide, intermediate between thin filaments and microtubules.
Intermediate filaments were originally considered to be a heterogeneous class of polymeric proteins grouped together only because they were similar in width. Within this class are tonofilaments, neurofilaments, cellular keratin filaments, desmin filaments, glial filaments, and vimentin filaments. Each of these subclasses occurs in a different set of tissues, and they form intermediate filaments that often seem quite different in their appearance and their distribution through a cell (Figure 9–35).302 It is now known, however, that all of these filaments are constructed from polypeptides that are homologous in sequence (Figure 9–36)303 and thus necessarily share a common, superposable structure. That an intermediate filament can be constructed from one of these polypeptides all by itself has been demonstrated by reassembling filaments from a pure homogeneous preparation of a given polypeptide.304 These polypeptides form helical cables of indefinite length.
One of these intermediate filaments, keratin, composes the fibers in the composite material that forms skin, hair, and horn. For example, the quill of a porcupine represents a large array of more or less aligned keratin cables. Diffraction of X-radiation from such specimens305 provides dimensions of the helical arrays in these cables.306 Meridional reflections representing a repeat of 0.51 nm are strong features of such diffraction patterns, and this demonstrates that these cables are built from coiled coils of α helices.
The strand from which an intermediate filament is constructed is a polypeptide folded into an α helix. Two α helices twist around each other in a left-handed coiled coil (Figure 6–29) to produce the rope.307,308 The heptad repeat permitting the formation of this rope can be noticed in those regions of the sequences that are involved in its formation when sequences from several of these polypeptides are aligned (Figure 9–36).303 The amino acids in the heptad positions at which the side chains are directed into the core of the coiled coil are not always nonpolar, but at four of the positions where they are not, remarkable conservation is displayed.
As in collagen, the sequences producing the rope are found in the central portions of the polypeptides. In each of the amino acid sequences from this central region, there are three consecutive segments of heptad repeat, about 30, 100, and 130 aa in length,309 separated by short segments (about 20 aa in length) lacking the
human keratin II 5 219 FEQY INN LRRQ LDS IVGE RGR LDSE LRN MQDL VED FKNK YED EINK RTT AE
ovine keratin II 7c 158 FEGY IET LRRE AEC VEAD SGR LSSE LNH VQEV LEG YKKK YEQ EVAL RAT AE
ovine keratin I 8c1 106 YFRT IEE LQQK ILC AKSE NAR LVVQ IDN AKLA ADD FRTK YET ELGL RQL VE
human keratin I 14 164 YFKT IED LRNK ILT ATVD NAN VLLQ IDN ARLA ADD FRTK YET ELNL RMS VE
desmin from G. gallus 146 YEEE LRE LRRQ VDA LTGQ RAR VEVE RDN LLDN LQK LKQK LQE EIQL KQE AE
vimentin from C. griseus 148 YEEE MRE LRRQ VDQ LTND KAR VEVE RDN LAED IIR LREK LQE EMLQ REE AE
murine glial fibril 113 YQAE LRE LRLR LDQ LTAN SAR LEVE RDN FAQD LGT LRQK LQD ETNL RLE AE
porcine neurofilament-M 150 YDQE IRE LRAT LEL VNHE KAQ VQLD SDH LEED IHR LKER FEE EARL RDD TE
human neurofilament-H 146 YERE VRE MRGA VLR LGAA RGQ LRLE QEH LLED IAH VRQR LDD EARQ REE AE
human keratin II 5 270 NE FVM LKKD VDA AYMN KVE LEAK VDA LMDE INF MKMF FDAE LSQ MQTH VSD
ovine keratin II 7c 209 NE FVA LKKD VDC AYVR KSD LEAN SEA LIQE IDF LRRL YQEE IRV LQAN ISD
ovine keratin I 8c1 157 SD ING LRRI LDE LTLC KSD LEAQ VES LKEE LIC LKSN HEEE VNT LRSQ LGD
human keratin I 14 215 AD ING LRRV LDE LTLA RAD LEMQ IES LKEE LAY LKKN HEEE MNA LRGQ VGG
desmin from G. gallus 197 NN LAA FRAD VDA ATLA RID LERR IES LQEE IAF LKKV HEEE IRE LQAQ LQE
vimentin from C. griseus 199 ST LQS FRQD VDN ASLA RLD LERK VES LQEE IAF LKKL HDEE IQE LQAQ IQE
murine glial fibril 164 NN LAA YRQE AHE ATLA RVD LERK VES LEEE IQF LRKI YEEE VRD LREQ LAQ
porcine neurofilament-M 201 AA IRA LRKD IEE ASLV KVE LDKK VQS LQDE VAF LRSN HEEE VAD LLAQ IQA
human neurofilament-H 197 AA ARA LARF AQE AEAA RVD LQKK AQA LQEE CGY LRRH HQEE VGE LLGQ IQG
Figure 9–36: Alignment of portions of the amino acid sequences of the polypeptides composing various intermediate filaments.303 The
aligned segments come from different locations in the amino acid sequences of the various polypeptides. The numbers indicate the sequence
positions in the various polypeptides at which the alignment on that line commences. The proteins are isoform 5 of human type II cytoskele-
tal keratin, isoform 7c of ovine type II microfibrillar keratin, isoform 8c1 of type I keratin from intermediate filaments of ovine wool, isoform
14 of human type I cytoskeletal keratin, desmin from Gallus gallus, vimentin from Cricetulus griseus, glial fibrillary acidic protein from murine
astrocytes, triplet M protein from porcine neurofilaments, and triplet H protein from human neurofilaments. The pattern of the heptad
repeat is highlighted in boldface type. The highlighted amino acids are those the side chains of which are directed into the cores of the coiled
coils.
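The lengths quoted in this section for the coiled-coil rope of an intermediate filament can be checked with simple arithmetic. The sketch below assumes an axial rise of 0.15 nm for each amino acid in an α helix, a standard value that is not stated in this section, and it treats the three segments of heptad repeat (about 30, 100, and 130 aa) and the two interruptions (about 20 aa each) as round numbers. The helper that lists heptad positions a and d is illustrative only; the register of a real sequence has to be read from an alignment such as the one in Figure 9–36.

```python
# Assumed value, not taken from this section: axial rise of an alpha helix.
RISE_PER_RESIDUE_NM = 0.15

# Approximate lengths quoted in the text.
SEGMENTS_AA = (30, 100, 130)   # three consecutive segments of heptad repeat
LINKER_AA = 20                 # each short segment lacking the repeat


def rope_length_nm(residues, rise_nm=RISE_PER_RESIDUE_NM):
    """Axial length of a helical strand with the given number of residues."""
    return residues * rise_nm


def core_positions(segment_length, register_start=0):
    """Indices in a segment that fall on heptad positions a (0) and d (3).

    register_start is the heptad position (0-6) of the first residue; the
    default assumes, purely for illustration, that the segment starts at a.
    """
    return [i for i in range(segment_length) if (i + register_start) % 7 in (0, 3)]


helical_aa = sum(SEGMENTS_AA)                                  # residues in heptad repeat
strand_aa = helical_aa + LINKER_AA * (len(SEGMENTS_AA) - 1)    # whole central region

shortest_nm = rope_length_nm(helical_aa)   # only the coiled-coil segments
longest_nm = rope_length_nm(strand_aa)     # as if the interruptions were helical too
heptads = helical_aa / 7                   # number of heptads in the rope
```

Under these assumptions the central region comes to about 300 aa and the rope to between 39 and 45 nm, which falls inside the range of 35–45 nm given in the text.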
repeat. The approximately 300 aa strand⁻¹ should produce an interrupted rope 35–45 nm long. On either side of this rope are amino-terminal domains (100–200 aa) and carboxy-terminal domains (100–1600 aa) that must either be incorporated into the body of the cable or protrude from its sides. These protrusions presumably cause each type of intermediate filament to have a different width and a different tissue-specific function.
An intermediate filament is a cable formed from these ropes. The filament is a helical polymeric protein
that has a (–1, 3) surface lattice. In the cables of keratin in a fiber of wool, the rise for each step of the left-handed single-stranded helix generating the structure is 6.7 nm,310 and each step is –111° from the preceding one.311 The measured mass of protein in each nanometer of an intermediate filament indicates that its cross section contains about 30 strands of α-helical polypeptide312 and that each rope is a coiled coil of two parallel α helices. Adjacent ropes are antiparallel to each other and staggered, and in the resulting staggered pattern there are two different alignments, one in which the ropes are staggered by half their length and the other in which they are side by side, unstaggered.307,308 The precise arrangement of the ropes within the helical lattice of the cable, however, is as yet unknown.
In all of the cables that have been discussed, the faces and interfaces can be divided into three groups. There is a continuous interface between or among the strands, containing within itself the central axis of the rope and twisting around that axis with the twist of the rope. Between the ropes in the cable there are interfaces, but they are formed from faces on each rope composed of small regions of surface on each strand alternating between or among the strands as the rope is ascended (Figure 9–33). The cables themselves may have faces on them to promote side to side associations, and these faces are composed of small regions of surface on strands from the same or different ropes that are encountered in turn as the cable is ascended (Figure 9–34).
There is a class of polymers that form extracellularly during the progress of systemic amyloidosis, maturity-onset diabetes, Alzheimer’s disease, and spongiform encephalopathy, which are diseases affecting mammals. The diffraction of X-radiation by these fibers shows reflections arising from a repeat of 0.48 nm along the axis of the fiber.313–315 This dimension is the spacing between the strands of a continuous β sheet (Figure 6–9) with individual β strands perpendicular to the axis of the fiber. The β sheets in these polymers can be either parallel β structure315 or antiparallel β structure.313 The length of each strand in one of the continuous β sheets forming these polymers varies from 2.5 nm (7 aa) to 3.5 nm (10 aa) depending on the protein forming the fiber. This length determines the width of the ribbon formed by each of the continuous β sheets. Several of these ribbons of continuous β sheet are packed against each other about 1 nm apart, a spacing determined by the interdigitations of the side chains (Figure 6–32) to create a fibril about 3–4 nm in width. The β sheets can twist as they usually would (Figure 6–9) to give a helical twist to the ribbons that are packed together,315 and hence to the fibril, or they can be untwisted.313
Suggested Reading
Kramer, R.Z., Bella, J., Mayville, P., Brodsky, B., & Berman, H.M. (1999) Sequence dependent conformational variations of collagen triple-helical structure, Nat. Struct. Biol. 6, 454–457.
Amos, L.A., & Klug, A. (1975) Three-Dimensional Image Reconstructions of the Contractile Tail of T4 Bacteriophage, J. Mol. Biol. 99, 51–73.
Heterologous Oligomeric Proteins
The oligomeric proteins described so far, with a few exceptions, have been homooligomers of identical subunits. There are many proteins in which the subunits are not identical to each other but are homologous, such as the α and β subunits of hemoglobin or the VP1, VP2, and VP3 subunits of the protein coat of poliovirus (Figure 9–28). In such cases, the homologous subunits are arrayed around rotational axes of pseudosymmetry that are the descendants of the rotational axes of symmetry of their homooligomeric ancestral proteins. For example, there are seven distinct β subunits, each with a unique sequence and each present in two copies, in the 20S multicatalytic endopeptidase complex from S. cerevisiae. Although they have distinct sequences, their tertiary structures are all homologous and the 14 β subunits in the oligomer are arranged at the center of the protein with dihedral pseudosymmetry of point group 722 (D7).316 In the homologous complex from Thermoplasma acidophilum, representing the ancestor of the complex from S. cerevisiae, the 14 β subunits are all identical to each other, homologous to those from S. cerevisiae, and also arranged with the same dihedral symmetry.116 Often the homologous subunits in a heterooligomer are interchangeable with each other, as are those in fructose-bisphosphate aldolase (Figure 8–18) or those in the dimer of creatine kinase.317 In all of these uncertain heterooligomers, there is no significant difference between the pseudosymmetric arrangement of their subunits and the symmetric arrangement of the identical subunits in their homooligomeric siblings or homooligomeric ancestors.
There are also heterooligomeric proteins formed from two or three distinct, unrelated subunits each present in equal numbers of copies and held together by heterologous interfaces. A heterologous association is the association between two folded polypeptides of unrelated sequence and unrelated tertiary structure. A heterologous interface is the interface between two unrelated subunits in heterologous association. The heterologous association between two unrelated subunits produces the protomer of aspartate carbamoyltransferase.
Aspartate carbamoyltransferase (Figure 9–37A)318 is a hexamer with dihedral symmetry of point group 322 (D3) in which each protomer contains two different subunits, a catalytic α subunit and a regulatory β subunit. There are three types of interfaces holding the protein together, six interfaces among the six α subunits of the two trimers, three interfaces between each of the three pairs of β subunits, and six heterologous interfaces,
Figure 9–37: Heterologous interface in aspartate carbamoyltransferase.318,319 (A) α-Carbon skeletal drawing of the protomers of aspartate carbamoyltransferase arranged around the molecular axes of symmetry. The drawing was made from the crystallographic molecular model of this heterododecameric protein. The six folded α polypeptides, each with 310 aa, are drawn with gray line segments; the six folded β polypeptides, each with 152 aa, are drawn with black line segments. The view is down the 2-fold rotational axis of dihedral symmetry that passes through the center of the β2 dimer in the rear. The two other 2-fold rotational axes of dihedral symmetry pass through the centers of the other two β2 dimers, one to the left and one to the right. The 3-fold rotational axis of symmetry is vertical, in the center of the molecule, and in the plane of the page. The front α subunit of the upper α3 trimer forms its heterologous interface with the upper subunit of the β2 dimer to the right to produce one of the protomers of the overall molecule. The participants in each of the other heterologous interfaces, and hence each of the other protomers, can be related to this one by rotations around the axes of symmetry. (B) Detailed view of the heterologous interface between a β subunit and an α subunit. The participants are drawn in exactly the same orientation as those in the interface between the upper β subunit in the β2 dimer on the left and the α subunit to the back left of the upper α3 trimer in panel A. Two segments from the β subunit, from Proline β109 to Valine β120 and from Leucine β135 to Phenylalanine β144, are drawn
[Figure 9–37, panel B, stereo pair. Labeled positions: α87, α90, α107, α114, α130, α133, α178, α190, α201, α234, α236, β109, β135, β144.]
Figure 9–38: Heterologous associations in the (α2)4β8 heterohexadecamer of ribulose-bisphosphate carboxylase from S. oleracea.41,321 The drawing was made from the crystallographic molecular model of the protein. Four α2 dimers of 2 × 473 aa (drawn with thin line segments in front and gray line segments behind) are held together at top and bottom by monomeric β subunits of 123 aa (drawn with thick line segments) to produce an oligomer with dihedral symmetry of point group 422 (D4). Each β subunit inserts two long loops of random meander, most readily observed to the right and to the left above and below, into the space between two α subunits. This drawing was produced with MolScript.485
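The dihedral point groups named in this section, 322 (D3), 422 (D4), and 722 (D7), each contain 2n rotations: n about the principal axis and n about the 2-fold axes perpendicular to it. The sketch below is a generic construction, not something taken from the crystallographic models. It builds these rotations with the n-fold axis along z and one 2-fold axis along x (an arbitrary choice of frame), applies them to a hypothetical point in a general position, and counts the distinct images. That count is the number of symmetry-related protomers: 6 for aspartate carbamoyltransferase, 8 for ribulose-bisphosphate carboxylase, and 14 for the β subunits of the endopeptidase complex.

```python
import math


def rot_z(angle):
    """Rotation matrix for a rotation of `angle` radians about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


# 2-fold rotation about the x axis (perpendicular to the principal axis).
FLIP_X = ((1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0))


def matmul(a, b):
    """Product of two 3 x 3 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def apply(matrix, point):
    """Apply a 3 x 3 rotation matrix to a point (x, y, z)."""
    return tuple(sum(matrix[i][k] * point[k] for k in range(3)) for i in range(3))


def dihedral_operations(n):
    """The 2n rotations of point group n22 (Dn)."""
    operations = []
    for k in range(n):
        turn = rot_z(2.0 * math.pi * k / n)
        operations.append(turn)                  # rotation about the n-fold axis
        operations.append(matmul(turn, FLIP_X))  # the same, after a 2-fold flip
    return operations


def count_equivalent_positions(n, point=(1.0, 0.3, 0.5)):
    """Distinct images of a point in a general position under Dn."""
    images = {
        tuple(round(coordinate, 6) for coordinate in apply(op, point))
        for op in dihedral_operations(n)
    }
    return len(images)


protomers = {n: count_equivalent_positions(n) for n in (3, 4, 7)}
```

A point lying on one of the axes would give fewer images, which is why a subunit sitting on a rotational axis cannot be a complete asymmetric unit of the oligomer.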
(Figures 6–23 through 6–35) in its interior. Because it incorporates a lysine, a histidine, and four glutamates, this heterologous interface in aspartate carbamoyltransferase also reemphasizes the fact that interfaces between subunits incorporate twice as many charged side chains as do interfaces between secondary structures within a subunit.
The heterologous interface between an α subunit and a β subunit of aspartate carbamoyltransferase is formed from two continuous faces, one on the surface of each subunit, that fit together as a casting in a mold. The heterologous interface between elongation factor Tu and elongation factor Ts from E. coli, however, incorporates 27 amino acids from the latter protein and 22 amino acids from the former, but they are situated in four separate clusters, one of which is formed by the carboxy-terminal α helix of elongation factor Ts that reaches over to sit in a groove on the surface of elongation factor Tu.320
In ribulose-bisphosphate carboxylase from Spinacia oleracea (Figure 9–38),41,321 heterologous interfaces between its constituent monomeric β subunits and α2 dimers hold together the (α2)4β8 complex around the rotational axes of its dihedral symmetry of point group 422 (D4). Unlike the situations in aspartate carbamoyltransferase and the complex of elongation factors Tu and Ts, each β subunit connects two α subunits from two different α2 dimers around the 4-fold rotational axis of symmetry. Consequently, each β subunit has two distinct faces on its surface, one that forms a heterologous interface with one α subunit from one dimer and one that forms a different heterologous interface with another α subunit from another dimer. The majority of each of these two distinct faces on each β subunit is formed by two long loops of random meander sandwiched between the two respective α subunits (Figure 9–38). The loop that is most detached from the β subunit and closest to the center of the heterooligomer has a conserved sequence and is required for proper assembly of the complete protein.322 One side of each of these two loops associates with one of the α subunits, and the other side of each of these two loops, which necessarily is completely different, associates with the other of the α subunits. The four respective copies of the two different interfaces alternate with each other around the 4-fold rotational axis of symmetry. The protein from Rhodospirillum rubrum, which lacks the gene for the β subunit, is an α2 dimer.323
Steric exclusion and mismatched symmetry are complications that can arise whenever an oligomer contains both heterologous associations between different, unrelated subunits and molecular rotational axes of symmetry relating subunits homooligomerically. Steric exclusion is the blocking of a face for heterologous association on one subunit of a homooligomer by the association of one of the heterologous subunits with the copy of that face on another subunit of the homooligomer. For example, the binding of a face on the extracellular domain of the immunoglobulin ε receptor to one of the two identical, symmetrically displayed, complementary faces on the homodimeric Fc domains of immunoglobulin E sterically blocks the other face on that molecule of immunoglobulin E from associating with another molecule of the receptor.324
Transthyretin is an (α2)2 tetramer with dihedral symmetry. Its sole function is to bind retinol-binding protein. The four symmetrically arrayed faces for association with retinol-binding protein are in two adjacent pairs situated on opposite sides of transthyretin. The two
identical faces in each adjacent pair, however, are too close to the 2-fold rotational axis of symmetry around which their two respective subunits are arrayed for each to bind to a retinol-binding protein simultaneously. When a retinol-binding protein binds to one face of a pair on one side of transthyretin, the other face of that pair cannot bind another retinol-binding protein because there is not enough room for it.325 Because of this steric exclusion, only two of the four identical faces for associating with retinol-binding protein, one on each side of transthyretin, can be occupied at a time, and the complex is a closed β(α2)2β heterohexamer. During the evolution of this heterologous association, the face for binding retinol-binding protein just happened to arise on the surface of a subunit of transthyretin at this site. The realization of the resulting heterologous interface accomplished the function of the protein even though to an intelligent designer it has been accomplished inelegantly.

Mismatched symmetry occurs when the rotational axis of symmetry relating two or more identical faces on one oligomer is of a different fold from or does not coincide with the rotational axis of symmetry relating the two or more identical complementary faces on a different oligomer during the formation of the heterologous association between the two oligomers.

The 2-oxoglutarate dehydrogenase complex provides an example of mismatched symmetry. It is the multienzymatic complex responsible for the oxidative decarboxylation of 2-oxoglutarate. Three different proteins combine together in the complex to accomplish this oxidative decarboxylation: dihydrolipoyllysine-residue succinyltransferase, oxoglutarate dehydrogenase (succinyl-transferring), and dihydrolipoyl dehydrogenase. The octahedral core of the complex is formed from the 24 identical subunits of dihydrolipoyllysine-residue succinyltransferase arranged in octahedral symmetry. Oxoglutarate dehydrogenase (succinyl-transferring) is a symmetric homodimer, and dimers of this protein adorn the outer surface of the central octahedral core. They must each be attached to the core through heterologous interfaces formed between a face on the core and a face on the respective dimer. Because each of the dimers is built around a 2-fold rotational axis of symmetry, it necessarily has two identical faces, each complementary to a face on the core. Because the core is octahedral, it necessarily has 24 identical faces, each complementary to a face on a dimer of oxoglutarate dehydrogenase (succinyl-transferring). The two identical faces on each dimer are arrayed about its 2-fold rotational axis of symmetry; the 24 faces on the core are each arrayed about its octahedral rotational axes of symmetry.

Careful examination of electron micrographs of complexes containing one tetracosamer of dihydrolipoyllysine-residue succinyltransferase but only two dimers of oxoglutarate dehydrogenase (succinyl-transferring)326 showed that a dimer of the dehydrogenase is bound to a site on the succinyltransferase midway between one of its 2-fold rotational axes of symmetry and one of its 4-fold rotational axes of symmetry (Figure 9–21B). Therefore only one interface, formed from one face on the dimer and one face on the core, can attach each molecule of oxoglutarate dehydrogenase (succinyl-transferring) to the core because if the two identical faces on the dimer of oxoglutarate dehydrogenase (succinyl-transferring) were both occupied with complementary faces on the dihydrolipoyllysine-residue succinyltransferase, the one 2-fold rotational axis of symmetry of the dehydrogenase would have to coincide with one of the 2-fold rotational axes of symmetry of the succinyltransferase rather than being displaced from it. Only in this arrangement would the two identical faces on the oxoglutarate dehydrogenase (succinyl-transferring) be able to associate with two complementary faces on the dihydrolipoyllysine-residue succinyltransferase. The symmetry-related face on an attached dimer of the dehydrogenase finds itself empty because it is sterically inaccessible to a complementary face on another subunit of the succinyltransferase. This arrangement is a result of the fact that, as might be expected, the complementary faces on the heterologous partners, the octahedral core and the dimer, happened to evolve with no concern for their positions relative to the axes of symmetry. Consequently, their symmetries were mismatched.

In addition to this mismatch of symmetry, the complex between dihydrolipoyllysine-residue succinyltransferase and oxoglutarate dehydrogenase (succinyl-transferring) also displays steric exclusion. For each of the 24 equivalent faces on the octahedral tetracosamer of dihydrolipoyllysine-residue succinyltransferase that is occupied by a dimer of oxoglutarate dehydrogenase (succinyl-transferring), the three other identical faces arrayed around the respective 4-fold rotational axis of symmetry (Figure 9–21B) remain empty326 because the complementary face on the octahedral succinyltransferase to which a face on the dimer of the dehydrogenase attaches is too close to a 4-fold rotational axis of symmetry and the dimer is too large to permit more than one dimer to bind to the four identical faces around this axis. In the saturated complex between oxoglutarate dehydrogenase (succinyl-transferring) and dihydrolipoyllysine-residue succinyltransferase, there are six heterologous interfaces joining 12 polypeptides of the former and 24 polypeptides of the latter. These nonstoichiometric ratios of subunits are dictated by both mismatched symmetry and steric exclusion. Because there are only six heterologous interfaces connecting these two proteins together, only six of the 12 subunits of the dehydrogenase are directly attached to the succinyltransferase, and 18 of the faces on the succinyltransferase and six of the complementary faces on the dimers of the dehydrogenase remain unoccupied.

There are other examples within the family of oxoacid dehydrogenase complexes in which the stoichiome-
Figure 9–39: Heterologous associations among the α subunit of human HLA class I histocompatibility antigen A-2 (HCAα), β2 microglobulin (β2mic), the α subunit of the B7 isoform of human T-cell receptor (TCRα), and the β subunit of the B7 isoform human T-cell receptor (TCRβ). The skeletal drawing is from the crystallographic molecular model of this complex.332,333 The three extracellular amino-terminal domains of the α subunit of human HLA class I histocompatibility antigen A-2 (Glycine 1 to Alanine 90, Glycine 91 to Threonine 182, and Aspartate 183 to Glutamate 275) are drawn with thick line segments. The first two domains of this protein form the oblate disk in the center of the complex, and the carboxy-terminal domain of the three domains is the immunoglobulin modular domain (Table 7–7) at the top left of the drawing. An entire molecule (99 aa) of β2 microglobulin, which is the β subunit of the histocompatibility antigen, is drawn with thin line segments. This single immunoglobulin modular domain is in the upper right of the drawing. The amino-terminal immunoglobulin modular domain (Leucine 1 to Arginine 111) of the α subunit of isoform B7 of the human T-cell receptor is drawn with thin line segments and is situated in the lower left of the drawing. The amino-terminal immunoglobulin modular domain (Glycine 3 to Leucine 114) of the β subunit of isoform B7 of the human T-cell receptor is drawn with line segments of intermediate thickness and is situated in the lower right of the drawing. At the center of the complex is the nonapeptide LLFEYPVYV, drawn with the thickest line segments. This peptide is essential for the formation of the complex. This drawing was produced with MolScript.485
mophilus, only one of the two symmetrically displayed
faces on dimeric dihydrolipoyl dehydrogenase is able to
associate with the icosahedral dihydrolipoyllysine-
residue acetyltransferase because these faces are located
too close to the 2-fold rotational axis of symmetry on the
dimer to accommodate two of the complementary faces
on dihydrolipoyllysine-residue acetyltransferase simul-
taneously.327 In the pyruvate dehydrogenase complex
from S. cerevisiae, only one molecule of E3-binding pro-
tein can bind to each pentagonal face of the icosahedral
dihydrolipoyllysine-residue acetyltransferase even
though each pentagonal face must have five identical
sites for associating with E3-binding protein.328 That this
low stoichiometry results from steric exclusion follows
from the fact that when the pentagonal face is made less
crowded by truncating the dihydrolipoyllysine-residue
acetyltransferase, more molecules of E3-binding protein
can associate with it.
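
The subunit bookkeeping for the saturated complex between the octahedral succinyltransferase and the dimeric dehydrogenase, described above, can be checked with a short calculation. The following is only an illustrative sketch in Python; the function name and its parameters are hypothetical, and the values (24 faces on the core, four faces around each 4-fold axis, at most one dimer for each set of four faces, one face of each dimer engaged) are those stated in the text.

```python
def saturated_stoichiometry(core_faces=24, faces_per_set=4,
                            dimers_per_set=1, subunits_per_dimer=2,
                            engaged_faces_per_dimer=1):
    """Count subunits and interfaces in the saturated complex.

    Hypothetical helper for illustration only; defaults follow the text.
    """
    # Steric exclusion: the faces fall into sets of four around a 4-fold
    # axis, and each set can accommodate only one dimer.
    sets_of_faces = core_faces // faces_per_set
    dimers = sets_of_faces * dimers_per_set
    # Mismatched symmetry: only one of the two faces on a dimer is engaged.
    interfaces = dimers * engaged_faces_per_dimer
    dimer_faces = dimers * subunits_per_dimer  # one face on each subunit
    return {
        "dimers_bound": dimers,
        "dehydrogenase_polypeptides": dimers * subunits_per_dimer,
        "interfaces": interfaces,
        "empty_core_faces": core_faces - interfaces,
        "empty_dimer_faces": dimer_faces - interfaces,
    }


result = saturated_stoichiometry()
for name, value in result.items():
    print(f"{name}: {value}")
```

With the values from the text, the sketch gives six dimers, six interfaces, 12 polypeptides of the dehydrogenase, 18 empty faces on the core, and six empty faces on the dimers, in agreement with the numbers quoted for the saturated complex.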
The stoichiometry of the heterologous association
of one homooligomer with another can also be affected
by conformational changes resulting from the association
itself. Only one low-affinity immunoglobulin
γ Fc region receptor can associate with the symmetrically
dimeric Fc fragment of immunoglobulin G. Its associa-
tion induces an asymmetric conformational change in
the Fc fragment. This conformational change distorts the
other face on the Fc fragment so that it cannot assume
Many heterooligomers are complexes between two proteins held together by transitory heterologous associations rather than permanent heterologous associations. These transitory complexes dissociate and associate during the lifetimes of their constituent proteins as required by their function. For example, when 3′,5′-cyclic AMP is bound by the regulatory β subunits of cyclic AMP-dependent protein kinase, the catalytic α subunits dissociate from them.334,335 The complex between the HLA class I histocompatibility antigen A-2 and the B7 isoform of the T-cell receptor forms only after the histocompatibility antigen has bound a short peptide of a sequence recognized by the T-cell receptor (Figure 9–39).

Many asymmetrical heterooligomers, both those that are permanent and those that are transitory, contain folded polypeptides composed of multiple copies of modular domains (Tables 7–7 and 7–8). For example, both the nidogen and the laminin in the permanent equimolar complex between these two proteins336 contain multiple copies of EGF modular domains among others. Both the HLA class I histocompatibility antigen A-2 and the T-cell receptor are composed of internally repeating domains, most of which are immunoglobulin modular domains (Figure 9–39).

One of the main functions of modular domains is to participate in heterologous associations.337 For example, the carboxy-terminal EF hand of α actinin forms a complex with the seventh Z-repeat of titin in which the former (see Figure 7–17) wraps around the single α helix that comprises the latter.338 The WW modular domain of dystrophin forms a complex with the proline-rich motif on β-dystroglycan in which a segment of the latter 5 aa long lies in extended conformation within a complementary groove on the former.339 The fifth and sixth EGF modular domains of thrombomodulin form a complex with the globular protein α-thrombin through an interface that is formed from two complementary faces, one including surfaces from both of the EGF modular domains and the other a flat continuous surface on the α-thrombin.340

It is their lack of exact symmetry, their transitory nature, and their modularity that distinguishes asymmetric heterooligomers held together entirely by heterologous associations from homooligomers.

The interfaces producing heterologous associations, either permanent ones between two unrelated subunits or transitory ones between two unrelated proteins, are as variable in their structure as the interfaces between two identical subunits. They can be almost planar interfaces between two flat faces, as is the interface between T-cell receptor and HLA class I histocompatibility antigen (Figure 9–39);333 they can involve terminal segments from one subunit that embrace the body of the other subunit, as in the interface between the α subunit and β subunit in nitrile hydratase from Rhodococcus;341 or they can be a coiled coil of α helices, one from each of the participants, as in the complex between syntaxin-1A, synaptobrevin-II, and SNAP-25B.342 In the interface between protein G from Streptococcus and murine immunoglobulin G, three β strands from the immunoglobulin and four β strands from the protein G (each six amino acids long) form a continuous, paradigmatic antiparallel β sheet.343 In the heterologous association between the interleukin-1 receptor and interleukin-1β, the three consecutive immunoglobulin modular domains of the receptor wrap around the interleukin, surrounding it on three sides.344 The leucine-rich repeats of ribonuclease inhibitor wrap around ribonuclease, also surrounding it on three sides,345 as do 13 of the 18 HEAT repeats of karyopherin β2 that wrap around GTP-binding nuclear protein RAN.346

An examination347 has been made of the composition of heterologous interfaces that form transitorily between two proteins during the performance of their normal function. The fraction of the accessible surface area of such a transitory interface formed by nonpolar atoms (0.56) is somewhat lower than that in the interfaces holding together permanent oligomers (0.65) and is indistinguishable from that on the solvent-accessible surface of a small globular protein (0.57). The fractions formed by polar but uncharged atoms (0.24) and polar and charged atoms (0.19) are consequently somewhat higher than those in the interfaces of permanent oligomers (0.22 and 0.13, respectively). The same elevation in the fraction of arginine that is observed in the interfaces of permanent oligomers is also observed in the interfaces of transitory oligomers (0.10 of the amino acids in the interfaces) and the same decreases in aspartate and glutamate, but the transient interfaces also show a significant elevation in tyrosine (0.09), a hydrophilic hydrophobe, and significant decreases in valine (0.04) and leucine (0.05), both hydrophobic hydrophobes, relative to the composition of permanent interfaces (0.05, 0.07, and 0.11, respectively). An extreme example of the elevation in the fraction of charged side chains in transitory interfaces is the interface between human HLA class I histocompatibility antigen Cw3 and killer cell immunoglobulin-like receptor 2DL2 in which there are five arginines, three aspartates, three glutamates, and two lysines.348

The elevation in the polarity of transitory interfaces is in keeping with the fact that these interfaces are not permanent features; and consequently, the complementary faces on the two proteins must be exposed to the aqueous phase through much of their lives. Within the transitory, heterologous interface between α-thrombin and thrombomodulin, however, most of the side chains are hydrocarbon. In this instance, the problem of the solubility of the separated proteins is solved by surrounding the hydrophobic patch forming the face on α-thrombin with a high density of positively charged side chains and the hydrophobic patch on thrombomodulin with a high density of negatively charged side chains.340 Only five of
these charged side chains actually participate in the interface itself.

A common strategy for heterologous association is the specific binding of a disordered, structureless segment of polypeptide within one protein by a structured binding site on another. Upon the binding of the structureless partner to the structured partner, the structureless segment of polypeptide assumes a structure complementary to the structured site.349 In this type of heterologous association, it is only the amino acid sequence of the disordered segment and not its conformation that is recognized by the structured binding site. Consequently, the structureless partner can be mimicked by a synthetic peptide of the proper sequence. For example, the association between troponin I and troponin C can be blocked by a synthetic peptide with a sequence only 12 aa in length from the center of troponin I.350 In such a situation, each of the members of a set of overlapping synthetic peptides comprising the complete sequence of the protein providing the structureless partner for the interface can be assayed for its ability to inhibit the heterologous association of the intact protein in order to identify the segment that participates in the association.351

Importin-β associates with importin-α by binding the structureless, highly charged (50% arginine, lysine, aspartate, and glutamate) amino-terminal segment (40 aa) of importin-α. The amino-terminal 876 aa of importin-β form 19 internally repeating, modular HEAT domains (46 aa), each consisting of a hairpin of two α helices. In the crystallographic molecular model, the 38 α helices of these 19 internally repeating, modular domains wrap in a spiral around a synthetic peptide of 44 aa with the amino acid sequence of the amino-terminal segment of importin-α. The formation of the complex induces the carboxy-terminal 28 amino acids of the otherwise structureless synthetic peptide to form an α helix aligned with the 38 α helices from importin-β,352 in a structure resembling a thick section from the trunk of a palm. In the natural heterologous association between importin-α and importin-β, the amino-terminal segment of importin-α, represented by the synthetic peptide in the crystallographic molecular model, is inserted into the middle of the spiral formed by importin-β to form a strong complex between the two proteins.

There are a large number of examples of such heterologous associations mediated by structureless sequences of amino acids. The structureless segment can be located anywhere in the amino acid sequence of a protein. Annexin II associates with protein p11 by binding its amino-terminal 12 amino acids.353 Tumor necrosis factor receptor-associated factor 2 is a globular trimer, each subunit of which has a binding site for a sequence of six amino acids in the structureless carboxy-terminal portion of the CD40 tumor necrosis factor receptor.354 PDZ domains can bind to a structureless sequence located either at the carboxy terminus of another protein or in a portion of polypeptide in the interior of its sequence of amino acids that loops out from its surface.355 Nuclear localization signals are short structureless sequences (5–20 aa) containing several lysines and arginines356 on the surfaces of proteins destined for the nucleus of a cell. Such structureless segments are recognized by being bound to a structured domain of the nuclear import factor karyopherin α, composed of 10 internally repeated armadillo modular domains.357 The Eps15 homology domains are modular domains that bind short segments of polypeptide containing the sequence -Asn-Pro-Phe-358 and in this way produce heterologous associations between a protein containing an Eps15 homology domain and a protein containing a structureless segment of this sequence within its folded polypeptide.

Historically, homooligomers with globular subunits, all arranged around rotational axes of symmetry, accounted for most of the proteins initially purified and studied. There are several reasons for this fact. Most of the proteins present at high concentrations in the cytoplasm and most of the proteins that have enzymatic activity, and hence for which there are obvious assays, are globular homooligomers. Globular homooligomers are also compact, sturdy, and resistant to degradation by endopeptidases. Consequently, globular, homooligomeric enzymes were the easiest proteins to purify.

Proteins the sole function of which is to participate in heterologous associations are more difficult to purify. Such proteins are often assembled by those associations into large, heterogeneous polymeric matrices, and until recently, identifying the partners involved in a particular heterologous association has been difficult. Proteins the enzymatic activities of which are regulated by transient heterologous associations with other proteins or the substrates of which are other proteins are difficult to assay because several components must be mixed together in the proper ratio. Proteins that control the enzymatic activities of the classical enzymes are present in much lower concentrations than those enzymes. Nevertheless, it is proteins engaging in heterologous associations that form large macromolecular structures within the cell and between cells, that control the metabolism carried out by the classical enzymes, and that regulate the expression of genes. Recently, proteins involved in such functions have been purified, identified, and expressed in amounts high enough to be studied functionally and structurally.

These new proteins (Table 9–4) are new only because they are present normally in low concentrations, are difficult to assay, or for some other reason are difficult to purify. In addition to their novelty, however, they all seem to share the property of participating in heterologous associations, either among their unrelated subunits or with other proteins. Most of these new proteins are peculiar to eukaryotic cells and the tissues of multicellular organisms. Now that complete genomes are
available for a number of eukaryotes, it has become clear that most of the proteins encoded by the genes in those genomes are new proteins. The examples listed in Table 9–4 illustrate various properties of these new proteins.

One of the most unfortunate features of these new proteins is the chaos of their nomenclature. Often the same protein from two different species of organisms will have completely unrelated names. Often the proteins are designated by a number that either derives from the initial genetic screen or from the initial, invariably inaccurate estimate of the length of their constituent polypeptide by electrophoresis on gels cast in solutions of dodecyl sulfate. Often the heterologous subunits of one of these proteins will each have its own peculiar name and the complex between them has another, unrelated name. One has the suspicion that such confusion is intended to discourage individuals outside the narrow field of investigators interested in one or the other of these proteins from learning about them or even realizing that they are normal, unremarkable proteins.

Most of the new proteins contain internally repeating domains or widely distributed modular domains464 or both of these types of domains. The sole function of many of the modular domains, such as the SH2 domain of proto-oncogene tyrosine-protein kinase ABL1, is to form heterologous associations with other proteins.

Many of these proteins, such as laminin α1β1γ1, integrin α2β1, and guanine nucleotide binding protein G(s), are heterooligomers. The β1 and γ1 subunits of laminin are homologous, but the α1 subunit is unrelated to the others. The subunits of the integrin are unrelated to each other, as are those of the guanine nucleotide binding protein. The heterologous associations between the subunits of integrin α2β1 are permanent, as are those between the subunits of laminin, but the heterologous associations between the α, β, and γ subunits of guanine nucleotide binding protein G(s) are transitory, and they dissociate and reassociate during its normal operation.

Almost all of these new proteins are participants in extensive, intricate networks of heterologous associations among many proteins.465,466 Each of these networks is responsible for a global function such as the production of the extracellular matrix or controlling the growth and multiplication of the cell. Proteins such as SHC transforming protein 1 form heterologous associations with many different partners and act as hubs in these networks. Proteins such as gelsolin associate with only one or two proteins at the dead ends in a network, while proteins such as α actinin 1 and guanine nucleotide binding protein G(s) link one protein to the next protein within the spokes radiating from the hubs. Some of these proteins can connect one network to another network. Integrin α2β1 connects the network of heterologous associations forming the extracellular matrix to the networks for the cytoskeleton and for cellular regulation through protein kinases. Nucleoporin Nup214 connects the network of heterologous associations forming the nuclear pore to networks for nuclear import and for cellular regulation.

Many of these proteins are responsible for shifts in the steady state of the cell such as changes in metabolism or the initiation of growth. Consequently, they must detect changes and respond to change by altering the heterologous associations in which they participate. Proteins such as SHC transforming protein 1 and proto-oncogene tyrosine-protein kinase ABL1 form heterologous associations with some proteins only when particular tyrosines on the surfaces of those proteins have been phosphorylated. In this way, they recognize changes produced by intracellular signalling. The heterologous associations in which cyclin participates are permanent, but their status is transitory, changing systematically as the cyclin is rapidly degraded and then more is synthesized. Protein kinases form specific, transitory heterologous associations with their substrate proteins in order to phosphorylate specific serines, threonines, or tyrosines on their surfaces.

Some of the heterologous associations, such as those between α actinin and actin or between laminin and nidogen, are exclusive. Others, such as those between ankyrin and various proteins embedded in the plasma membrane serving as sites of attachment for the cytoskeleton or those between the SH2 modular domains on proto-oncogene tyrosine-protein kinase ABL1 or SHC transforming protein 1 and an array of proteins containing phosphorylated tyrosines signalling changes in the regulatory status of the cell, are promiscuous. Transcription initiation factor TFIID recognizes TATA boxes that precede many different genes and then initiates the assembly of the large complex of different proteins responsible for initiating transcription.

Proteins involved in these networks of interactions responsible for particular functions are often identified and assigned a role on the basis of the heterologous associations themselves. There are several ways to detect a heterologous association between two proteins. A complex between two native proteins can be detected by their coelectrophoresis.459 A complex between two proteins can be immunoprecipitated with immunoglobulins specific for one of the two, and the fact that the other coprecipitates demonstrates the existence of the complex. Glutathione transferase can be fused to a protein during its expression, and any proteins that participate in heterologous associations with that protein can be identified after isolation of the complex by affinity adsorption with a solid phase to which glutathione has been attached.467

It is also possible to screen a library by phage display468 for a cDNA or a gene encoding a protein that participates in a heterologous interaction with a protein of interest. Fragments of cDNA or genomic DNA are inserted at a particular position in the gene encoding the coat protein pIII of the f1 or M13 bacteriophage. A population of E. coli is then infected with these bacteriophage.
Table 9–4: Examples of New Proteins

Protein (length) | Subunits | Domains (number) | Function | Heterologous associations
E-cadherin [359] (728 aa) | α2 | cadherin modular (5), serine-rich modular (1) | cellular adhesion | integrin αEβ7 [360], β-catenin [361], other molecules of E-cadherin [362,363]
integrin α2β1 [364] (1152 aa, 778 aa) | αβ | von Willebrand factor type A modular (2), cysteine-rich repeat (4), FG-GAP repeat (7) | attaches cell to extracellular matrix | collagen [365], laminin [366], chondroadherin [367], interstitial collagenase [360], filamin [368], α actinin [369], skelemin [370], integrin cytoplasmic associated protein I [371], receptor 1 for activated protein kinase C [372], paxillin [373], focal adhesion kinase [373], integrin-linked kinase [374], calnexin [375]
laminin α1β1γ1 [376,377] (3058 aa, 1765 aa, 1576 aa) | αβγ | laminin G modular (5), EGF modular (41), laminin modular (2), laminin amino-terminal modular (3) | extracellular matrix | nidogen [378], integrin α2β1 [366], thrombospondin [379]
vitronectin [380–382] (459 aa) | α | hemopexin modular (2) | protein in serum and extracellular matrix | integrin αVβ1 [383–385], proteoglycan [386], plasminogen activator inhibitor type 1 [387]
ankyrin 1 [388,389] (1880 aa) | α | ankyrin repeat (23), death modular (1) | cytoskeleton | anion exchanger [390], spectrin [391], Na+, K+-exchanging ATPase [392], Na+ channel [393], neuroglian [394], CD44 antigen [395]
α actinin 1 [396] (892 aa) | α2 | calponin modular (2), spectrin modular (4), calcium-binding EF-hand modular (2) | cytoskeleton | actin [396], vinculin [397–399], titin [338,400,401]
gelsolin [402,403] (755 aa) | α | gelsolin repeat (6) | sculpts actin | actin [402], caspase-3 [404]
synaptotagmin I [405,406] (422 aa) | α | C2 modular (2) | controls traffic of synaptic vesicles | neurexin [407], syntaxin [408], clathrin assembly protein 2 [409]
nucleolin [410,411] (706 aa) | | Asp/Glu-rich repeat (3), RNA-binding modular (4), nucleolin repeat (8) | nucleolar component [412] | chromatin [413], preribosomal particles [414], insulin receptor substrate I [415], nucleophosmin [416]
nucleoporin Nup214 [417] (2090 aa) | | Pro/Ser/Thr-rich region (1), coiled coil (2), nucleoporin FG repeats (40) | component of nuclear pore | other nucleoporins, mitogen-activated protein kinase [418], nuclear RNA export factor 1 [419], CRM1 protein [420], CREB-binding protein [421]
HLA type I histocompatibility antigen A-2 [422,423] (341 aa, 99 aa) | αβ | immunoglobulin modular (4) | mediates immune response | T-cell receptor [333]
cyclin A2 [424] (432 aa) | | | control of cell cycle | cyclin-dependent kinase 2 [425], cell division control protein 2 homologue [426], protein CDC20 [427], β3 endonexin [428]
SHC transforming protein 1 [429] (583 aa) | | phosphotyrosine interaction domain (1), SH2 modular (1) | intracellular signalling | activated receptors for growth factors [429,430], proteins phosphorylated at tyrosine [431–433], Grb2 protein [434], mPAL protein [435], phosphotyrosine phosphatase-PEST [436], Gads protein [437]
proto-oncogene protein-tyrosine kinase ABL1 [438–441] (1130 aa) | | eukaryotic protein kinase modular (1), SH3 modular (1), SH2 modular (1), Pro-rich domain (1) | intracellular signalling, protein tyrosine kinase | protein substrates, proteins phosphorylated at tyrosine, EphB2 protein tyrosine kinase [442], Wiskott–Aldrich syndrome protein family member 1 [443], actin [444], Abl interactor protein 2b [445], proteins with proline-rich segments [446,447]
myosin light chain kinase [448] (1914 aa) | | protein kinase modular (1), fibronectin type III modular (1), immunoglobulin C2 modular (1), myosin light chain kinase repeats, type I (5) and type II (6) | intracellular signalling, protein serine/threonine kinase | calmodulin [449], myosin regulatory light chain 2
CD45 protein-tyrosine phosphatase [450] (1281 aa) | | fibronectin type III modular (2), tyrosine-protein-phosphatase modular (2) | intracellular signalling | proteins phosphorylated at tyrosine, semaphorin 4D [451], protein CD2 [452], protein p56lck [453]
guanine nucleotide-binding protein G(s) [454,455] (394 aa, 340 aa, 75 aa) | αβγ | G-protein β WD-40 repeat (7) | intracellular signalling | adenylate cyclase [454], β-adrenergic receptor [456]
transcription initiation factor TFIID [457] | | polyglutamine domain (1), | | TATA box on DNA [457], transcription initiation factor TFIIA [457,458], transcription initiation factor

a As listed in the SwissProt data base (www.expasy.ch). Numbers in parentheses are the numbers of each type of domain in the protein.
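
The distinction drawn in the text between proteins that act as hubs and proteins that sit at dead ends can be illustrated by counting the partners listed for a protein in Table 9–4. The sketch below, in Python, is only an illustration: the function name is hypothetical, the three entries are copied from the table, and because the table lists examples rather than complete inventories, the counts are lower bounds rather than measured degrees in the network.

```python
# Partners copied from Table 9-4 for three of the proteins. The table
# gives examples only, so each count is a lower bound.
partners = {
    "SHC transforming protein 1": [
        "activated receptors for growth factors",
        "proteins phosphorylated at tyrosine",
        "Grb2 protein",
        "mPAL protein",
        "phosphotyrosine phosphatase-PEST",
        "Gads protein",
    ],
    "alpha actinin 1": ["actin", "vinculin", "titin"],
    "gelsolin": ["actin", "caspase-3"],
}


def rank_by_partners(table):
    """Return (protein, number of distinct partners), most connected first."""
    counts = [(name, len(set(listed))) for name, listed in table.items()]
    return sorted(counts, key=lambda item: item[1], reverse=True)


ranking = rank_by_partners(partners)
for name, degree in ranking:
    print(f"{name}: {degree}")
```

In this small sample, SHC transforming protein 1 has the most listed partners and gelsolin the fewest, which is consistent with their description as a hub and a dead end, respectively. The count alone does not distinguish a link within a spoke from a dead end; that requires knowing which partners are themselves connected to other proteins.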
ing one protein or a portion of one protein and fusing its complementary DNA in phase with the DNA encoding the domain of regulatory protein GAL4 that binds to the DNA. This acts as the bait. It is then possible to fuse DNA encoding the domain responsible for activation to fragments of genomic DNA at random. If a protein that associates heterologously with the bait happens to be encoded by the DNA in one of these fragments, it will be caught, the colony containing that fragment will turn blue, and the DNA in the fragment can be sequenced to identify the protein it encodes.475

This assay has been automated and applied to discover large numbers of heterologous associations. For example, in one of these large screenings, 957 heterologous associations involving 1004 different proteins were identified.466 Usually, however, the fishing is for the partners of a particular protein of interest to the investigator. For example, when cDNA for human cyclin A (Table 9–4) was used as the bait and fragments of the yeast genome as the fish, three positive colonies were identified.428 One of the proteins identified in this way was cyclin-dependent kinase inhibitor 1, a protein already known to form complexes containing cyclin A, but the other two, protein CDC20 and β3 endonexin, represented novel associations. Once candidates for heterologous association have been identified with the yeast two-hybrid system, the validity of the associations must be established in more extensive studies of the two isolated proteins, as was done for the associations between cyclin A and β3 endonexin428 and between cyclin A and protein CDC20.427

To establish the strength of the heterologous association between two proteins, its dissociation constant can be measured. As with any measurement of a dissociation constant, the molar concentration of the complex is followed as a function of the molar concentrations of the two unassociated proteins (Problem 5–7). The complex between the proto-oncogene protein c-fos and transcription factor AP-1 could be identified by the quenching of a fluorescent functional group on proto-oncogene protein c-fos by a fluorescent functional group on transcription factor AP-1, and a dissociation constant of 20 nM could be calculated from the changes in fluorescence as a function of the concentrations of the two proteins.476

The heterologous interfaces between two proteins are often probed by cross-linking477,478 or by site-directed mutation. To make sense of either of these types of experiments, a crystallographic molecular model of at least one of the participants must be available. In the case of site-directed mutation, changes are made at particular sites on the surface of the model, and the effects of these mutations on the strength of the association between the two proteins are assessed.479 It has also been possible to identify neighbors across the interface by discovering mutations in one of the proteins that compensate for mutations in the other protein that weaken the association by restrengthening it.480 Such experiments often identify a cluster of amino acids on the surface of the crystallographic molecular model of the protein, and this cluster is then assumed to represent the face participating in the interface holding the two proteins together. The heterologous interface between human somatotropin and human somatotropin receptor was probed by site-directed mutation of the somatotropin.481,482 Sixty-six side chains, located consecutively in three segments of the overall amino acid sequence of somatotropin, were mutated one by one to alanine, and the effect of each of these mutations on the association between somatotropin and its receptor was quantified by measuring the dissociation constant of the complex. Fourteen of these mutations produced what were judged to be significant increases in the dissociation constant, and those 14 side chains were found to form a cluster on the surface of the crystallographic molecular model of somatotropin, which was available at that time. In the crystallographic molecular model of the complex between human somatotropin and its receptor that became available subsequently, seven of those 14 side chains were found to be located within one of the interfaces.330

The difficulty in evaluating such sets of site-directed mutations is that often the change in the strength of the interaction produced by each individual mutation is not large,479,483 so a distinction between a mutation within the interface and one without the interface is difficult to make. When an amino acid critically involved in an interface is mutated, changes as large as 500-fold in the dissociation constant have been observed,484 so it is difficult to evaluate changes of less than 10-fold. For example, those 14 mutations judged to have a significant effect on the association of somatotropin with its receptor increased the dissociation constant by factors of only 4–20, while 14 of the mutations judged to be insignificant nevertheless increased the dissociation constant by factors of 2–3. The distinction appears to be arbitrary. Another problem is that during the formation of an interface of any kind, a conformational change may be required to occur in a portion of one or the other of the proteins in order for the proper fit between the faces to be achieved. Any mutation that hinders this conformational change will disrupt the association even if it is not in a side chain that ends up within the interface. In the case of somatotropin, four of the mutations to side chains that are not incorporated into the interface, but which nevertheless did increase the dissociation constant between somatotropin and its receptor by factors of 4–15, are to side chains that are located in a segment of random meander in free somatotropin that becomes ordered upon its association with the receptor.

Suggested Reading
Cingolani, G., Petosa, C., Weis, K., & Muller, C.W. (1999) Structure of importin-β bound to the IBB domain of importin-α, Nature 399, 221–229.
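
The fold changes in dissociation constant quoted in this section for mutants of somatotropin can be put on a free energy scale with a short calculation. The sketch below is only an illustration and is not taken from the studies cited: the function names are hypothetical, the temperature of 298.15 K is an assumption, and the total concentrations used in the example are arbitrary. It computes the equilibrium concentration of a complex from a dissociation constant and converts a fold increase in the dissociation constant into a change in the standard free energy of association, assuming the relation ΔΔG° = RT ln(Kd,mutant/Kd,wild type).

```python
import math

# Gas constant in kJ mol^-1 K^-1.
GAS_CONSTANT = 8.314462618e-3


def complex_concentration(total_a, total_b, k_d):
    """Return the equilibrium concentration of AB for A + B <-> AB.

    All three arguments must share one concentration unit (for example nM).
    Solves K_d = (total_a - x)(total_b - x) / x for the physical root x.
    """
    s = total_a + total_b + k_d
    return (s - math.sqrt(s * s - 4.0 * total_a * total_b)) / 2.0


def delta_delta_g(fold_change, temperature=298.15):
    """Change in standard free energy of association (kJ/mol) for a given
    fold increase in K_d, assuming ddG = R T ln(K_d,mutant / K_d,wild type).
    The default temperature is an assumption, not a value from the text."""
    return GAS_CONSTANT * temperature * math.log(fold_change)


if __name__ == "__main__":
    # 20 nM is the dissociation constant quoted in the text for c-fos and
    # AP-1; the total concentrations (20 nM each) are illustrative only.
    print(complex_concentration(20.0, 20.0, 20.0))
    # Fold changes quoted in the text for mutants of somatotropin.
    for fold in (2, 3, 4, 20, 500):
        print(fold, round(delta_delta_g(fold), 2))
```

Under these assumptions, increases of 2- to 3-fold correspond to about 1.7 to 2.7 kJ mol–1, increases of 4- to 20-fold to about 3.4 to 7.4 kJ mol–1, and an increase of 500-fold to about 15 kJ mol–1, which illustrates how narrow the margin is between the mutations judged significant and those judged insignificant.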
References
1. Monod, J., Wyman, J., & Changeux, J.P. (1965) J. Mol. Biol. 12, 88–118.
28. Adams, M.J., Ford, G.C., Koekoek, R., Lentz, P.J., McPherson, A., Jr., Rossmann, M.G., Smiley, I.E., Schevitz, R.W., & Wonacott, A.J. (1970) Nature 227, 1098–1103.
2. Kim, S.Y., Hwang, K.Y., Kim, S.H., Sung, H.C., Han, Y.S., 29. Lindqvist, Y., & Braenden, C.I. (1985) Proc. Natl. Acad.
& Cho, Y. (1999) J. Biol. Chem. 274, 11761–11767. Sci. U.S.A. 82, 6855–6859.
3. Holmes, K.C., Popp, D., Gebhard, W., & Kabsch, W. 30. Mattevi, A., Obmolova, G., Kalk, K.H., Westphal, A.H.,
(1990) Nature 347, 44–49. de Kok, A., & Hol, W.G. (1993) J. Mol. Biol. 230,
4. Kabsch, W., Mannherz, H.G., Suck, D., Pai, E.F., & 1183–1199.
Holmes, K.C. (1990) Nature 347, 37–44. 31. DeRosier, D.J., Oliver, R.M., & Reed, L.J. (1971) Proc.
5. Smith, P.R., Fowler, W.E., Pollard, T.D., & Aebi, U. Natl. Acad. Sci. U.S.A. 68, 1135–1137.
(1983) J. Mol. Biol. 167, 641–660. 32. Konig, P., & Richmond, T.J. (1993) J. Mol. Biol. 233,
6. van den Ent, F., Amos, L.A., & Lowe, J. (2001) Nature 139–154.
413, 39–44. 33. Steegborn, C., Messerschmidt, A., Laber, B., Streber,
7. Ohlendorf, D.H., Weber, P.C., & Lipscomb, J.D. (1987) W., Huber, R., & Clausen, T. (1999) J. Mol. Biol. 290,
J. Mol. Biol. 195, 225–227. 983–996.
8. Ohlendorf, D.H., Orville, A.M., & Lipscomb, J.D. (1994) 34. Schuller, D.J., Wilks, A., Ortiz de Montellano, P.R., &
J. Mol. Biol. 244, 586–608. Poulos, T.L. (1999) Nat. Struct. Biol. 6, 860–867.
9. Steitz, T.A., Fletterick, R.J., Anderson, W.F., & 35. Buehner, M., Ford, G.C., Moras, D., Olsen, K.W., &
Anderson, C.M. (1976) J. Mol. Biol. 104, 197–122. Rossmann, M.G. (1974) J. Mol. Biol. 82, 563–585.
10. Finch, J.T., Perutz, M.F., Bertles, J.F., & Dobler, J. (1973) 36. Buehner, M., Ford, G.C., Olsen, K.W., Moras, D., &
Proc. Natl. Acad. Sci. U.S.A. 70, 718–722. Rossmann, M.G. (1974) J. Mol. Biol. 90, 25–49.
11. Li, H., & Abelson, J. (2000) J. Mol. Biol. 302, 639–648. 37. Lamzin, V.S., Dauter, Z., Popov, V.O., Harutyunyan,
12. Bailey, D.L., Fraser, M.E., Bridger, W.A., James, M.N., & E.H., & Wilson, K.S. (1994) J. Mol. Biol. 236, 759–785.
Wolodko, W.T. (1999) J. Mol. Biol. 285, 1655–1666. 38. Lolis, E., Alber, T., Davenport, R.C., Rose, D., Hartman,
13. Borchert, T.V., Abagyan, R., Jaenicke, R., & Wierenga, F.C., & Petsko, G.A. (1990) Biochemistry 29, 6609–6618.
R.K. (1994) Proc. Natl. Acad. Sci. U.S.A. 91, 1515–1518. 39. Nikkola, M., Lindqvist, Y., & Schneider, G. (1994) J. Mol.
14. Oefner, C., & Suck, D. (1986) J. Mol. Biol. 192, 605–632. Biol. 238, 387–404.
15. Hahn, T. (1983) International Tables for 40. Harutyunyan, E.H., Kuranova, I.P., Vainshtein, B.K.,
Crystallography, Volume A. Space-Group Symmetry, D. Hohne, W.E., Lamzin, V.S., Dauter, Z., Teplyakov, A.V.,
Reidel, Dordrecht, The Netherlands. & Wilson, K.S. (1996) Eur. J. Biochem. 239, 220–228.
16. Einspahr, H., Parks, E.H., Suguna, K., Subramanian, E., 41. Andersson, I. (1996) J. Mol. Biol. 259, 160–174.
& Suddath, F.L. (1986) J. Biol. Chem. 261, 16518–16527. 42. Waldrop, G.L., Rayment, I., & Holden, H.M. (1994)
17. Bourne, Y., Abergel, C., Cambillau, C., Frey, M., Rouge, Biochemistry 33, 10249–10256.
P., & Fontecilla-Camps, J.C. (1990) J. Mol. Biol. 214, 43. Sixma, T.K., Kalk, K.H., van Zanten, B.A., Dauter, Z.,
571–584. Kingma, J., Witholt, B., & Hol, W.G. (1993) J. Mol. Biol.
18. Holden, H.M., Ito, M., Hartshorne, D.J., & Rayment, I. 230, 890–918.
(1992) J. Mol. Biol. 227, 840–851. 44. Birktoft, J.J., Rhodes, G., & Banaszak, L.J. (1989)
19. Weiss, M.S., & Schulz, G.E. (1992) J. Mol. Biol. 227, Biochemistry 28, 6065–6081.
493–509. 45. Niefind, K., Hecht, H.J., & Schomburg, D. (1995) J. Mol.
20. Tsukihara, T., Fukuyama, K., Mizushima, M., Harioka, Biol. 251, 256–281.
T., Kusunoki, M., Katsube, Y., Hase, T., & Matsubara, H. 46. Antson, A.A., Brzozowski, A.M., Dodson, E.J., Dauter,
(1990) J. Mol. Biol. 216, 399–410. Z., Wilson, K.S., Kurecki, T., Otridge, J., & Gollnick, P.
21. Bianchet, M.A., Hullihen, J., Pedersen, P.L., & Amzel, (1994) J. Mol. Biol. 244, 1–5.
L.M. (1998) Proc. Natl. Acad. Sci. U.S.A. 95, 47. Jia, Z., Vandonselaar, M., Hengstenberg, W., Quail, J.W.,
11065–11070. & Delbaere, L.T. (1994) J. Mol. Biol. 236, 1341–1355.
22. Banner, D.W., Bloomer, A.C., Petsko, G.A., Phillips, 48. Darnell, D.W., & Klotz, I.M. (1975) Arch. Biochem.
D.C., Pogson, C.I., Wilson, I.A., Corran, P.H., Furth, A.J., Biophys. 166, 651–682.
Milman, J.D., Offord, R.E., Priddle, J.D., & Waley, S.G. 49. Dewan, J.C., Grant, G.A., & Sacchettini, J.C. (1994)
(1975) Nature 255, 609–614. Biochemistry 33, 13147–13154.
23. Epp, O., Ladenstein, R., & Wendel, A. (1983) Eur. J. 50. Britton, K.L., Asano, Y., & Rice, D.W. (1998) Nat. Struct.
Biochem. 133, 51–69. Biol. 5, 593–601.
24. Sprang, S., & Fletterick, R.J. (1979) J. Mol. Biol. 131, 51. Argiriadi, M.A., Morisseau, C., Hammock, B.D., &
523–551. Christianson, D.W. (1999) Proc. Natl. Acad. Sci. U.S.A.
25. Eklund, H., Nordstreom, B., Zeppezauer, E., 96, 10637–10642.
Seoderlund, G., Ohlsson, I., Boiwe, T., Seoderberg, 52. Schulz, G.E., Schirmer, R.H., Sachsenheimer, W., & Pai,
B.O., Tapia, O., Breandaen, C.I., & Akeson, A. (1976) J. E.F. (1978) Nature 273, 120–124.
Mol. Biol. 102, 27–59. 53. Leistler, B., & Perham, R.N. (1994) Biochemistry 33,
26. Leslie, A.G., Moody, P.C., & Shaw, W.V. (1988) Proc. 2773–2781.
Natl. Acad. Sci. U.S.A. 85, 4133–4137. 54. Aleshin, A.E., Zeng, C., Bourenkov, G.P., Bartunik, H.D.,
27. Koellner, G., Luic, M., Shugar, D., Saenger, W., & Fromm, H.J., & Honzatko, R.B. (1998) Structure 6,
Bzowska, A. (1997) J. Mol. Biol. 265, 202–216. 39–50.
55. Somers, W.S., & Phillips, S.E. (1992) Nature 359, 81. Le Bras, G., Auzat, I., & Garel, J.R. (1995) Biochemistry
387–393. 34, 13203–13210.
56. Raumann, B.E., Rould, M.A., Pabo, C.O., & Sauer, R.T. 82. Riley-Lovingshimer, M.R., Ronning, D.R., Sacchettini,
(1994) Nature 367, 754–757. J.C., & Reinhart, G.D. (2002) Biochemistry 41,
57. Hegde, R.S., Grossman, S.R., Laimins, L.A., & Sigler, 12967–12974.
P.B. (1992) Nature 359, 505–512. 83. Dayhoff, M.O. (1972) Atlas of Protein Sequence and
58. Leslie, A.G. (1990) J. Mol. Biol. 213, 167–186. Structure, Vol. 5, National Biomedical Research
59. Adachi, M., Takenaka, Y., Gidamis, A.B., Mikami, B., & Foundation, Washington, DC.
Utsumi, S. (2001) J. Mol. Biol. 305, 291–305. 84. Perutz, M.F., Muirhead, H., Cox, J.M., & Goaman, L.C.
60. Ealick, S.E., Rule, S.A., Carter, D.C., Greenhough, T.J., (1968) Nature 219, 131–139.
Babu, Y.S., Cook, W.J., Habash, J., Helliwell, J.R., 85. Ip, S.H., & Ackers, G.K. (1977) J. Biol. Chem. 252, 82–87.
Stoeckler, J.D., Parks, R.E., Jr., et al. (1990) J. Biol. Chem. 86. Park, C.M. (1970) J. Biol. Chem. 245, 5390–5394.
265, 1812–1820. 87. Burrows, S.D., Doyle, M.L., Murphy, K.P., Franklin,
61. Larsson, G., Svensson, L.A., & Nyman, P.O. (1996) Nat. S.G., White, J.R., Brooks, I., McNulty, D.E., Scott, M.O.,
Struct. Biol. 3, 532–538. Knutson, J.R., Porter, D., et al. (1994) Biochemistry 33,
62. Benning, M.M., Taylor, K.L., Liu, R.Q., Yang, G., Xiang, 12741–12745.
H., Wesenberg, G., Dunaway-Mariano, D., & Holden, 88. Pereira, P.J., Bergner, A., Macedo-Ribeiro, S., Huber, R.,
H.M. (1996) Biochemistry 35, 8103–8109. Matschiner, G., Fritz, H., Sommerhoff, C.P., & Bode, W.
63. Campobasso, N., Mathews, I.I., Begley, T.P., & Ealick, (1998) Nature 392, 306–311.
S.E. (2000) Biochemistry 39, 7868–7877. 89. Kai, Y., Matsumura, H., Inoue, T., Terada, K., Nagara,
64. Tebbe, J., Bzowska, A., Wielgus-Kutrowska, B., Y., Yoshinaga, T., Kihara, A., Tsumura, K., & Izui, K.
Schroder, W., Kazimierczuk, Z., Shugar, D., Saenger, (1999) Proc. Natl. Acad. Sci. U.S.A. 96, 823–828.
W., & Koellner, G. (1999) J. Mol. Biol. 294, 1239–1255. 90. Friedman, A.M., Fischmann, T.O., & Steitz, T.A. (1995)
65. Jiang, J., Zhang, Y., Krainer, A.R., & Xu, R.M. (1999) Proc. Science 268, 1721–1727.
Natl. Acad. Sci. U.S.A. 96, 3572–3577. 91. Banerjee, R., Mande, S.C., Ganesh, V., Das, K.,
66. Matthews, B.W., Fenna, R.E., Bolognesi, M.C., Schmid, Dhanaraj, V., Mahanta, S.K., Suguna, K., Surolia, A., &
M.F., & Olson, J.M. (1979) J. Mol. Biol. 131, 259– Vijayan, M. (1994) Proc. Natl. Acad. Sci. U.S.A. 91,
285. 227–231.
67. Xia, Z.X., & Mathews, F.S. (1990) J. Mol. Biol. 212, 92. Kopp, J., Kopriva, S., Suss, K.H., & Schulz, G.E. (1999) J.
837–863. Mol. Biol. 287, 761–771.
68. Dreyer, M.K., & Schulz, G.E. (1993) J. Mol. Biol. 231, 93. Harrison, D.H., Runquist, J.A., Holub, A., & Miziorko,
549–553. H.M. (1998) Biochemistry 37, 5074–5085.
69. Luo, Y., Samuel, J., Mosimann, S.C., Lee, J.E., Tanner, 94. Alphey, M.S., Bond, C.S., Tetaud, E., Fairlamb, A.H., &
M.E., & Strynadka, N.C. (2001) Biochemistry 40, Hunter, W.N. (2000) J. Mol. Biol. 300, 903–916.
14763–14771. 95. MacRae, I.J., Segel, I.H., & Fisher, A.J. (2001)
70. Sintchak, M.D., Fleming, M.A., Futer, O., Raybuck, S.A., Biochemistry 40, 6795–6804.
Chambers, S.P., Caron, P.R., Murcko, M.A., & Wilson, 96. Frankenberg, N., Erskine, P.T., Cooper, J.B.,
K.P. (1996) Cell 85, 921–930. Shoolingin-Jordan, P.M., Jahn, D., & Heinz, D.W.
71. Zhang, R., Evans, G., Rotella, F.J., Westbrook, E.M., (1999) J. Mol. Biol. 289, 591–602.
Beno, D., Huberman, E., Joachimiak, A., & Collart, F.R. 97. Katti, S.K., Katz, B.A., & Wyckoff, H.W. (1989) J. Mol.
(1999) Biochemistry 38, 4691–4700. Biol. 205, 557–571.
72. Brejc, K., van Dijk, W.J., Klaassen, R.V., Schuurmans, 98. Cherfils, J., Morera, S., Lascu, I., Veron, M., & Janin, J.
M., van Der Oost, J., Smit, A.B., & Sixma, T.K. (2001) (1994) Biochemistry 33, 9062–9069.
Nature 411, 269–276. 99. Brandstetter, H., Kim, J.S., Groll, M., & Huber, R. (2001)
73. Emsley, J., White, H.E., O’Hara, B.P., Oliva, G., Nature 414, 466–470.
Srinivasan, N., Tickle, I.J., Blundell, T.L., Pepys, M.B., & 100. Fritz-Wolf, K., Schnyder, T., Wallimann, T., & Kabsch,
Wood, S.P. (1994) Nature 367, 338–345. W. (1996) Nature 381, 341–345.
74. Dreveny, I., Kondo, H., Uchiyama, K., Shaw, A., Zhang, 101. Helin, S., Kahn, P.C., Guha, B.L., Mallows, D.G., &
X., & Freemont, P.S. (2004) EMBO J. 23, 1030–1039. Goldman, A. (1995) J. Mol. Biol. 254, 918–941.
75. Niedenzu, T., Roleke, D., Bains, G., Scherzinger, E., & 102. Remaut, H., Bompard-Gilles, C., Goffin, C., Frere, J.M.,
Saenger, W. (2001) J. Mol. Biol. 306, 479–487. & Van Beeumen, J. (2001) Nat. Struct. Biol. 8, 674–678.
76. Lee, S.Y., De La Torre, A., Yan, D., Kustu, S., Nixon, B.T., 103. Du, X., Choi, I.G., Kim, R., Wang, W., Jancarik, J.,
& Wemmer, D.E. (2003) Genes Dev. 17, 2552–2563. Yokota, H., & Kim, S.H. (2000) Proc. Natl. Acad. Sci.
77. Mura, C., Cascio, D., Sawaya, M.R., & Eisenberg, D.S. U.S.A. 97, 14079–14084.
(2001) Proc. Natl. Acad. Sci. U.S.A. 98, 5532–5537. 104. Koellner, G., Luic, M., Shugar, D., Saenger, W., &
78. Chen, X., Antson, A.A., Yang, M., Li, P., Baumann, C., Bzowska, A. (1998) J. Mol. Biol. 280, 153–166.
Dodson, E.J., Dodson, G.G., & Gollnick, P. (1999) J. Mol. 105. Murley, L.L., & MacKenzie, R.E. (1995) Biochemistry 34,
Biol. 289, 1003–1016. 10358–10364.
79. Toney, M.D., Hohenester, E., Keller, J.W., & Jansonius, 106. Gulick, A.M., Schmidt, D.M., Gerlt, J.A., & Rayment, I.
J.N. (1995) J. Mol. Biol. 245, 151–179. (2001) Biochemistry 40, 15716–15724.
80. Shirakihara, Y., & Evans, P.R. (1988) J. Mol. Biol. 204, 107. Momany, C., Ernst, S., Ghosh, R., Chang, N.L., &
973–994. Hackert, M.L. (1995) J. Mol. Biol. 252, 643–655.
108. Webb, P.A., Perisic, O., Mendola, C.E., Backer, J.M., & 135. Groft, C.M., Beckmann, R., Sali, A., & Burley, S.K. (2000)
Williams, R.L. (1995) J. Mol. Biol. 251, 574–587. Nat. Struct. Biol. 7, 1156–1164.
109. Cooper, J.B., McIntyre, K., Badasso, M.O., Wood, S.P., 136. Muller, Y.A., Schumacher, G., Rudolph, R., & Schulz,
Zhang, Y., Garbe, T.R., & Young, D. (1995) J. Mol. Biol. G.E. (1994) J. Mol. Biol. 237, 315–335.
246, 531–544. 137. Wright, C.S. (1992) J. Biol. Chem. 267, 14345–14352.
110. Jelakovic, S., Kopriva, S., Suss, K.H., & Schulz, G.E. 138. Huber, R., Romisch, J., & Paques, E.P. (1990) EMBO J.
(2003) J. Mol. Biol. 326, 127–135. 9, 3867–3874.
111. Blickling, S., Beisel, H.G., Bozic, D., Knablein, J., Laber, 139. Lawrence, M.C., Suzuki, E., Varghese, J.N., Davis, P.C.,
B., & Huber, R. (1997) J. Mol. Biol. 274, 608–621. Van Donkelaar, A., Tulloch, P.A., & Colman, P.M. (1990)
112. Gallagher, T., Snell, E.E., & Hackert, M.L. (1989) J. Biol. EMBO J. 9, 9–15.
Chem. 264, 12737–12743. 140. Subramanya, H.S., Roper, D.I., Dauter, Z., Dodson, E.J.,
113. Yamashita, M.M., Almassy, R.J., Janson, C.A., Cascio, Davies, G.J., Wilson, K.S., & Wigley, D.B. (1996)
D., & Eisenberg, D. (1989) J. Biol. Chem. 264, Biochemistry 35, 792–802.
17681–17690. 141. Janin, J., Miller, S., & Chothia, C. (1988) J. Mol. Biol. 204,
114. Hohenester, E., Hutchinson, W.L., Pepys, M.B., & 155–164.
Wood, S.P. (1997) J. Mol. Biol. 269, 570–578. 142. De Simone, G., Menchise, V., Manco, G., Mandrich, L.,
115. Hutchinson, E.G., Tichelaar, W., Hofhaus, G., Weiss, Sorrentino, N., Lang, D., Rossi, M., & Pedone, C. (2001)
H., & Leonard, K.R. (1989) EMBO J. 8, 1485–1490. J. Mol. Biol. 314, 507–518.
116. Lowe, J., Stock, D., Jap, B., Zwickl, P., Baumeister, W., 143. Roujeinikova, A., Raasch, C., Burke, J., Baker, P.J., Liebl,
& Huber, R. (1995) Science 268, 533–539. W., & Rice, D.W. (2001) J. Mol. Biol. 312, 119–131.
117. Whitby, F.G., Luecke, H., Kuhn, P., Somoza, J.R., Huete- 144. Kuhnel, K., & Luisi, B.F. (2001) J. Mol. Biol. 313,
Perez, J.A., Phillips, J.D., Hill, C.P., Fletterick, R.J., & 583–592.
Wang, C.C. (1997) Biochemistry 36, 10666–10674. 145. Mondragon, A., & Harrison, S.C. (1991) J. Mol. Biol. 219,
118. Delbaere, L.T., Vandonselaar, M., Prasad, L., Quail, 321–334.
J.W., Wilson, K.S., & Dauter, Z. (1993) J. Mol. Biol. 230, 146. Kreusch, A., & Schulz, G.E. (1994) J. Mol. Biol. 243,
950–965. 891–905.
119. Mondragon, A., Wolberger, C., & Harrison, S.C. (1989) 147. Romao, M.J., Turk, D., Gomis-Ruth, F.X., Huber, R.,
J. Mol. Biol. 205, 179–188. Schumacher, G., Mollering, H., & Russmann, L. (1992)
120. Spraggon, G., Kim, C., Nguyen-Huu, X., Yee, M.C., J. Mol. Biol. 226, 1111–1130.
Yanofsky, C., & Mills, S.E. (2001) Proc. Natl. Acad. Sci. 148. Clausen, T., Huber, R., Laber, B., Pohlenz, H.D., &
U.S.A. 98, 6021–6026. Messerschmidt, A. (1996) J. Mol. Biol. 262, 202–224.
121. Colloc’h, N., el Hajji, M., Bachet, B., L’Hermite, G., 149. Borgstahl, G.E., Rogers, P.H., & Arnone, A. (1994) J. Mol.
Schiltz, M., Prange, T., Castro, B., & Mornon, J.P. (1997) Biol. 236, 817–830.
Nat. Struct. Biol. 4, 947–952. 150. Ji, X., Zhang, P., Armstrong, R.N., & Gilliland, G.L.
122. Voegtli, W.C., Ge, J., Perlstein, D.L., Stubbe, J., & (1992) Biochemistry 31, 10169–10184.
Rosenzweig, A.C. (2001) Proc. Natl. Acad. Sci. U.S.A. 98, 151. Feese, M.D., Kato, Y., Tamada, T., Kato, M., Komeda,
10073–10078. T., Miura, Y., Hirose, M., Hondo, K., Kobayashi, K., &
123. Schulz, G.E., & Schirmer, R.H. (1979) Principles of Kuroki, R. (2000) J. Mol. Biol. 301, 451–464.
Protein Structure, p 94, Springer-Verlag, New York. 152. Schlunegger, M.P., & Grutter, M.G. (1992) Nature 358,
124. Weiss, V.H., McBride, A.E., Soriano, M.A., Filman, D.J., 430–434.
Silver, P.A., & Hogle, J.M. (2000) Nat. Struct. Biol. 7, 153. Strater, N., Klabunde, T., Tucker, P., Witzel, H., & Krebs,
1165–1171. B. (1995) Science 268, 1489–1492.
125. Bell, C.E., & Lewis, M. (2001) J. Mol. Biol. 314, 154. Fischmann, T.O., Hruza, A., Niu, X.D., Fossetta, J.D.,
1127–1136. Lunn, C.A., Dolphin, E., Prongay, A.J., Reichert, P.,
126. Bell, C.E., Frescura, P., Hochschild, A., & Lewis, M. Lundell, D.J., Narula, S.K., & Weber, P.C. (1999) Nat.
(2000) Cell 101, 801–811. Struct. Biol. 6, 233–242.
127. Ploegman, J.H., Drent, G., Kalk, K.H., & Hol, W.G. 155. Gonzalez, B., Pajares, M.A., Hermoso, J.A., Alvarez, L.,
(1978) J. Mol. Biol. 123, 557–594. Garrido, F., Sufrin, J.R., & Sanz-Aparicio, J. (2000) J.
128. Roderick, S.L., & Matthews, B.W. (1993) Biochemistry Mol. Biol. 300, 363–375.
32, 3907–3912. 156. Hecht, H.J., Kalisz, H.M., Hendle, J., Schmid, R.D., &
129. Sielecki, A.R., Fedorov, A.A., Boodhoo, A., Andreeva, Schomburg, D. (1993) J. Mol. Biol. 229, 153–172.
N.S., & James, M.N. (1990) J. Mol. Biol. 214, 143–170. 157. Stoddard, B.L., Howell, P.L., Ringe, D., & Petsko, G.A.
130. Gilliland, G.L., & Quiocho, F.A. (1981) J. Mol. Biol. 146, (1990) Biochemistry 29, 8885–8893.
341–362. 158. Freymann, D., Down, J., Carrington, M., Roditi, I.,
131. McLachlan, A.D. (1979) J. Mol. Biol. 128, 49–79. Turner, M., & Wiley, D. (1990) J. Mol. Biol. 216, 141–
132. Crane, B.R., Siegel, L.M., & Getzoff, E.D. (1995) Science 160.
270, 59–67. 159. Luo, Y., Frey, E.A., Pfuetzner, R.A., Creagh, A.L.,
133. Priestle, J.P., Schar, H.P., & Grutter, M.G. (1988) EMBO Knoechel, D.G., Haynes, C.A., Finlay, B.B., & Strynadka,
J. 7, 339–343. N.C. (2000) Nature 405, 1073–1077.
134. Eriksson, A.E., Cousens, L.S., Weaver, L.H., & 160. Sheriff, S., Chang, C.Y., & Ezekowitz, R.A. (1994) Nat.
Matthews, B.W. (1991) Proc. Natl. Acad. Sci. U.S.A. 88, Struct. Biol. 1, 789–794.
3441–3445. 161. Weaver, T.M., Levitt, D.G., Donnelly, M.I., Stevens,
P.P., & Banaszak, L.J. (1995) Nat. Struct. Biol. 2, 187. Goldberg, J.D., Yoshida, T., & Brick, P. (1994) J. Mol.
654–662. Biol. 236, 1123–1140.
162. Schwede, T.F., Retey, J., & Schulz, G.E. (1999) 188. Matsuda, K., Mizuguchi, K., Nishioka, T., Kato, H., Go,
Biochemistry 38, 5355–5361. N., & Oda, J. (1996) Protein Eng. 9, 1083–1092.
163. Toth, E.A., Worby, C., Dixon, J.E., Goedken, E.R., 189. Xiang, S., Short, S.A., Wolfenden, R., & Carter, C.W., Jr.
Marqusee, S., & Yeates, T.O. (2000) J. Mol. Biol. 301, (1996) Biochemistry 35, 1335–1341.
433–450. 190. Boisset, N., & Mouche, F. (2000) J. Mol. Biol. 296,
164. Kostrewa, D., D’Arcy, A., Takacs, B., & Kamber, M. 459–472.
(2001) J. Mol. Biol. 305, 279–289. 191. Crick, F.H.C., & Watson, J.D. (1956) Nature 177, 473–
165. Hardman, K.D., & Ainsworth, C.F. (1972) Biochemistry 475.
11, 4910–4919. 192. Cromwell, P.R. (1997) Polyhedra, Cambridge
166. Hirotsu, S., Abe, Y., Okada, K., Nagahara, N., Hori, H., University Press, Cambridge,
Nishino, T., & Hakoshima, T. (1999) Proc. Natl. Acad. 193. Gourley, D.G., Shrive, A.K., Polikarpov, I., Krell, T.,
Sci. U.S.A. 96, 12333–12338. Coggins, J.R., Hawkins, A.R., Isaacs, N.W., & Sawyer, L.
167. Banerjee, R., Das, K., Ravishankar, R., Suguna, K., Surolia, (1999) Nat. Struct. Biol. 6, 521–525.
A., & Vijayan, M. (1996) J. Mol. Biol. 259, 281–296. 194. Hawkins, A.R., Lamb, H.K., Moore, J.D., Charles, I.G., &
168. Lott, J.S., Halbig, D., Baker, H.M., Hardman, M.J., Roberts, C.F. (1993) J. Gen. Microbiol. 139, 2891–2899.
Sprenger, G.A., & Baker, E.N. (2000) J. Mol. Biol. 304, 195. Sun, S.M., McLeester, R.C., Bliss, F.A., & Hall, T.C.
575–584. (1974) J. Biol. Chem. 249, 2118–2121.
169. Hennig, M., D’Arcy, A., Hampele, I.C., Page, M.G., 196. Marcq, S., Diaz-Ruano, A., Charlier, P., Dideberg, O.,
Oefner, C., & Dale, G.E. (1998) Nat. Struct. Biol. 5, Tricot, C., Pierard, A., & Stalon, V. (1991) J. Mol. Biol.
357–362. 220, 9–12.
170. Arnold, E., & Rossmann, M.G. (1990) J. Mol. Biol. 211, 197. Baur, H., Stalon, V., Falmagne, P., Luethi, E., & Haas, D.
763–801. (1987) Eur. J. Biochem. 166, 111–117.
171. Bennett, M.J., Choe, S., & Eisenberg, D. (1994) Proc. 198. Isupov, M.N., Dalby, A.R., Brindley, A.A., Izumi, Y.,
Natl. Acad. Sci. U.S.A. 91, 3127–3131. Tanabe, T., Murshudov, G.N., & Littlechild, J.A. (2000)
172. Liu, Y., Hart, P.J., Schlunegger, M.P., & Eisenberg, D. J. Mol. Biol. 299, 1035–1049.
(1998) Proc. Natl. Acad. Sci. U.S.A. 95, 3437–3442. 199. Kim, K.K., Kim, R., & Kim, S.H. (1998) Nature 394,
173. Valegard, K., Liljas, L., Fridborg, K., & Unge, T. (1990) 595–599.
Nature 345, 36–41. 200. Ritsert, K., Huber, R., Turk, D., Ladenstein, R., Schmidt-
174. Kim, S.J., Jeong, D.G., Chi, S.W., Lee, J.S., & Ryu, S.E. Base, K., & Bacher, A. (1995) J. Mol. Biol. 253, 151–167.
(2001) Nat. Struct. Biol. 8, 459–466. 201. Ban, N., & McPherson, A. (1995) Nat. Struct. Biol. 2,
175. Hadden, J.M., Convery, M.A., Declais, A.C., Lilley, D.M., 882–890.
& Phillips, S.E. (2001) Nat. Struct. Biol. 8, 62–67. 202. Tsao, J., Chapman, M.S., Agbandje, M., Keller, W.,
176. Saint-Jean, A.P., Phillips, K.R., Creighton, D.J., & Stone, Smith, K., Wu, H., Luo, M., Smith, T.J., Rossmann, M.G.,
M.J. (1998) Biochemistry 37, 10345–10353. Compans, R.W., et al. (1991) Science 251, 1456–1464.
177. Schymkowitz, J.W., Rousseau, F., Wilkinson, H.R., 203. Xie, Q., & Chapman, M.S. (1996) J. Mol. Biol. 264,
Friedler, A., & Itzhaki, L.S. (2001) Nat. Struct. Biol. 8, 497–520.
888–892. 204. Braden, B.C., Velikovsky, C.A., Cauerhff, A.A.,
178. Janowski, R., Kozak, M., Jankowska, E., Grzonka, Z., Polikarpov, I., & Goldbaum, F.A. (2000) J. Mol. Biol. 297,
Grubb, A., Abrahamson, M., & Jaskolski, M. (2001) Nat. 1031–1036.
Struct. Biol. 8, 316–320. 205. Trikha, J., Theil, E.C., & Allewell, N.M. (1995) J. Mol.
179. Ciglic, M.I., Jackson, P.J., Raillard, S.A., Haugg, M., Biol. 248, 949–967.
Jermann, T.M., Opitz, J.G., Trabesinger-Ruf, N., & 206. Hempstead, P.D., Yewdall, S.J., Fernie, A.R., Lawson,
Benner, S.A. (1998) Biochemistry 37, 4008–4022. D.M., Artymiuk, P.J., Rice, D.W., Ford, G.C., & Harrison,
180. Green, S.M., Gittis, A.G., Meeker, A.K., & Lattman, E.E. P.M. (1997) J. Mol. Biol. 268, 424–448.
(1995) Nat. Struct. Biol. 2, 746–751. 207. Ilari, A., Stefanini, S., Chiancone, E., & Tsernoglou, D.
181. Jeffery, C.J., Bahnson, B.J., Chien, W., Ringe, D., & (2000) Nat. Struct. Biol. 7, 38–43.
Petsko, G.A. (2000) Biochemistry 39, 955–964. 208. van Montfort, R.L., Basha, E., Friedrich, K.L., Slingsby,
182. Milburn, M.V., Hassell, A.M., Lambert, M.H., Jordan, C., & Vierling, E. (2001) Nat. Struct. Biol. 8, 1025–
S.R., Proudfoot, A.E., Graber, P., & Wells, T.N. (1993) 1030.
Nature 363, 172–176. 209. Wang, G.F., Kuriki, T., Roy, K.L., & Kaneda, T. (1993)
183. Gabelli, S.B., Bianchet, M.A., Bessman, M.J., & Amzel, Eur. J. Biochem. 213, 1091–1099.
L.M. (2001) Nat. Struct. Biol. 8, 467–472. 210. Izard, T., Aevarsson, A., Allen, M.D., Westphal, A.H.,
184. Melik-Adamyan, W.R., Barynin, V.V., Vagin, A.A., Perham, R.N., de Kok, A., & Hol, W.G. (1999) Proc. Natl.
Borisov, V.V., Vainshtein, B.K., Fita, I., Murthy, M.R., & Acad. Sci. U.S.A. 96, 1240–1245.
Rossmann, M.G. (1986) J. Mol. Biol. 188, 63–72. 211. Mattevi, A., Obmolova, G., Schulze, E., Kalk, K.H.,
185. Royer, W.E., Jr., Strand, K., van Heel, M., & Westphal, A.H., de Kok, A., & Hol, W.G. (1992) Science
Hendrickson, W.A. (2000) Proc. Natl. Acad. Sci. U.S.A. 255, 1544–1550.
97, 7107–7111. 212. Knapp, J.E., Mitchell, D.T., Yazdi, M.A., Ernst, S.R.,
186. Royer, W.E., Jr., Heard, K.S., Harrington, D.J., & Reed, L.J., & Hackert, M.L. (1998) J. Mol. Biol. 280,
Chiancone, E. (1995) J. Mol. Biol. 253, 168–186. 655–668.
213. Stoops, J.K., Baker, T.S., Schroeter, J.P., Kolodziej, S.J., B.V., King, J., Prevelige, P.E., Jr., & Chiu, W. (1996) J.
Niu, X.D., & Reed, L.J. (1992) J. Biol. Chem. 267, Mol. Biol. 260, 85–98.
24769–24775. 238. Liddington, R.C., Yan, Y., Moulai, J., Sahli, R.,
214. McKenna, R., Xia, D., Willingmann, P., Ilag, L.L., Benjamin, T.L., & Harrison, S.C. (1991) Nature 354,
Krishnaswamy, S., Rossmann, M.G., Olson, N.H., 278–284.
Baker, T.S., & Incardona, N.L. (1992) Nature 355, 239. Baker, T.S., Newcomb, W.W., Olson, N.H., Cowsert,
137–143. L.M., Olson, C., & Brown, J.C. (1991) Biophys. J. 60,
215. Montelius, I., Liljas, L., & Unge, T. (1988) J. Mol. Biol. 1445–1456.
201, 353–363. 240. Belnap, D.M., Olson, N.H., Cladel, N.M., Newcomb,
216. Grimes, J.M., Burroughs, J.N., Gouet, P., Diprose, J.M., W.W., Brown, J.C., Kreider, J.W., Christensen, N.D., &
Malby, R., Zientara, S., Mertens, P.P., & Stuart, D.I. Baker, T.S. (1996) J. Mol. Biol. 259, 249–263.
(1998) Nature 395, 470–478. 241. Metcalf, P., Cyrklaff, M., & Adrian, M. (1991) EMBO J.
217. Reinisch, K.M., Nibert, M.L., & Harrison, S.C. (2000) 10, 3129–3136.
Nature 404, 960–967. 242. Zhou, Z.H., Prasad, B.V., Jakana, J., Rixon, F.J., & Chiu,
218. Caspar, D.L.D., & Klug, A. (1962) Cold Spring Harbor W. (1994) J. Mol. Biol. 242, 456–469.
Symp. Quant. Biol. 27, 1–24. 243. Zhou, Z.H., Dougherty, M., Jakana, J., He, J., Rixon, F.J.,
219. Fuller, R.B. (1954) U.S. Patent 2682235. & Chiu, W. (2000) Science 288, 877–880.
220. Olson, A.J., Bricogne, G., & Harrison, S.C. (1983) J. Mol. 244. Trus, B.L., Booy, F.P., Newcomb, W.W., Brown, J.C.,
Biol. 171, 61–93. Homa, F.L., Thomsen, D.R., & Steven, A.C. (1996) J.
Chapter 10
Chemical Probes of Structure
mates and lysines. The most common use of covalent modification to study the function of a protein is the observation of the inactivation of an enzyme by covalent modification of amino acids in its active site. For example, the chemical modification of Lysine 116 in spinach ferredoxin–NADP⁺ reductase inactivates the enzyme.⁴

There are, however, many other purposes for covalently modifying proteins. Covalent modification can be used to dissociate the subunits of a protein. For example, succinylation of its lysines caused the hemerythrin of Goldfingia gouldi to dissociate into its eight identical subunits.⁵ Covalent modification can also be used to change the electrophoretic mobility of a protein by converting, for example, positively charged lysines into negatively charged carboxylates.⁶ When such a modification is performed reversibly, the protein will travel with a different electrophoretic mobility before the modification has been reversed than after it has been reversed. Covalent modification of a protein can be used to prevent endopeptidolytic enzymes, for example, trypsin, from digesting that protein at particular amino acids, for example, arginine.⁷ Covalent modifications are also used to introduce foreign functional groups into proteins. For example, functional groups that absorb visible light⁸ or have strong fluorescence⁹ may be introduced so that their spectral properties can be used in physical studies.

When a protein is modified covalently, either the modification is performed under conditions that produce a high yield of the desired product, and this is used in further experiments, or the chemical reaction itself between the protein and the reagent is monitored, and its kinetics are used to make arguments about the properties of a particular amino acid in the native protein. For example, the dependence of the rate of the modification of a particular amino acid on the pH of the solution can be used to estimate the p$K_{\mathrm{a}}$ of that amino acid in the native protein.¹⁰,¹¹ Differences in rate constants for the reaction of amino acids of a particular type with the same electrophile can be used to assess differences in their accessibility to the solution in the native structure of the protein.¹²

In all of these experiments, the possibility that the covalent modification itself disrupts the global conformation of the protein must always be kept in mind. If this happens, effects of the modification on the function of the protein might be attributed to local changes around

In designing an experiment that involves the covalent modification of a protein, the usual desire is that the reagent chosen react with only one type of amino acid. Because cysteines, methionines, lysines, histidines, and tyrosines are similarly reactive nucleophiles, this is not a simple task. The issue of specificity is best addressed by examining the reaction of a simple alkylating agent, such as iodoacetamide, with the nucleophiles present in a protein.

Iodoacetamide has been shown to alkylate cysteines, lysines, histidines, and methionines.¹⁴ The four reactions are

$$
\begin{aligned}
\text{Cys–SH} \;&\overset{K_{\mathrm{a}}^{\mathrm{Cys}}}{\rightleftharpoons}\; \text{Cys–S}^{-} + \mathrm{H}^{+} \\
\text{Cys–S}^{-} + \mathrm{ICH_2CONH_2} \;&\xrightarrow{\;k_{\mathrm{Cys}}\;}\; \text{Cys–S–CH}_2\mathrm{CONH_2} + \mathrm{I}^{-}
\end{aligned}
\tag{10–1}
$$

$$
\begin{aligned}
\text{Lys–NH}_3^{+} \;&\overset{K_{\mathrm{a}}^{\mathrm{Lys}}}{\rightleftharpoons}\; \text{Lys–NH}_2 + \mathrm{H}^{+} \\
\text{Lys–NH}_2 + \mathrm{ICH_2CONH_2} \;&\xrightarrow{\;k_{\mathrm{Lys}}\;}\; \text{Lys–NH}_2^{+}\text{–CH}_2\mathrm{CONH_2} + \mathrm{I}^{-}
\end{aligned}
\tag{10–2}
$$

$$
\begin{aligned}
\text{His–ImH}^{+} \;&\overset{K_{\mathrm{a}}^{\mathrm{His}}}{\rightleftharpoons}\; \text{His–Im} + \mathrm{H}^{+} \\
\text{His–Im} + \mathrm{ICH_2CONH_2} \;&\longrightarrow\; \cdots
\end{aligned}
$$
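$$
\text{Met–S–CH}_3 + \mathrm{ICH_2CONH_2} \;\xrightarrow{\;k_{\mathrm{Met}}\;}\; \text{Met–S}^{+}(\mathrm{CH_3})\text{–CH}_2\mathrm{CONH_2} + \mathrm{I}^{-}
\tag{10–4}
$$

When a protein is exposed to iodoacetamide, all four of these amino acids disappear from amino acid analyses of the reaction mixtures, at rates that depend on the pH,¹⁵ and the carboxymethyl products¹⁶,¹⁷ appear in concert. The first three reactions require that the amino acid be in the form of its conjugate base. For lysine,

$$
\frac{d[\mathrm{Lys}]_{\mathrm{TOT}}}{dt} = -k_{\mathrm{Lys}}\,[\text{iodoacetamide}]\,[\mathrm{RH_2N{:}}]
\tag{10–5}
$$

Because the fraction of the lysine present as its conjugate base is $K_{\mathrm{a}}^{\mathrm{Lys}}/(K_{\mathrm{a}}^{\mathrm{Lys}} + [\mathrm{H}^{+}])$,

$$
\frac{d[\mathrm{Lys}]_{\mathrm{TOT}}}{dt} = -\left(\frac{k_{\mathrm{Lys}}K_{\mathrm{a}}^{\mathrm{Lys}}}{K_{\mathrm{a}}^{\mathrm{Lys}} + [\mathrm{H}^{+}]}\right)[\text{iodoacetamide}]\,[\mathrm{Lys}]_{\mathrm{TOT}}
\tag{10–6}
$$

where $K_{\mathrm{a}}^{\mathrm{Lys}}$ is the acid dissociation constant for lysine. If the concentration of iodoacetamide is so large that it remains constant throughout the reaction and the pH does not change, Equation 10–6 describes a pseudo-first-order reaction. The pseudo-first-order rate constant, $k'_{\mathrm{Lys}}$, governing the disappearance of lysine with time is

$$
k'_{\mathrm{Lys}} = \left(\frac{k_{\mathrm{Lys}}K_{\mathrm{a}}^{\mathrm{Lys}}}{K_{\mathrm{a}}^{\mathrm{Lys}} + [\mathrm{H}^{+}]}\right)[\text{iodoacetamide}]
\tag{10–7}
$$

and

$$
\frac{d[\mathrm{Lys}]_{\mathrm{TOT}}}{dt} = -k'_{\mathrm{Lys}}\,[\mathrm{Lys}]_{\mathrm{TOT}}
\tag{10–8}
$$

When this is rearranged

$$
\frac{d[\mathrm{Lys}]_{\mathrm{TOT}}}{[\mathrm{Lys}]_{\mathrm{TOT}}} = -k'_{\mathrm{Lys}}\,dt
\tag{10–9}
$$

and integrated from t = 0 to t = t

$$
\int_{[\mathrm{Lys}]_{0,\mathrm{TOT}}}^{[\mathrm{Lys}]_{\mathrm{TOT}}} \frac{d[\mathrm{Lys}]_{\mathrm{TOT}}}{[\mathrm{Lys}]_{\mathrm{TOT}}} = -\int_{0}^{t} k'_{\mathrm{Lys}}\,dt
\tag{10–10}
$$

where $[\mathrm{Lys}]_{0,\mathrm{TOT}}$ is the initial concentration of unmodified lysine

$$
\ln[\mathrm{Lys}]_{\mathrm{TOT}}\,\Big|_{[\mathrm{Lys}]_{0,\mathrm{TOT}}}^{[\mathrm{Lys}]_{\mathrm{TOT}}} = -k'_{\mathrm{Lys}}\,t\,\Big|_{0}^{t}
\tag{10–11}
$$

$$
\ln[\mathrm{Lys}]_{\mathrm{TOT}} = \ln[\mathrm{Lys}]_{0,\mathrm{TOT}} - k'_{\mathrm{Lys}}\,t
\tag{10–12}
$$

and

analysis or from the slope of the line obtained when $\ln[\mathrm{Lys}]_{\mathrm{TOT}}$ is plotted against time.

The other two of the first three reactions (Reactions 10–1 through 10–3) have a formally equivalent mechanism, and the pseudo-first-order rate constants of each of them, $k'_{\mathrm{Cys}}$ and $k'_{\mathrm{His}}$, are of the same form as $k'_{\mathrm{Lys}}$ (Equation 10–7) with the appropriate rate constants, $k_{\mathrm{Cys}}$ or $k_{\mathrm{His}}$, and acid dissociation constants, $K_{\mathrm{a}}^{\mathrm{Cys}}$ or $K_{\mathrm{a}}^{\mathrm{His}}$, substituted for $k_{\mathrm{Lys}}$ and $K_{\mathrm{a}}^{\mathrm{Lys}}$, respectively. The variation in each of these rate constants can be presented graphically (Figure 10–1).¹⁵

At values of pH greater than the p$K_{\mathrm{a}}$ of lysine ($[\mathrm{H}^{+}] < K_{\mathrm{a}}^{\mathrm{Lys}}$), almost all of the primary amine is the conjugate base, the rate constant $k'_{\mathrm{Lys}}$ approaches $k_{\mathrm{Lys}}$[iodoacetamide] (Equation 10–7), and the rate of the reaction of lysine with iodoacetamide is independent of pH. At values of pH below p$K_{\mathrm{a}}^{\mathrm{Lys}}$ ($[\mathrm{H}^{+}] > K_{\mathrm{a}}^{\mathrm{Lys}}$), the concentration of the unprotonated conjugate base of lysine, and hence the rate of its reaction with iodoacetamide (Equation 10–7), decreases by a factor of 10 (1 logarithmic unit) for each decrease of 1 unit in pH (Figure 2–6).

The rate of the reaction of the unprotonated conjugate base of histidine with iodoacetamide, which is governed solely by $k_{\mathrm{His}}$ (Reaction 10–3), is correlated to the rate of the reaction of the unprotonated conjugate base of lysine with iodoacetamide, $k_{\mathrm{Lys}}$ (Reaction 10–2), through the Brønsted relationship:

$$
\log\left(\frac{k_{\mathrm{His}}}{k_{\mathrm{Lys}}}\right) = -\beta\,\log\left(\frac{K_{\mathrm{a}}^{\mathrm{His}}}{K_{\mathrm{a}}^{\mathrm{Lys}}}\right)
\tag{10–14}
$$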
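The pseudo-first-order treatment of Equations 10–7 through 10–12 can be illustrated numerically. The following Python sketch computes $k'_{\mathrm{Lys}}$ at a chosen pH, generates the expected decay of unmodified lysine, and recovers the rate constant from the slope of ln [Lys]TOT against time. The second-order rate constant, the concentration of iodoacetamide, the pH, and the time points are hypothetical values chosen only for illustration; the pKa of 10.5 is the value quoted for lysine in this chapter.

```python
import math


def pseudo_first_order_k(k_second_order, pKa, pH, reagent_conc):
    """Equation 10-7: k' = k * Ka / (Ka + [H+]) * [reagent]."""
    Ka = 10.0 ** (-pKa)
    H = 10.0 ** (-pH)
    return k_second_order * Ka / (Ka + H) * reagent_conc


def unmodified_lysine(c0, k_prime, t):
    """Equation 10-12 written in exponential form."""
    return c0 * math.exp(-k_prime * t)


def least_squares_slope(xs, ys):
    """Slope of the best straight line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical values, chosen only for illustration.
K_LYS = 1.0            # M^-1 min^-1, assumed second-order rate constant
IODOACETAMIDE = 0.05   # M, assumed constant (large excess over lysine)
PKA_LYS = 10.5         # pKa quoted for lysine in this chapter
PH = 9.5               # assumed pH of the reaction mixture

k_prime = pseudo_first_order_k(K_LYS, PKA_LYS, PH, IODOACETAMIDE)

times = [0.0, 10.0, 20.0, 40.0, 80.0, 160.0]            # min
lys = [unmodified_lysine(1.0, k_prime, t) for t in times]

# The slope of ln[Lys]TOT against time is -k' (Equation 10-12).
k_fit = -least_squares_slope(times, [math.log(c) for c in lys])

print(f"k' from Equation 10-7: {k_prime:.5f} min^-1")
print(f"k' from the slope:     {k_fit:.5f} min^-1")
```

With real data the points scatter about the line, and the same slope calculation then gives an estimate of the rate constant rather than an exact value.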
lysine, even though the p$K_{\mathrm{a}}$ (8.7) associated with its lone pair of electrons is less than the p$K_{\mathrm{a}}$ (10.5) associated with the lone pair of electrons on lysine.* With the appropriate substitutions, Equation 10–7 governs the behavior of the rate of the reaction of cysteine with iodoacetamide

[Figure 10–1: plot of $\log k'_{i}$, with curves labeled Cys and Lys.]
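The dependence on pH that Equation 10–7 predicts for each amino acid can be tabulated with a short Python sketch. The pKa values of 8.7 and 10.5 are those quoted above for cysteine and lysine. The intrinsic rate constant and the concentration of iodoacetamide are hypothetical placeholders, deliberately set equal for both amino acids so that only the effect of the pKa is visible; they are not measured values.

```python
import math


def log_k_prime(pH, pKa, k_intrinsic, reagent_conc):
    """log10 of the pseudo-first-order rate constant of Equation 10-7."""
    Ka = 10.0 ** (-pKa)
    H = 10.0 ** (-pH)
    return math.log10(k_intrinsic * Ka / (Ka + H) * reagent_conc)


PKA = {"Cys": 8.7, "Lys": 10.5}   # values quoted in the text
K_INTRINSIC = 1.0                 # M^-1 min^-1, hypothetical, same for both
IODOACETAMIDE = 1.0               # M, hypothetical

# For each amino acid, a list of (pH, log k') pairs from pH 5 to pH 13.
profile = {
    name: [(pH, log_k_prime(pH, pKa, K_INTRINSIC, IODOACETAMIDE))
           for pH in range(5, 14)]
    for name, pKa in PKA.items()
}

for name, rows in profile.items():
    print(name)
    for pH, value in rows:
        print(f"  pH {pH:2d}   log k' = {value:6.2f}")
```

Well below the pKa each profile rises by one logarithmic unit per unit of pH, and well above the pKa it is flat, which is the behavior described for lysine in connection with Equation 10–7.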
[Figure 10–2 scheme, labels: lysine, methyl acetimidate, tetravalent intermediate ($K_{\mathrm{a}}^{\mathrm{TV}}$), amidine, cross-link, HOCH₃.]

Figure 10–2: Mechanism for the reaction between the free base of lysine and the cationic conjugate acid of methyl acetimidate.¹⁸ The products of the modification can be either the amidine of one lysine, the acetamide of one lysine, or the amidine of two lysines. The latter reaction produces cross-links within the protein but rarely occurs.

where $[\mathrm{AI}]_{\mathrm{TOT}}$ is the total concentration of methyl acetimidate, both conjugate acid, R′=NH₂⁺, and conjugate base, R′=NH.

If $[\mathrm{AI}]_{\mathrm{TOT}}$ were high and constant throughout the course of the reaction and the pH did not change, Equation 10–16 would describe a pseudo-first-order reaction. The pseudo-first-order rate constant, $k'_{\mathrm{AI}}$, governing the disappearance of lysine with time would be

$$
k'_{\mathrm{AI}} = \frac{k_{\mathrm{AI}}K_{\mathrm{a}}^{\mathrm{Lys}}[\mathrm{H}^{+}]}{\left(K_{\mathrm{a}}^{\mathrm{AI}} + [\mathrm{H}^{+}]\right)\left(K_{\mathrm{a}}^{\mathrm{Lys}} + [\mathrm{H}^{+}]\right)}\,[\mathrm{AI}]_{\mathrm{TOT}}
\tag{10–17}
$$

$$
k'_{\mathrm{AI}} = \frac{k_{\mathrm{AI}}K_{\mathrm{a}}^{\mathrm{Lys}}}{K_{\mathrm{a}}^{\mathrm{AI}}}\,[\mathrm{AI}]_{\mathrm{TOT}}
\tag{10–19}
$$

and the rate of the reaction should be almost invariant with pH, and when pH < p$K_{\mathrm{a}}^{\mathrm{AI}}$ < p$K_{\mathrm{a}}^{\mathrm{Lys}}$

$$
k'_{\mathrm{AI}} = \frac{k_{\mathrm{AI}}K_{\mathrm{a}}^{\mathrm{Lys}}}{[\mathrm{H}^{+}]}\,[\mathrm{AI}]_{\mathrm{TOT}}
\tag{10–20}
$$
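The way in which Equation 10–17 collapses to the limiting forms of Equations 10–19 and 10–20 can be checked numerically. The Python sketch below evaluates the full expression at several values of pH and compares it with the two limits. The rate constant, the total concentration of methyl acetimidate, and in particular the pKa assigned to methyl acetimidate are hypothetical values; the only properties taken from the text are that this pKa lies below that of lysine and that the pKa of lysine is 10.5.

```python
def k_prime_ai(pH, k_ai, pKa_ai, pKa_lys, ai_total):
    """Equation 10-17: pseudo-first-order constant for amidination."""
    H = 10.0 ** (-pH)
    Ka_ai = 10.0 ** (-pKa_ai)
    Ka_lys = 10.0 ** (-pKa_lys)
    return k_ai * Ka_lys * H / ((Ka_ai + H) * (Ka_lys + H)) * ai_total


# Hypothetical values, chosen only for illustration.
K_AI = 1.0        # M^-1 min^-1, assumed rate constant
PKA_AI = 7.0      # assumed pKa of methyl acetimidate (below that of lysine)
PKA_LYS = 10.5    # pKa quoted for lysine in this chapter
AI_TOTAL = 0.01   # M, assumed constant

KA_AI = 10.0 ** (-PKA_AI)
KA_LYS = 10.0 ** (-PKA_LYS)

# Equation 10-19: plateau when pH lies between the two pKa values.
plateau = K_AI * KA_LYS / KA_AI * AI_TOTAL


def acid_limit(pH):
    """Equation 10-20: limit when pH is below both pKa values."""
    return K_AI * KA_LYS / 10.0 ** (-pH) * AI_TOTAL


for pH in (4.0, 5.0, 8.0, 8.75, 9.5):
    exact = k_prime_ai(pH, K_AI, PKA_AI, PKA_LYS, AI_TOTAL)
    print(f"pH {pH:5.2f}  exact {exact:.3e}  "
          f"plateau {plateau:.3e}  acid limit {acid_limit(pH):.3e}")
```

With these assumed constants the exact value follows the acid limit at low pH and comes within a few percent of the plateau midway between the two pKa values.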
range of concentrations. On the other hand, because water is itself a nucleophile, decomposition of the reagent through hydrolysis often occurs. Methyl acetimidate, unlike iodoacetamide, reacts quite readily with water. Between pH 6.8 and 8.4, the rate constant for its hydrolysis at 20 °C is 0.02 min⁻¹.¹⁹

When the reagent chosen for a particular modification decomposes rapidly, measurements of its rate of reaction with the protein are complicated by this decomposition. In the case of methyl acetimidate, the situation can be represented by the kinetic mechanism

$$
\begin{aligned}
\text{lysine} + \text{methyl acetimidate} \;&\xrightarrow{\;k_1\;}\; \text{amidine} \\
\text{methyl acetimidate} \;&\xrightarrow{\;k_2\;}\; \text{products of hydrolysis}
\end{aligned}
\tag{10–21}
$$

In this situation, where hydrolysis is occurring coincidentally with modification, it can be shown²⁰ that

$$
f_{\mathrm{amidine}} = 1 - \exp\left(-\frac{k_1[\mathrm{AI}]_{0,\mathrm{TOT}}}{k_2}\right)
\tag{10–22}
$$

where $[\mathrm{AI}]_{0,\mathrm{TOT}}$ is the initial total molar concentration of methyl acetimidate and $f_{\mathrm{amidine}}$ is the fraction of the lysine that has been modified when all of the methyl acetimidate has been consumed either by reaction with lysine or by hydrolysis. From Equation 10–22 it follows that

$$
\ln\left(\frac{1}{1 - f_{\mathrm{amidine}}}\right) = \frac{k_1}{k_2}\,[\mathrm{AI}]_{0,\mathrm{TOT}}
\tag{10–23}
$$

ied more leisurely. A series of mixtures containing the protein is prepared with increasing concentrations of methyl acetimidate at constant temperature and pH. The reaction is allowed to reach completion. The various fractions of the lysine modified are assessed. From these

Another way to direct the modification exclusively to lysines is to take advantage of the fact that primary amines such as lysine are the only functional groups on a protein that react with aldehydes to form imines (Figure 10–3). The conjugate acid of the resulting imine can then be reduced with sodium borohydride or sodium cyanoborohydride to produce the secondary amine. Both formaldehyde²² and pyridoxal 5′-phosphate²³ have been used as the aldehyde. The former is a more reactive aldehyde and produces much higher yields of alkylated lysine; the latter is more selective and under the proper conditions will modify only the most accessible and nucleophilic lysines in a protein.

Isothiocyanates are also specific for the primary amino groups of lysines, as well as the amino terminus (Figure 3–1), of a protein:

$$
\text{Lys–NH}_2 + \mathrm{S{=}C{=}N}\text{–R} \;\longrightarrow\; \text{Lys–NH–C}({=}\mathrm{S})\text{–NH–R}
\qquad (N,N'\text{-dialkylthiourea})
\tag{10–24}
$$

The products of the modification are N,N′-dialkylthioureas. Presumably, isothiocyanates are specific for lysine because the products they would form with the other nucleophilic amino acids are unstable under the
reaction conditions. A similar reaction occurs with isocyanates:

R–NH2 + R′–N=C=O → R–NH–C(=O)–NH–R′ (N,N′-dialkylurea) (10–25)

The products are N,N′-dialkylureas. Alkyl and aryl isocyanates react with cysteine and tyrosine as well but produce products that can be hydrolyzed back to the unmodified amino acids under alkaline conditions,24 to leave only the lysines modified.

A large collection of acylating agents react with lysine and also acylate other nucleophilic amino acids. The general reaction performed by these reagents is acyl transfer (Figure 10–4). As in synthetic organic chemistry, the appropriate reagent is chosen on the basis of its electrophilicity. The properties of the leaving group X determine both the electrophilicity and the rate at which the reagent will modify the lysines in a protein. Because the leaving group departs from a Lewis acid, the tetravalent intermediate, the tendency for the leaving group to depart from a proton will reflect its tendency to depart from the tetravalent intermediate (Figure 10–4). Therefore, the larger the acid dissociation constant KaLG of the conjugate acid of the leaving group X, the more reactive will be the reagent. If the leaving group is the carboxylic acid itself, the reagent is an anhydride such as trifluoroacetic anhydride (pKaLG = 0.2) or acetic anhydride25 (pKaLG = 4.8). Other leaving groups on acyl derivatives that have been used in the modification of lysine are azide24 (pKaLG = 4.7), N-hydroxysuccinimide26 (pKaLG = 6.0),27 imidazole25 (pKaLG = 7.0), ethyl carbonate28 (pKaLG = 7), and ethanethiol29 (pKaLG = 10.5).
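The rule just stated, that a larger KaLG (a smaller pKaLG) means a more reactive acylating agent, can be applied to the values quoted above. The sketch below is only an illustration of that qualitative ranking: the dictionary keys and function names are hypothetical, and the ordering ignores steric and other effects on actual rates.

```python
# pKa values of the conjugate acids of the leaving groups, as quoted in the text.
PKA_LEAVING_GROUP = {
    "trifluoroacetate (trifluoroacetic anhydride)": 0.2,
    "azide (acyl azide)": 4.7,
    "acetate (acetic anhydride)": 4.8,
    "N-hydroxysuccinimide (N-hydroxysuccinimide ester)": 6.0,
    "imidazole (acyl imidazole)": 7.0,
    "ethyl carbonate (acyl ethyl carbonate)": 7.0,
    "ethanethiol (thioester)": 10.5,
}


def rank_by_expected_reactivity(pka_by_group):
    """Sort leaving groups from most to least reactive (lowest pKa first)."""
    return sorted(pka_by_group.items(), key=lambda item: item[1])


def ka_ratio(pka_a, pka_b):
    """Ratio Ka(a)/Ka(b) of two acid dissociation constants."""
    return 10.0 ** (pka_b - pka_a)


if __name__ == "__main__":
    for name, pka in rank_by_expected_reactivity(PKA_LEAVING_GROUP):
        print(f"{pka:5.1f}  {name}")
```

On this ranking trifluoroacetic anhydride heads the list and the thioester ends it. The helper `ka_ratio` only converts a difference in pKaLG into a ratio of acid dissociation constants; it is not a prediction of the ratio of rates.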
All of these acylating agents react as readily with cysteine, tyrosine, and histidine as they do with lysine to form the respective S-, O-, or N-acyl derivatives. Unlike [...]

Figure 10–4: Acylation of lysine with any one of a number of acylating agents in which the acyl carbon is activated by attaching a good leaving group. In the tetravalent intermediate, the leaving group is expelled, in preference to the nitrogen of the lysine, to produce the amide of the lysine. The activating groups that are used to produce the derivative of the carboxylic acid that is the acylating agent are the carboxylic acid itself to form the anhydride, N-hydroxysuccinimide to form the N-hydroxysuccinimide ester, azide anion to form the acyl azide, ethyl carbonic acid to form the acyl ethyl carbonate, imidazole to form the acyl imidazole, or a thiol to form the thioester.

Often an acylating agent the structure of which causes the undesired derivatives to be particularly unstable can be chosen. For example, the ethoxycarbonyl group is added to tyrosine and histidine as well as lysine when one uses the carbonic acid anhydride, diethyl pyrocarbonate:

[Equation 10–26: reaction of diethyl pyrocarbonate with a nucleophilic side chain, releasing CO2 and HOC2H5]

The ethoxycarbonyl group, however, can be removed from the histidine and tyrosine by treatment of the modified protein with hydroxylamine.28

Cyclic anhydrides such as succinic anhydride
[structural scheme]

[...]

[Structure 10–1]

which is a fluorescent reagent for the covalent modification of proteins.32

Both 2,4-dinitrofluorobenzene (FDNB) and 2,4,6-trinitrobenzenesulfonate (TNBS)33

[Equation 10–28: reaction with trinitrobenzenesulfonate]

[...] only cysteine24 and lysine by the proper choice of pH;33 but the derivative formed with cysteine is unstable, so in the end only lysine is modified. In situations in which [...]

[Equation 10–29: formation of a sulfonium cation]

The reagent that has shown the greatest selectivity for histidine is diethyl pyrocarbonate (Equation 10–26).28 Usually this selectivity is obtained by running the reaction at a pH slightly below the pKa for histidine, where the greatest discrimination in favor of histidine, relative to lysine (Equation 10–26), should be manifested (Figure 10–1). Histidine is also susceptible to photooxidation in the presence of dyes such as methylene blue or rose bengal.37 Under carefully controlled conditions such photooxidation can be confined to histidine,37,38 but usually several other amino acids are destroyed simultaneously.37

One of the most readily modified amino acids in a [...] of its pKa (Figure 10–1), cysteine is preferentially alkylated by alkyl halides such as iodoacetamide and iodoacetate (Equation 3–17), but one must remain aware of the fact that other nucleophilic amino acids can also be modified. N-Ethylmaleimide (NEM) is another reagent often used to modify cysteine:39
[Structural scheme: reaction of cysteine with N-ethylmaleimide]

This reaction is an example of nucleophilic addition to an α,β-unsaturated acyl compound. 2-Vinylpyridine40 is selective for modification of cysteine in a similar reaction. The specificity of these alkylating agents for cysteine depends both upon the fact that sulfur is more nucleophilic than either nitrogen or oxygen and upon the use during the reaction of a pH just below the pKa of cysteine (Table 2–2) so that modification of lysine is suppressed (Figure 10–1). The possibility of alkylation at other nucleophiles such as lysine41 must always be considered. For example, the inactivation of spinach ferredoxin–NADP+ reductase by N-ethylmaleimide results from alkylation of a lysine rather than a cysteine.4

Organic mercurials, such as p-chloromercuribenzoate (PCMB),39 are usually specific for cysteine:

[...] cysteine by modification with aziridine (ethyleneimine):42–45

[Equation 10–32: reaction of cysteine with aziridine]

The modified side chain is isosteric with the side chain of a lysine and has become a site for tryptic digestion.46

5,5′-Dithiobis(2-nitrobenzoate)

[Structure 10–2]

participates readily in disulfide interchange with a cysteine (Figure 3–20). The reagent contains a disulfide that is particularly electrophilic47 because the nitrothiobenzoate dianion is such a good leaving group (pKa < 5). This causes the equilibrium to lie in favor of mixed disulfides between the cysteines on the protein and 5-thio-2-nitrobenzoate. Unfortunately, these mixed disulfides are also electrophilic. Consequently, the pH of the solution should be well-buffered to prevent further reaction of the mixed disulfide with nucleophiles such as hydroxide ion. The nitrothiobenzoate dianion released during the disulfide interchange between cysteine and 5,5′-dithiobis(2-nitrobenzoate) (10–2) is brightly colored (ε412 = 13,600 M⁻¹ cm⁻¹), and its absorbance can be used to follow the reaction. The situ[...]
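The extinction coefficient quoted for the nitrothiobenzoate dianion (ε412 = 13,600 M⁻¹ cm⁻¹) converts an absorbance at 412 nm into a concentration through the Beer–Lambert law. The following is a minimal sketch of that arithmetic. The function names are hypothetical, and the calculation assumes that one dianion is released for each cysteine that reacts and that the mixed disulfide does not react further.

```python
EPSILON_412 = 13_600.0  # M^-1 cm^-1, value quoted in the text


def dianion_concentration(a412, path_cm=1.0, epsilon=EPSILON_412):
    """Beer-Lambert law: molar concentration from absorbance at 412 nm."""
    return a412 / (epsilon * path_cm)


def cysteines_modified_per_protein(a412, protein_molar, path_cm=1.0):
    """Moles of dianion released per mole of protein.

    Assumes one nitrothiobenzoate dianion per cysteine that has reacted.
    """
    return dianion_concentration(a412, path_cm) / protein_molar


if __name__ == "__main__":
    # Hypothetical measurement: A412 = 0.272 in a 1 cm cuvette, 10 uM protein.
    conc = dianion_concentration(0.272)
    print(f"dianion released: {conc:.2e} M")
    print(f"per protein: {cysteines_modified_per_protein(0.272, 1.0e-5):.2f}")
```

Following the absorbance with time in this way gives the extent of reaction at each point, from which a rate can be estimated.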
electron-releasing hydroxyl. A simple example of this susceptibility is the facile iodination of tyrosine:

[Equation 10–33: tyrosine + I2 ⇌ iodotyrosine, ± HI]

Iodide ion is used as the source of the iodine, and it is oxidized to I2, IOH, or ICl either chemically48 or enzymatically.49 Histidine is also iodinated under similar conditions but not so readily as tyrosine.41 A particularly advantageous method for activating I– chemically uses N-chlorosulfamylphenyl groups covalently incorporated into a solid phase by the N-chlorosulfamylation of polystyrene beads.50 The covalently attached N-chlorosulfamylphenyl groups produce ICl from I– in a reaction identical to that performed by N-chlorobenzenesulfonamide in free solution:

[Equation 10–34: I– + N-chlorobenzenesulfonamide ⇌ ICl + benzenesulfonamide, ± H+]

[...] It reacts readily with histidine as well, but at low pH histidine will be mainly protonated, and the imidazolium cation is inert to electrophilic aromatic substitution.

Tetranitromethane is the reagent used most frequently to modify tyrosine:

[Structure 10–3: tetranitromethane]

The nitration that produces the o-nitrotyrosine proceeds by a free radical mechanism.53 The o-nitrotyrosine produced absorbs strongly at 428 nm as the nitrophenolate anion. It can be reduced to o-aminotyrosine with dithionite.54 The hydroxyl of o-aminotyrosine has a uniquely low pKa (4.8), and this fact can be exploited to direct further modification to this location in the protein.54 Unlike O-acylation and O-alkylation, which require the tyrosine to be anionic to react as a nucleophile, the reaction with tetranitromethane proceeds with the neutral tyrosine.

Tryptophan is susceptible to electrophilic aromatic substitution because of its similarity to aniline. For example, tritium can be incorporated specifically into tryptophan in a protein under strongly acidic conditions:55

[structural scheme: exchange of tritium into tryptophan]
The most peculiar position in tryptophan is the π bond between carbons 2 and 3. This bond displays the properties of an olefin during bromination with mild brominating reagents (Equation 3–1) by participating in addition rather than substitution. Under mild conditions a relatively inert brominating agent, 2-[(2-nitrophenyl)sulfenyl]-3-methyl-3′-bromoindolenine (BNPS-skatole), oxidized the tryptophan in micrococcal nuclease to the oxindole,57 presumably through an intermediate halohydrin:

[Equation 10–38: tryptophan → halohydrin → oxindole (– HBr)]

Only methionine was oxidized at the same time, and it could be regenerated readily by reduction. The use of addition reactions to the olefin in tryptophan to incorporate nucleophiles other than water might be feasible. The bromination, however, often results in cleavage of the polypeptide at the tryptophan (Equation 3–1).

Arginine is modified specifically by vicinal diones:

[Structure 10–4: vicinal dione bearing substituents R1 and R2]

Reagents normally used are diphenylethanedione (R1 = R2 = phenyl), p-nitrophenylethanedione (R1 = C6H4NO2; R2 = H), 4-(oxoacetyl)phenoxyacetic acid (R1 = C6H4OCH2COOH; R2 = H), and 1,2-cyclohexanedione (R1 = CH2CH2CH2CH2 = R2). In all cases the initial product is the cyclic adduct (10–5)57,58 that then dehydrates:

[Equation 10–39: dehydration (– H2O) of the cyclic adduct 10–5]

When the reagent is diphenylethanedione, this dehydrated adduct is the final product;59 when the reagent is p-nitrophenylethanedione60 or 4-(oxoacetyl)phenoxyacetic acid, the hydrogen (R2) permits a Cannizzaro rearrangement that produces iminoimidazolidone (10–6)61

[Structures 10–6 and 10–7]

and when the reagent is cyclohexanedione, a similar rearrangement produces iminoimidazolidone 10–7.62

The modification of arginine by vicinal diones can also yield products that incorporate additional molecules of the dione. 2,3-Butanedione (10–4, R1 = R2 = CH3) under appropriate conditions self-condenses to dimers and trimers that both react with arginine to yield poorly characterized, heterogeneous mixtures of products containing about 3 mol of dione for every mole of arginine.63 Phenyl glyoxal (10–4, R1 = phenyl, R2 = H) reacts with arginine to produce a product containing 2 mol of dione for every mole of arginine.58

The earliest modifications of arginine, with either diphenylethanedione64 or 1,2-cyclohexanedione,62 were performed at alkaline pH (0.2 M NaOH), and the products were quite stable. The conditions, however, were too harsh to avoid destruction of the polypeptide. It was subsequently noted that the addition of borate during the reaction of a protein with 2,3-butanedione accelerated the rate of the reaction at neutral pH and rendered the modification irreversible as long as the borate was present.65 The initial product of the reaction of 1,2-cyclo[...]

[Structure 10–8]

The addition of borate to cause the reaction with diones to be irreversible under mild conditions has permitted the isolation of modified peptides from proteins modified by 1,2-cyclohexanedione.66

Glutamates and aspartates are modified with carbodiimides (Figure 10–5).67 The carbodiimides used can be either hydrophobic, such as dicyclohexylcarbodiimide (DCCD; R1 = R2 = cyclohexyl), or hydrophilic and water-soluble, such as N-ethyl-N′-[3-(dimethylamino)propyl]carbodiimide [EDC; R1 = C2H5, R2 = (CH3)2+NHC3H6]. The initial product of the
[Figure 10–5 scheme; labeled species: aspartate or glutamate, carbodiimide, O-acylurea, N-acylurea, amide, N,N′-dialkylurea]
Figure 10–5: Outcomes of the reactions of carbodiimides with aspartate or glutamate. The O-acylurea is formed by the direct addition of the carboxylate anion to the protonated carbodiimide. In a rigid, isolated, aprotic environment, the O-acylurea might be the final product, but there are two other possible outcomes. If a nucleophile (usually an amine) has been added, or if there is an adjacent nucleophile in the protein (usually a lysine), the O-acylurea is an activated carboxylic acid derivative capable of acylating that nucleophile in an acyl exchange reaction Ú to give the N,N′-dialkylurea as the leaving group and the acyl derivative (usually the amide) of the glutamate or aspartate with either the added nucleophile or the lysine in the protein. If there is no accessible nucleophile, the O-acylurea can rearrange, by intramolecular acyl exchange ‚, to the N-acylurea. Pathway Ú is initiated by the attack of the extraneous nucleophile on the activated carboxyl group, and pathway ‚ is initiated by intramolecular attack of the unprotonated urea nitrogen on the acyl carbon.
reaction is an O-acylurea67 in which the acyl carbon of the original glutamate or aspartate has been activated by forming an acyl derivative, the leaving group of which is an excellent one because it is the oxygen of an N,N′-dialkylurea (pKa = 1).

Four fates await this O-acylurea. First, if it is buried in a nonnucleophilic environment within the protein and sterically constrained, it will remain as the O-acylurea until the protein is unfolded, at which point it will usually hydrolyze back to the unmodified glutamate or aspartate. Second, if it is somewhat buried in a polar environment, but not sterically constrained, the O-acylurea will rearrange to the N-acylurea, which is stable (pathway ‚ in Figure 10–5). Dicyclohexylcarbodiimide is often incorporated into a protein in this way. It usually reacts with buried carboxylic acids because it is so hydrophobic, and the buried O-acylurea usually survives long enough to rearrange as the protein sits around after the investigator believes the reaction has finished; but the reaction rarely proceeds in high yield. Third, if there is a nucleophilic amino acid in the protein, for example a lysine, in the vicinity of the O-acylurea, an intramolecular adduct, for example an amide, between that amino acid and the glutamate or aspartate will form (pathway Ú in Figure 10–5).68 Fourth, if an external amine, such as the methyl ester of glycine, has been added in high concentration to the solution, it can react with the O-acylurea as it is formed, if it is sterically accessible, and produce the amide between the external amine and the glutamate or aspartate.67 In this way, a defined covalent modification of the carboxylate can be made. If the external nucleophile is ammonia, glutamates and aspartates are converted to glutamines and asparagines, respectively.69

The practical outcome of each of these four fates is unique. In the first, the native protein is modified by the carbodiimide at glutamate or aspartate but loses the modification upon unfolding. In the second, a stable derivative between the protein and the carbodiimide is formed. In the third, the glutamate or aspartate is stably modified by being intramolecularly cross-linked,68 but neither the carbodiimide nor an external amine is incorporated into the protein. In the fourth, an external amine but not the carbodiimide is incorporated. Often, regardless of the intentions of the investigator, a combination of all of these outcomes occurs, and the complex mixture of products that results defies any attempt at quantification.

Another reagent used to activate the carboxylates of glutamates and aspartates is N-(ethoxycarbonyl)-2-ethoxy-1,2-dihydroquinoline (EEDQ). It activates the carboxylate (Figure 10–6) by forming a mixed ethyl carbonic anhydride (pKaLG = 7),70 which is an acylating agent (Figure 10–4) that is somewhat less reactive than an O-acylurea (pKaLG = 1) but capable of the same types of intramolecular or intermolecular reactions with nucleophiles such as a lysine on the same protein or another protein71 or an amine that has been added to the solution.

N-Ethyl-5-phenylisoxazolium-3′-sulfonate (Woodward’s reagent K)72 activates glutamates and aspartates73 by forming an enol ester:

[structural scheme: formation of the enol ester]

In this intermediate enol ester, the acyl carbon of the aspartyl or glutamyl side chain is activated sufficiently to react readily with nucleophiles elsewhere on the protein or with nucleophiles such as amines that have been purposely added to the solution, as does the O-acylurea that is the intermediate in the reactions of carbodiimides (Figure 10–5). The acyl group that has been activated with N-ethyl-5-phenylisoxazolium-3′-sulfonate is also reactive enough to be reduced with borohydride ion (BH4–) to convert the side chain of the aspartate or glutamate to the respective alcohol,74 just as an ester can be reduced to the corresponding alcohol by AlH4– ion.

Compounds that serve as precursors to nitrenes or carbenes through photolytic reactions are reagents that display even less specificity than alkylating agents in the modification of the amino acids in a protein. The fact that they are generated photolytically permits an added level of control over the reaction. The reagent can be equilibrated with the protein and then activated.

Aryl azides, such as phenyl azides or nitrophenyl azides, are the usual precursors for nitrenes. A nitroaryl azide produces a nitroaryl nitrene upon photolysis:

[Equation 10–41: photolysis (hν) of a nitroaryl azide to the nitroaryl nitrene]
2-nitrofluorobenzene. Aliphatic azides have also been observed to insert photolytically into proteins.75

A nitrene is a nitrogen the four valence orbitals of which are occupied by only six valence electrons. Therefore, it is dramatically electron-deficient and electrophilic. In a singlet nitrene, three of these orbitals are occupied by pairs of electrons and one orbital is vacant. In theory, a singlet nitrene, because of its vacant orbital, has a higher preference for insertion into nitrogen–hydrogen or oxygen–hydrogen bonds than carbon–hydrogen bonds because atoms of oxygen or nitrogen attached to carbon are electron-rich. In a triplet nitrene, two of the orbitals on nitrogen are each occupied by only one unpaired electron and the other two are occupied by two pairs of electrons. Consequently, a triplet nitrene is a diradical. Theoretically, triplet nitrenes should be able to modify proteins by hydrogen abstraction followed by rebound of the two adjacent monoradicals.76 Because hydrogen is usually more easily abstracted from carbon than from oxygen or nitrogen, triplet nitrenes should abstract hydrogen more readily from carbon–hydrogen bonds than either nitrogen–hydrogen or oxygen–hydrogen bonds.

When light is absorbed by an aryl azide, which itself is a singlet, the excited state is initially a singlet excited state that must produce a singlet nitrene because N2 is a singlet molecule. If the singlet excited state lasts long enough, it can convert to a triplet excited state by intersystem crossing. The triplet excited state produces a triplet nitrene and singlet N2. The yield of triplet excited state can be increased by adding a triplet sensitizer.77 Singlet nitrene itself can turn into triplet nitrene if it survives long enough. In the absence of a sensitizer, only about 10% of the nitrene produced by photolysis of phenyl azide is triplet.78

Although it is widely believed that aryl nitrenes, such as the phenyl nitrenes or the 3-nitro-4-(alkylamino)phenyl nitrenes usually employed in the modification of proteins, should insert into carbon–hydrogen bonds, a reaction that would require significant yields of the triplet state, the chemistry of such nitrenes belies this belief. In ideal situations, such as the intramolecular insertion in the vapor phase of an aryl nitrene into a tertiary carbon–hydrogen bond four carbons away, a reasonable yield of the N-alkylaniline (50%)79 is obtained. When, however, phenyl nitrene is generated in cyclohexane by photolysis, no insertion (<30%)78 into the solvent is observed, and most of the reaction proceeds with either dimerization of the nitrene itself or the production of aniline by two successive hydrogen abstractions by the triplet. Phenyl nitrene generated by photolysis under the same conditions in hydroxylic solvents such as methanol or propanol inserts into those solvents in high yield (80%).78 The products of the photolytic reactions of 4-substituted phenyl nitrenes with water, methyl alcohol, or diethylamine as solvent are the lactams 10–9 (60–90% yield), the 2-methoxy-3-hydroazepines 10–10 (40–80% yield), or the 2-diethylamino-3-hydroazepines 10–11 (90–100% yield), respectively:80

[Structures 10–9, 10–10, and 10–11]

The products of the photolytic reaction of 4-methylamino-3-nitrophenyl nitrene with methanol as solvent or 1% diethylamine in methanol are aniline 10–12 (40% yield) and aniline 10–13 (40% yield) or aniline 10–12 (30% yield) and aniline 10–14 (70% yield), respectively:80

[Structures 10–12, 10–13, and 10–14]

These products can be explained as the results of nucleophilic addition of solvent or solute to the two electrophilic species engaged in the following equilibrium:

[Equation 10–42: equilibrium between the two electrophilic species]

Consequently, the majority of the products from the reaction of an aryl nitrene with a protein should result from reaction with nucleophilic functional groups.

The identities of the amino acids modified by aryl nitrenes are consistent with these general considerations. The modification of rabbit glyceraldehyde-3-phosphate dehydrogenase (phosphorylating) by a p-amino-m-nitrophenyl nitrene produces the sulfenamide of Cysteine 149 (10–15)81
which, however, could have arisen from either the triplet or the singlet. The photolytic modification of rat phosphoenolpyruvate carboxykinase (GTP) with 8-azidoguanosine 5′-triphosphate produces an intramolecular disulfide between two cysteines in the protein,82 presumably arising from the nucleophilic displacement of the 8-aminoguanosine from the initially formed sulfenamide (10–16) by an adjacent cysteine. In addition to cysteine, tyrosine and lysine have usually been identified as the reactants with aryl nitrenes, but a leucine, two alanines, and a phenylalanine have also been reported to be modified.76,83,84 Singlet aryl nitrenes can insert intramolecularly by electrophilic aromatic substitution into phenyl rings,77 and this reaction would explain the modification of phenylalanine.

Carbenes, like nitrenes, have only six valence electrons on one atom, but they are distributed around a carbon instead of nitrogen. The carbenes generally used [...]

[Equation 10–44: photolytic (hν) generation of the carbene from its precursor]

The carbene is generated by photolysis. Prior to the advent of these reagents, α-diazoketones, α-diazoacetyl esters, and ethyldiazomalonyl esters were used as precursors of carbenes:76

[structural scheme: photolysis (hν) of a diazo precursor]
to at least some of these carbenes than the respective tryptophan or glutamic acid into which they eventually inserted. These results demonstrate that carbenes, like nitrenes, are not so promiscuous in their choices of reactants as they are often thought to be.

The products of the reactions between nitrenes and carbenes and proteins have rarely been characterized. In part, this is due to the low yields encountered in most of these reactions, presumably because of the tendency of the singlet carbene or singlet nitrene and its rearranged products to insert into water,95 the inescapable solvent.

There are other compounds that can be photoactivated to form reactive species that modify proteins. For example, a 5-bromouracil was inserted in place of a thymine within the DNA sequence recognized by general control protein GCN4. When the complex between the protein and the modified DNA was irradiated with ultraviolet light (λmax = 254 nm), Alanine 238 of the protein had been covalently modified in low yield.96 Vanadate anion binds specifically to proteins such as rabbit myosin97 and isocitrate lyase from Escherichia coli.98 Upon photolysis of these complexes, serines in the sites at which the vanadate had bound were photooxidized to products that could be converted by reduction with Na[3H]BH4 to [3H]serine.97 The incorporated tritium tagged the specific serines modified.

Oxidative cleavage is used to modify the polyamide backbone of a protein. In the presence of reducing agents such as ascorbate or dithiothreitol, complexes between Fe2+ or Cu+ and chelators such as tetracycline, N,N,N′,N′-tetracarboxymethyl-1,2-diaminoethane (EDTA), phenanthroline, or the protein itself convert O2 or H2O2 into reactive species capable of cleaving peptide bonds.99–101 The products of this cleavage that have been identified so far suggest that several different reactions can bring it about, so no unique mechanism seems to predominate.102 When the chelated metal is attached in some way to the protein, the cleavages that occur are confined to a few peptide bonds rather than being widely distributed along the polypeptide,103,104 an observation suggesting that the reactive species responsible for the cleavage either remain bound to the metallic cation or cannot diffuse very far without being discharged.

The chelating ligand surrounding the metallic cation is usually attached purposefully to a defined location on the surface of the protein to be modified. The activated products of the reaction cleave a peptide bond immediately adjacent to the place where the cation ends up. For example, the polypeptide in the vicinity of the site with which tetracycline associates on the tetracycline repressor was identified by the oxidative cleavage of the protein produced by the complex between tetracycline and Fe2+.103 In this case, the major targets for cleavage were the peptide bonds between positions 103 and 104 and positions 104 and 105. Lower yields of cleavage were observed between positions 55 and 56 and positions 135 and 136 and even lower yields between positions 143 and 144 and positions 146 and 147. If the protein itself has a site at which a structural metallic cation is bound, oxidative cleavage induced by the appropriate cation can map the polypeptide surrounding that site.104 One advantage of oxidative cleavage of the polypeptide is that it can be readily detected by submitting complexes between dodecyl sulfate and the protein and its fragments to electrophoresis. Mass spectrometry of the resulting fragments can be used to locate the exact points of cleavage.103

Site-directed mutation105,106 produces the covalent modification of a protein by converting one particular amino acid in its sequence into another of the 20 amino acids. It is also possible to delete amino acids from the sequence of a polypeptide or insert extra amino acids at a particular location with similar techniques. A common goal of both chemical modification and site-directed mutation is to correlate the structure of the protein with its function. An assessment of the effects of either type of modification on the normal function of the protein provides information about the role of the modified amino acid in that function. In order for either approach to place that side chain in a structural context, a crystallographic molecular model of the protein is required to define the location of the modified amino acid.

A strategic distinction, however, exists between covalent modification with chemical reagents and covalent modification by site-directed mutation. In the former procedure, the protein of interest is modified, and the effects of the modification on the function of the protein are assessed before the outcome of the reaction is defined by digesting the protein and then identifying the modified peptides either by mass spectrometry or by chromatographic separation and sequencing. In the latter procedure, any amino acid in the sequence of the protein can be chosen, and this particular modification is then performed before the results are assessed by the effect of the modification on some property of the protein. In the former procedure, the selectivity of the chemical modification is determined by the accessibility and the inherent nucleophilicity of the side chains that are potential targets, properties controlled by the protein and not by the investigator, so the results often contain unexpected information about the relationship between structure and function. In the latter, modification can be performed at any site selected by the investigator with absolute specificity, but the choice of which amino acid to modify and which of the 19 mutations to perform at that site relies on his intuition. Because the intuition of the investigator is usually fallible, informative experiments using site-directed mutation usually require that a crystallographic molecular model be already available to assist in the choice of the site to be modified.

It is also possible to decide to focus one’s attention on the nucleophilicity of a particular side chain in the sequence of a protein on the basis of the location of that side chain in a crystallographic molecular model, its
protein can be identified when its covalent modification inhibits that function. The acid dissociation constant for a particular amino acid in a protein can be determined from the rate of its modification as a function of the pH. The accessibility of particular amino acids to the solvent can be inferred from the rates of their covalent modification. The amino acids incorporated into a transient heterologous interface can be identified by changes in the rates of their modification upon formation of the complex between the two proteins. The close proximity of two amino acids in a protein can be revealed by their covalent cross-linking.
The most common use of covalent modification is to identify amino acids in a site on a protein for binding a specific ligand. One way this can be done is to measure the change in the accessibility of a particular amino acid upon the binding of the ligand to the protein. For example, the greater than 4-fold decrease in the rate constant for the reaction between acetic anhydride and Lysine 501 in ovine Na⁺/K⁺-exchanging ATPase when MgATP is bound to the enzyme suggested that this lysine participates in the specific interactions between the protein and MgATP when it is bound.¹¹⁷ This suggestion was later verified by a crystallographic molecular model of the complex between ATP and a closely related enzyme.¹¹⁸ Lysines involved in the interface forming the complex between DNA topoisomerase of vaccinia virus and a double helix of DNA were identified by noting significant decreases in the yield of their modification by citraconic anhydride upon formation of the complex.¹¹⁹ In this latter experiment, advantage was taken of the reversibility of the acylation of lysines by citraconic anhydride. The products of the modification were unfolded and modified at the unacylated lysines with N-hydroxysuccinimide acetate (Figure 10–4), the citraconyl groups were removed, and the locations of the previously citraconylated lysines were identified by digesting the protein at the deprotected lysines with lysyl endopeptidase and examining the pattern of fragments produced.
Another way that an amino acid in a binding site on a protein is identified is to incorporate a reactive functional group into the ligand itself. For example, it was demonstrated that the amino-terminal threonine produced upon the normal posttranslational cleavage of γ-glutamyltransferase from E. coli between Glutamine 390 and Threonine 391 is in the active site of the enzyme because the threonine was covalently modified by 2-amino-4-(fluorophosphono)butanoic acid, an electrophilic mimic of the γ-glutamyl group in glutathione, a substrate of the enzyme.¹²⁰
Covalent modification often leads to the inactivation of a protein and identifies candidates for functionally important amino acids. The modification of Lysine 85, Histidine 88, and Histidine 161 in bovine cytochrome b₅₆₁ by diethyl pyrocarbonate leads to the inactivation of the fast electron transfer performed by this protein,¹²¹ a fact suggesting that these two histidines and the lysine participate in this function.
The pKa for a particular amino acid is often estimated from the rate of its reaction with an electrophile as a function of pH. The reaction of acetic anhydride with particular lysines in a protein has been used to monitor their individual acid dissociation constants and nucleophilicities.¹⁰ Because acetic anhydride is rapidly hydrolyzed, the yield of its incorporation at a set pH into a particular lysine in the protein, relative to the yield of its incorporation into an added standard amine, after the reaction has reached completion provides a direct measurement of the relative bimolecular rate constant for its reaction with that particular lysine at that pH. The situation is formally equivalent to Equation 10–21 with k₂ being the rate constant for reaction of the acetic anhydride with the standard amine. If the absolute rate constant for the reaction between acetic anhydride and the standard amine is known, the absolute bimolecular rate constant for the reaction between the lysine of interest and acetic anhydride at that pH can be calculated from the relative rate constant, k₁(k₂)⁻¹ in Equation 10–23. The behavior of this absolute rate constant as a function of pH (Figure 10–1) provides an estimate of the pKa of the lysine.
It is also possible to measure a pKa directly. For example, the pKa of Cysteine 25 in papain¹²² was determined to be 8.5 at 25 °C and an ionic strength of 0.5 M¹²³ by following the rate of its reaction with chloroacetamide as a function of pH, the pKa of Cysteine 115 in UDP-N-acetylglucosamine 1-carboxyvinyltransferase from Enterobacter cloacae was determined to be 8.3 by following the rate of its reaction with iodoacetamide as a function of pH,¹¹ and the pKa of Lysine 166 in ribulose-bisphosphate carboxylase from Rhodospirillum rubrum was determined to be 7.9 by following the rate of its reaction with 2,4,6-trinitrobenzenesulfonate.³⁵
The reactivity of particular amino acids in the sequence of a protein can provide an indication of their accessibility to the aqueous phase. For example, a comparison between the observed rate constant for the reaction of a particular lysine in a protein with acetic anhydride and the rate constant calculated from its observed pKa provides an estimate of the accessibility of that lysine to the solvent. Knowledge of the Brønsted coefficient β (0.48) and the absolute bimolecular rate constant for the free base of an unhindered lysine (2700 M⁻¹ s⁻¹) of normal pKa (10.8) with acetic anhydride at 10 °C¹⁰ permits the bimolecular rate constant expected for the modification of the free base of a fully accessible lysine with a particular pKa to be calculated. For example, the pKa of Lysine 501 in ovine Na⁺/K⁺-exchanging ATPase was found to be 10.4, so the rate constant of the reaction of its free base with acetic anhydride at 10 °C should have been 1700 M⁻¹ s⁻¹.¹⁰⁹ The fact that the rate constant was only 400 M⁻¹ s⁻¹ indicated that Lysine 501 was not fully exposed on the surface of the protein. In most instances, the apparent accessibility of a particular amino acid is
determined by steric effects, engendered by neighboring amino acids in the folded polypeptide or a decrease or increase in its nucleophilicity brought about by its participation in intramolecular hydrogen bonds.
Tetranitromethane, which is large and quite polar (10–3), reacts with the un-ionized, neutral form of tyrosine. At neutral pH, all of the tyrosines in a protein should be un-ionized, and their modification by tetranitromethane should reflect only their accessibility.¹² There are eight tyrosines in human carbonate dehydratase B, and only three of them, Tyrosine 20, Tyrosine 88, and Tyrosine 114, react with tetranitromethane.¹² Subsequent to this assessment, the protein was studied crystallographically, and in the map of electron density only Tyrosine 20, Tyrosine 88, Tyrosine 114, and Tyrosine 129 were found to be “located on the surface of the molecule.”¹²⁴ Aspartate 194 in bovine chymotrypsinogen A could not be modified in the native protein with ethyl glycinate and N-ethyl-N′-[3-(dimethylamino)propyl]carbodiimide even under conditions where 13 of its 15 carboxylates were modified completely.¹²⁵ Subsequently it was observed that Aspartate 194 is “buried” in the interior of the crystallographic molecular model of chymotrypsinogen.¹²⁶ When fructose-bisphosphate aldolase was modified with methyl acetimidate at high concentrations, only 20 of its 30 lysines were modified.¹²⁷ The 10 unmodified lysines reacted readily when the protein was unfolded, and it could be shown that these were 10 unique lysines in the sequence of the protein, presumably made unreactive by their surroundings in the folded polypeptide.
Site-directed mutation can also be used to monitor the accessibility of particular locations in the amino acid sequence of the protein. An α helix passes across the surface of the crystallographic molecular model of λ repressor.¹²⁸ In this α helix Isoleucine 84 and Methionine 87 are on the face of the α helix directed toward the interior of the protein while Tyrosine 85, Glutamate 86, Tyrosine 88, and Glutamate 89 are on the surface of the α helix that is accessible to the solution. After this observation had been made crystallographically, it was shown that only isoleucine at position 84 and either methionine or isoleucine at position 87, of all of the 20 amino acids, produces a functional protein, but 10–14 of the 20 amino acids can be substituted at the other four positions and still produce a functional protein.¹²⁹
Another covalent modification that has been used to assess the accessibility of particular amino acids in a folded polypeptide is endopeptidolytic cleavage. For an endopeptidase to cleave a peptide bond, the polypeptide at that location must be able to enter its active site. This has usually been assumed to require that the susceptible peptide bond be located on a somewhat flexible loop, on the outside surface of the protein, well exposed to the solvent. In the case of chymotrypsinogen A, the endopeptidolytic cleavages of the folded polypeptide that remove the amino acids between Leucine 13 and Isoleucine 16 and between Tyrosine 146 and Alanine 149 to produce α-chymotrypsin occur within two such loops.¹²⁶ In the case of deoxyribonuclease I, however, a less easily explained endopeptidolytic cleavage of the folded polypeptide has been observed. Under the proper set of conditions, chymotrypsin cleaves deoxyribonuclease I completely and exclusively at the peptide bond on the carboxy-terminal side of Tryptophan 178.¹³⁰ In the refined crystallographic molecular model of deoxyribonuclease I,¹³¹ Tryptophan 178 is found in the middle of an α helix that is a rigid feature of the structure. This α helix traverses the outer surface of the protein, but Tryptophan 178 is on the side of the α helix pointed toward the interior and itself is inaccessible. There are, however, no more accessible sites in the protein at which chymotrypsin could cleave, and it may be the case that in solution the α helix containing Tryptophan 178 is in equilibrium with a disordered loop.
Changes in the accessibility of amino acids on the surface of a protein brought about by its participation in an association with another protein can be monitored as changes in the yields of their covalent modification. For example, the yields of the reductive methylations of Lysines 50, 61, 68, 113, 284, and 291 with [¹⁴C]formaldehyde and NaCNBH₃ decreased 2–4-fold upon conversion of monomeric actin to helical filaments of actin.¹³² Covalent modification can also introduce bulky groups that sterically inhibit an association between two proteins. For example, when Histidine 40 of actin is modified with diethyl pyrocarbonate¹³³ or when Lysine 61 of actin is modified with fluorescein isothiocyanate,¹³⁴ the modified actin is no longer able to form helical polymers. All of these observations were used as evidence in favor of the molecular model of the helical polymer of actin (Figure 9–1B),¹³⁵ in which all of these amino acids ended up in the interfaces between the monomers. If the model is correct, the formation of the interfaces in the homopolymer should sterically hinder the reductive methylation of the lysines, and the addition of bulky functional groups to these histidines or these lysines on the actin monomers (Equations 10–24 and 10–26) should sterically hinder their polymerization.¹³⁵
The Fe³⁺ chelate 10–17
[Structure 10–17]
was covalently attached at random to lysines on the surface of sigma factor rpoD isolated from DNA-directed
RNA polymerase of E. coli. When the normally occurring complex was formed between this modified protein and DNA-directed RNA polymerase, the tethered Fe³⁺ was able to catalyze cleavage at specific peptide bonds in the DNA-directed RNA polymerase when ascorbate and H₂O₂ were added.¹³⁶ These sites of cleavage identified locations on the surface of the RNA polymerase within or adjacent to the interface between it and sigma factor rpoD. This interface has also been probed by footprinting.
Footprinting is the identification of those peptide bonds on the surface of a protein that are protected from random nonspecific cleavage when that protein forms a complex with another protein. The peptide bonds protected are assumed to be within the footprint of the other protein upon the surface of the protein being examined. That footprint is the portion of the surface of the protein being examined that falls within the heterologous interface. DNA-Directed RNA polymerase from E. coli stripped of its sigma factor rpoD is cleaved at 83 different peptide bonds on its surface when it is exposed to the Fe³⁺ chelate of N,N,N′,N′-tetracarboxymethyl-1,2-diaminoethane in the presence of ascorbate and H₂O₂. When sigma factor rpoD is reassociated with the DNA-directed RNA polymerase, the cleavage at seven of these sites is prevented.¹³⁷ It was assumed that these seven peptide bonds are within the footprint of sigma factor rpoD upon DNA-directed RNA polymerase.
Covalent cross-linking uses covalent modification to assess the proximity of particular amino acids. The functional groups at the two ends of a bifunctional cross-linking reagent are usually electrophiles commonly used in monofunctional reagents for the modification of proteins. They can be identical to each other (8–1 and 8–2), or they can be two electrophiles with different specificities (8–3). They can be connected by a chain of atoms stably bonded or a chain of atoms containing a bond that can be cleaved as desired (8–3) to permit later separation and identification of the cross-linked species.
Intramolecular cross-linking can be used to determine juxtapositions in a single folded polypeptide. The simplest examples of such cross-links are naturally occurring cystines, which automatically provide evidence for the juxtaposition of two segments of polypeptide,¹³⁸ but there are unnatural, chemical methods for forming cross-links. The bifunctional reagent 2-(p-nitrophenyl)-3-(3-carboxy-4-nitrophenyl)thio-1-propene (10–18) can undergo a series of reversible addition–eliminations to form bridges between two nucleophilic amino acids (Figure 10–8), either lysines or cysteines.¹³⁹ The reaction is reversible as long as the nitrophenyl group is present to stabilize the carbanion but can be made irreversible by reducing the nitro group with dithionite. Therefore, the reagent can be permitted to step around the protein until the most stable cross-link is formed, and this cross-link can then be locked in by reduction. In this way, two pairs of intramolecular cross-links on bovine pancreatic ribonuclease, one between Lysine 7 and Lysine 37 and the other between Lysine 31 and Lysine 41, could be
[Figure 10–8: reaction scheme showing reagent 10–18, loss of the leaving group C₇H₃O₄NS²⁻, reversible protonation (±H⁺), addition of the amine of the second amino acid (A and B), and the final reduction; structures not reproduced.]
Figure 10–8: Mechanism by which 2-(p-nitrophenyl)-3-(3-carboxy-4-nitrophenyl)thio-1-propene (10–18) cross-links adjacent lysines. The
olefin on the p-nitrostyrene is activated by the electron-withdrawing capacity of the nitro group and participates in a sequence of reversible,
nucleophilic addition–eliminations. Because the adduct is symmetric, two nucleophiles are cross-linked reversibly. In the first step of the
reaction, the nitrothiobenzoate (Structure 10–2) is the preferred leaving group from the asymmetric carbanion, but when the carbanion is
then formed between two lysines, either can be the leaving group and the reagent can be passed from lysine to lysine over the surface of the
protein. The nitrobenzyl carbanion is not that basic, so its protonation is reversible and this allows the cycles of addition and elimination to
proceed. When the reaction is quenched with acid and the nitro group is reduced to the amine, the aminobenzyl proton is no longer acidic
and the reagent is fixed in place.
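The Brønsted estimate given earlier in this section for Lysine 501 can be checked with a few lines of arithmetic. The sketch below assumes the usual linear form of the Brønsted relation, log k = log k_ref + β(pKa − pKa_ref), which the text implies but does not write out; the numerical values are those quoted in the text, and the function and variable names are illustrative only.

```python
# Values quoted in the text for acetic anhydride reacting with lysine at 10 °C.
BETA = 0.48      # Brønsted coefficient
K_REF = 2700.0   # M^-1 s^-1, free base of an unhindered lysine
PKA_REF = 10.8   # normal pKa of that unhindered lysine


def expected_rate_constant(pka, beta=BETA, k_ref=K_REF, pka_ref=PKA_REF):
    """Rate constant expected for the free base of a fully accessible lysine.

    Assumes the linear Brønsted relation
    log10(k) = log10(k_ref) + beta * (pka - pka_ref).
    """
    return k_ref * 10.0 ** (beta * (pka - pka_ref))


k_expected = expected_rate_constant(10.4)   # Lysine 501, pKa 10.4
k_observed = 400.0                          # M^-1 s^-1, from the text
relative_accessibility = k_observed / k_expected

print(f"expected k = {k_expected:.0f} M^-1 s^-1")
print(f"observed / expected = {relative_accessibility:.2f}")
```

Under these assumptions the expected rate constant comes out close to the 1700 M⁻¹ s⁻¹ quoted in the text, and the observed value is roughly one quarter of it, which is the sense in which Lysine 501 was judged not to be fully exposed.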
formed in high yield when only 2 molar equivalents of the reagent were added initially to the protein. The β carbons of these two pairs of lysines are 1.3 and 1.1 nm apart, respectively, and each partner in a pair is on the same side of the crystallographic molecular model.
The reagent bromopyruvate is bifunctional by virtue of its alkyl bromide, which is an alkylating agent, and its carbonyl, which can reversibly form an imine with a lysine that can then be reduced to the permanent secondary amine with NaCNBH₃. Bromopyruvate is able to form an imine with Lysine 144 of 2-dehydro-3-deoxy-6-phosphogluconate aldolase and then alkylate Glutamate 56.¹⁴⁰,¹⁴¹ This observation established the proximity of these two amino acids in the folded polypeptide before the crystallographic molecular model became available.¹⁴²
The participants in heterologous interfaces have also been probed by cross-linking. During the contraction of muscle, a complex must form between a subunit of myosin in its helical polymer and actin in its helical polymer (Figure 9–1B). This complex between actin and myosin from rabbit muscle was cross-linked with 1-ethyl-3-[3-(dimethylamino)propyl]carbodiimide, a reagent that couples carboxylates to lysines on the surfaces of proteins (Figure 10–5). The amino-terminal peptide produced by cleavage of actin by hydroxylamine between Asparagine 12 and Glycine 13 and the carboxy-terminal peptide produced by cleavage of actin by cyanogen bromide at Methionine 354 were both found to be cross-linked to myosin in the covalently cross-linked complex.¹⁴³ It was also found that amino acids within the segment of actin between Histidine 40 and Lysine 113 were covalently attached to myosin when the complex between it and actin was cross-linked by N-(ethoxycarbonyl)-2-ethoxy-1,2-dihydroquinoline, a reagent that also can couple carboxylates to lysines (Figure 10–6). On the basis of these results, amino acids within these three segments of actin are thought to be within the heterologous interface it forms with myosin.¹⁴⁴
Thiols, either in the form of cysteines in the sequence of the protein or introduced as covalent modifications, can be sites for cross-linking. It is possible to insert cysteines at specific positions in the amino acid sequence of a protein by site-directed mutation and then attempt to form cystines between them.¹⁴⁵ The actual formation of a cystine demonstrates that the two cysteines participating in it were adjacent to each other in the tertiary or quaternary structure of the protein. Proteins can also be modified by 2-iminothiolane to convert lysines to thiols:¹⁴⁶
[Equation 10–46: reaction scheme in which the amino group of a lysine opens the ring of 2-iminothiolane to give an amidine bearing a free thiol; structures not reproduced.]   (10–46)
These thiols are then oxidized to mixed disulfides to cross-link various lysines on the proteins. The 21 folded polypeptides within the 30S subunit of the ribosome have been cross-linked to each other in this way. The various products of the intramolecular cross-linking were identified by two-dimensional gel electrophoresis. During the electrophoresis, the disulfides linking pairs of these polypeptides were reduced by disulfide interchange to unlink them from each other between the first and the second dimension.
With 2-iminothiolane, as well as with dimethyl suberimidate,¹⁴⁷,¹⁴⁸ dimethyl adipimidate,¹⁴⁹ N,N′-1,4-phenylenedimaleimide,¹⁵⁰,¹⁵¹ tetranitromethane,¹⁵² tartryldi(ε-aminocaproyl azide),¹⁵³ and dimethyl 3,3′-dithiobis(propionimidate),¹⁵⁴ 26 pairs of covalently cross-linked polypeptides could be unambiguously identified among the products from the reactions of the 30S subunit of the ribosome from E. coli.¹⁵⁵ After these results were reported, a crystallographic molecular model of the 30S subunit became available.¹⁵⁶,¹⁵⁷ Five of the cross-linked pairs involved polypeptides S1 and S21, which were not identified in the maps of electron density of the 30S subunit.¹⁵⁷ Of the remaining 21 pairs, nine have significant portions of their folded structure touching each other so intimately that their cross-linking would be expected and four may be near enough to each other in the crystallographic molecular model to be intramolecularly cross-linked by a long linker (Equation 10–46), but three have only a few positions in their sequences close enough to be cross-linked and five are not even near each other in the crystallographic molecular model. These last five cross-linked products must have been the result of intermolecular cross-linking, and some of the others may be as well.
It is also possible to covalently cross-link thiols that have been introduced into a protein with 2-iminothiolane. For example, such inserted thiols can be cross-linked with 4,6-di(bromomethyl)-3,7-dimethyl-1,5-diazabicyclo[3.3.0]octadiene-2,7-dione (dibromobimane):
[Structure 10–19]
This reagent makes the final cross-link strongly fluorescent so that peptides containing the cross-linked lysines, still joined together, can be purified to identify their positions in the sequence of the protein.¹⁵⁸
In most experiments involving covalent modification of a protein with an electrophilic reagent, the effect of the modification on the normal function of that protein is first monitored. When a reagent that has an interesting or desirable effect is discovered, the position or positions in the amino acid sequence of the protein at which the modification has occurred must be identified in order to correlate this functional effect with the structure of the protein. This identification is usually made by digesting the modified protein either chemically or enzymatically, isolating the peptides that have been modified, and analyzing them by direct sequencing or by mass spectrometry. The precise position of the modification in the amino acid sequence is defined by the appearance of the modified amino acid itself in one of the cycles of sequencing, by the disappearance of the amino acid normally found at that cycle, or from the masses of the fragments produced by bombardment of the vaporized peptide with helium in a tandem mass spectrometer (Figure 3–8).
Two inactivated products from the alkylation of bovine pancreatic ribonuclease with iodoacetate could be separated from each other in their native state before they were digested. Each had incorporated one carboxymethyl group. From one of the products, a tryptic peptide (Histidine 105 to Valine 124) containing 1-carboxymethylhistidine at position 119 was isolated; from the other product, a tryptic peptide (Glutamine 11 to Lysine 31) containing 3-carboxymethylhistidine at position 12 was isolated.¹⁵⁹
In the chromatogram of the tryptic digest of ferredoxin–NADP⁺ reductase inactivated with N-ethyl[2,3-¹⁴C₂]maleimide (Equation 10–30), there was one major radioactive peak that had the amino acid sequence SVSLCVXR, comprising positions 110–117 in the sequence of the protein. In the sequence of the unmodified protein, the amino acid at position X is a lysine. Because lysine was not observed in that cycle of Edman degradation, because trypsin failed to cleave between the lysine and the arginine as it usually does, and because amino acid analysis of the peptide following its hydrolysis in acid produced a peak at the position of N-succinyllysine on the chromatogram (Figure 1–3), the modification could be assigned to Lysine 116.⁴
When γ-glutamyltransferase from E. coli that had been covalently modified with 2-amino-4-(fluorophosphono)butanoic acid was digested with lysyl endopeptidase (Figure 3–2) and the resulting peptides were separated by reverse-phase adsorption chromatography, only one previously unobserved peak of absorbance appeared on the chromatogram. A mass spectrum of the peptide responsible for that peak identified it as the peptide TTHYSVDDK (positions 391–399 in the sequence of the protein) into which 1 mole of the phosphonylating agent had been incorporated. A mass spectroscopic determination of the sequence of that peptide (Figure 3–8) identified Threonine 391 as the site of phosphonylation.¹²⁰
When bovine cytochrome b₅₆₁ that had been modified with diethyl pyrocarbonate was submitted to matrix-assisted-laser-desorption ionization in a time-of-flight mass spectrometer, it was observed that the modification had increased the mass of the protein by the equivalent of three ethylpyrocarbonyl groups (3 × 72 Da ≅ 28,253 Da − 28,033 Da). When a tryptic digest of the protein was submitted to mass spectrometry, 33 new peptides appeared in the spectrum,* the masses of all but two of which could be explained as the result of modification only at Lysine 85, Histidine 88, and Histidine 161.¹²¹
* Of the 31 tryptic peptides, 24 were derivatives of the tryptic peptide from Threonine 83 to Arginine 111 containing Lysine 85 and Histidine 88, and seven were derivatives of the tryptic peptide from Tyrosine 157 to Lysine 191 containing Histidine 161. The various derivatives resulted from incomplete yields of modification, low yields of modification at other histidines in the peptides, and incomplete tryptic digestion.
The advantage of identifying a site of modification chemically is that several different methods of analysis can be applied to the isolated product; the advantages of mass spectrometry are its rapidity and its sensitivity.
Suggested Reading
Aliverti, A., Gadda, G., Ronchi, S., & Zanetti, G. (1991) Identification of Lys116 as the target of N-ethylmaleimide inactivation of ferredoxin:NADP⁺ oxidoreductase, Eur. J. Biochem. 198, 21–24.
Inoue, M., Hiratake, J., Suzuki, H., Kumagai, H., & Sakata, K. (2000) Identification of the catalytic nucleophile of Escherichia coli γ-glutamyltranspeptidase by a γ-monofluorophosphono derivative of glutamic acid: N-terminal Thr-391 in small subunit is the nucleophile, Biochemistry 39, 7764–7771.
Buechler, J.A., & Taylor, S.S. (1989) Dicyclohexylcarbodiimide cross-links the side chains of two conserved amino acids, Asp-184 and Lys-72, at the active site of the catalytic subunit of c-AMP-dependent protein kinase, Biochemistry 28, 2065–2070.
Problem 10–1: Mercaptoethanol undergoes the following dissociation:
HOCH₂CH₂SH ⇌ HOCH₂CH₂S⁻ + H⁺     pKa = 9.50
Suppose the total amount of mercaptoethanol in solution is equal to [SH]TOT, a quantity you know since you added that much. Suppose, also, that there is a chemical reaction that occurs between only the anion, HOCH₂CH₂S⁻, and an electrophile, X, and the rate of this reaction is
rate = k[HOCH₂CH₂S⁻][X]
As the pH changes, [HOCH₂CH₂S⁻] changes although [SH]TOT is always the same.
(A) Show that rate = k{f([H⁺])}[SH]TOT[X]
(B) Give an explicit equation for f([H⁺]).
(C) Plot log [kobs(k)⁻¹] against pH, where kobs ≡ k{f([H⁺])}. Indicate on the plot where pH = pKa.
(D) At what pH does the rate of the reaction become zero?
(E) By what factor does the rate decrease for each decrease in pH of 1.00 when pH < pKa?
Problem 10–2: Give all of the products and a mechanism for each of the following reactions.
R ≡ [structure not reproduced]
A. R–NH₃⁺ + O=C=N⁻ →
B. R–SH + ICH₂C(=O)NH₂ →
C. R–NH₃⁺ + [structure not reproduced; surviving labels: H₃C, O, O] →
D. [structure not reproduced; surviving labels: R, HN, NH] + [structure not reproduced; surviving labels: H₃C, O, OPO₃²⁻] →
F. R–NH₃⁺ + H₅C₂S–C(=O)–CF₃ →
Problem 10–3: Diisopropyl fluorophosphate inhibits serine endopeptidases by specific phosphorylation of the serine in the active site. Papain is an endopeptidase that does not have a serine in its active site. Nevertheless, it reacts with diisopropyl fluorophosphate with the result that 1 mol of phosphate is bound for every mole of enzyme but without loss of enzymatic activity.¹⁶⁰ The reaction between papain and the reagent was carried out with radioactive diisopropyl [³²P]fluorophosphate, the modified protein was digested with chymotrypsin, and the segment of enzyme containing the radioactive label was isolated. It corresponded to amino acids 112–123 in the sequence of papain: –QVQPYNQGALLY–. What is the most likely site of alkylphosphorylation of papain?
Problem 10–4:
(A) Draw the structure of the peptide RDVLMKE in the ionization state in which it would exist at pH 1.4. Indicate all lone pairs.
The peptide was modified with iodoacetamide at pH 1.4, 40 °C, for 20 h. Digestion with carboxypeptidase yielded the full complement of glutamic acid from the resulting peptide. Edman degradation yielded the full complement of arginine. It is possible to estimate the number of charges a peptide bears at a certain pH from its behavior on electrophoresis. This was done for the initial peptide and the product from its reaction with iodoacetamide.
                 charge
pH     original peptide     alkylated product
6.5           0                   +1
2.1          +3                   +4
(B) Write a stoichiometric mechanism for the reaction that occurred between iodoacetamide and one of the side chains on this peptide.
(C) How would the product of the reaction with iodoacetamide move on chromatography by cation exchange relative to the unreacted peptide?
Problem 10–6: A peptide has the sequence SVEKCYEKP.
(A) How many charges does the peptide bear at pH 1.9? At pH 5.6?
The peptide was reacted with trimethyloxonium tetrafluoroborate, (CH₃)₃O⁺ BF₄⁻, in aqueous solution at pH 6.0. Three methyl groups were covalently attached to the peptide. When the methylated and unmethylated peptides were examined by electrophoresis, the following result was observed.
59. Nishimura, T., & Kitajima, K. (1979) J. Org. Chem. 44, 818–824.
60. Soman, G., Hurst, M.O., & Graves, D.J. (1985) Int. J. Pept. Protein Res. 25, 517–525.
61. Duerksen-Hughes, P.J., Williamson, M.M., & Wilkinson, K.D. (1989) Biochemistry 28, 8530–8536.
62. Toi, K., Bynum, E., Norris, E., & Itano, H.A. (1967) J. Biol. Chem. 242, 1036–1043.
63. Yankeelov, J.A. (1970) Biochemistry 9, 2433–2439.
64. Itano, H.A., & Gottlieb, A.J. (1963) Biochem. Biophys. Res. Commun. 12, 405–408.
65. Riordan, J.F. (1973) Biochemistry 12, 3915–3923.
66. Patthy, L., Váradi, A., Thész, J., & Kovács, K. (1979) Eur. J. Biochem. 99, 309–313.
67. Hoare, D.G., & Koshland, D.E., Jr. (1967) J. Biol. Chem. 242, 2447–2453.
68. Buechler, J.A., & Taylor, S.S. (1989) Biochemistry 28, 2065–2070.
69. Lewis, S.D., & Shafer, J.A. (1973) Biochim. Biophys. Acta 303, 284–291.
70. Belleau, B., & Malek, G. (1968) J. Am. Chem. Soc. 90, 1651–1652.
71. Bertrand, R., Chaussepied, P., Kassab, R., Boyer, M., Roustan, C., & Benyamin, Y. (1988) Biochemistry 27, 5728–5736.
72. Woodward, R.B., Olofson, R.A., & Mayer, H. (1961) J. Am. Chem. Soc. 83, 1010–1012.
73. Llamas, K., Owens, M., Blakeley, R.L., & Zerner, B. (1986) J. Am. Chem. Soc. 108, 5543–5548.
74. Jennings, M.L., & Anderson, M.P. (1987) J. Biol. Chem. 262, 1691–1697.
75. Rajasekharan, R., Marians, R.C., Shockey, J.M., & Kemp, J.D. (1993) Biochemistry 32, 12386–12391.
76. Bayley, H., & Knowles, J.R. (1977) Methods Enzymol. 46, 69–114.
77. Iddon, B., Meth-Cohn, O., Scriven, E.F.V., Suschitzky, H.J., & Gallagher, P.T. (1979) Angew. Chem., Int. Ed. Engl. 18, 900–917.
78. Reiser, A., & Leyshon, L.J. (1971) J. Am. Chem. Soc. 93, 4051–4052.
79. Abramovitch, R.A., & Davis, B.A. (1964) Chem. Rev. 64, 149–185.
80. Nielsen, P.E., & Buchardt, O. (1982) Photochem. Photobiol. 35, 317–323.
81. Chen, S., Lee, T.D., Legesse, K., & Shively, J.E. (1986) Biochemistry 25, 5391–5395.
82. Lewis, C.T., Haley, B.E., & Carlson, G.M. (1989) Biochemistry 28, 9248–9255.
83. Garin, J., Boulay, F., Issartel, J.P., Lunardi, J., & Vignais, P.V. (1986) Biochemistry 25, 4431–4437.
84. Richards, F.F., Lifter, J., Hew, C.L., Yoshioka, M., & Konigsberg, W.H. (1974) Biochemistry 13, 3572–3575.
85. Buron, C., Gornitzka, H., Romanenko, V.V., & Bertrand, G. (2000) Science 288, 834–836.
86. Hirai, K., & Tomioka, H. (1999) J. Am. Chem. Soc. 121, 10213–10214.
89. Brunner, J., Senn, H., & Richards, F.M. (1980) J. Biol. Chem. 255, 3313–3318.
90. Hexter, C.S., & Westheimer, F.H. (1971) J. Biol. Chem. 246, 3934–3938.
91. Hexter, C.S., & Westheimer, F.H. (1971) J. Biol. Chem. 246, 3928–3933.
92. Vaughan, R.J., & Westheimer, F.H. (1979) J. Am. Chem. Soc. 21, 217–218.
93. Brunner, J., & Richards, F.M. (1980) J. Biol. Chem. 255, 3319–3329.
94. Ross, A.H., Radhakrishnan, R., Robson, R.J., & Khorana, H.G. (1982) J. Biol. Chem. 257, 4152–4161.
95. Shafer, J., Baronowsky, P., Laursen, R., Finn, F., & Westheimer, F.H. (1966) J. Biol. Chem. 241, 421–427.
96. Blatter, E.E., Ebright, Y.W., & Ebright, R.H. (1992) Nature 359, 650–652.
97. Cremo, C.R., Grammer, J.C., & Yount, R.G. (1988) Biochemistry 27, 8415–8420.
98. Ko, Y.H., Cremo, C.R., & McFadden, B.A. (1992) J. Biol. Chem. 267, 91–95.
99. Hoyer, D., Cho, H., & Schultz, P.G. (1990) J. Am. Chem. Soc. 112, 3249–3250.
100. Schepartz, A., & Cuenoud, B. (1990) J. Am. Chem. Soc. 112, 3247–3249.
101. Rana, T.M., & Meares, C.F. (1990) J. Am. Chem. Soc. 112, 2457–2458.
102. Gallagher, J., Zelenko, O., Walts, A.D., & Sigman, D.S. (1998) Biochemistry 37, 2096–2104.
103. Ettner, N., Metzger, J.W., Lederer, T., Hulmes, J.D., Kisker, C., Hinrichs, W., Ellestad, G.A., & Hillen, W. (1995) Biochemistry 34, 22–31.
104. Ue, K., Muhlrad, A., Edmonds, C.G., Bivin, D., Clark, A., Piechowski, W.V., & Morales, M.F. (1992) Eur. J. Biochem. 203, 493–498.
105. Hutchison, C.A., III, Phillips, S., Edgell, M.H., Gillam, S., Jahnke, P., & Smith, M. (1978) J. Biol. Chem. 253, 6551–6560.
106. Zoller, M.J., & Smith, M. (1982) Nucleic Acids Res. 10, 6487–6500.
107. Dwyer, B.P. (1988) Biochemistry 27, 5586–5592.
108. Dwyer, B.P. (1991) Biochemistry 30, 4105–4112.
109. Xu, K.Y. (1989) Biochemistry 28, 6894–6899.
110. Erickson, H.K. (2000) Biochemistry 39, 9241–9250.
111. Walter, G., Scheidtmann, K.H., Carbone, A., Laudano, A.P., & Doolittle, R.F. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 5197–5200.
112. Wilchek, M., Bocchini, V., Becker, M., & Givol, D. (1971) Biochemistry 10, 2828–2834.
113. Kyte, J., Xu, K.Y., & Bayer, R. (1987) Biochemistry 26, 8350–8360.
114. Thibault, D. (1993) Biochemistry 32, 2813–2821.
115. Castellino, F.J., & Hill, R.L. (1970) J. Biol. Chem. 245, 417–424.
116. Parente, A., Merrifield, B., Geraci, G., & D’Alessio, G. (1985) Biochemistry 24, 1098–1104.
117. Xu, K.Y., & Kyte, J. (1989) Biochemistry 28, 3009–3017.
118. Toyoshima, C., Nakasako, M., Nomura, H., & Ogawa, H. (2000) Nature 405, 647–655.
87. Westerman, J., Wirtz, K.W., Berkhout, T., van Deenen,
L.L., Radhakrishnan, R., & Khorana, H.G. (1983) Eur. J. 119. Hanai, R., & Wang, J.C. (1994) Proc. Natl. Acad. Sci.
Biochem. 132, 441–449. U.S.A. 91, 11904–11908.
88. Morgan, S., Jackson, J.E., & Platz, M.S. (1991) J. Am. 120. Inoue, M., Hiratake, J., Suzuki, H., Kumagai, H., &
Chem. Soc. 113, 2782–2783. Sakata, K. (2000) Biochemistry 39, 7764–7771.
554 Chemical Probes of Structure
121. Tsubaki, M., Kobayashi, K., Ichise, T., Takeuchi, F., & 142. Allard, J., Grochulski, P., & Sygusch, J. (2001) Proc. Natl.
Tagawa, S. (2000) Biochemistry 39, 3276–3284. Acad. Sci. U.S.A. 98, 3679–3684.
122. Light, A., Frater, R., Kimmel, J.R., & Smith, E.L. (1964) 143. Sutoh, K. (1982) Biochemistry 21, 3654–3661.
Proc. Natl. Acad. Sci. U.S.A. 52, 1276–1283. 144. Kabsch, W., Mannherz, H.G., Suck, D., Pai, E.F., &
123. Chaiken, I.M., & Smith, E.L. (1969) J. Biol. Chem. 244, Holmes, K.C. (1990) Nature 347, 37–44.
5087–5094. 145. Cai, K., Klein-Seetharaman, J., Altenbach, C., Hubbell,
124. Kannan, K.K., Notstrand, B., Frideborg, K., Leovgren, W.L., & Khorana, H.G. (2001) Biochemistry 40,
S., Ohlsson, A., & Petef, M. (1975) Proc. Natl. Acad. Sci. 12479–12485.
U.S.A. 72, 51–55. 146. Jue, R., Lambert, J.M., Pierce, L.R., & Traut, R.R. (1978)
125. Abita, J.P., Maroux, S., Delaage, M., & Lazdunski, M. Biochemistry 17, 5399–5406.
(1969) FEBS Lett. 4, 203–206. 147. Clegg, C., & Hayes, D. (1974) Eur. J. Biochem. 42, 21–28.
126. Freer, S.T., Kraut, J., Robertus, J.D., Wright, H.T., & 148. Expert-Bezancon, A., Barritault, D., Milet, M., Guerin,
Xuong, N.H. (1970) Biochemistry 9, 1997–2009. M.F., & Hayes, D.H. (1977) J. Mol. Biol. 112, 603–629.
127. Lambert, J.M., Perham, R.N., & Coggins, J.R. (1977) 149. Lutter, L.C., Bode, U., Kurland, C.G., & Stoffler, G.
Biochem. J. 161, 63–71. (1974) Mol. Gen. Genet. 129, 167–176.
128. Pabo, C.O., & Lewis, M. (1982) Nature 298, 443–447. 150. Chang, F.N., & Flaks, J.G. (1972) J. Mol. Biol. 68,
129. Reidhaar-Olson, J.F., & Sauer, R.T. (1988) Science 241, 177–180.
53–57. 151. Lutter, L.C., Zeichhardt, H., & Kurland, C.G. (1972) Mol.
130. Hugli, T.E. (1973) J. Biol. Chem. 248, 1712–1718. Gen. Genet. 119, 357–366.
131. Oefner, C., & Suck, D. (1986) J. Mol. Biol. 192, 605–632. 152. Shih, C.Y., & Craven, G.R. (1973) J. Mol. Biol. 78,
132. Lu, R.C., & Szilagyi, L. (1981) Biochemistry 20, 651–663.
5914–5919. 153. Lutter, L.C., Kurland, C.G., & Stoffler, G. (1975) FEBS
133. Hegyi, G., Premecz, G., Sain, B., & Muhlrad, A. (1974) Lett. 54, 144–150.
Eur. J. Biochem. 44, 7–12. 154. Peretz, H., Towbin, H., & Elson, D. (1976) Eur. J.
134. Burtnick, L.D. (1984) Biochim. Biophys. Acta 791, Biochem. 63, 83–92.
57–62. 155. Sommer, A., & Traut, R.R. (1976) J. Mol. Biol. 106,
135. Holmes, K.C., Popp, D., Gebhard, W., & Kabsch, W. 995–1015.
(1990) Nature 347, 44–49. 156. Wimberly, B.T., Brodersen, D.E., Clemons, W.M., Jr.,
136. Traviglia, S.L., Datwyler, S.A., Yan, D., Ishihama, A., & Morgan-Warren, R.J., Carter, A.P., Vonrhein, C.,
Meares, C.F. (1999) Biochemistry 38, 15774–15778. Hartsch, T., & Ramakrishnan, V. (2000) Nature 407,
137. Greiner, D.P., Hughes, K.A., Gunasekera, A.H., & 327–339.
Meares, C.F. (1996) Proc. Natl. Acad. Sci. U.S.A. 93, 157. Pioletti, M., Schlunzen, F., Harms, J., Zarivach, R.,
71–75. Gluhmann, M., Avila, H., Bashan, A., Bartels, H.,
138. Spackman, D.H., Stein, W.H., & Moore, S. (1960) J. Biol. Auerbach, T., Jacobi, C., Hartsch, T., Yonath, A., &
Chem. 235, 648–659. Franceschi, F. (2001) EMBO J. 20, 1829–1839.
139. Mitra, S., & Lawton, R.G. (1979) J. Am. Chem. Soc. 101, 158. Sinz, A., & Wang, K. (2001) Biochemistry 40, 7903–7913.
3097–3110. 159. Crestfield, A.M., Stein, W.H., & Moore, S. (1963) J. Biol.
140. Meloche, H.P. (1973) J. Biol. Chem. 248, 6945–6951. Chem. 238, 2413–2419.
141. Suzuki, N., & Wood, W.A. (1980) J. Biol. Chem. 255, 160. Chaiken, I.M., & Smith, E.L. (1969) J. Biol. Chem. 244,
3427–3435. 4247–4250.
Chapter 11
Immunochemical Probes of Structure
Immunoglobulins are proteins found, among other locations, in the blood serum of birds and mammals. Immunoglobulins are also called antibodies. In an animal, the function of an immunoglobulin is to recognize a foreign macromolecule, the antigen, by binding to it tightly. An antigen is any foreign macromolecule that elicits, upon its introduction into an animal, the production of immunoglobulins capable of binding to it with high affinity. Within the animal, when an antigen has been recognized by being bound to the immunoglobulin, it is usually destroyed. An important point which should be kept in mind is that the primary biological purpose for a particular immunoglobulin is to distinguish one particular foreign, undesirable macromolecule from the myriad of necessary macromolecules indigenous to the animal. Whenever an immunoglobulin makes a mistake by recognizing and binding not only to its antigen but also to one or more of the macromolecules normally present in the animal, these indigenous macromolecules are also destroyed in autoimmune processes detrimental to the animal. Therefore, the immune system has evolved to produce immunoglobulins that are highly specific in their recognition of molecular structure. Almost any macromolecule can serve as an antigen, but proteins are the most common antigens.

Because no predictions can be made as to what foreign antigens will have to be recognized and destroyed during the life of the animal, the immune system must be prepared to make immunoglobulins capable of binding with high specificity to any foreign molecule when it is presented to the animal in an antigenic form. An extreme example of the ability of the immune system to produce immunoglobulins able to bind any foreign molecule is the production of immunoglobulins that bind C₆₀ fullerene,1 a form of elemental carbon that is not encountered in any natural setting and that does not resemble any natural antigen.

The serum from any mammal or bird contains a wide variety of immunoglobulins, each with its own distinct amino acid sequence and each present in its own distinct concentration. They are the immunoglobulins that have been produced in response to all of the foreign antigens encountered by that particular individual during its peculiar lifetime. This mixture of immunoglobulins is present in the serum at a total concentration of 10–20 mg mL⁻¹.

The immune system of an animal can be stimulated to produce new immunoglobulins recognizing a particular protein of interest by injecting that protein into the animal. Because the repertoire of immunoglobulins that an animal is capable of producing contains none that would recognize any of its own proteins, or they would be destroyed, the protein injected has to be from another species and different enough from any related indigenous protein to be recognized as foreign. Because the oligosaccharides found on the proteins of animals are so similar and because most species contain every possible sequence of these common oligosaccharides as a result of microheterogeneity, an immunoglobulin specific for an oligosaccharide on the protein from an animal will rarely be produced. There are no such problems, however, in producing immunoglobulins in animals that recognize bacterial oligosaccharides.2 Because the immune system has evolved to recognize and destroy foreign organisms such as viruses and bacteria, the surfaces of which are large aggregates of many subunits or many different proteins, small proteins sometimes have to be covalently cross-linked to make them antigenic.3 If the immunization is successful, immunoglobulins that bind tightly to the protein that was injected appear at high concentration in the serum of the animal within two months.

The paradigm of the various types of immunoglobulins found in serum is immunoglobulin G (Figures 7–13 and 11–1).4–6 An immunoglobulin G is composed of two identical heavy α polypeptides (nₐₐ = 440–450) and two identical light β polypeptides (nₐₐ = 210–220).5 Each heavy α polypeptide is folded into four internally repeating, superposable domains designated VH, CH1, CH2, and CH3 in the order in which they occur in the sequence of the protein. Each light β polypeptide is folded into two internally repeating, superposable domains, VL and CL. Each of these six different domains, each approximately 110 amino acids in length and each present in two copies in the intact immunoglobulin G, is superposable in its folded form on each of the other five (Figure 7–13). The VH and VL domains associate with each other, the CL and CH1 domains associate with each other, and these associations produce an αβ heterodimer of one heavy α subunit and one light β subunit. Two of these αβ heterodimers are associated through their CH2 and CH3 domains to form the entire immunoglobulin.

An immunoglobulin G can be cut into three pieces. When the intact native protein is treated with papain,7,8 cleavage occurs to the amino-terminal side of the cystines connecting the two heavy α polypeptides within the open, structureless segments (Figure 11–1), and two identical Fab fragments and one Fc fragment are produced. The designation Fab arises from the fact that this fragment contains the site that binds the antigen. The designation Fc originally referred to the fact that this fragment could be crystallized. It is now more informative and consistent to consider this the constant fragment. It is because it is constant that it can crystallize. Each of these fragments is a well-behaved, independent, soluble, globular protein. Each contains four of the original 12 internally repeating, superposable domains.

The advantage of using Fab fragments in experiments is that they are univalent. Each Fab fragment contains only one binding site for the antigen. An intact molecule of immunoglobulin G is necessarily bivalent because, as an (αβ)₂ homodimer, it must have two identical binding sites for antigen (Figure 11–1). The fact that two antigens can be bound by intact immunoglobulin G complicates some experiments. A bivalent analogue of the Fab fragment can be produced by digesting intact immunoglobulin G with pepsin.9,10 The pepsin cleaves to the carboxy-terminal side of the cystines between the two heavy α subunits and produces a fragment, (Fab′)₂, containing two Fab fragments joined by two or more cystines. The advantage of an (Fab′)₂ fragment is that when it is reduced with dithiothreitol or 2-mercaptoethanol, it dissociates into two monovalent Fab′ fragments the CH1 domains of which are slightly longer than those of an Fab fragment. A permanently bivalent fragment that is the same size as an Fab fragment can be made by fusing the cDNAs encoding the VL and VH domains of a particular immunoglobulin. If the two cDNAs are connected to each other by a segment of DNA encoding a segment of flexible polypeptide too short to permit the intramolecular association of the domains in the expressed protein, they associate intermolecularly to form an antiparallel (VL-VH)₂ homodimer with two identical sites for binding antigen.11
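
The bookkeeping of domains and binding sites in immunoglobulin G and its papain fragments can be checked with a short sketch. It is only an illustration: the Python names (HEAVY, LIGHT, count_domains, valence) are invented here and do not come from the text, the hinge segments are ignored, and the valence is simply assumed to equal the number of paired VH and VL domains that a fragment contains.

```python
from collections import Counter

# Domain order along each polypeptide, as given in the text.
HEAVY = ["VH", "CH1", "CH2", "CH3"]
LIGHT = ["VL", "CL"]
DOMAIN_LENGTH = 110  # approximate number of amino acids per domain


def count_domains(chains):
    """Tally the domains in a list of chains (each chain is a list of names)."""
    tally = Counter()
    for chain in chains:
        tally.update(chain)
    return tally


def valence(tally):
    """Assumed rule: one antigen-binding site per paired VH and VL domain."""
    return min(tally["VH"], tally["VL"])


igg = count_domains([HEAVY, HEAVY, LIGHT, LIGHT])
fab = count_domains([HEAVY[:2], LIGHT])      # VH and CH1 plus a whole light chain
fc = count_domains([HEAVY[2:], HEAVY[2:]])   # CH2 and CH3 from both heavy chains

# Papain: one immunoglobulin G gives two Fab fragments and one Fc fragment.
assert fab + fab + fc == igg

print(sum(igg.values()), sum(fab.values()), sum(fc.values()))   # 12 4 4
print(valence(igg), valence(fab), valence(fc))                  # 2 1 0
print(len(HEAVY) * DOMAIN_LENGTH, len(LIGHT) * DOMAIN_LENGTH)   # 440 220
```

The tallies reproduce the 12 domains of the intact protein and the four domains in each fragment, and the estimated lengths of 440 and 220 amino acids are consistent with the ranges quoted for the heavy α and light β polypeptides. Under the same assumed rule, an (Fab′)₂ fragment would be bivalent and each Fab′ fragment univalent.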
The flexible, unsupported segments of polypeptide that connect the Fab portions to the Fc portion of an intact immunoglobulin G (Figure 11–1) are usually about 20 aa long but can be as long as 70 aa.12 These are its hinges. It is the open structure of these hinges that permits the papain or the pepsin to cleave the immunoglobulin G into its fragments. Because these hinges are so long, the two Fab portions in an intact immunoglobulin G are constantly moving relative to each other and relative to the Fc portion. When these segments are shortened sufficiently by site-directed mutation, the immunoglobulin becomes rigid.13

The major immunoglobulins in the serum of a mammal are immunoglobulins G (10–20 mg mL⁻¹), immunoglobulins M (1 mg mL⁻¹), and immunoglobulins A (1 mg mL⁻¹). Each of these immunoglobulins contains light β subunits that are indistinguishable from one type to the next. It is the heavy α subunits, always present in equimolar ratio to the light β subunits, that distinguish one type of immunoglobulin14 from the other. The heavy α subunits of all of these immunoglobulins are homologous to each other over the first three domains. It is in the peripheral portions of their Fc segments that they differ.

Immunoglobulin M has a longer heavy α polypeptide (nₐₐ = 570–580) than the one in immunoglobulin G by one extra domain, Cμ4, which would be the analogue of CH4, if CH4 existed. Immunoglobulin M is a pentameric complex of five (αβ)₂ heterotetramers held together by cystines among themselves. The cystines cross-link pairs of Cμ3 domains to form a pentameric ring of the heterotetramers.

Immunoglobulin A has a heavy α polypeptide only about 30 amino acids (nₐₐ = 470–480)15 longer than that of immunoglobulin G. Immunoglobulin A is a mixture of monomeric (αβ)₂ heterotetramers similar to those of immunoglobulin G and higher oligomers of (αβ)₂ heterotetramers held together by cystines between their Cα3 domains.16

Both immunoglobulins M and A have a short polypeptide J associated with them that may promote their initial oligomerization even before the intertetrameric cystines are formed.16 Immunoglobulins M and A have their binding sites for antigens in a similar location to those on immunoglobulins G and distant from the regions (Cμ3, Cμ4, and Cα3) that account for their distinct oligomeric structures. The only significant difference between these types of immunoglobulins and immunoglobulins G is their size and, hence, their valence. Monovalent Fab fragments can be produced from each.14,15,17 Although the injection of an antigen usually stimulates the production of immunoglobulins G, it is not unusual for it to stimulate production of the other types, either instead of or along with immunoglobulins G.

An immunoglobulin of a particular sequence is produced by a colony of lymphocytes, all derived from one single cell that was initially stimulated to divide and manufacture. All members of the colony secrete immunoglobulins with identical α polypeptides and identical β polypeptides. The colony assumes its identity from its pedigree and not from its situation. All of the lymphocytes in the colony are descendants of the same cell, but each member of the colony, like all other lymphocytes, is dispersed by the bloodstream and lymphatic system and wanders independently and at random through the animal as it continuously manufactures its particular immunoglobulin. The sole product of the members of a particular colony is this one immunoglobulin continuously released into the serum and extracellular fluid. Each time a lymphocyte is stimulated to divide and manufacture its particular immunoglobulin against a particular antigen, a new colony is established. The particular amino acid sequences of the two subunits of the immunoglobulin produced by a particular colony confer the ability to bind a particular antigen.

Because many (10–100) different lymphocytes are stimulated to divide and manufacture by molecules of the same antigen, many different colonies continuously produce immunoglobulins after exposure of an animal to a particular antigen. Each of these immunoglobulins has a different amino acid sequence, but all are specific for that one antigen. They differ, however, in the location on the surface of the antigen that they recognize and to which they bind and in the strength with which they bind to that location, as reflected in their individual dissociation constants for the binding of antigen. Such a set of immunoglobulins, each capable of recognizing the same antigen but each different from the others, is referred to as a polyclonal set. The product of the reaction of an intact animal to an antigen is always a polyclonal set of immunoglobulins, which are present as a complex mixture in the serum of the animal.

In a normal animal, the various colonies are stable contributors to the mixture of immunoglobulins in the serum necessary to deal with antigens in the environment. Occasionally, the controls maintaining the stable population of the colony fail, and one lymphocyte begins to multiply malignantly. This uncontrolled cancerous growth causes an enormous increase in the number of lymphocytes producing an immunoglobulin of just one unique sequence and structure. Such a cancer is referred to as a myeloma. The serum of such individuals contains high concentrations of only one type of immunoglobulin. Such myeloma proteins are present in sufficient quantities to be purified, sequenced,5 and crystallized.4 Myeloma proteins appear by chance as the products of the random malignant transformation of normal lymphocytes, and the antigens to which most myelomas are directed are unknown.

The disadvantage of a myeloma protein is that the investigator cannot choose the antigen against which it is directed. Its advantage is that it can be purified to homogeneity, and the purified protein is necessarily composed of identical copies of the same molecule, each with an identical ability to recognize the antigen. The advantage of the ability to select the antigen and the advantage of the homogeneity of the product have been combined in the production of monoclonal immunoglobulins.18 In this procedure, lymphocytes from the spleen of a mouse that has been immunized with the antigen of interest are fused with cultured murine myeloma cells that normally secrete a particular myeloma protein or, even better, myeloma cells that have lost their incitement to secrete an immunoglobulin. These cultured myeloma cells are immortal cell lines that were originally derived from a myeloma in a mouse and that continuously grow and divide either in flasks in an incubator or as solid tumors in mice. Hybrids, each produced by the fusion of one lymphocyte and one myeloma cell, are selected on the basis of their ability to grow on a particular medium. The hybrids are then reproduced in dishes as single colonies of cells.

Because each colony in the dish arose from one single cell, the cells in a particular colony produce only the immunoglobulin originally secreted by the parental lymphocyte that fused to the myeloma cell and the myeloma protein originally secreted by that myeloma cell. Therefore, each colony is the offspring of only one lymphocyte in the mouse from which the spleen was taken. If that lymphocyte happened to be producing one of the immunoglobulins directed against the antigen originally injected into the mouse, its offspring can be cultivated for the production of a homogeneous monoclonal immunoglobulin recognizing that antigen. To identify the colonies secreting monoclonal immunoglobulins against the antigen of interest, each colony is individually screened. When a colony producing a monoclonal immunoglobulin that has the desired specificity has been identified by the screen, it is expanded either in culture or in an animal so that significant amounts of that monoclonal immunoglobulin can be produced and purified.

Although a few intact myeloma proteins4 and intact monoclonal immunoglobulins (Figure 11–1)19 have been crystallized and submitted to crystallographic analysis, most of the crystallographic molecular models containing complexes with antigens are those of Fab fragments.20,21 The sites on the surface of an intact immunoglobulin that bind tightly to the two respective copies of the antigen are at the far ends of the Fab arms (Figure 11–1), at the tip of the portion of the intact molecule formed from the association of a VH domain and a VL domain. The Fab fragment retains this site in its entirety, and it is on the opposite end of the fragment from the carboxy-terminal point of cleavage that produced the Fab fragment. An example of a complex between an Fab fragment and its antigen is that between lysozyme from Gallus gallus and the Fab fragment of a murine monoclonal immunoglobulin (Figure 11–2).20,22

In the complex between an immunoglobulin and its antigen, the single site on the Fab fragment for binding the antigen is formed from six loops of random meander, three from each polypeptide, heavy and light. These loops are the complementarity-determining regions of the structure. Each of these six loops is one of the connections between two of the strands of the antiparallel β structure that form the superstructure of the core of the respective domains (Figure 11–2).

The amino acid sequences in the loops of these six complementarity-determining regions show remarkable variation among the different immunoglobulins, and they are referred to as the hypervariable regions of the sequences.23 It is this variety of amino acid sequence that gives the immunoglobulins as a class their ability to provide individual proteins each tailored to bind a particular antigen. The specific sequences in these loops define the specificity of the particular immunoglobulin. Both the specific amino acid sequence and the lengths of these loops differ among the various immunoglobulins. The variations in sequence, as well as the variations in length, cause both the structure of the polypeptide forming these loops and the distribution of functional groups over the surface formed by these loops to differ dramatically from one immunoglobulin to the next.24 As an illustration of the opportunism of biological processes, it is also possible for a sequence that dictates the glycosylation of an asparagine to be found in a complementarity-determining region and for the oligosaccharide attached to that asparagine to contribute favorably to the binding of the antigen.25 It is these variations in structure and chemical character that produce the array of potential specificities at the site formed by the six loops. In contrast to this wild variation, the structures of the cores of the VH domain and the VL domain remain constant because the amino acid sequences forming these cores are well conserved.26

Each immunoglobulin that appears in response to an antigen possesses a binding site that is formed by the complementarity-determining regions and that binds an epitope on the antigen. An epitope is the region on the antigenic protein that interacts directly with the binding site on the immunoglobulin. An antigen can have one or many epitopes, and each epitope elicits many different colonies each producing a different immunoglobulin recognizing that epitope and binding to it with its own particular dissociation constant. Each epitope is one of the regions on the antigen that induced the reproduction of the members of the colony of lymphocytes and their production of that particular immunoglobulin. Usually, an epitope is one or two short sequences of amino acids plus two or three other side chains in the antigen that are all adjacent to each other on its surface and that associate specifically with the surface formed by the six complementarity-determining regions. The epitope and the binding site on the immunoglobulin combine noncovalently as if they were two faces forming a heterologous interface in a heterooligomeric protein.

In the crystallographic molecular model of the complex between lysozyme and the Fab fragment
Figure 11–2: Crystallographic molecular model of lysozyme from G. gallus bound to the Fab fragment of a murine monoclonal immunoglobulin G.20,22 A complex was formed between lysozyme and the Fab fragment of murine monoclonal immunoglobulin D1.3, and the complex was crystallized from 20% poly(ethylene glycol). A crystallographic molecular model was prepared from the map of electron density. (A) Skeletal drawing of the polypeptide backbones of the complete crystallographic molecular model. The Fab fragment (lower two-thirds of the drawing) is drawn so that the heavy α subunit (thick bonds) and light β subunit (thin bonds) include both domains from each subunit (VH and CH1 and VL and CL, respectively). The lysozyme (bonds of intermediate thickness) is at the top of the structure. The two epitopic loops of random meander in lysozyme that are in contact with the complementarity-determining loops of the immunoglobulin are Aspartate 18 to Asparagine 27 and Lysine 116 to Leucine 129. (B) Detailed view of the interface between these two loops from the lysozyme (thick bonds at the top) and the six loops of the complementarity-determining regions, three from each subunit of the Fab fragment (thinner bonds). Each complementarity-determining region and the segments from each of its ends that anchor it in the two β sheets of the respective domain of the immunoglobulin are included in the drawing. The two ends of each of the eight segments of polypeptide represented in the figure, two from the lysozyme (Lys), three from the heavy α subunit (H), and three from the light β subunit (L), are labeled by their position in the respective amino acid sequence. Side chains (thinnest bonds) within or adjacent to the interface are included in the drawing. Hydrogen bonds are dashed lines. Glutamine 121 at the center of the interface between the lysozyme and the Fab fragment is identified. These drawings were produced with MolScript.103

VH domain.28
Table 11–1 Amino Acids within the Interface between a Murine Monoclonal Immunoglobulin G and Its Antigen, Micrococcal Nucleaseᵃ

ᵃThe crystallographic molecular model was that for the complex between the Fab fragment of murine monoclonal immunoglobulin N10 and micrococcal nuclease from S. aureus.27
ᵇComplementarity-determining region. Those from the heavy α polypeptide of the immunoglobulin are designated H and those from the light β polypeptide are designated L. Numbering is from the amino terminus.
ᶜAmino acids from the indicated loops of the complementarity-determining regions that contact the particular amino acid on the surface of micrococcal nuclease.
ᵈvan der Waals contacts.
ᵉHydrogen bond between two neutral atoms or between one charged atom and one neutral atom.
ᶠHydrogen bond between oppositely charged atoms.
ᵍDonor or acceptor where charge is ambiguous.
in other examples of heterologous associations.30 Usually 5–10 nm² of accessible surface area from both antigen and immunoglobulin is buried and 5–20 hydrogen bonds form across the interface (Table 11–1), with donors and acceptors from both side chains and backbone. Normally, there are few ionized hydrogen bonds (Table 11–1),29 but in the interface between cytochrome c and murine monoclonal immunoglobulin E8 there are five.31

Waters are also incorporated as structural elements into these interfaces. For example, in the interface within the crystallographic molecular model of equine cytochrome c and the Fab fragment of murine monoclonal immunoglobulin E8,31 there are 38 positions occupied by molecules of water,32 16 of which are present at the same locations in crystallographic molecular models of either the uncomplexed antigen or uncomplexed Fab fragment or in both, and these locations are simply incorporated into the interface within their respective faces. Eight of these waters bridge the two proteins. In the interface between lysozyme from G. gallus and the murine Fab fragment HyHEL-63,33 there are also 38 positions occupied by molecules of water,32 14 of which are present at the same locations in uncomplexed antigen and uncomplexed Fab fragment, and eight of these bridge the two proteins; and in the interface between human tissue factor and the murine Fab fragment D3h44 there are 46 positions occupied by molecules of water,32 23 of which are incorporated as structural elements of the uncomplexed antigen and uncomplexed Fab fragment, and 19 of these bridge the two structures.

The central and most critical amino acid in the epitope on lysozyme recognized by murine monoclonal immunoglobulin D1.3 (Figure 11–2B) is Glutamine 121, the side chain of which occupies a distinct hole among the six loops of the complementarity-determining regions on the surface of the Fab fragment (Figure 11–2B). Each of the three amino acids lining the hole for Glutamine 121 is from a different complementarity-determining loop, one from the heavy α subunit and two from the light β subunit, and this places the hole in the very center of the binding site on the Fab fragment. If Glutamine 121 is replaced by either a histidine or an asparagine by site-directed mutation, the antigen is no longer bound by the immunoglobulin. In the normal structure of lysozyme, Glutamine 121 is fully exposed to the solvent.

It is often the case that an epitope seems to be focused on a particular amino acid on the surface of a
protein. For example, about 30–40% of the polyclonal immunoglobulins raised to human cytochrome c fail to bind to the cytochrome c from Macaca mulatta, which differs from the human protein only by the replacement of Isoleucine 58 by a threonine.34 These immunoglobulins that fail to recognize cytochrome c from M. mulatta do, however, recognize cytochrome c from Macropus canguru that differs from the human at several other locations but does contain Isoleucine 58. No immunoglobulins raised to the cytochrome c from M. mulatta failed to recognize the cytochrome c from the human, and this result suggests that when a cytochrome c contains a threonine at position 58, as does the protein from M. mulatta, this region on the external surface35 is not antigenic.34 The impression left by these observations is that Isoleucine 58 is the key amino acid in this epitope, as is Glutamine 121 in the epitope of lysozyme.
In the crystallographic molecular model of the complex between lysozyme and murine monoclonal immunoglobulin D1.3, the structure of the lysozyme is identical, within the error of the models, with its structure in the absence of the immunoglobulin,20 and the structures of both the VL and the VH domains of the Fab fragment are also identical, within the errors of the models with their structures in the uncomplexed Fab fragment,36 even though the two domains have shifted slightly relative to each other by 0.1 nm. In this instance, formation of the complex between antigen and immunoglobulin is simply the docking of two complementary faces. In the crystallographic molecular model of the complex between human tissue factor and the Fab fragment of murine monoclonal immunoglobulin D3h44, “conformational changes upon formation of the complex are very small and almost exclusively limited to the reorientation of side-chains”.32
Usually, however, the conformations of both antigen and immunoglobulin change noticeably as the complex is formed.31 The relative orientations of the VH domain and VL domain can shift by 5–10°.21 One or more of the loops of the complementarity-determining regions can be reconfigured37 or can pivot so that their tips move as much as 1.0 nm.38 Side chains on both antigen and immunoglobulin often reorient by rotating around their carbon–carbon bonds.27,31 Flexible strands of polypeptide on the surface of the antigen readily rearrange upon formation of the complex.21 Most of these changes, however, are small ones, and the surface formed by the six complementarity-determining regions in the uncomplexed immunoglobulin has a shape that is already roughly complementary to the surface of the uncomplexed epitope,31 so that only a few readjustments are required upon formation of the complex.
The binding site on an immunoglobulin is usually a flat surface or a depression on the surface of a globular protein formed from the VH and VL domains (Figure 11–2). Viruses appear to take advantage of this feature of the site. The surface of picornaviruses such as rhinoviruses and polioviruses are highly irregular. They are furnished with bosses at the 5-fold rotational axes of symmetry of the icosahedral shell. These bosses are separated from each other by deep depressions on the surface of the virus. The epitopes on polioviruses are located mainly on the bosses themselves and a few small protruding segments of polypeptide.39 It is believed that the crucial regions of the surface of rhinoviruses that allow them to produce an upper respiratory infection are located in the depressions between the bosses.40 These locations would be inaccessible to immunoglobulins. Each time the epitopes on the bosses or the smaller protrusions of rhinoviruses sustain a sufficient number of mutations to escape recognition, an antigenically novel but still infectious rhinovirus arises. The actual machinery of infection, lying as it does within the depressions, would be protected from being recognized by any immunoglobulin.
This strategy depends on the fact that most antigen binding sites are themselves flat or concave. On the murine monoclonal immunoglobulin HyHEL-5, however, Tyrosine 33 and Tyrosine 53 from complementarity-determining regions 1 and 2, respectively, of the VH domain form a protrusion that juts out of the almost flat surface that constitutes the binding site for lysozyme, its antigen. In the complex between this immunoglobulin and its antigen, this protrusion fits into a deep groove on the surface of the protein that forms the active site of the enzyme.41 Consequently, immunoglobulins do not do so often but they can recognize their antigens by protruding into their structure instead of surrounding a protrusion on their surface.
The interface between antigen and immunoglobulin seen in the crystallographic molecular model21 of the neuraminidase from influenza virus and the Fab fragment from murine monoclonal immunoglobulin NC41 is formed from at least four juxtaposed strands of polypeptide from distant segments of the amino acid sequence of the neuraminidase. When amino acids in any one of three of these four strands are mutated, the neuraminidase can no longer bind to the immunoglobulin. When amino acids in the fourth strand are mutated, the affinity of the binding is noticeably diminished. All of these results indicate that the epitope on this protein for this immunoglobulin is a large region on the surface comprising all four of these strands. It seems an inescapable conclusion that this epitope would cease to exist if the protein were to unfold in this region. Such an epitope is a conformationally specific epitope.
Many immunoglobulins, however, will recognize their antigens even when the antigenic protein is no longer in its native structure. These are sequence-specific immunoglobulins. The paradigm of this class would be an immunoglobulin that, when covalently coupled to a stationary phase, can be used to isolate by affinity adsorption a peptide comprising its epitope from a digest of its antigen;42 such an immunoglobulin can recognize its epitope even when it is a formless peptide.
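The isolation of an epitope-bearing peptide from a digest can be sketched computationally. The example below is only an illustration under stated assumptions: it borrows the segment –SAIEGVKYIAEHM– of acetylcholine receptor and the carboxy-terminal epitope –YIAE that are discussed later in this chapter, it assumes that glutamyl endopeptidase cleaves only on the carboxy side of glutamate, and the function names `digest_after` and `adsorbed_peptides` are hypothetical.

```python
def digest_after(sequence, residues="E"):
    """Cleave on the carboxy side of every listed residue.

    Assumption: an idealized, complete digest with no missed cleavages.
    Returns the peptides in order from amino to carboxy terminus.
    """
    peptides = []
    start = 0
    for index, amino_acid in enumerate(sequence):
        if amino_acid in residues:
            peptides.append(sequence[start:index + 1])
            start = index + 1
    if start < len(sequence):
        # Whatever follows the last cleavage site is the final peptide.
        peptides.append(sequence[start:])
    return peptides


def adsorbed_peptides(peptides, epitope):
    """Peptides whose carboxy terminus is the epitope (hypothetical filter)."""
    return [peptide for peptide in peptides if peptide.endswith(epitope)]


# Segment around Lysine 380 quoted in the text (one-letter code).
segment = "SAIEGVKYIAEHM"
digest = digest_after(segment, residues="E")
retained = adsorbed_peptides(digest, "YIAE")

print("digest:", digest)
print("retained by an adsorbent specific for -YIAE:", retained)
```

Under these assumptions the only peptide retained is GVKYIAE, the peptide that the text reports was adsorbed and eluted; a real digest would also contain peptides from the rest of the protein and products of incomplete cleavage.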
The ability of an immunoglobulin to recognize the short peptides from an endopeptolytic digest of its antigen43 or short synthetic peptides with sequences of amino acids from its antigen44,45 is used routinely, in the absence of a crystallographic molecular model of the complex between intact antigen and immunoglobulin, to identify the epitope on the antigen recognized by the immunoglobulin. In fact, when crystals of a complex between Fab fragment and intact antigen cannot be made, a crystallographic molecular model of a peptide from the antigen bound within the binding site on the immunoglobulin is assumed to depict accurately at least a portion if not all of the structure of the interface in the complex with the intact antigen.46
Usually the class of sequence-specific immunoglobulins is distinguished from the class of conformationally specific immunoglobulins. Although the monoclonal immunoglobulin NC41 specific for viral neuraminidase that was used in the crystallographic studies seems to be an obvious member of the latter class, the evidence for such conformationally specific immunoglobulins is often anecdotal. A protein is irreversibly unfolded and loses its antigenic properties. To unfold a polypeptide irreversibly, however, it is usually either covalently modified47 or noncovalently and intermolecularly polymerized by aggregation. Either the epitope could be covalently modified or it could be sterically sequestered within an aggregate of the protein during such uncontrolled reactions. If either of these events occurs, it appears as if the immunoglobulin were conformationally specific and recognized only the native structure of the protein when actually it was sequence-specific and recognized a single linear sequence of amino acids that was simply covalently modified or inaccessible. The technical difficulty is to unfold the polypeptide of the antigen to a monodisperse random polymer without doing the same thing to the immunoglobulin, which is also a folded polypeptide. Digestion of the antigen accomplishes this goal by producing structureless peptides, but if the immunoglobulin fails to recognize any of these peptides it could simply be the case that cleavage has occurred within the epitope.
It is probably the case that the distinction between sequence-specific and conformationally specific immunoglobulins is one of degree. For example, a polyclonal set of immunoglobulins was raised against native micrococcal nuclease from S. aureus (n_aa = 149) and purified on the basis of its ability to recognize a fragment of the intact polypeptide comprising amino acids 99–149. These immunoglobulins could bind the intact, folded polypeptide of micrococcal nuclease 2 × 10⁴ times more tightly, as judged from the dissociation constants, than they could the fragment.48 The fragment is a monodisperse random polymer, and it may be that the binding site on the immunoglobulin recognizes only the small portion of the random polymer that by chance has assumed the proper conformation. This would explain the much greater dissociation constant of the fragment relative to the native protein. It is also possible, however, that the immunoglobulins still recognize the epitope or portions of the epitope after it is unfolded, but with much smaller free energy of dissociation. A difference in dissociation constant of even the magnitude observed for the complexes between the immunoglobulins and the nuclease, if the concentrations of immunoglobulin and antigen and the individual dissociation constants are in the appropriate ranges, would be observed as a complete elimination of the ability of antigen to bind to immunoglobulin when it is in fact only a finite attenuation of the ability of antigen to bind to immunoglobulin.
Because the immune system was developed to recognize and destroy foreign organisms such as viruses and bacteria and because other systems are used by animals to eliminate small toxic molecules, antigens are always large macromolecules, usually proteins, and never small molecules. This fastidiousness of the immune system can be circumvented by covalently attaching a small molecule to a carrier protein as a hapten. The attachment is accomplished with chemical couplings analogous to those used for covalent modification or cross-linking of proteins. In fact, many substances that are able to covalently modify proteins produce undesirable immune reactions by attaching to proteins in an animal or the incautious investigator and turning those immunologically benign proteins into malignant antigens.
A hapten, when covalently attached to the carrier protein, protrudes from its surface even more dramatically than Glutamine 121 does from the surface of lysozyme. Because of its peculiar chemical structure, the hapten is usually the focus of the immune system; and because it protrudes from the surface, the entire hapten usually ends up occupying a deep hole or deep crevice among the loops of two or three complementarity-determining regions.17,49,50 These facts explain why the unattached hapten can usually be bound efficiently by the immunoglobulin. For example, polyclonal immunoglobulins raised against a protein the lysines of which had been modified by 2,4,6-trinitrobenzenesulfonate (Reaction 10–28) bind Nε-(2,4,6-trinitrophenyl)lysine with dissociation constants of 10⁻⁷ to 10⁻⁹ M.51 It is this ability of an immunoglobulin raised against a hapten to bind tightly the small molecule from which the hapten was derived that permits immunoglobulins to be used in highly specific assays for small molecules.52
Although immunoglobulins for use in protein chemistry are usually raised by injecting an intact protein into an animal, it is also possible to raise immunoglobulins directed against synthetic peptides with the same amino acid sequence as a segment from a particular protein.53 The synthetic peptides are attached as haptens to another protein. For example, the amino-terminal amino acid sequence of the large tumor antigen from simian virus is AcMDKVLNR-, where Ac is a posttranslationally added acetyl group. Immunoglobulins raised against a synthetic peptide with this sequence that had been covalently attached to bovine serum albumin as a hapten were able to recognize and bind exclusively to the large tumor antigen protein in crude homogenates from animal cells infected with the simian virus.53 Immunoglobulins directed against a particular peptide can be purified by using a stationary phase to which the peptide is attached as an affinity adsorbent.54,55
The difficulty inherent in the use of immunoglobulins raised against a particular amino acid sequence in a protein to study that protein in its native conformation is that the investigator, a fallible judge, rather than the immune system, which is less fallible, has chosen the epitope. A native protein does not expose many sequences sufficiently for immunoglobulins to recognize them. If the investigator has chosen the epitope, there is only a small chance that it will be accessible on the surface of the antigen and recognized by an immunoglobulin in the native protein, unless the choice is the safe one of the amino terminus or the carboxy terminus, which are usually well exposed in a native protein. The segment of sequence against which the immunoglobulin was raised, however, will usually be accessible in the unfolded polypeptide.
Any solution of a protein, even pristine cytoplasm, freshly drawn plasma, or a solution of redissolved crystals, contains some of the irreversibly unfolded polypeptide of any particular protein. Immunoglobulins raised against synthetic peptides necessarily bind preferentially, and often exclusively, to any unfolded protein exposing the sequence of amino acids against which they were raised because the original antigen itself was a structureless peptide of that sequence. Yet most experiments are designed with the requirement that the immunoglobulins recognize and bind to the native protein for the conclusion to be valid.
Crystallographic molecular models of complexes between immunoglobulins raised against synthetic peptides and the peptides themselves heighten these concerns.56 In the binding site on the immunoglobulin, the peptide usually binds rigidly in a conformation that differs dramatically from the conformation that the same sequence of amino acids assumes in the native protein. Furthermore, it is, as might be expected, usually buried in a crevice among the loops of the complementarity-determining regions. Consequently, it is difficult to understand how such an immunoglobulin can ever recognize that sequence in the native protein.
The solution to this problem is analytical. When either monoclonal immunoglobulins or polyclonal immunoglobulins raised against synthetic peptides are used, one can assume that each folded polypeptide possesses only one epitope, either exposed or buried. It is also possible to purify by affinity adsorption a subset of polyclonal immunoglobulins recognizing only one epitope from a set of polyclonal immunoglobulins raised against an intact protein.34,55 If every molecule of protein in a solution can bind one immunoglobulin at a particular epitope, then the immunoglobulin is recognizing the native protein. If only a small percentage of the molecules of protein in a solution can bind an immunoglobulin at that epitope, it is only the unfolded polypeptides that are presenting that epitope. Consequently, if every protomer of an antigen binds one molecule of an immunoglobulin, when only immunoglobulins directed against one epitope are present, then the immunoglobulins must be recognizing the epitope when it is in the native protein. In any experiment relying on the assumption that the immunoglobulins recognize the native protein, it must be demonstrated both that the immunoglobulins bind to only one unique epitope and that every protomer of the antigen is capable of binding one molecule of immunoglobulin. In the absence of such a demonstration, the conclusions reached can be disregarded.
In contrast to these difficulties encountered when they are used to examine a native protein, immunoglobulins raised against a synthetic peptide can be used to purify a peptide containing a particular amino acid in the sequence of a protein from an endopeptolytic digest of that protein.55 The amino acid sequence surrounding Lysine 380 of the α subunit of acetylcholine receptor is –SAIEGVKYIAEHM–. The synthetic peptide KYIAE was coupled covalently as a hapten to bovine serum albumin, and polyclonal immunoglobulins were raised against this antigen in rabbits. Because the peptide had been coupled to the serum albumin through the amino groups of its lysine and of its amino terminus, the carboxy-terminal sequences –YIAE protruded as haptens from the surface of the serum albumin. The antiserum was passed over a stationary phase to which the peptide KYIAE had been covalently attached, and immunoglobulins specific for the carboxy-terminal sequence –YIAE were adsorbed by the affinity adsorbent. After all of the other proteins in the serum had been washed away, the adsorbed immunoglobulins were eluted. These purified immunoglobulins in turn were covalently attached to a stationary phase to produce an immunoadsorbent specific for the carboxy-terminal sequence –YIAE. When acetylcholine receptor was digested with glutamyl endopeptidase (Figure 3–2) and the digest was passed over the immunoadsorbent, the peptide GVKYIAE was adsorbed and eluted in high yield and high purity.57 The covalent modification of Lysine 380 in intact, native acetylcholine receptor could be readily monitored by using this immunoadsorbent to purify rapidly the peptide containing it.
Such an immunoadsorbent can purify a peptide from the digest of a large protein, which contains so many peptides that a direct purification by chromatography would be difficult if not impossible. If a chemical modification of an amino acid in the protein within the sequence included in the peptide occurs in low yield so
that the modified peptide is only a minor component of the digest, the immunoadsorbent will still purify it.57 In fact, the targeted amino acid can be destroyed by the modification and the protein cleaved at the point of destruction, and the immunoadsorbent will still purify the peptide containing the remaining fragment of that amino acid.58 By use of such an immunoadsorbent, for example, the modification of a particular amino acid in a protein can be followed kinetically59 or its accessibility to a particular electrophile under different circumstances can be monitored.60
Immunoglobulins directed against an antigenic protein can effect immunoprecipitation. An immunoprecipitate is a visible, white precipitate that forms when an antigen and its immunoglobulin are present in solution at the proper concentrations. Each immunoglobulin has at least two binding sites for antigen (Figure 11–1). Each antigen, if it is a protein, is usually polyvalent. A polyvalent antigen is one that has more than one epitope. If the polyclonal mixture of immunoglobulins in the serum contains several monoclonal immunoglobulins directed against different epitopes on the antigen and if this mixture of immunoglobulins is mixed in the proper ratio with that antigen, an immunoprecipitate forms containing antigen and immunoglobulin cross-linked among themselves. The ratio of concentrations at which maximum precipitation occurs is known as the equivalence point. If there is only one epitope on the antigen that has elicited the immune response, no precipitate will form. An Fab fragment, because it is univalent, cannot produce a precipitate. If a large excess of antigen is present, each immunoglobulin has its binding sites filled with antigens that are not bound to other immunoglobulins and no precipitate forms. If a large excess of immunoglobulin is present, each antigen is surrounded by immunoglobulins, each with its other binding site vacant, and no precipitate forms. Such soluble complexes of excess antigen and excess immunoglobulin are present under all circumstances, and it is never possible to precipitate directly all of the antigen or all of the immunoglobulins even at equivalence. Finally, the molar ratio between antigenic protein and immunoglobulin in a precipitate gathered at equivalence is a complicated function of the number of epitopes, their relative affinities, and the distribution of the different monoclonal immunoglobulins in the polyclonal mixture.
After a protein of interest has been injected as a potential antigen into an animal, the appearance of the desired immunoglobulins in the serum is often detected by the ability of those immunoglobulins to form an immunoprecipitate. The simplest way to produce the proper ratio of concentrations in order to observe a precipitate is to layer a solution of the antigen onto a sample of the serum in a narrow tube. As antigen diffuses into the serum and immunoglobulin diffuses into the solution of antigen, there will be a point along the two gradients at which the concentrations are those necessary for precipitation to occur and a visible disc of precipitate will form at this location.
Immunodiffusion is a more sophisticated procedure for permitting antigen and immunoglobulin to diffuse into each other.61 Antigen and serum are placed in separate wells cut in a block of agar, and as they diffuse outward into the agar and towards each other, there will be a line between the two wells along which the ratio of concentrations is appropriate for precipitation. Along this line a white immunoprecipitate will form. When the original antigen and the same protein from a different species or a mutant variety of the antigen are placed in two adjacent wells both the same distance from the well containing immunoglobulin, the pattern of the lines of immunoprecipitate demonstrates whether or not the related protein shares all of the epitopes present on the original antigen.62
Complement fixation is a procedure for detecting the relative concentrations of immunoprecipitates in a series of samples.63 It is sensitive to concentrations of immunoprecipitate much smaller than can be observed visually. The procedure can be performed on a series of mixtures of antigen and immunoglobulin at different ratios of concentration to obtain a direct measurement of the equivalence point, which is the ratio at which complement fixation reaches its maximum level.
An immunoprecipitate is held together by a complex collection of interfaces formed between the binding sites on the tips of the Fab arms of the various immunoglobulins present in antiserum and their respective epitopes on the molecules of antigen. Each of the individual reactions between an epitope and the binding site on the Fab arm of an intact immunoglobulin is a simple dissociation:

\[ \mathrm{Ep} + \mathrm{Fab} \;\underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}}\; \mathrm{Ep{\cdot}Fab} \tag{11–1} \]

where Fab is the binding site, Ep is the epitope, and Ep·Fab is the immune complex (Figure 11–2). The immune complexes between epitopes on antigens and immunoglobulins are quite strong, a fact that permits an immunoprecipitate to form even when antigen and immunoglobulins are present at the low concentrations used for procedures such as complement fixation. More useful than the strength of the association, however, is the fact that because k₋₁ is almost always a small rate constant, dissociation of the complex is slow. Most of the procedures that use immunoglobulins depend on this slow dissociation of the complexes between them and their antigens. The slow dissociation permits the complex between antigen and its immunoglobulin to be separated either from an excess of the specific immunoglobulin and other immunoglobulins and proteins in an antiserum or from all of the other proteins with which the antigen is mixed. For example, an
immunoprecipitate can be extensively washed without dissociating. Immunoblotting, immunostaining, and immunoadsorption also rely on this advantage of the slow dissociation.
One of the most important uses of immunochemistry is to identify the particular protein to which the antibody is directed. For example, a specific immunoglobulin can be used to stain only its antigen among all of the proteins in a heterogeneous mixture separated by electrophoresis.64–67 First the proteins that have been separated by electrophoresis on a slab of polyacrylamide are transferred laterally by electrophoresis onto a membrane of nitrocellulose or poly(vinylidene difluoride) placed against the slab of polyacrylamide. This electrotransfer produces a blot on which the bands of protein in the polyacrylamide have become bands of protein arrayed in the same pattern but now plastered onto the membrane of polymer (Figure 11–3A).68 Then the blot is soaked in a solution of the specific immunoglobulin, and excess immunoglobulins are rinsed away. The immunoglobulins that were bound by the antigen are immunostained with a second
Figure 11–3: Immunostaining of immunoblots of NADH dehydrogenase (ubiquinone) from bovine heart mitochondria. (A) NADH
Dehydrogenase (ubiquinone) was dissolved in a solution of dodecyl sulfate and submitted to electrophoresis on a slab of polyacrylamide.
Following the electrophoresis, the separated polypeptides on the gel were electrotransferred laterally onto a membrane of poly(vinylidene
difluoride) placed against the slab by applying an electric field perpendicular to the plane of the slab. The protein in each band remained at
the same position on the field and adhered tightly to the poly(vinylidene difluoride). The membrane was then stained with Coomassie
brilliant blue.68 The 25 bands that appeared represent only a fraction of the more than 40 polypeptides in NADH dehydrogenase (ubiquinone).
Reprinted with permission from ref 68. Copyright 1992 Elsevier B.V. (B–G) The complexes between dodecyl sulfate and the polypeptides of
NADH dehydrogenase (ubiquinone) of 704, 444, 430, 228, 217, and 75 aa were each purified from subcomplexes of the enzyme.73,74 Following
purification, each of them was injected into rabbits, and polyclonal immunoglobulins specific for each of the polypeptides were purified from
the respective antiserum71,72 by binding them to the respective polypeptide immobilized on a membrane of nitrocellulose, washing away the
other proteins, and eluting the immunoglobulins.75 Membranes of poly(vinylidene difluoride) to which electrophoretically separated
polypeptides of intact NADH dehydrogenase (ubiquinone) had been electrotransferred as in part A were then immunostained76 with
purified rabbit polyclonal immunoglobulins against the polypeptides of (B) 444 aa, (C) 217 aa, (D) 75 aa, (E) 704 aa, (F) 430 aa, and (G) 228 aa,
respectively, followed by goat polyclonal immunoglobulins that were raised against rabbit immunoglobulin G and to which peroxidase had
been covalently coupled. The electrophoretic separation in lane A and those in lanes B–G were performed in different laboratories on
different polyacrylamide gels that produced different mobilities. (H–K) Intact NADH dehydrogenase (ubiquinone) with its more than 40 subunits
was cross-linked with ethylene glycol bis(succinimidyl succinate). The product of the reaction was dissolved in a solution of dodecyl sulfate.
Four identical samples of this solution were submitted to electrophoresis in separate lanes, the separated polypeptides electrotransferred to
a sheet of poly(vinylidene difluoride), and the respective lanes were immunostained as in part B–H with immunoglobulins specific for the
polypeptides of (H) 75 aa, (I) 217 aa, (J) 444 aa, and (K) 704 aa, respectively.76 The respective un-cross-linked polypeptide is the lowest band
in each lane, and covalent complexes in which the polypeptide participates are represented by the higher bands. Reprinted with permission
from ref 76. Copyright 1993 American Chemical Society.
immunoglobulin directed against the first immunoglobulin, for example, ovine anti-murine immunoglobulin, to which either peroxidase69 or alkaline phosphatase66 has been attached. The peroxidase or alkaline phosphatase produces a colored precipitate at the location of the antigen (Figure 11–3, panels B–H). In this way, one protein can be picked out of a complex mixture because it is the only protein that is stained. For example, a murine monoclonal immunoglobulin that had been selected for its ability to inhibit the mitochondrial dicarboxylate transporter from Pisum sativum immunostained only one polypeptide on an immunoblot of a polyacrylamide gel on which the hundreds of polypeptides from intact mitochondria dissolved in a solution of dodecyl sulfate had been separated electrophoretically.70 That polypeptide was assumed to be the transporter.
Immunostaining of immunoblots is also used to identify which polypeptides are components of a particular covalent complex produced by cross-linking. NADH Dehydrogenase (ubiquinone) is a large protein composed of more than 40 different polypeptides (Figure 11–3A).68 Immunoglobulins that specifically recognize the polypeptides of 75, 217, 228, 430, 444, and 704 aa were produced in rabbits71–74 and purified by binding them to their antigens and then eluting them.75 On immunoblots of polyacrylamide gels on which the complete protein with its more than 40 polypeptides had been separated (Figure 11–3A), each of the immunoglobulins directed immunostaining only to the polypeptide against which it was raised (Figure 11–3, lanes B–G).76 The complete protein was then cross-linked with ethylene glycol bis(succinimidyl succinate), and the polypeptides that were separated by electrophoresis were again immunoblotted and immunostained with the immunoglobulins specific for the polypeptides of 75, 217, 444, and 704 aa (Figure 11–3, lanes H–K). Covalent complexes between the polypeptides of 217 and 75 aa; 444 and 75 aa; 444 and 217 aa; 704 and 444 aa; 704, 444, and 75 aa; and 704, 444, and 217 aa could be positively identified on the basis of their mobility and, more importantly, the fact that they were immunostained by the appropriate immunoglobulins. The most interesting result was the fact that none of these four polypeptides had been cross-linked to any of the other polypeptides in the complex. Because there are many polypeptides that have mobilities similar to others in the protein, only the immunostaining could sort out the products of the cross-linking reaction.
One of the most effective ways to raise immunoglobulins that recognize a particular protein with high specificity is to use as a hapten a synthetic peptide with the sequence of the amino terminus or the carboxy terminus of that protein. Polyclonal immunoglobulins were raised against the synthetic peptide SEFIGA, the carboxy-terminal sequence of the human receptor for epidermal growth factor. The peptide had been attached as a hapten through its amino terminus to bovine serum albumin so the immunoglobulins recognized the carboxy-terminal sequence –EFIGA. These polyclonal immunoglobulins were purified by passing the antiserum over an affinity adsorbent composed of a solid phase to which the peptide had been attached covalently. When a homogenate of cultured human cells was dissolved in a solution of dodecyl sulfate and submitted to electrophoresis and the separated polypeptides were immunoblotted, the immunoglobulin raised against the peptide SEFIGA directed immunostaining only to the polypeptide of the receptor for epidermal growth factor.77 The solution submitted to electrophoresis contained all of the hundreds of polypeptides in the cells, and yet only the receptor for epidermal growth factor was recognized by the polyclonal immunoglobulins.
Immunoglobulins can also be used to isolate a particular protein by immunoadsorption. An immunoadsorbent is a stationary phase to which an immunoglobulin specific for a particular antigen has been covalently bound and with which that antigen can be purified by affinity adsorption.78,79 For example, although the protein had not been purified, the amino acid sequence of Shaker S4 K⁺ channel from Drosophila melanogaster had been determined genetically. The protein was expressed in Sf9 insect cells following their infection with baculovirus into the DNA of which complementary DNA encoding the protein had been inserted. The expressed protein could then be purified80 on an immunoadsorbent to which had been covalently attached immunoglobulins raised against the synthetic peptide EEEDTLNLPKAPVSPQDKS, an amino acid sequence from a region of the protein (amino acids 333–351) thought to be an exposed loop connecting two α helices.
The gene encoding dystrophin, the protein that is missing in muscles of patients suffering from Duchenne/Becker muscular dystrophy, had been identified before dystrophin itself was known to exist.81 An immunoadsorbent, containing immunoglobulins raised against an expressed fusion protein containing a fragment of 556 aa from the amino acid sequence of dystrophin, was used to purify it.82
The voltage-gated chloride channel from Torpedo californica had also been identified and sequenced genetically before it had been purified. It was then purified in one step of affinity adsorption by use of a stationary phase to which immunoglobulins recognizing the hydrophilic sequence EGQQREGLEAVKVQTEDP from the protein had been coupled covalently.83
Immunoadsorption can also be accomplished by adding an excess of specific immunoglobulins to a solution to saturate all of the antigen, passing the solution over agarose to which protein A from S. aureus, a protein with a high affinity for immunoglobulin G, has been covalently attached,84 and eluting the adsorbed antigen from the agarose after the other proteins have been rinsed away.
Immunochemical Probes of Structure 567
A protein can be identified by tagging it with an epitope.85 A short segment of DNA encoding the amino acid sequence of an epitope to which an immunoglobulin has already been raised is inserted in phase at one end of the reading frame for the protein of interest. The protein is then expressed from the complementary DNA into which the insertion has been made, and the expressed protein contains the amino acid sequence of the tag at one of its ends. The protein thus tagged can be identified by immunostaining and isolated by immunoadsorption with the immunoglobulins directed against the epitope. In this way, the protein encoded by an unidentified reading frame in a segment of genomic DNA or complementary DNA can be identified and purified. If a protein has been tagged by an epitope at one of its ends, an immunoblot of a digest of that protein separated by electrophoresis in a solution of dodecyl sulfate will provide a map of the positions in the amino acid sequence at which cleavage occurred during the digestion. The lengths of the successive end-labeled fragments will correspond to the positions of cleavage in the sequence of the protein.86

Immunoglobulins are also used to screen libraries of cDNA.87 If immunoglobulins specific for a protein of completely unknown sequence have been made, they can be used in such a screen to detect the complementary DNA for that protein. The complementary DNAs to be screened are inserted into plasmids that cause the proteins they encode to be expressed when they are transfected into bacteria. The transfected bacteria are spread onto a field and grown into individual colonies, each of which consequently contains protein expressed from the cDNA on its respective plasmid. The bacteria are lysed and their proteins are immobilized on a surface in such a way that the immobilized proteins remain in the same location on the field and produce a replica of the original pattern of the colonies. The replica is soaked with the immunoglobulins that recognize the protein of interest and then washed, and those colonies that contained antigen are identified by immunostaining or by binding radioactive protein A from S. aureus to the bound immunoglobulins. The bacteria in a colony identified in this way can then be replicated and the cDNA sequenced.

Immunoelectron microscopy is used to identify the region on the surface of a protein containing the epitope against which particular immunoglobulins are directed. Most proteins, including immunoglobulins, are large enough to be observed in the electron microscope when embedded in a glass of negative stain. When the embedded complex between a protein and an immunoglobulin or the Fab fragment of an immunoglobulin is observed on an electron micrograph, the immunoglobulin or Fab fragment appears as a protrusion on the surface of the protein that it recognizes. Often the Fab arms and the Fc trunk of an immunoglobulin can be distinguished; usually, however, an immunoglobulin appears only as a vague elongated structure. If the protein has a characteristic shape, the location of the epitope recognized by the immunoglobulin on the surface of that shape can be identified. α2-Macroglobulin is a molecule that in an electron micrograph has the shape of a letter H; a murine monoclonal immunoglobulin specific for the domain of about 200 aa responsible for the binding of the α2-macroglobulin to its receptor binds to the ends of the arms on the H.88 Fibrinogen is a molecule that, in an electron micrograph, has three globular domains arranged in a row; a murine monoclonal immunoglobulin specific for the carboxy-terminal 150 amino acids of its α polypeptide binds near the central domain of the structure.89

The multicatalytic endopeptidase complex is a cylinder composed of 14 different subunits, each present in two copies. A murine monoclonal immunoglobulin specific for one of these subunits binds at both ends of the cylinder, consistent with the existence of a 2-fold rotational axis of symmetry at the center of the cylinder and locating the positions in the cylinder of the two copies of that subunit.90 Murine monoclonal immunoglobulins against several of the subunits in the multicatalytic endopeptidase complex were always bound at two symmetrically displayed locations on the surface of the cylinder, and the respective angles between those positions when the cylinder was viewed along its axis could be used to position those subunits relative to the 2-fold rotational axis of symmetry that is normal to the cylindrical axis and at the middle of the cylinder.91

The most extensive application of immunoelectron microscopy has been an examination of the distribution of the constituent polypeptides over the surface of the two subunits of the ribosome from Escherichia coli. The application of these procedures to the 30S subunit serves as an example. Although it was unknown at the time these experiments were performed, the core of the 30S ribosomal subunit is formed from ribosomal RNA, and luckily almost all of its constituent polypeptides are distributed over its external surface and are accessible to immunoglobulins.

The 21 unique polypeptides found in the 30S subunit of the ribosome can be separated and catalogued by two-dimensional gel electrophoresis (Figure 11–4).92,93 They have been separated and individually purified,93 and their amino acid sequences have been determined.94 Polyclonal sets of immunoglobulins have been raised against most of these polypeptides. When immunoglobulins specific for one of them were mixed with intact 30S ribosomal subunits and the immune complexes were then prepared for electron microscopy, individual immunoglobulins bound to individual 30S ribosomal subunits or cross-linking two 30S ribosomal subunits could be observed (Figure 11–5).95–98 The 30S ribosomal subunit has a characteristic, asymmetric shape and the epitopes recognized by these immunoglobulins could be assigned to certain regions on the surface of that shape.
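The mapping of cleavage sites with an end tag, described earlier in this section, amounts to simple arithmetic, and a minimal sketch may make it concrete. The function names, the 500-aa protein, and the three cleavage sites below are hypothetical illustrations, not data from the cited study; the sketch also assumes that the length of the tag itself is ignored and that a cleavage site is numbered by the residue on its amino-terminal side.

```python
def tagged_fragment_lengths(protein_length, cleavage_sites, tagged_end="N"):
    """Lengths (aa) of the fragments that retain the end tag.

    A site s means cleavage between residues s and s + 1. A fragment that
    still carries an amino-terminal tag spans residues 1..s; one that still
    carries a carboxy-terminal tag spans residues s + 1..protein_length.
    """
    if tagged_end not in ("N", "C"):
        raise ValueError("tagged_end must be 'N' or 'C'")
    lengths = []
    for site in cleavage_sites:
        if not 0 < site < protein_length:
            raise ValueError(f"cleavage site {site} lies outside the sequence")
        lengths.append(site if tagged_end == "N" else protein_length - site)
    return sorted(lengths)


def cleavage_sites_from_lengths(protein_length, fragment_lengths, tagged_end="N"):
    """Invert the mapping: recover cleavage positions from tagged-fragment lengths."""
    if tagged_end not in ("N", "C"):
        raise ValueError("tagged_end must be 'N' or 'C'")
    sites = [
        length if tagged_end == "N" else protein_length - length
        for length in fragment_lengths
    ]
    return sorted(sites)


# Hypothetical example: a 500-aa protein cleaved after residues 120, 260, and 410.
n_tagged = tagged_fragment_lengths(500, [120, 260, 410], tagged_end="N")
c_tagged = tagged_fragment_lengths(500, [120, 260, 410], tagged_end="C")
recovered = cleavage_sites_from_lengths(500, c_tagged, tagged_end="C")
print(n_tagged, c_tagged, recovered)
```

In practice the lengths would be estimated from electrophoretic mobilities on the immunoblot, so the recovered positions are only as precise as those estimates.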
Suggested Reading
Amit, A.G., Mariuzza, R.A., Phillips, S.E.V., & Poljak, R.J. (1986)
Three-dimensional structure of an antigen–antibody complex
at 2.8-Å resolution, Science 233, 747–753.
Pons, F., Augier, N., Heilig, R., Leger, J., Mornet, D., & Leger, J.J.
(1990) Isolated dystrophin molecules as seen by electron
microscopy, Proc. Natl. Acad. Sci. U.S.A. 87, 7851–7855.
Yamaguchi, M., & Hatefi, Y. (1993) Mitochondrial
NADH:ubiquinone oxidoreductase (complex I): proximity of the
subunits of the flavoprotein and the iron-sulfur protein sub-
complexes, Biochemistry 32, 1935–1939.
Figure 11–5: Immunoelectron microscopy of the 30S subunit of the ribosome. A drawing of the crystallographic molecular model of the 30S subunit from Thermus thermophilus is at the top.95 The ribosomal RNA and unhighlighted polypeptides are drawn in a space-filling representation in which a sphere was placed at each α carbon of the proteins and at each phosphorus atom of the nucleic acid. The backbones of the polypeptides of those subunits that were used as antigens in the various experiments are drawn in skeletal representation and identified by the standard numbering. Subunit S13 is located on the surface of the 30S subunit just over the top of the view presented. This drawing was produced with MolScript.103 The electron micrographs are of immune complexes between polyclonal immunoglobulins G and 30S ribosomal subunits from E. coli. Purified 30S subunits were mixed with the polyclonal immunoglobulins raised against the particular purified polypeptide: S9, S10, S13, S11, S6, or S8. The complexes that formed were adsorbed to a layer of carbon on a grid for microscopy, negatively stained with uranyl acetate, and observed in the electron microscope. The immunoglobulins G are Y-shaped proteins (Figure 11–1) that connect two globular 30S ribosomal subunits or bind to just one. The 30S subunits in the micrographs can be recognized by their characteristic shapes as illustrated by the crystallographic molecular model. At the top in the view of the crystallographic molecular model presented is a smaller globular domain, to the left a significant protrusion, at the bottom the larger globular domain, and to the right a deep cleft between the upper and lower domains. The top two panels of micrographs, for polypeptides S9 and S10, are results from the laboratory of Stöffler.96 Reprinted with permission from ref 96. Copyright 1975 held by the authors. The lower rows of micrographs, for polypeptides S13,97 S6,98 S11,97 and S8,98 are results from the laboratory of Lake. Reprinted with permission from refs 97 and 98. Copyright 1975 and 1981 Elsevier B.V.
References
1. Chen, B.X., Wilson, S.R., Das, M., Coughlin, D.J., &
Erlanger, B.F. (1998) Proc. Natl. Acad. Sci. U.S.A. 95,
10809–10813.
2. Villeneuve, S., Souchon, H., Riottot, M.M., Mazie, J.C.,
Lei, P., Glaudemans, C.P., Kovac, P., Fournier, J.M., &
Alzari, P.M. (2000) Proc. Natl. Acad. Sci. U.S.A. 97,
8433–8438.
3. Margoliash, E., Nisonoff, A., & Reichlin, M. (1970)
J. Biol. Chem. 245, 931–939.
4. Silverton, E.W., Navia, M.A., & Davies, D.R. (1977) Proc.
Natl. Acad. Sci. U.S.A. 74, 5140–5144.
5. Edelman, G.M., Cunningham, B.A., Gall, W.E., Gottlieb,
P.D., Rutishauser, U., & Waxdal, M.J. (1969) Proc. Natl.
Acad. Sci. U.S.A. 63, 78–85.
6. Harris, L.J., Larson, S.B., Hasel, K.W., & McPherson, A.
(1997) Biochemistry 36, 1581–1597.
7. Mage, M.G. (1980) Methods Enzymol. 70, 142–150.
8. Porter, R.R. (1959) Biochem. J. 73, 119–126.
9. Nisonoff, A., Wissler, F.C., Lipman, L.N., & Woernley,
D.L. (1960) Arch. Biochem. Biophys. 89, 230–244.
10. Masson, P.L., Cambiaso, C.L., Collet-Cassart, D.,
Magnusson, C.G., Richards, C.B., & Sindic, C.J. (1981)
Methods Enzymol. 74, (Part C), 106–139.
11. Holliger, P., Prospero, T., & Winter, G. (1993) Proc. Natl.
Acad. Sci. U.S.A. 90, 6444–6448.
12. Gregory, L., Davis, K.G., Sheth, B., Boyd, J., Jefferis, R.,
Nave, C., & Burton, D.R. (1987) Mol. Immunol. 24,
821–829.
13. Guddat, L.W., Herron, J.N., & Edmundson, A.B. (1993)
Proc. Natl. Acad. Sci. U.S.A. 90, 4271–4275.
14. Putnam, F.W., Florent, G., Paul, C., Shinoda, T., &
Shimizu, A. (1973) Science 182, 287–291.
15. Toraño, A., & Putnam, F.W. (1978) Proc. Natl. Acad. Sci. U.S.A. 75, 966–969.
16. Chapuis, R.M., & Koshland, M.E. (1975) Biochemistry 14, 1320–1326.
17. Segal, D.M., Padlan, E.A., Cohen, G.H., Rudikoff, S., Potter, M., & Davies, D.R. (1974) Proc. Natl. Acad. Sci. U.S.A. 71, 4298–4302.
18. Köhler, G., & Milstein, C. (1976) Eur. J. Immunol. 6, 511–519.
19. Harris, L.J., Skaletsky, E., & McPherson, A. (1998) J. Mol. Biol. 275, 861–872.
20. Amit, A.G., Mariuzza, R.A., Phillips, S.E., & Poljak, R.J. (1986) Science 233, 747–753.
21. Colman, P.M., Laver, W.G., Varghese, J.N., Baker, A.T., Tulloch, P.A., Air, G.M., & Webster, R.G. (1987) Nature 326, 358–363.
22. Fischmann, T.O., Bentley, G.A., Bhat, T.N., Boulot, G., Mariuzza, R.A., Phillips, S.E., Tello, D., & Poljak, R.J. (1991) J. Biol. Chem. 266, 12915–12920.
23. Wu, T.T., & Kabat, E.A. (1970) J. Exp. Med. 132, 211–250.
24. Chothia, C., Lesk, A.M., Tramontano, A., Levitt, M., Smith-Gill, S.J., Air, G., Sheriff, S., Padlan, E.A., Davies, D., Tulip, W.R., et al. (1989) Nature 342, 877–883.
25. Wright, A., Tao, M.H., Kabat, E.A., & Morrison, S.L. (1991) EMBO J. 10, 2717–2723.
26. Chothia, C., Boswell, D.R., & Lesk, A.M. (1988) EMBO J. 7, 3745–3755.
27. Bossart-Whitaker, P., Chang, C.Y., Novotny, J., Benjamin, D.C., & Sheriff, S. (1995) J. Mol. Biol. 253, 559–575.
28. Desmyter, A., Transue, T.R., Ghahroudi, M.A., Thi, M.H., Poortmans, F., Hamers, R., Muyldermans, S., & Wyns, L. (1996) Nat. Struct. Biol. 3, 803–811.
29. Davies, D.R., & Cohen, G.H. (1996) Proc. Natl. Acad. Sci. U.S.A. 93, 7–12.
30. Lo Conte, L., Chothia, C., & Janin, J. (1999) J. Mol. Biol. 285, 2177–2198.
31. Mylvaganam, S.E., Paterson, Y., & Getzoff, E.D. (1998) J. Mol. Biol. 281, 301–322.
32. Faelber, K., Kirchhofer, D., Presta, L., Kelley, R.F., & Muller, Y.A. (2001) J. Mol. Biol. 313, 83–97.
33. Li, Y., Li, H., Smith-Gill, S.J., & Mariuzza, R.A. (2000) Biochemistry 39, 6296–6309.
34. Nisonoff, A., Reichlin, M., & Margoliash, E. (1970) J. Biol. Chem. 245, 940–946.
35. Takano, T., Kallai, O.B., Swanson, R., & Dickerson, R.E. (1973) J. Biol. Chem. 248, 5234–5255.
36. Bhat, T.N., Bentley, G.A., Fischmann, T.O., Boulot, G., & Poljak, R.J. (1990) Nature 347, 483–485.
37. Braden, B.C., Souchon, H., Eisele, J.L., Bentley, G.A., Bhat, T.N., Navaza, J., & Poljak, R.J. (1994) J. Mol. Biol. 243, 767–781.
38. Sheriff, S., Chang, C.Y., Jeffrey, P.D., & Bajorath, J. (1996) J. Mol. Biol. 259, 938–946.
39. Hogle, J.M., Chow, M., & Filman, D.J. (1985) Science 229, 1358–1365.
40. Rossmann, M.G., Arnold, E., Erickson, J.W., Frankenberger, E.A., Griffith, J.P., Hecht, H.J., Johnson, J.E., Kamer, G., Luo, M., Mosser, A.G., et al. (1985) Nature 317, 145–153.
41. Padlan, E.A., Silverton, E.W., Sheriff, S., Cohen, G.H., Smith-Gill, S.J., & Davies, D.R. (1989) Proc. Natl. Acad. Sci. U.S.A. 86, 5938–5942.
42. Atassi, M.Z., Habeeb, A.F., & Ando, K. (1973) Biochim. Biophys. Acta 303, 203–209.
43. Macht, M., Fiedler, W., Kurzinger, K., & Przybylski, M. (1996) Biochemistry 35, 15633–15639.
44. Tzartos, S.J., Kokla, A., Walgrave, S.L., & Conti-Tronconi, B.M. (1988) Proc. Natl. Acad. Sci. U.S.A. 85, 2899–2903.
45. Evin, G., Galen, F.X., Carlson, W.D., Handschumacher, M., Novotny, J., Bouhnik, J., Menard, J., Corvol, P., & Haber, E. (1988) Biochemistry 27, 156–164.
46. Dokurno, P., Bates, P.A., Band, H.A., Stewart, L.M., Lally, J.M., Burchell, J.M., Taylor-Papadimitriou, J., Snary, D., Sternberg, M.J., & Freemont, P.S. (1998) J. Mol. Biol. 284, 713–728.
47. Ahern, T.J., & Klibanov, A.M. (1985) Science 228, 1280–1284.
48. Sachs, D.H., Schechter, A.N., Eastlake, A., & Anfinsen, C.B. (1972) Proc. Natl. Acad. Sci. U.S.A. 69, 3790–3794.
49. Arevalo, J.H., Hassig, C.A., Stura, E.A., Sims, M.J., Taussig, M.J., & Wilson, I.A. (1994) J. Mol. Biol. 241, 663–690.
50. Mizutani, R., Miura, K., Nakayama, T., Shimada, I., Arata, Y., & Satow, Y. (1995) J. Mol. Biol. 254, 208–222.
51. Barisas, B.G., Singer, S.J., & Sturtevant, J.M. (1972) Biochemistry 11, 2741–2744.
52. Smith, T.W., Butler, V.P., Jr., & Haber, E. (1970) Biochemistry 9, 331–337.
53. Walter, G., Scheidtmann, K.H., Carbone, A., Laudano, A.P., & Doolittle, R.F. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 5197–5200.
54. Kyte, J., Xu, K.Y., & Bayer, R. (1987) Biochemistry 26, 8350–8360.
55. Wilchek, M., Bocchini, V., Becker, M., & Givol, D. (1971) Biochemistry 10, 2828–2834.
56. Shoham, M. (1993) J. Mol. Biol. 232, 1169–1175.
57. Dwyer, B.P. (1988) Biochemistry 27, 5586–5592.
58. van der Donk, W.A., Zeng, C., Biemann, K., Stubbe, J., Hanlon, A., & Kyte, J. (1996) Biochemistry 35, 10058–10067.
59. Erickson, H.K. (2001) Biochemistry 40, 9631–9637.
60. Dwyer, B.P. (1991) Biochemistry 30, 4105–4112.
61. Ouchterlony, O. (1958) Prog. Allergy 5, 1–78.
62. Izumi, Y., Kanzaki, H., Morita, S., Futazuka, H., & Yamada, H. (1989) Eur. J. Biochem. 182, 333–341.
63. Levine, L., & VanVunakis, H. (1967) Methods Enzymol. 11, 928–936.
64. Renart, J., Reiser, J., & Stark, G.R. (1979) Proc. Natl. Acad. Sci. U.S.A. 76, 3116–3120.
65. Towbin, H., Staehelin, T., & Gordon, J. (1979) Proc. Natl. Acad. Sci. U.S.A. 76, 4350–4354.
66. Blake, M.S., Johnston, K.H., Russell-Jones, G.J., & Gotschlich, E.C. (1984) Anal. Biochem. 136, 175–179.
67. Pluskal, M.G., Przekop, M.B., Kavonian, M.R., Vecoli, C., & Hicks, D.A. (1986) BioTechniques 4, 272–283.
68. Walker, J.E., Arizmendi, J.M., Dupuis, A., Fearnley, I.M., Finel, M., Medd, S.M., Pilkington, S.J., Runswick, M.J., & Skehel, J.M. (1992) J. Mol. Biol. 226, 1051–1072.
69. Domingo, A., & Marco, R. (1989) Anal. Biochem. 182, 176–181.
70. Vivekananda, J., Beck, C.F., & Oliver, D.J. (1988) J. Biol. Chem. 263, 4782–4788.
71. Han, A.L., Yagi, T., & Hatefi, Y. (1989) Arch. Biochem. Biophys. 275, 166–173.
72. Han, A.L., Yagi, T., & Hatefi, Y. (1988) Arch. Biochem. Biophys. 267, 490–496.
73. Galante, Y.M., & Hatefi, Y. (1979) Arch. Biochem. Biophys. 192, 559–568.
74. Ragan, C.I., Galante, Y.M., & Hatefi, Y. (1982) Biochemistry 21, 2518–2524.
75. Bisson, R., & Schiavo, G. (1986) J. Biol. Chem. 261, 4373–4376.
76. Yamaguchi, M., & Hatefi, Y. (1993) Biochemistry 32, 1935–1939.
77. Canals, F. (1992) Biochemistry 31, 4493–4501.
78. Wofsy, L., & Burr, B. (1969) J. Immunol. 103, 380–382.
79. Schneider, C., Newman, R.A., Sutherland, D.R., Asser, U., & Greaves, M.F. (1982) J. Biol. Chem. 257, 10766–10769.
80. Santacruz-Toloza, L., Perozo, E., & Papazian, D.M. (1994) Biochemistry 33, 1295–1299.
81. Hoffman, E.P., Brown, R.H., Jr., & Kunkel, L.M. (1987) Cell 51, 919–928.
82. Pons, F., Augier, N., Heilig, R., Leger, J., Mornet, D., & Leger, J.J. (1990) Proc. Natl. Acad. Sci. U.S.A. 87, 7851–7855.
83. Middleton, R.E., Pheasant, D.J., & Miller, C. (1994) Biochemistry 33, 13189–13198.
84. Langone, J.J. (1982) J. Immunol. Methods 55, 277–296.
85. Munro, S., & Pelham, H.R. (1984) EMBO J. 3, 3087–3093.
86. Hanai, R., & Wang, J.C. (1994) Proc. Natl. Acad. Sci. U.S.A. 91, 11904–11908.
87. Kemp, D.J., & Cowman, A.F. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 4520–4524.
88. Delain, E., Barray, M., Tapon-Bretaudiere, J., Pochon, F., Marynen, P., Cassiman, J.J., Van den Berghe, H., & Van Leuven, F. (1988) J. Biol. Chem. 263, 2981–2989.
89. Veklich, Y.I., Gorkun, O.V., Medved, L.V., Nieuwenhuizen, W., & Weisel, J.W. (1993) J. Biol. Chem. 268, 13577–13585.
90. Kopp, F., Dahlmann, B., & Hendil, K.B. (1993) J. Mol. Biol. 229, 14–19.
91. Kopp, F., Hendil, K.B., Dahlmann, B., Kristensen, P., Sobek, A., & Uerkvitz, W. (1997) Proc. Natl. Acad. Sci. U.S.A. 94, 2939–2944.
92. Kaltschmidt, E., & Wittmann, H.G. (1970) Anal. Biochem. 36, 401–412.
93. Held, W.A., Mizushima, S., & Nomura, M. (1973) J. Biol. Chem. 248, 5720–5730.
94. Wittmann, H.G., Littlechild, J.A., & Wittmann-Liebold, B. (1980) in Ribosomes, Structure, Function, and Genetics (Chambliss, G., Craven, G.R., Davies, J., Davis, K., Kahan, L., & Nomura, M., Eds.) pp 51–88, University Park Press, Baltimore, MD.
95. Pioletti, M., Schlunzen, F., Harms, J., Zarivach, R., Gluhmann, M., Avila, H., Bashan, A., Bartels, H., Auerbach, T., Jacobi, C., Hartsch, T., Yonath, A., & Franceschi, F. (2001) EMBO J. 20, 1829–1839.
96. Tischendorf, G.W., Zeichhardt, H., & Stöffler, G. (1975) Proc. Natl. Acad. Sci. U.S.A. 72, 4820–4824.
97. Lake, J.A., & Kahan, L. (1975) J. Mol. Biol. 99, 631–644.
98. Kahan, L., Winkelmann, D.A., & Lake, J.A. (1981) J. Mol. Biol. 145, 193–214.
99. Scheinman, A., Atha, T., Aguinaldo, A.M., Kahan, L., Shankweiler, G., & Lake, J.A. (1992) Biochimie 74, 307–317.
100. Stoffler-Meilicke, M., & Stoffler, G. (1987) Biochimie 69, 1049–1064.
101. Brodersen, D.E., Clemons, W.M., Jr., Carter, A.P., Wimberly, B.T., & Ramakrishnan, V. (2002) J. Mol. Biol. 316, 725–768.
102. Winkelmann, D.A., Kahan, L., & Lake, J.A. (1982) Proc. Natl. Acad. Sci. U.S.A. 79, 5184–5188.
103. Kraulis, P.J. (1991) J. Applied Crystallogr. 24, 946–950.
Chapter 12
Physical Measurements of Structure
Physical properties used to assess the structure of a protein are its standard diffusion constant, its standard sedimentation coefficient, its intrinsic viscosity, and the angular dependence of its ability to scatter light, X-radiation, and neutrons, all of which respond to the shape of the macromolecule; its absorption of light, which responds to the environments around particular chromophores, in particular the peptide bonds; its fluorescence or the fluorescence of chromophores covalently attached to it, which can be used to measure molecular shape and intramolecular dimensions; and its nuclear magnetic resonance spectrum, which can be used to map spatial relationships among the amino acids in the native structure. These physical properties are derived from measurements made of solutions of the protein. When a crystallographic molecular model is not available, such physical measurements provide the only structural information about a particular protein. As more and more crystallographic molecular models become available, however, physical measurements have become valuable complements to crystallography. Because they are structural measurements of the protein in solution, they can be used to validate a crystallographic molecular model, or in some situations to adjust the crystallographic molecular model to correct for differences between the structure of a molecule of protein in a crystal and its structure in solution.

Shape

crevices, over the open surface of a molecule of protein, a large number of molecules of water are situated in locations that are also permanently occupied, even though constantly exchanging. The relative positions of these locations become less and less fixed the farther they are situated from the atoms of the molecule of protein until a region is reached where the water is no different from the water in an otherwise identical solution lacking the protein. This continuous transition between the molecules of water fixed to two or three donors and acceptors of hydrogen bonds on the surface of a molecule of protein and the molecules of water in the bulk solvent is characterized by a gradual, rather than an abrupt, decrease of attachment. Therefore, no distinct boundary exists between the macromolecule and the solvent. Nevertheless, the concept of the hydrodynamic particle1 is necessary if specific dimensions are to be extracted from physical measurements of the shapes of molecules of protein dissolved in free solution.

The hydrodynamic particle is the covalent molecule of protein and any molecules of water and any solutes that behave during the measurement as if they were affixed to the molecule of protein. An affixed molecule of water would be a specific location upon the surface of the protein continuously occupied by one or another molecule of water over a period of time long enough that the measurement registers it as a permanent feature.

If it is assumed that a hydrodynamic particle exists, its mass $m_{\mathrm{h}}$ (in grams) will be
$$ m_{\mathrm{h}} = \frac{M_{\mathrm{prot}}}{N_{\mathrm{A}}}\left(1 + \delta_{\mathrm{H_2O}}\right) \tag{12–1} $$

where $M_{\mathrm{prot}}$ is the molar mass of the protein (grams mole⁻¹), $N_{\mathrm{A}}$ is Avogadro's number, and $\delta_{\mathrm{H_2O}}$ is the grams of water in the hydrodynamic particle for every gram of protein, and its volume $V_{\mathrm{h}}$ (in centimeters³) will be

$$ V_{\mathrm{h}} = \frac{M_{\mathrm{prot}}}{N_{\mathrm{A}}}\left(\bar{v}_{\mathrm{prot}} + \delta_{\mathrm{H_2O}}\, v^{0}_{\mathrm{H_2O}}\right) \tag{12–2} $$

where $\bar{v}_{\mathrm{prot}}$ is the partial specific volume of the protein in centimeters³ gram⁻¹ and $v^{0}_{\mathrm{H_2O}}$ is the specific volume of pure water in centimeters³ gram⁻¹.

If other solutes j are attached to the hydrodynamic particle, Equation 12–1 is expanded by adding a set of terms $\delta_j$, each of which is the grams of each solute j for every gram of protein, and Equation 12–2 is expanded by adding a set of terms $\delta_j \bar{v}_j$, where the $\bar{v}_j$ are the partial specific volumes (centimeters³ gram⁻¹) of the solutes j. An example of bound solutes for which these additional terms are major features of these equations is a case in which the protein has bound detergents or bound lipids.2

The standard diffusion coefficient of a protein (centimeters² second⁻¹) is designated as $D^{0}_{20,\mathrm{w}}$, where the superscript indicates extrapolation to a zero concentration of protein and the subscripts indicate a correction to a temperature of 20 °C and to a solvent with the viscosity of pure water. The standard diffusion coefficient is a measure of f, the frictional coefficient (grams second⁻¹) of the hydrodynamic particle in water at 20 °C at infinite dilution:

$$ f = \frac{k_{\mathrm{B}} T}{D^{0}_{20,\mathrm{w}}} \tag{12–3} $$

where $k_{\mathrm{B}}$ is Boltzmann’s constant (1.381 × 10⁻¹⁶ erg K⁻¹) and T is the temperature (293.15 K). A standard diffusion coefficient and a frictional coefficient are particular and intrinsic properties of a given protein in a given solution.

The concept of the hydrodynamic particle qualifies the meaning of the diffusion coefficient presented in Chapter 1 and the frictional ratio presented in Chapter 6. By use of the equation for the frictional coefficient of a sphere, a minimum frictional coefficient for the hydrodynamic particle at infinite dilution can be defined as

$$ f_{0,\mathrm{h}} \equiv 6\pi\eta R_{0,\mathrm{h}} \tag{12–4} $$

where the subscript zero refers to the minimization and η is the viscosity (pascal seconds). The viscosity of pure water at 20 °C is 1.002 mPa s. The hydrodynamic radius, $R_{0,\mathrm{h}}$, is defined as the radius (centimeters) of a sphere with the same volume as the hydrodynamic particle:

$$ R_{0,\mathrm{h}} \equiv \left(\frac{3V_{\mathrm{h}}}{4\pi}\right)^{1/3} \tag{12–5} $$

Substituting Equation 12–2 for $V_{\mathrm{h}}$ in Equation 12–5, and the result in Equation 12–4, gives

$$ f_{0,\mathrm{h}} = 6\pi\eta\left[\frac{3M_{\mathrm{prot}}}{4\pi N_{\mathrm{A}}}\left(\bar{v}_{\mathrm{prot}} + \delta_{\mathrm{H_2O}}\, v^{0}_{\mathrm{H_2O}}\right)\right]^{1/3} \tag{12–6} $$

The hydrodynamic radius, $R_{0,\mathrm{h}}$, the radius of a sphere with the same volume as the hydrodynamic particle, must not be confused with the apparent radius, or Stokes radius, of the particle, a, the radius of a sphere with the same standard diffusion coefficient as the particle.

The definition of the minimum frictional coefficient for the molecule of protein, that expected of a hydrated sphere of the same volume as the hydrated molecule of protein, incorporates the water bound to the protein rather than treating the protein as if it were unhydrated as was done earlier (Equation 8–39). If $\delta_{\mathrm{H_2O}}$ is 0.3 g (g of protein)⁻¹, consistent with the values in Table 6–4, the hydrated effective sphere should have a volume 1.4 times as large as the unhydrated effective sphere (if $\bar{v}_{\mathrm{prot}}$ is taken as 0.74 cm³ g⁻¹ and $v^{0}_{\mathrm{H_2O}}$ as 1.00 cm³ g⁻¹), and the frictional coefficient of the hydrated effective sphere of protein, $f_{0,\mathrm{h}}$, should be 1.12 times larger than the frictional coefficient of the unhydrated effective sphere of protein, $f_{0,\mathrm{unh}}$. This is consistent with the fact that the smallest frictional ratios, $f/f_{0,\mathrm{unh}}$, observed for globular proteins are always greater than or equal to 1.1 when no correction is made for hydration.

The relationship between the frictional ratio ($f/f_0$) and the shape of a particle has been derived for ellipsoids of revolution, either prolate or oblate.3 The relationships can be presented graphically (Figure 12–1A).1 After the frictional ratio ($f/f_{0,\mathrm{h}}$) for the hydrodynamic particle has been calculated from the observed value of the frictional coefficient f (Equation 12–3) and the calculated value of $f_{0,\mathrm{h}}$ (Equation 12–6), the apparent axial ratio, a/b, of the hydrodynamic particle can be read from the graph.

Molecules of protein are neither prolate nor oblate ellipsoids of revolution, but exact solutions to the hydrodynamic equations are available only for these shapes. Such an approximation, however, may provide some insight into the shape of a particular molecule of protein, especially when the frictional ratio differs greatly from 1. Such a result cannot be explained on the basis of an unexpectedly high degree of hydration and states that the protein of interest is peculiar in its shape. In the particular case where the molecule is thought to resemble a cylindrical rod of length L and diameter d, it has been concluded that the dimensions of that cylindrical rod can be calculated by using the frictional ratio to determine the axial ratio of an equivalent prolate ellipsoid, a/b, and then applying the formula

$$ \frac{L}{d} = \left(\frac{3}{2}\right)^{1/2}\frac{a}{b} \tag{12–7} $$
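The two numerical claims made above about hydration (a 1.4-fold larger volume and a 1.12-fold larger minimum frictional coefficient) follow directly from Equations 12–2 and 12–6, and the conversion of an axial ratio to the dimensions of a rod follows from Equation 12–7. The short sketch below checks them with the values quoted in the text; the function names are illustrative only and are not taken from the source.

```python
import math

# Values quoted in the text (cgs units).
V_BAR_PROT = 0.74  # partial specific volume of the protein, cm^3 g^-1
V_WATER = 1.00     # specific volume of pure water, cm^3 g^-1
DELTA_H2O = 0.3    # grams of bound water per gram of protein


def hydration_volume_ratio(v_bar_prot, delta_h2o, v_water):
    """Volume of the hydrated sphere divided by that of the unhydrated sphere.

    Follows from Equation 12-2: the factor M_prot / N_A cancels in the ratio.
    """
    return (v_bar_prot + delta_h2o * v_water) / v_bar_prot


def hydration_friction_ratio(v_bar_prot, delta_h2o, v_water):
    """f0,h / f0,unh: the radius of a sphere scales as the cube root of its volume."""
    return hydration_volume_ratio(v_bar_prot, delta_h2o, v_water) ** (1.0 / 3.0)


def rod_length_to_diameter(axial_ratio):
    """Equation 12-7: L/d of a rod from a/b of the equivalent prolate ellipsoid."""
    return math.sqrt(1.5) * axial_ratio


volume_ratio = hydration_volume_ratio(V_BAR_PROT, DELTA_H2O, V_WATER)
friction_ratio = hydration_friction_ratio(V_BAR_PROT, DELTA_H2O, V_WATER)
print(f"volume ratio   = {volume_ratio:.2f}")
print(f"friction ratio = {friction_ratio:.2f}")
```

Because the frictional coefficient depends only on the cube root of the volume, even a substantial uncertainty in the degree of hydration changes the minimum frictional coefficient by only a few percent.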
[Figure 12–1, panels A–D. Panels A and C: frictional ratio (f/f0) plotted against axial ratio (a/b), and log (f/f0) against log (a/b), for prolate and oblate ellipsoids; in panel C, f/f0 = k1(a/b)^0.47 for prolate and f/f0 = k2(a/b)^0.33 for oblate ellipsoids. Panels B and D: Simha factor (ν) plotted against axial ratio (a/b), and log ν against log (a/b); in panel D, ν = k3(a/b)^1.81 for prolate and ν = k4(a/b)^1.00 for oblate ellipsoids.]
Figure 12–1: Graphic relationships1 between the axial ratio (a/b) of an oblate ellipsoid of revolution or a prolate ellipsoid of revolution and
either (A) the frictional ratio (f/f0) or (B) the Simha factor (ν). A prolate ellipsoid of revolution is generated by rotation around the major axis
of an ellipse; an oblate ellipsoid of revolution, by rotation around the minor axis. The relationships for smaller values of the axial ratio are
given directly. For large values of the axial ratio (>10) the logarithm of the frictional ratio (C) or the logarithm of the Simha factor (D) is given
as a function of the logarithm of the axial ratio. The frictional ratio or the Simha factor, determined experimentally, can be converted into an
axial ratio with the appropriate graph. When the axial ratios are greater than 100, each of the four curves in panels C and D becomes a straight
line to infinity. As a result, values of the frictional ratios or the Simha factors that are greater than those on the graphs can still be converted
to values for axial ratios with the use of the slopes of these lines for extrapolation. The slope of the line in panel C for logarithms of the fric-
tional ratios of prolate ellipsoids is 0.47; that for oblate ellipsoids, 0.33; the slope of the line in panel D for logarithms of the Simha factors for
prolate ellipsoids is 1.81; and that for oblate ellipsoids, 1.00. Reprinted with permission from ref 1. Copyright 1961 John Wiley.
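As a worked check of Equations 12–2 through 12–6, the sketch below uses the values this section reports for a triple-helical rope of collagen type I (molar mass 281,000 g mol⁻¹, partial specific volume 0.695 cm³ g⁻¹, standard diffusion coefficient 0.85 × 10⁻⁷ cm² s⁻¹, and an assumed hydration of 0.3 g g⁻¹). The value of Avogadro's number and the specific volume of water (1.00 cm³ g⁻¹) are assumptions supplied here, and the function names are illustrative. Under those assumptions the calculation gives a hydrodynamic volume close to the quoted 460 nm³ and a frictional ratio close to the quoted 5.3.

```python
import math

K_B = 1.381e-16       # Boltzmann's constant, erg K^-1 (value given in the text)
T = 293.15            # temperature, K (20 degrees C)
ETA_WATER = 1.002e-2  # viscosity of water at 20 degrees C, g cm^-1 s^-1 (1.002 mPa s)
N_A = 6.022e23        # Avogadro's number, mol^-1 (assumed standard value)


def hydrodynamic_volume(molar_mass, v_bar, delta_h2o, v_water=1.00):
    """Equation 12-2: volume of the hydrodynamic particle in cm^3."""
    return (molar_mass / N_A) * (v_bar + delta_h2o * v_water)


def frictional_coefficient(diffusion_coefficient):
    """Equation 12-3: f in g s^-1 from the standard diffusion coefficient in cm^2 s^-1."""
    return K_B * T / diffusion_coefficient


def minimum_frictional_coefficient(volume):
    """Equations 12-4 and 12-5: f0,h of the sphere with the same volume (cm^3)."""
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)  # R0,h in cm
    return 6.0 * math.pi * ETA_WATER * radius


# Collagen type I, with the values quoted in the text.
volume = hydrodynamic_volume(molar_mass=281_000.0, v_bar=0.695, delta_h2o=0.3)
volume_nm3 = volume * 1e21  # 1 cm^3 = 1e21 nm^3
frictional_ratio = frictional_coefficient(0.85e-7) / minimum_frictional_coefficient(volume)

print(f"V_h    = {volume_nm3:.0f} nm^3")
print(f"f/f0,h = {frictional_ratio:.2f}")
```

The frictional ratio obtained this way is then converted to an axial ratio with Figure 12–1C and to L/d with Equation 12–7.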
ament, is known to be a rod. The molar mass of a triple-helical rope of collagen type I is 281,000 g mol⁻¹, its partial specific volume¹ is 0.695 cm³ g⁻¹, and its standard diffusion coefficient¹ is 0.85 × 10⁻⁷ cm² s⁻¹. If it is assumed that $\delta_{\mathrm{H_2O}}$ = 0.3 g g⁻¹, then the frictional ratio for the hydrodynamic particle, $f/f_{0,h}$, would be 5.3, for which the ratio of L/d would be 210 (Figure 12–1C and Equation 12–7). The volume of the hydrodynamic particle containing a segment of rope of collagen type I, based on the assumption that $\delta_{\mathrm{H_2O}}$ = 0.3, would be 460 nm³ (Equation 12–2), which would fill a cylinder 285 nm long with a diameter of 1.35 nm. From the dimensions of the triple helix (Figure 9–33) and the length of the polypeptide, a molecule of unpolymerized collagen type I is thought to be 300 nm long.

576 Physical Measurements of Structure

It is also possible to calculate the frictional coefficient for a string of spherical beads⁴ in various geometric arrangements. Fibronectin in electron micrographs appears as a flexible segment of rope 130 nm in length. A string of spherical beads of that length and with a total volume equal to that of a molecule of fibronectin has a frictional coefficient equal to that calculated from its standard diffusion coefficient.⁵ Before the length of a molecule of caldesmon had been established by electron microscopy, it was calculated that a string of spherical beads 74 nm in length with a total volume equal to that of a molecule of the protein would have a frictional coefficient equal to that observed for a molecule of the protein.⁶ It was later shown that the length of the rope-like molecule of caldesmon seen in electron micrographs of the protein is about 70 nm.⁷ A string of 12 small spherical beads (r = 0.8 nm) attached to a single larger spherical bead (r = 3 nm) produced a structure that resembles electron micrographs of vinculin and has a frictional coefficient equal to that calculated from the standard diffusion coefficient observed for the protein.⁸

The frictional coefficient of a molecule of protein can also be determined by sedimentation velocity. Consider a hydrodynamic particle dissolved in aqueous solution that is submitted to a high centrifugal force in the rotor of an ultracentrifuge. The centrifugal force on the particle is equal to $m_h\omega^2 r$, where $\omega$ is the angular velocity (radians second⁻¹) of the rotor, $m_h$ is the mass (grams) of the hydrodynamic particle, and $r$ is the distance (centimeters) the particle is from the axis of the rotor. This centrifugal force is countered by the buoyant force, which is equal to $V_h\rho_{sol}\omega^2 r$, where $\rho_{sol}$ is the density (grams centimeter⁻³) of the solution displaced by $V_h$, the volume (centimeters³) of the hydrodynamic particle. The net force on the particle is

$$F = \omega^2 r\,(m_h - V_h\rho_{sol}) \tag{12–8}$$

increases in direct proportion to the velocity of the particle until it just balances the net centrifugal force. At that point, a steady state is achieved, the forces on the hydrodynamic particle are equal and opposite, the particle travels in the direction of the centrifugal force at its terminal velocity, and

$$f u = \frac{M_{\mathrm{prot}}}{N_A}\,\omega^2 r\,(1 - \bar{v}_{\mathrm{prot}}\rho_0) \tag{12–10}$$

This equation can be rearranged to give

$$s \equiv \frac{u}{\omega^2 r} = \frac{M_{\mathrm{prot}}\,(1 - \bar{v}_{\mathrm{prot}}\rho_0)}{f N_A} \tag{12–11}$$

The term on the left, $u\omega^{-2}r^{-1}$, which is the observed velocity normalized for all of the parameters of the instrument, can be directly measured, and it is referred to as the sedimentation coefficient, $s$. The standard sedimentation coefficient (seconds) for the hydrodynamic particle is designated as $s^{0}_{20,w}$, where superscript and subscript have the same meaning as before. Because sedimentation coefficients of proteins are between 10⁻¹³ and 10⁻¹¹ s, the unit 10⁻¹³ s is designated as S, the Svedberg.

Because it is only a function of universal constants and the properties of the molecule of protein itself, the standard sedimentation coefficient is also an intrinsic property of the protein. In particular it is, as is the standard diffusion coefficient, a direct measurement of the frictional coefficient

$$f = \frac{M_{\mathrm{prot}}\,(1 - \bar{v}_{\mathrm{prot}}\rho_0)}{s^{0}_{20,w}\,N_A} \tag{12–12}$$
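As a rough numerical illustration of Equations 12–11 and 12–12, the following Python sketch converts between a terminal velocity and a sedimentation coefficient and then computes a frictional coefficient. The values for chicken lysozyme are those listed in Table 12–1. The rotor speed, the radial position, and the density taken for ρ₀ (water at 20 °C) are assumptions made for the example and do not come from the text; the function names are likewise only illustrative.

```python
import math

N_A = 6.02214076e23      # Avogadro constant (mol^-1)
RHO_WATER_20C = 0.9982   # g cm^-3; ASSUMED value for rho_0 (water at 20 C)


def angular_velocity(rpm):
    """Convert rotor speed in revolutions per minute to radians per second."""
    return 2.0 * math.pi * rpm / 60.0


def sedimentation_coefficient(u, rpm, r):
    """Equation 12-11: s = u / (omega^2 r), with u in cm/s and r in cm."""
    omega = angular_velocity(rpm)
    return u / (omega ** 2 * r)


def frictional_coefficient(m_prot, v_bar, s, rho0=RHO_WATER_20C):
    """Equation 12-12: f = M_prot (1 - v_bar rho_0) / (s N_A), in g/s."""
    return m_prot * (1.0 - v_bar * rho0) / (s * N_A)


# Chicken lysozyme, values taken from Table 12-1.
M_LYS, VBAR_LYS, S_LYS = 14_310.0, 0.703, 1.91e-13

# HYPOTHETICAL run conditions (not from the text): 50,000 rpm at r = 6.5 cm.
RPM, R_CM = 50_000.0, 6.5

# Terminal velocity implied by the sedimentation coefficient (cm/s).
u_terminal = S_LYS * angular_velocity(RPM) ** 2 * R_CM
# Frictional coefficient from sedimentation (g/s).
f_lys = frictional_coefficient(M_LYS, VBAR_LYS, S_LYS)

print(f"terminal velocity: {u_terminal:.2e} cm/s")
print(f"s = {S_LYS / 1e-13:.2f} S, f_sed = {f_lys * 1e8:.1f} x 10^-8 g/s")
```

Under these assumptions the frictional coefficient comes out close to the value of 3.7 × 10⁻⁸ g s⁻¹ tabulated for lysozyme, and the terminal velocity is of the order of a millimeter per hour, which is why high rotor speeds and long runs are needed.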
0.3 g g⁻¹ is assumed, this would give a frictional ratio $f/f_{0,h}$ of 1.22. Although the protein (Figure 9–37) does not have the axial ratio suggested by the frictional ratio (a/b would be 5 were the protein an oblate ellipsoid), the value of 1.22 probably results from both the irregular shape of the molecule and an abnormally large amount of bound water within the central cavity between the two C subunits.

It has been demonstrated crystallographically¹¹ that when aspartate carbamoyltransferase binds the enzymatic inhibitor N-(phosphonacetyl)-L-aspartate, the protein undergoes a conformational change that alters the disposition of its subunits significantly. A conformational change in a protein is any change in its structure brought about by a change in the solution, for example, the addition of an inhibitor. The net effect of this particular conformational change in aspartate carbamoyltransferase is to move the two trimers of catalytic α subunits (Figure 9–37) 1.2 nm farther apart. In the process, the water-filled space between the two C subunits widens by the same amount. This change in structure caused by the binding of the inhibitor can be detected as a change in the sedimentation coefficient of aspartate carbamoyltransferase.⁹,¹⁰ This change is accurately quantified by difference sedimentation analysis in which the two samples, with and without the inhibitor, are simultaneously monitored in separate cells in the same rotor.¹² The sedimentation coefficient⁹,¹⁰ decreases by 3.4% upon the change in structure. The diameter of the space between two trimers of catalytic α subunits of aspartate carbamoyltransferase is about 8 nm,¹¹ so the increase in the amount of water between them resulting from a movement apart of 1.2 nm should be about 40,000 g mol⁻¹. This change alone should increase the frictional coefficient (Equation 12–6) by 3–4%, which would account completely for the observed change of 3.4%.

Both the high frictional ratio for unexpanded aspartate carbamoyltransferase and the increase in hydration experienced upon its expansion illustrate the fact that oligomeric proteins, because of the spaces among the subunits, display greater hydration than monomeric proteins. From results of measurements of the scattering of X-radiation at small angles by solutions of proteins, it has been calculated that while monomeric proteins have values of $\delta_{\mathrm{H_2O}}$ of 0.25–0.35 g g⁻¹, oligomeric proteins have significantly higher values for $\delta_{\mathrm{H_2O}}$ of 0.35–0.7 g g⁻¹.¹³,¹⁴ If the hydration of aspartate carbamoyltransferase in its unexpanded state were 0.6 g g⁻¹, its frictional ratio would be only 1.13, a value that is easily accounted for by its irregular shape.

Desmin is one of the proteins that form intermediate filaments (Figures 9–35 and 9–36). The monomeric unit of the polymer is an α₂ dimer of two identical polypeptides; those from chicken are 463 aa long. The core of the dimer is a coiled coil of two α helices, one from each polypeptide. This coiled coil is contained within a fragment of the dimer containing the polypeptides from Glycine 70 to Phenylalanine 415 that can be produced by digestion with chymotrypsin. The standard sedimentation coefficient of this dimeric fragment is 2.85 S, while the standard sedimentation coefficient of an (α₂)₂ dimer of this dimer is 3.85 S.¹⁵ If it is assumed that both of these oligomers, the dimer and the dimer of dimers, can be represented as cylindrical rods, L/d for the α₂ dimer is 28 and that for the (α₂)₂ dimer of dimers is 44. Consequently, the (α₂)₂ dimer of dimers is 1.7 times longer than one α₂ dimer. If this approximation is realistic, the coiled coils of the two dimers must be staggered by about 0.7 of their length in the dimer of dimers.

The standard diffusion coefficient and the standard sedimentation coefficient provide independent determinations of the frictional coefficient of a molecule of protein. The force producing net flux of protein when diffusion is measured is chemical potential, which is unrelated to centrifugal force and is also a somewhat less concrete concept. Furthermore, the theoretical derivations of the relationship between the diffusion coefficient and the frictional coefficient and that between the sedimentation coefficient and the frictional coefficient are entirely different. It is of interest to compare (Table 12–1) the two frictional coefficients, that calculated from diffusion ($f_{diff}$) and that calculated from sedimentation ($f_{sed}$). Depending upon one’s prejudice, the agreement between the numbers is either as one expected or quite gratifying. The lack of any systematic deviation verifies the assumption, first made by Einstein, that the same frictional coefficient applies to both diffusion and sedimentation.

The frictional ratios ($f_{av}/f_{0,h}$), where $f_{av}$ is the average of the two measurements, are close to 1 (1.1–1.2) for most globular proteins (Table 12–1), but even in these instances the frictional ratios of the hydrated particles predict (Figure 12–1A) an axial ratio of greater than 3, which is unrealistic. These values of 1.1–1.2 do not indicate elongation but reflect the fact that molecules of protein, even when they are almost spherical, are not smooth spheres but globular macromolecules with irregular, rough surfaces. Proteins such as fibrinogen, apolipoprotein(a), and plasminogen, however, which are known from other observations to be highly asymmetric, have much higher frictional ratios.

The specific examples of elongated or excessively hydrated proteins described in detail so far, collagen, fibronectin, caldesmon, vinculin, aspartate carbamoyltransferase, and desmin, are macromolecules about which enough is known that the hydrodynamic measurements can be evaluated comprehensibly. When no other structural information is available about a protein, a frictional ratio around 1.1 is strong evidence that it is globular, but frictional ratios greater than 1.1 are difficult to interpret. For example, the fact that human bifunctional polynucleotide 3′-phosphatase/5′-kinase has a frictional ratio ($f/f_{0,h}$; $\delta_{\mathrm{H_2O}}$ = 0.3) of 1.30¹⁹ could result from an unusually irregular shape, an elongated shape,
Table 12–1

protein                          species   v̄ᵃ         s⁰20,wᵃ      D⁰20,wᵃ           Mpᵇ         f_sedᶜ          f_diffᵈ         f_0,unhᵉ        f_0,hᶠ          f_av/f_0,hᵍ
                                           (cm³ g⁻¹)  (s × 10¹³)   (cm² s⁻¹ × 10⁷)   (g mol⁻¹)   (g s⁻¹ × 10⁸)   (g s⁻¹ × 10⁸)   (g s⁻¹ × 10⁸)   (g s⁻¹ × 10⁸)
lysozyme                         chicken   0.703      1.91         11.20             14,310      3.7             3.6             2.99            3.4             1.09
alcohol dehydrogenase            horse     0.750      5.0          6.2               79,600ʰ     6.6             6.5             5.41            6.1             1.09
catalase                         cow       0.730      11.30        4.10              232,800ⁱ    9.3             9.9             7.67            8.6             1.11
β-galactosidase                  E. coli   0.760      15.93        3.12              465,400     11.7            13.0            9.79            10.9            1.13
serum albumin                    human     0.735      4.64         6.0               66,470      6.3             6.7             5.06            5.7             1.15
fructose-bisphosphate aldolase   rabbit    0.742      7.35         4.63              156,840     9.2             8.7             6.76            7.6             1.18
prothrombin                      cow       0.700      4.85         6.24              72,600ʲ     7.5             6.5             5.13            5.8             1.21
manganese-stabilizing proteinᵏ   spinach   0.732      2.26         7.6               26,530      5.3             5.3             3.72            4.2             1.27
plasminogen                      human     0.710      4.30         4.31              103,000ˡ    11.6            9.4             5.80            6.5             1.61
apolipoprotein(a)ᵐ               human     0.690      9.30         2.29              323,000ⁿ    18.0            17.7            8.40            9.5             1.88
fibrinogen                       human     0.725      7.63         1.98              344,000ᵒ    20.7            20.4            8.70            9.8             2.10

ᵃ Unless otherwise noted, these values are from tables in ref 16. The entries are arranged in order of asymmetry. ᵇ From sequence. ᶜ f_sed from Equation 12–12. ᵈ f_diff from Equation 12–3. ᵉ f_0,unh from Equation 12–6 with δH2O equal to 0. ᶠ f_0,h from Equation 12–6 with δH2O equal to 0.3. ᵍ Average of f_sed and f_diff divided by f_0,h. ʰ 2 Zn²⁺ subunit⁻¹. ⁱ One heme subunit⁻¹. ʲ 10.4 g of oligosaccharide (100 g of protein)⁻¹. ᵏ Reference 17. ˡ 17 g of oligosaccharide (100 g of protein)⁻¹. ᵐ Reference 18. ⁿ 30 g of oligosaccharide (100 g of protein)⁻¹. ᵒ 2 g of oligosaccharide (100 g of protein)⁻¹.
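The columns of frictional coefficients in Table 12–1 can be reproduced approximately from the measured quantities in the same rows. The sketch below does this for three entries. Equations 12–3 and 12–6 are not reproduced in this part of the text, so their forms here are assumptions: Equation 12–3 is taken to be the Einstein relation f = k_B T/D, and Equation 12–6 is taken to be Stokes's law, f₀ = 6πηr, applied to a sphere of volume M(v̄ + δ v_H2O)/N_A. The density and viscosity of water at 20 °C and the specific volume of bound water are also assumed values.

```python
import math

N_A = 6.02214076e23   # Avogadro constant (mol^-1)
K_B = 1.380649e-16    # Boltzmann constant (erg K^-1)
T = 293.15            # 20 C in kelvin
RHO_0 = 0.9982        # ASSUMED density of water at 20 C (g cm^-3)
ETA_0 = 0.01002       # ASSUMED viscosity of water at 20 C (g cm^-1 s^-1)
V_WATER = 1.0         # ASSUMED specific volume of bound water (cm^3 g^-1)


def f_sed(m, v_bar, s):
    """Equation 12-12: frictional coefficient from sedimentation (g/s)."""
    return m * (1.0 - v_bar * RHO_0) / (s * N_A)


def f_diff(d):
    """ASSUMED form of Equation 12-3 (Einstein relation): f = k_B T / D."""
    return K_B * T / d


def f_sphere(m, v_bar, hydration):
    """ASSUMED form of Equation 12-6: Stokes's law for the equivalent sphere."""
    volume = m * (v_bar + hydration * V_WATER) / N_A       # cm^3 per particle
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 6.0 * math.pi * ETA_0 * radius


# name: (v_bar, s in seconds, D in cm^2/s, molar mass), all from Table 12-1
ROWS = {
    "lysozyme":   (0.703, 1.91e-13, 11.20e-7, 14_310.0),
    "catalase":   (0.730, 11.30e-13, 4.10e-7, 232_800.0),
    "fibrinogen": (0.725, 7.63e-13, 1.98e-7, 344_000.0),
}

results = {}
for name, (v_bar, s, d, m) in ROWS.items():
    fs, fd = f_sed(m, v_bar, s), f_diff(d)
    f0_unh = f_sphere(m, v_bar, 0.0)    # unhydrated sphere
    f0_h = f_sphere(m, v_bar, 0.3)      # sphere hydrated at 0.3 g/g
    ratio = 0.5 * (fs + fd) / f0_h      # f_av / f_0,h
    results[name] = (fs, fd, f0_unh, f0_h, ratio)
    print(f"{name:12s} f_sed={fs * 1e8:5.1f} f_diff={fd * 1e8:5.1f} "
          f"f0_unh={f0_unh * 1e8:5.2f} f0_h={f0_h * 1e8:5.1f} ratio={ratio:4.2f}")
```

With these assumed constants the computed ratios fall within about 0.01 of the tabulated values, which suggests that the assumed forms of the two equations are at least consistent with the way the table was constructed.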
or an abnormally large amount of bound water or, most likely, some combination of all of these factors.

Measurement of the viscosity of a solution of a protein also provides an evaluation of the shape of the hydrodynamic particle. When a fluid flows through a cylindrical capillary under the appropriate circumstances, laminar flow occurs. The fluid immediately adjacent to the walls of the capillary is stationary, and the fluid at the center of the capillary has the highest rate of flow. Each cylindrical lamina between the center and the wall moves with an intermediate velocity that, as the distance from the center increases, monotonically decreases to a value of zero at the wall. Laminar flow requires that each cylindrical lamina move more slowly than its neighbor toward the center. As such, shear occurs between adjacent laminae throughout the capillary. The surfaces at which shear occurs are all parallel to the axis of the capillary. The more viscous the fluid, the more difficult it will be for these surfaces of shear to move across one another, and the more slowly the fluid will flow through the capillary. The time required for a given volume of a fluid to move through a given capillary at a given hydrostatic pressure is directly proportional to $\eta$, the viscosity (pascal seconds) of the fluid.

The addition of macromolecules such as proteins to the fluid in the capillary interrupts the shear that otherwise would occur in the solution in their vicinity and increases the viscosity of the solution. This increase can be expressed in terms of the specific viscosity, $\eta_{sp}$, which is defined by

$$\eta_{sp} \equiv \frac{\eta' - \eta}{\eta} = \frac{\eta'}{\eta} - 1 \tag{12–13}$$

where $\eta'$ is the viscosity of the solution containing a particular concentration of the protein and $\eta$ is the viscosity of an otherwise identical solution lacking the protein. The specific viscosity is a positive number because $\eta'$ is always greater than $\eta$. The specific viscosity is the normalized incremental increase in the viscosity caused by the protein.

If the flow through the capillary is driven only by the weight of the fluid, the specific viscosity is readily measured because

$$\frac{\eta'}{\eta} = \frac{t'\rho'}{t\rho} \tag{12–14}$$

where $t$ is the time for a given volume of a solution to flow through the capillary, $\rho$ is the density of the solution, and the primed and unprimed terms refer to the solution of protein and an identical solution lacking the protein, respectively.

The specific viscosity, $\eta_{sp}$, is the fractional increase in the viscosity of the solution due to addition of the protein, and it increases monotonically as the concentration of protein is increased. To render this increase an intrinsic property of the protein, regardless of its concentration, the intrinsic viscosity, $[\eta]$ (centimeters³ gram⁻¹), is defined as

$$[\eta] \equiv \lim_{\gamma_{prot} \to 0} \frac{\eta_{sp}}{\gamma_{prot}} \tag{12–15}$$

where $\gamma_{prot}$ is the concentration of protein in grams centimeter⁻³. At low concentrations of protein, $\eta_{sp}$ should be directly proportional to $\gamma_{prot}$, and $[\eta]$ is simply the slope of
Shape 579
the line of $\eta_{sp}$ plotted against $\gamma_{prot}$. Neither the specific viscosity nor the intrinsic viscosity is itself a viscosity. The intrinsic viscosity is sometimes called the limiting viscosity number to avoid this confusion.

For macromolecules such as proteins, it can be shown¹ that

$$[\eta] = \nu\,\frac{V_h N_A}{M_{\mathrm{prot}}} \tag{12–16}$$

$$[\eta] = \nu\,(\bar{v}_{\mathrm{prot}} + \delta_{\mathrm{H_2O}}\,v^{0}_{\mathrm{H_2O}}) \tag{12–17}$$

where $\nu$ is a dimensionless coefficient of proportionality referred to as the Simha factor. On the basis of Einstein’s calculations, the value of $\nu$ for a spherical hydrodynamic particle is 2.5. As with the frictional ratio, $f/f_{0,h}$, the relationship between the Simha factor $\nu$ and shape has been derived for ellipsoids of revolution.²⁰ The relationships can be presented graphically (Figure 12–1B). If a value for $\delta_{\mathrm{H_2O}}$ is assumed, $\nu$ can be calculated from

$$\nu = \frac{[\eta]}{\bar{v}_{\mathrm{prot}} + \delta_{\mathrm{H_2O}}\,v^{0}_{\mathrm{H_2O}}} \tag{12–18}$$

and the apparent value of the axial ratio can be read from the graph.

From Equation 12–17, if $\delta_{\mathrm{H_2O}}$ = 0.3 g g⁻¹, $\bar{v}_{\mathrm{prot}}$ = 0.74 cm³ g⁻¹, and $v^{0}_{\mathrm{H_2O}}$ = 1 cm³ g⁻¹, $[\eta]$ would be 2.6 cm³ g⁻¹ if the hydrodynamic particle were a sphere regardless of its molar mass. What this means is that as long as $\delta_{\mathrm{H_2O}}$ and $\bar{v}_{\mathrm{prot}}$ do not vary significantly, the viscosities observed for a set of solutions, each of a different spherical molecule of protein and each at the same concentration in grams centimeter⁻³, will be the same regardless of whether the mass is distributed among only a few large spheres because the protein has a large molar mass or is distributed among many small spheres because the protein has a small molar mass. Most globular proteins do have intrinsic viscosities between 3.0 and 4.0 cm³ g⁻¹ regardless of their molar masses. An observation of an intrinsic viscosity in this range demonstrates that a protein is globular.

Intrinsic viscosity is dramatically more sensitive to the asymmetry of a molecule of protein than is the frictional coefficient. The frictional coefficient of a molecule of collagen type I is only 5 times larger than the frictional coefficient it would have if it were a hydrated sphere, but the intrinsic viscosity of a solution of collagen type I is 460 times larger than the intrinsic viscosity it would have if it were a hydrated sphere. The intrinsic viscosity of collagen type I¹ is 1150 cm³ g⁻¹. If it is assumed that $\delta_{\mathrm{H_2O}}$ = 0.3 g g⁻¹, the Simha factor is 1160 (Equation 12–18), and if it is assumed that the molecule is cylindrical, the axial ratio (a/b) of the hydrodynamic particle would be 140 (Figure 12–1D) and the ratio of cylindrical length to diameter (L/d) would be 170 (Equation 12–7). This ratio is that of a cylinder of length 260 nm with a diameter of 1.5 nm and a volume of 460 nm³. As noted before, collagen type I is 300 nm in length.

Another procedure that can provide information about the shape of a molecule of protein is the scattering of electromagnetic radiation or neutrons. In the earlier discussion of light scattering, it was mentioned that the intensity of the scattered light from a solution of protein can depend on the angle at which the measurement is made. This is due to the fact that if the molecule of protein has at least one dimension that is an appreciable fraction of the wavelength of the light, photons scattered from different points in the same molecule of protein will be out of phase, and intramolecular interference due to these mismatched phases will diminish the overall intensity of the scattered light. This interference increases as the angle at which the scattered light is measured, the scattering angle $\theta$ (Figure 8–4), is increased. At a scattering angle of 0, the angle of the forward scattering $i_0$, there is no interference. It is the forward scattering that contains information about the molar concentration of particles in the solution, and hence the molar mass of those particles.

It can be shown that, when the contribution of the virial coefficients to the scattering is eliminated by extrapolating the measurements to zero concentration of protein

$$\lim_{\gamma_{prot} \to 0} \frac{K\gamma_{prot}}{R_\theta}\left(\frac{\partial \tilde{n}}{\partial \gamma_{prot}}\right)^{2}_{T,P,\mu} = \frac{1}{M_{\mathrm{prot}}}\,\frac{1}{P(\theta)} \tag{12–19}$$

where $K$ is the optical constant (moles centimeter⁻⁴) defined by Equation 8–28, $R_\theta$ is the Rayleigh ratio (centimeters⁻¹) calculated from the measurements by Equations 8–30 or 8–31, $\gamma_{prot}$ is the concentration of protein in the units of grams centimeter⁻³, $(\partial \tilde{n}/\partial \gamma_{prot})_{T,P,\mu}$ is the change (centimeters³ gram⁻¹) in the refractive index of the solution as a function of the concentration of the protein, and $M_{\mathrm{prot}}$ is the molar mass (grams mole⁻¹) of the protein. As noted previously, the incremental scattering $i_\theta$ is the scattering that results only from the molecules of protein and is measured as the difference in scattering between the solution containing protein and an identical solution not containing protein.

The function $P(\theta)$ is the factor by which the intensity of the light scattered only by the protein, the incremental scattered light ($i_\theta$), is decreased as a result of the interference:²¹

$$P(\theta) = 1 - \frac{16\pi^2 R_G^2}{3\lambda^2}\sin^2\frac{\theta}{2} + \ldots \tag{12–20}$$
where $R_G$ is the radius of gyration (centimeters) of the molecule of protein and $\theta$ is the angle relative to the incident radiation at which the scattered radiation is measured. The value for the wavelength of the light, $\lambda$, is its wavelength in the solution

$$\lambda = \frac{\lambda_0}{\tilde{n}} \tag{12–21}$$

where $\lambda_0$ is its wavelength in a vacuum. Equation 12–20 is an infinite series, but at small values of $\theta$ the higher terms become negligible and the approximation

$$\lim_{\theta \to 0} \frac{1}{P(\theta)} = \frac{1}{1 - \dfrac{16\pi^2 R_G^2}{3\lambda^2}\sin^2\dfrac{\theta}{2}} = 1 + \frac{16\pi^2 R_G^2}{3\lambda^2}\sin^2\frac{\theta}{2} \tag{12–22}$$

can be used. In practice

$$\lim_{\substack{\theta \to 0 \\ \gamma_{prot} \to 0}} \frac{\gamma_{prot}}{R_\theta} = \frac{\gamma_{prot}}{R_0}\left(1 + \frac{16\pi^2 R_G^2}{3\lambda^2}\sin^2\frac{\theta}{2}\right) \tag{12–23}$$

A plot of the left-hand quotient, extrapolated to zero concentration of protein at each value of $\theta$, against $\sin^2(\theta/2)$, for small values of $\theta$, will be a straight line from the slope and intercept of which a value of $R_G$, the radius of gyration, can be calculated. Equation 12–19 emphasizes that the interference arising from the shape of the molecule of protein is independent of the colligative property of light scattering from which its molar mass can be estimated.

The radius of gyration of the molecule of protein is the molecular parameter that is obtained from the angular dependence of the intensity of the scattered radiation and that provides information about the shape of the molecule of protein. The radius of gyration of a solid of uniform scattering density, as is usually assumed to be the case for proteins, is defined by the relationship

$$R_G^2 = \frac{\int_{vol} r^2\,dV}{\int_{vol} dV} \tag{12–24}$$

where $r$ is the distance of a volume element $dV$ from the center of mass. The integration is performed over the whole volume of the solid. The advantage of the radius of gyration is that it can be calculated by numerical integration for any structure, for example, a crystallographic molecular model, and compared to the value obtained from the measurement. For example, in an elongated protein such as fibronectin, which is known to be constructed from internally repeating domains, radii of gyration can be calculated for various rigid structures built from a string of spheres each representing one of the individual domains of the molecule, and these calculated values can be compared to the observed value for the radius of gyration estimated by light scattering.²²

The radius of gyration for a single sphere of uniform density is

$$R_G = \left(\frac{3}{5}\right)^{1/2} R_{sph} \tag{12–25}$$

where $R_{sph}$ is the radius of the sphere. The radius of gyration for a cylindrical rod is

$$R_G = \frac{L_r}{12^{1/2}} \tag{12–26}$$

where $L_r$ is the length of the rod. The radius of gyration for a prolate ellipsoid²³ is

$$R_G = \left(\frac{a^2 + 2b^2}{5}\right)^{1/2} \tag{12–27}$$

where $a$ and $b$ are the semi-major and semi-minor axes, respectively.

The effect of the finite size of the molecule of protein on the scattered light is that its intensity, as reflected in $R_\theta$ (Equation 8–30 or 8–31), decreases as $\theta$ increases, owing to intramolecular interference, but its intensity will decrease significantly only if the term $[16\pi^2 R_G^2 \sin^2(\theta/2)]/3\lambda^2$ in Equation 12–23 is large enough to cause a measurable effect. In practice,¹ this means that at least one dimension of the protein must be greater than $\lambda/20$. The sizes of most molecules of protein are too small for this to be the case when visible light ($\lambda$ = 300–500 nm in water) is used as the radiation. For example, the decrease in light scattering from a solution of fibronectin measured at a wavelength of 436 nm (in vacuo) was only about 10% at the maximum possible $\sin^2(\theta/2)$ of 0,²⁴ even though fibronectin has a molar mass of 519,000 g mol⁻¹, is a string of domains with a total length of 180 nm, and has a radius of gyration of 8.6 nm.²²

For most proteins, significant decreases in angular light scattering are observed only when X-radiation is used ($\lambda$ = 0.1–0.2 nm). Unfortunately, this is radiation of such short wavelength that complete intramolecular interference occurs at quite small values of $\theta$ (Equation 12–20), and the scattered radiation from a solution of protein becomes equal to that from the solution lacking protein when $\theta$ is only 1–2°. Fortunately, accurate measurements of scattered X-radiation can be made at the necessary small angles. The values for $R_G$ obtained from
small-angle X-ray scattering, for example, 1.75 nm for myoglobin,²⁵ 2.3 nm for cyclic AMP-dependent protein kinase,²⁶ and 1.36 nm for reduced cytochrome c,²⁷ demonstrate the ability of this technique to provide information about small globular proteins.

When the observations of X-ray scattering are presented, a different convention is used to approximate $P(\theta)$. Because

$$\exp(-x) = 1 - x + \frac{x^2}{2!} - \frac{x^3}{3!} + \ldots \tag{12–28}$$

the first two terms in Equation 12–20 are identical to the first two terms in the expansion of $\exp[-16\pi^2 R_G^2 \sin^2(\theta/2)/(3\lambda^2)]$, and at small angles²¹,²⁸

$$\lim_{\theta \to 0} \ln P(\theta) = -\frac{16\pi^2 R_G^2}{3\lambda^2}\sin^2\frac{\theta}{2} \tag{12–29}$$

Because at such small angles none of the terms in Equation 12–19 except $i_\theta$ (see Equations 8–30 and 8–31) and $P(\theta)$ change as $\theta$ is varied, a plot of $\ln i_\theta$ as a function of $\sin^2(\theta/2)$ at the smallest angles will give a straight line with a slope of $-16\pi^2 R_G^2/(3\lambda^2)$.* From this slope $R_G$ is readily determined.

In reports of studies of X-ray scattering, there are several ways in which the observations are analyzed. First, the data can be presented directly (Figure 12–2A)²⁶ as the natural logarithm of the observed incremental scattering intensity ($\ln i_\theta$) as a function of $q$, where

$$q \equiv \frac{4\pi \sin(\theta/2)}{\lambda} \tag{12–30}$$

The advantage of this presentation is that the scattering calculated from a particular model of the molecule of protein as a function of $q$ can be compared to the complete set of scattering data. Second, the data can be presented in a Guinier plot as $\ln i_\theta$ as a function of $q^2$ (Figure 12–2B). The advantage of this presentation is that a radius of gyration can be estimated from the limiting slope at the smallest values of scattering angle $\theta$, typically those less than 1° (Figure 12–2A). Third, the distance distribution function $p(r)$, which is the Fourier transform of the scattering function

$$p(r) = \frac{1}{2\pi}\int_{q} i_q\, r \sin(qr)\,dq \tag{12–31}$$

can be calculated. The distance distribution function $p(r)$ is the frequency with which vectors of a length $r$ connect two volume elements within the molecule of protein (Figure 12–2C). In practice, the inverse Fourier transform of Equation 12–31

$$i_q = 4\pi \int_{0}^{d_{max}} p(r)\,\frac{\sin(qr)}{qr}\,dr \tag{12–32}$$

where $d_{max}$ is the maximum dimension of the particle, is used to compute $p(r)$ in reverse.²⁹

In a plot of the distance distribution function $p(r)$ against $r$, the longest dimension of the molecule of protein is the intercept of the function with the abscissa. For example, the longest dimension of a molecule of cyclic AMP-dependent protein kinase is 7.2 nm (Figure 12–2C). From scattering curves of myoglobin in its native state (Figure 4–18), after the removal of its heme, and then in solution at pH 2, the gradual expansion of the protein as its structure was disrupted could be followed by monitoring the gradual increase in the value of this intercept.²⁵

The shape of the distance distribution function provides information about the shape of the molecule of protein.²⁹ If the molecule of protein is globular, the distance distribution function $p(r)$ has a fairly symmetric shape with a single maximum (Figure 12–2C). If the molecule of protein has an elongated structure, the distance distribution function will be skewed. If it is elongated in only one dimension so that it is prolate in shape, the maximum will be shifted to short distances because there are more short vectors in a prolate solid than there are long vectors.²⁹ There is a slight indication of such an elongation in Figure 12–2C. If the molecule of protein contains two well-separated globular domains, there will be two maxima in the distance distribution function, the one at shorter distances for vectors confined within each domain and the one at longer distances for vectors between domains.

A distinction can be made between small-angle scattering and solution scattering. Measurements of small-angle scattering are confined to the region of the scattering function for angles that are only large enough to define accurately the distance distribution function $p(r)$. This range of small angles also includes scattering at the smallest angles, which provides an estimate of the

* Unfortunately, investigators who study the scattering of X-radiation and neutrons use a different convention for the scattering angle (Figure 8–4) from those who study the scattering of light. The same scattering angle routinely designated as $\theta$ during measurements of light scattering is routinely designated as $2\theta$ during measurements of X-ray scattering and neutron scattering. Consequently, the angles $\theta$ from measurements of X-ray scattering and neutron scattering must be multiplied by a factor of 2 before they are used as angles $\theta$ in the equations presented in this text. The term $\sin^2(\theta/2)$ used when results of light scattering are presented thus becomes $\sin^2\theta$ when results of small-angle X-ray scattering and neutron scattering are presented, and values of $\sin^2\theta$ from X-ray scattering and neutron scattering are equivalent to values of $\sin^2(\theta/2)$ in the equations used in this text. In measurements of X-ray scattering and neutron scattering, the slope of the line when $\ln i_\theta$ is plotted as a function of the $\sin^2\theta$ used by these investigators has the value $-16\pi^2 R_G^2/3\lambda^2$.
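The Guinier analysis described in this section, together with the change of angular convention explained in the footnote, can be sketched numerically. Because q² = 16π² sin²(θ/2)/λ² (Equation 12–30), Equation 12–29 is equivalent to ln P = −q²R_G²/3, so the slope of ln i against q² is −R_G²/3. The example below generates noise-free synthetic intensities for a particle with the radius of gyration quoted above for myoglobin and recovers that value by a least-squares fit. The wavelength of 0.154 nm, the forward intensity of 1000, the range of angles, and all function names are assumptions chosen for illustration; real data would contain noise and would deviate from linearity at larger angles.

```python
import math


def from_xray_convention(theta_xray_deg):
    """Angles reported as theta in X-ray work are half the full scattering
    angle, so they are doubled before use in the equations of this text."""
    return 2.0 * theta_xray_deg


def scattering_vector(theta_deg, wavelength_nm):
    """Equation 12-30, with theta the full scattering angle used in this text."""
    return 4.0 * math.pi * math.sin(math.radians(theta_deg) / 2.0) / wavelength_nm


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def guinier_radius(qs, intensities):
    """Fit ln(i) against q^2; slope = -R_G^2 / 3 (Equation 12-29)."""
    slope, intercept = linear_fit([q * q for q in qs],
                                  [math.log(i) for i in intensities])
    return math.sqrt(-3.0 * slope), math.exp(intercept)


R_G_TRUE = 1.75      # nm, value quoted in the text for myoglobin
WAVELENGTH = 0.154   # nm, ASSUMED (within the 0.1-0.2 nm range given in the text)
I_ZERO = 1000.0      # ASSUMED forward scattering, arbitrary units

# Angles as they might be reported in the X-ray convention (0.05 to 0.50 degrees).
xray_angles = [0.05 * k for k in range(1, 11)]
qs = [scattering_vector(from_xray_convention(a), WAVELENGTH) for a in xray_angles]
intensities = [I_ZERO * math.exp(-(q * R_G_TRUE) ** 2 / 3.0) for q in qs]

r_g_fit, i_zero_fit = guinier_radius(qs, intensities)
# Radius of the uniform sphere with the same radius of gyration (Equation 12-25).
r_sphere = r_g_fit / math.sqrt(3.0 / 5.0)

print(f"R_G = {r_g_fit:.2f} nm, i_0 = {i_zero_fit:.0f}, "
      f"equivalent sphere radius = {r_sphere:.2f} nm")
```

Because the synthetic data follow the Guinier form exactly, the fit returns the input radius of gyration. The last line uses Equation 12–25 only to show how a radius of gyration would translate into the radius of a uniform sphere.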
ated proteins was measured by subtracting the scattering from a mixture of the two respective types of 30S subunits that each contained only one of the two deuterated proteins.⁴² The Fourier transform (Figure 12–2C) of this incremental neutron scattering function provides the frequency with which vectors of length r connect a volume element in one deuterated protein with a volume element in the other. The maximum in such a curve is an estimate of the distance between the centers of mass of the two proteins in the 30S subunit. Enough of these distances (92 out of a possible 210) were measured to establish the relative positions of all 21 of the proteins in the 30S subunit.⁴¹ The majority of these relative positions agreed with their relative positions in the crystallographic molecular model of the 30S subunit.⁴³

Because the molecules of protein in solution are randomly oriented, solution scattering curves for X-radiation or neutrons are rotationally averaged and by themselves can provide only spherically symmetric structural information. Because molecules of protein are not spherically symmetric, some information about the structure of the protein must be available to constrain the model. For example, scattering of X-radiation or neutrons can be used together with crystallographic information. Crystallographic molecular models of the domains of a protein may be available but not a crystallographic molecular model of the complete protein, and scattering curves can provide models for the dispositions of the domains in the intact protein.⁴⁴ Intact immunoglobulin M is a pentameric ring of five subunits each composed of two Fab portions and one Fc portion formed from folded polypeptides related to those forming immunoglobulin G (Figure 11–1). Ten copies of a crystallographic molecular model of an Fab fragment of immunoglobulin G and five copies of a crystallographic molecular model of an Fc fragment of immunoglobulin G could be arranged to produce a structure with a calculated solution scattering curve in agreement with that observed for immunoglobulin M, and the appropriate combinations of Fab and Fc fragments could be arranged to produce structures with calculated solution scattering curves in agreement with those observed for the respective fragments.³²

Measurements of X-ray or neutron scattering from a solution of a protein for which a complete crystallographic molecular model is available have also proven to be valuable complements to the crystallographic observation. The usual result is that the theoretical solution scattering curve calculated from the crystallographic molecular model agrees closely with the one that is observed.²⁹,⁴⁵ Such coincidences are further evidence that the crystallographic molecular model represents the structure of the protein when it is in solution.

There are, however, a number of instances in which measurements of scattering have been used to adjust crystallographic molecular models. Two independently shifting domains in a crystallographic molecular model of a protein are often confined to a particular orientation either by the packing forces of the crystal or by the binding of a ligand. A measurement of the radius of gyration from small-angle X-ray scattering of the protein in solution unconfined by the packing forces or in the absence of that ligand can be matched with a value calculated from a molecular model of the protein in which the domains have a different orientation from those observed crystallographically.⁴⁶ From a crystallographic molecular model of the protein²⁷ or a systematically altered conformation of that molecular model, the frequency with which vectors of length r actually do connect volume elements within the model can be calculated and compared with the observed distance distribution function. In Figure 12–2C, the line through the points was calculated from an altered conformation of the crystallographic molecular model of cyclic AMP-dependent protein kinase. This altered conformation was proposed to represent the structure that the molecule assumes in solution in the absence of the peptide that was bound to the protein when it was crystallized.

The disagreement between an observed solution scattering curve and that calculated from a crystallographic molecular model is also an indication that the protein assumes a conformation different from that of the crystallographic molecular model when it is dissolved in a solution of a particular composition,³⁴ for example, in which ligands for the protein may be dissolved.³³ Theoretical solution scattering curves calculated from likely alternative conformations often indicate how the structure of the protein has changed. In the case of aspartate carbamoyltransferase, it had been possible to crystallize the protein in the conformation it assumes when its regulatory β subunits bind MgATP. Even so, the crystallographic molecular model of this conformation⁴⁷ had to be adjusted before it would produce a theoretical X-ray solution scattering curve that agreed with the one observed for it in solution.⁴⁸ Only when the distance between the two catalytic α₃ trimers in the crystallographic molecular model (Figure 9–37) was increased 0.3 nm by reasonable rotations of the regulatory β₂ dimers did the theoretical curve agree with the observed curve.

The examples of measurements of small-angle X-ray scattering for cyclic AMP-dependent protein kinase (Figure 12–2C) and of X-ray solution scattering for aspartate carbamoyltransferase illustrate the use of scattering in comparisons of the structure of a protein in solution to its structure in a crystallographic molecular model. In both of these instances, the measurements of scattering were used to adjust appropriately and realistically the structure of the crystallographic molecular model, and there are other instances in which such adjustments have also been required.²⁹,³¹ The fact that reasonable adjustments in the orientations of domains or the disposition of subunits are all that is necessary to
bring errant crystallographic molecular models into coincidence is further evidence that the crystallographic molecular model represents the structure of the protein. The solution scattering curves of helical polymeric proteins can also provide information about the parameters of the helix into which the monomers are assembled.38

The difficulty with measurements of hydrodynamic properties or of small-angle scattering for assessing the shape of a molecule of protein is that they often provide only one unambiguous numerical result, either a frictional coefficient, a Simha factor, or a radius of gyration. If the frictional ratio $f/f_{0,h}$ is less than 1.15, the Simha factor $\nu$ is less than 4.0, or the radius of gyration $R_G$ is near a value of $(3/5)^{1/2} R_{0,h}$, it can be concluded that the protein is globular. If the value of one or more of these parameters for a given protein is significantly greater than the values expected for a sphere, it is usually necessary to conclude that the protein has a highly irregular surface, has an extended structure, has a high degree of hydration, or has some combination of these features. It is clear from the foregoing discussion of some of the results that larger values of these parameters are consistent with a large set of particular arrangements of the available mass and values for hydration. The reason so much is heard about prolate and oblate ellipsoids of revolution is not that molecules of proteins are such geometric solids but that frictional coefficients and radii of gyration can be calculated explicitly for such solids. In using any of these measured parameters in an informative way, other details about the structure of the protein are essential.

One way to observe the shape of a molecule of protein directly is by electron microscopy. The three symmetrically protruding regulatory subunits on aspartate carbamoyltransferase and the hollow, water-filled cavity between its two rotationally symmetric, trimeric α₃ subunits (Figure 9–37), which together probably account for its abnormally large frictional coefficient, were first observed in electron micrographs of the protein (Figure 12–4A)49,50 before there was a crystallographic molecular model. Another protein with an abnormally large frictional coefficient is fibrinogen. Electron micrographs of fibrinogen (Figure 12–4B),51 which turned out to be remarkably accurate representations of its structure, were published 20 years before a crystallographic molecular model of the protein (Figure 13–22) became available.52

To prepare it for direct observation in the electron microscope, a protein molecule can either be negatively stained by being embedded in a glass of the salt of a heavy metal ion such as uranyl cation or phosphotungstate anion (Figure 12–4C, upper four images)53,54 or be positively stained by being rotary shadowed55 with a layer of platinum that produces a metallic replica of the molecule (Figure 12–4C, lower four images). In the former method, the molecule of protein, because it is less electron dense than the glass, appears as a light image against a dark background; in the latter method, the molecule of protein coated with the metal appears dark against a light background. Whenever results from such electron microscopic studies are presented, a representative field of molecules (Figure 12–4D)56 should be shown so that the reader can judge what fraction of the molecules of protein on the film of carbon or in the metallic replica give images that resemble the images chosen for a gallery of “representative” views (Figures 12–4A–C).

Collagen XII from Gallus gallus is a trimer of three polypeptides. All three polypeptides in the trimer are encoded by the same gene, but there are two forms of the polypeptide, one 3100 aa long and the other about 2700 aa long, produced by translation of alternatively spliced versions of the same messenger RNA;53 the shorter translation is missing the amino-terminal 400 aa of the sequence. The carboxy-terminal 380 aa of each polypeptide contains two segments (152 and 103 aa) of collagen repeat, and in this region the three polypeptides of the trimer should form an interrupted triple-helical rope of collagen (Figure 9–33) with two segments 45 and 30 nm in length. This triple-helical rope is the structure holding the three polypeptides together in the oligomer. The amino-terminal 1870 aa of the short splice variant contains 10 fibronectin type III modular domains, strung together in a necklace that should be 32 nm in length,22 and one amino-terminal von Willebrand factor type A modular domain, a globular structure about 4 nm in diameter. The longer splice variant has an additional eight fibronectin type III modular domains (26 nm) and two additional von Willebrand factor type A domains. The electron micrographs of collagen type XII display a structure that is the fulfillment of these expectations (Figure 12–4C).53,54 The single, thin collagen tail of 75 nm is kinked about 30 nm from its end.54 There is a central globular region from which three significantly thicker arms extend, each either a short or a long variant; the short variant is about 40 nm long with a globular ball at its end,53 and the long variant is about 90 nm long with a globular ball in its middle and two globular balls at its end.54 The three polypeptides enter the center wrapped around each other in the collagen tail and leave the center separately as three fibronectin necklaces.

Electron micrographs of activated bovine coagulation Factor Va were instrumental in explaining its unusual behavior upon sedimentation analysis. The protein has a molar mass57 of 170,000 g mol⁻¹ and a standard sedimentation coefficient,58 $s^0_{20,w}$, of 8.2 S, from which a frictional ratio $f/f_{0,h}$ of 1.6 (based on the assumption that $\delta_{\mathrm{H_2O}} = 0.3$) can be calculated. When activated coagulation Factor Va was observed by electron microscopy, it was found to be two globular domains of protein, very similar in diameter, attached together through a narrow neck.59 This shape seen in the electron micrographs is responsible for the unusually large frictional ratio of this protein. Although an axial ratio for a prolate ellipsoid was calculated from the earlier results of the sedimentation
analysis,58 it was moot when the electron micrographs became available.59

Rotary shadowing is usually used for estimating the dimensions of extended molecules such as collagen XII (Figure 12–4C), nidogen,60 inversion-specific glycoprotein,61 fibulin,62 and myosin.55 The molecule of protein is spread upon a flat sheet of mica before being coated with metal. Consequently, a long, thin, flexible molecule should lie almost flat upon the surface, and the contour length of its replica in the two-dimensional micrograph should be almost as long as its actual contour length in three dimensions. For this reason, rotary shadowing is thought to be the most reliable method to obtain estimates of the length of an extended molecule of protein. For example, the frictional ratio $f/f_{0,h}$ for a fragment of caldesmon (amino acids 166–450 from a polypeptide of 756 aa)63 is 2.2, consistent with a cylinder of the appropriate volume that is 40 nm long. Electron micrographs of this fragment of caldesmon that had been rotary-shadowed with platinum and tungsten displayed elongated molecules of uniform thickness with contour lengths that averaged 35 nm. In rotary-shadowed images, globular domains such as the two heads of myosin or the three globular domains of nidogen appear as dark, featureless lumps. Although globular proteins constructed from clusters of globular domains or from globular subunits can appear as clusters of dark lumps upon rotary shadowing,64 they usually appear as single undifferentiated and structureless lumps of platinum. Negative staining (Figure 12–4A,D) or embedding in amorphous vitreous ice is required to obtain images displaying details of the structure of a globular protein.

Phosphorylase kinase is a globular protein with a dramatically peculiar shape. Because of its unusual shape, a collection of digitized micrographic images of individual molecules (Figure 12–4D) could be superposed by a computer and stacked one upon the other.56 The average of this stack of images could be calculated, and this average represents an enhanced image of the actual molecule (inset, Figure 12–4D). The chalice seen in the enhanced image can be imperfectly discerned in each of the individual selected images (Figure 12–4D), and this correspondence fulfills the usual requirement placed upon any reconstruction. A similar procedure has been applied to α₂-macroglobulin65 to obtain enhanced images. This protein has a shape almost as peculiar as that of phosphorylase kinase.

It is also possible to obtain a three-dimensional reconstruction of the structure of a macromolecule observed in an electron micrograph. An electron micrograph of a macromolecule is the two-dimensional projection either of the structure of that macromolecule if it is embedded in amorphous vitreous ice or of the mold of that macromolecule if it is embedded in negative stain. It has already been noted that the two-dimensional Fourier transform of the projection of a three-dimensional object is a central section of the three-dimensional Fourier transform of the unprojected object (Equations 9–4 through 9–6). From the complete three-dimensional Fourier transform of the object, the distribution of scattering density within the object, and hence details of its three-dimensional structure, can be calculated by Fourier transformation. To gather the complete three-dimensional Fourier transform of the object, Fourier transforms of projections of the object in a large number of different orientations must be assembled. In a helical polymeric protein, this is accomplished because the helical array positions the monomer in specific, defined orientations, each of which provides a different projection. When molecules are not arranged in such an array but scattered upon the field, as are the molecules of phosphorylase kinase in Figure 12–4D, it is necessary to define the precise orientation of each of them relative to the plane of the micrograph before their individual
Figure 12–4: Asymmetric molecules of protein viewed by electron microscopy. Solutions of the protein of interest (10 mg mL–1 to 1.0 mg
mL–1) were applied to electron microscopic grids coated with a thin film of carbon49 supported by a net of either collodion or formvar. The
layer of carbon (~5 nm) was ionized so that it was hydrophilic enough to accept the aqueous solution. The molecules of protein were
adsorbed to this surface and were then negatively stained with either 2% phosphotungstate (panel A) or 1–2% uranyl formate (panels B and
D). The water evaporates to leave a glass of the heavy metal salt in which are embedded the molecules of protein. (A) Gallery of selected
images50 of aspartate carbamoyltransferase from E. coli (Figure 9–37) viewed either along one of its 2-fold rotational axes of symmetry (left
three images) or along its 3-fold rotational axis of symmetry (right three images) at 480000×. Reprinted with permission from ref 50. Copyright
1972 American Chemical Society. (B) Gallery of selected images of bovine fibrinogen51 at 480000×. The elongated molecule has globular
domains at each end. Reprinted with permission from ref 51. Copyright 1981 Academic Press. (C) Selected images of collagen type XII from
Gallus gallus53,54 at 240000×. The upper four images were negatively stained with uranyl formate. Reprinted with permission from ref 54.
Copyright 1992 Blackwell Publishing. The lower four images are molecules of protein that were rotary shadowed.55 A solution of the protein
was sprayed into an aerosol mist and droplets of the mist were adsorbed to a sheet of mica. A beam of platinum atoms was directed at an
angle of 6° onto the surface of the mica as it was rotated at 120 revolutions min⁻¹. The resulting thin film of platinum containing replicas of
the molecules of protein was transferred to a copper grid. In the four rotary-shadowed images, selected representatives of a homotrimer of
long splice variants (l₃), of a homotrimer of short splice variants (s₃), and of the two heterotrimers (l₂s, ls₂) are presented at 240000×. Reprinted
with permission from ref 53. Copyright 1995 The Rockefeller University Press. (D) A field of negatively stained molecules of phosphorylase
kinase56 at 480000×. This is an accurate representation of the usual situation. Most of the molecules of protein negatively stained on the grid
are featureless asymmetric structures. The minority that present a repeating, definable image (indicated by arrowheads) are selected by the
microscopist as representative images of the protein and presented in galleries as in panels A and B. In this instance, the shape of the indi-
vidual images of phosphorylase kinase was so peculiar that digitized optical densities of a large number of the selected images of individual
molecules (62) could be sequentially superposed by a computer to obtain an enhanced image (inset). Reprinted with permission from ref 56.
Copyright 1985 Academic Press.
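
The enhancement obtained by superposing and averaging selected images, as described for the 62 images of phosphorylase kinase in panel D, can be illustrated numerically. The Python sketch below is only an illustration under stated assumptions: the "true" projection is a synthetic disc rather than real micrographic data, the noise is Gaussian, the images are taken to be already aligned, and the function names, image size, and noise level are hypothetical choices, not the procedure of ref 56.

```python
import numpy as np


def average_aligned_images(stack):
    """Return the pixel-wise mean of a stack of pre-aligned images (n, h, w)."""
    stack = np.asarray(stack, dtype=float)
    return stack.mean(axis=0)


def rms_error(image, reference):
    """Root-mean-square difference between an image and a reference image."""
    return float(np.sqrt(np.mean((image - reference) ** 2)))


rng = np.random.default_rng(0)

# Hypothetical noise-free projection: a bright disc on a dark background.
y, x = np.mgrid[-16:16, -16:16]
truth = (x**2 + y**2 < 100).astype(float)

# Simulate 62 aligned images of the same particle, each with independent noise.
n_images = 62
noise_sd = 1.0  # assumed noise level, comparable to the signal itself
stack = truth + rng.normal(0.0, noise_sd, size=(n_images, 32, 32))

single_error = rms_error(stack[0], truth)
averaged_error = rms_error(average_aligned_images(stack), truth)
improvement = single_error / averaged_error

print(f"RMS error of one image:     {single_error:.3f}")
print(f"RMS error of the average:   {averaged_error:.3f}")
print(f"improvement factor:         {improvement:.1f}")
print(f"square root of image count: {np.sqrt(n_images):.1f}")
```

For independent noise, the error of the average falls roughly as the square root of the number of images, which is why features that are only imperfectly discernible in any one image become clear in the average. The sketch does not address alignment, which is the step that the peculiar shape of the molecule made possible.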
Fourier transforms can be summed together to obtain the complete three-dimensional Fourier transform of the average molecule.

When the objects scattered over the field of the electron micrograph are viruses, the icosahedral symmetry of each of the viral coat proteins permits the exact orientation of each individual virion to be estimated.66–70 The assignment of an orientation to each virion permits the Fourier transforms of their projections to be added together to obtain a complete three-dimensional Fourier transform of the average viral particle. In this way, a three-dimensional reconstruction of the structure of the viral coat71 and other appendages of the virus that are distributed with icosahedral symmetry72,73 can be calculated. For such reconstructions, the viral particles are usually suspended in a layer of amorphous vitreous ice.

If it is possible to define somehow the orientation of each member of a population of asymmetric molecules spread upon a grid at random, the same type of summation can be performed. If the molecule has a tendency to lie upon the carbon surface in a preferred orientation, for example, the molecules of phosphorylase kinase that settle on the grid to present a projection in the shape of a chalice (Figure 12–4D), this tendency orients them in one dimension but fortunately only in one dimension. The direction in which each individual oriented molecule faces upon the surface is random, and the direction in which each faces can be defined by direct observation. When the grid is then tilted 50°, a large collection of molecules, each in a completely different three-dimensional orientation but each in a known three-dimensional orientation relative to the others, is created.74 From a summation of their individual Fourier transforms in the tilted image, a three-dimensional Fourier transform of the structure of the average molecule can be gathered. From this Fourier transform, a molecular model can be calculated. Such reconstructions have been performed for human α₂-macroglobulin75 and the 50S subunit of the ribosome.76,77 The resulting molecular model of the 50S subunit of the ribosome, albeit at low resolution, resembled quite closely the crystallographic molecular model that became available 12 years later.78

Suggested Reading

Perkins, S.J., Nealis, A.S., Sutton, B.J., & Feinstein, A. (1991) Solution structure of human and mouse immunoglobulin M by synchrotron X-ray scattering and molecular graphics modelling. A possible mechanism for complement activation, J. Mol. Biol. 221, 1345–1366.

Mani, R.S., Karimi-Busheri, F., Cass, C.E., & Weinfeld, M. (2001) Physical properties of human polynucleotide kinase: hydrodynamic and spectroscopic studies, Biochemistry 40, 12967–12973.

Problem 12–1: Calculate the values of $f_{\mathrm{sed}}$, $f_{\mathrm{diff}}$, and $f_{0,\mathrm{unh}}$ from $\bar{v}$, $s^0_{20,w}$, $D^0_{20,w}$, and $M_{\mathrm{prot}}$ for each protein in Table 12–1. From tabulated values of $f_{\mathrm{av}}/f_{0,h}$ determine $a/b$ for each protein on the assumption that they are prolate ellipsoids of revolution.

Problem 12–2: Human immunoglobulin G is a protein with a molar mass of 167,000 g mol⁻¹ and a partial specific volume of 0.739 cm³ g⁻¹.

(A) Assume hydration to be 0.3 g of H2O (g of protein)⁻¹ and calculate the minimum frictional coefficient, $f_{0,h}$, for the hydrated hydrodynamic particle at 20 °C in water.

(B) The standard sedimentation coefficient $s^0_{20,w}$ for immunoglobulin G is $7.0 \times 10^{-13}$ s, and the standard diffusion coefficient $D^0_{20,w}$ is $4.0 \times 10^{-7}$ cm² s⁻¹. Calculate the frictional coefficient, first from the standard sedimentation coefficient and then from the standard diffusion coefficient.

(C) From the average of these two estimates of the frictional coefficient and from the estimate of the minimum frictional coefficient of the hydrated hydrodynamic particle, estimate the axial ratio $a/b$ upon the assumption that the molecule is a prolate ellipsoid of revolution.

(D) The shape of an immunoglobulin G is displayed in Figure 7–13. How does this structure compare with your estimate of its shape?

Problem 12–3: Thiosulfate sulfurtransferase is a monomeric enzyme. The polypeptide from bovine liver is 296 amino acids in length and has a molar mass of 33,160 g mol⁻¹. In water at 20 °C the standard sedimentation coefficient of the native protein is $3.00 \times 10^{-13}$ s, and its standard diffusion coefficient is $7.50 \times 10^{-7}$ cm² s⁻¹. The partial specific volume of the protein is 0.742 cm³ g⁻¹.

(A) Calculate the frictional coefficient of the protein.

(B) Calculate the frictional ratio for the hydrodynamic particle $f/f_{0,h}$, with the assumption that the hydration of the protein is 0.3 g of H2O (g of protein)⁻¹.

(C) What would be the axial ratio of an ellipsoid of revolution with this frictional ratio?

(D) How does this estimation compare to the crystallographic molecular model of the protein (Figure 9–18)?

Problem 12–4: The standard sedimentation coefficient of human fibrinogen is $7.63 \times 10^{-13}$ s, its molar mass is 344,000 g mol⁻¹, and its partial specific volume is 0.725 cm³ g⁻¹.

(A) Assume $\delta_{\mathrm{H_2O}} = 0.3$ and determine the volume of the hydrodynamic particle, the frictional coefficient of fibrinogen, its frictional ratio, and its axial ratio and dimensions on the basis of the assump-
(C) The following data82 were gathered from solutions of tropomyosin at an ionic strength of 1.1 M:

    γprot [g (100 mL)⁻¹]    η′/η
    1.210                   0.33
    1.299                   0.44
    1.466                   0.64
    1.588                   0.76
    1.793                   0.96
    2.223                   1.29
    2.972                   1.72
    4.603                   2.14

where η′ is the viscosity of the solution of protein, η is the viscosity of the solvent, and γprot is the concentration of protein. Determine the intrinsic viscosity [η] at this ionic strength.

(D) Assume that $\delta_{\mathrm{H_2O}}$ = 0.3 g of H2O (g of protein)⁻¹ and calculate the Simha factor ν. From ν determine the axial ratio for tropomyosin if it were a prolate ellipsoid of revolution by using Figure 12–1B,D.

(E) From spectroscopic measurements it is known that, at all ionic strengths, tropomyosin is >90% α-helical. It is a coiled coil in which the two α helices wrap around each other as the strands in a two-stranded rope. The length of an α helix for each of its amino acids is 0.15 nm. Calculate the length of a molecule of tropomyosin at high ionic strength and, assuming it to be a cylindrical rod, calculate its diameter from its hydrated molecular volume. What is its actual axial ratio? Compare this to the axial ratio obtained from ν.

(F) The intrinsic viscosities of solutions of tropomyosin also vary with ionic strength:82

    [η]     ionic strength (M)
    1.00    1.1
    1.03    0.6
    1.23    0.3
    1.75    0.2
    2.45    0.1

Plot molar mass against specific viscosity and explain the correlation in terms of structures that could form as the ionic strength is lowered.

Problem 12–7: The following are a set of data for the light scattering of a series of solutions (0.2–0.8 g L⁻¹) of myosin.83 The wavelength of light used for the observations was 436 nm (in a vacuum). The solutions were examined at a temperature of 20 °C, and the refractive index of the solvent in which the myosin was dissolved was 1.34. The refractive index increment $(\partial \tilde{n}/\partial \gamma_{\mathrm{prot}})_{P,\mu}$ for myosin is 0.208 cm³ g⁻¹, and the molar mass of myosin is 527,000 g mol⁻¹.

    θ (deg)    $\lim_{\gamma_{\mathrm{prot}} \to 0} \gamma_{\mathrm{prot}}/R_\theta$ (g cm⁻²)
    30         0.74
    35         0.74
    40         0.78
    45         0.80
    50         0.83
    55         0.86
    60         0.89
    70         0.95
    90         1.07
    110        1.17
    140        1.24

(A) What was the wavelength of the light in the solutions?

(B) What is the radius of gyration of the myosin under these circumstances?

(C) What would be the length of a rod with this radius of gyration?

(D) What is the length of the molecules of myosin in Figure 13–30A? The magnification stated in the legend for Figure 13–30A is for the figure in the text.

Problem 12–8: Triskelion is a protein that assembles into spherical structures known as clathrin coats. These clathrin coats are the structures that surround the coated vesicles formed from the invagination of the plasma membrane of an animal cell at sites known as coated pits. Bovine triskelion is formed from three identical heavy polypeptides ($n_{\mathrm{aa}}$ = 1675 polypeptide⁻¹, $M_{\mathrm{prot}}$ = 191,590 g mol⁻¹) and three identical light polypeptides ($n_{\mathrm{aa}}$ = 228 polypeptide⁻¹, $M_{\mathrm{prot}}$ = 25,080 g mol⁻¹). Its partial specific volume, calculated from its amino acid composition, is 0.744 cm³ g⁻¹.

(A) Calculate the unhydrated volume of triskelion.

(B) What would be the unhydrated radius ($R_{0,\mathrm{unh}}$) and the unhydrated radius of gyration ($R_{G,\mathrm{sph}}$) of triskelion if it were a sphere?

The light scattering of triskelion dissolved in a buffered solution was monitored either as a function of the concentration of protein (γprot) or as a function of the scattering angle θ.84 Reprinted with permission from ref 84. Copyright 1991 American Chemical Society.
[Figure: plots of $K\gamma_{\mathrm{prot}}/R_\theta$ (vertical axis, μmol g⁻¹) against the concentration of protein $\gamma_{\mathrm{prot}}$ (mg cm⁻³) and against $\sin^2(\theta/2)$.]

Open and closed circles show the plots of $[K\gamma_{\mathrm{prot}}/R_\theta]_{\theta \to 0}$ against the concentration of clathrin and $[K\gamma_{\mathrm{prot}}/R_\theta]_{\gamma_{\mathrm{prot}} \to 0}$ against $\sin^2(\theta/2)$, respectively. The units on the vertical axis are moles gram⁻¹. The authors of this study have used an optical constant $K$ that incorporates the increment of the refractive index so that its value is $2\pi^2 \tilde{n}_0^2 (\partial \tilde{n}/\partial \gamma_{\mathrm{prot}})^2 / (N_A \lambda_0^4)$. The refractive indices ($\tilde{n}$) of the two solutions were both 1.34. The laser used in the experiment emitted polarized light of wavelength 632.8 nm in the vacuum.

(C) How well does the molar mass of the protein observed in the light scattering experiment agree with that calculated from the sequences of the constituent polypeptides of triskelion?

(D) From the slope of the appropriate line in the figure, calculate the radius of gyration for triskelion.

$$P(\theta) = \lim_{\gamma_{\mathrm{prot}} \to 0} \frac{i_\theta\, r^2}{\gamma_{\mathrm{prot}}\, 2 I_0 K M_{\mathrm{prot}}} \left( \frac{\partial \tilde{n}}{\partial \gamma_{\mathrm{prot}}} \right)^{-2}_{T,P,\mu}$$

$$\ln P(\theta) = \lim_{\gamma_{\mathrm{prot}} \to 0} \ln \frac{i_\theta}{\gamma_{\mathrm{prot}}} - A$$

$$\lim_{\gamma_{\mathrm{prot}} \to 0} \ln \frac{i_\theta}{\gamma_{\mathrm{prot}}} = A - \frac{16 \pi^2 R_G^2}{3 \lambda^2} \sin^2 \frac{\theta}{2}$$

If ω is expressed in radians

$$\sin \omega = \omega - \frac{\omega^3}{3!} + \frac{\omega^5}{5!} - \cdots$$

(A) Show that

$$\lim_{\substack{\theta \to 0 \\ \gamma_{\mathrm{prot}} \to 0}} \ln \frac{i_\theta}{\gamma_{\mathrm{prot}}} = A - \frac{16 \pi^2 R_G^2}{3 \lambda^2} \left( \frac{\theta}{2} \right)^2$$

It is more convenient to take a series of measurements at varying scattering angles, θ (in radians), of a single solution of protein than to measure the scattering at a fixed angle for several solutions of protein. Therefore, what is usually done is to determine the slope of $\ln (i_\theta/\gamma_{\mathrm{prot}})$ as a function of $\theta^2$ at various fixed values of γprot and then extrapolate to γprot = 0. If, however, γprot is held constant, as is done in such an experiment, then

the scattered X-rays ($\lambda_0$ = 0.154 nm; $\tilde{n}$ = 1.33; λ = 0.116 nm) as a function of the square of the scattering angle were measured for a series of solutions of immunity protein.23 Reprinted with permission from ref 23. Copyright 1983 Journal of Biological Chemistry.

[Figure: plot of $\ln i_\theta$ for solutions of protein at the indicated concentrations (mg mL⁻¹); the labels 14.6 and 11.7 survive.]

Absorption and Emission of Light

A valence electron in any molecule occupies an atomic orbital or a molecular orbital that has energy levels associated with it (Figure 12–5).85 These energy levels have discrete magnitudes because of the quantum theory, and the steps between any two energy levels are also of discrete magnitude or quantized. The energy levels that have the smallest steps between them are the rotational energy levels. These energy levels correspond to the quantized kinetic and potential energy associated with
[Figure 12–5: energy plotted against bond distance, showing the curves S0, S1, and T1; the processes IR, A, F, P, and S1 → T1; the vibrational transitions; and the rotational energy levels on an expanded scale.]
Figure 12–5: Photophysical processes experienced by an electron in a covalent bond between two atoms.85 The smooth curve S0 is the poten-
tial energy of the molecular orbital of the covalent bond in which the electron resides as a function of the distance r between the two nuclei.
The smooth curve S1 is the potential energy that would be experienced if the electron were transferred to a particular unoccupied antibond-
ing molecular orbital between the two atoms as a function of the distance between the two nuclei. Each well of potential energy has levels of
vibrational energy (the parallel lines within each well) and levels of rotational energy (see expanded scale of the potential energy of the ground
state to the left) associated with it. Absorption by electrons of photons of energy equivalent to the differences in energy between vibrational
energy levels (process IR) produces an infrared spectrum. When an electron in its occupied molecular orbital, the ground state, absorbs a
quantum of electromagnetic energy sufficient to boost its energy high enough to enter the unoccupied molecular orbital, the excited state is
created (process A). As the excited state relaxes, some of the absorbed energy is lost as heat. When the relaxed excited state emits light as the
electron returns to the ground state (process F), the quantum of emitted fluorescent light has less energy (longer wavelength) than that of
the quantum of light originally absorbed. If the spin of the electron inverts while it is in the excited state (process S1 → T1), the electron enters
a triplet excited state. The smooth curve T1 is the potential energy of the electron in the bond in the triplet state as a function of bond length.
The triplet excited state also relaxes by giving off heat. The phosphorescent light emitted from the relaxed triplet state (process P) has even
less energy (even longer wavelength) than the fluorescence from the initial excited state, and the triplet excited state has an even longer
lifetime. The bond lengths of the ground state, the excited singlet state, and the excited triplet state are indicated as $^0r_e$, $^1r_e^*$, and $^3r_e^*$. Adapted
with permission from ref 85. Copyright 1977 W.A. Benjamin.
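
The statement in the legend that emitted light has less energy and therefore a longer wavelength than the light absorbed follows from the relation $E = hc/\lambda$ between the energy of a photon and its wavelength. The Python sketch below evaluates this relation for one mole of photons. The two wavelengths are hypothetical values chosen only for illustration and are not taken from the text, and the function name is likewise an assumption.

```python
# Exact SI defining constants.
PLANCK = 6.62607015e-34      # J s
LIGHT_SPEED = 2.99792458e8   # m s^-1
AVOGADRO = 6.02214076e23     # mol^-1


def photon_energy_kj_per_mol(wavelength_nm):
    """Energy of one mole of photons of the given wavelength (nm) in kJ mol^-1."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    joules_per_photon = PLANCK * LIGHT_SPEED / (wavelength_nm * 1e-9)
    return joules_per_photon * AVOGADRO / 1000.0


# Hypothetical wavelengths for absorption (process A) and fluorescence (process F).
absorbed_nm = 280.0
emitted_nm = 340.0

absorbed_energy = photon_energy_kj_per_mol(absorbed_nm)
emitted_energy = photon_energy_kj_per_mol(emitted_nm)

print(f"absorbed photon: {absorbed_energy:.1f} kJ/mol at {absorbed_nm:.0f} nm")
print(f"emitted photon:  {emitted_energy:.1f} kJ/mol at {emitted_nm:.0f} nm")
print(f"energy lost nonradiatively: {absorbed_energy - emitted_energy:.1f} kJ/mol")
```

The difference between the two energies is the energy dissipated as heat while the excited state relaxes, so a longer wavelength of emission corresponds to a greater loss.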
ing the occupation of the successive levels at temperatures experienced by living organisms are also significant. Consequently, in the ground state, the vibrational energy level that is occupied is usually (>90%) the lowest for each particular vibration. Because, however, their differences in energy are so small, rotational energy levels are widely occupied by the bonds in the different molecules in the solution.

A photon can encounter an electron in a molecule in such a way that its energy is absorbed. If the electron absorbs the energy of the oscillating electric field and then immediately emits the same photon back again without retaining any of its energy, the direction of the electromagnetic wave is altered so that its new direction of propagation bears no relationship to its incident direction while its wavelength remains the same. This is elastic scattering, and it is the phenomenon mainly responsible for X-ray diffraction, low-angle X-ray scattering, and light scattering.

If the electron that has absorbed the photon is in a bond that happens to change its vibrational energy level during the instant that it is excited, the subsequently scattered photon will have a wavelength that is shorter or longer than that of the incident photon by a difference equivalent to the energy of the transition between the two vibrational energy levels (expanded scale in Figure 12–5). If the photon is absorbed by an electron in the excited state of a vibrational mode, it can carry away the energy of the transition to the ground state and be scattered with higher energy. If it is scattered by an electron in the ground state that enters an excited state during its residence, the photon will provide the energy for this transition and be scattered with lower energy. As a result, the energies of the scattered light vary symmetrically about the energy of the incident light. The intensities of the bands of scattered light with the longer wavelengths are greater than those of shorter wavelength because most vibrations are in the ground state before a photon is absorbed. The Raman effect that results is a set of small changes in wavelength that are experienced by a small percentage of the photons that are scattered by the electrons in the sample. As with light scattering itself, the Raman effect on scattered light is usually measured by sampling the light emitted by a sample perpendicular to the incident light. The incident light is from a laser, and it is intense and monochromatic. It is the spectrum of the
wavelengths of the scattered light that is determined. formation of the excited state can be followed by moni-
Although almost all of the scattered light is the same toring the disappearance of the absorption of light by the
wavelength as the incident light, scattered light of other ground state86 because the excited state, being a new
specific, sharply defined wavelengths is also present, and molecule, has a different absorption spectrum. At the
a Raman spectrum of this scattered light provides a cat- very least, in its most stable structure, this new molecule,
alogue of many of the transitions among the vibrational the excited state, will have some bond lengths, bond
energy levels of the molecule. angles, and bond energies that are different from those of
The decision as to whether the photon is immedi- the ground state because its bonding differs from that of
ately scattered from the electron, in an event that is the ground state. The instant the electron enters the new
essentially instantaneous, or is absorbed by the electron orbital, however, the molecule has the structure of the
for a longer period of time depends on how closely the ground state. As the excited state relaxes in energy to the
energy of the photon matches one of the differences most stable structure available to it, the distance in
between the quantized energy levels available to the energy between excited state and ground state shortens,
electron. If the energy of the photon that has just been and the excited electron loses energy. Because of the
absorbed by the electron is equal to the difference in overlap required for excitation, the excited electron usu-
energy between the vibrational energy level its bond ally enters the excited state through one of its higher
occupied at the instant the photon was absorbed and a vibrational energy levels, and it simultaneously loses
higher vibrational energy level accessible to it, the energy by a nonradiative passage to the lowest vibra-
photon can be absorbed completely, and the vibrational tional energy level. The net result of these relaxations is
energy of the bond occupied by the electron or that of a that the excited electron very rapidly (10⁻¹³ to 10⁻¹¹ s)
bond vibrationally coupled to it will increase by that step finds itself at an energy considerably below the energy it
in energy (process IR in Figure 12–5). Most of the energy had achieved immediately after the light was absorbed.
absorbed will not be emitted back radiatively as light but Because the rotational and vibrational energy levels
will be dissipated nonradiatively by intermolecular colli- of the electronic excited state usually overlap rotational
sion or by exciting coupled rotational motions the energy and vibrational energy levels of the ground state, the
levels of which bridge the gap between the vibrational electron usually reenters the molecular or atomic orbital
energy levels of the ground state and the excited state of the ground state by pursuing a path among the rota-
(Figure 12–5). Any light emitted radiatively due to a direct tional and vibrational energy levels of excited state and
transition from the excited state back to the ground state ground state. In this case, the energy originally absorbed
has the same wavelength as the absorbed light but an is dissipated nonradiatively as heat, and only the absorp-
altered direction and becomes distinguishable from elas- tion of the light is detected. The absorption of ultraviolet
tically scattered light only by the delay in its reemission. and visible light (in the range between 200 and 2000 nm)
The absorption of infrared light (in the range from produces transitions among electronic energy levels, and
20,000 to 2000 nm, or 500 to 5000 cm–1)* produces tran- the result is a spectrum of ultraviolet or visible absorp-
sitions among vibrational energy levels, and a spectrum tion that has maxima the energies of which correspond
of infrared absorption has discrete maxima the energies to the energies of electronic transitions in the molecule.
of which correspond to transitions between pairs of If, however, the energy levels of the ground state
vibrational energy levels. and the excited state overlap weakly, the excited electron
If the energy of the photon absorbed by the electron can become trapped in the lowest vibrational energy
is equal to the difference in energy between the molecu- level of the excited state long enough (>10–9 s) to reenter
lar orbital or atomic orbital the electron occupies in the the ground state with a bang rather than a whimper. The
ground state and an unoccupied molecular or atomic reentry of the excited electron into the ground state in
orbital of higher energy, the photon can be absorbed such a single step requires that the energy it loses be
(process A in Figure 12–5). The electron enters the unoc- emitted as a photon. This emission is either fluorescence
cupied orbital, and an electronically excited state of the or phosphorescence.
molecule is created. Because the excited state differs If it came from a covalent bond or a lone pair of
from the ground state in the distribution of its electrons electrons, then at the instant of excitation, the excited
among molecular and atomic orbitals, it should be electron entering the new orbital has a spin opposite to
thought of as a distinct, albeit similar, molecule. The the spin of the partner it left behind in its previous
orbital. As it relaxes into the lowest vibrational energy of
* It is customary for investigators using Raman spectroscopy or the excited state and as the excited state rearranges, the
infrared spectroscopy to present absorption as a function of the excited electron remains coupled to its old partner, and
inverse of the wavelength, referred to as the wavenumber (in cen- the excited state remains a singlet excited state. From a
timeters–1), which is directly proportional to the energy of the singlet excited state, the electron can rapidly return to
absorption. One advantage of this convention is that the two sym-
metrical displacements of the Raman effect for the same vibra- the ground state because the excited electron can readily
tional mode have the same numerical values when expressed in reenter its old orbital with a spin compatible with the
terms of wavenumber. single electron still there (process F in Figure 12–5). The
Absorption and Emission of Light 595
reentry is spin-allowed and rapid (<10–7 s), and the emit- absorbance are referred to as the amide I band, the
ted photon is fluorescence. amide II band, and the amide III band, respectively. The
The energy of a photon of fluorescent light is nec- amide I band of absorbance is the strongest of the three
essarily less than the energy of the photon absorbed by and is the only one that is located in a region of the spec-
the electron during excitation because of the nonradia- trum that does not contain significant absorptions from
tive relaxations of the excited state that have occurred. other groups in a protein (Figure 12–6).88
Fluorescent light is light of a longer wavelength (usually Direct infrared spectroscopy of proteins in aqueous
visible light) emitted shortly after (within 10–7 s) a mole- solution is severely compromised by the strong absorp-
cule has absorbed light of a shorter wavelength (usually tion of infrared light by water and other solutes. An
ultraviolet light). The spectrum of the light absorbed is infrared spectrum registered by the Raman effect, how-
the absorption spectrum; the spectrum of the light emit- ever, avoids these drawbacks. A Raman infrared spec-
ted is the emission spectrum. Fluorescent light, as with trum is monitored as small differences in wavelength
scattered light and for the same reasons, is emitted in all relative to the wavelength of the incident light.
directions relative to the incident light unless intramole- Consequently, the actual light registering each of the
cular interference occurs. It is usually measured perpen- bands in the Raman spectrum is within the visible range,
dicular to the direction of the incident light. It can be not the infrared range, and the problems of the absorp-
measured under continuous excitation, or the excitation tion of infrared light by water and other solutes and by
can be a flash (<10–9 s in length), and the rate of decay of the container are avoided. Although the water in the sol-
the fluorescence following the flash, indicative of its life- vent also produces Raman bands in the regions of its
time, can be measured. absorptions, they are much weaker than their direct
If the electronically excited state is structured in absorptions of infrared light,89 and the subtraction of
such a way that the excited electron can become background from the spectrum of the protein is much
unpaired with the electron it left behind, it can enter a less drastic.
triplet excited state by intersystem crossing. The ground Raman infrared spectroscopy, however, has its own
state of the triplet state is usually of lower energy than disadvantages. Two of those disadvantages are that the
that of the singlet state. Once the triplet excited state has signals registering the Raman infrared spectrum are
been occupied, the electron can return to the ground weak and that these small signals can be swamped by flu-
state only through a spin-disallowed process that is very orescence. In addition, the presence of large particles
slow (on the order of microseconds to seconds). The such as fragments of membrane, by increasing the scat-
emitted light, or phosphorescence (process P in Figure tering from the solution, makes Raman spectroscopy
12–5), emerges from the solution over a relatively long even more difficult. Direct infrared absorption is unaf-
period of time and has an even longer wavelength than fected by this latter problem, and infrared spectra of sus-
the rapidly emitted fluorescence. Fluorescence is light
emitted from singlet excited states; phosphorescence,
from triplet excited states.
[Figure residue: panels A, B, and C; vertical axis, absorbance.]
pensions of membranes can be readily measured.90–92 which can permit the observation of the vibrational tran-
Measurements of the direct infrared spectrum of solid sitions for a single bond among the thousands within a
dehydrated protein or even a crystal of protein can also particular protein.
be made.93,94 If the protein contains a functional group that
When a solution of protein is excited with a He–Ne absorbs the exciting visible light in an electronic transi-
laser, the Raman infrared spectrum of the scattered light tion, as does, for example, the heme in hemoglobin,99
(Figure 12–7)95 displays a maximum arising from the bands in the Raman infrared spectrum resulting from the
amide I band of the folded polypeptide with a wavenum- absorption of energy by vibrations of the atoms within or
ber of around 1650 cm–1 less than the wavenumber of the adjacent to that functional group will be enhanced, and
majority of the scattered light, which has the same this enhanced spectrum is referred to as a resonance
wavenumber as the incident light (15,802 cm–1, Raman infrared spectrum.100 The maxima in a reso-
632.8 nm). The amide III maximum at a wavenumber nance Raman infrared spectrum can often be assigned to
1250 cm–1 less than that of the elastically scattered light vibrations of particular bonds, such as an iron–dioxygen
and other maxima that can be assigned to vibrational stretching vibration in oxygenated hemoglobin,99 the
transitions in some of the amino acids, such as phenyl- oxygen–oxygen stretching vibration of the peroxy form of
alanine, tyrosine, and methionine, are also observed.96 hemocyanin,101 the iron–oxygen stretching vibration of
By using difference spectra between a selectively deuter- the ferryl intermediate of cytochrome d ubiquinol oxi-
ated protein and the same protein undeuterated, absorp- dase,102 or the copper–sulfur stretching vibrations in
tion bands in the Raman infrared spectrum from other halocyanin.103 When light of wavelength 200 or 206.5 nm,
amino acids, such as leucine, isoleucine, valine, alanine, which is in the range where peptide bonds absorb
glutamate, and aspartate, can be dissected out of the strongly, is used to produce a resonance Raman infrared
complete spectrum.97 Difference Raman infrared spectra spectrum, the amide I, amide II, and amide III bands are
between proteins selectively labeled with 18oxygen and selectively enhanced.104,105
their unlabeled counterpart have also been reported,98 The amide I band in the direct infrared spectrum of
a solution of a particular protein registers its secondary
structure.88,90,106–108 The amide I band in the direct
infrared spectra of a protein containing mainly α helix,
for example, hemoglobin (87% α-helical), has a maximum
at around 1655 cm–1; that of a protein rich in
β sheet, for example, immunoglobulin G (67% β structure),
has a maximum around 1635 cm–1; and that of a
protein with a mixture of these two secondary structures,
for example, ribonuclease A (23% α helix and 46%
β structure), has a spectrum that seems to register this
mixture (Figure 12–6).88 Various algorithms have been
derived for estimating the percentages of α helix, β structure,
and β turn in a protein from the shape of the amide I
band in its direct infrared spectrum.88,107 The amide I
band in the direct infrared spectrum of a coiled coil of
α helices also has a characteristic shape, diagnostic of
this structure.109
[Figure 12–7, panels A and B: intensity of scattered light as a function of Raman frequency shift (cm–1), 1750 to 250 cm–1; labeled bands include amide I, amide III, C–H, C–C, C–S (Met), S–S, –COO–, COOH, NH3+, SO42–, Tyr, and Phe.]
Unfortunately, as mentioned above, the amide I
band falls in a region of the direct infrared spectrum
Figure 12–7: Raman spectra of the intensity of the light scattered where water absorbs strongly, and this strong absorp-
from (A) a solution of ribonuclease at 200 mg mL–1 and (B) a solu-
tion of amino acids at the same ratio that they are present in tion by the solvent and its vapor must be subtracted to
ribonuclease.95 The samples were excited with a He-Ne laser obtain only the amide I band of the protein.88 Deuterium
(l = 632.8 nm), and the intensity of the light scattered was meas- oxide does not absorb strongly between 1700 and
ured as a function of wavenumber (centimeter–1) in the neighbor- 1600 cm–1, and direct infrared spectra of proteins in deu-
hood of the wavenumber of the elastically scattered light, which terium oxide rather than water display a readily meas-
had a wavenumber identical to that of the incident light
(15,802 cm–1). The intensity of the scattered light is presented as a ured amide I band.90,106 Unfortunately, it is difficult to
function of the difference between the wavenumber of the meas- exchange the protons with deuteriums on the amido
ured light and the wavenumber of the incident and elastically scat- nitrogens of the peptide bonds through the entire pro-
tered light. As in the direct infrared spectrum (Figure 12–6), the tein,107 because those in the interior are inaccessible to
amide I absorption is the most obvious, but other absorptions that solvent. As a result of this incomplete exchange and the
can be assigned to various vibrational modes of the side chains of
the amino acids, as well as the partially obscured amide III band, fact that the amide I bands of deuterated peptide bonds
are clearly seen in the spectrum. Reprinted with permission from are shifted by at least 5 cm–1 relative to those of
ref 95. Copyright 1970 Academic Press. undeuterated peptide bonds,108 the resulting spectrum
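The wavenumber convention used for the Raman spectra above can be illustrated with a short calculation. The following Python sketch, in which the function names are hypothetical and chosen only for illustration, converts the wavelength of the He–Ne laser line (632.8 nm) to a wavenumber and then computes the wavelengths at which light displaced by 1650 cm–1 (amide I) and 1250 cm–1 (amide III) would emerge. It assumes only the definition of the wavenumber as the inverse of the wavelength.

```python
# Sketch only: function names are hypothetical, not taken from the text.

def to_wavenumber(wavelength_nm):
    """Wavenumber (cm^-1) of light of the given wavelength (nm); 1 cm = 1e7 nm."""
    return 1.0e7 / wavelength_nm


def to_wavelength(wavenumber_cm):
    """Wavelength (nm) of light of the given wavenumber (cm^-1)."""
    return 1.0e7 / wavenumber_cm


def raman_wavelength(incident_nm, shift_cm):
    """Wavelength (nm) of light scattered with a wavenumber shift_cm less than
    that of the incident light. A negative shift_cm gives the symmetrical
    displacement to higher wavenumber."""
    return to_wavelength(to_wavenumber(incident_nm) - shift_cm)


incident = to_wavenumber(632.8)                 # He-Ne laser line
amide_i = raman_wavelength(632.8, 1650.0)       # amide I band
amide_iii = raman_wavelength(632.8, 1250.0)     # amide III band

print(f"incident light: {incident:.1f} cm^-1")
print(f"amide I scattered at {amide_i:.1f} nm; amide III at {amide_iii:.1f} nm")
```

Both displaced bands emerge at wavelengths somewhat longer than 632.8 nm, in the red end of the visible range rather than in the infrared, which is why the absorption of infrared light by water does not interfere with the measurement.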
shifted while their amplitudes remained the same, the plane of polarization of the emerging light
would rotate by an angle α, but the electric vector would still trace in cross section a flat,
linear segment (Figure 12–8B). The first effect is circular dichroism; the second effect, optical
rotation. Both effects are required to occur simultaneously in any circumstance, and as polarized
light is rotated, it necessarily becomes elliptical and vice versa. This obligatory connection
permits the spectrum of optical rotation as a function of wavelength to be calculated from the
spectrum of circular dichroism as a function of wavelength and vice versa.112
The degree to which the emerging light has become elliptical can be measured, and it is expressed
as an angle
$$\theta = \tan^{-1}\left(\frac{OB'}{OA'}\right) \qquad (12\text{–}33)$$
where the ratio OB′/OA′ is the ratio of the minor and major axes of the ellipse (Figure 12–8B). The
molar ellipticity at a given wavelength λ, [θ]λ, is defined by the relationship
$$[\theta]_\lambda \equiv \frac{\theta}{l\,[\text{chromophore}]} \qquad (12\text{–}34)$$
where [chromophore] is the molar concentration of the functional group absorbing the light,
referred to as the chromophore, and l is the path length of the sample chamber. By convention, the
units of [θ]λ are chosen to be degrees centimeter2 (decimole of chromophore)–1. A circular dichroic
spectrum is a display of the amplitude of [θ]λ as a function of wavelength.
The optical rotation α (Figure 12–8B) produced by the sample can be registered with a
spectropolarimeter. It can also be normalized by the concentration of the chromophore responsible
for it to produce the specific rotation [α]λ. A spectrum of the optical rotatory dispersion is
simply the amplitude of [α]λ plotted as a function of the wavelength of the polarized light.
Because optical rotation arises from a shift in the relative phases of the two circularly
polarized components (Figure 12–8B), it is proportional to the derivative with respect to
wavelength of the electronic absorption from which it arises. This has the practical disadvantages
of both turning each peak of absorption into two peaks, a positive one and a negative one
distributed around the wavelength of maximum absorption, and broadening the signal. In a spectrum
of optical rotatory dispersion arising from several maxima of absorption, the individual
components are difficult to resolve.
A circular dichroic spectrum, however, is usually simpler to interpret. Unless excitonic coupling
between two apposed chromophores of similar wavelengths of absorption is occurring, the individual
bands in a circular dichroic spectrum of a protein are unsplit peaks that coincide with absorption
maxima in the absorption spectrum of the same protein. In uncomplicated situations, the circular
dichroic spectrum simply registers the optical activity of each chiral contributor to the
absorption spectrum. For example, most of the peaks in the absorption spectrum of cytochrome c1
have only one corresponding negative or positive peak at the same wavelength in its circular
dichroic spectrum (Figure 12–9).113 The peak of absorption from the pyridoxal phosphate in glycine
hydroxymethyltransferase at 422 nm corresponds to a prominent peak of positive polarization at the
same wavelength and of the same width in the circular dichroic spectrum, and corresponding peaks
in the two corresponding spectra shift to 343 nm upon the addition of the substrate serine to the
solution.114 Because adjacent bands of absorption often have different polarities, the circular
dichroic spectrum can often reveal details in the absorption spectrum. For example, the two
overlapping peaks at 480 and 530 nm in the absorption spectrum of cytochrome-c oxidase from
Thermus thermophilus correspond to peaks at the same wavelengths of positive polarization and
negative polarization, respectively.115 If excitonic coupling between two or more chromophores is
occurring, however, the resulting bands in the circular dichroic spectrum will each be split into
two or more components of both positive and negative amplitude, and this splitting complicates
the situation.116
A polypeptide folded entirely as an α helix has a circular dichroic spectrum that is distinct from
that of a polypeptide folded entirely in β structure. Both of these spectra are distinct from that
of a polypeptide unfolded as a random coil (Figure 12–10).117 In the circular dichroic spectrum of
a polypeptide folded as an α helix, the amido π° → π* transition at about 200 nm is split into a
positive component (λmax = 191 nm) and a negative component (λmax = 205 nm). This splitting arises
from the fact that each amide is held in the same orientation relative to the axis of the
α helix.118 There is also the additional band of negative ellipticity at 225 nm from the n → π*
transition of the peptide bond, which, in combination with the band of negative ellipticity at
205 nm, gives the circular dichroic spectrum of the α helix its characteristic double minimum. The
π° → π* transition from a polypeptide in either β structure or random coil is unsplit.
The tyrosines, phenylalanines, and tryptophans in a polypeptide absorb light of wavelength between
180 and 240 nm and have characteristic circular dichroic spectra.117 The contributions of the
tyrosines, phenylalanines, and tryptophans in a protein to its circular dichroic spectrum can be
numerically subtracted to reveal the circular dichroic spectrum of the amides of polypeptide
backbone alone. Because the polypeptide is the main contributor to the circular dichroic spectrum
between wavelengths of 180 and 240 nm, the unit in which the molar ellipticity is usually
presented is decimolarity of peptide bonds (Figure 12–10).
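The definitions of ellipticity and molar ellipticity given above (Equations 12–33 and 12–34) can be made concrete with a short numerical sketch. The Python functions below are hypothetical names introduced only for illustration. The one assumption beyond the text is the bookkeeping of units: to arrive at degrees centimeter2 decimole–1 with a path length in centimeters, a concentration in moles liter–1 is converted to decimoles centimeter–3 (1 M = 0.01 dmol cm–3). The sample values are invented.

```python
import math

# Sketch only: names and sample numbers are illustrative assumptions.

def ellipticity_deg(minor_axis, major_axis):
    """Equation 12-33: angle whose tangent is the ratio of the minor to the
    major axis of the ellipse, returned in degrees."""
    return math.degrees(math.atan(minor_axis / major_axis))


def molar_ellipticity(theta_deg, path_cm, conc_molar):
    """Equation 12-34 in degree cm^2 dmol^-1.

    conc_molar is in mol L^-1; 1 mol L^-1 = 10 dmol per 1000 cm^3
    = 0.01 dmol cm^-3 (unit conversion assumed here, not stated in the text).
    """
    conc_dmol_per_cm3 = conc_molar * 0.01
    return theta_deg / (path_cm * conc_dmol_per_cm3)


# Linearly polarized light (no minor axis) has zero ellipticity;
# circularly polarized light (equal axes) has an ellipticity of 45 degrees.
print(ellipticity_deg(0.0, 1.0), ellipticity_deg(1.0, 1.0))

# Invented example: -3 millidegrees measured in a 0.1 cm cell containing
# peptide bonds at 1e-4 M.
value = molar_ellipticity(-0.003, 0.1, 1.0e-4)
print(f"[theta] = {value:.0f} degree cm^2 dmol^-1")
```

Because the measured angle is divided by both the path length and the concentration, spectra recorded in different cells and at different concentrations can be compared on the same scale.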
[Figure residue: panels A and B with curves labeled Oxidized and Reduced; plot of [θ] (deg cm2 dmol–1 × 10–4) as a function of wavelength (nm), 160 to 260 nm.]
Figure 12–10: Circular dichroic spectra that are used as reference
spectra for α helix (dotted line), β structure (dashed line), and
random meander (solid line).117 Molar ellipticity, [θ] × 10–4 [degree
centimeter2 (decimole of peptide bond)–1], is presented as a function
of wavelength (nanometers). Myoglobin from Physeter
catodon, dissolved in 0.1 M NaF at pH 7, is a protein that is almost
entirely α-helical (Figure 4–18). It was used as a reference compound
for α helix (dotted line). Poly(Lys-Leu-Lys-Leu) in 0.5 M NaF
at pH 7 was used as a polypeptide that is purely β structure (dashed
line). Poly(Pro-Lys-Leu-Lys-Leu) in a salt-free solution is com-
wavelength of the appropriate reference spectrum for the respective secondary structure multiplied
by the fraction for that particular secondary structure, and that the reference spectrum for
random meander is that of random coil. The least-squares procedure gives the respective values for
the fractions of the four types of secondary structure that produce a calculated curve most
closely reproducing the experimental curve. The fractions for each type of secondary structure
estimated in this way for a set of proteins agreed quite closely with the fractions for each type
of secondary structure in the respective crystallographic molecular models of these proteins.
One of the more important and informative uses of circular dichroism is to provide evidence that
the structure of the protein has changed under particular circumstances. A conformational change
is a change in the structure of the protein between two states of similar stability. For example,
the conformational change of aspartate carbamoyltransferase that occurs upon the binding of its
substrates and that is detected both crystallographically and as a change in sedimentation
coefficient is also accompanied by significant changes in the circular dichroic spectrum of the
protein.120 Such changes in circular dichroic spectra coincident with a conformational change of a
protein are commonly encountered. This fact increases the concern over the accuracy of secondary
structural dissections by numerical analysis of circular dichroic spectra because
crystallographic descriptions of conformational changes of proteins rarely involve significant
changes in the content of α helix, β structure, β turns, or random meander or changes in their
disposition over the sequence of the folded polypeptide. The changes in the circular dichroic
spectrum of Na+/K+-transporting ATPase during a conformational change caused by binding of its
substrates are consistent with the transformation of 7% of its amino acids from α helix into
β structure.121 When crystallographic molecular models of the two conformations between which the
homologous conformational change occurs in Ca2+-transporting ATPase are examined,403–408 the
content of α helix does change in the correct direction, but only by 2–3%. This rather small
change in the amount of α helix is more consistent with the absence of a measurable shift in the
amide I absorption in the infrared spectrum under the same circumstances.90
[Figure 12–11 plot not recoverable; axes: molar ellipticity as a function of wavelength (nm), 160 to 260 nm.]
Figure 12–11: Circular dichroic spectra117 of (A) glyceraldehyde-3-phosphate dehydrogenase
(phosphorylating) in 0.1 M NaF, pH 7, and (B) subtilisin in 0.2 M NaF, pH 7. Molar ellipticities,
[θ] × 10–4 [degree centimeter2 (decimole of peptide bond)–1], are presented as a function of
wavelength (nanometers). The spectra were either directly measured (solid lines) or duplicated
(dotted lines) by adding together spectra for α helix, β structure, β turn, and random meander
(Figure 12–10). In the procedure used to duplicate the experimental spectrum, it was assumed that
the proteins contain only α helix, β structure, β turn, and random meander. If fα, fβ, fT, and fRM
are the fractions of each of these secondary structures, it is assumed that the sum of these four
numbers is 1 and that fα(θ°α) + fβ(θ°β) + fT(θ°T) + fRM(θ°RM) is equal to the measured value of θ
at every wavelength, where the θ° values are the molar ellipticities of the standard curves
(Figure 12–10) at the same wavelength. A least-squares method was used to obtain the best values
for fα, fβ, fT, and fRM, and these four values were then used to construct the calculated curves
presented in the panels. Note that fα, fβ, fT, and fRM are parameters determined only by the
structure of the protein and must have the same values for all wavelengths. For the spectrum of
glyceraldehyde-3-phosphate dehydrogenase (phosphorylating), the best values of fα, fβ, fT, and fRM
were 0.31, 0.30, 0.22, and 0.17; for the spectrum of subtilisin, 0.30, 0.21, 0.21, and 0.28.
Reprinted with permission from ref 117. Copyright 1980 Academic Press.
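The least-squares dissection described for Figure 12–11 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of ref 117: the four reference spectra below are invented numbers at six arbitrary wavelengths, the function names are hypothetical, and the constraint that the four fractions sum to 1 is imposed by eliminating the fraction for random meander. No constraint keeps the fractions between 0 and 1. The synthetic "measured" spectrum is built from the fractions reported for glyceraldehyde-3-phosphate dehydrogenase so that the fit can be checked against a known answer.

```python
# Sketch only: reference spectra are invented; names are hypothetical.

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_fractions(measured, references):
    """Least-squares fractions for the reference spectra, constrained to sum to 1.

    The last reference is eliminated with f_last = 1 - sum(others), so the
    fit is of (measured - last) against the columns (reference_i - last).
    """
    *others, last = references
    cols = [[r - z for r, z in zip(ref, last)] for ref in others]
    target = [y - z for y, z in zip(measured, last)]
    # Normal equations: (X^T X) f = X^T y
    a = [[sum(p * q for p, q in zip(c1, c2)) for c2 in cols] for c1 in cols]
    b = [sum(p * t for p, t in zip(c, target)) for c in cols]
    f = solve_linear(a, b)
    return f + [1.0 - sum(f)]


# Invented reference ellipticities at six wavelengths (arbitrary units).
alpha = [7.0, 4.0, -3.0, -3.5, -3.6, -1.0]
beta = [3.0, 2.5, 1.0, -1.8, -1.0, -0.2]
turn = [-1.0, 2.0, 1.5, 0.5, -0.3, 0.0]
meander = [-3.0, -4.0, -1.0, 0.3, -0.2, 0.0]
references = [alpha, beta, turn, meander]

# Synthetic spectrum built from the fractions quoted in the caption.
true_fractions = [0.31, 0.30, 0.22, 0.17]
measured = [sum(f * ref[i] for f, ref in zip(true_fractions, references))
            for i in range(len(alpha))]

fitted = fit_fractions(measured, references)
print([round(f, 3) for f in fitted])
```

With noise-free synthetic data the fit returns the fractions used to build the spectrum. With real spectra the quality of the result depends on how well the reference spectra represent the secondary structures actually present, which is the concern raised in the text.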
As noted previously, most short peptides are structureless in water. The formation of α helices by
those peptides synthesized to promote this secondary structure is routinely monitored by circular
dichroism. It is also possible, by difference circular dichroic spectroscopy, to follow the
assumption of a fixed structure by an otherwise structureless peptide when it binds to a
protein.122
Ultraviolet absorption spectra of proteins at wavelengths greater than 240 nm are dominated by the
absorption of phenylalanine (λmax = 258 nm; ε258 = 197), cystine (ε260 = 280), tyrosine
(λmax = 275 nm; ε275 = 1420), and tryptophan (λmax = 280 nm; ε280 = 5600).123,124 Tryptophan has
the largest extinction coefficient and longest wavelength of maximum absorbance. Because of the
strong absorption of tryptophan, the spectra of most proteins between 260 and 310 nm have the same
shape as the spectrum of tryptophan alone with its maximum at 280 nm and its pronounced shoulder
at 289 nm. Proteins with little or no tryptophan, however, have maxima of absorption shifted
toward or coincident with the 275 nm maximum characteristic of tyrosine. The absorption of a
protein at its particular maximum, somewhere between 275 and 280 nm, when properly corrected for
the absorption due to the scattering of light by the solution, can be used as a rapid measurement
of its concentration. Proteins that are posttranslationally modified with chromophores such as
flavin, pyridoxal phosphate, or heme or bind noncovalently chromophores such as flavin, heme,
metallic cations, coenzyme b12, chlorophyll, pheophytin, or carotenoid, display absorption spectra
that are characteristic of those chromophores (Figure 12–9). If one or more of the accessible
tyrosines on the protein have been nitrated, their absorption spectra are shifted into the visible
range (λmax = 430 nm) and their acid dissociation constants are increased (pKa = 6.5), so that at
neutral pH they are present mainly as the nitrophenolate, which absorbs strongly.125
Either tryptophan or nitrotyrosine can be used as a spectral reporter group, the spectrum of which
registers its environment or can monitor a conformational change of the protein.126 For example,
the absorption spectrum of nitrated Tyrosine 115 in micrococcal nuclease indicates that it is in a
nonpolar environment in the absence of substrate but a polar environment in the presence of
substrates.125 This change in environment is also reflected in its accessibility to nitration by
tetranitromethane. The conformational change of aspartate carbamoyltransferase that occurs on the
binding of substrates can be detected by an upfield shift in the wavelength of the absorption of
tryptophans in the protein127 or of nitrated tyrosine side chains in its regulatory b subunits.128
In addition to absorbing ultraviolet light, tryptophan is also both fluorescent and
phosphorescent. The wavelength of the maximum emission of fluorescence from tryptophan varies from
300 to 350 nm.129 The emission of fluorescence from indole itself varies between these limits
systematically as a function of the polarity of the solvent; the more polar the solvent, the
longer the wavelength of the emission, and the tryptophans in a protein that are the more buried
display shorter wavelengths of maximum emission.129 Consequently, the wavelength of its emission
is used as a measure of the degree to which a tryptophan is buried within a protein. In the case
of the absorption spectrum of tryptophan as opposed to its emission spectrum, the situation is
reversed. The most buried tryptophans, in the most nonpolar environments, have been found to
absorb light of the longest wavelength, on the red edge of the absorption band for tryptophan in
the ultraviolet.130
If a protein contains no posttranslationally or experimentally added chromophore, the emission of
fluorescence from the protein will be dominated by that of its tryptophans. By the systematic
removal of its tryptophans through site-directed mutation, the contribution of each of them to the
total emission of fluorescence from the protein can be ascertained.131,132 A tryptophan can also
be inserted into a particular location in a protein by site-directed mutation to monitor local
changes in conformation.133
The fluorescence from each of the tryptophans in a protein displays a characteristic wavelength of
maximum emission and a characteristic intensity.132 When a protein is unfolded, its emission of
fluorescence usually shifts to longer wavelengths as its buried tryptophans become
exposed,131,134 but the intensity of the fluorescence can either increase131 or decrease.134 These
changes can be followed for individual tryptophans in appropriate mutants.131 Because all of the
tryptophans in a protein are fully exposed to solvent upon unfolding, this observation states that
the enclosing of a tryptophan by the native structure can either quench or enhance its
fluorescence. Consequently, it is the particularity of the local environment around each
tryptophan in the native protein that governs the intensity of its emission. For example,
Tryptophan 94 in folded, native ribonuclease from Bacillus amyloliquefaciens has very little
emission of fluorescence because one of its immediate neighbors is Histidine 18, which strongly
quenches it.135 The lowest intensity of emission (by a factor of greater than 3-fold) from the
three tryptophans in lysozyme from T4 bacteriophage is that of Tryptophan 158, which is surrounded
by a cystine and two methionines, the sulfurs of which are also efficient quenchers.136 The amides
of glutamine and asparagine are also efficient quenchers. This sensitivity to the particularity of
the surroundings explains why tryptophans with the shortest wavelengths of maximum emission are
not always the ones with the lowest intensity of emission.132
If intersystem crossing is not significant, the excited state of a fluorescent functional group
such as tryptophan can decay to the ground state by at least four separate pathways:137
[Figure 12–12 plot: F0/FQ (1 to 2) as a function of [O2] (M), 0 to 0.12 M.]
these buried tryptophans, however, has a sufficiently
long lifetime (τ0 ≅ 1 s) that it can be quenched. The
bimolecular rate constants kQ for the quenching of the
phosphorescence of the tryptophans in the buried class
are relatively small (<0.001 M–1 ns–1), and the quenching
registered by these rate constants seems to result from
extensive and momentary unfoldings of the folded
polypeptide that occasionally provide access to the inte-
rior, but only for a short time.130 These observations pro-
vide support for the concept that most parts of a protein
are conformationally active and continuously expand
and contract.
The intermediate, second class of tryptophans, and
probably the most numerous, are those that are partially 1
0 0.04 0.080 0.12
buried and have intermediate rate constants for
quenching (0.01–2 M–1 ns–1). Examples are Tryptophan [O2] (M)
59 in ribonuclease T1 from Aspergillus oryzae Figure 12–12: Collisional quenching of the fluorescence of trypto-
(kQ = 0.3 M–1 ns–1),138 Tryptophan 126 from lysozyme of phans in several proteins by oxygen.129 Solutions of the various pro-
T4 bacteriophage (kQ = 0.3 M–1 ns–1),136 and Tryptophan teins were placed in cuvettes in a fluorometer and excited with light
333 of phosphoglycerate kinase from Saccharomyces of wavelength 280 nm. Fluorescence at 90 ∞ to the exciting beam
was monitored at the wavelength of maximum emission for each
cerevisiae (kQ = 0.8 M–1 ns–1).139
protein (325–350 nm). The high concentrations of oxygen (molar)
Oxygen provides an interesting exception to the were produced by enclosing the cuvette in a chamber that could be
behavior of most quenchers. The difference in its ability pressurized to 105 kg cm–2 O2 gas and allowing the gas to equili-
to quench accessible and buried tryptophans is much brate with the solution at various pressures. The proteins used were
less than that observed with larger more polar quenchers bovine a-chymotrypsin (ˆ), rabbit fructose-bisphosphate aldo-
lase (Í), bovine immunoglobulin G (2), and bovine serum albu-
(Figure 12–12).129,140 On the basis of this observation, it
min (䉭). A solution of tryptophan (3) was used as an example of a
has been proposed that oxygen is small enough to insin- fully exposed side chain. Lines were drawn on the basis of the
uate its way through a molecule of protein in liquidlike expectation that F0FQ–1 as a function of [quencher] would be linear
diffusion among the tightly packed amino acids. (Equation 12–41). Reprinted with permission from ref 129.
Changes in the accessibility of tryptophans to polar Copyright 1973 American Chemical Society.
quenchers dissolved in the aqueous phase have been
used to monitor conformational changes in the structure
of a protein. In the case of succinate–CoA ligase (ADP- membrane, tryptophans were inserted at various posi-
forming) from E. coli, the binding of ATP to the a subunit tions in its amino acid sequence by site-directed muta-
of the enzyme causes significant decreases in the acces- tion, and changes in the rate constants of quenching for
sibility of the tryptophans in the b subunit to acrylamide these tryptophans were measured before and after inser-
dissolved in the solution.141 This suggests that a confor- tion.143
mational change propagated throughout the whole pro- It is also possible for the energy of the relaxed
tein occurs upon the binding of ATP. The implication excited state in excess over the energy of the ground
that both the a and b subunits change their structure in state, which otherwise would be emitted as fluorescence,
concert when ATP binds is consistent with the observa- to be transferred intact through space by resonance to
tion that they are intimately associated in the oligomeric another chromophore in a radiationless process
structure of the protein (Figure 8–22).142 To follow the (Equation 12–38). This fluorescence resonance energy
conformational change that occurs when the carboxy- transfer (FRET) discharges the electronically excited
terminal portion of colicin E1 from E. coli inserts into a state of the functional group that originally absorbed the
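The linear dependence of F₀/FQ on the concentration of quencher that is expected in Figure 12–12
can be illustrated with a short numerical sketch. It assumes that Equation 12–41 has the usual
Stern–Volmer form, F₀/FQ = 1 + kQ τ₀ [Q]. The lifetime of 3 ns and the concentrations of quencher
are hypothetical values chosen only for illustration, and the function name is not taken from the
text.

```python
def stern_volmer_ratio(k_q, tau_0, quencher):
    """Return F0/FQ for collisional quenching (assumed Stern-Volmer form).

    k_q      -- bimolecular rate constant of quenching (M^-1 ns^-1)
    tau_0    -- lifetime of the excited state without quencher (ns)
    quencher -- molar concentration of quencher (M)
    """
    return 1.0 + k_q * tau_0 * quencher


# kQ = 0.3 M^-1 ns^-1 is the value quoted for a partially buried tryptophan;
# tau_0 = 3 ns and the list of concentrations are hypothetical.
k_q, tau_0 = 0.3, 3.0
for conc in (0.0, 0.04, 0.08, 0.12):
    ratio = stern_volmer_ratio(k_q, tau_0, conc)
    print(f"[Q] = {conc:.2f} M   F0/FQ = {ratio:.3f}")
```

Because the product kQ τ₀ is the slope of the line, a measured slope together with an independently
measured lifetime gives the rate constant for quenching.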
energy, the acceptor. Because this transfer of energy
[Structural drawings of reagents 12–1 through 12–7]
Figure 12–14: Fluorescent, electrophilic reagents used to modify, covalently or noncovalently, three sites on rhodopsin.145
N-[(Iodoacetamido)ethyl]-1-aminonaphthalene-5-sulfonate anion 12–1 (λabs = 350 nm; λemit = 495 nm),
N-[(iodoacetamido)ethyl]-1-aminonaphthalene-8-sulfonate anion 12–2 (λabs = 350 nm; λemit = 495 nm), and
5-(iodoacetamido)salicylate anion 12–3 (λabs = 323 nm; λemit = 405 nm) were used to modify a particular cysteine in the
protein by alkylation. N,N′-Bis[1-(dimethylamino)naphthalene-5-sulfonato]-L-cystine 12–4 (λabs = 350 nm; λemit = 520 nm)
and N,N′-bis[fluoresceinyl(isothiocarbamido)]cystamine 12–5 (λabs = 495 nm; λemit = 518 nm) were used to modify a
different cysteine in the protein by disulfide exchange. 9-Hydrazinoacridine 12–6 (λabs = 440 nm; λemit = 470 nm) and
proflavin 12–7 (λabs = 470 nm; λemit = 512 nm) were used as ligands for a particular site on the protein with a high
affinity for aromatic cations. All wavelengths (λ) are wavelengths of maximum absorption or maximum emission. In each
instance the fluorescent functional group selectively attached to the protein was used as a donor of resonant energy to
11-cis-retinal, a natural, covalent posttranslational modification (Table 3–1) of the protein that absorbs maximally
at 500 nm.
$$J = \frac{\int I(\lambda)\,\varepsilon(\lambda)\,\lambda^{4}\,d\lambda}{\int I(\lambda)\,d\lambda}$$ (12–51)

where λ is the wavelength.* This integral quantifies the match between the energies of the donor
and acceptor required for the resonance. The integral J is calculated numerically from the
absorption spectrum of the acceptor and the emission spectrum of the donor (Figure 12–13), which
should be gathered from donor and acceptor when they are in solution attached individually to the
protein.

The orientation factor K² (dimensionless) is defined by

$$K^{2} = \left(\cos\theta_{T} - 3\cos\theta_{D}\cos\theta_{A}\right)^{2}$$ (12–52)

where θT is the angle between the transition dipoles of donor and acceptor and θD and θA are the
angles between the transition dipoles of the donor and acceptor, respectively, and the vector
between the centers of those dipoles.148 The transfer of energy is between these transition dipoles
of the donor and acceptor, and this factor quantifies the match between the orientations of the
donor and acceptor required for the resonance. If the orientations of the transition dipoles are
not fixed

absorb at convenient wavelengths.149 Tryptophans are often used as donors of resonant energy. Those
found naturally in the protein can be used one by one as unique donors by preparing the respective
site-directed mutants, each of which retains only one of them.150 Nitrotyrosine151 or
kynurenine,152 a covalent modification of tryptophan, can be used as acceptors of resonant energy
from a tryptophan.

Often cysteines in the protein are used as convenient nucleophiles to be covalently modified by
fluorescent electrophilic reagents (Figure 12–14). Sometimes, one of the cysteines in a protein,
because of its peculiar reactivity, can be selectively modified with one fluorescent reagent and
another cysteine can then be modified with another.153,154 Cysteines can be placed at specific
positions in a protein by site-directed mutation and then selectively modified with appropriate
fluorescent reagents.155 A fluorescent reagent can be attached to a specific glutamine on the
surface of a protein by use of the enzyme protein-glutamine γ-glutamyltransferase, which exchanges
the ammonia of the glutamine with a primary amine on the reagent.24,156 It is also possible to use
synthetically produced fluorescent α-amino acids and in vitro systems for incorporating unnatural
amino acids at specific positions in its amino acid sequence to produce a protein with a
fluorescent functional group located at a single, designated point in its native structure.157,158

The bilayer of phospholipid in which membrane-bound proteins are embedded also provides a location
in which to locate a fluorescent donor or acceptor. The bilayer can be turned into a sheet of
fluorescent donors or fluorescent acceptors by dissolving hydrophobic fluorophores159,160 in the
liquid hydrocarbon at its center or by covalently attaching fluorophores to the phospholipids from
which it is formed.161 Because the bilayer

* In the equation presented by Latt et al.,146 there is a misleading factor of 1000 that serves to
correct liters mole⁻¹ to centimeters³ mole⁻¹, a correction that would automatically be made during
the cancellation of units. This is an excellent example of the absolute necessity of including
units and making sure that they cancel properly whenever any calculation is performed in the
physical sciences.
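The numerical evaluation of Equations 12–51 and 12–52 can be sketched as follows, with the units
carried explicitly as the footnote recommends. The two Gaussian spectra are hypothetical stand-ins
for a measured emission spectrum of a donor and a measured absorption spectrum of an acceptor, and
the function names are illustrative rather than taken from the text. With ε in M⁻¹ cm⁻¹ and λ in
nanometers, J comes out in M⁻¹ cm⁻¹ nm⁴.

```python
import math


def trapezoid(xs, ys):
    """Integrate tabulated values ys over the grid xs by the trapezoidal rule."""
    return sum((ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i]) / 2.0
               for i in range(len(xs) - 1))


def overlap_integral(wavelengths_nm, emission, epsilon):
    """J of Equation 12-51, in M^-1 cm^-1 nm^4.

    emission -- emission intensity I of the donor (arbitrary units)
    epsilon  -- molar absorptivity of the acceptor (M^-1 cm^-1)
    """
    weighted = [i * e * w ** 4
                for w, i, e in zip(wavelengths_nm, emission, epsilon)]
    return trapezoid(wavelengths_nm, weighted) / trapezoid(wavelengths_nm, emission)


def orientation_factor(theta_t, theta_d, theta_a):
    """K^2 of Equation 12-52; all angles in radians."""
    return (math.cos(theta_t)
            - 3.0 * math.cos(theta_d) * math.cos(theta_a)) ** 2


# Hypothetical spectra on a 300-500 nm grid (not data from the text).
grid = [300.0 + 0.5 * n for n in range(401)]
donor_emission = [math.exp(-((w - 350.0) / 20.0) ** 2) for w in grid]
acceptor_epsilon = [5000.0 * math.exp(-((w - 380.0) / 25.0) ** 2) for w in grid]

J = overlap_integral(grid, donor_emission, acceptor_epsilon)
print(f"J = {J:.3e} M^-1 cm^-1 nm^4")

# Collinear dipoles lying along the vector between them give the upper limit.
print(orientation_factor(0.0, 0.0, 0.0))  # 4.0
```

Dividing by the integral of the emission spectrum normalizes the spectrum of the donor, so the
arbitrary units of I cancel and only the units of ε and λ remain in J.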
Absorption and Emission of Light 607
forms a sheet of hydrocarbon and because the molecules of a particular protein all float at the
same depth within this sheet of hydrocarbon, the molecules of donor or acceptor dissolved within it
end up in a fixed location relative to the rest of the protein. A matched acceptor or donor,
respectively, can then be attached covalently to a specific location on the protein, and transfer
of energy between the molecules of donor or acceptor within the bilayer and the acceptor or donor
on the protein can be monitored.

The orientation factor K² is the most uncertain parameter in Equation 12–50.145 In any given
situation, K² has a specific numerical value but its value cannot be measured directly. If both
donor and acceptor were free enough to assume all possible relative orientations with equal
probability, K² would be 2/3.145 If one of the two were fixed and the other could assume all
possible relative orientations, K² would have a value between 1/3 and 4/3.145 If both donor and
acceptor, however, are fixed in their relative orientations, for example, both rigidly bound to a
molecule of protein, K² can have a value anywhere between 0 and 4.0. Because R₀, and hence r,
depends for its value on (K²)^(1/6), the uncertainty of K² affects the value of r by more than ±12%
only when it is greater than 4/3 but far more dramatically when it is less than 1/3.

The more freedom the donor, the acceptor, or both of them have to assume different orientations by
rotation around unhindered bonds between them and the rigid portion of the protein, the closer the
value of K² comes to 2/3. An estimate of the orientational freedom of donor or acceptor can be made
from the rate and extent of the depolarization of its fluorescent emission. If either the donor or
acceptor is excited with linearly polarized light, the light emitted as fluorescence immediately,
that is, before the chromophore has had time to reorient, will also be polarized. The polarization
of the emitted light, however, will decay over the lifetime of the excited state as the chromophore
reorients. The rate of this decay and the final residual polarization of its fluorescence provide
an estimate of the orientational freedom of the donor or acceptor.

From these estimates of orientational freedom, a distribution of the probability for particular
values of K² can be calculated.162 For example, from the depolarization of the fluorescence from
the 5-(N,N-dimethylamino)naphthalenesulfonyl group attached to rhodopsin as a donor to the retinal
rigidly fixed in the center of the protein, the value for K² could be estimated to fall between
0.08 and 1.8 with a confidence of 90%,156 and from the depolarization of the fluorescence from the
pyrene group attached covalently as a donor to the active site of acetylcholinesterase and the
depolarization of the fluorescence from the propidium bound as an acceptor at another site on the
protein, the value for K² could be estimated to fall between 0.25 and 2.2.148 For donor or acceptor
or both to display depolarization, however, they must be reorienting fairly freely anyway, and such
estimates of ranges for K² may not be significantly more accurate than simply using 2/3 for the
value of K². For example, predicted distances between four pairs of donors and acceptors positioned
on specific amino acids in phosphoglycerate kinase from yeast, a protein for which a
crystallographic molecular model is available, were no more reliable when K² was estimated from
depolarizations than when 2/3 was used for K².155

If both donor and acceptor are rigidly bound by the protein at a fixed orientation relative to each
other, then neither will have any orientational freedom and no limits can be placed on K² other
than from 0 to 4.163 For example, in the crystallographic molecular model of deoxyribodipyrimidine
photo-lyase from Anacystis nidulans, the angle between the transition dipoles of the flavin adenine
dinucleotide and the 8-hydroxy-5-deazaflavin bound to the protein is 36°. From this angle and the
angles of the dipoles to the vector between the two chromophores (Equation 12–52), a value of K² of
1.6 could be calculated.164 For the same protein from E. coli, however, the angle between the
transition dipoles of the flavin adenine dinucleotide and the methylene tetrahydrofolic acid, which
takes the place of the 8-hydroxy-5-deazaflavin in the latter crystallographic molecular model, is
almost 90°, causing K² to be almost 0. Even though the distances between the chromophores in these
two proteins are the same (1.7 nm), the efficiency of the transfer of energy by resonance for the
protein from A. nidulans is 97% while that from E. coli is 62%. When K² of 2/3 was used to
calculate the distance between the flavin adenine dinucleotide and the tetrahydrofolic acid in the
protein from E. coli, in the absence of a crystallographic molecular model, the value obtained was
2.2 nm instead of the actual distance of 1.7 nm.165

The efficiency of the transfer of energy by resonance between Tyrosine 14 and Tyrosine 55 in
steroid Δ-isomerase from Pseudomonas testosteroni is less than 25% even though these tyrosines are
only 0.6 nm apart in the crystallographic molecular model. Consequently, K² must be less than
0.003, a fact from which it was concluded that these two tyrosines were held rigidly by the protein
in a relative orientation incompatible with efficient transfer even over such a small distance.166
Had a value of 2/3 been used for K² in the calculation, the distance between these two tyrosines
would have been estimated to be greater than 1.5 nm.

The original enthusiasm for measurements of the transfer of energy by resonance arose from the
potential they offered for measuring the distance between two locations in a protein for which a
crystallographic molecular model is not available.145 For example, if the donor were attached by
the unique stoichiometric covalent modification of a particular amino acid in the sequence of the
protein and the acceptor were specifically attached by the unique modification of another amino
acid in the sequence, it would be possible to estimate the distance between the donor and acceptor
in the folded polypeptide and hence the distance between the two modified
amino acids. In support of this intention, Equations 12–48 and 12–50 have been shown to be
consistent with the observed transfers of resonant energy between a donor and an acceptor at the
two ends of short synthetic peptides of proline.144 The distance between donor and acceptor was
varied by varying the number of prolines in the peptides to demonstrate that the dependence of
efficiency upon distance was as the sixth power. If K² was assumed to be 2/3, the calculated
distances between donor and acceptor agreed fairly well (within 25%) with the distances measured
from molecular models of these modified peptides. Many estimates of distances between locations in
proteins have been made from measurements of the transfer of energy by resonance.

One way to evaluate the reliability of such estimates is to compare a distance estimated in this
way with the distance observed in a subsequently obtained crystallographic molecular model. The
distance between Cysteine 199 and Cysteine 343 in cyclic AMP-dependent protein kinase was estimated
to be 3.1–5.2 nm on the basis of the transfer of energy by resonance between two different pairs of
donor and acceptor,154 but in the subsequently reported crystallographic molecular model167 the
distance between the sulfurs of these two cysteines is only 2.12 nm. The distance between the two
Cysteines 283 in dimeric creatine kinase from rabbit muscle was estimated to be 4.8–6.0 nm from
measurements of the efficiency of transfer for five different pairs of donor and acceptor,168 but
in the subsequently reported crystallographic molecular model of the protein,169 these cysteines
are only 3.33 nm apart. The distance between Lysine 84 on one of the subunits in an α₃ catalytic
trimer and the closest Lysine 84 on a subunit in the other α₃ catalytic trimer in aspartate
carbamoyltransferase (Figure 9–37) was estimated to be 3.3 nm on the basis of transfer of energy by
resonance between a pyridoxamine phosphate and a pyridoxal phosphate attached to the respective
side chains.170,171 This estimate conveniently splits the difference between the distances of 2.1
and 3.8 nm observed in subsequent crystallographic molecular models of the two respective
conformations of the protein.172–174 The distance between the binding site for acetylcholine on
acetylcholine receptor and the bilayer of phospholipids of the membrane in which it is located was
estimated to be 3.0–4.0 nm from measurements of the transfer of energy by resonance between a donor
covalently attached to choline and two different acceptors dissolved in the hydrocarbon of the
bilayer,160 and the distance to the closest surface of the bilayer estimated crystallographically
is 3.0 nm.175

A more suspect evaluation of the reliability of distances estimated from the transfer of energy by
resonance is the comparison of them with those observed in a crystallographic molecular model
available at the time the measurements were made. The distances estimated from the transfer of
energy to chloramphenicol bound at the active site of chloramphenicol O-acetyltransferase from
Tryptophan 86 and Tryptophan 152, respectively, were both 1.5 nm when K² was set at 2/3, and the
distances in the crystallographic molecular model are 1.72 and 1.66 nm, respectively.150 The
distances estimated from the transfer of energy between the amino terminus and Lysines 15, 26, 41,
and 46 in bovine pancreatic trypsin inhibitor were 3.4, 2.2, 2.1, and 2.3 nm, respectively, and the
distances in the crystallographic molecular model are 3.17, 1.68, 1.80, and 2.17 nm,
respectively.176 The distance between Tyrosine 99 and Tyrosine 138 in the complex between
calmodulin and four calcium ions was estimated from the transfer of energy to be between 1.4 and
1.9 nm,151 and the distance in the crystallographic molecular model is 1.2 nm.177 The distance
between a cysteine substituted for Phenylalanine 239 and Cysteine 343 in cyclic AMP-dependent
protein kinase was estimated from the transfer of energy to be 4.1 nm,178 and the distance in the
crystallographic molecular model is 3.7 nm.179

Aside from the uncertainty of the values of K², one of the main difficulties in measuring distances
by transfer of energy is that the donors and acceptors are often attached covalently to the protein
by using reagents that end up placing the chromophore on a flexible tether a significant distance
from the amino acid to which it is attached. For example, the reagents used to modify rhodopsin
(Figure 12–14) place the centers of the chromophores 0.4–1.2 nm away from the electrophilic carbon
or sulfur that is directly attached to the nucleophilic amino acid that has been modified. The
fluorescent (5-sulfonaphthalen-1-yl)amino group and the fluorescent
(7-nitrobenz-2-oxa-1,3-diazol-4-yl)amino group that were used as donor and acceptor, respectively,
in estimates of distances between locations in the complex between DNA, deoxymononucleotide, and
the Klenow fragment of DNA-directed DNA polymerase from E. coli, were attached to various atoms in
the complex by tethers that were each about 1.2 nm in length.180 If both donor and acceptor are
attached through such long tethers, the distance between them can be significantly different from
the actual distance between the two amino acids to which they are attached. One approach to
adjusting the estimates of distance for these added lengths is to assume that the fluorescent
functional group and its tether extend unrestrained outwards from the surface of the protein and
correct for this extra distance geometrically.181 The difficulty with this approach is that the
fluorescent functional group may adsorb to the surface of the protein or insert into a crevice on
the surface.153

Many of the failures of the measurements of distance to agree very closely with subsequent or even
prior crystallographic determinations may result from technical shortcomings. Too frequently the
necessary parameters such as quantum yield and spectral overlap are not measured directly but are
based on prior published values. Measurements in the ultraviolet are often compromised
by contaminants in the solutions. It has already been mentioned that steady-state measurements of
fluorescence are much less accurate than direct measurements of lifetimes of the fluorescence.
Nevertheless, because of the uncertainties concerning the orientations of the dipoles of donor and
acceptor, the degree of their orientational freedom, and the relationship of the distance between
them and the distance between their points of attachment, because of the modest success of such
estimates, and because crystallographic molecular models have become far more common, measurements
of the transfer of energy by resonance are used infrequently to estimate distances. They are,
however, still widely used for other purposes, because they have the advantage of providing
information about a protein while it is in solution.

The transfer of energy by resonance is used to detect conformational changes in a protein. For
example, the efficiency of the transfer of energy by resonance between Tryptophan 133 and a
(5-sulfonaphthalen-1-yl)amino group attached through a tether to Cysteine 93 in dolichyl-phosphate
β-D-mannosyltransferase increased from 42% to 66% upon the binding of the substrate dolichyl
phosphate.182 From this observation, it was concluded that a conformational change occurs in the
protein upon the binding of the substrate, and from the magnitude of the change in the efficiency
of the transfer of energy, it was estimated that the distance between Tryptophan 133 and Cysteine
93 decreased about 0.3 nm during this conformational change. From similar observations,
conformational changes producing shifts in the apparent positions of donors and their acceptors of
0.3, 0.3, 0.5, and 1.2 nm have been observed for the binding of DNA to transcription factor
AP-1,183 the exchange of Na⁺ for K⁺ in the active site of Na⁺/K⁺-exchanging ATPase,184 the exchange
of Ca²⁺ cations for Mg²⁺ cations in troponin,185 and the binding of a bisubstrate analogue to
adenylate kinase, respectively.186

For all of the same reasons, associating a change in the distance between two locations on a
protein with the change in the transfer of energy by resonance is just as uncertain as estimating a
distance between them. Upon the binding of the β(1→4) trimer of N-acetylglucosamine to lysozyme
from chicken, the distance between a kynurenine at position 62 and Tryptophan 108 appeared to
decrease by 0.5 nm when calculated from the change in the efficiency of transfer.152 It is
unlikely, however, that such a large conformational change occurs on the binding of the
oligosaccharide because no such difference (<0.04 nm) in the distance between these two amino acids
is observed between the crystallographic molecular models of unliganded lysozyme and lysozyme to
which the β(1→4) tetramer of N-acetylglucosamine is bound.187 It is possible that crystal packing
constrains both the liganded and unliganded conformations to the same structure in the crystal even
though their structures are so different in solution. It is more likely, however, that the relative
orientations of the donor or the acceptor or both are changed upon the association of the ligand,
not the distance between them. In the case of the sliding clamp of bacteriophage T4 DNA-directed
DNA polymerase, however, it is thought that the ring of subunits composing the protein must split
open so that a molecule of DNA can enter the hole in its center, and changes in the efficiency of
the transfer of energy by resonance between donors and acceptors on different subunits equivalent
to changes in distances of up to 1.5 nm are thought to reflect real changes in distance upon the
opening of the ring and its subsequent intimate embrace of the DNA.188

The transfer of energy by resonance is also used to monitor the association between a molecule of
protein modified with a donor and another molecule of protein modified with an acceptor. The
catalytic α subunit of cyclic AMP-dependent protein kinase has been modified with a tethered
fluorescein and the regulatory β subunit with a tethered rhodamine. During the heterologous
association of the two subunits, the fluorescence of the fluorescein decreases by about 30% as the
rhodamine, an acceptor of the energy of its excited state, is brought into its vicinity.189 Such
assays based on the decrease in the fluorescence of a donor produced by an acceptor upon formation
of a complex have been used to follow the association of cytochrome c and cytochrome-c oxidase190
and the association of myosin and actin.191

Changes in the efficiency of the transfer of energy by resonance resulting from changes in the
molar concentrations of the participants can also be used to measure the dissociation constant for
a complex between two proteins or the complex between a protein and a nucleic acid. The
dissociation constant between cytochrome c and cytochrome-c oxidase190 and the dissociation
constant between Rho-GDP dissociation inhibitor and GTP-binding protein Cdc42¹⁹² have been
determined by monitoring changes in the efficiency of the transfer of energy, as has the equimolar
stoichiometry of the complex between the sliding clamp and the clamp holder of DNA-directed DNA
polymerase from bacteriophage T4.193 Both the dissociation constant and the kinetics of the
association between DNA and transcription factor AP-1¹⁸³˒¹⁹⁴ have been monitored by changes in the
transfer of energy by resonance.

Dissociation constants are also measured by monitoring changes in fluorescence that do not involve
transfer of energy by resonance. For example, the enhancement of the fluorescence of
poly(deoxy-1,N⁶-ethenoadenylic acid), a fluorescent analogue of poly(adenylic acid), upon the
association of RecA protein from E. coli has been used to determine the dissociation constant and
the kinetics of the association between the protein and the nucleic acid.195

The transfer of energy by resonance has also been used to monitor changes in the structure of DNA
produced by a protein. By modifying the immediately adjacent 3′ end of one strand and 5′ end of the
other strand of a molecule of double-stranded DNA with a donor and an
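The way in which an assumed value of K² propagates into a distance estimated from the transfer of
energy by resonance can be sketched numerically. The sketch assumes that the efficiency of transfer
follows the usual Förster form, E = R₀⁶/(R₀⁶ + r⁶), which is presumably what Equation 12–48
expresses, and it uses the dependence of R₀, and hence r, on (K²)^(1/6) noted in the discussion of
the orientation factor. The function names are illustrative and not taken from the text.

```python
def distance_from_efficiency(efficiency, r0_nm):
    """Distance r (nm) from E = R0^6 / (R0^6 + r^6), the assumed Forster form."""
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


def rescale_distance(r_assumed_nm, k2_true, k2_assumed=2.0 / 3.0):
    """Convert a distance computed with k2_assumed to the one implied by k2_true.

    R0, and hence r at a fixed efficiency, scales as (K^2)^(1/6).
    """
    return r_assumed_nm * (k2_true / k2_assumed) ** (1.0 / 6.0)


# Relative error in r if K^2 = 2/3 is assumed but the true value differs.
for k2 in (1.0 / 3.0, 4.0 / 3.0, 0.003, 4.0):
    factor = rescale_distance(1.0, k2)
    print(f"K^2 = {k2:6.3f}   r(true)/r(assumed) = {factor:.3f}")

# The tyrosines of steroid isomerase: an apparent 1.5 nm with K^2 = 2/3
# shrinks to roughly 0.6 nm if K^2 is really about 0.003.
print(f"{rescale_distance(1.5, 0.003):.2f} nm")
```

The factor for K² = 4/3 is 2^(1/6), or about 1.12, which is the origin of the ±12% quoted for
values of K² between 1/3 and 4/3; the factor falls away steeply only as K² approaches 0.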
the center of rotation.201 Reprinted with permission from ref 201. Copyright 1995 American Chemical
Society.

[Plot: absorbance at 290 nm (A290) against radius r (cm), 6.9 to 7.2 cm.]

(I) On the basis of this frictional coefficient, if the viral capsid were a smooth, unhydrated
sphere, what would be its radius?

The HBe viral capsid was examined by scanning

[Plot: [θ] (deg cm² dmol⁻¹) against wavelength λ (nm).]

λ (nm)    [θ] (deg cm² dmol⁻¹ × 10⁻⁴)
223       –1.72
210       –1.86
198       +2.30

λ (nm)    [θ] (deg cm² dmol⁻¹ × 10⁻⁴)
223       –2.75    –0.97    +0.31
210       –2.67    –0.78    –0.78
198       +3.85    +3.85    –3.96

(N) Assume that the folded polypeptide of the HBe viral capsid contains only α helix, β structure,
and random meander, and formulate a set of three simultaneous equations relating the three
unknowns fα, fβ, and fRM, the respective fractions of each type of secondary structure in the
protein. Do not use as an equation the assumption that the sum of these three fractions is 1.

(O) Solve this set of equations for fα, fβ, and fRM.

(P) Why is it so surprising that the values of fα, fβ, and fRM add up to 1 anyway?

(Q) On the basis of these numerical values of fα, fβ, and fRM, why is one required to conclude that
the viral capsid of southern bean mosaic virus (Figure 9–27) and the viral capsid of hepatitis B
virus do not share a common ancestor even though their overall structures are similar?

[Plots for Problem 12–11: circular dichroic spectra in lettered panels against wavelength (nm),
160 to 260 nm.]
Nuclear Magnetic Resonance 613
Problem 12–11: The circular dichroic spectra of several proteins are shown to the left.117 The
solid lines are the observed spectra; the dotted lines are theoretical fits to the data. Reprinted
with permission from ref 117. Copyright 1980 Academic Press.

(A) Using the letters in each panel to designate each protein, rank them in order from the one with
the most α helix and the least β structure to the one with the most β structure and least α helix.

(B) Consider the protein with the most α helix. What would be the maximum percentage of α helix it
could have?

(C) Consider the protein with the least α helix. What would be the maximum percentage of β
structure it could have?

Problem 12–12: Suppose that R₀ = 1.7 nm for the transfer of energy by resonance between a donor and
an acceptor on a protein and that the efficiency of the energy transfer between donor and acceptor
is 0.79. When a ligand that binds to the protein is added, the efficiency of energy transfer
decreases to 0.64. What is the apparent change in the distance between donor and acceptor that
occurs upon binding of the ligand?

Nuclear Magnetic Resonance 203,204

Many atomic nuclei display rotational motion known as nuclear spin. Because nuclear spin is
quantized, its angular velocities can assume only those magnitudes dictated by spin quantum
numbers. Among many other atomic nuclei, those of ¹H, ¹³C, ¹⁵N, ¹⁹F, and ³¹P have only two spin
quantum numbers, +1/2 and –1/2. These dictate two specific angular velocities of the same magnitude
but of opposite polarity. These two angular velocities are the two spin states of these nuclei. As
any one of these nuclei is a charged particle by virtue of its protons, either of these angular
velocities creates a magnetic field of the respective polarity aligned with the axis of the nuclear
spin. When such a nucleus is placed in an external, homogeneous magnetic field of a given polarity,
its axis tends to align with the direction of the applied field, and its spin states, because they
are of opposite polarity to each other, become different in energy. This difference in energy, ΔE,
is directly proportional to the magnetic flux density Bi (tesla) at the location of nucleus i; and
as in optical spectroscopy, the difference in energy determines the frequency νi (hertz) of
electromagnetic energy that is absorbed by nucleus i:

$$\Delta E = \frac{\gamma_{i}\,h\,B_{i}}{2\pi} = h\,\nu_{i}$$ (12–53)

where h is Planck’s constant and γi is the magnetogyric ratio (radians tesla⁻¹ second⁻¹) for
nucleus i, which is determined only by the type of nucleus, ¹H, ¹³C, ¹⁵N, ¹⁹F, or ³¹P, that it is
(Table 12–2). The frequency νi at which nucleus i absorbs in an applied external field is its
Larmor frequency.

At readily accessible magnetic flux densities (<25 T), the difference in energy between the two
spin states of one of these nuclei is less than 0.5 J mol⁻¹, which is the energy contained in a
photon of wavelength greater than 3 × 10⁸ nm and frequency less than or equal to 1000 MHz. This is
in the radiofrequency range of electromagnetic energy.

A solution of molecules contains discrete populations of atomic nuclei in which each and every
nucleus is chemically identical. For example, if the naturally present deuterium is ignored, a
solution of p-xylene

[Structure 12–8: p-xylene]

uniformly and completely labeled with ¹³carbon would contain one discrete population of ¹hydrogen
nuclei composed from the four hydrogens attached to the phenyl ring in each molecule of p-xylene, a
discrete population of ¹hydrogen nuclei composed from the six hydrogens attached to the methyl
groups in each molecule of p-xylene, a discrete population of ¹³carbon nuclei composed from the
carbons of the methyl groups, a discrete population of ¹³carbon nuclei composed from the para
carbons of the ring, and a discrete population of ¹³carbon nuclei composed from the meta carbons of
the ring. The nuclear spin of each nucleus in a given population of nuclei can be represented as a
vector of unit length parallel to the axis of its spin. The net magnetization of a given population
of nuclei is the vector sum of its individual nuclear spins. In an applied magnetic field, although
the individual nuclear spins show only a tendency to align with the field, the net magnetization of
each population of nuclei is aligned exactly with the direction of the field. In an applied
magnetic field, each of these discrete populations of nuclei in the solution has a corresponding
Larmor frequency for the nuclear magnetic resonance absorption associated with it.

When a population of chemically identical fundamental particles, such as electrons or atomic
nuclei, is exposed to electromagnetic radiation of a wavelength equivalent in energy to the
difference between two energy levels accessible to the particles, the electromagnetic radiation
catalyzes the movement of the members of that population of particles between these two energy
levels. The reason is that the electromagnetic radiation is in resonance with the transition
between the two energy levels. Photons with this energy will be absorbed during this process only
if, at the time of irradiation, the popu-
614 Physical Measurements of Structure
lation of particles occupying the lower energy level is tical nuclei i is the normalized difference between its
greater than the population occupying the higher energy Larmor frequency ni, the frequency at which it absorbs,
level. The absorption of photons, however, necessarily and the frequency nstd at which the population of a stan-
increases the population in the higher energy level at the dard nucleus absorbs:
expense of the population in the lower energy level.
When the populations in the two resonating energy n std – n i Bstd – Bi
levels become equal to each other, absorption can no di = = (12–54)
n std Bstd
longer occur, and a state of saturation is reached.*
In the electronic transitions and vibrational transi-
tions of electrons in atomic and molecular orbitals The units used for chemical shift are parts per million
(Figure 12–5), the energy levels are sufficiently different (ppm) relative to the absorption of the standard nucleus
that almost all unexcited electrons are in the state of in a particular reference compound because the local dif-
lower energy, and relaxation back to the state of lower ferences in magnetic flux density are never greater than
energy is sufficiently rapid (>10 ns–1)86 that absorption of about 0.0002 (200 ppm) of the applied field. Chemical
a particular wavelength of light by a given population of shift cannot be expressed in absolute units of energy
chemically identical electrons rarely displays saturation. because the energy difference between a particular
In nuclear magnetic resonance, however, the energy dif- absorption and that of the standard varies with the mag-
ference between the two spin states that can be achieved nitude of the applied field (Equation 12–53). The magni-
with the available magnetic flux densities is so small tude of the chemical shift provides chemical information
(<0.5 J mol–1) that the equilibrium constant Ksp between about the disposition of the electrons in the environment
the two spin states for a population of identical nuclei at surrounding the nucleus, in other words, the molecular
normal temperatures (300 K) is very close to 1 (1 < Ksp < structure in its vicinity.
1.0002). This means that the difference in the concentra- Nuclear magnetic resonance spectroscopy meas-
tions of the nuclei in the two spin states at equilibrium ures the same phenomenon as optical spectroscopy. In
will be less than 200 ppm. The difference between the an external magnetic field every nucleus of spin " has
populations in the two energy levels set in resonance is two energy levels. Depending on the flux density of
small enough and the rate of relaxation of a nucleus in the applied magnetic field and the type of nucleus,
the level of higher energy back to the level of lower electromagnetic energy of a discrete wavelength (fre-
energy is slow enough (≥1 s–1) that saturation occurs quency) somewhere between 2 ¥ 1010 nm (15 MHz) and
readily. This causes the amplitude of the observed 3 ¥ 108 nm (1000 MHz) will be absorbed by a particular
absorption of the electromagnetic energy in nuclear population of identical nuclei in the process of exciting
magnetic resonance spectroscopy to be sensitive to the nuclei in the population of the spin state with lower
rate of relaxation of the populations of individual nuclei energy to the spin state with higher energy. In theory,
from the state of saturation to the state of equilibrium. these absorptions of energy by each discrete population
The faster the population relaxes, the more energy it can could be recorded as a function of wavelength to obtain
absorb. For this relaxation to occur, the excess energy a spectrum, as is done with an optical absorption spec-
that has been absorbed has to be dissipated. trum. Continuous wave (CW) nuclear magnetic spec-
The chemical shift of the nuclear magnetic reso- trometers approximate this ideal. They measure the
nance absorption of a population of nuclei is a measure absorption of energy of a fixed frequency as the flux den-
of the frequency of the electromagnetic energy at which sity of the applied magnetic field is varied slowly and
the absorption appears in the spectrum. The chemical continuously, and they record the variation in the inten-
shift of the absorption of a particular population of sity of the radiofrequency signal as it is absorbed by the
nuclei is determined by the chemical environment of the sample. This measurement produces a scan of absorp-
chemically identical nuclei that compose the population. tion as a function of the flux density of the magnetic field.
The electrons surrounding a given nucleus circulate in The direct proportionality between magnetic flux density
response to the applied magnetic field as a current would and frequency (Equation 12–53) permits the spectrum of
in a copper coil. This current decreases the local mag- absorption to be presented as a function of frequency.
netic flux density Bi experienced by the nucleus and Maxima of absorption appear in the spectrum at the
establishes its characteristic Larmor frequency (Equation Larmor frequencies of the different populations of nuclei
12–53). The chemical shift di of a nuclear magnetic reso- in the sample. Nuclei of the different elements differ dra-
nance absorption from a population of chemically iden- matically in their ability to absorb radiofrequency energy
(Table 12–2) and hence the intensities of their maxima.
Almost all instruments used today, however, are
* Because the absorption of electromagnetic energy is the conse- Fourier transform (FT) nuclear magnetic resonance
quence of resonance and because the resonance is so much closer
to equilibrium in nuclear magnetic resonance spectroscopy than in spectrometers. In such a spectrometer there is a radio-
optical spectroscopy, nuclear magnetic resonance spectroscopists transmitter that generates radiowaves of a set frequency,
often use the word “resonance” in place of the word “absorption”. for example 600 MHz, referred to as the carrier fre-
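
The magnitudes quoted above for the energy difference, the equilibrium constant Ksp, and the chemical shift can be checked with a short calculation. The following is a minimal sketch in Python, not part of the original treatment. It assumes that Ksp follows the Boltzmann relation exp(ΔE/RT), it uses an approximate literature value for the magnetogyric ratio of ¹hydrogen, and all of the function names are hypothetical.

```python
import math

# Physical constants (SI units).
PLANCK_H = 6.62607015e-34      # J s
AVOGADRO = 6.02214076e23       # mol^-1
GAS_CONSTANT = 8.314462618     # J mol^-1 K^-1
# Approximate magnetogyric ratio of 1H (assumed literature value, not from the text).
GAMMA_1H = 2.675e8             # rad s^-1 T^-1


def larmor_frequency_hz(gamma, flux_density):
    """Frequency absorbed by a nucleus, nu = gamma * B / (2 pi), from Equation 12-53."""
    return gamma * flux_density / (2.0 * math.pi)


def molar_energy_gap(frequency_hz):
    """Energy difference between the two spin states, h * nu, per mole (J mol^-1)."""
    return PLANCK_H * frequency_hz * AVOGADRO


def spin_state_equilibrium_constant(frequency_hz, temperature_k=300.0):
    """Ratio of lower-state to upper-state populations, assuming a Boltzmann distribution."""
    return math.exp(molar_energy_gap(frequency_hz) / (GAS_CONSTANT * temperature_k))


def chemical_shift_ppm(nu_i, nu_std):
    """Chemical shift in ppm, with the sign convention of Equation 12-54 as printed."""
    return (nu_std - nu_i) / nu_std * 1.0e6


if __name__ == "__main__":
    nu = larmor_frequency_hz(GAMMA_1H, 14.1)
    print(f"1H at 14.1 T: {nu / 1e6:.1f} MHz")
    print(f"Delta E at 1000 MHz: {molar_energy_gap(1.0e9):.3f} J/mol")
    print(f"Ksp at 1000 MHz, 300 K: {spin_state_equilibrium_constant(1.0e9):.6f}")
```

At the upper limit of 1000 MHz and at 300 K, the sketch gives an energy difference below 0.5 J mol⁻¹ and a value of Ksp between 1 and 1.0002, in agreement with the limits stated in the text.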
spin–spin coupling constant J also depends on the angles at which two nuclei are held with respect to each other. For example, for two ¹hydrogen nuclei coupled through two bonds, the range of values is 10–15 Hz if the bond angle is close to the sp³ tetrahedral angle of 109°, but the value of the coupling constant falls to 2–3 Hz at the sp² angle of 120°. In three-bond coupling, the value of the spin–spin coupling constant depends on the dihedral angle between the two nuclei along the bond connecting the two neighboring atoms to which they are attached:

[Structure 12–9: nuclei X and A attached to two neighboring atoms]

The coupling constant JAX is at its maximum when the dihedral angle is 0° or 180° and at its minimum when the dihedral angle is 90° or 270°. At these latter angles, the spin–spin coupling constant can be almost zero. The maxima at 0° and 180° for such coupling between the nuclei of two ¹hydrogens is about 10 Hz, and between the nuclei of a ¹hydrogen and a ¹³carbon, about 8 Hz.203

Two populations of nuclei, X and A, can also be coupled by a nuclear Overhauser effect. A nuclear Overhauser effect of the population of nuclei A on the population of nuclei X is any change in the net spin state of the population of nuclei X produced by a change in the net spin state of the population of nuclei A. For example, a nuclear Overhauser effect can be the consequence of either an alteration in the relaxation rate between the two spin states accessible to the members of the population of nuclei X or the consequence of an alteration in the levels of occupation of the two spin states within the population of nuclei X caused by a change in the net spin state of the population of nuclei A. A change in the net spin state of the population of one nucleus produced by a change in the net spin state of the population of another nucleus results from dipolar interactions between the two respective nuclei. A dipolar interaction is a function of, among other things, the distance between the two nuclei, r, and its magnitude is proportional to r⁻³. The change in the spin state of nucleus X by nucleus A caused by a dipolar interaction is a second order perturbation, and hence it is proportional to r⁻⁶. The transfer of energy between two transition dipoles by resonance is also a dipolar interaction and has the same dependence on distance (Equation 12–48). Because of the inverse dependence on the sixth power of the distance, nuclear Overhauser effects are significant only if the nucleus X and the nucleus A in a particular molecule are close to each other.

Because a nuclear Overhauser effect is not transmitted by changes in the static, local magnetic field of nucleus X brought about by a change in the spin state of nucleus A, there is no requirement that the two nuclei be associated by covalent bonds as there is with spin–spin coupling. Nuclear Overhauser effects can indicate that the two nuclei involved are adjacent to each other in the tertiary structure of a protein even though they may be distant from each other in its primary structure. As with the transfer of energy by resonance, however, there are factors other than the distance between the nuclei associated with the dipolar interactions producing nuclear Overhauser effects, causing the intensity of a nuclear Overhauser effect not to be directly proportional to the inverse of the sixth power of this distance.205

Because nuclear Overhauser effects are manifested only as changes in the net spin state of the population of one nucleus under the influence of a change in the net spin state of the population of the other, no change occurs in the chemical shift of either nucleus involved in the nuclear Overhauser coupling as there was with spin–spin coupling. Rather, a nuclear Overhauser effect is registered as a change in the intensity of the absorption for nucleus X. For example, if the net rate of relaxation of the population of nucleus X is increased by the change that has occurred in the net spin state of the population of nuclei A, then the amplitude of the absorption for nucleus X will increase. If the occupation of the two spin states available to the members of the population of nuclei X is caused to become more equal by the change that has occurred in the net spin state of the population of nuclei A, then the amplitude of the absorption for nucleus X will decrease.

In large, relatively rigid macromolecules such as proteins, nuclear Overhauser effects between ¹hydrogen nuclei are usually a consequence of the transfer of saturation.206 Transfer of saturation is the transfer of a portion of the saturation of one population of nuclei, nuclei A, to another population of nuclei, nuclei X. For transfer of saturation to occur, each of the individual nuclei X must be adjacent in space to a nucleus A. Transfer of saturation results from the summation of a large number of individual exchanges of spin state between a nucleus A and a nucleus X. The two adjacent nuclei simultaneously and reciprocally exchange their spin states in opposite directions with essentially zero change in the total energy of the two exchanging nuclei. During each exchange, the spin state of that particular nucleus X becomes what was the spin state of the particular nucleus A and vice versa. As a result of a large number of such exchanges of spin state at the atomic level, a portion of the saturation of the population of nuclei A is transferred to the population of nuclei X. The driving force is the resulting increase in entropy. A sequence of such transfers of saturation among a number of populations of adjacent nuclei can cause the saturation of one population of nuclei to spread outward over populations of nearby nuclei.

Spin diffusion is this outward spread of saturation from the saturated population of nuclei A. For spin diffusion to occur, the populations of the nuclei in the vicinity of the nuclei A must be unsaturated so they are able to assume the saturation transferred from the population of nuclei A. It is the saturation of only the population of nuclei A that permits the diffusive force to be observed, just as the creation of a gradient of concentration permits the diffusive force to be observed.

Spin diffusion by transfer of saturation can be observed in an experiment analogous to the transfer of energy by resonance. The population of nuclei A is irradiated at the radio frequency with which its spins resonate and with sufficient amplitude to saturate its absorption, which equalizes the number of nuclei in its two spin states. The stimulating radiation is then turned off. The population of nuclei A will slowly relax back to its equilibrium distribution by losing the excess energy it has gained. One of the ways the population of nuclei A may relax is by transferring saturation to the population of nuclei X, if within the molecule nucleus X is close to nucleus A. If the absorption of the population of nuclei X is measured after a time, tm, sufficient for some of the saturation in the population of nuclei A to be transferred to the population of the nuclei X, the absorption of the population of nuclei X will have decreased relative to its absorption in the absence of transfer of saturation because the population of nuclei X will have been moved closer to saturation.

An example of a nuclear Overhauser effect that was the consequence of spin diffusion was observed during a spectroscopic study of cytochrome c from Katsuwonus pelamis.207 The heme in cytochrome c, as it is a large aromatic ring (2–4), produces a substantial toroidal magnetic field when its π electrons circulate as a ring current in the presence of the applied magnetic field. The δ2 methyl group of Leucine 68 (Figure 7–9C) resides adjacent to the heme in the region of this local field that is opposed to the applied field, and the chemical shift (–2.7 ppm) of the absorption of its three equivalent ¹hydrogens is even less than that of the reference absorption. This substantial displacement isolates this peak of absorption from the absorptions of the rest of the methyl ¹hydrogens in the protein. When the population of the δ2 methyl ¹hydrogens on Leucine 68 was saturated by preirradiation at the frequency of its chemical shift, the absorptions of four other populations of ¹hydrogens were found to decrease. These were assigned to ¹hydrogens neighboring the δ2 methyl group of Leucine 68 in the crystallographic molecular model of the protein.

The initial nuclear magnetic resonance spectra of proteins were of ¹hydrogen nuclei in molecules dissolved in [²H]H2O. They were one-dimensional spectra of absorption as a function of chemical shift. Even a small protein of 100 amino acids has more than 700 hydrogens in it, most of them unique and most of their absorptions split by spin–spin coupling. It is not surprising, therefore, that such spectra contain, by and large, several broad, unresolved absorptions, each resulting from the overlap of hundreds of individual absorptions.208 The ranges in which these overlapping absorptions occur can be assigned to particular classes of nuclei: those of methyl ¹hydrogens on leucines, isoleucines, valines, alanines, and threonines (δ = 0.9–1.5 ppm); methylene ¹hydrogens (δ = 1.5–3.5 ppm); α ¹hydrogens on each amino acid (δ = 3.5–5.5 ppm); the ¹hydrogens on the peripheries of the aromatic rings of tryptophans, phenylalanines, histidines, and tyrosines (δ = 6.4–7.4 ppm); and the unexchanged amido ¹hydrogens of the peptide bonds and glutamines and asparagines (δ = 7.0–9.0 ppm).

The central difficulty in nuclear magnetic resonance spectroscopy of even a small protein is that in a one-dimensional spectrum of absorption as a function of chemical shift, regardless of the nucleus chosen, the peaks of absorption from the individual nuclei overlap and cannot be distinguished from each other, let alone assigned. It has been possible, however, to dissect the nuclear magnetic resonance spectra of small molecules of protein209,210 into their individual components by using the two-dimensional spectroscopy that has been developed by Ernst and his colleagues211,212 and the three-dimensional spectroscopy that has been developed by Bax and his colleagues (Table 12–3).

All of the techniques of multidimensional nuclear magnetic resonance spectroscopy rely upon the technique of frequency labeling, which is a direct elaboration of Fourier transform nuclear magnetic spectroscopy. To label the absorption of a population of nuclei i with its Larmor frequency, two successive 90° pulses are used. Following the first 90° pulse applied in the x direction, the net magnetization of the population of nuclei i is aligned with the y axis and then begins to precess in the xy plane around the z axis at its Larmor frequency. After a period of time t1, a second 90° pulse in the x direction is applied. The second pulse is in phase at the carrier frequency ν0 with the first pulse to ensure that

\[ \nu_0 t_1 = n \qquad \text{(12–55)} \]

where n is an integer. In this way, the carrier frequency acts as an internal clock. This second 90° pulse diverts only the y component of the precessing net magnetization at that instant into the z direction but leaves the x component at that instant in the xy plane. Consequently, the amplitude of the remaining net magnetization in the xy plane after the second 90° pulse is equal to Mi sin(2πνi t1) where Mi is the amplitude of the net magnetization from the population of nuclei i before the second pulse and νi is its Larmor frequency. Because

\[ M_i \sin(2\pi\nu_i t_1) = M_i \sin 2\pi(\nu_i t_1 - n) = M_i \sin 2\pi(\nu_i - \nu_0)t_1 \qquad \text{(12–56)} \]
Table 12–3: Couplings Giving Rise to a Peak in a Three-Dimensional or Four-Dimensional Nuclear Magnetic Resonance Spectrum

[Diagrams of the polypeptide backbone, with boxes enclosing the coupled atoms for each experiment, are not reproduced here.]

Experiment        Footnote   References
HNCO                         213
HNCA                         213
HCACO                        213
HNCACO                       214
HNCOCA                       214
CBCANH            b          215
HNCACB            b          216
CBCACO(CA)HA      b          217
HCCH              c          218,219
HNHA              d          213,220,221
HACAHB                       222,223
HNHB                         222
HCA(CO)N          e          213
HCA(CO)NH         e          224
HN(CO)CA          e          225,226
CBCA(CO)NH        e          227
HN(CO)HAHB        f          228
HBHA(CO)NH        f          229
HCB(CGCD)HD                  230
HCB(CGCDCE)HE                230

a Boxes enclose the three or four coupled atoms that both produce the peak or peaks and determine the three or four chemical shifts, one for each of the three or four atoms, in the three or four respective dimensions.
b The three-dimensional peaks from the α ¹³carbon and the β ¹³carbon are separate peaks on the same field, each coupled respectively to the same amido ¹⁵nitrogen and amido ¹hydrogen or to the same combination of acyl ¹³carbon and α ¹hydrogen, respectively, in the other two dimensions.
c The three dimensions are ¹³carbon, ¹³carbon, and ¹hydrogen. The ¹hydrogens coupled to one or the other of the two ¹³carbons appear on the three-dimensional field at the chemical shifts of the other ¹³carbon and their own ¹³carbon.
d The coupling between the amido ¹hydrogen and the α ¹hydrogen is the usual strong three-bond J coupling between adjacent ¹hydrogens. This coupling can be relayed through the α ¹³carbon or can be enhanced by using HOHAHA.
e The coupling between the α ¹³carbon and amido ¹⁵nitrogen is relayed through the acyl ¹³carbon.
f The three-dimensional peaks from the α ¹hydrogen and the β ¹hydrogens are separate peaks each coupled respectively to the same amido ¹⁵nitrogen and amido ¹hydrogen through the acyl ¹³carbon.
after the second 90° pulse, the amplitude of the net magnetization of the population of nuclei i precessing at its Larmor frequency in the xy plane has become a function of the length of the interval t1 between the pulses. Furthermore, as t1 is varied, this amplitude will vary harmonically with a frequency νi – ν0, which is directly proportional to the chemical shift (Equation 12–54) of the population of nuclei i if the applied magnetic flux density Bapp is such that the carrier frequency ν0 is equal to the frequency at which the standard nuclei absorb, νstd.

Immediately following the second 90° pulse, the free induction decay of the excited sample is gathered in the usual way. The Fourier transform of the output of the radio receiver produces a nuclear magnetic spectrum. The amplitude of each peak in the spectrum, however, because it was derived only from the net magnetization remaining in the xy plane, has become a harmonic function of t1 with a frequency νi – ν0. If spectra are gathered at systematically increasing values of t1, the amplitude of each peak will vary with a frequency proportional to its chemical shift (Figure 12–15).203 Each peak has become labeled with its own Larmor frequency, and this labeling is manifested in the amplitude modulation of its peak in the spectrum with a frequency equal to the difference between its Larmor frequency and the carrier frequency. In a spectrometer with a carrier signal of 600 MHz, the values of νi – ν0 for ¹hydrogens are less than 6000 Hz, and in a spectrometer with a carrier signal of 150 MHz, the values of νi – ν0 for ¹³carbon are less than 15,000 Hz, so the systematically increasing lengths of the intervals t1 must be in the range of tens of microseconds to milliseconds to produce reliable modulations of the amplitudes.

A simple two-dimensional spectrum is an extension of this procedure of frequency labeling. A series of free induction decays from the sample are gathered at systematically increasing intervals t1. The time dimension of each free induction decay is designated as t2. The complete set of this series of free induction decays defines a function that is two-dimensional in time, f(t1,t2). The information in the first dimension of this function is encoded in the modulations of the amplitudes (AM) of the signals from the individual populations of nuclei, and the information in the second dimension is encoded in the modulations of the frequency (FM) of the free induction decays. A two-dimensional Fourier transform of this function extracts the frequencies of these modulations in the two dimensions. The two-dimensional Fourier transform of the function f(t1,t2) is a two-dimensional function in frequency, which when divided by the carrier frequency (Equation 12–54) is a two-dimensional function in chemical shift, f(δ1,δ2).

If none of the populations of nuclei in the sample is spin–spin-coupled to any other, the amplitude of the signal from each population of nuclei is modulated only by its own Larmor frequency, and f(δ1,δ2) has peaks only when δ1 = δ2. In such a case, the diagonal of the two-dimensional spectrum replicates the one-dimensional nuclear magnetic resonance spectrum, and nothing has been gained. If, however, one population of nuclei is spin–spin-coupled to another population of nuclei, the modulations of the amplitudes of their precessions are transferred between themselves during the second 90° pulse, and each of their precessions becomes labeled not only with its own Larmor frequency but also with the Larmor frequency of the other population of nuclei to which it is spin–spin-coupled.

The Fourier transform picks out these coupled frequencies, and on the two-dimensional field, in addition to the one-dimensional spectrum along the diagonal, there are off-diagonal cross-peaks. Each of these cross-peaks is located on the field at a chemical shift δ1 of one population of nuclei and a chemical shift δ2 of another population of nuclei to which the first population is spin–spin-coupled. Because spin–spin coupling is fully reciprocal, these cross-peaks are distributed symmetrically about the diagonal of the two-dimensional field. Correlated spectroscopy (COSY) is the technique that has just been described. A two-dimensional correlated spectrum is a two-dimensional spectrum in which the off-diagonal cross-peaks arise from spin–spin couplings between different populations of nuclei and identify populations of spin–spin-coupled nuclei by their chemical shifts. It is these off-diagonal cross-peaks that pull out individual absorptions from the one-dimensional spectrum, spread them into two dimensions, and permit them to be observed individually.

Figure 12–15: Amplitude modulation of the absorption of a nucleus produced during correlated spectroscopy.203 The absorption of a particular population of identical nuclei produces a peak in the spectrum of absorption as a function of the chemical shift δ2 after f(t1, t2) has been submitted to Fourier transformation only in the second dimension. Each trace is this spectrum of absorption as a function of the chemical shift δ2 at a different t1. The amplitude of the peak of absorbance records the harmonic precession of the nuclear spin relative to the carrier frequency during the interval t1. Reprinted with permission from ref 203. Copyright 1995 John Wiley and Sons Ltd.
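
The frequency labeling described above can be illustrated numerically. The following is a minimal sketch in Python, assuming a single uncoupled population of nuclei, no relaxation, and a hypothetical offset of 1200 Hz from a 600 MHz carrier. The function names and the sampling parameters are illustrative choices, not taken from any spectrometer software. The amplitude remaining in the xy plane is computed from Equation 12–56 at systematically increasing intervals t1, and a plain discrete Fourier transform over t1 then recovers the modulation frequency νi – ν0.

```python
import math


def labeled_amplitude(m_i, nu_i, nu_0, t1):
    """Amplitude left in the xy plane after the second 90-degree pulse (Equation 12-56)."""
    return m_i * math.sin(2.0 * math.pi * (nu_i - nu_0) * t1)


def dominant_frequency(samples, dt):
    """Frequency (Hz) of the strongest discrete Fourier component below the Nyquist limit."""
    n = len(samples)
    best_k, best_power = 0, -1.0
    for k in range(1, n // 2):
        re = sum(s * math.cos(2.0 * math.pi * k * j / n) for j, s in enumerate(samples))
        im = sum(s * math.sin(2.0 * math.pi * k * j / n) for j, s in enumerate(samples))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    return best_k / (n * dt)


if __name__ == "__main__":
    nu_0 = 600.0e6            # carrier frequency (Hz)
    nu_i = nu_0 + 1200.0      # hypothetical Larmor frequency of the population i (Hz)
    dt = 1.0e-4               # increment of t1 (s), within the range stated in the text
    n_points = 250            # number of t1 increments

    amplitudes = [labeled_amplitude(1.0, nu_i, nu_0, j * dt) for j in range(n_points)]
    offset_hz = dominant_frequency(amplitudes, dt)
    print(f"recovered offset: {offset_hz:.1f} Hz")
    print(f"offset relative to carrier: {offset_hz / nu_0 * 1.0e6:.2f} ppm")
```

The increment of t1 sets the largest offset that can be labeled unambiguously, 1/(2 dt), which is why offsets of several thousand hertz call for increments of the order of 100 microseconds or less.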
A two-dimensional correlated spectrum (Figure 12–16)210 is a presentation of absorption as a function of two values of chemical shift (in parts per million), δ1 and δ2. Each off-diagonal cross-peak in the spectrum has the same value of the chemical shift δ2 as a peak buried in the one-dimensional spectrum and the same value of the chemical shift δ1 as another peak buried elsewhere in the one-dimensional spectrum but connected to the first by spin–spin coupling. The result is that two individual absorptions unresolved in the one-dimensional spectrum are simultaneously drawn out of it and placed in isolation from all of the other absorptions otherwise overlapping them. This provides the resolution. The information provided by an off-diagonal cross-peak is that the two nuclei responsible for these two now isolated absorptions are connected through covalent bonds that mediate spin–spin coupling.

The off-diagonal region displayed in Figure 12–16 has a range for δ2 (6.5–10.6 ppm) that includes the chemical shifts for the amido ¹hydrogens of the polypeptide backbone and a range for δ1 (1.7–6 ppm) that includes the chemical shifts of the ¹hydrogens on the α carbons of the amino acids in the protein. The diagonal, one-dimensional spectrum is just beyond the lower right hand corner of the figure. Each cross-peak within the panel arises from the spin–spin coupling between the amido ¹hydrogen of one of the amino acids in the protein and its own α ¹hydrogen. Each cross-peak has pulled the absorption of each amido ¹hydrogen and the absorption of its adjacent α ¹hydrogen out of the unresolved one-dimensional spectrum so that they can be individually observed. Each cross-peak also assigns numerical values for the individual chemical shifts (δ1 and δ2) of the two nuclei of each of these pairs of spin–spin-coupled ¹hydrogens and states that the two ¹hydrogens with these two chemical shifts are connected to each other by three covalent bonds. This region of a two-dimensional (¹H–¹H) correlated spectrum is a fingerprint for the protein because almost every amino acid in its sequence is represented by a single cross-peak,* and the distribution of the cross-peaks on the field is unique to that protein.

* Glycines, because they have two diastereotopic α ¹hydrogens, usually produce two cross-peaks of the same chemical shift in the amido ¹hydrogen dimension, and prolines, because they have no amido ¹hydrogens, produce none.
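
The rule that governs the fingerprint region can be stated as a simple count. The following minimal sketch in Python tallies the number of cross-peaks expected in the fingerprint of a sequence written in the one-letter code, assuming only what is stated above: one cross-peak for each amino acid, two for each glycine, and none for each proline. The function name and the example sequence are hypothetical, and the count ignores practical losses such as overlapping cross-peaks or amido ¹hydrogens that have exchanged with solvent.

```python
# Cross-peaks contributed to the fingerprint by the exceptional amino acids.
# Every other amino acid is assumed to contribute exactly one cross-peak.
EXCEPTIONS = {
    "G": 2,  # glycine: two alpha hydrogens coupled to the same amido hydrogen
    "P": 0,  # proline: produces no cross-peak in the fingerprint
}


def expected_fingerprint_cross_peaks(sequence):
    """Count the cross-peaks expected in the amido/alpha fingerprint region."""
    return sum(EXCEPTIONS.get(residue, 1) for residue in sequence.upper())


if __name__ == "__main__":
    example = "MGPAK"  # hypothetical pentapeptide, not a real protein
    print(expected_fingerprint_cross_peaks(example))
```

Comparing such a count with the number of cross-peaks actually resolved gives a quick indication of how complete the fingerprint of a protein is before assignments are attempted.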
Many improvements have been made to the original correlated spectrum. There are many sophisticated and intricate elaborations of the sequence of the pulses of oriented radiowaves that narrow the cross-peaks, eliminate background, and enhance dramatically the signals from weakly absorbing nuclei such as ¹³carbon and ¹⁵nitrogen. Each of these elaborations is identified by its own acronym (Table 12–4). Because the nucleus of ¹²carbon has no magnetic moment and the nucleus of ¹⁴nitrogen has a spin quantum number of 1, the proteins examined are now almost always modified so that all of their nitrogens are ¹⁵nitrogen and all of their carbons are ¹³carbon by expressing them in bacteria grown on [¹⁵N]NH4+ as their sole source of nitrogen and on a [¹³C]nutrient as their sole source of carbon. In this way, cross-peaks from these enriched proteins produced by heteronuclear spin–spin coupling can be observed. For example, the heteronuclear spin–spin coupling between an amido ¹⁵nitrogen and its own amido ¹hydrogen, enhanced by an HSQC pulse sequence (Figure 12–17),251 provides an alternative fingerprint of the protein in which every amino acid is also represented. Two-dimensional correlated spectroscopy has been expanded to three dimensions (Table 12–3). The cross-peaks in these spectra (Figure 12–18)213 are produced by spin–spin coupling among three nuclei, usually of two or three different elements. By expanding the dimensions, these procedures are able to resolve absorptions that overlap in two dimensions just as two dimensions separate absorptions that overlap in one dimension. Each cross-peak in such spectra assigns chemical shifts simultaneously to three or four individual nuclei.

Each of the cross-peaks in the two-dimensional (¹H–¹H) correlated spectrum of basic pancreatic trypsin inhibitor (Figure 12–16) and the two-dimensional (¹⁵N–¹H) HSQC correlated spectrum of dihydrofolate reductase (Figure 12–17) has been labeled with the position in the sequence of the protein of the amino acid containing the two nuclei that produced it. The spectra themselves do not come with labels, and each of these assignments has been performed by tracing connections among cross-peaks that affiliate nuclei in the polypeptide backbone through spin–spin coupling and by assigning the type of each amino acid from the pattern of
Table 12–4: Methods for Improving Cross-Peaks in Two-Dimensional and Three-Dimensional Nuclear Magnetic Resonance Spectra
HMQC231,232 (heteronuclear multiple-quantum coherence), HSQC233 (heteronuclear single-quantum coherence), HSMQC234 (heteronuclear single–multiple-quantum coherence): increases amplitude of cross-peaks arising from coupling involving nuclei with small magnetogyric ratios such as 13carbon and 15nitrogen (Table 12–2)
HOHAHA238–240 (homonuclear Hartmann–Hahn), TOCSY238,241 (total correlation spectroscopy; same method as HOHAHA): increases amplitude of cross-peaks and extends, by relaying coherence, the number of bonds through which coupling can produce a cross-peak
HMBC242,243 (heteronuclear multiple bond correlation): extends the number of bonds through which coupling can produce a cross-peak
PS244 (pseudo single quantum): narrows the line widths of the cross-peaks
random fractional deuteration245,246 (replacement of 50% of the 1hydrogens in a protein with 2hydrogens at random): narrows the widths of the cross peaks in correlated spectroscopy and increases the amplitude and the number of the detectable nuclear Overhauser effects from larger proteins
TROSY247 (transverse relaxation-optimized spectroscopy): narrows the line widths of the cross-peaks
CRINEPT248 (cross relaxation insensitive nuclei enhanced polarization transfer): increases amplitude of cross-peaks arising from coupling involving nuclei with small magnetogyric ratios such as 13carbon and 15nitrogen
SBC249,250 (single bond correlation): permits spin–spin coupling between 13carbon and 15nitrogen to be used for two-dimensional spectrum
[Figure 12–17: spectrum not reproduced; horizontal axis, 1hydrogen chemical shift (ppm). Surviving part of the caption:] by horizontal lines) are those arising from the spin–spin coupling of the two respective distinct 1hydrogens on each amido nitrogen in the primary amides of the glutamines and asparagines in the protein. They are also labeled by the amino acid to which they have been assigned. Reprinted with permission from ref 251. Copyright 1992 American Chemical Society.
affiliations among cross-peaks that trace connections out into each side chain.
Connections among nuclei along the polypeptide backbone are usually traced252–255 in a systematic sequence of sections through three-dimensional correlated spectra (Figure 12–18). The large set of three-dimensional spectra available for tracing the connections among nuclei (Table 12–3) is redundant, so that when connections are obscured by the overlap of peaks or when a particular cross-peak is missing (for example, Lysine 46 in Figure 12–16), the trace can take an alternative path. The result of such a trace is that all of the nuclei (1hydrogens, 13carbons, and 15nitrogens) in long segments of polypeptide backbone are consecutively connected to each other and each individually assigned a chemical shift.
The respective positions of these segments of connected nuclei in the amino acid sequence of the protein are then established by tracing connections from the nuclei out into each side chain (Figures 12–19 through 12–22). Each of these paths of connections out into a side chain usually starts at one of the cross-peaks in a fingerprint of the protein. For example, the relayed spin–spin coupling among the 1hydrogens of each side chain registered in a two-dimensional (1H–1H) TOCSY spectrum (Figure 12–19)256 begins at the cross-peak between the amido 1hydrogen and the α 1hydrogen (Figure 12–16). The coupling between the α 13carbon and the β 13carbon of each side chain registered in a three-dimensional (13C–15N–1H) CBCA(CO)NH correlated spectrum (Figure 12–20)257 begins at the cross-peak between the amido 15nitrogen and the amido 1hydrogen (Figure 12–17) of the next amino acid in the sequence (Table 12–3). Connections among 1hydrogens of a side chain (Figure 12–19) or 13carbons of a side chain (Figure 12–20) can be extended to their own 13carbons or 1hydrogens, respectively, with two-dimensional (1H–13C) HSQC correlated spectra (Figure 12–21).258 When two-dimensional spectra (Figure 12–19) become too crowded to trace connections, they can be expanded in a third dimension (Figure 12–22)259 to resolve the individual cross-peaks.
Each side chain has a characteristic pattern of connections among nuclei of characteristic chemical shifts that identifies it in the spectrum. For example, glutamate has two β 1hydrogens that have smaller chemical shifts than its two γ 1hydrogens (Figure 12–19). Isoleucine has
1hydrogens on a δ methyl group and a γ methyl group as well as two 1hydrogens on a γ methylene, all in the aliphatic range of chemical shifts (Figures 12–19 and 12–22). Lysine has 1hydrogens on a β methylene, a γ methylene, and a δ methylene in the aliphatic range but two 1hydrogens on an ε methylene with chemical shifts around 3 ppm (Figures 12–19 and 12–22). Threonine and serine have β 1hydrogens with larger chemical shifts (around 4 ppm), and threonine has γ 1hydrogens with chemical shifts characteristic of a methyl group (Figures 12–19 and 12–22). From these patterns, from the sequence of the connections along the backbone (Figure 12–18), and from the amino acid sequence of the protein, it is usually possible to identify the long, unbroken segments of connections among the atoms of the backbone that run through the spectra with segments in the amino acid sequence of the protein and thereby assign the cross-peaks to specific positions in the amino acid sequence, much as the pattern of protrusions from the polypeptide backbone in a map of electron density allows segments of the amino acid sequence to be identified.
The final results of this process are that each cross-peak on the various two- and three-dimensional correlated spectra has been assigned to the two or three nuclei in the amino acid sequence of the protein that produce it and that the nucleus of almost every 1hydrogen, 13carbon, and 15nitrogen in the protein has been assigned a specific chemical shift. By themselves these assignments are not very informative. They are, however, an indispensable prelude to using nuclear magnetic resonance to provide insight into the dynamics of a protein, to produce its molecular model, to measure the acid dissociation constants of its side chains, and to follow the rates of exchange of its protons with protons in the solution.
When the spin state of a particular population of chemically identical nuclei is saturated by the absorption of electromagnetic energy at its Larmor frequency and then allowed to relax back to its equilibrium distribution, the rate of its relaxation contains information about the dynamics of the structure in which each of these nuclei is contained. For example, the relaxation of a particular population of identical nuclei of amido 15nitrogens, each at the same position in the polypeptide backbones of the identical molecules of a protein in a solution, is dominated by the dipolar interactions between the nuclei of the 15nitrogens in that population and the nuclei of the directly attached 1hydrogens on each of the individual amido nitrogens. The rate at which the bonds between the 15nitrogens and the 1hydrogens in those two particular populations reorient relative to the magnetic field determines the rate of the relaxation of the population of 15nitrogen nuclei. From an analysis of this rate of relaxation, information about the rate of this reorientation can be extracted.260,261 This information is expressed as an order parameter S2, which assumes a value of 1 when the 15nitrogen–1hydrogen bond is fixed rigidly in the protein so that it reorients at the same rate at which the entire molecule of protein reorients by its normal rotational and translational diffusion and which assumes a value of 0 when it is so loosely attached to the protein that it reorients completely independently of the reorientation of the molecule of protein. Therefore, the order parameter S2 is a measure of the flexibility of a particular position within a molecule of the protein.
Two-dimensional spectra of proteins in which each amido 15nitrogen in the polypeptide backbone has been assigned its position in the amino acid sequence can be used to measure relaxation rates of these individual 15nitrogens and calculate values of the order parameter S2 for each. Most of the values of the order parameter S2 for these 15nitrogens are near 1 (≥0.8) because most of the polypeptide backbone of a molecule of protein is rigidly fixed within the tertiary structure, but there are flexible segments that can be identified by the smaller values (0.4–0.6) of the order parameter S2 of the amido 15nitrogens they contain.262,263 These segments are often flexible loops on the surface of the molecule of protein and correspond to segments in a crystallographic map of electron density that are so flexible that they do not appear in the map or to segments the atoms of which have high B-factors.263 High values of these B-factors indicate that the segment is also flexible in the crystal. The informative exceptions are those in which the order parameters are low but the segment appears to be rigid in the crystallographic molecular model and its constituent atoms have low B-factors. These exceptions indicate that the crystal packing has confined an otherwise flexible segment of polypeptide.
Order parameters can also be obtained for bonds between 15nitrogens and 1hydrogens in side chains. For example, the order parameters for bonds between 15nitrogens and 1hydrogens in the side chains of tryptophans buried in the core of the protein are usually greater than 0.8, but those for the bonds between 15nitrogens and 1hydrogens in the side chains of arginines on the surface of a protein can be as small as 0.05.263 Unfortunately, there is no simple correlation between values of the order parameter S2 and the rates at which the flexible segments or two side chains are fluctuating relative to the entire molecule, and motions slower than the rotational diffusion of the entire molecule of protein do not register in the order parameter.
In the rare instances in which a flexible segment of the folded polypeptide assumes only two significantly occupied conformations of about equal occupancy, each nucleus in the flexible segment will have a different chemical shift in each conformation. If these chemical shifts are different enough, each pair of nuclei that is spin–spin-coupled will produce two cross-peaks in a two-dimensional spectrum, each with the chemical shifts of the respective nuclei in the two respective conformations.264 In such cases, it is possible to obtain a rate
[Figure 12–18, panels A–H: spectra not reproduced. The sections are labeled HNCO, HNCA, HNHA, HCACO, and HCA(CO)N; the axes are chemical shifts (ppm) of 13carbon and 15nitrogen.]
Figure 12–18: Sequential assignment of the chemical shifts for the 1hydrogens, 13carbons, and 15nitrogens in a polypeptide backbone by use of three-dimensional nuclear magnetic resonance spectra.213 The complementary DNA encoding calmodulin from D. melanogaster (naa = 148) was expressed in E. coli grown on [15N]NH4Cl and [13C6]glucose as sole nitrogen and carbon sources, so that the protein was uniformly and completely (>95%) labeled with 15nitrogen and 13carbon. Spectra were recorded from a 1.5 mM solution of calmodulin in a 93:7 mixture of [1H]H2O to [2H]H2O at pH 6.3 and 47 °C. Individual panels (A–H) are two-dimensional sections through a series of three-dimensional nuclear magnetic resonance spectra (Table 12–3) of the protein. In panels A, B, and C, the sections through the respective three-dimensional spectra are only wide enough to contain cross-peaks from 15nitrogens the chemical shifts of which are 124.4 ± 0.2 ppm, which includes the chemical shift of the amido 15nitrogen of Lysine 21. Each of these three sections has as its vertical axis the chemical shift of 1hydrogen between 7.4 and 8.5 ppm, the region covering the chemical shifts of 1hydrogens on amido nitrogens. (A) Section containing the cross-peak produced by the spin–spin coupling connecting the amido 1hydrogen of Lysine 21, the amido 15nitrogen of Lysine 21, and the acyl 13carbon of Aspartate 20. This section has as its horizontal axis the chemical shift for 13carbon between 172 and 180 ppm, the region covering the chemical shifts of acyl 13carbons. This cross-peak is located by the chemical shift of the acyl 13carbon of Aspartate 20 (177.3 ppm) and assigns the chemical shift of the amido 1hydrogen of Lysine 21 as 7.66 ppm. (B) Section containing the cross-peak produced by the spin–spin coupling connecting the amido 1hydrogen, the amido 15nitrogen, and the α 13carbon of Lysine 21. The section has as its horizontal axis the chemical shift for 13carbon between 47 and 64 ppm, the region covering the chemical shifts of the α 13carbons. The position of this cross-peak, located by the value for the chemical shift of its amido 1hydrogen (horizontal line), assigns the chemical shift of the α 13carbon of Lysine 21 as 58.5 ppm. (C) Section containing the cross-peak produced by the spin–spin coupling connecting the amido 1hydrogen, the amido 15nitrogen, and the α 1hydrogen of Lysine 21. This section has as its horizontal axis the chemical shift for 1hydrogen between 3.7 and 4.9 ppm, the region covering the chemical shifts of α 1hydrogens. The position of this cross-peak, located by the value of the chemical shift of its amido 1hydrogen (horizontal line), assigns the chemical shift of the α 1hydrogen of Lysine 21 as 4.01 ppm. In panels D and E, the sections through the respective three-dimensional spectra are only wide enough to contain cross-peaks from 13carbons the chemical shifts of which are 58.3 ± 0.3 ppm, which includes the chemical shift of the α 13carbon of Lysine 21 (assigned in panel B). Each of these two sections has as its horizontal axis the chemical shift for 1hydrogen between 3.7 and 4.9 ppm, the region covering the chemical shifts of α 1hydrogens. (D) Section containing the cross-peak produced by the spin–spin coupling connecting the α 1hydrogen, the α 13carbon, and the acyl 13carbon of Lysine 21. This section has as its vertical axis the chemical shift for 13carbon between 174 and 181 ppm, the region covering the chemical shifts of acyl 13carbons. The position of this cross-peak, located by the value of the chemical shift of its α 1hydrogen (vertical line), assigns the chemical shift of the acyl 13carbon of Lysine 21 as 178.3 ppm. (E) Section containing the cross-peak produced by the spin–spin coupling connecting the α 1hydrogen of Lysine 21, the α 13carbon of Lysine 21, and the amido 15nitrogen of Aspartate 22. This section has as its vertical axis the chemical shift for 15nitrogen between 111 and 125 ppm, the region covering the chemical shifts of amido 15nitrogens. The position of the cross-peak for Lysine 21, located by the value of the chemical shift of its α 1hydrogen (vertical line), assigns the chemical shift of the amido 15nitrogen of Aspartate 22 as 114.0 ppm. (F–H) Sections through three-dimensional spectra containing cross-peaks for Aspartate 22 corresponding respectively to the sections in panels A–C that contain cross-peaks for Lysine 21. The sections in panels F, G, and H are only wide enough to contain cross-peaks from 15nitrogens, the chemical shifts of which are 114.1 ± 0.2 ppm, which includes the chemical shift of the amido 15nitrogen of Aspartate 22 assigned in panel E. The value of the chemical shift of the amido 15nitrogen of Lysine 21 (124.4 ppm) that was used to set the position of the slabs in panels A, B, and C was assigned with a section corresponding to panel E but with the section for the sequential assignment of the chemical shifts of the nuclei in Aspartate 20 rather than Lysine 21. The cross-peak in panel F produced by spin–spin couplings connecting the amido 1hydrogen of Aspartate 22, the amido 15nitrogen of Aspartate 22, and the acyl 13carbon of Lysine 21 was located with the chemical shift of the acyl 13carbon of Lysine 21 assigned in panel D. The position of the cross-peak from Lysine 21 in panel A was located with the chemical shift of the acyl 13carbon of Aspartate 20 assigned in a section for the sequential assignment of the nuclei in Aspartate 20 corresponding to that in panel D for Lysine 21. Cross-peaks from Leucine 116 appear in panels A, B, and C because the chemical shift of its amido 15nitrogen is 124.2 ppm. Cross-peaks from Leucine 32, Arginine 74, Glutamate 82, Leucine 105, Aspartate 118, and Glutamate 127 appear in panels D and E because the chemical shifts of their α 13carbons are 58.2, 58.1, 58.2, 58.5, 58.5, and 58.5 ppm, respectively. Cross-peaks from Aspartate 58, Threonine 79, Aspartate 95, and Threonine 110 appear in panels F, G, and H because the chemical shifts of their amido 15nitrogens are 113.9, 114.0, 114.2, and 114.4 ppm, respectively. Reprinted with permission from ref 213. Copyright 1990 American Chemical Society.
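The bookkeeping behind these sections can be illustrated with a short Python sketch. It is not part of the published analysis; it only shows, under stated assumptions, how a slab of fixed width selects cross-peaks by chemical shift and how the value assigned in panel E places Aspartate 22 in the slab used for panels F through H. The names in_slab and residues_in_slab and the tolerance EPS are hypothetical, and the chemical shifts are a selection of those quoted in the caption.

```python
EPS = 1e-9  # hypothetical guard against floating-point round-off


def in_slab(shift_ppm, center_ppm, half_width_ppm):
    """Return True when a chemical shift lies within center +/- half_width."""
    return abs(shift_ppm - center_ppm) <= half_width_ppm + EPS


def residues_in_slab(shifts, center_ppm, half_width_ppm):
    """Names of the amino acids whose chemical shift falls inside the slab."""
    return sorted(
        name for name, ppm in shifts.items()
        if in_slab(ppm, center_ppm, half_width_ppm)
    )


# Alpha 13carbon chemical shifts (ppm) quoted in the caption of Figure 12-18.
ALPHA_CARBON_PPM = {
    "K21": 58.5, "L32": 58.2, "R74": 58.1, "E82": 58.2,
    "L105": 58.5, "D118": 58.5, "E127": 58.5,
}

# Amido 15nitrogen chemical shifts (ppm) quoted in the same caption.
AMIDO_NITROGEN_PPM = {
    "K21": 124.4, "L116": 124.2, "D22": 114.0,
    "D58": 113.9, "T79": 114.0, "D95": 114.2,
}

# Panels A-C: 15nitrogen slab at 124.4 +/- 0.2 ppm.
panel_abc = residues_in_slab(AMIDO_NITROGEN_PPM, 124.4, 0.2)
# Panels D and E: 13carbon slab at 58.3 +/- 0.3 ppm.
panel_de = residues_in_slab(ALPHA_CARBON_PPM, 58.3, 0.3)
# Panels F-H: 15nitrogen slab at 114.1 +/- 0.2 ppm, positioned with the
# value of 114.0 ppm assigned to Aspartate 22 in panel E.
panel_fgh = residues_in_slab(AMIDO_NITROGEN_PPM, 114.1, 0.2)

print(panel_abc)
print(panel_de)
print(panel_fgh)
```

With these values the 13carbon slab collects all seven amino acids listed for panels D and E, which is why the chemical shift of a second nucleus is needed to single out the cross-peak of Lysine 21.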
constant for the exchange of the flexible segment between the two conformations. For example, a loop between Alanine 9 and Leucine 24 in dihydrofolate reductase from E. coli exchanges between its two conformations265 at a rate of 35 s–1. The heterologous association between two molecules of protein also causes changes in the chemical shifts of nuclei that end up in the interface. These changes produce pairs of cross-peaks, one from the unassociated protein and one from the associated protein. These changes in chemical shift identify the amino acids involved in the interface,266 and rates of exchange of the participants between free and bound states can be calculated from such spectra.267
A nuclear magnetic resonance molecular model of a protein is produced from a list of the individual nuclear Overhauser effects between its 1hydrogens. The thousands of nuclear Overhauser effects that occur between pairs of the thousands of unique 1hydrogens in a protein are resolved from each other by using two-dimensional and three-dimensional nuclear Overhauser enhanced spectroscopy (NOESY) just as the individual absorptions of each of the thousands of nuclei in a protein are resolved by using two-dimensional and three-dimensional correlated spectroscopy.
A two-dimensional nuclear Overhauser enhanced spectrum is a two-dimensional spectrum in which the off-diagonal cross-peaks arise from nuclear Overhauser effects between two different populations of 1hydrogen nuclei and identify by their chemical shifts those pairs of 1hydrogen nuclei that are connected by those respective nuclear Overhauser effects. A two-dimensional nuclear Overhauser enhanced spectrum (Figure 12–23)268,269 is produced in the same way as a two-dimensional correlated spectrum, except that after the second 90° pulse has labeled the precession of the net magnetization of each population of nuclei with its Larmor frequency by
modulating the amplitude of its precession in the xy plane (Figure 12–15), there is a fixed delay or time of mixing, tm, of 50 ms to several hundred milliseconds to allow the saturation of each population of nuclei, which has been labeled with its own characteristic Larmor frequency, to diffuse outward, mixing with the spin states of nuclei in its vicinity. After this fixed delay, a third 90° pulse initiates the collection of the free induction decay from the sample during t2.
In the two-dimensional spectrum that results, there is a cross-peak produced by the transfer of some of the saturation from the population of nuclei A, which has been labeled with its own Larmor frequency, to the population of nuclei X, each of which is adjacent to a nucleus A in a molecule of the protein. As a result, the [population of nuclei X is labeled with the Larmor] frequency of the population of nuclei A as well as its own Larmor frequency, and in the two-dimensional nuclear Overhauser enhanced spectrum that results, there is an off-diagonal cross-peak at chemical shift δ2 of nucleus X and chemical shift δ1 of nucleus A. Transfer of saturation, however, is reciprocal, and the saturation reciprocally and coincidentally transferred from the population of nuclei X to the population of nuclei A, which is labeled with the Larmor frequency of nuclei X, produces an off-diagonal peak at chemical shift δ2 of nucleus A and chemical shift δ1 of nucleus X. Each of these two symmetrically displayed peaks connects nucleus A and nucleus X by a nuclear Overhauser effect and identifies the two nuclei connected by their chemical shifts. Each of the hundreds of symmetrically displayed peaks in a nuclear Overhauser enhanced spectrum (Figure 12–23) makes its own respective connection between two 1hydrogens, each identified by its chemical shift.
A specific example illustrates the spin diffusion resulting from these individual transfers of saturation. A cross section through a two-dimensional nuclear Overhauser enhanced spectrum of bovine acrosin inhibitor IIA, at the chemical shift δ1 equivalent to the Larmor frequency of the amido 1hydrogen of Alanine 37 (8.45 ppm), contains cross-peaks at the chemical shifts δ2 of the amido 1hydrogens of Asparagine 34, Cystine 36, and Phenylalanine 38; of the α 1hydrogens of Cystine 36 and Alanine 37; and of the β 1hydrogens of Cystine 36 and Alanine 37 (Figure 12–24)270 because the saturation transferred to each of these populations of 1hydrogen nuclei retains the amplitude modulation, labeling it with the Larmor frequency of the population of the nuclei of the amido 1hydrogens of Alanine 37. Consequently, in the dimension of chemical shift δ1 cross-peaks are located at the chemical shifts of these other populations
[Figure 12–21: spectrum not reproduced; horizontal axis, 1hydrogen chemical shift (ppm). Surviving part of the caption:] spectrum presented covers the range of chemical shift for 1hydrogen (horizontal axis) and 13carbon (vertical axis) of the methyl groups of threonine, valine, leucine, and isoleucine. Each cross-peak is produced by the spin–spin coupling between a methyl 13carbon and its three 1hydrogens. Each is labeled with the amino acid in the sequence of the protein to which it has been assigned and the Greek letter designating the position of the methyl group within that amino acid. The inset is the boxed region within the full spectrum expanded in the dimension of the chemical shift of 13carbon to resolve peaks that overlapped in the full spectrum. Reprinted with permission from ref 258. Copyright 1996 American Chemical Society.
[Figure 12–22: spectra not reproduced; horizontal axis, 1hydrogen chemical shift (ppm). Surviving part of the caption:] The sections through the three-dimensional spectrum centered on chemical shifts for 13carbon of 55.0, 56.7, 58.0, and 58.6 ppm, respectively, contain cross-peaks arising from the absorptions of the 13carbons in the α positions of amino acids with chemical shifts in each of these ranges. The cross-peaks produced by the self-coupling of the α 1hydrogens on each of these selected α 13carbons lie on the diagonal. Every cross-peak along each horizontal line is coupled, respectively, to this α 13carbon and α 1hydrogen, each labeled with its eventual assignment. Each cross-peak is labeled by the position of the other coupled hydrogen in the side chain, and each cross-peak assigns the chemical shift of that other hydrogen. Reprinted from ref 259. Copyright 1990 American Chemical Society.
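Because the transfer of saturation described in the text is reciprocal, each nuclear Overhauser effect should appear as two peaks reflected across the diagonal of the two-dimensional spectrum. The following Python sketch, offered only as an illustration, pairs such mirrored peaks in a list of (δ1, δ2) positions. The function symmetric_pairs, the tolerance of 0.02 ppm, and every chemical shift other than 8.45 ppm (the amido 1hydrogen of Alanine 37) are assumptions made for the example.

```python
def symmetric_pairs(peaks, tol_ppm=0.02):
    """Pair off-diagonal peaks that mirror each other across the diagonal.

    peaks is a list of (delta1, delta2) positions in ppm. The result is a
    list of index pairs (i, j) with i < j.
    """
    pairs = []
    for i, (d1_i, d2_i) in enumerate(peaks):
        if abs(d1_i - d2_i) <= tol_ppm:
            continue  # a peak on the diagonal is a self-connection
        for j in range(i + 1, len(peaks)):
            d1_j, d2_j = peaks[j]
            if abs(d1_i - d2_j) <= tol_ppm and abs(d2_i - d1_j) <= tol_ppm:
                pairs.append((i, j))
    return pairs


# Hypothetical peak list; only 8.45 ppm is taken from the text.
peaks = [
    (8.45, 8.45),  # diagonal peak of nucleus A
    (8.45, 4.30),  # saturation transferred from A to X
    (4.30, 8.45),  # reciprocal transfer from X to A
    (8.45, 1.40),  # peak whose mirror image is absent from this list
    (7.90, 7.90),  # another diagonal peak
]

mirrored = symmetric_pairs(peaks)
print(mirrored)
```

A peak without a mirror image, such as the fourth entry here, would in practice prompt a second look at the spectrum before it is accepted as a connection.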
of 1hydrogens. The existence of these cross-peaks states that all of these 1hydrogens are in the vicinity of the amido 1hydrogen of Alanine 37 in the tertiary structure of the protein. As the length of the fixed delay tm was increased, the intensity of the cross-peaks increased as more and more of the amplitude modulation of the population of the nuclei of amido 1hydrogens of Alanine 37 was transferred to the populations of neighboring nuclei.
Two problems with nuclear Overhauser effects that are the consequence of spin diffusion are that they are complicated by the spectral density function205 inherent to the dipolar interaction and that they are usually not confined to nuclei immediately adjacent to the source of the diffusing spin but spread outward from the source in rather complex pathways that cannot be delineated unless the detailed structure of the molecule is already known.206,271,272 The time tm between the second and the third 90° pulses must be chosen by trial and error to maximize the amount of transfer to immediately adjacent nuclei while minimizing the spread to more distant locations (Figure 12–24). Because the nuclear Overhauser effect results from a dynamic, inhomogeneous process, no reliable absolute measurements of particular distances between nuclei can be made. An intuition of relative distances between the nuclei can be gained, however, by following the changes in the intensity of the nuclear Overhauser effects as a function of the time interval tm. If a nuclear Overhauser effect is one that develops early in the progress of spin diffusion, the two nuclei connected by that nuclear Overhauser effect are presumed to be close to each other (<0.5 nm) in the folded polypeptide.270
In the full two-dimensional nuclear Overhauser spectrum of a protein (Figure 12–23A), the one-dimensional spectrum of the individual absorptions of the 1hydrogens lies along the diagonal. The nuclear
[Figure 12–23: spectra not reproduced; axes, 1hydrogen chemical shift (ppm). Surviving parts of the caption:] ... 1hydrogens; dα,α, connections between 1hydrogens on two different α carbons; and d(β,γ,δ),(β,γ,δ), connections between two different β, γ, or δ 1hydrogens. Reprinted from ref 269. Copyright 1996 American Chemical Society. (B) Expansion of the dα,N region of spectrum A. Each cross-peak is a connection produced by a nuclear Overhauser effect between the α 1hydrogen on one amino acid and the amido 1hydrogen on another. The identity of the amino acids on which the paired 1hydrogens are located is determined by the chemical shifts at which the cross-peak is situated. Those that connect consecutive ... the two 1hydrogens. Reprinted with permission from ref 268. Copyright 1995 American Chemical Society.
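The qualitative use of the time of mixing tm can be sketched in Python as a ranking of cross-peaks by how early they develop. This is an illustrative assumption about how such a comparison might be coded, not a method taken from the cited work. The function names and the intensities are invented for the example, and the ranking gives only relative information, never an absolute distance.

```python
def initial_buildup_rate(mixing_times_ms, intensities):
    """Slope of cross-peak intensity between the two shortest mixing times."""
    points = sorted(zip(mixing_times_ms, intensities))
    (t0, i0), (t1, i1) = points[0], points[1]
    return (i1 - i0) / (t1 - t0)


def rank_by_early_development(buildup_curves, mixing_times_ms):
    """Order cross-peak labels from fastest to slowest initial build-up."""
    return sorted(
        buildup_curves,
        key=lambda label: initial_buildup_rate(
            mixing_times_ms, buildup_curves[label]
        ),
        reverse=True,
    )


# Hypothetical intensities (arbitrary units) at four mixing times in ms.
mixing_times_ms = [50, 100, 200, 400]
buildup_curves = {
    # grows from the shortest delay: presumed to connect adjacent nuclei
    "peak_direct": [0.20, 0.38, 0.60, 0.75],
    # appears only after a lag: presumed to arise from relayed spin diffusion
    "peak_relayed": [0.00, 0.02, 0.15, 0.45],
}

ranking = rank_by_early_development(buildup_curves, mixing_times_ms)
print(ranking)
```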
Overhauser effect draws connected pairs of these absorptions out of the diagonal as individual cross-
Figure 12–24: Diffusion of saturation, labeled with its Larmor frequency, from a nucleus of 1hydrogen into surrounding nuclei of 1hydrogen as a function of the length of the fixed delay tm.270 Each trace is a cross section through a two-dimensional (1H–1H) nuclear Overhauser enhanced spectrum of a 16 mM solution of acrosin inhibitor IIA from bovine seminal plasma (naa = 57) in [1H]H2O at pH 5.3 and 47 °C. Each cross section cuts through one of the two-dimensional spectra at a chemical shift δ1 of 8.45 ppm, which is the chemical shift of the amido 1hydrogen on Alanine 37. Other 1hydrogens connected to this hydrogen by nuclear Overhauser effects are represented by cross-peaks in the dimension of 1hydrogen chemical shift δ2 (horizontal axis). They are labeled by the 1hydrogen to which they have been assigned by the individual values of their chemical shifts. The large peak labeled A37NH is the self-connection of the amido 1hydrogen of Alanine 37. Each cross section is from a two-dimensional spectrum gathered with a different fixed delay tm, noted to its left in milliseconds. Reprinted with permission from ref 270. Copyright 1985 Academic Press.
Figure 12–25: Three-dimensional (1H–13C–1H) NOESY–HMQC spectra of a 2 mM solution of human interleukin-4 (naa = 129) that had been expressed in E. coli grown on [15N](NH4)2SO4 and [13C3]glycerol as sole sources of nitrogen and carbon and that was dissolved in [2H]H2O at pH 4.5 and 20 °C.273 The three strips are from two-dimensional sections through the three-dimensional spectra. The three two-dimensional sections, cut in the 13carbon dimension, contain cross-peaks from 13carbons with chemical shifts of 13.3, 16.6, and 6.7 ppm, respectively, which are the chemical shifts of the 13carbons in the δ methyl group of Isoleucine 10, one of the γ methyl groups of Valine 29, and the δ methyl group of Isoleucine 80. Each strip from each of these two-dimensional sections is centered on the chemical shift (horizontal axis) of the 1hydrogens of the respective methyl group. The most intense cross-peak on each strip is the self-connection of those hydrogens and is not labeled. The vertical axis defines the chemical shifts of the other hydrogens connected to the respective methyl hydrogens by nuclear Overhauser effects. Each of these cross-peaks, identified by its chemical shift, is labeled with the amino acid and the position in that amino acid at which the 1hydrogen producing the nuclear Overhauser effect is located. Reprinted from ref 273. Copyright 1994 Elsevier B.V.
spin–spin-coupled to a 13carbon or a 15nitrogen the chemical shift of which falls within a narrow range of values are registered. For example, only the 1hydrogens connected by nuclear Overhauser effects to 1hydrogens on 13carbons with chemical shifts of 13.3 ppm are registered in the left strip in Figure 12–25. The nuclear
Overhauser effects in such spectra are assigned to partic- acid sequences of these proteins with as few as 5–10
ular pairs of hydrogens on the basis of the two chemical involving 1hydrogens on one particular amino acid to as
shifts of each cross-peak (Figure 12–23B,C) or the two many as 160 involving 1hydrogens on another.277 The
chemical shifts of the cross-peak and the chemical shift latter values are extraordinarily impressive because no
of a 13carbon or 15nitrogen to which one or the other of one hydrogen in a protein can have more than about
the 1hydrogens is spin–spin-coupled. These chemical 20–25 hydrogens within 0.5 nm of it, and the 1hydrogens
shifts were determined during the initial assignments of on methyl groups are indistinguishable from each other
chemical shifts to all of the nuclei in the protein. in chemical shift and do not register as separate 1hydro-
As many pairs of hydrogens coupled by nuclear gens. It is from this catalogue of pairs of connected
1
Overhauser effects as possible are identified and cata- hydrogens that a molecular model is built.
logued. For example, 531 pairs of 1hydrogens were It is the nuclear Overhauser effects between 1hydro-
identified as being connected by nuclear Overhauser gens that are on amino acids two or more positions away
effects in spectra of the major cold-shock protein from each other in the amino acid sequence of a protein
(naa = 70) from E. coli;276 1281 pairs, in spectra of glutare- that provide the information on which the molecular
doxin 2 (naa = 215) from E. coli;277 and 3125 pairs, in model is based. Nuclear Overhauser effects between
1
spectra of phosphoglycerate mutase (naa = 205) from hydrogens within the same amino acid or on immedi-
Schizosaccharomyces pombe.278 As is usually the case, ately adjacent amino acids are usually uninformative
these connections were spread unevenly over the amino because the covalent structure requires that they occur.
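The selection of informative connections described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record type `NoePair`, the atom labels, and the small catalogue are invented for the example and do not come from any of the cited studies. The only rule taken from the text is that pairs on the same amino acid or on immediately adjacent amino acids are set aside.

```python
from typing import List, NamedTuple


class NoePair(NamedTuple):
    """One catalogued nuclear Overhauser effect between two hydrogens."""
    position_a: int  # position in the sequence of the first amino acid
    atom_a: str      # hypothetical atom label, for example "HA"
    position_b: int  # position in the sequence of the second amino acid
    atom_b: str      # hypothetical atom label


def informative_pairs(pairs: List[NoePair], min_separation: int = 2) -> List[NoePair]:
    """Keep pairs whose amino acids are at least min_separation positions apart."""
    return [p for p in pairs if abs(p.position_a - p.position_b) >= min_separation]


# Invented catalogue, for illustration only.
catalogue = [
    NoePair(12, "HA", 12, "HB"),  # same amino acid: required by the covalent structure
    NoePair(12, "HN", 13, "HN"),  # adjacent amino acids: usually uninformative
    NoePair(12, "HA", 15, "HN"),  # three positions apart
    NoePair(20, "HA", 47, "HB"),  # distant in the sequence
]

selected = informative_pairs(catalogue)
for pair in selected:
    print(pair)
```

Only the last two entries of the invented catalogue survive the filter, because only they connect amino acids two or more positions apart.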
Nuclear Magnetic Resonance 631
acids at adjacent positions in the amino acid sequence (labeled cross-peaks in Figure 12–23B), while other regions, such as the one containing connections between aromatic ¹hydrogens and aliphatic ¹hydrogens (labeled cross-peaks in Figure 12–23C), are dominated by pairs of ¹hydrogens distant from each other in the primary structure but adjacent to each other in the tertiary structure of the protein. It is these latter types of nuclear Overhauser effects that draw together two distant hydrogens as the covalent structure of the protein is folded into the molecular model.

A nuclear magnetic resonance molecular model is a molecular model of the covalent structure of the protein folded by the builder of the model into a tertiary structure in which the maximum number of ¹hydrogens observed to be connected by short-range nuclear Overhauser effects end up close (≤0.5 nm) to each other. It is possible to start with a molecular model of the extended polypeptide in a computer and use molecular dynamics and simulated annealing, modified so that the potential function includes the constraints of the nuclear Overhauser effects, to produce a preliminary molecular model, similar to the preliminary crystallographic molecular model that results from inserting the molecular model of the polypeptide into the map of electron density.

The validity of this initial molecular model can be assessed by examining its secondary structure. Just as α helices and β structure can be recognized in a map of electron density, segments of the polypeptide that are α helices or β structure in the actual molecule of protein can be recognized by patterns in the observed nuclear Overhauser effects (Figure 12–26).²⁷⁰

Figure 12–26: Diagonal plot of the nuclear Overhauser effects observed between different amino acids in the amino acid sequence of acrosin inhibitor IIA from bovine seminal plasma.²⁷⁰ Each nuclear Overhauser effect was established by the existence of a cross-peak in a two-dimensional nuclear Overhauser enhanced spectrum of the protein that had two chemical shifts identical to the respective chemical shifts of particular ¹hydrogens on the two different amino acids. The two axes are the numbering of the amino acids in the sequence of the protein, and a square represents a connection between the two positions in the sequence. Solid squares represent nuclear Overhauser effects between ¹hydrogens on α carbons or amido ¹hydrogens from the respective amino acids; hatched squares, between a ¹hydrogen on an α carbon or amido ¹hydrogen on one amino acid and a ¹hydrogen on the side chain of the other; squares with ×, between ¹hydrogens on the side chains of both amino acids. Patterns of connections can be recognized in the plot that define three turns of α helix from positions 34 to 45 and one turn of α helix from positions 8 to 11, and three segments from positions 52 to 55, 27 to 23, and 29 to 33 define three strands of antiparallel β structure in a pleated sheet. Reprinted with permission from ref 270. Copyright 1985 Academic Press.

The most dominant pattern is that of the ¹hydrogens in an α helix. In an α helix, nuclear Overhauser effects systematically connect hydrogens in each amino acid to those in amino acids three positions and four positions (Figure 6–6) away from it in the amino acid sequence.²⁸⁰–²⁸² An α helix holds the consecutive nitrogen–hydrogen bonds of the amides of the backbone close to each other, and these short distances promote transfer of saturation. For example, in the two-dimensional nuclear Overhauser enhanced spectrum of the anaphylatoxin from human complement factor 3a, 36 of the 44 nuclear Overhauser effects observed between amido ¹hydrogens on spatially adjacent amino acids were those for amino acids in α helices, while only 46 of the 77 amino acids in the protein are in α helices.²⁸³

In parallel β structure, ¹hydrogens in a string of successive amino acids in the sequence are connected in pairs to ¹hydrogens in another string of successive amino acids in the order in which those spectrally connected pairs of amino acids occur in the sequence. In antiparallel β structure, the pairs of amino acids are connected in the reverse order to the order in which they occur in the sequence. For example, connections between amino acids at positions 20 and 17, 21 and 16, 22 and 15, 23 and 14, 24 and 13, and 25 and 12 defined an antiparallel β hairpin in α-amylase inhibitor HOE-467A from Streptomyces tendae.²⁸⁴

Other patterns indicating the organization of the tertiary structure of the actual molecule of protein can also be recognized in the spectra. For example, patterns similar to those for antiparallel β structure, in which ¹hydrogens in one segment of amino acids are connected in reverse order to ¹hydrogens in another segment of amino acids, can identify two adjacent, antiparallel α helices. In this instance, however, the patterns are discontinuous because only ¹hydrogens within the interface between the two α helices are connected by nuclear Overhauser effects.²⁸⁵–²⁸⁷

Just as the preliminary crystallographic molecular model is then submitted to refinement against the data set, the preliminary nuclear magnetic resonance molecular model is submitted to refinement with the nuclear Overhauser effects as constraints. In addition to nuclear Overhauser effects, other constraints can be applied to the process of refinement. Designated donors and acceptors of hydrogen bonds in α helices and β structure can be assigned ideal lengths.²⁸⁸ Dihedral angles within the structure can be constrained to particular ranges by values of observed coupling constants.²⁸⁹,²⁹⁰ If the protein contains a paramagnetic metallic cation, the effect of that cation on the relaxation rates of ¹hydrogens in the protein can provide estimates of the distances between each of those ¹hydrogens and the metallic cation.²⁹¹

In the final refined nuclear magnetic resonance molecular model, segments of random meander connecting clearly defined segments of secondary structure will often be less well defined even though there is crystallographic or chemical evidence that they do assume specific structures.²⁹²,²⁹³ This problem is emphasized by the practice of presenting a nuclear magnetic resonance molecular model as an ensemble of structures, each of which satisfies the constraints of the nuclear Overhauser effects (Figure 12–27).²⁹³,²⁹⁴ In such representations, the certainty of the structures of α helices and β structure is set in sharp contrast to the uncertainty of the structure of the random meander. Although such a representation implies that these poorly defined segments are more flexible and less rigidly confined than the central regions of regular secondary structure, crystallographic molecular models of the same protein often show no evidence of such flexibility.²⁹⁵ It is possible to distinguish whether or not the poor definition of these regions of random meander results from dynamic flexibility by examining the rates of relaxation of the ¹⁵nitrogens in these regions. These rates of relaxation are sensitive to thermal motion and can be used to identify segments of polypeptide that are dynamically flexible in the actual molecule of protein. In the absence of evidence for flexibility, it must be assumed that the poor definition of random meander is a consequence of an insufficiency of constraints in the data.

Figure 12–27: Nuclear magnetic resonance molecular model of the amino-terminal domain (nₐₐ = 92) of the ADA regulatory protein of E. coli.²⁹³,²⁹⁴ The nuclear Overhauser effects observed in spectra of the protein were used as constraints in a computing program written to build a tertiary structure from a model of the polypeptide with optimal bond lengths and bond angles by first bringing as many hydrogens connected by nuclear Overhauser effects to within 0.5 nm of each other and then adjusting the structure to obtain the optimal length for designated hydrogen bonds and optimal dihedral angles and to eliminate overlap of atomic volumes. An ensemble of superposed structures is presented in stereo, each of which is equally compatible with the constraints imposed. The positions of a structural Zn²⁺ cation (Zn) in the various superposed structures are indicated by adjacent crosses. Reprinted with permission from ref 293. Copyright 1996 American Chemical Society.

In refined nuclear magnetic resonance molecular models, it is those regions of the protein that are within or sandwiched between α helices or β structure that are the most precisely defined. There are, however, regions of the secondary structure that are poorly defined by nuclear magnetic resonance. Hydrogens known to be within short segments of secondary structure, such as β turns or 3₁₀ helix, participate in so few nuclear Overhauser effects that those that are observed are often inadequate to define the structure of these segments.²⁹⁶

Molecules of water confined to particular locations on the surface of the protein can be incorporated into the molecular model on the basis of the nuclear Overhauser effects between their ¹hydrogens and ¹hydrogens of the amino acids by using rotating-frame Overhauser enhanced spectroscopy (ROESY).²⁹⁷ Molecules of water, however, at locations buried within the structure of the molecule of protein have residence times long enough to be observed directly by their nuclear Overhauser effects.
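The patterns of secondary structure described earlier for the diagonal plot lend themselves to a simple numerical check, sketched here in Python. The function names and the threshold of three pairs are assumptions made for the illustration; only the positions of the hairpin in α-amylase inhibitor HOE-467A are taken from the text. The sketch rests on two observations: in an α helix the connected positions differ by three or four, and in an antiparallel ladder one position rises as the other falls, so that their sum stays constant.

```python
from collections import defaultdict


def helical_contacts(contacts):
    """Contacts between positions three or four apart, as expected in an alpha helix."""
    return sorted((min(i, j), max(i, j)) for i, j in contacts if abs(i - j) in (3, 4))


def antiparallel_ladders(contacts, min_pairs=3):
    """Group contacts whose two positions sum to the same value.

    Along an antiparallel ladder i + j is constant, so a group of at least
    min_pairs contacts with the same sum is taken as a candidate ladder.
    """
    by_sum = defaultdict(list)
    for i, j in contacts:
        by_sum[i + j].append((min(i, j), max(i, j)))
    return {total: sorted(pairs) for total, pairs in by_sum.items() if len(pairs) >= min_pairs}


# Positions quoted in the text for the antiparallel hairpin.
hairpin = [(20, 17), (21, 16), (22, 15), (23, 14), (24, 13), (25, 12)]
ladders = antiparallel_ladders(hairpin)
print(ladders)
```

All six quoted pairs sum to 37 and fall into a single candidate ladder. The two rules overlap, since the pair 20 and 17 would also pass the helical test, so a sketch of this kind can only suggest an assignment that the full pattern of connections must then confirm.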
These two types of locations for molecules of water, exterior and interior, are usually found to occupy the same positions in the nuclear magnetic resonance molecular model that they do in the crystallographic molecular model of the same protein.²⁹⁸ The position of metallic cations in the molecular model can be established by substituting the natural cation with a cation of nuclear spin ½. For example, the Zn²⁺ cations normally bound to the amino-terminal domain of regulatory protein GAL4 from S. cerevisiae (nₐₐ = 62) were replaced with ¹¹³Cd²⁺ cations, and spin–spin couplings between the ¹¹³Cd²⁺ cations and the β ¹hydrogens on the cysteines that covalently bind them (6–19) produced a two-dimensional correlated spectrum.²⁹⁹

The fundamental problem with building a molecular model from nuclear Overhauser effects is that those nuclear Overhauser effects do not define a distance between two ¹hydrogens because the relative rates of spin diffusion are too dependent on the character of the unique surroundings around each nucleus to obtain reliable estimates of distances. The distances in the final refined model between pairs of ¹hydrogens connected by nuclear Overhauser effects are always quite different even though almost all of them can be made less than 0.5 nm.²⁸⁸ When distances between ¹hydrogens connected by nuclear Overhauser effects are measured in a crystallographic molecular model of the same protein,²⁹⁶ there is usually little correlation between the actual distance between the hydrogens and the strength of the nuclear Overhauser effect at the optimal mixing time τₘ. Those nuclear Overhauser effects observed only after extended mixing times do arise from hydrogens that are farther apart, but the range of those longer distances is broad, and consequently they are not very useful.²⁹⁶ If a nuclear Overhauser effect is observed between two ¹hydrogens after an optimal mixing time τₘ, it can only be assumed that they are less than 0.5 nm apart,²⁰⁵,²⁷² but there are usually notable exceptions even to this limit.²⁹⁶,³⁰⁰–³⁰²

In effect, the existence of a nuclear Overhauser effect allows the investigator to connect the two hydrogens in a molecular model of the covalent structure of the polypeptide with a rubber band that is elastic enough to stretch to a distance equivalent to about 0.5 nm. If enough of these rubber bands were inserted into a molecular model of the polypeptide and the regions of the amino acid sequence identified as β structure and α helix (Figure 12–26) had already been locked into these secondary structures, the model would snap into a conformation that resembles the native folded conformation of the polypeptide.

It has been possible in many instances to compare nuclear magnetic resonance molecular models with crystallographic molecular models. It is usually observed that the two molecular models resemble each other, occasionally quite closely.²⁷⁹ The resemblance is strongest in the assignment of α helices and β structure.²⁷⁹ The arrangement of these regular secondary structures in the tertiary structure, however, often differs significantly, but not dramatically, between the two molecular models.³⁰³–³⁰⁶ Some of these differences are real and informative.³⁰⁷ They often result from the fact that the protein in question is small and flexible or has flexible domains and the fact that contacts between the molecules of protein in the crystal are able to shift secondary structures relative to each other.³⁰⁶ If a protein is constructed so that in solution it assumes two or more conformations because shifts among these conformations are required for its function, nuclear magnetic resonance can be used to determine which crystallographic molecular models of these various conformations represent the species actually present in solution.³⁰⁸

In superpositions of nuclear magnetic resonance and crystallographic molecular models of the same protein, the root mean square deviations between heavy atoms (oxygen, nitrogen, and carbon) are usually the least within the polypeptide backbone of the regular secondary structure in the core, greater for side chains buried between these secondary structures in the core, and greatest for random meander at the periphery.³⁰⁰,³⁰³,³⁰⁹,³¹⁰ In the comparison of the two molecular models for α-amylase inhibitor HOE-467A (nₐₐ = 74), the value of the root mean square deviation for the heavy atoms of the polypeptide backbone was 0.105 nm; that for the heavy atoms of the side chains buried in the core was 0.125 nm; and that for all heavy atoms was 0.184 nm.³⁰⁰ This protein, however, is almost entirely β structure with little random meander. In the comparison of the two molecular models of human granulocyte colony-stimulating factor (nₐₐ = 174), a much larger protein with significant amounts of random meander, the root mean square deviation for the heavy atoms of the polypeptide backbone in its four α helices was 0.286 nm; that for all of the heavy atoms in these α helices was 0.333 nm; that for all of the heavy atoms in the entire polypeptide backbone was 0.315 nm; and that for all heavy atoms was 0.370 nm.³⁰³

It is in the details of the atomic structure that nuclear magnetic resonance and crystallographic molecular models differ most significantly. For example, the dispositions of the aromatic rings of Tyrosine 3, Tyrosine 45, and Phenylalanine 52 were different in two nuclear magnetic resonance molecular models of the immunoglobulin-binding domain of immunoglobulin G binding protein G from Streptococcus (nₐₐ = 56), and those dispositions in turn were both different from the dispositions in the crystallographic molecular model.³¹¹ In nuclear magnetic resonance molecular models it is the conformations of the side chains that are always more uncertain than those of the polypeptide backbone;³⁰⁰ but, unfortunately, it is the conformations of the side chains that usually accomplish the function of the protein. There are, however, some instances in which the conformation of a particular side chain in a nuclear magnetic resonance molecular model is more compatible with its known chemical properties than its conformation in the crystallographic molecular model of the same protein³¹² and other instances in which the atomic details of the crystallographic molecular model could be adjusted by nuclear magnetic resonance.³¹³

A nuclear magnetic resonance molecular model is a significantly less accurate²⁰⁵ representation of the actual structure of a molecule of protein than is a crystallographic molecular model for the following reasons. First, a crystallographic data set contains significantly more information than even the most extensive list of nuclear Overhauser effects and coupling constants³¹⁰ because the number of observable nuclear Overhauser effects is always less than the number of observable reflections and because each reflection comes with an amplitude. Second, the heavy reliance on energy minimization in building the nuclear magnetic resonance molecular model assures that unusual local conformations that are excluded by the procedure used for the minimization either intentionally or unintentionally will be missed. Third, it has been demonstrated by calculation that even with an average of 19 nuclear Overhauser effects for each amino acid, a value in excess of the number usually available, the root mean square deviation of the atoms in a nuclear magnetic resonance molecular model from their actual positions in the structure of the protein has to be at least 0.1 nm.²⁰⁵ Even this is an overestimate of the accuracy because the paucity of nuclear Overhauser effects from random meander was not considered when the pairs of hydrogens incorporated into the calculations were chosen. Fourth, because the crystallographic data constrains the crystallographic molecular model much more than the nuclear magnetic resonance data constrains the nuclear magnetic resonance molecular model, upon refinement with a combination of both the crystallographic data set and the observed nuclear Overhauser effects, the crystallographic molecular model of a protein quickly converges with only small changes in its structure to accommodate both sets of data while the nuclear magnetic resonance molecular model undergoes far more extensive changes to reach accommodation.³¹⁰ Fifth, the fact that the number of nuclear Overhauser effects observed between ¹hydrogens in the same amino acid that are inescapably greater than 0.5 nm apart in the final nuclear magnetic resonance molecular model is much greater than the number of nuclear Overhauser effects observed between ¹hydrogens in different amino acids that are greater than 0.5 nm apart in the final nuclear magnetic resonance molecular model indicates that in bringing as many ¹hydrogens connected by nuclear Overhauser effects as possible to within 0.5 nm of each other, the construction of the molecular model has produced a structure significantly different from the actual structure of the molecule of protein.³¹⁴ Sixth, it is usually observed that increasing the number of constraints in the construction of the nuclear magnetic resonance molecular model causes it to become closer to the crystallographic molecular model of the same protein rather than to assume its own distinct structure.²⁹⁰

There are, however, informative exceptions to this rule, and in these instances nuclear magnetic resonance does reveal differences in the structure of a protein when it is in solution and when it is in a crystal.³⁰⁷,³⁰⁸,³¹⁵ As with solution scattering of X-rays, these differences permit the crystallographic molecular model to be adjusted, often minutely,³¹³ to a conformation representing the molecule of protein in solution, which is the goal of all structural studies.

Perhaps the greatest drawback of nuclear magnetic resonance spectroscopy is that it is confined to small proteins. This confinement is due not only to the problem of overlapping cross-peaks on two-dimensional and three-dimensional spectra. Because the rate at which a molecule rotationally diffuses in the solution affects both the ratio of signal to noise in a nuclear magnetic resonance spectrum and the effectiveness of the pulse sequences used for multidimensional and multinuclear spectra, the size of a molecule of the protein determines whether or not it will even yield a spectrum. Although methods have been reported that can increase the rate at which a molecule of protein rotationally diffuses in a solution,³¹⁶ this problem has yet to be solved satisfactorily. From 1986 to 2001, the size of the largest asymmetric units and the size of the largest symmetric dimers³¹⁷,³¹⁸ for which nuclear magnetic resonance provided a molecular model increased from 120 to 220 aa and from 200 to 450 aa, respectively.* The average size of the proteins for which molecular models were reported, however, increased only modestly during the same period, from about 90 to about 125 aa. Unfortunately, most proteins are oligomers with asymmetric units larger than 300 aa, sizes that present no difficulty to crystallography.

Because of their poor definition of the structure of random meander and of the conformation of side chains, because of their inaccuracy, because of their indistinguishability from crystallographic molecular models, and because of their confinement to small proteins, nuclear magnetic resonance molecular models have provided far less structural information than have crystallographic molecular models. Although there are situations in which nuclear magnetic resonance can be applied when crystallography cannot, for example in defining the details of the conformational change that occurs upon the binding of porcine phospholipase A2 to micelles of dodecyl phosphocholine,³²⁰ and situations in which nuclear magnetic resonance establishes clear and significant differences between a crystallographic molecular model and the structure of a protein in solution, for example in showing that the central unsupported α helix

* A preliminary nuclear magnetic resonance molecular model of malate synthase from E. coli, which is a monomer of 723 amino acids, has been reported.³¹⁹
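The root mean square deviations quoted above for pairs of molecular models can be illustrated with a short Python sketch. It assumes that the two sets of coordinates have already been superposed and that the atoms are listed in the same order in both; the function name and the coordinates are invented for the example, and the optimal superposition that precedes such a comparison is not shown.

```python
import math


def rms_deviation(coords_a, coords_b):
    """Root mean square deviation between two superposed lists of (x, y, z) in nm.

    Assumes the atoms are in the same order in both lists.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Invented coordinates (nm) for two heavy atoms in each of two models.
model_nmr = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_crystal = [(0.0, 0.0, 0.1), (1.0, 0.1, 0.0)]
deviation = rms_deviation(model_nmr, model_crystal)
print(round(deviation, 3))
```

Restricting the lists to the atoms of the polypeptide backbone, to buried side chains, or to all heavy atoms gives the separate values of the kind reported in the comparisons above.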
on each aspartate (chemical shifts on the abscissa of Figure 12–30A). Aspartates 94, 108, and 134 had values of pKₐ expected for carboxylates exposed on the surface of a protein (3.2, 3.2, and 4.1, respectively). Aspartates 102 and 194 had values of pKₐ less than 2, which suggests that in the native structure of the protein they are in electropositive surroundings.

Aspartate 10 and Aspartate 70 are immediately adjacent to each other in the crystallographic molecular model of ribonuclease H,³³⁷ and their acid–base titrations are coupled tautomerically to each other (Figure 12–30B). The two coupled titration curves should be complex functions of the microscopic acid dissociation constants (Equation 2–24) and of the four values of the chemical shift of Aspartate 10 and the four values of the chemical shift for Aspartate 70 (the ones when Aspartate 10 and Aspartate 70 are both protonated, the ones when Aspartate 10 is protonated and Aspartate 70 is not, the ones when Aspartate 10 is not protonated and Aspartate 70 is, and the ones when both are unprotonated).³³⁸ To obtain exact values for these 12 parameters, additional experiments in which each of the aspartates in turn is mutated to an asparagine would have to be performed. If, however, it is assumed that neither the chemical shift of protonated Aspartate 10 nor the chemical shift of unprotonated Aspartate 10 is perturbed by the ionization of Aspartate 70 and that neither the chemical shift of protonated Aspartate 70 nor the chemical shift of unprotonated Aspartate 70 is perturbed by the ionization of Aspartate 10, then the two respective titration curves register only the fraction of each aspartate that is ionized at a particular pH. If this is the case, as it seems to be, then the equilibrium constant between the concentrations of the two tautomers, the one in which Aspartate 10 is protonated and Aspartate 70 is unprotonated and the one in

[Figure 12–30. Panel A: cross-peaks labeled N16, N44, N45, N84, N100, N130, N143, D10, D70, D94, D108, D134, D148, Q4, Q72, Q76, Q80, Q105, Q113, Q115, and Q152; ¹³carbon chemical shift (ppm), 174 to 178. Panel B: ¹³carbon chemical shift plotted against pH, 2 to 8, in panels including Asp108, Asp134, Asp148, and Val155.]

different values of pH. The chemical shift of the acyl ¹³carbon of each side chain (ordinate of the pairs of cross-peaks in panel A) is plotted as a function of pH. Reprinted with permission from ref 336. Copyright 1994 American Chemical Society.
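The coupled titration of two adjacent carboxylic acids discussed above can be sketched numerically. The Python below is a minimal illustration of the standard scheme of microscopic dissociation constants for two sites; it is not a reproduction of Equation 2–24, which lies outside this passage, and the names of the parameters and the values of pKₐ used are assumptions chosen for the example rather than values for Aspartate 10 and Aspartate 70.

```python
def fractions_ionized(pH, pK_site1, pK_site2, pK_site2_after_1):
    """Fractions of site 1 and site 2 that are ionized at a given pH.

    pK_site1: site 1 loses its proton while site 2 is still protonated.
    pK_site2: site 2 loses its proton while site 1 is still protonated.
    pK_site2_after_1: site 2 loses its proton after site 1 has ionized.
    (All three are microscopic constants; the names are hypothetical.)
    """
    h = 10.0 ** (-pH)
    k1 = 10.0 ** (-pK_site1)
    k2 = 10.0 ** (-pK_site2)
    k2_after_1 = 10.0 ** (-pK_site2_after_1)

    # Populations relative to the species with both sites protonated.
    both_protonated = 1.0
    only_site1_ionized = k1 / h
    only_site2_ionized = k2 / h
    both_ionized = k1 * k2_after_1 / h ** 2
    total = both_protonated + only_site1_ionized + only_site2_ionized + both_ionized

    ionized_1 = (only_site1_ionized + both_ionized) / total
    ionized_2 = (only_site2_ionized + both_ionized) / total
    return ionized_1, ionized_2


def tautomer_ratio(pK_site1, pK_site2):
    """[site 1 protonated, site 2 ionized] / [site 1 ionized, site 2 protonated]."""
    return 10.0 ** (pK_site1 - pK_site2)


# Invented values of the microscopic pKa, for illustration only.
for pH in (2.0, 4.0, 6.0, 8.0):
    print(pH, fractions_ionized(pH, 4.0, 4.5, 6.5))
print(tautomer_ratio(4.0, 4.5))
```

In this scheme the ratio of the two tautomers depends only on the two microscopic constants for the first ionization and not on the pH, which is why it can be treated as an equilibrium constant.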
the two tautomers, because in this example, the peak of absorption from the acyl ¹³carbon of Glutamate 78 drifts to a smaller value of the chemical shift rather than to a larger one upon the titration of Glutamate 172, and the peak of absorption from Glutamate 172 also drifts to a smaller value of the chemical shift rather than to a larger one upon the titration of Glutamate 78.

The carboxylates on the side chains of glutamates and aspartates sometimes form hydrogen bonds with amido nitrogen–hydrogens from the polypeptide backbone. These can be identified in nuclear magnetic resonance spectra because the chemical shift of the amido ¹hydrogen participating in the hydrogen bond will increase in magnitude in concert with the decrease in the chemical shift of the ¹hydrogens adjacent to the carboxyl group as it loses its proton, becomes a carboxylate, and forms the hydrogen bond with the amido nitrogen–hydrogen.³³⁹

Suggested Reading

Ikura, M., Kay, L.E., & Bax, A. (1990) A novel approach for sequential assignment of ¹H, ¹³C, and ¹⁵N spectra of proteins: heteronuclear triple-resonance three-dimensional NMR spectroscopy. Application to calmodulin, Biochemistry 29, 4659–4667.

Ikura, M., Spera, S., Barbato, G., Kay, L.E., Krinks, M., & Bax, A. (1991) Secondary structure and side-chain ¹H and ¹³C resonance assignments of calmodulin in solution by heteronuclear multidimensional NMR spectroscopy, Biochemistry 30, 9216–9228.

Chou, J.J., Li, S., Klee, C.B., & Bax, A. (2001) Solution structure of Ca(2+)-calmodulin reveals flexible hand-like properties of its domains, Nat. Struct. Biol. 8, 990–997.

Problem 12–13: This figure is a two-dimensional nuclear magnetic resonance spectrum of the cytochrome c-551 from Pseudomonas aeruginosa.³⁴⁰ Reprinted with permission from ref 340. Copyright 1990 American Chemical Society.

[Figure for Problem 12–13: both axes, ¹hydrogen chemical shift (ppm), 7 to 9; the peaks connected by horizontal and vertical lines are labeled 4,5; 6,5; 6,7; 8,7; 8,9; 10,9; 10,11; 12,11; and 14,15.]

In the spectrum, the off-diagonal peaks arise from nuclear Overhauser enhancements (τₘ = 150 ms). The symmetrically displayed peaks on the two sides across the diagonal result from the same nuclear Overhauser effects. You should convince yourself that the patterns are symmetric across the diagonal.

(A) What hydrogens in a protein produce off-diagonal peaks in this region of the spectrum?

(B) The peaks connected by the horizontal and vertical lines have been identified with a particular subset of these hydrogens. Each of the peaks connected by these lines is produced by a nuclear Overhauser enhancement between two hydrogens. Draw a polypeptide in the extended conformation as in 2–15. On your drawing show with double arrows only the connections that give rise to those peaks that are highlighted by the horizontal and vertical lines.

(C) What do the horizontal and vertical lines indicate about these peaks? Why are the peaks labeled with pairs of numbers that increase consecutively?

(D) What other information was used to assign the numbers to the particular peaks?

Problem 12–14: The figure is a portion of the two-dimensional nuclear Overhauser enhanced spectrum³⁴¹ of the lipoyl domain from the pyruvate dehydrogenase complex of B. stearothermophilus. Reprinted with permission from ref 341. Copyright 1991 Blackwell Publishing.

[Figure for Problem 12–14: both axes, ¹hydrogen chemical shift (ppm); labels C36, E37, V38, Q39, N40, D41, K42, and A43.]

spectrum of the protein. What is this other type of spectrum, and what type of connections does it record?

(D) Some of the squares coincide with peaks in the nuclear Overhauser enhanced spectrum and some do not. Why are there peaks in these positions in the other spectrum but not in the nuclear Overhauser enhanced spectrum?

Problem 12–15: Immunity protein Im9 is a folded polypeptide of 86 amino acids. It is responsible for inhibiting the action of colicin E9, an extracellular antibiotic protein produced by E. coli. Immunity protein Im9 uniformly labeled with ¹⁵nitrogen was obtained by growing E. coli JM105 cells expressing high levels of the protein on minimal medium made with [¹⁵N]NH4Cl. The protein was purified, and the chemical shifts of most of the ¹hydrogens and the ¹⁵nitrogens in the protein were assigned by the usual procedures. After the chemical shifts had been assigned, a three-dimensional spectrum was taken in which the three dimensions were the chemical shifts of ¹hydrogen, ¹hydrogen, and ¹⁵nitrogen.³⁴² The two hydrogen dimensions display nuclear Overhauser connections between pairs of hydrogens. Because the chemical shifts of each of the nitrogens in the polypeptide had been assigned, it was possible to select sections through the three-dimensional spectrum, each fixed at the chemical shift of a particular backbone nitrogen in the dimension of the chemical shifts of the ¹⁵nitrogens. Narrow strips from the resulting successive two-dimensional ¹H–¹H nuclear Overhauser enhanced spectra are displayed in the figure. Each strip is centered on the horizontal axis at the chemical shift of the hydrogen on the amido nitrogen the chemical shift of which is fixed. The value of the chemical shift of each amido ¹⁵nitrogen at which the section was fixed and the identity of that amido nitrogen are indicated on each strip. Reprinted with permission from ref 342. Copyright 1994 American Chemical Society.

[Figure for Problem 12–15: strips labeled E30, E31, E32, L33, V34, K35, L36, and V37, with the values 104.2, 98.8, 98.8, 99.9, 97.6, 99.7, 100.0, and 101.1, respectively.]

(A) Peaks in the figure are labeled with pairs of numbers. Draw the polypeptide between Glutamate 30 and Valine 37 as the polypeptide is drawn in 2–15. Draw, however, the actual side chains along the polypeptide to identify each amino acid. Draw double-headed arrows labeled with the same pairs of numbers as the peaks in the strips are labeled and connecting every pair of hydrogens in your drawings that produce a labeled peak in the strips for Glutamate 30 to Valine 37.

(B) What are the peaks in the spectra that are labeled with only a single number?

(C) What are the peaks in the spectra that are not in the center of their strip?

Human interleukin-4 is a member of the family of hematopoietic cytokines that modulate cell proliferation and differentiation within the immune system. It is a folded polypeptide of 130 amino acids. A gene encoding human interleukin-4 was inserted into a pTR550 plasmid so that the protein could be expressed in E. coli. The human interleukin-4 expressed at high levels by these cells was uniformly labeled with ¹³carbon by growing them on minimal medium made with [¹³C]glycerol (99 atom%). The protein was purified and the chemical shifts of most of the ¹hydrogens and the ¹³carbons in the protein were assigned by the usual procedures. After the chemical shifts had been assigned, a three-dimensional spectrum was taken in which the three dimensions were the chemical shifts of ¹hydrogen, ¹hydrogen, and ¹³carbon.²⁷³ The two hydrogen dimensions display nuclear Overhauser connections between pairs of hydrogens. Because the chemical shifts of each of the carbons in the polypeptide had been assigned, it was possible to choose sections through the three-dimensional spectrum each fixed at the chemical shift of a particular α carbon in the protein in the dimension of the chemical shifts of the ¹³carbons. Strips from the resulting successive two-dimensional ¹H–¹H nuclear Overhauser enhanced spectra are displayed in the figure. Reprinted with permission from ref 273. Copyright 1994 Elsevier B.V.
640 Physical Measurements of Structure
Exchange of Protons
An acidic proton at any position in the covalent structure of a protein is subject to exchange with protons in the solution. Protons on the side chains of exposed polar amino acids such as asparagines, glutamines, aspartic
$$ k_{\mathrm{ex}} = k_{\mathrm{D^+}}[\mathrm{D^+}] + k_{\mathrm{OD^-}}[\mathrm{OD^-}] \qquad (12\text{–}58) $$

where D stands for deuterium. At the minimum rate (Figure 12–31), acid catalysis and base catalysis are of equal magnitude and the domination of one over the other inverts. The catalytic mechanisms are known to be

[Mechanism 12–59: acid-catalyzed exchange of the amido proton (2H for 1H), drawn as successive steps of + H+ and – H+] (12–59)

and

[Mechanism 12–60: base-catalyzed exchange of the amido proton (2H for 1H), drawn as HO– plus the amide giving the anion at nitrogen plus H2O] (12–60)

respectively.345,346 In both cases, the removal of the proton is the rate-limiting step in the reaction. The second-order rate constants, kD+ and kOD–, have been tabulated for the amido protons on either side of specific side chains of amino acids in model compounds345 and small peptides.347 The second-order rate constants for acid catalysis, kD+, vary between 6 and 5000 M⁻¹ min⁻¹ and those for base catalysis, kOD–, vary between 2 × 10⁸ and 1 × 10¹¹ M⁻¹ min⁻¹.
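
As a rough numerical illustration of Equation 12–58, the sketch below evaluates k_ex as a function of p2H and locates the p2H at which the acid term and the base term are equal. The two rate constants are hypothetical values chosen from within the ranges quoted above, the ion product of [2H]H2O (pKw of about 14.95 near 25 °C) is an assumption that is not given in the text, and the function names are illustrative only.

```python
import math

# Assumed ion product of heavy water (pK_w of about 14.95 near 25 C).
# This value does not come from the text; it is a labeled assumption.
PKW_D2O = 14.95


def k_ex(p2h, k_d, k_od, pkw=PKW_D2O):
    """Exchange rate constant (min^-1) from Equation 12-58.

    k_d and k_od are the second-order rate constants (M^-1 min^-1)
    for acid catalysis and base catalysis, respectively.
    """
    conc_d = 10.0 ** (-p2h)        # [D+] in M
    conc_od = 10.0 ** (p2h - pkw)  # [OD-] in M, from [D+][OD-] = K_w
    return k_d * conc_d + k_od * conc_od


def p2h_of_minimum(k_d, k_od, pkw=PKW_D2O):
    """p2H at which the acid and base terms are equal (the minimum of k_ex)."""
    return 0.5 * (pkw + math.log10(k_d / k_od))


if __name__ == "__main__":
    # Hypothetical rate constants inside the ranges quoted in the text.
    k_d, k_od = 100.0, 1.0e10
    p_min = p2h_of_minimum(k_d, k_od)
    print(f"p2H of minimum rate: {p_min:.2f}")
    for p2h in (2.0, 3.0, p_min, 4.0, 5.0):
        print(f"p2H {p2h:.2f}: k_ex = {k_ex(p2h, k_d, k_od):.3g} min^-1")
```

With these assumed inputs the minimum falls near p2H 3.5, which lies inside the range of p2H plotted in Figure 12–31.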
Figure 12–31: Magnitude of the observed first-order rate constants for the exchange of the amido proton on the amino-terminal side (solid symbols) or carboxy-terminal side (open symbols) in the Nα-acetyl-N-methyl amides of alanine (2, 3), serine (▲, △), and asparagine (◇).345 The structure of each compound is drawn and the protons that exchange are in boldface type. Solutions of each model compound were prepared in [2H]H2O at the noted p2H and immediately introduced into a nuclear magnetic resonance spectrometer. The two amido protons in each compound produced the usual splitting into a doublet of the absorption of the spin–spin-coupled 1hydrogen or 1hydrogens on the respective, immediately adjacent carbons. As the proton on each nitrogen exchanged for a deuteron, the respective doublet was converted into a singlet. The areas of doublet and singlet were measured as a function of time, and it was observed that the doublet was converted to the singlet in a first-order process. The rate of this process was converted to an observed first-order rate constant, kobs (minute⁻¹), and its value is plotted logarithmically as a function of p2H. The curves are fits of Equation 12–58 to the data. Reprinted with permission from ref 345. Copyright 1972 American Chemical Society.

When a protein such as myoglobin is incubated in tritiated water for an extended period of time at moderate temperature (37 °C), most of its amido protons reach equilibrium with the protons and tritons in the water. The tritiated water can then be replaced with untritiated water by molecular exclusion chromatography. During the chromatography all of the tritons on exposed polar side chains exchange with protons. When the tritium remaining on the protein is measured by liquid scintillation counting as a function of time at low temperature (0 °C), a population of tritiated amides that lose their tritons very slowly can be distinguished (Figure 12–32).348 The rates at which these amides exchange are far slower than the rates observed for small peptides in solution.347

The amido hydrons on peptide bonds in a protein that exchange slowly with other hydron isotopes in the solvent are the hydrons that participate in stable hydrogen bonds in the folded polypeptide.349 That the number of slowly exchanging tritons in myoglobin (Figure 12–32) is about equal to the number of amido protons participating in buried hydrogen bonds in the crystallographic molecular model (approximately 120) is consistent with this conclusion.348 In the case of lysozyme, the exchange
of protons for deuterons at the amides of the polypeptide could be followed directly by observing the decrease in the absorbance of the amide II vibration in the infrared spectrum relative to the absorbance of the amide I vibration. The agreement between the number of slowly exchanging amido protons (44 moles for every mole of lysozyme) and the number of buried hydrogen bonds involving the amido protons of the polypeptide in the crystallographic molecular model is also quite close.350

It is also possible to follow such global exchange of the protons in a protein by mass spectrometry. The protein is diluted into [2H]H2O, and samples are removed at successive times and submitted to electrospray mass spectrometry to determine the number of deuterons that have been incorporated by exchange during each interval.351–353 The samples are usually submitted to liquid chromatography in [1H]H2O immediately prior to mass spectrometry to remove the [2H]H2O. This chromatography is performed at pH 3 and 0 °C to prevent exchange of the amido deuterons with protons but to permit exchange at carboxylic acids, amines, hydroxyls, thiols, and imidazoles, because the intention of such measurements is to monitor the global exchange of amido protons in peptide bonds. In some instances different populations of protons that exchange at different rates can be distinguished. In the case of fructose-bisphosphate aldolase from rabbit, these populations were

Figure 12–32: Exchange of tritons from myoglobin equilibrated with [3H]H2O and then transferred to [1H]H2O at pH 5, 0 °C.348 Myoglobin from P. catodon (naa = 153) was incubated at 37 °C and pH 9 with [3H]H2O until equilibrium was reached (20 h) at all of its amides. The solution was cooled to 0 °C, and the protein was rapidly transferred by molecular exclusion chromatography to [1H]H2O. Over these intervals of preequilibrium, only protons on amides and more acidic acid–bases on the protein should exchange with tritons, and only the amides should retain any tritium through the chromatography. The amount of tritium associated with the protein [moles of tritium (mole of protein)⁻¹] was followed as a function of time (hours). Myoglobin contains 162 amido protons, and this number agrees favorably with the 155 tritons found on the protein at the earliest time after the chromatography. Reprinted with permission from ref 348. Copyright 1969 Academic Press.

A limitation of such global measurements of the exchange of amido protons is that the identity of those amides in the polypeptide that display slow exchange is not established. One way to increase the resolution is to digest quickly samples of the protein removed at different intervals over the time during which exchange is permitted to occur, separate the resulting peptides in each sample chromatographically, and use a mass spectrometer to assess the extent of incorporation of deuterium into each of the peptides.351,354 The exchange of protons is quenched at the end of each interval in [2H]H2O by dropping the pH to 3 and the temperature to 0 °C to minimize further exchange. The protein, unfolded by the low pH, is digested with pepsin A, an endopeptidase that functions best at low pH; and the resulting peptides are separated by chromatography in [1H]H2O at low pH and low temperature. Each peptic peptide is then identified by its mass and its pattern of fragmentation (Figure 3–8). In this way the amount of incorporation of deuterium into the amides of the peptide bonds within a particular segment of the folded polypeptide in the native protein, namely, that segment ending up in the peptic peptide, can be monitored.355,356 Still, the protons at the individual amides within that segment cannot be distinguished one from the other.

The advantage of such endopeptidolytic analyses is that they can be applied to large proteins355–358 such as rabbit fructose-bisphosphate aldolase (naa = 4 × 363), the α-catalytic subunit of cyclic AMP-dependent protein kinase (naa = 350), human dual specificity mitogen-activated protein kinase kinase 1 (naa = 393), and dihydrodipicolinate reductase from E. coli (naa = 4 × 273). Such large proteins cannot be analyzed by nuclear magnetic resonance. If, however, the protein is small enough, nuclear magnetic resonance spectroscopy can provide rates of exchange of most of the resolved amido protons along the polypeptide backbone in the native structure. The cross-peaks in the fingerprint region of a two-dimensional (1H–1H) nuclear magnetic resonance correlated spectrum (Figure 12–33)359 or a two-dimensional (15N–1H) HSQC correlation spectrum360 arise from spin–spin coupling between an amido proton and its adjacent α 1hydrogen or amido 15nitrogen, respectively. When the protein is transferred from [1H]H2O to [2H]H2O, each of these cross-peaks decreases in intensity as the amido proton exchanges for a deuteron (Figure 12–33).359,360 The rate of exchange of each proton is equal to the rate at which its cross-peak decreases in intensity.

It is generally assumed that a proton on a particular amide in the polypeptide backbone of a protein can exchange with a deuteron in the [2H]H2O surrounding the protein only when that proton is exposed to the solution. An unexposed position in the polypeptide backbone becomes exposed as the result of a conformational change in the protein.361,362 Although the details of such conformational changes are unknown, they must differ
from one location in the protein to another because protons exchange with a wide range of rate constants. Conformational changes that expose protected protons to the solution may be relatively rapid movements of loops on the surface, motions opening crevices in the

[Figure residue: panels at 15,000 min, 1,720 min, and 660 min; axis: chemical shift (ppm); cross-peak labels L6, C14, A16, Y10, G28, V34, G56, C5]

hydrogen bonds must be broken during the conformational change to permit the unretarded368 exchange of the protons to occur.349

Associated with whatever conformational change is responsible for the exchange of a particular proton in a protein is a rate of opening kop and a rate of closing kcl. The kinetic mechanism for the process is369

$$ \text{unexchangeable} \; \underset{k_{\mathrm{cl}}}{\overset{k_{\mathrm{op}}}{\rightleftharpoons}} \; \text{exchangeable} \; \xrightarrow{\,k_{\mathrm{ex}}\,} \; \text{exchanged} \qquad (12\text{–}61) $$
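
The kinetic scheme of Equation 12–61 can be explored numerically. The sketch below is not taken from the text: it computes the exact slow decay constant of the two-step scheme and compares it with the standard steady-state approximation, kop kex/(kcl + kex), which is assumed here to be the relevant expression. All numerical rate constants are hypothetical and the function names are illustrative.

```python
import math


def slow_rate(k_op, k_cl, k_ex):
    """Exact slow decay constant for closed <-> open -> exchanged.

    The closed and open forms obey two coupled linear rate equations;
    the smaller eigenvalue governs the observed loss of the proton.
    Written in a form that avoids subtracting two nearly equal numbers.
    """
    s = k_op + k_cl + k_ex
    return 2.0 * k_op * k_ex / (s + math.sqrt(s * s - 4.0 * k_op * k_ex))


def steady_state_rate(k_op, k_cl, k_ex):
    """Steady-state approximation, reasonable when the open form is rare."""
    return k_op * k_ex / (k_cl + k_ex)


if __name__ == "__main__":
    # Hypothetical rate constants (min^-1), chosen only to show two limits.
    cases = {
        "closing much faster than exchange": (1e-3, 1e3, 1.0),
        "exchange much faster than closing": (1e-3, 1.0, 1e3),
    }
    for label, (k_op, k_cl, k_ex) in cases.items():
        exact = slow_rate(k_op, k_cl, k_ex)
        approx = steady_state_rate(k_op, k_cl, k_ex)
        print(f"{label}: exact {exact:.3e}, steady state {approx:.3e}")
```

When closing is much faster than exchange, the observed rate constant approaches (kop/kcl) kex, so a proton that exchanges 10⁶ times more slowly than an unprotected amide would correspond to a ratio kop/kcl of about 10⁻⁶. When exchange is much faster than closing, the observed rate constant approaches kop.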
rate constants kop for the various conformational changes permitting exchange.

In lysozyme from G. gallus, there is a group of amido protons in the core of the protein that all exchange their protons 10⁶ times more slowly than the same amides in a small peptide.350 This group of amides in the center of the protein may be exposed to the solvent only during large cooperative unfoldings of a considerable fraction of the protein that expose many amido protons simultaneously for a short time before the polypeptide snaps shut again. If this is the case, these deeply buried regions of lysozyme spend 10⁻⁶ of their life in an unfolded state. The individual amido protons in the polypeptide backbone of bovine basic pancreatic trypsin inhibitor, however, have a rather heterogeneous set of rate constants of exchange (Figure 12–33).376 This suggests that local vibrational modes producing local unfoldings are responsible, at least in this protein, for performing the disconnections of a particular structure required to expose the amide to the solvent so that its proton can exchange. Whether the motions responsible for the exchange of a particular proton are local, regional, or global, the recurring observation that all or almost all of the amido protons, at least of small proteins, eventually exchange means that a molecule of protein is continuously breathing or unfolding throughout its lifetime.

It is also possible to observe the exchange of protons by neutron diffraction. Crystals of protein are routinely prepared for neutron diffraction by soaking them in [2H]H2O to replace as many protons as possible with deuterons, which scatter neutrons more strongly. Molecules of trypsin377 and ribonuclease378 that had been soaking within crystals in [2H]H2O for periods of 1 year retained protons on 54 and 28 of their amides, respectively. All of the sites that remained unexchanged were in the interior of the folded polypeptide. These were mainly on the central strands of β sheets and at the centers of α helices. Most of the sites retaining protons after 1 year were not even partially exchanged with deuterons, while those sites that had deuteriums were almost fully exchanged. The location of these sites and their occurrence in regions with very little thermal motion suggest that, in a crystal, only local motions are responsible for what exchange takes place. Because large regional unfolding or the complete unfolding of the polypeptide cannot occur in a crystal, these observations provide further evidence that the exchange of protons at deep locations in a protein observed in free solution results from such types of extensive unfolding.

Observations of the exchange of protons can be used to examine heterologous associations. For example, the rates of exchange of 11 amido protons in the polypeptide backbone of equine cytochrome c, which are at unrelated positions in its amino acid sequence but form a continuous region on the surface of the tertiary structure of the protein, decrease significantly when it is bound by a monoclonal immunoglobulin.379 It was concluded that this region formed the epitope on the surface of the cytochrome c. When a synthetic peptide with the amino acid sequence from Alanine 1730 to Leucine 1747 in smooth muscle [myosin-light-chain] kinase from G. gallus, which was known to be the site to which calmodulin binds in the intact protein, associates with calmodulin, the rates of exchange of 12 of its amido protons decrease by factors between 10³ and 10⁶ as it forms the α helix embraced by the calmodulin in the complex.363

Results of such experiments, however, should be evaluated with caution. Effects on the exchange of amido protons can be felt at locations distant from the site of interaction because the association of a protein with a ligand always increases its global stability and consequently decreases the time its entire structure spends in open conformations. For example, the binding of thymidine 3′,5′-diphosphate to micrococcal nuclease causes the rate of exchange of at least 34 of its amido protons to decrease dramatically380 even though the site at which the ligand binds encompasses far fewer amino acids. In this instance, the binding of the ligand stabilizes the protein globally and decreases its conformational fluctuations. Likewise, when NADH is bound to dihydrodipicolinate reductase, the rates of exchange of amido protons in widely different locations showed significant decreases.355

Suggested Reading

Wand, A.J., Roder, H., & Englander, S.W. (1986) Two-dimensional 1H NMR studies of cytochrome c: hydrogen exchange in the N-terminal helix, Biochemistry 25, 1107–1114.

Electron Paramagnetic Resonance

If an atomic orbital or a molecular orbital contains a pair of electrons, the magnetic moments of their spins cancel because of the Pauli principle, and that orbital is diamagnetic. If an orbital contains a single unpaired electron, that electron has an uncancelled magnetic moment and that orbital is paramagnetic. There are several ways in which a molecule of protein can contain an orbital with an unpaired electron. A paramagnetic ion of a transition metal such as Mn²⁺, Fe³⁺, Co²⁺, Ni³⁺, or Cu²⁺ can be bound to the protein either on its own or within a coenzyme like the heme in ferrimyoglobin (Figure 4–18). A stable organic radical like the glycyl radical in formate C-acetyltransferase or the tyrosyl radical in ribonucleoside-diphosphate reductase can be formed by posttranslational modification of the protein (Table 3–1). A coenzyme bound to the protein can contain an organic radical.381 The protein can be modified with a reagent in which there is a stable organic radical, such as the one in a 1-oxyl-2,2,5,5-tetramethylpyrrolin-3-yl group:
[Structure 12–11: the 1-oxyl-2,2,5,5-tetramethylpyrrolin-3-yl group]

Such a functional group can be coupled covalently either by incorporating an unnatural amino acid containing it into the protein in a cell-free translation157 or by modifying the protein with an electrophilic reagent containing it.382,383

An unpaired electron can have a spin quantum number of +½ or –½, and these two quantum numbers dictate two respective spin states with two respective angular velocities of the same magnitude but opposite polarity. When an unpaired electron is placed in an external homogeneous magnetic field, the axis of its spin tends to align with the direction of the applied field. The two degenerate energy levels of the spinning electron are split into two distinct energy levels, one for the spin aligned in the direction of the magnetic field and one for the spin aligned in the direction opposed to the magnetic field. The difference in energy between these two spin states for a given population i of identical unpaired electrons, ΔEi, is directly proportional to the magnetic flux density Bi (tesla) at the location of unpaired electron i. The frequency νi (hertz) of electromagnetic energy that is absorbed by the population of electrons i during its transition between these spin states is

$$ \Delta E_i = g_e \mu_{\mathrm{B}} B_i = h \nu_i \qquad (12\text{–}64) $$

where ge is the g factor of a free electron (2.0023) and μB is the Bohr magneton (9.27 × 10⁻²⁴ J T⁻¹). At magnetic flux densities normally used for electron paramagnetic resonance (<2 T) the difference in energy is less than 20 J mol⁻¹, the energy contained in a photon of frequency less than 40 GHz, which is the microwave range of electromagnetic energy.

As with continuous-wave nuclear magnetic resonance spectrometers, an electron paramagnetic resonance spectrometer has a microwave generator of a fixed frequency, for example, 35 or 9 GHz, and the magnetic flux density is varied while the absorption of energy is monitored (Figure 12–34).384 Peaks of absorption are observed when Equation 12–64 is satisfied for a given population of identical electrons of unpaired spin. As with nuclear magnetic resonance, saturation of the absorption occurs readily in electron paramagnetic resonance owing to the small difference in the occupancy of the two spin states (1 < Ksp < 1.008) and the slow rate of relaxation between them. There are, however, several features of an electron paramagnetic resonance spectrum that distinguish it from a nuclear magnetic resonance spectrum.

For unpaired electrons on ions of transition metals, the intensity of the electron paramagnetic absorption increases and the width of the peak of absorption decreases dramatically as the temperature of the sample is lowered. For unpaired electrons on carbon, nitrogen, or oxygen, the intensity of the absorption also increases as the temperature is lowered. Consequently, electron paramagnetic resonance is often monitored while a sample of the protein containing the unpaired electron is in the frozen state at low temperature. For example, the electron paramagnetic resonance spectrum of aminocyclopropane carboxylate oxidase, the iron in whose heme had been complexed with nitrous oxide, was observed at 8 K,385 and that of the molybdenum–iron protein of nitrogenase, the metal cluster of which had been complexed with ethene, was observed at 2 K.386 At such low temperatures, molecular motions and chemical reactions are severely limited. The most convenient low temperature is 77 K, the boiling point of liquid nitrogen, but the spectra of many types of organic radicals can be observed at room temperature even though the amplitudes of the peaks of absorption are less than they would be at lower temperatures.

Figure 12–34: Electron paramagnetic resonance spectra of a 0.5 mM solution of CDP-6-deoxy-L-threo-D-glycero-4-hexulose-3-dehydrase in the presence of 10 mM CDP-6-deoxy-L-glycero-4-hexulose and 1 mM NADH frozen at 77 K.384 The bottom trace is the absorption as a function of the flux density of the magnetic field (tesla) at a carrier frequency of 9.05 GHz. The spectrum was obtained by varying the magnetic field while the microwave frequency remained fixed. The top trace is the first derivative (change in absorption/change in magnetic flux density) of the bottom trace; the dotted line sits on a value of zero for the first derivative. The [2Fe–2S] cluster in the protein contains an unpaired electron that absorbs at g factors of 2.012, 1.950, and 1.932, producing the three peaks of absorption in the two spectra at 0.321, 0.331, and 0.334 T, respectively. Reprinted with permission from ref 384. Copyright 1996 American Chemical Society.
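
The resonance condition of Equation 12–64 lends itself to a short calculation. The sketch below, offered only as an illustration, converts between microwave frequency and resonant flux density for a given g factor and estimates the ratio of occupancy of the two spin states from a Boltzmann factor. The physical constants are standard values, the temperature of 298 K is an assumption, and the function names are hypothetical.

```python
import math

H = 6.62607015e-34       # Planck constant, J s
MU_B = 9.2740100783e-24  # Bohr magneton, J T^-1
K_B = 1.380649e-23       # Boltzmann constant, J K^-1
G_E = 2.0023             # g factor of a free electron


def resonance_frequency(b_tesla, g=G_E):
    """Frequency (Hz) absorbed at flux density b_tesla: h nu = g mu_B B."""
    return g * MU_B * b_tesla / H


def resonance_field(freq_hz, g=G_E):
    """Flux density (T) at which a spin of g factor g absorbs freq_hz."""
    return H * freq_hz / (g * MU_B)


def population_ratio(freq_hz, temp_k):
    """Boltzmann ratio of the lower to the upper spin state."""
    return math.exp(H * freq_hz / (K_B * temp_k))


if __name__ == "__main__":
    # g factors and carrier frequency quoted for Figure 12-34.
    for g in (2.012, 1.950, 1.932):
        print(f"g = {g}: B = {resonance_field(9.05e9, g):.4f} T")
    # Assumed temperature of 298 K for the occupancy estimate.
    for freq in (9e9, 35e9):
        ratio = population_ratio(freq, 298.0)
        print(f"{freq / 1e9:.0f} GHz: ratio = {ratio:.5f}")
```

The flux densities computed for the three g factors at 9.05 GHz agree with the values quoted for Figure 12–34 to within about 0.001 T.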
The absorption of microwave energy is usually monitored as its first derivative with respect to the flux density of the magnetic field. Consequently, peaks of absorbance appear as pairs of positive and negative deflections representing the positive slope of the rising phase and the negative slope of the declining phase of the absorption itself, and the differential passes through zero at the maximum of each absorption (Figures 12–34 and 12–35).383

The absorption from a particular population of identical unpaired electrons is often split into two or more separate peaks by spin–spin coupling, either between the electron and magnetic nuclei connected to it by covalent bonds or between the electron and a magnetic nucleus on which it happens to reside. As in nuclear magnetic resonance, this spin–spin coupling results from local perturbations to the applied magnetic field. These perturbations are caused by differences in the orientations of the spins of the coupled nuclei, the magnetic fields of which are transmitted through the diamagnetic electrons in the covalent bonds surrounding the unpaired electron. As a result, the magnetic flux density sensed by a given electron i, Bi, is the sum of the flux density of the applied magnetic field, Bapp, and the flux densities of any local magnetic fields, Bloc, created by these coupled magnetic nuclei. The spin–spin splitting in electron paramagnetic resonance is referred to as hyperfine splitting; the resulting pattern of peaks, as hyperfine structure; and the spin–spin coupling, as hyperfine coupling.

The 1-oxyl-2,2,5,5-tetramethylpyrrolin-3-yl group (12–11) serves as a simple example of hyperfine splitting (Figure 12–35). In this stable free radical, the nucleus that dominates the local magnetic field is that of the 14nitrogen, which has a spin quantum number of 1. The 14nitrogen nucleus is quadripolar and can assume spins of +1, 0, and –1 with equal probability, because the distribution among its energy levels is insignificantly affected by the applied magnetic field. As a result, the local magnetic flux density, Bloc, created by the nitrogen nucleus assumes three values, one of which is zero. The spectrum consists of a central absorption, arising from unpaired electrons coupled to nitrogen nuclei of spin quantum number 0 and from unpaired electrons not coupled to any magnetic nucleus, and two peaks of hyperfine absorption on either side of the central peak, arising from unpaired electrons coupled to nitrogen nuclei of spin quantum number +1 and –1. The hyperfine absorptions are of variable magnitude and are split different distances from the central absorption depending on the quality of the coupling between the nitrogen and the electron, but the central absorption will always be fixed because it is at the position where the contribution of the 14nitrogen to the local magnetic flux density is zero. Information about environment, rotational diffusion, and anisotropy is contained in the hyperfine absorptions.

The full coupling of the electron to the 14nitrogen is expressed in aqueous solution (Figure 12–35) because the electronic structure in which the radical occupies a p orbital over nitrogen, a distribution which requires a separation of charge, can be readily solvated by the water. In nonpolar environments such as within a molecule of protein, the nitroxyl radical can shift its hybridization to form an ethenyl molecular orbital system

[Structure 12–12: the N–O molecular orbital system, with the bonding orbital and the antibonding orbital labeled]

composed from one of the lone pairs on oxygen and the radical. The unpaired electron occupies the antibonding molecular orbital and spends more of its time over oxygen, which is diamagnetic. This delocalization decreases the effect of the quadripolar nucleus of the 14nitrogen, and the two hyperfine components decrease accordingly in intensity.

Hyperfine coupling can be used to draw conclusions about the properties and location of the unpaired electron. The unpaired electron on the glycyl radical in formate C-acetyltransferase is unaffected by the diamagnetic 12carbon on which it resides but is split into two peaks by the single 1hydrogen on that carbon (Figure 12–36).387 When formate C-acetyltransferase is transferred to [2H]H2O, the 1hydrogen exchanges with 2hydrogen in the solution in a reaction catalyzed by a nearby cysteine, and the spin–spin splitting disappears. Tryptophan tryptophylquinone is present in amine dehydrogenase as a posttranslational modification (Table 3–1, Figure 3–18). A model compound for tryptophan tryptophylquinone in which the two +H3N(COO–)CH– groups of the bis(amino acid) are replaced with hydrogens can be reduced with one electron to produce the semiquinone with an unpaired elec-

Figure 12–35: Electron paramagnetic spectrum of 3-carbamoyl-1-oxyl-2,2,5,5-tetramethylpyrroline (see 12–11) in water at room temperature.383 The scale indicates the dimension of the horizontal axis in units of magnetic flux density (tesla). The carrier frequency of the spectrophotometer was 9.5 GHz. The first derivative of the absorption (change in absorption/change in magnetic flux density) is presented. Reprinted with permission from ref 383. Copyright 1965 held by authors.
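
The pattern of hyperfine lines described for the nitroxyl radical and for the glycyl radical can be sketched with a few lines of arithmetic. The example below is an illustration under stated assumptions rather than a treatment from the text: it supposes a single coupled nucleus of spin I that shifts the resonance by equal steps, giving 2I + 1 lines centered on the unsplit position. The central flux density and the splitting are hypothetical values, and the function name is illustrative.

```python
from fractions import Fraction


def hyperfine_fields(b_center, splitting, nuclear_spin):
    """Flux densities (T) of the 2I + 1 lines from one coupled nucleus.

    b_center is the position of the unsplit absorption, splitting is
    the spacing between adjacent lines, and nuclear_spin is I
    (1 for 14nitrogen, 0.5 for 1hydrogen).
    """
    spin = Fraction(nuclear_spin)
    n_lines = int(2 * spin) + 1
    # Spin orientations run from -I to +I in steps of one.
    m_values = [-spin + k for k in range(n_lines)]
    return [b_center + float(m) * splitting for m in m_values]


if __name__ == "__main__":
    # Hypothetical values: a center near 0.339 T and a 1.6 mT splitting.
    print("14nitrogen, I = 1:  ", hyperfine_fields(0.339, 0.0016, 1))
    print("1hydrogen, I = 1/2:", hyperfine_fields(0.339, 0.0016, 0.5))
```

For a spin of 1 the middle line stays at the unsplit position, which mirrors the fixed central absorption described for the 14nitrogen; for a spin of ½ there are two lines and none at the center.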
$$ g = \frac{h \nu_0}{B_{\mathrm{app}}\, \mu_{\mathrm{B}}} \qquad (12\text{–}65) $$
Selected Data for Molecular Biology (Sober, H., Ed.) pp C-3 to C-12, CRC Press, Cleveland, OH.
17. Zubrzycki, I.Z., Frankel, L.K., Russo, P.S., & Bricker, Biochemistry 37, 13553–13558.
18. Phillips, M.L., Lembertas, A.V., Schumaker, V.N., Lawn, R.M., Shire, S.J., & Zioncheck, T.F. (1993) Biochemistry 32, 3722–3728.
19. Mani, R.S., Karimi-Busheri, F., Cass, C.E., & Weinfeld, M. (2001) Biochemistry 40, 12967–12973.
20. Simha, R. (1940) J. Phys. Chem. 44, 25–34.
21. Van Holde, K.E. (1985) Physical Biochemistry, 2nd ed., Prentice-Hall, Englewood Cliffs, NJ.
22. Rocco, M., Infusini, E., Daga, M.G., Gogioso, L., & Cuniberti, C. (1987) EMBO J. 6, 2343–2349.
23. Levinson, B.L., Pickover, C.A., & Richards, F.M. (1983) J. Biol. Chem. 258, 10967–10972.
24. Lai, C.S., Wolff, C.E., Novello, D., Griffone, L., Cuniberti, C., Molina, F., & Rocco, M. (1993) J. Mol. Biol. 230, 625–640.
25. Kataoka, M., Nishii, I., Fujisawa, T., Ueki, T., Tokunaga, F., & Goto, Y. (1995) J. Mol. Biol. 249, 215–228.
26. Olah, G.A., Mitchell, R.D., Sosnick, T.R., Walsh, D.A., & Trewhella, J. (1993) Biochemistry 32, 3649–3657.
27. Trewhella, J., Carlson, V.A., Curtis, E.H., & Heidorn, D.B. (1988) Biochemistry 27, 1121–1125.
28. Guinier, A. (1939) Ann. Phys. 12, 161–237.
29. Vachette, P., Koch, M.H., & Svergun, D.I. (2003) Methods Enzymol. 374, 584–615.
30. Chacon, P., Diaz, J.F., Moran, F., & Andreu, J.M. (2000) J. Mol. Biol. 299, 1289–1302.
31. Koch, M.H., Vachette, P., & Svergun, D.I. (2003) Q. Rev. Biophys. 36, 147–227.
32. Perkins, S.J., Nealis, A.S., Sutton, B.J., & Feinstein, A. (1991) J. Mol. Biol. 221, 1345–1366.
33. Shilton, B.H., Flocco, M.M., Nilsson, M., & Mowbray, S.L. (1996) J. Mol. Biol. 264, 350–363.
34. Grossmann, J.G., Neu, M., Pantos, E., Schwab, F.J., Evans, R.W., Townes-Andrews, E., Lindley, P.F., Appel, H., Thies, W.G., & Hasnain, S.S. (1992) J. Mol. Biol. 225, 811–819.
35. Svergun, D., Barberato, C., & Koch, M.H.J. (1995) J. Appl. Crystallogr. 28, 768–773.
36. Grossman, J.G., Hasnain, S.S., Yousafzai, F.K., Smith, B.E., & Eady, R.R. (1997) J. Mol. Biol. 266, 642–648.
37. Mendelson, R.A., Schneider, D.K., & Stone, D.B. (1996) J. Mol. Biol. 256, 1–7.
38. DiCapua, E., Schnarr, M., Ruigrok, R.W., Lindner, P., & Timmins, P.A. (1990) J. Mol. Biol. 214, 557–570.
39. Perkins, S.J., Nealis, A.S., & Sim, R.B. (1991) Biochemistry 30, 2847–2857.
40. Perkins, S.J., Nealis, A.S., & Sim, R.B. (1990) Biochemistry 29, 1167–1175.
41. Capel, M.S., Kjeldgaard, M., Engelman, D.M., & Moore, P.B. (1988) J. Mol. Biol. 200, 65–87.
42. Moore, P.B., & Engelman, D.M. (1979) Methods Enzymol. 59, 629–638.
43. Brodersen, D.E., Clemons, W.M., Jr., Carter, A.P., Wimberly, B.T., & Ramakrishnan, V. (2002) J. Mol. Biol. 316, 725–768.
44. Grossmann, J.G., Sharff, A.J., O’Hare, P., & Luisi, B. (2001) Biochemistry 40, 6267–6274.
45. Svergun, D.I., Barberato, C., Koch, M.H., Fetler, L., & Vachette, P. (1997) Proteins: Struct., Funct., Genet. 27, 110–117.
46. Pickover, C.A., McKay, D.B., Engelman, D.M., & Steitz, T.A. (1979) J. Biol. Chem. 254, 11323–11329.
47. Jin, L., Stec, B., Lipscomb, W.N., & Kantrowitz, E.R. (1999) Proteins: Struct., Funct., Genet. 37, 729–742.
48. Fetler, L., & Vachette, P. (2001) J. Mol. Biol. 309, 817–832.
49. Valentine, R.C., Shapiro, B.M., & Stadtman, E.R. (1968) Biochemistry 7, 2143–2152.
50. Richards, K.E., & Williams, R.C. (1972) Biochemistry 11, 3393–3395.
51. Williams, R.C. (1981) J. Mol. Biol. 150, 399–408.
52. Yang, Z., Kollman, J.M., Pandi, L., & Doolittle, R.F. (2001) Biochemistry 40, 12515–12523.
53. Koch, M., Bohrmann, B., Matthison, M., Hagios, C., Trueb, B., & Chiquet, M. (1995) J. Cell Biol. 130, 1005–1014.
54. Koch, M., Bernasconi, C., & Chiquet, M. (1992) Eur. J. Biochem. 207, 847–856.
55. Shotton, D.M., Burke, B.E., & Branton, D. (1979) J. Mol. Biol. 131, 303–329.
56. Schramm, H.J., & Jennissen, H.P. (1985) J. Mol. Biol. 181, 503–516.
57. Suzuki, K., Dahlbeack, B., & Stenflo, J. (1982) J. Biol. Chem. 257, 6556–6564.
58. Laue, T.M., Johnson, A.E., Esmon, C.T., & Yphantis, D.A. (1984) Biochemistry 23, 1339–1348.
59. Dahlbeack, B. (1986) J. Biol. Chem. 261, 9495–9501.
60. Fox, J.W., Mayer, U., Nischt, R., Aumailley, M., Reinhardt, D., Wiedemann, H., Mann, K., Timpl, R., Krieg, T., Engel, J., et al. (1991) EMBO J. 10, 3137–3146.
61. Ertl, H., Hallmann, A., Wenzl, S., & Sumper, M. (1992) EMBO J. 11, 2055–2062.
62. Sasaki, T., Kostka, G., Gohring, W., Wiedemann, H., Mann, K., Chu, M.L., & Timpl, R. (1995) J. Mol. Biol. 245, 241–250.
63. Wang, C.L., Chalovich, J.M., Graceffa, P., Lu, R.C., Mabuchi, K., & Stafford, W.F. (1991) J. Biol. Chem. 266, 13958–13963.
64. Voss, T., Eistetter, H., Schafer, K.P., & Engel, J. (1988) J. Mol. Biol. 201, 219–227.
65. Boisset, N., Taveau, J.C., Pochon, F., Barray, M., Delain, E., & Lamy, J.N. (1991) J. Struct. Biol. 106, 31–41.
66. Crowther, R.A. (1971) Philos. Trans. R. Soc. London, Ser. B: Biol. Sci. 261, 221–230.
67. Kessel, M., Frank, J., & Goldfarb, W. (1980) J. Supramol. Struct. 14, 405–422.
68. Baker, T.S., Newcomb, W.W., Booy, F.P., Brown, J.C., & Steven, A.C. (1990) J. Virol. 64, 563–573.
69. Fuller, S.D. (1987) Cell 48, 923–934.
70. Baker, T.S., Newcomb, W.W., Olson, N.H., Cowsert, L.M., Olson, C., & Brown, J.C. (1991) Biophys. J. 60, 1445–1456.
71. Crowther, R.A., Amos, L.A., Finch, J.T., De Rosier, D.J., & Klug, A. (1970) Nature 226, 421–425.
72. Kolatkar, P.R., Bella, J., Olson, N.H., Bator, C.M., Baker, T.S., & Rossmann, M.G. (1999) EMBO J. 18, 6249–6259.
73. Olson, N.H., Kolatkar, P.R., Oliveira, M.A., Cheng, R.H., Greve, J.M., McClelland, A., Baker, T.S., & Rossmann, M.G. (1993) Proc. Natl. Acad. Sci. U.S.A. 90, 507–511.
652 Physical Measurements of Structure
74. Radermacher, M., Wagenknecht, T., Verschoor, A., & 104. Wang, Y., Purrello, R., Georgiou, S., & Spiro, T.G. (1991)
Frank, J. (1986) J. Microsc. 141, RP1–2. J. Am. Chem. Soc. 113, 6368–6377.
75. Boisset, N., Penczek, P., Pochon, F., Frank, J., & Lamy, 105. Chi, Z., Chen, X.G., Holtz, J.S., & Asher, S.A. (1998)
J. (1993) J. Mol. Biol. 232, 522–529. Biochemistry 37, 2854–2864.
76. Radermacher, M., Wagenknecht, T., Verschoor, A., & 106. Susi, H., & Byler, D.M. (1986) Methods Enzymol. 130,
Frank, J. (1987) EMBO J. 6, 1107–1114. 290–311.
77. Carazo, J.M., Wagenknecht, T., Radermacher, M., 107. Lee, D.C., Haris, P.I., Chapman, D., & Mitchell, R.C.
Mandiyan, V., Boublik, M., & Frank, J. (1988) J. Mol. (1990) Biochemistry 29, 9185–9193.
Biol. 201, 393–404. 108. Holloway, P.W., & Mantsch, H.H. (1989) Biochemistry
78. Ban, N., Nissen, P., Hansen, J., Capel, M., Moore, P.B., 28, 931–935.
& Steitz, T.A. (1999) Nature 400, 841–847. 109. Heimburg, T., Schuenemann, J., Weber, K., & Geisler,
79. Shulman, S. (1953) J. Am. Chem. Soc. 75, 5846–5852. N. (1996) Biochemistry 35, 1375–1382.
80. Lowey, S., Slayter, H.S., Weeds, A.G., & Baker, H. (1969) 110. Garfinkel, D., & Edsall, J.T. (1958) J. Am. Chem. Soc. 80,
J. Mol. Biol. 42, 1–29. 3818–3822.
81. Rayment, I., Rypniewski, W.R., Schmidt-Base, K., Smith, 111. Holzwarth, G., & Doty, P. (1965) J. Am. Chem. Soc. 87,
R., Tomchick, D.R., Benning, M.M., Winkelmann, D.A., 218–228.
Wesenberg, G., & Holden, H.M. (1993) Science 261, 50–58. 112. Beychok, S. (1967) in Poly-a-Amino Acids: Protein
82. Tsao, T.C., Bailey, K., & Adair, G.S. (1951) J. Biochem. Models for Conformational Studies (Fasman, G.D., Ed.)
(Tokyo) 49, 27–36. pp 293–337, Marcel Dekker, New York.
83. Holtzer, A., & Lowey, S. (1959) J. Am. Chem. Soc. 81, 113. Yu, C.A., Yong, F.C., Yu, L., & King, T.E. (1971) Biochem.
1370–1377. Biophys. Res. Commun. 45, 508–513.
84. Yoshimura, T., Kameyama, K., Maezawa, S., & Takagi, 114. Angelaccio, S., Pascarella, S., Fattori, E., Bossa, F.,
T. (1991) Biochemistry 30, 4528–4534. Strong, W., & Schirch, V. (1992) Biochemistry 31,
85. Roberts, J.D., & Caserio, M.C. (1977) Basic Principles of 155–162.
Organic Chemistry, 2nd ed., W. A. Benjamin, Menlo 115. Slutter, C.E., Sanders, D., Wittung, P., Malmstrom,
Park, CA. B.G., Aasa, R., Richards, J.H., Gray, H.B., & Fee, J.A.
86. Kandori, H., Yoshihara, K., & Tokutomi, S. (1992) J. Am. (1996) Biochemistry 35, 3387–3395.
Chem. Soc. 114, 10958–10959. 116. MacColl, R., Williams, E.C., Eisele, L.E., &
87. Susi, H. (1972) Methods Enzymol. 26C, 455–472. McNaughton, P. (1994) Biochemistry 33, 6418–6423.
88. Dong, A., Huang, P., & Caughey, W.S. (1990) 117. Brahms, S., & Brahms, J. (1980) J. Mol. Biol. 138,
Biochemistry 29, 3303–3308. 149–178.
89. Bussian, B.M., & Sander, C. (1989) Biochemistry 28, 118. Moffitt, W. (1956) Proc. Natl. Acad. Sci. U.S.A. 42,
4271–4277. 736–746.
90. Chetverin, A.B., & Brazhnikov, E.V. (1985) J. Biol. Chem. 119. Greenfield, N., & Fasman, G.D. (1969) Biochemistry 8,
260, 7817–7819. 4108–4116.
91. Challou, N., Goormaghtigh, E., Cabiaux, V., Conrath, 120. Griffin, J.H., Rosenbusch, J.P., Weber, K.K., & Blout,
K., & Ruysschaert, J.M. (1994) Biochemistry 33, E.R. (1972) J. Biol. Chem. 247, 6482–6490.
6902–6910. 121. Gresalfi, T.J., & Wallace, B.A. (1984) J. Biol. Chem. 259,
92. Fahmy, K., Weidlich, O., Engelhard, M., Sigrist, H., & 2622–2628.
Siebert, F. (1993) Biochemistry 32, 5862–5869. 122. Renzoni, D.A., Pugh, D.J., Siligardi, G., Das, P., Morton,
93. Potter, W.T., Houtchens, R.A., & Caughey, W.S. (1985) C.J., Rossi, C., Waterfield, M.D., Campbell, I.D., &
J. Am. Chem. Soc. 107, 3350–3352. Ladbury, J.E. (1996) Biochemistry 35, 15646–15653.
94. Sage, J.T., & Jee, W. (1997) J. Mol. Biol. 274, 21–26. 123. Edelhoch, H. (1967) Biochemistry 6, 1948–1954.
95. Lord, R.C., & Yu, N.T. (1970) J. Mol. Biol. 51, 203–213. 124. Fasman, G.D. (1975–1977), Handbook of Biochemistry
96. Benevides, J.M., Kukolj, G., Autexier, C., Aubrey, K.L., and Molecular Biology, 3rd ed., Vol. I, pp 186, CRC
DuBow, M.S., & Thomas, G.J., Jr. (1994) Biochemistry Press, Cleveland, OH.
33, 10701–10710. 125. Cuatrecasas, P., Fuchs, S., & Anfinsen, C.B. (1968) J.
97. Overman, S.A., & Thomas, G.J., Jr. (1999) Biochemistry Biol. Chem. 243, 4787–4798.
38, 4018–4027. 126. Kirtley, M.E., & Koshland, D.E., Jr. (1972) Methods
98. Whiting, A.K., & Peticolas, W.L. (1994) Biochemistry 33, Enzymol. 26C, 578–601.
552–561. 127. Jacobson, G.R., & Stark, G.R. (1973) J. Biol. Chem. 248,
99. Duff, L.L., Appelman, E.H., Shriver, D.F., & Klotz, I.M. 8003–8014.
(1979) Biochem. Biophys. Res. Commun. 90, 1098–1103. 128. Wang, C., Yang, Y.R., Hu, C.Y., & Schachman, H.K.
100. Carey, P.R., Schneider, H., & Bernstein, H.J. (1972) (1981) J. Biol. Chem. 256, 7028–7034.
Biochem. Biophys. Res. Commun. 47, 588–595. 129. Lakowicz, J.R., & Weber, G. (1973) Biochemistry 12,
101. Ling, J., Nestor, L.P., Czernuszewicz, R.S., Spiro, T.G., 4171–4179.
Fraczkiewicz, R., Sharma, K.D., Loehr, T.M., & Sanders- 130. Calhoun, D.B., Vanderkooi, J.M., & Englander, S.W.
Loehr, J. (1994) J. Am. Chem. Soc. 116, 7682–7691. (1983) Biochemistry 22, 1533–1539.
102. Kahlow, M.A., Zuberi, T.M., Gennis, R.B., & Loehr, T.M. 131. Szpikowska, B.K., Beechem, J.M., Sherman, M.A., &
(1991) Biochemistry 30, 11485–11489. Mas, M.T. (1994) Biochemistry 33, 2217–2225.
103. Hildebrandt, P., Matysik, J., Schrader, B., Scharf, B., & 132. Nishimura, J.S., Mann, C.J., Ybarra, J., Mitchell, T., &
Engelhard, M. (1994) Biochemistry 33, 11426–11431. Horowitz, P.M. (1990) Biochemistry 29, 862–865.
References 653
133. Nelson, S.W., Iancu, C.V., Choe, J.Y., Honzatko, R.B., & 164. Tamada, T., Kitadokoro, K., Higuchi, Y., Inaka, K.,
Fromm, H.J. (2000) Biochemistry 39, 11100–11106. Yasui, A., de Ruiter, P.E., Eker, A.P., & Miki, K. (1997)
134. Divita, G., Rittinger, K., Restle, T., Immendorfer, U., & Nat. Struct. Biol. 4, 887–891.
Goody, R.S. (1995) Biochemistry 34, 16337–16346. 165. Kim, S.T., Heelis, P.F., Okamura, T., Hirata, Y., Mataga,
135. Willaert, K., Loewenthal, R., Sancho, J., Froeyen, M., N., & Sancar, A. (1991) Biochemistry 30, 11262–11270.
Fersht, A., & Engelborghs, Y. (1992) Biochemistry 31, 166. Wu, P., Li, Y.K., Talalay, P., & Brand, L. (1994)
711–716. Biochemistry 33, 7415–7422.
136. Harris, D.L., & Hudson, B.S. (1990) Biochemistry 29, 167. Knighton, D.R., Zheng, J.H., Ten Eyck, L.F., Ashford,
5276–5285. V.A., Xuong, N.H., Taylor, S.S., & Sowadski, J.M. (1991)
137. Lehrer, S.S. (1971) Biochemistry 10, 3254–3263. Science 253, 407–414.
138. Eftink, M.R., & Ghiron, C.A. (1975) Proc. Natl. Acad. Sci. 168. Grossman, S.H. (1989) Biochemistry 28, 4894–4902.
U.S.A. 72, 3290–3294. 169. Rao, J.K., Bujacz, G., & Wlodawer, A. (1998) FEBS Lett.
139. Varley, P.G., & Pain, R.H. (1991) J. Mol. Biol. 220, 439, 133–137.
531–538. 170. Kempe, T.D., & Stark, G.R. (1975) J. Biol. Chem. 250,
140. Calhoun, D.B., Vanderkooi, J.M., Woodrow, G.V.d., & 6861–6869.
Englander, S.W. (1983) Biochemistry 22, 1526–1532. 171. Hahn, L.H., & Hammes, G.G. (1978) Biochemistry 17,
141. Prasad, A.R., Nishimura, J.S., & Horowitz, P.M. (1983) 2423–2429.
Biochemistry 22, 4272–4275. 172. Jin, L., Stec, B., & Kantrowitz, E.R. (2000) Biochemistry
142. Wolodko, W.T., Fraser, M.E., James, M.N., & Bridger, 39, 8058–8066.
W.A. (1994) J. Biol. Chem. 269, 10883–10890. 173. Kosman, R.P., Gouaux, J.E., & Lipscomb, W.N. (1993)
143. Merrill, A.R., Palmer, L.R., & Szabo, A.G. (1993) Proteins: Struct., Funct. Genet. 15, 147–176.
Biochemistry 32, 6974–6981. 174. Gouaux, J.E., & Lipscomb, W.N. (1990) Biochemistry 29,
144. Stryer, L., & Haugland, R.P. (1967) Proc. Natl. Acad. Sci. 389–402.
U.S.A. 58, 719–726. 175. Brejc, K., van Dijk, W.J., Klaassen, R.V., Schuurmans,
145. Wu, C.W., & Stryer, L. (1972) Proc. Natl. Acad. Sci. U.S.A. M., van Der Oost, J., Smit, A.B., & Sixma, T.K. (2001)
69, 1104–1108. Nature 411, 269–276.
146. Latt, S.A., Cheung, H.T., & Blout, E.R. (1965) J. Am. 176. Amir, D., & Haas, E. (1987) Biochemistry 26, 2162–
Chem. Soc. 87, 995–1003. 2175.
147. Forster, T. (1948) Ann. Phys. (Berlin) [6 Folge] 2, 55–75. 177. Babu, Y.S., Sack, J.S., Greenhough, T.J., Bugg, C.E.,
148. Berman, H.A., Yguerabide, J., & Taylor, P. (1980) Means, A.R., & Cook, W.J. (1985) Nature 315, 37–40.
Biochemistry 19, 2226–2235. 178. Li, F., Gangal, M., Juliano, C., Gorfain, E., Taylor, S.S., &
149. Hansen, J.E., Longworth, J.W., & Fleming, G.R. (1990) Johnson, D.A. (2002) J. Mol. Biol. 315, 459–469.
Biochemistry 29, 7329–7338. 179. Knighton, D.R., Bell, S.M., Zheng, J., Ten Eyck, L.F.,
150. Ellis, J., Bagshaw, C.R., & Shaw, W.V. (1995) Xuong, N.H., Taylor, S.S., & Sowadski, J.M. (1993) Acta
Biochemistry 34, 3513–3520. Crystallogr. D 49, 357–361.
151. Steiner, R.F., Albaugh, S., & Kilhoffer, M.C. (1991) J. 180. Allen, D.J., & Benkovic, S.J. (1989) Biochemistry 28,
Fluoresc. 1, 15–22. 9586–9593.
152. Yamashita, S., Nishimoto, E., Szabo, A.G., & Yamasaki, 181. McWherter, C.A., Haas, E., Leed, A.R., & Scheraga, H.A.
N. (1996) Biochemistry 35, 531–537. (1986) Biochemistry 25, 1951–1963.
153. Hu, L., & Colman, R.F. (1997) Biochemistry 36, 182. Xing, J., Forsee, W.T., Lamani, E., Maltsev, S.D.,
1635–1645. Danilov, L.L., Shibaev, V.N., Schutzbach, J.S., Cheung,
154. First, E.A., Johnson, D.A., & Taylor, S.S. (1989) H.C., & Jedrzejas, M.J. (2000) Biochemistry 39,
Biochemistry 28, 3606–3613. 7886–7894.
155. Lillo, M.P., Beechem, J.M., Szpikowska, B.K., Sherman, 183. Patel, L.R., Curran, T., & Kerppola, T.K. (1994) Proc.
M.A., & Mas, M.T. (1997) Biochemistry 36, 11261–11272. Natl. Acad. Sci. U.S.A. 91, 7360–7364.
156. Pober, J.S., Iwanij, V., Reich, E., & Stryer, L. (1978) 184. Lin, S.H., & Faller, L.D. (1996) Biochemistry 35,
Biochemistry 17, 2163–2168. 8419–8428.
157. Cornish, V.W., Benson, D.R., Altenbach, C.A., Hideg, K., 185. Tao, T., Gowell, E., Strasburg, G.M., Gergely, J., &
Hubbell, W.L., & Schultz, P.G. (1994) Proc. Natl. Acad. Leavis, P.C. (1989) Biochemistry 28, 5902–5908.
Sci. U.S.A. 91, 2910. 186. Bilderback, T., Fulmer, T., Mantulin, W.W., & Glaser, M.
158. Steward, L.E., Collins, C.S., Gilmore, M.A., Carlson, J.E., (1996) Biochemistry 35, 6100–6106.
Ross, J.B.A., & Chamberlin, A.R. (1997) J. Am. Chem. 187. Hadfield, A.T., Harvey, D.J., Archer, D.B., MacKenzie,
Soc. 119, 6–11. D.A., Jeenes, D.J., Radford, S.E., Lowe, G., Dobson,
159. Carraway, K.L.r., Koland, J.G., & Cerione, R.A. (1990) C.M., & Johnson, L.N. (1994) J. Mol. Biol. 243, 856–872.
Biochemistry 29, 8741–8747. 188. Alley, S.C., Abel-Santos, E., & Benkovic, S.J. (2000)
160. Fernando Valenzuela, C., Weign, P., Yguerabide, J., & Biochemistry 39, 3076–3090.
Johnson, D.A. (1994) Biophys. J. 66, 674–682. 189. Adams, S.R., Harootunian, A.T., Buechler, Y.J., Taylor,
161. Ahn, T., Guengerich, F.P., & Yun, C.-H. (1998) S.S., & Tsien, R.Y. (1991) Nature 349, 694–697.
Biochemistry 37, 12860–12866. 190. Hall, J., Moubarak, A., O’Brien, P., Pan, L.P., Cho, I., &
162. Dale, R.E., Eisinger, J., & Blumberg, W.E. (1979) Millett, F. (1988) J. Biol. Chem. 263, 8142–8149.
Biophys. J. 26, 161–193. 191. Takashi, R., & Kasprzak, A.A. (1987) Biochemistry 26,
163. Wu, P., & Brand, L. (1992) Biochemistry 31, 7939–7947. 7471–7477.
654 Physical Measurements of Structure
192. Nomanbhoy, T.K., Erickson, J.W., & Cerione, R.A. L.L., Hupe, D.J., & Zuiderweg, E.R. (1993) Biochemistry
(1999) Biochemistry 38, 1744–1750. 32, 13109–13122.
193. Latham, G.J., Pietroni, P., Dong, F., Young, M.C., & von 221. Vuister, G.W., & Bax, A. (1993) J. Am. Chem. Soc. 115,
Hippel, P.H. (1996) J. Mol. Biol. 264, 426–439. 7772–7777.
194. Kohler, J.J., & Schepartz, A. (2001) Biochemistry 40, 222. Archer, S.J., Ikura, M., Torchia, D.A., & Bax, A. (1991) J.
130–142. Magn. Reson. (1969–1992) 95, 636–641.
195. Chabbert, M., Cazenave, C., & Helene, C. (1987) 223. Grzesiek, S., Kuboniwa, H., Hinck, A.P., & Bax, A. (1995)
Biochemistry 26, 2218–2225. J. Am. Chem. Soc. 117, 5312–5315.
196. Bjornson, K.P., Amaratunga, M., Moore, K.J., & 224. Boucher, W., Laue, E.D., Campbell-Burk, S., &
Lohman, T.M. (1994) Biochemistry 33, 14306–14316. Domaille, P.J. (1992) J. Am. Chem. Soc. 114, 2262–2264.
197. Gumbs, O.H., & Shaner, S.L. (1998) Biochemistry 37, 225. Bax, A., & Ikura, M. (1991) J. Biomol. NMR 1, 99–104.
11692–11706. 226. Grzesiek, S., & Bax, A. (1992) J. Magn. Reson.
198. Lorenz, M., Hillisch, A., Payet, D., Buttinelli, M., (1969–1992) 96, 432–440.
Travers, A., & Diekmann, S. (1999) Biochemistry 38, 227. Grzesiek, S., & Bax, A. (1992) J. Am. Chem. Soc. 114,
12150–12158. 6291–6293.
199. Jung, K., Jung, H., Wu, J., Prive, G.G., & Kaback, H.R. 228. Grzesiek, S., Ikura, M., Clore, G.M., Gronenborn, A.M., &
(1993) Biochemistry 32, 12273–12278. Bax, A. (1992) J. Magn. Reson. (1969–1992) 96, 215–221.
200. Zhan, H., Choe, S., Huynh, P.D., Finkelstein, A., 229. Grzesiek, S., & Bax, A. (1993) J. Biomol. NMR 3, 185–204.
Eisenberg, D., & Collier, R.J. (1994) Biochemistry 33, 230. Yamazaki, T., Forman-Kay, J.D., & Kay, L.E. (1993) J.
11254–11263. Am. Chem. Soc. 115, 11054–11055.
201. Wingfield, P.T., Stahl, S.J., Williams, R.W., & Steven, 231. Morris, G.A., & Freeman, R. (1979) J. Am. Chem. Soc.
A.C. (1995) Biochemistry 34, 4919–4932. 101, 760–762.
202. Crowther, R.A., Kiselev, N.A., Bottcher, B., Berriman, 232. Mueller, L. (1979) J. Am. Chem. Soc. 101, 4481–4484.
J.A., Borisova, G.P., Ose, V., & Pumpens, P. (1994) Cell 233. Bodenhausen, G., & Ruben, D.J. (1980) Chem. Phys.
77, 943–950. Lett. 69, 185–189.
203. Gunther, H. (1995) NMR spectroscopy: basic principles, 234. Zuiderweg, E.R.P. (1990). J. Magn. Reson. (1969–1992)
concepts, and applications in chemistry, 2nd ed., Wiley, 86, 346–357.
New York. 235. Bax, A., Freeman, R., & Kempsell, S.P. (1980) J. Am.
204. Derome, A.E. (1987) Modern NMR techniques for Chem. Soc. 102, 4849–4851.
chemistry research, Vol. 6, Pergamon Press, Oxford, 236. Rance, M., Soerensen, O.W., Bodenhausen, G.,
England. Wagner, G., Ernst, R.R., & Wuethrich, K. (1983)
205. Zhao, D., & Jardetzky, O. (1994) J. Mol. Biol. 239, Biochem. Biophys. Res. Commun. 117, 479–485.
601–607. 237. Piantini, U., Sorensen, O.W., & Ernst, R.R. (1982) J. Am.
206. Gordon, S.L., & Wuethrich, K. (1978) J. Am. Chem. Soc. Chem. Soc. 104, 6800–6801.
100, 7094–7096. 238. Braunschweiler, L., & Ernst, R.R. (1983) J. Magn. Reson.
207. Williams, G., Moore, G.R., Porteous, R., Robinson, (1969–1992) 53, 521–528.
M.N., Soffe, N., & Williams, R.J. (1985) J. Mol. Biol. 183, 239. Davis, D.G., & Bax, A. (1985) J. Am. Chem. Soc. 107,
409–428. 2820–2821.
208. Mandel, M. (1965) J. Biol. Chem. 240, 1586–1592. 240. Eich, G., Bodenhausen, G., & Ernst, R.R. (1982) J. Am.
209. Wagner, G., Kumar, A., & Weuthrich, K. (1981) Eur. J. Chem. Soc. 104, 3731–3732.
Biochem. 114, 375–384. 241. Bax, A., & Davis, D.G. (1985) J. Magn. Reson.
210. Wagner, G., & Weuthrich, K. (1982) J. Mol. Biol. 155, (1969–1992) 65, 355–360.
347–366. 242. Westler, W.M., Kainosho, M., Nagao, H., Tomonaga, N.,
211. Aue, W.P., Bartholdi, E., & Ernst, R.R. (1976) J. Chem. & Markley, J.L. (1988) J. Am. Chem. Soc. 110, 4093–4095.
Phys. 64, 2229–2246. 243. Bax, A., Sparks, S.W., & Torchia, D.A. (1988) J. Am.
212. Jeener, J., Meier, B.H., Bachmann, P., & Ernst, R.R. Chem. Soc. 110, 7926–7927.
(1979) J. Chem. Phys. 71, 4546–4553. 244. Bax, A., Kay, L.E., Sparks, S.W., & Torchia, D.A. (1989) J.
213. Ikura, M., Kay, L.E., & Bax, A. (1990) Biochemistry 29, Am. Chem. Soc. 111, 408–409.
4659–4667. 245. Nietlispach, D., Clowes, R.T., Broadhurst, R.W., Ito, Y.,
214. Yang, D., & Kay, L.E. (1999) J. Am. Chem. Soc. 121, Keeler, J., Kelly, M., Ashurst, J., Oschkinat, H., Domaille,
2571–2575. P.J., & Laue, E.D. (1996) J. Am. Chem. Soc. 118, 407–415.
215. Grzesiek, S., & Bax, A. (1992) J. Magn. Reson. (1969– 246. LeMaster, D.M., & Richards, F.M. (1988) Biochemistry
1992) 99, 201–207. 27, 142–150.
216. Wittekind, M., & Mueller, L. (1993) J. Magn. Reson. Ser. 247. Pervushin, K., Riek, R., Wider, G., & Wuthrich, K. (1997)
B 101, 201–205. Proc. Natl. Acad. Sci. U.S.A. 94, 12366–12371.
217. Kay, L.E. (1993) J. Am. Chem. Soc. 115, 2055–2057. 248. Riek, R., Wider, G., Pervushin, K., & Wuthrich, K. (1999)
218. Bax, A., Clore, G.M., & Gronenborn, A.M. (1990) J. Proc. Natl. Acad. Sci. U.S.A. 96, 4918–4923.
Magn. Reson. (1969–1992) 88, 425–431. 249. Oh, B.H., Mooberry, E.S., & Markley, J.L. (1990)
219. Fesik, S.W., Eaton, H.L., Olejniczak, E.T., Zuiderweg, Biochemistry 29, 4004–4011.
E.R.P., McIntosh, L.P., & Dahlquist, F.W. (1990) J. Am. 250. Mooberry, E.S., Oh, B.H., & Markley, J.L. (1989) J. Magn.
Chem. Soc. 112, 886–888. Reson. (1969–1992) 85, 147–149.
220. Van Doren, S.R., Kurochkin, A.V., Ye, Q.Z., Johnson, 251. Stockman, B.J., Nirmala, N.R., Wagner, G., Delcamp,
References 655
T.J., DeYarman, M.T., & Freisheim, J.H. (1992) 278. Uhrinova, S., Uhrin, D., Nairn, J., Price, N.C.,
Biochemistry 31, 218–229. Fothergill-Gilmore, L.A., & Barlow, P.N. (2001) J. Mol.
252. Zhang, X., Gonnella, N.C., Koehn, J., Pathak, N., Ganu, Biol. 306, 275–290.
V., Melton, R., Parker, D., Hu, S.I., & Nam, K.Y. (2000) J. 279. Gargaro, A.R., Soteriou, A., Frenkiel, T.A., Bauer, C.J.,
Mol. Biol. 301, 513–524. Birdsall, B., Polshakov, V.I., Barsukov, I.L., Roberts,
253. Allain, F.H., Gilbert, D.E., Bouvet, P., & Feigon, J. (2000) G.C., & Feeney, J. (1998) J. Mol. Biol. 277, 119–
J. Mol. Biol. 303, 227–241. 134.
254. Mosbah, A., Belaich, A., Bornet, O., Belaich, J.P., 280. Ikura, M., Spera, S., Barbato, G., Kay, L.E., Krinks, M., &
Henrissat, B., & Darbon, H. (2000) J. Mol. Biol. 304, Bax, A. (1991) Biochemistry 30, 9216–9228.
201–217. 281. Moy, F.J., Glasfeld, E., Mosyak, L., & Powers, R. (2000)
255. Arora, A., Abildgaard, F., Bushweller, J.H., & Tamm, Biochemistry 39, 9146–9156.
L.K. (2001) Nat. Struct. Biol. 8, 334–338. 282. Drohat, A.C., Baldisseri, D.M., Rustandi, R.R., & Weber,
256. Gooley, P.R., Caffrey, M.S., Cusanovich, M.A., & D.J. (1998) Biochemistry 37, 2729–2740.
MacKenzie, N.E. (1990) Biochemistry 29, 2278–2290. 283. Chazin, W.J., Hugli, T.E., & Wright, P.E. (1988)
257. Moy, F.J., Diblasio, E., Wilhelm, J., & Powers, R. (2001) Biochemistry 27, 9139–9148.
J. Mol. Biol. 310, 219–230. 284. Kline, A.D., Braun, W., & Weuthrich, K. (1986) J. Mol.
258. Hinck, A.P., Archer, S.J., Qian, S.W., Roberts, A.B., Biol. 189, 377–382.
Sporn, M.B., Weatherbee, J.A., Tsang, M.L., Lucas, R., 285. Redfield, C., Smith, L.J., Boyd, J., Lawrence, G.M.,
Zhang, B.L., Wenker, J., & Torchia, D.A. (1996) Edwards, R.G., Smith, R.A., & Dobson, C.M. (1991)
Biochemistry 35, 8517–8534. Biochemistry 30, 11029–11035.
259. Clore, G.M., Bax, A., Driscoll, P.C., Wingfield, P.T., & 286. Garrett, D.S., Powers, R., March, C.J., Frieden, E.A.,
Gronenborn, A.M. (1990) Biochemistry 29, 8172–8184. Clore, G.M., & Gronenborn, A.M. (1992) Biochemistry
260. Lipari, G., & Szabo, A. (1982) J. Am. Chem. Soc. 104, 31, 4347–4353.
4546–4559. 287. Powers, R., Garrett, D.S., March, C.J., Frieden, E.A.,
261. Sorensen, M.D., Bjorn, S., Norris, K., Olsen, O., Gronenborn, A.M., & Clore, G.M. (1992) Science 256,
Petersen, L., James, T.L., & Led, J.J. (1997) Biochemistry 1673–1677.
36, 10439–10450. 288. Driscoll, P.C., Gronenborn, A.M., Beress, L., & Clore,
262. Wolf-Watz, M., Grundstrom, T., & Hard, T. (2001) G.M. (1989) Biochemistry 28, 2188–2198.
Biochemistry 40, 11423–11432. 289. Hansen, P.E. (1991) Biochemistry 30, 10457–10466.
263. Buck, M., Boyd, J., Redfield, C., MacKenzie, D.A., 290. Delaglio, F., Kontaxis, G., & Bax, A. (2000) J. Am. Chem.
Jeenes, D.J., Archer, D.B., & Dobson, C.M. (1995) Soc. 122, 2142–2143.
Biochemistry 34, 4041–4055. 291. Huber, J.G., Moulis, J.M., & Gaillard, J. (1996)
264. Otting, G., Liepinsh, E., & Wuthrich, K. (1993) Biochemistry 35, 12705–12711.
Biochemistry 32, 3571–3582. 292. Golden, B.L., Hoffman, D.W., Ramakrishnan, V., &
265. Falzone, C.J., Wright, P.E., & Benkovic, S.J. (1994) White, S.W. (1993) Biochemistry 32, 12812–12820.
Biochemistry 33, 439–442. 293. Habazettl, J., Myers, L.C., Yuan, F., Verdine, G.L., &
266. Jones, D.D., Stott, K.M., Reche, P.A., & Perham, R.N. Wagner, G. (1996) Biochemistry 35, 9335–9348.
(2001) J. Mol. Biol. 305, 49–60. 294. Myers, L.C., Verdine, G.L., & Wagner, G. (1993)
267. Yi, Q., Erman, J.E., & Satterlee, J.D. (1994) 1H NMR J. Biochemistry 32, 14089–14094.
Am. Chem. Soc. 116, 1981–1987. 295. Braun, W., Vasak, M., Robbins, A.H., Stout, C.D.,
268. Freedman, S.J., Furie, B.C., Furie, B., & Baleja, J.D. Wagner, G., Kagi, J.H., & Wuthrich, K. (1992) Proc. Natl.
(1995) Biochemistry 34, 12126–12137. Acad. Sci. U.S.A. 89, 10124–10128.
269. Jaishree, T.N., Ramakrishnan, V., & White, S.W. (1996) 296. Smith, L.J., Sutcliffe, M.J., Redfield, C., & Dobson, C.M.
Biochemistry 35, 2845–2853. (1993) J. Mol. Biol. 229, 930–944.
270. Williamson, M.P., Havel, T.F., & Weuthrich, K. (1985) J. 297. Otting, G., & Wuethrich, K. (1989) J. Am. Chem. Soc. 111,
Mol. Biol. 182, 295–315. 1871–1875.
271. Dubs, A., Wagner, G., & Wuethrich, K. (1979) Biochim. 298. Xu, R.X., Meadows, R.P., & Fesik, S.W. (1993)
Biophys. Acta 577, 177–194. Biochemistry 32, 2473–2480.
272. Clore, G.M., Robien, M.A., & Gronenborn, A.M. (1993) 299. Gardner, K.H., Pan, T., Narula, S., Rivera, E., &
J. Mol. Biol. 231, 82–102. Coleman, J.E. (1991) Biochemistry 30, 11292–11302.
273. Redfield, C., Smith, L.J., Boyd, J., Lawrence, G.M., 300. Billeter, M., Kline, A.D., Braun, W., Huber, R., &
Edwards, R.G., Gershater, C.J., Smith, R.A., & Dobson, Wuthrich, K. (1989) J. Mol. Biol. 206, 677–687.
C.M. (1994) J. Mol. Biol. 238, 23–41. 301. Fraenkel, E., & Pabo, C.O. (1998) Nat. Struct. Biol. 5,
274. Kay, L.E., Clore, G.M., Bax, A., & Gronenborn, A.M. 692–697.
(1990) Science 249, 411–414. 302. Shaanan, B., Gronenborn, A.M., Cohen, G.H., Gilliland,
275. Powers, R., Garrett, D.S., March, C.J., Frieden, E.A., G.L., Veerapandian, B., Davies, D.R., & Clore, G.M.
Gronenborn, A.M., & Clore, G.M. (1993) Biochemistry (1992) Science (Washington, D.C.) 257, 961–964.
32, 6744–6762. 303. Zink, T., Ross, A., Luers, K., Cieslar, C., Rudolph, R., &
276. Feng, W., Tejero, R., Zimmerman, D.E., Inouye, M., & Holak, T.A. (1994) Biochemistry 33, 8453–8463.
Montelione, G.T. (1998) Biochemistry 37, 10881–10896. 304. Gopal, B., Haire, L.F., Cox, R.A., Jo Colston, M., Major,
277. Xia, B., Vlamis-Gardikas, A., Holmgren, A., Wright, P.E., S., Brannigan, J.A., Smerdon, S.J., & Dodson, G. (2000)
& Dyson, H.J. (2001) J. Mol. Biol. 310, 907–918. Nat. Struct. Biol. 7, 475–478.
656 Physical Measurements of Structure
305. Chan, M.K., Gong, W., Rajagopalan, P.T., Hao, B., Tsai, 332. Mendz, G.L., Moore, W.J., & Martenson, R.E. (1983)
C.M., & Pei, D. (1997) Biochemistry 36, 13904–13909. Biochim. Biophys. Acta 742, 215–223.
306. Li, N., Zhang, W., White, S.W., & Kriwacki, R.W. (2001) 333. McIntosh, L.P., Hand, G., Johnson, P.E., Joshi, M.D.,
Biochemistry 40, 4293–4302. Korner, M., Plesniak, L.A., Ziser, L., Wakarchuk, W.W.,
307. Chou, J.J., Li, S., Klee, C.B., & Bax, A. (2001) Nat. Struct. & Withers, S.G. (1996) Biochemistry 35, 9958–9966.
Biol. 8, 990–997. 334. Joshi, M.D., Sidhu, G., Pot, I., Brayer, G.D., Withers,
308. Lukin, J.A., Kontaxis, G., Simplaceanu, V., Yuan, Y., Bax, S.G., & McIntosh, L.P. (2000) J. Mol. Biol. 299, 255–279.
A., & Ho, C. (2003) Proc. Natl. Acad. Sci. U.S.A. 100, 335. Kohda, D., Sawada, T., & Inagaki, F. (1991)
517–520. Biochemistry 30, 4896–4900.
309. Bertini, I., Dikiy, A., Kastrau, D.H., Luchinat, C., & 336. Oda, Y., Yamazaki, T., Nagayama, K., Kanaya, S.,
Sompornpisut, P. (1995) Biochemistry 34, 9851–9858. Kuroda, Y., & Nakamura, H. (1994) Biochemistry 33,
310. Schiffer, C.A., Huber, R., Wuthrich, K., & van 5275–5284.
Gunsteren, W.F. (1994) J. Mol. Biol. 241, 588–599. 337. Katayanagi, K., Miyagawa, M., Matsushima, M.,
311. Gallagher, T., Alexander, P., Bryan, P., & Gilliland, G.L. Ishikawa, M., Kanaya, S., Nakamura, H., Ikehara, M.,
(1994) Biochemistry 33, 4721–4729. Matsuzaki, T., & Morikawa, K. (1992) J. Mol. Biol. 223,
312. Baldwin, E.T., Weber, I.T., St. Charles, R., Xuan, J.C., 1029–1052.
Appella, E., Yamada, M., Matsushima, K., Edwards, 338. Shrager, R.I., Cohen, J.S., Heller, S.R., Sachs, D.H., &
B.F., Clore, G.M., Gronenborn, A.M., et al. (1991) Proc. Schechter, A.N. (1972) Biochemistry 11, 541–547.
Natl. Acad. Sci. U.S.A. 88, 502–506. 339. Szyperski, T., Antuch, W., Schick, M., Betz, A., Stone,
313. Ulmer, T.S., Ramirez, B.E., Delaglio, F., & Bax, A. (2003) S.R., & Wuthrich, K. (1994) Biochemistry 33, 9303–9310.
J. Am. Chem. Soc. 125, 9179–9191. 340. Chau, M.H., Cai, M.L., & Timkovich, R. (1990)
314. Baistrocchi, P., Banci, L., Bertini, I., Turano, P., Bren, Biochemistry 29, 5076–5087.
K.L., & Gray, H.B. (1996) Biochemistry 35, 13788–13796. 341. Dardel, F., Laue, E.D., & Perham, R.N. (1991) Eur. J.
315. Wolf-Watz, M., Thai, V., Henzler-Wildman, K., Biochem. 201, 203–209.
Hadjipavlou, G., Eisenmesser, E.Z., & Kern, D. (2004) 342. Osborne, M.J., Lian, L.Y., Wallis, R., Reilly, A., James, R.,
Nat. Struct. Mol. Biol. 11, 945–949. Kleanthous, C., & Moore, G.R. (1994) Biochemistry 33,
316. Wand, A.J., Ehrhardt, M.R., & Flynn, P.F. (1998) Proc. 12347–12355.
Natl. Acad. Sci. U.S.A. 95, 15299–15302. 343. Kawata, Y., Goto, Y., Hamaguchi, K., Hayashi, F.,
317. Arrowsmith, C.H., Pachter, R., Altman, R.B., Iyer, S.B., Kobayashi, Y., & Kyogoku, Y. (1988) Biochemistry 27,
& Jardetzky, O. (1990) Biochemistry 29, 6332–6341. 346–350.
318. Kelly, M.J., Ball, L.J., Krieger, C., Yu, Y., Fischer, M., 344. Endo, T., Ueda, T., Yamada, H., & Imoto, T. (1987)
Schiffmann, S., Schmieder, P., Kuhne, R., Bermel, W., Biochemistry 26, 1838–1845.
Bacher, A., Richter, G., & Oschkinat, H. (2001) Proc. 345. Molday, R.S., Englander, S.W., & Kallen, R.G. (1972)
Natl. Acad. Sci. U.S.A. 98, 13025–13030. Biochemistry 11, 150–158.
319. Tugarinov, V., Choy, W.Y., Orekhov, V.Y., & Kay, L.E. 346. Perrin, C.L., Lollo, C.P., & Johnston, E.R. (1984) J. Am.
(2005) Proc. Natl. Acad. Sci. U.S.A. 102, 622–627. Chem. Soc. 106, 2749–2753.
320. Peters, A.R., Dekker, N., van den Berg, L., Boelens, R., 347. Bai, Y., Milne, J.S., Mayne, L., & Englander, S.W. (1993)
Kaptein, R., Slotboom, A.J., & de Haas, G.H. (1992) Proteins: Struct., Funct., Genet. 17, 75–86.
Biochemistry 31, 10024–10030. 348. Englander, S.W., & Staley, R. (1969) J. Mol. Biol. 45,
321. Meadows, D.H., Markley, J.L., Cohen, J.S., & Jardetzky, 277–295.
O. (1967) Proc. Natl. Acad. Sci. U.S.A. 58, 1307– 349. Hvidt, A., & Linderstrom-Lang, K. (1954) Biochim.
1313. Biophys. Acta 14, 574–575.
322. Zhou, M.M., Davis, J.P., & Van Etten, R.L. (1993) 350. Nakanishi, M., Tsuboi, M., & Ikegami, A. (1972) J. Mol.
Biochemistry 32, 8479–8486. Biol. 70, 351–361.
323. Meadows, D.H., Jardetzky, O., Epand, R.M., Ruterjans, 351. Johnson, R.S., & Walsh, K.A. (1994) Protein Sci. 3,
H.H., & Scheraga, H.A. (1968) Proc. Natl. Acad. Sci. 2411–2418.
U.S.A. 60, 766–772. 352. Katta, V., & Chait, B.T. (1993) J. Am. Chem. Soc. 115,
324. Matthew, J.B., & Richards, F.M. (1982) Biochemistry 21, 6317–6321.
4989–4999. 353. Deng, Y., & Smith, D.L. (1999) J. Mol. Biol. 294, 247–258.
325. Botelho, L.H., & Gurd, F.R. (1978) Biochemistry 17, 354. Zhang, Z., & Smith, D.L. (1993) Protein Sci. 2, 522–531.
5188–5196. 355. Wang, F., Blanchard, J.S., & Tang, X.J. (1997)
326. Bycroft, M., & Fersht, A.R. (1988) Biochemistry 27, Biochemistry 36, 3755–3759.
7390–7394. 356. Andersen, M.D., Shaffer, J., Jennings, P.A., & Adams,
327. Pesando, J.M. (1975) Biochemistry 14, 675–681. J.A. (2001) J. Biol. Chem. 276, 14204–14211.
328. Pesando, J.M. (1975) Biochemistry 14, 681–688. 357. Zhang, Z., Post, C.B., & Smith, D.L. (1996) Biochemistry
329. Zhang, P.H., Graminski, G.F., & Armstrong, R.N. (1991) 35, 779–791.
J. Biol. Chem. 266, 19475–19479. 358. Resing, K.A., & Ahn, N.G. (1998) Biochemistry 37,
330. Botelho, L.H., Friend, S.H., Matthew, J.B., Lehman, 463–475.
L.D., Hanania, G.I., & Gurd, F.R. (1978) Biochemistry 17, 359. Wagner, G., & Wuthrich, K. (1982) J. Mol. Biol. 160,
5197–5205. 343–361.
331. Glickson, J.D., Phillips, W.D., & Rupley, J.A. (1971) J. 360. Skelton, N.J., Kordel, J., Akke, M., & Chazin, W.J. (1992)
Am. Chem. Soc. 93, 4031–4038. J. Mol. Biol. 227, 1100–1117.
References 657
361. Linderstrom-Lang, K. (1955) Chem. Soc. (London), 386. Lee, H.-I., Sorlie, M., Christiansen, J., Song, R., Dean,
Spec. Publ. No. 2, 1–20, discussion 21–24. D.R., Hales, B.J., & Hoffman, B.M. (2000) J. Am. Chem.
362. Englander, S.W., & Kallenbach, N.R. (1983) Q. Rev. Soc. 122, 5582–5587.
Biophys. 16, 521–655. 387. Parast, C.V., Wong, K.K., Lewisch, S.A., Kozarich, J.W.,
363. Ehrhardt, M.R., Urbauer, J.L., & Wand, A.J. (1995) Peisach, J., & Magliozzo, R.S. (1995) Biochemistry 34,
Biochemistry 34, 2731–2738. 2393–2399.
364. Wand, A.J., Roder, H., & Englander, S.W. (1986) 388. Itoh, S., Ogino, M., Haranou, S., Terasaka, T., Ando, T.,
Biochemistry 25, 1107–1114. Komatsu, M., Ohshiro, Y., Fukuzumi, S., Kano, K., et al.
365. Haruyama, H., Qian, Y.Q., & Wuthrich, K. (1989) (1995) J. Am. Chem. Soc. 117, 1485–1493.
Biochemistry 28, 4312–4317. 389. Parast, C.V., Wong, K.K., Kozarich, J.W., Peisach, J., &
366. Wang, Q.W., Kline, A.D., & Wuthrich, K. (1987) Magliozzo, R.S. (1995) J. Am. Chem. Soc. 117,
Biochemistry 26, 6488–6493. 10601–10602.
367. Jandu, S.K., Ray, S., Brooks, L., & Leatherbarrow, R.J. 390. Yonetani, T., & Schleyer, H. (1967) J. Biol. Chem. 242,
(1990) Biochemistry 29, 6264–6269. 3919–3925.
368. Perrin, C.L., Dwyer, T.J., Rebek, J., Jr., & Duff, R.J. (1990) 391. Libertini, L.J., Waggoner, A.S., Jost, P.C., & Griffith, O.H.
J. Am. Chem. Soc. 112, 3122–3125. (1969) Proc. Natl. Acad. Sci. U.S.A. 64, 13–19.
369. Hvidt, A., & Nielsen, S.O. (1966) Adv. Protein Chem. 21, 392. Altenbach, C., Oh, K.J., Trabanino, R.J., Hideg, K., &
287–386. Hubbell, W.L. (2001) Biochemistry 40, 15471–15482.
370. Roder, H., Wagner, G., & Wuthrich, K. (1985) 393. McHaourab, H.S., Oh, K.J., Fang, C.J., & Hubbell, W.L.
Biochemistry 24, 7407–7411. (1997) Biochemistry 36, 307–316.
371. Liu, K., Cho, H.S., Hoyt, D.W., Nguyen, T.N., Olds, P., 394. Sjoberg, B.M., Reichard, P., Graslund, A., & Ehrenberg,
Chapter 13
Folding and Assembly
Each polypeptide begins its existence by emerging, amino terminus foremost, from a ribosome. Its
initial amino acid sequence is the complete translation of the sequence in which the codons are
arranged between the start codon and the stop codon on the messenger RNA. At some point in its
early history, the polypeptide folds to assume its native state. The native state of a polypeptide
is the limited set of equilibrating conformations in which it will spend the remainder of its
lifetime and in which it is capable of performing its role within or on behalf of the living
organism in which it was synthesized. The native state is the set of conformations of the
polypeptide represented by the crystallographic molecular models of the protein. It is also
referred to as the folded state. On the basis of the easily verified existence and identity of the
native state, a denatured state of a polypeptide can be defined as its antonym. A denatured state
of a polypeptide is any set of equilibrating conformations of that polypeptide that is not or does
not contain the set of conformations of the native state. As it emerges from the ribosome, the
nascent polypeptide is in a denatured state.

The initial folded state of the polypeptide can undergo posttranslational modification, it can
combine with several other identically folded polypeptides of the same sequence or several other
folded polypeptides of a different sequence and structure, or it can enter a helical polymeric
protein as one of the protomers. The product of these steps is the mature native state of the
protein encountered in the living tissue. The order in which these later processes occur cannot be
predicted, but all of them usually follow the folding of the unadorned polypeptide because it is
usually only the folded polypeptide that contains the information controlling them.

Accordingly, the steps in the maturation of a protein can be divided into folding,
posttranslational modification, and assembly. Folding is any process by which the polypeptide
initially in a denatured state, for example, its set of conformations as it emerges from the
ribosome, assumes the folded native state. Assembly is the process by which individual folded
polypeptides associate to form their ultimate oligomeric or polymeric protein.

Thermodynamics of Folding

A polypeptide is a polymer of amino acids (2–15). It is known from studies of polymers in general
that their conformational behavior depends critically on the solvent in which they are
dissolved.1 If the functional groups of the repeating units of a polymer are miscible with the
solvent, the polymer is free to expand and expose all of those functional groups to that solvent
without penalty. Such a solvent is a good solvent. When a polypeptide is dissolved in a good
solvent, rotation about each bond between amide nitrogen and α carbon and between α carbon and
acyl carbon is permitted, within the confines of the clashes represented in the Ramachandran plot
(Figure 6–4) and within the requirement that no two atoms anywhere in the polypeptide can occupy
the same space at the same time. As with any other unconfined organic molecule in solution, the
conformation of a polypeptide in a good solvent is continuously changing as these rotations occur
at random. Such a protean polymer in a good solvent is a random coil. This term unavoidably
incorporates the uncontrolled and continuous motion of this process. A random coil is a special
type of unfolded state. The unfolded state of a polypeptide is a state in which the polypeptide is
significantly expanded relative to the native state so that most of its structure is exposed to
the solution even though it may not be a fully random coil.

Unfortunately, there are few good solvents for naturally occurring polypeptides. This is because
almost all2,3 natural polypeptides are created to fold. To fold, they must be composed of a
mixture of hydrophobic and hydrophilic amino acids, placed in a particular sequence. There is
almost no solvent in which the resulting mixture of side chains is miscible. In particular, water,
although it is a good solvent for the hydrophilic side chains, is a bad solvent for the
hydrophobic side chains.

A bad solvent is a solvent in which functional groups of the repeating units of a polymer are only
sparingly soluble. In a bad solvent, a polymer contracts to decrease the exposure of those
sparingly soluble functional groups to the solution. The hydrophobic effect is a force that seeks
to minimize the exposure of a hydrophobic solute to water. Because of the hydrophobic effect that
is exerted on the hydrophobic side chains in a natural polypeptide, water is a bad solvent for
such a polypeptide at neutral pH and at an ionic strength of 0.2 M, which are the conditions under
which most proteins are found. If water were not a bad solvent, natural polypeptides would not
fold. Natural polypeptides, because they have evolved in water, fold in
water. Within a cell, each polypeptide begins its life emerging from a ribosome into the
cytoplasm. Even though water is a bad solvent for the emerging polypeptide, it probably remains in
an unfolded state until it reaches a length at which there are a large enough number of
hydrophobic side chains to accomplish its contraction. Consequently, beyond a certain length, the
polypeptide has the potential to contract to form a state other than a random coil, yet there are
experimental observations suggesting that at least some incomplete polypeptides adopt expanded
unfolded states when they are dissolved in aqueous solution.

Only glycerol, another cohesive, hydroxylic solvent, is also able to promote folding.4 Other pure
solvents, because they dissolve hydrophobic functional groups rather than exclude them (Figure
5–22) cannot promote the folding of a polypeptide, but they are nevertheless bad solvents because
they cannot solvate the polar side chains adequately. Consequently, if a polypeptide is not in its
native state in a particular solvent, it will usually also not be a random coil. When an organic
solvent that is miscible with water, such as ethanol, is added to a solution of protein, it causes
the protein to denature because it solvates its nonpolar groups, but it also diminishes the
solvation of the polar groups, preventing the formation of a random coil. Denaturants are solutes
that, when added to an aqueous solution of a protein, promote the formation of a denatured state
of that protein. Most denaturants do not turn water into a good solvent.

If one is studying its folding, both the native state and the denatured state of a polypeptide
must be well-defined. The only denatured state of a polypeptide that can be defined with
sufficient accuracy is a random coil. Therefore, the folding of a polypeptide is most
informatively studied if the process that is monitored is the isomerization between the random
coil and the native state, even though this may not be what occurs in a cell. For this study to be
accomplished, a good solvent is required. One of the few ways to create a good solvent is to add
either guanidinium chloride or urea to an aqueous solution of the protein. Both of these solutes
are denaturants, but they are denaturants that create a good solvent. Regardless of whether or not
a polypeptide in a cell is a random coil at its birth, experimentally an examination of the
thermodynamics of protein folding usually begins with the polypeptide as a random coil in a
concentrated solution of guanidinium chloride or urea.

When almost all natural proteins, the cystines of which have been reduced to cysteines, are
dissolved in solutions of guanidinium chloride (13–1) or urea (13–2) at concentrations of 6 or
8 M, respectively, they become completely unfolded, and their constituent polypeptides become
random coils.

[Structures 13–1 (guanidinium chloride) and 13–2 (urea)]

There are several ways to demonstrate this fact.5 The molar masses of the proteins determined from
the colligative properties of these solutions are those of the constitutive polypeptides rather
than the oligomers. The intrinsic viscosities of proteins dissolved in these solutions range from
15 to 100 cm³ g⁻¹ even though the intrinsic viscosities of the native proteins are between 3 and
5 cm³ g⁻¹. Furthermore, within a set of proteins, the intrinsic viscosities of their polypeptides
dissolved in those solutions are correlated to the length of the constituent polypeptides by a
relationship that agrees with theoretical expectation for the behavior of random coils. The
optical rotatory dispersion spectra and circular dichroic spectra of proteins in such solutions
are those theoretically expected from a polypeptide lacking any regular secondary structure, even
if the spectra of the native proteins indicate that they are predominantly α helix and β
structure. The acid–base titration curves of proteins dissolved in these solutions lose the
normally observed shifts in intrinsic pKa brought about by the electrostatic features of the
native state and become simple sums of the constituent intrinsic acid–base titrations of the
constituent amino acids (Table 2–2). All of the tyrosines in the protein display ultraviolet
spectrophotometric acid–base titrations with expected intrinsic values of pKa. The rates of amido
proton exchange become very rapid when proteins are dissolved in these solutions, and no evidence
for a class of slowly exchanging amido protons is usually found. The ultraviolet spectra between
270 and 300 nm of proteins dissolved in these solutions are simple summations of the spectra of
phenylalanine, tyrosine, and tryptophan and display none of the spectral shifts characteristic of
the native states.6

Solutions of either guanidinium chloride or urea promote the unfolding of a polypeptide by
increasing the stability of the random coil. This increase in stability is due to favorable
changes in the solvation both of the side chains of the amino acids and of the polypeptide
backbone brought about by these solutes. From measurements of the solubility of various amino
acids, as well as diglycine and triglycine, in solutions of either urea7 or guanidinium
chloride,8 the standard free energies of transfer of both the side chains of the amino acids and
the peptide bond between water and solutions of urea or guanidinium chloride have been estimated
(Table 13–1). These standard free energies of transfer were derived from the differences between
the solubilities of each of the amino acids and peptides and the solubilities of [...]
free energies of transfer of each of the side chains or the peptide bond, respectively.

The values obtained for leucine, phenylalanine, and tryptophan agree quite closely with direct
measurements of the standard free energies of transfer for isobutane, toluene, and skatole as
models of the respective side chains (Table 13–1).9 The N-acetyl ethyl esters of leucine and
phenylalanine, however, were found to have standard free energies of transfer, relative to ethyl
N-acetylglycinate, that were much less negative (Table 13–1).10 It may be premature to attach any
significance to the absolute numerical values of these various estimates.

Table 13–1: Estimates of the Standard Free Energy of Transfer of Various Side Chains of the Amino
Acids between Water and Solutions of Urea or Guanidinium Chloride
Column headings (given twice): amino acid (a); alkane model (b); N-acetyl ethyl ester (c)
(a) Calculated from the difference between solubility of glycine and the appropriate amino
acid.7,8
(b) Free energy of transfer of isobutane, toluene, or skatole.9
(c) Difference in free energy of transfer of ethyl N-acetylglycinate and N-acetyl ethyl ester of
the amino acid.10

It has been uniformly observed that the free energies of transfer of both hydrophobic solutes and
neutral hydrophilic solutes such as peptides between water and either 7 M urea or 5 M guanidinium
chloride have negative values. Unlike the hydrophobic effect, which is imposed only on
hydrogen–carbon bonds, the increase in solvation performed by urea and guanidinium ion is linearly
related to the accessible surface area of the side chain, regardless of its polarity,11 so that
both hydrophobic and hydrophilic functional groups that are exposed to the solution upon formation
of the random coil are more favorably solvated in solutions of urea or guanidinium chloride than
they would be in water. This stabilization of the random coil increases monotonically, but not
linearly12 with the concentration of denaturant.7,8,10 At some concentration of the denaturant,
which differs for each protein, the unfolded polypeptide becomes more stable than the folded
polypeptide. This point is reached not only because of the increase in favorable solvation but
also because the unfolded polypeptide is more disordered.

The favorable solvation of both polar and nonpolar functional groups in a polypeptide by urea or
guanidinium ion quantified in the free energies of transfer in Table 13–1 has been explained as
the result of preferential binding of the denaturant to the random coil.5,13,14 There is no
evidence, however, for the existence of particular binding sites for either of these denaturants,
which if they existed would have to be distributed rather uniformly over the dramatically
heterogeneous surface of the random coil. A more realistic explanation would be that these
denaturants partition favorably into the peculiar layer of water solvating the random coil,
relative to the water in the bulk of the solution.15 Regardless of the molecular explanation, the
experimental observation that accounts for the increase in the stability of the random coil
relative to that of the native state is that both urea and guanidinium ion preferentially solvate
those portions of a polypeptide exposed to the solution,15,16 and a random coil simply exposes
more of the polypeptide than does the native state. The preferential solvation (Equation 1–57) of
bovine serum albumin in its native state15 by urea is +0.10 g mL⁻¹; and by guanidinium ion,
+0.14 g mL⁻¹. The increase in preferential solvation that occurs during the unfolding of lysozyme
from Gallus gallus by urea16 is +0.25 g mL⁻¹. These positive experimentally measured preferential
solvations demonstrate that both of these denaturants are significant salting-in solutes. They do
not exert their effects by decreasing the cohesion of the water and producing in turn a decrease
in the hydrophobic effect because both urea and guanidinium ion increase the surface tension of an
aqueous solution.17

The favorable solvation of the hydrophobic side chains by urea and guanidinium chloride does make
a major contribution to the stabilization of the random coil (Table 13–1). This observation
suggests that urea and guanidinium chloride cause the solution to become more like a usual organic
solvent in its properties (Figure 5–22). This effect may result from the stable introduction
of the nonpolar π clouds of the denaturants (2–26) into the solution. N-Alkyl-, N,N′-dialkyl-, and
N,N,N′,N′-tetraalkylureas are even more effective at increasing the solubility of naphthalene,
indole, and ethyl N-acetyltryptophanate in water than is urea itself.18,19 This observation also
suggests that it is nonpolar noncovalent interactions between urea and the hydrophobic amino acids
that explain its ability to solvate them favorably.18 The fact that methylurea, dimethylurea, and
tetramethylurea are each in turn increasingly better denaturants of proteins than urea itself19
and the fact that some alkylureas are better denaturants than even guanidinium chloride20 suggest
that the favorable solvation by urea of the hydrophobic functionalities revealed during the
formation of the random coil, in addition to its ability as a donor or acceptor of hydrogen
bonds,14 is the major feature of its ability to promote unfolding.

A polypeptide will fold only if the free energy of the native state is less than the free energy
of all accessible denatured states. Because of this requirement, for example, a nascent
polypeptide cannot fold until it is long enough for the native state to contain a large enough
collection of noncovalent interactions to overcome the significant unfavorable loss of standard
entropy that must always accompany folding. It is also the case that a polypeptide which has
undergone extensive covalent posttranslational modification after it originally folded may not be
able to fold again after it has been returned to a denatured state. For example, proinsulin can be
unfolded and its cystines reduced to cysteines. The protein will then refold spontaneously to its
native state, and the proper cystines will reform under oxidizing conditions.21 Insulin, however,
which is a posttranslationally modified fragment of proinsulin, missing 25 amino acids from the
middle of the polypeptide, does not refold spontaneously after it has been unfolded and its
cystines reduced, and it can be refolded only with subterfuge. The only fact that seems to be
inescapable is that, at some point in its lifetime, a polypeptide has a covalent structure capable
of folding to produce either the mature native state directly or an initial native state, which is
modified subsequently but retains its basic folded state.

A polypeptide that has not been modified so extensively as to cause the mature native state to be
higher in free energy than the random coil or higher in free energy than any other accessible
denatured state will, under the proper circumstances, spontaneously refold to its mature native
state after it has been purposely turned into a random coil by dissolving it in 6 M guanidinium
chloride or 8 M urea. Most of our understanding of the folding of polypeptides has been derived
from the study of such conformational isomerizations. Their existence shows that all of the
information necessary to achieve the proper native state resides in the amino acid sequence of the
polypeptide.

The conformational isomerization that encompasses the process of protein folding can be presented
as the equilibrium

\[ \mathrm{U} \underset{k_{\mathrm{U}}}{\overset{k_{\mathrm{F}}}{\rightleftharpoons}} \mathrm{F} \tag{13–1} \]

where F is the polypeptide folded in its native state and U is the unfolded state. The rate
constants kF and kU are composite rate constants including any kinetic steps between the two
extremes, and the equilibrium constant for folding, KFd, is defined by

\[ K_{\mathrm{Fd}} = \frac{[\mathrm{F}]_{\mathrm{eq}}}{[\mathrm{U}]_{\mathrm{eq}}} = \frac{k_{\mathrm{F}}}{k_{\mathrm{U}}} \tag{13–2} \]

Because polypeptides folded in their native state are by design reasonably stable in aqueous
solution at physiological temperatures and ranges of pH, the molar concentration of the unfolded
state under normal circumstances is immeasurably low, and in such a situation, neither the
equilibrium constant nor the thermodynamic changes associated with folding can be measured
directly by following the concentrations of the two forms of the protein. One solution to this
problem is to shift the equilibrium by introducing an unnatural perturbation. Because the unfolded
states produced by adding guanidinium ion or urea are random coils, the least controversial
perturbation that can be used to shift the equilibrium is to add increasing concentrations of one
or the other of these solutes to a series of solutions of the protein. As the concentrations of
these perturbants are increased, the unfolded form of the protein becomes more and more stable
until the equilibrium constant for Equation 13–1 is small enough for measurable amounts of the
unfolded form of the protein to exist.

Raising the temperature or lowering the pH of the solution or a combination of these perturbations
also stabilizes unfolded states of the protein relative to the native state. The decrease in the
magnitude of the equilibrium constant for Equation 13–1 brought about by raising the temperature
is presumably due to increases in thermal motion that always shift reactions in favor of more
disordered states. The decrease in the magnitude of the equilibrium constant brought about by
lowering the pH is due to the fact that, because an unfolded state is expanded relative to the
folded state, it can support a greater net charge and thus can take up more protons than the
folded state as the pH is lowered. The relationship between the equilibrium constant for folding
and the pH of the solution is governed by the differential equation5

\[ \frac{\partial \ln K_{\mathrm{Fd}}}{\partial \ln a_{\mathrm{H^+}}} = \bar{Z}_{\mathrm{H,F}} - \bar{Z}_{\mathrm{H,U}} \tag{13–3} \]
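As a hedged illustration of how Equation 13–3 can be used, it can be integrated for a single ionizable side chain whose pKa differs between the folded and the unfolded state. The sketch below assumes that this one group is the only difference between the two titration curves, so that the difference between the mean net proton charge numbers of the folded and the unfolded states (written Z_H,F and Z_H,U in the comments) equals the difference in the fractional protonation of that group. The function names and the two values of pKa are hypothetical choices made for illustration, not measurements on a particular protein.

```python
import math


def protonated_fraction(pH, pKa):
    """Fraction of an ionizable group that carries its proton at this pH."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


def charge_difference(pH, pKa_folded, pKa_unfolded):
    """Z_H,F - Z_H,U for one group: the right-hand side of Equation 13-3."""
    return protonated_fraction(pH, pKa_folded) - protonated_fraction(pH, pKa_unfolded)


def folding_constant_ratio(pH, pKa_folded, pKa_unfolded):
    """K_Fd at this pH divided by K_Fd in the limit of very low pH.

    This is the integrated form of Equation 13-3 for a single group
    (an assumption of this sketch, not a general result for a protein).
    """
    folded_term = 1.0 + 10.0 ** (pH - pKa_folded)
    unfolded_term = 1.0 + 10.0 ** (pH - pKa_unfolded)
    return folded_term / unfolded_term


# Hypothetical carboxylic acid whose pKa is lowered in the folded state.
PKA_FOLDED = 0.5
PKA_UNFOLDED = 3.1

for pH in (0.0, 1.0, 2.0, 3.0, 5.0, 7.0):
    ratio = folding_constant_ratio(pH, PKA_FOLDED, PKA_UNFOLDED)
    dZ = charge_difference(pH, PKA_FOLDED, PKA_UNFOLDED)
    print(f"pH {pH:3.1f}  K_Fd / K_Fd(low pH) = {ratio:8.2f}  Z_H,F - Z_H,U = {dZ:+.3f}")
```

Under these assumptions the ratio rises from 1 at very low pH toward 10 raised to the power (pKa,unfolded − pKa,folded) at high pH, which for the assumed values is about 400, and the charge difference is negative throughout, so the equilibrium constant for folding decreases as the pH is lowered.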
In Equation 13–3, aH+ is the activity of protons in the solution and Z̄H,U and Z̄H,F are the mean
net proton charge numbers of the unfolded and the folded states of the protein, respectively.

The unfolded state of the protein can support a greater mean net proton charge number because the
values of pKa for its functional groups are higher. Consequently, it takes up more protons as the
pH is lowered than does the folded state. This fact is verified by measuring the acid–base
titration curves of the protein (Figure 1–11) in the absence and the presence of 6 M guanidinium
chloride.22–24 The two titration curves are then related to each other in absolute terms by
measuring the moles of protons that must be added to a solution to maintain a constant pH as the
protein is unfolded by adding guanidinium chloride.24 In the acid region of the titration curves,
the mean net proton charge number of the unfolded state is usually greater than that of the folded
state, so the equilibrium constant for folding decreases as the pH is lowered (Equation 13–3). The
decrease in KFd becomes more pronounced the more the pH is lowered so that a larger and larger
fraction of the carboxylates in the protein become involved in the titration.23 At low ionic
strength, there also may be a small increase in the repulsion between the positively charged side
chains in the compact native state, which destabilizes it relative to the denatured state.25

Because a side chain that is an acid–base often titrates anomalously when it becomes incorporated
into the native structure of a protein, its incorporation will affect the equilibrium constant for
folding. If the side chain is a carboxylic acid, the effect of its incorporation is most readily
understood by considering the folding–unfolding at a pH low enough that it is fully protonated in
both the unfolded and the folded states. If it is buried in the folded state so that its pKa is
elevated,26 the carboxylic acid will lose its proton at a higher pH in the folded state than in
the unfolded state. Consequently, as the pH is raised, Z̄H,F will be greater than it would have
been if the carboxylic acid had not been buried, and the equilibrium constant for folding will be
smaller than it would have been.27,28 If the carboxylic acid has its pKa lowered by participating
as the acceptor in one or more hydrogen bonds in the folded state,29,30 it will lose its proton at
a lower pH in the folded state than in the unfolded state, and the equilibrium constant for
folding will be larger than it would have been.31 For example, Aspartate 76 in ribonuclease T1
from Aspergillus oryzae participates in several hydrogen bonds in the native state that lower its
pKa to 0.5, coincident with an increase in the equilibrium constant for its folding of a factor of
400.30 A buried lysine, the pKa of which is lowered because it remains unprotonated in the folded
state of a protein,26 also decreases the stability of the folded state relative to the unfolded
state32 for the same reasons that a buried carboxylic acid does, but the argument begins with a pH
high enough that the lysine is unprotonated in the unfolded state and the pH is decreased.
Similarly, a histidine with an elevated pKa stabilizes the folded state.29 All of these shifts can
be explained quantitatively by considering the effects of the respective values of pKa for a
particular side chain on the magnitudes of Z̄H,F and Z̄H,U in the integrated form of Equation
13–3.5,31–33

A solution of guanidinium chloride at 6 M seems to produce the most complete unfolding to the
random coil.5 The values of the various physical parameters for the same protein dissolved in 8 M
urea rather than 6 M guanidinium chloride are slightly but significantly displaced in the
direction of a folded state. Proteins dissolved in solutions of low pH, the temperatures of which
have been raised until no further change in optical rotation occurs, will still display further
changes when guanidinium chloride is added,5,34 even when no intramolecular hydrogen bonds seem to
remain in the denatured protein,34 and this observation suggests that reversible thermal
denaturation does not produce a random coil. Lowering the pH of a solution of a protein without
applying heat often leads to its denaturation, but the denatured state produced by acid alone
usually also retains residual structure.35,36 When either the temperature is increased or the pH
is lowered, hydrophobic clusters (Figure 6–21) in otherwise unfolded polypeptides should remain
associated. This would account for the incomplete unfolding observed in these situations.

In any meaningful measurement of the properties of folding, the conditions must be such that the
reaction remains reversible. When a concentrated solution of ovalbumin and lysozyme, otherwise
known as the white of an egg, is heated, the polypeptides unfold but then rapidly coagulate among
themselves to form a white, intractable, gelatinous precipitate. In the unfolded state produced
initially by raising the temperature, otherwise buried hydrophobic amino acids on these
polypeptides all become simultaneously exposed to the solution and noncovalent intermolecular
polymerization takes place. There is little doubt that, in this example, a significant portion if
not the majority of the changes in standard enthalpy and standard entropy proceeding during this
process are those of the coagulation, which is of only marginal interest.

In all studies of protein folding, the first result presented should demonstrate the complete
reversibility of the reaction. Foldings perturbed by the addition of urea or guanidinium chloride
usually are reversible. With larger proteins, the rates of renaturation from a concentrated
solution of urea or guanidinium chloride, however, can be slow; and if the concentration of
denaturant is abruptly decreased by dilution, an otherwise reversible folding can become
irreversible.37 Denaturation produced by acid is usually reversible because the denatured
polypeptides are so positively charged that they will not coagulate.36

Although a few foldings perturbed solely by increases in temperature are reversible at neutral
pH,38,39
most proceed irreversibly, usually with coagulation.40 When thermal unfolding is performed in a
scanning calorimeter, however, the solution is heated continuously while the absorption of excess
heat is monitored. It is possible that, under these conditions, the transition from folded protein
to unfolded protein takes place in a short enough period that little coagulated protein
accumulates, and the reaction remains reversible during that interval and only becomes
irreversible upon coagulation at the higher temperatures experienced beyond the range of
temperatures encompassing the transition to the denatured state. It has been argued that the
behavior of many foldings in a scanning calorimeter (namely, the shapes of the curves, the effects
of ligands, and the molecularities of the apparent reactions) is that expected of a simple
reversible isomerization.41,42 At low pH (pH 2–3), however, most thermally perturbed foldings,
even though they would proceed with coagulation at higher pH, often become reversible.43,44
Presumably, this is due to the fact that coagulation is prevented by charge repulsion among the
denatured polypeptides. It is usually observed that a polypeptide denatured thermally and
reversibly at low pH will coagulate visibly and irreversibly as the pH is increased, and often the
onset of this coagulation is found to occur abruptly within a very narrow range of pH.45

When a physical property of a protein such as its intrinsic viscosity, its sedimentation velocity,
its optical rotation, its molar ellipticity, its intrinsic fluorescence, its absorption of
ultraviolet light, its capacity to take up protons from the solution at constant pH,24 its
absorption of heat at constant temperature,46 its elution volume on chromatography by molecular
exclusion,47 its electrophoretic mobility,11 or its nuclear magnetic resonance absorptions48,49 is
measured as a function of temperature, pH, or the concentration of urea or guanidinium chloride,
changes indicative of a shift in the value of the equilibrium constant KFd for folding (Equation
13–1) are observed (Figure 13–1A).50

Each pair of experimental points (a square and a circle) in Figure 13–1A represents a solution of
cold shock-like protein from Thermotoga maritima at a particular temperature, pH, and
concentration of guanidinium chloride. Two different initial solutions of protein were used.
Either a solution of the native protein in the absence of denaturant was diluted by mixing into a
solution of guanidinium chloride (open symbols) or a solution of the unfolded protein in 5.5 M
guanidinium chloride was diluted by mixing into a solution of guanidinium chloride (solid
symbols). In each case, the mixture was formulated to produce the noted final concentration of
denaturant. One member of each pair of points is the initial fluorescence of the solution
immediately after mixing (squares). For each final concentration of guanidinium chloride, the
solution was allowed to reach equilibrium (circles), which was assumed to be the state after all
changes in fluorescence had ceased.
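The logic of starting from both the native and the unfolded protein can be sketched numerically. The example below assumes a strictly two-state isomerization (Equation 13–1) in which the composite rate constants behave as single first-order rate constants, so that the fraction of folded protein relaxes exponentially toward its equilibrium value with a rate equal to the sum of the two rate constants. The function name and the numerical values of the rate constants are hypothetical and are chosen only to illustrate that both starting solutions converge on the same equilibrium when the reaction is reversible.

```python
import math


def fraction_folded(t, k_fold, k_unfold, initial_fraction_folded):
    """Fraction of folded protein at time t for U <=> F with first-order steps."""
    equilibrium = k_fold / (k_fold + k_unfold)   # equals K_Fd / (1 + K_Fd)
    relaxation_rate = k_fold + k_unfold          # observed rate of approach
    return equilibrium + (initial_fraction_folded - equilibrium) * math.exp(
        -relaxation_rate * t
    )


# Hypothetical rate constants (s^-1) at one final concentration of denaturant.
K_FOLD = 2.0
K_UNFOLD = 0.5

for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    from_native = fraction_folded(t, K_FOLD, K_UNFOLD, 1.0)     # native protein diluted
    from_unfolded = fraction_folded(t, K_FOLD, K_UNFOLD, 0.0)   # unfolded protein diluted
    print(f"t = {t:3.1f} s   from native: {from_native:.3f}   from unfolded: {from_unfolded:.3f}")
```

In this sketch the two time courses begin at opposite extremes and end at the same fraction folded, which is the behavior expected when equilibrium is reached from either direction.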
Figure 13–1: Shift of the equilibrium constant for the folding of the cold shock-like protein from
Thermotoga maritima (naa = 66) in solutions of guanidinium chloride.50 Solutions of cold
shock-like protein and [...] (squares) to correct for the short interval between mixing and the
start of the monitoring. Each isomerization was allowed to progress until the lack of further
changes in fluorescence indicated that equilibrium had been reached. The fluorescence of the
solution at equilibrium (circles) is plotted [...]
[Surviving axis labels: Fluorescence (V) in panel A; kU + kF]
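To make concrete why the equilibrium must be shifted before it can be measured, the following minimal sketch converts a value of the equilibrium constant for folding (Equation 13–2) into the fraction of the protein that is unfolded and into a standard free energy of folding through the general relation ΔG° = −RT ln K. It assumes a two-state equilibrium; the function names, the temperature, and the values of KFd are illustrative assumptions rather than data for any protein.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def fraction_unfolded(K_fd):
    """Fraction of protein unfolded at equilibrium when K_Fd = [F]eq / [U]eq."""
    return 1.0 / (1.0 + K_fd)


def standard_free_energy_of_folding(K_fd, temperature_K=298.15):
    """Standard free energy change for U -> F in kJ mol^-1."""
    return -R * temperature_K * math.log(K_fd)


# Illustrative values of K_Fd, from strongly favoring the native state
# to strongly favoring the unfolded state.
for K_fd in (1e6, 1e2, 1.0, 1e-2):
    print(
        f"K_Fd = {K_fd:8.0e}   fraction unfolded = {fraction_unfolded(K_fd):.6f}   "
        f"standard free energy of folding = {standard_free_energy_of_folding(K_fd):+.1f} kJ/mol"
    )
```

When KFd is very large, the unfolded fraction is too small to detect; only as the denaturant brings KFd toward 1 do both states become measurable in the same solution.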
Below a certain concentration of guanidinium chlo- concentration of random coil becomes sufficiently large
ride (about 2 M), there is, at equilibrium, a linear that it contributes significantly to the intrinsic fluores-
increase in the intrinsic fluorescence with decreasing cence. At this point, the isomerization of the folding
concentration of guanidinium chloride. Even in samples (Equation 13–1) continuously interconverts measurable
of native protein that eventually will unfold (open squares), the quantities of native state and measurable quantities of
immediate magnitude of the fluorescence upon addition unfolded state in equilibrium with each other. As the
of guanidinium chloride, before unfolding commences, concentration of guanidinium chloride is increased fur-
falls upon this baseline. This baseline traces the pertur- ther, a greater fraction of the protein is in the unfolded
bation in the intrinsic fluorescence of the fully folded state until, finally, immeasurably small amounts of the
state due only to addition of the denaturant in the native state are present at equilibrium.
absence of any unfolding or after complete folding has A similar monotonic transition between the native
been achieved. state and a denatured state, the two in equilibrium with
Above a certain concentration of guanidinium each other, is observed when a series of solutions of a
chloride (about 4 M) there is, at equilibrium, a linear protein are each brought to a different temperature, as
increase in intrinsic fluorescence with increasing con- long as the thermal denaturation is reversible (Figure
centration of guanidinium chloride, presumed to reflect 13–2).53 In the example presented in the figure, the
the effect of increasing the concentration of denaturant increase in the stability of the denatured state relative to
on the intrinsic fluorescence of the unfolded polypep- that of the native state with decreasing pH, as defined by
tide. When fully unfolded protein is diluted into the Equation 13–3, is apparent in the shifts of the regions of
range of concentrations of guanidinium chloride where transition to lower temperatures as the pH of the solu-
it will fold (ˆ), the immediate fluorescence of the solution tion is lowered. Similar shifts of the region of transition
before folding commences also falls upon this other caused by either pH54 or temperature46 are observed
baseline. when the equilibrium between folded and unfolded
At intermediate concentrations of guanidinium states is being shifted with guanidinium chloride or urea.
chloride, in the region of transition, the observed magni- A monotonic transition reflecting the shift in equilibrium
tudes of the intrinsic fluorescence at equilibrium fall
between the extremes of fluorescence of the fully folded
protein and the fluorescence of the fully unfolded pro- 0
tein. The region of transition is that range of denaturant
concentration, pH, temperature, or pressure in which
measurable concentrations of both denatured and native –0.04
(L g –1 cm –1)
between the native state and a denatured state can also be observed by differential scanning calorimetry,39 provided the rate of temperature increase is slow enough that equilibrium is reached at each temperature and the process remains reversible over the interval in which the region of transition is traversed.55,56

If the two-state assumption is made, the fluorescence of the solution (Fobs) observed at equilibrium at each concentration of guanidinium chloride in Figure 13–1A is

$$ F_{obs} = f_F F_{0,F} + f_U F_{0,U} \tag{13–4} $$

where F0,F is the intrinsic fluorescence that would be observed if all of the protein were fully folded, F0,U is the intrinsic fluorescence that would be observed if all of the protein were fully unfolded, fF is the fraction of the protein in the folded state, and fU is the fraction of the protein in the unfolded state, all at the particular concentration of guanidinium chloride. Because F0,U and F0,F at each concentration of guanidinium chloride are known from the respective baselines and because fF + fU = 1.0 as a result of the two-state assumption, fF and fU at each concentration of guanidinium chloride can be calculated. From fF and fU, the equilibrium constant for folding (KFd = fF/fU) can be determined for that concentration of guanidinium chloride, temperature, and pH. The same analysis can be applied to the behavior of any other physical property that is directly proportional to the concentration of native protein and denatured protein, respectively, such as absorbance, optical rotation, circular dichroism, or specific viscosity.

If the two-state assumption is correct and significant concentrations of only the native state and the random coil are present at each concentration of guanidinium chloride, then the situation is dramatically simplified. It is, however, reasonable that this should be the case. When a polypeptide folds from a random coil to form the native state under physiological conditions, it must pass through intermediate states between the random coil and the native state. If, however, these intermediate states were as stable or more stable than the folded state, there would be significant, measurable concentrations of them at equilibrium, a possibility that has rarely been observed and that would be unfortunate for the protein in terms of both its function and its ability to avoid endopeptidolytic digestion. That these intermediate states remain less stable than the native state as guanidinium chloride is added to the solution is not surprising, so long as they are about as compact as the native state. There are several observations which suggest that, in many cases, the two-state assumption is valid.

In the region of transition, a point on the curve in Figure 13–1A should represent, if the two-state assumption is valid, an equilibrium mixture of only fully native protein and its random coil. The same equilibrium mixture forms when either the native state in the absence of guanidinium chloride (Í) or the random coil in a concentrated solution of guanidinium chloride (ˆ) is transferred into a solution at that pH and final concentration of guanidinium chloride. Either of these reactions is a special case of a general kinetic category referred to as an approach to equilibrium.

The approach to equilibrium of either the unfolding or the folding polypeptide, respectively, should be governed only by the two composite first-order rate constants kF and kU (Equation 13–1) if the two-state assumption is valid. Either the unfolded state or the folded state, respectively, should be the exclusive product formed in the two reactions as the equilibrium is established. The rate at which the concentration of either species, U or F, in Equation 13–1 changes is

$$ -\frac{d[U]}{dt} = \frac{d[F]}{dt} = k_F[U] - k_U[F] \tag{13–5} $$

Because the concentration of total protein, [protein]TOT, remains constant, it follows that

$$ [U] + [F] = [protein]_{TOT} = [U]_{eq} + [F]_{eq} \tag{13–6} $$

where [U]eq and [F]eq are the concentrations of random coil and native state, respectively, at equilibrium. Combining Equations 13–5 and 13–6 and focusing on the concentration that is decreasing, arbitrarily chosen to be the unfolded form for the following derivation, then

$$ -\frac{d[U]}{dt} = (k_F + k_U)([U] - [U]_{eq}) - k_U[F]_{eq} + k_F[U]_{eq} \tag{13–7} $$

At equilibrium no further changes occur in the concentrations of either the native state or the random coil so

$$ k_U[F]_{eq} = k_F[U]_{eq} \tag{13–8} $$

at all times, and, because [U]eq is a constant

$$ -\frac{d[U]}{dt} = -\frac{d([U] - [U]_{eq})}{dt} = (k_F + k_U)([U] - [U]_{eq}) = k_{obs,F}([U] - [U]_{eq}) \tag{13–9} $$

where kobs,F is the observed rate constant for the approach to equilibrium during net folding.

Equation 13–9 is a simple first-order differential equation in the variable ([U] – [U]eq) and describes a first-order process in this variable. Upon integration

$$ \frac{[U] - [U]_{eq}}{[U]_0 - [U]_{eq}} = \exp(-k_{obs,F}\,t) \tag{13–10} $$

where [U]0 is the concentration of unfolded form at the beginning of the approach to equilibrium. Because the process is symmetric (Equation 13–5), if unfolding were being monitored rather than folding, the variable would have been ([F] – [F]eq) rather than ([U] – [U]eq) but the observed rate constant for the approach to equilibrium, kobs,U, would still be (kF + kU).

monitored by a physical property is completely general and can be applied to any situation where the equilibrium can be described by two first-order rate constants, forward and reverse. In the case of cold shock-like protein (Figure 13–1A), cleanly first-order, exponential approaches to equilibrium were observed whether unfolded protein (ˆ) was folded (2) or folded protein (Í) was unfolded (3).50 An uncomplicated first-order approach to equilibrium is generally accepted as support for a two-state assumption.54
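The symmetry of the approach to equilibrium (Equations 13–5 through 13–10) can be checked numerically. The following is a minimal sketch in Python, not a procedure taken from the text: the function names and the numerical values of the rate constants are hypothetical, chosen only for illustration. It evaluates Equation 13–10, checks it against a direct numerical integration of Equation 13–5, and recovers the observed rate constant from either direction.

```python
import math

def approach_to_equilibrium(u0, total, k_f, k_u, t):
    """[U] at time t for U <-> F, with k_f for U -> F and k_u for F -> U."""
    u_eq = total * k_u / (k_f + k_u)   # follows from Equations 13-6 and 13-8
    k_obs = k_f + k_u                  # Equation 13-9
    return u_eq + (u0 - u_eq) * math.exp(-k_obs * t)   # Equation 13-10

def integrate_numerically(u0, total, k_f, k_u, t, steps=100_000):
    """Simple Euler integration of Equation 13-5, used as an independent check."""
    u = u0
    dt = t / steps
    for _ in range(steps):
        f = total - u                  # Equation 13-6
        u += (-k_f * u + k_u * f) * dt
    return u

def observed_rate_constant(u0, total, k_f, k_u, t1=0.5, t2=1.5):
    """Slope of ln|[U] - [U]eq| against time between two time points."""
    u_eq = total * k_u / (k_f + k_u)
    d1 = abs(approach_to_equilibrium(u0, total, k_f, k_u, t1) - u_eq)
    d2 = abs(approach_to_equilibrium(u0, total, k_f, k_u, t2) - u_eq)
    return -(math.log(d2) - math.log(d1)) / (t2 - t1)

# Hypothetical rate constants (s^-1) and total concentration (arbitrary units).
K_F, K_U, TOTAL = 0.8, 0.2, 1.0

k_obs_folding = observed_rate_constant(TOTAL, TOTAL, K_F, K_U)   # starts fully unfolded
k_obs_unfolding = observed_rate_constant(0.0, TOTAL, K_F, K_U)   # starts fully folded

print("k_obs during net folding:  ", k_obs_folding)
print("k_obs during net unfolding:", k_obs_unfolding)
print("k_F + k_U:                 ", K_F + K_U)
```

Under these assumptions the two observed rate constants coincide with kF + kU regardless of the direction from which equilibrium is approached, which is the behavior the derivation predicts for a two-state process.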
Suppose one were to monitor any physical property Y, such as intrinsic fluorescence, absorbance, optical rotation, circular dichroism, or specific viscosity, that is directly proportional to the concentration of unfolded state and directly proportional to the concentration of folded state, respectively. The observed magnitude of that physical property for any solution containing a mixture of unfolded state and folded state at any time is

$$ Y_{obs} = z_U[U] + z_F[F] \tag{13–11} $$

where zU and zF are the constants of proportionality between the concentration of each species and its respective contribution to the overall magnitude of the physical property for the mixture. When Equation 13–11 is combined with Equation 13–6,

$$ Y_{obs} - Y_{obs,eq} = (z_U - z_F)[U] - (z_U - z_F)[U]_{eq} \tag{13–12} $$

where Yobs,eq is the magnitude of that physical property at equilibrium. It follows from Equations 13–10 and 13–12 that

$$ \frac{Y_{obs} - Y_{obs,eq}}{Y_{obs,0} - Y_{obs,eq}} = \exp(-k_{obs,F}\,t) \tag{13–13} $$

Equation 13–13 states that the fraction of the total change that occurs in the magnitude of a particular physical property monitoring the isomerization between unfolded state and folded state of a polypeptide should decrease exponentially as a function of time if during that isomerization only two states are significantly populated, namely, the unfolded state and the folded state. Again, because the process is symmetric, if the derivation just presented had been based on [F] instead of [U] because unfolding was being monitored rather than folding, the same outcome would have occurred, and because kobs,F = kobs,U = kU + kF, Equation 13–13 is the same whether net folding is progressing from an initially unfolded state or net unfolding is progressing from an initially folded state.

The observed rate constants for these approaches to equilibrium for cold shock-like protein (Figure 13–1B)50 have identical values at the same concentrations of guanidinium chloride whether the equilibrium was approached in the direction of folding (2) or in the direction of unfolding (3), another observation consistent with the two-state assumption. Furthermore, the observed rate constants for the approach to equilibrium decrease smoothly as the concentration of guanidinium chloride is increased until the region of transition is reached and increase smoothly with guanidinium chloride beyond the region of transition, as expected if only one rate constant, kF, dominates in the former portion of the plot and another, kU, dominates in the latter. Such uncomplicated behavior of these observed rate constants is also presented as evidence for two-state behavior. In the region of transition, both kF and kU contribute to the observed rate constant for the approach to equilibrium kF + kU.

It is the decrease in kF and the increase in kU (dashed lines in Figure 13–1B) with increasing concentrations of guanidinium chloride that together shift the equilibrium constant into the measurable range. As the guanidinium chloride has its greatest effect on the stability of the unfolded state, it is not surprising that kF is affected more than kU.

Often conditions are purposely chosen to ensure that the approaches to equilibrium of the folding reaction in the region of the transition between fully native and fully denatured protein are simple first-order processes. Under other conditions of temperature, pH, or concentration of denaturant, either folding or unfolding or both are not first-order processes and these conditions are avoided. For example, the equilibrium constant for folding of myoglobin at 25 °C is shifted into the measurable range at pH 4.2. At this pH both the folding and unfolding of the polypeptide are first-order processes.57 They both proceed with clean isosbestic points in the Soret region of the visible spectrum, and this also indicates that both folding and unfolding at this pH are two-state processes. At higher or lower values of pH, however, the kinetics of both the folding and unfolding reactions become complex.

If a point in the region of transition in Figure 13–1A represents a mixture containing only the native state and the random coil, then Equation 13–4, with appropriate
tion was monitored by circular dichroism, sedimentation velocity, fluorescence emission at 320 nm, and fluorescence emission at 360 nm, the changes observed differed dramatically. Each curve, however, showed an obvious plateau at intermediate concentrations of urea, consistent with an almost completely populated intermediate state in this range. The kinetics of both folding and unfolding over the entire range of concentrations of urea displayed two distinct exponential phases with different rate constants during the approach to equilibrium, and these two sets of observed rate constants when plotted against the concentration of urea gave two separate, overlapping curves, each resembling the one in Figure 13–1B. The plot for the observed rate constants of the isomerization between the unfolded state and the intermediate was displaced to higher concentrations of urea than the one for the isomerization between intermediate and the native state. Consequently, it was concluded that this unfolding promoted by urea is a three-state process with a discrete intermediate state that is almost completely populated at concentrations of urea between 3 and 4 M.

In situations such as the one described, in which a discrete intermediate is thought to exist, the curves plotting the transition from native state to fully unfolded state as a function of the concentration of denaturant often show an obvious plateau or inflection,59,61 and these curves can be fit by equations similar to Equation 13–4 based on a three-state assumption to obtain the fraction of native state, intermediate state, and unfolded state at each concentration of denaturant. From these fractions, equilibrium constants for the isomerizations

[Figure 13–4, panels A and B. Recoverable axis labels: Fractional change; s°20,w (S); [Urea] (M)]

lack of coincidence of the curves for different physical properties.64,65 For example, when the shift in the equilibrium between the native state and the unfolded state of bovine α-lactalbumin (Figure 13–4B)66 was followed
by circular dichroism at 222 nm (2), the transition observed did not coincide with the one measured by circular dichroism at 270 and 296 nm (3, Δ).

The shift in the equilibrium constant for the folding of bovine carbonate dehydratase has been followed by changes in circular dichroism at 269 nm, ultraviolet absorption at 290 nm, and optical rotation at 400 nm as a function of the concentration of guanidinium chloride at pH 7.0.67 The circular dichroism smoothly traced one transition and the optical rotation smoothly traced another transition proceeding at a higher concentration of guanidinium chloride. The change in absorbance traced a curve between the other two that displayed an inflection, suggesting that it was able to monitor both transitions. Furthermore, the kinetics of the refolding of the random coil was not a homogeneous, first-order process. It was concluded that one or more stable conformers of the polypeptide of carbonate dehydratase, other than the fully folded state and the random coil, are present in solutions of guanidinium chloride between 2 and 3 M in concentration. The properties of these other states are distinct from those of either the native state or the random coil. From observations of different transitions with different physical properties, the fractions of native, intermediate, and unfolded states as a function of the concentration of denaturant can also be estimated,68 even in situations where very little intermediate forms,69 and equilibrium constants among the states can be calculated.

If the protein being unfolded or refolded is an oligomer of two or more subunits, the respective dissociation or association of those subunits causes the transition between folded state and unfolded state to depend on the concentration of protein in the solution. Suppose that the native protein is an α2 dimer. The equilibrium between the unfolded state αU and the native state (αF)2 is

$$ 2\,\alpha_U \rightleftharpoons (\alpha_F)_2 \tag{13–14} $$

and because

$$ [polypeptide]_{TOT} = 2[(\alpha_F)_2] + [\alpha_U] \tag{13–15} $$

if there are only two states present, native dimer and unfolded monomer, at equilibrium

$$ K_{Fd} = \frac{[(\alpha_F)_2]}{[\alpha_U]^2} = \frac{1 - f_U}{2 f_U^{2}\,[polypeptide]_{TOT}} \tag{13–16} $$

As the magnitude of the perturbation is increased, the equilibrium constant between folded and unfolded state is shifted in the direction of the unfolded state. At the midpoint of the transition between fully folded and fully unfolded states that is being monitored by a particular physical property, fF = fU (Equation 13–4); fU = 1/2; and, instead of KFd being equal to 1 at this point, as with a monomer, KFd for a dimer is equal to [polypeptide]TOT–1. Consequently, as the total concentration of protein is increased, the equilibrium constant between unfolded and folded states must be shifted more and more before the midpoint is reached, which requires a greater and greater perturbation. If there are only the two states, as the total concentration of protein is increased, the midpoints of the curves describing the transition between folded dimer and unfolded monomer move systematically to higher and higher concentrations of guanidinium chloride63,70–72 or urea73 or to higher temperatures.38,74 Because it is only the molecularity of the reaction that causes these shifts in the curves with the concentration of protein, they are no longer observed when the dimer is artificially converted to a monomer by joining the carboxy terminus of one of its subunits with the amino terminus of the other.74,75 Even greater shifts with the concentration of protein are observed in the curves following folding of higher oligomers76 as a function of the perturbation.

Often when the equilibrium between folded oligomer and unfolded monomer is shifted by a perturbation, stable intermediate states are formed. For example, during the increase in the perturbation, a dimer may dissociate to monomers before the polypeptides unfold, and if the physical property detects only unfolding, the curves following the transition between the folded state and the unfolded state will not shift as the concentration of protein is increased38 because the formation of the monomeric intermediate goes undetected. Similarly, a tetramer can dissociate into dimers before the dimers unfold to monomers.72 In some cases, the intermediate state is detected. In one such instance, the curves showed a plateau as in Figure 13–4A, but because both the native protein and the intermediate were dimers, it was only the portion of the curve monitoring the transition from the dimeric intermediate to the unfolded monomer that shifted with concentration of protein.73

When the protein binds a ligand, the addition of the ligand also causes the curves following the transition between the native state and the unfolded state to shift to higher levels of perturbation. Because only the folded protein can bind the ligand, L, if the ligand is present at saturation so that only liganded native protein is present at all concentrations of denaturant

$$ U + L \overset{K'_{Fd}}{\rightleftharpoons} F\cdot L \tag{13–17} $$

and

$$ K'_{Fd} = \frac{[F\cdot L]}{[U][L]} = \frac{f_F}{f_U\,[L]} \tag{13–18} $$
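The midpoint conditions implied by Equations 13–16 and 13–18 can be verified with a short calculation. The following Python sketch is illustrative only: the function names and the numerical values of the equilibrium constants and concentrations are hypothetical. It solves Equation 13–16 for fU at a given total concentration of polypeptide and uses Equation 13–18 with fF + fU = 1 for the liganded monomer.

```python
import math

def fraction_unfolded_dimer(k_fd, total):
    """Solve Equation 13-16 for fU.

    Rearranged, the equation is a*fU**2 + fU - 1 = 0 with a = 2*K_Fd*[polypeptide]TOT;
    only the positive root is physically meaningful.
    """
    a = 2.0 * k_fd * total
    return (-1.0 + math.sqrt(1.0 + 4.0 * a)) / (2.0 * a)

def fraction_unfolded_ligand(k_fd_prime, ligand):
    """Equation 13-18 with fF + fU = 1 gives fU = 1 / (1 + K'_Fd [L])."""
    return 1.0 / (1.0 + k_fd_prime * ligand)

# Hypothetical equilibrium constant for the dimer (M^-1) at one fixed level of perturbation.
K_FD = 1.0e6

for total in (1.0e-6, 1.0e-5, 1.0e-4):   # total polypeptide (M), hypothetical
    print("dimer   [polypeptide]TOT =", total, " fU =", fraction_unfolded_dimer(K_FD, total))

# Hypothetical equilibrium constant for the liganded monomer (M^-1).
K_FD_PRIME = 1.0e3

for ligand in (1.0e-3, 1.0e-2, 1.0e-1):  # free ligand (M), hypothetical
    print("ligand  [L] =", ligand, " fU =", fraction_unfolded_ligand(K_FD_PRIME, ligand))
```

In this sketch fU equals 1/2 only when KFd equals the reciprocal of the total concentration of polypeptide (dimer) or the reciprocal of the concentration of ligand (liganded monomer). At any fixed equilibrium constant, raising either concentration lowers fU, so a greater perturbation is needed to reach the midpoint.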
Consequently, as the concentration of ligand is increased beyond its level of saturation, a greater perturbation is required to shift the equilibrium to a point at which fF = fU, and the curves move toward greater perturbation, for example to higher concentrations of guanidinium chloride, as the concentration of ligand is increased. It is also the case that, at the same concentrations, a ligand with a smaller dissociation constant will shift the curve a greater distance than one with a larger dissociation constant.77

The change in standard enthalpy of folding, ΔH°Fd, can be measured directly in a differential scanning calorimeter39,41 or it can be calculated from the dependence of the equilibrium constant of folding, KFd, on temperature. From the van’t Hoff relationship

$$ \left(\frac{\partial \ln K_{Fd}}{\partial (1/T)}\right)_P = -\frac{\Delta H^{\circ}_{Fd}}{R} \tag{13–19} $$

If a folding is followed as the temperature is varied and the value of the logarithm of the equilibrium constant for folding is plotted as a function of T–1, the slope of the plot will be directly proportional to the change in standard enthalpy. When the folding of β-lactoglobulin was made reversible and kinetically first-order in both directions by adding appropriate concentrations of urea and adjusting the pH to 3, the equilibrium constant for folding, KFd, could be measured at each concentration of urea for temperatures between 10 and 50 °C. The behavior of log KFd as a function of T–1 (Figure 13–5A)78 demonstrates that the change in standard enthalpy for the reaction, ΔH°Fd, is not constant but varies considerably with temperature.

In fact, the values of the change in standard enthalpy for folding, ΔH°Fd (Figure 13–5B),78 calculated from the slopes of this first plot, vary from exothermic to endothermic over the range of temperatures sampled, a fact suggesting that the change in standard enthalpy for folding is by itself uninformative. Furthermore, the change in standard enthalpy for folding of a series of mutants of the same protein is usually linearly related to the change in standard entropy (Equation 5–63) with a slope Tc of about 350 K.79,80 Although the slope is somewhat greater than that for most other noncovalent processes, the compensation observed suggests that, as with the hydrophobic effect, both standard enthalpy and standard entropy are registering mainly compensatory changes in the water.

When the change in standard enthalpy, ΔH°Fd, is plotted against the temperature (Figure 13–5B), the slope of the relationship observed at each point is the standard heat capacity change of folding, ΔC°p,Fd, for that temperature

$$ \left(\frac{\partial \Delta H^{\circ}_{Fd}}{\partial T}\right)_P = \Delta C^{\circ}_{p,Fd} \tag{13–20} $$

The change in standard heat capacity can also be measured directly in a calorimeter81 or by combining measurements of unfolding in solutions of urea and with temperature in a different manner.82

From the observations presented in Figure 13–5B, it could be calculated that the change in standard heat capacity78 for the folding of the polypeptide of β-lactoglobulin (naa = 162) between 5.5 and 4.4 M urea, pH 2.5 and 3.2, and 15 and 50 °C is –8700 ± 700 J K–1 mol–1, or –54 J K–1 (mol of amino acid)–1. The measured values for the changes in standard heat capacity for the folding of proteins composed of a single polypeptide and lacking cystines are –60 ± 10 J K–1 (mol of amino acid)–1, regardless of the perturbation used to shift the equilibrium.78,79,82–87

Unlike the changes in standard entropy and standard enthalpy that vary considerably from situation to situation, this uniform decrease in standard heat capacity seems to be a fundamental property of folding. It must arise from a combination of the decrease in heat capacity that occurs when hydrophobic amino acids are transferred from the aqueous phase into the interior of the molecule of protein,88 the increase in heat capacity that arises from the desolvation of polar amino acids when they are transferred into the interior,89 and the decrease in conformational heat capacity that occurs when vibrations and rotations along the polypeptide become more hindered after it is folded.83 Of the three contributors, however, the difference in conformational heat capacity between the native state and the random coil may not be very significant because the observed heat capacity of an unfolded polypeptide is quite close to the heat capacity calculated only from the individual side chains and the individual peptide bonds composing that polypeptide.85,90,91

The value for the change in standard heat capacity of folding is consistent with the decrease in standard heat capacity (–200 to –400 J K–1 mol–1) observed for the transfer of alkanes and arenes from water into an organic phase (Table 5–8) if it is recalled that hydrophobic amino acids make up only a fraction (30%) of the amino acids in a polypeptide and that many of them remain accessible to water in the native state after the protein has folded. These considerations suggest that the uniform decrease in standard heat capacity [–60 J K–1 (mol of amino acid)–1] associated with the folding of a polypeptide is one of the few signatures of the hydrophobic effect arising from the removal of hydrophobic amino acids from the solvent during their burial in the interior of the native state upon folding. The hydrophobic effect is the only noncovalent force that can provide a significant favorable contribution to the standard free energy of folding.

Although not always the case,79 it has been pointed
requires that the solvated denatured state have a smaller volume than the solvated native state because

$$ \left(\frac{\partial \Delta G^{\circ}_{Fd}}{\partial P}\right)_T = \Delta V^{\circ}_{Fd} = V_F - V_U \tag{13–21} $$

where ΔG°Fd is the standard free energy of folding (–RT ln KFd), ΔV°Fd is the standard volume change of folding, VF is the molar volume of the folded state, and VU is the molar volume of the unfolded state. The volume changes for folding calculated from these results are positive as expected. At pH 2.0 and 0 °C the volume change for folding of bovine ribonuclease A97 is +48 cm3 mol–1 and that for bovine chymotrypsinogen98 is +14 cm3 mol–1, while that for myoglobin99 from pH 4 to 6 at 20 °C is +92 ± 5 cm3 mol–1.

The changes in isoentropic compressibility of folding

$$ \Delta\kappa_{S,Fd} = -\frac{1}{V_F}\left(\frac{\partial V_F}{\partial P}\right)_S - \left[-\frac{1}{V_U}\left(\frac{\partial V_U}{\partial P}\right)_S\right] \tag{13–22} $$

are more informative. The isoentropic changes in compressibility of folding for ribonuclease A and chymotrypsinogen97,98 are both about –0.015 GPa–1. The negative values for these isoentropic compressibilities indicate that the solvated denatured state is more compressible than the native state. This is not a surprising result because the isoentropic compressibilities of native proteins are very small, 10-fold smaller than those of organic liquids and 2-fold smaller than those of amorphous organic solids.100 The greater compressibility of the denatured state is probably due in part to its more fluid structure, but it is also possible that the hydrophobic functional groups revealed in the denatured state increase the structure of the water surrounding them and thereby increase its compressibility. This increase in the structure of water, if it is significant, would resemble the increase in its structure caused by decreasing the temperature, and decreasing the temperature of liquid water increases its compressibility (Figure 5–5).

High pressures also are able to dissociate multimeric proteins into monomers, reversibly, without causing denaturation, even at neutral pH. The volume change is small; in the case of enolase at 10 °C and pH 7.4, ΔV° = 0.025 cm3 (mol of amino acid)–1. Presumably the individual volume changes occur only at the faces of the subunits that are exposed during the dissociation.101

To be able to calculate the equilibrium constant KFd for folding from the measured concentrations of the two states of the protein, it must be decreased significantly by one or a combination of rather unphysiological perturbations such as increasing the temperature or pressure, lowering the pH, or adding guanidinium chloride or urea to the solution. It would be of interest to be able to estimate the value at pH 7 and 25 °C of this equilibrium constant in the absence of any perturbation. This is generally accomplished by extrapolation102 from realms of pH, temperature, pressure, and concentrations of urea and guanidinium chloride where measurements can be made.

Various equations have been derived for extrapolating values of the equilibrium constant KFd to small or zero concentrations of urea and guanidinium chloride103,104 and from acidic to neutral pH.104 Most of these equations plot the observed standard free energies of folding, ΔG°Fd, as functions of the magnitude of the perturbation to perform the extrapolations. It is also possible to perform nonlinear least-squares fits of empirical equations to plots of the directly observed changes of a physical property as a function of denaturant.105 Unfortunately, each theoretical curve, although it is successful at reproducing the behavior in the measurable regions, deviates from the other theoretical curves beyond the measurable regions. The values for the standard free energy of folding measured both at elevated temperatures and in the presence of urea can be extrapolated simultaneously to obtain an estimate of the value for standard free energy of folding at 25 °C in the absence of urea.86 Extrapolation both from high concentrations of guanidinium chloride and from low pH can also be performed simultaneously.106 It is also possible to measure the thermal unfolding of a protein in a differential scanning calorimeter at a series of concentrations of urea below the range of concentrations at which the transition is observed at 25 °C.

The extrapolation that has become most widely accepted is one for values of standard free energies of folding, ΔG°Fd, observed in solutions of guanidinium chloride or urea, and the equation for performing this extrapolation that has emerged as the most popular is107

$$ \Delta G^{\circ}_{Fd,[D]} = \Delta G^{\circ}_{Fd,H_2O} - m[D] \tag{13–23} $$

where ΔG°Fd,[D] is the standard free energy of folding calculated (Equation 5–14) from the observed equilibrium constant at a given concentration of the denaturant; ΔG°Fd,H2O is the standard free energy of folding for the protein in aqueous solution at the same pH, ionic strength, and temperature; and m is the slope of a line that is fit to the observations. This equation states that the standard free energy of folding is a simple linear function of the concentration of denaturant, which seems to ignore the observation that the changes in solvation brought about by guanidinium chloride and urea (Table 13–1) are not directly proportional to their molar concentrations.8,12,25,103,108

Nevertheless, there are many observations supporting the validity of this relationship. In situations where there are wide ranges of the concentration of denaturant over which the equilibrium constant for folding can be
measured accurately (Figure 13–6),109,110 the standard free energies of folding do in fact vary linearly with the concentration of denaturant over the entire range of measurements.* Most of the time, however, the range of concentrations of denaturant over which measurements of the equilibrium constants for folding can be made is much narrower (Figure 13–7).105 Extrapolations of standard free energies of folding perturbed by different denaturants give the same value for ΔG°Fd,H2O, in spite of the long distances over which those extrapolations must be made.105,107,111 Standard free energies of folding beyond the range of denaturant concentrations in which they can be directly measured can be estimated from measurements of unfolding induced by raising the temperature in a differential scanning calorimeter, and these estimates usually fall close to the line of extrapolation.111 Measurements of the first-order rate constants for the approach to equilibrium can be made for the entire range of concentrations of denaturant, and from a plot of these observed rate constants, values for kF and kU in the absence of denaturant can be estimated by extrapolation (dashed lines in Figure 13–1B). If the folding is a two-state process, the equilibrium constant for folding calculated from the estimates of these two rate constants (Equation 13–2) gives a value for the standard free energy of folding that agrees50,112 with that obtained by use of Equation 13–23.
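The two-state analysis (Equation 13–4) and the linear extrapolation (Equation 13–23) can be combined in a short calculation. The Python sketch below is illustrative only and does not reproduce any data from the figures: the signal values are synthetic, generated from hypothetical values of ΔG°Fd,H2O and m, and the baselines are assumed to be flat, whereas real baselines vary with the concentration of denaturant. All function names are hypothetical.

```python
import math

R = 8.314  # gas constant, J K^-1 mol^-1

def fraction_folded(y_obs, y_folded, y_unfolded):
    """Invert Equation 13-4 using fF + fU = 1."""
    return (y_obs - y_unfolded) / (y_folded - y_unfolded)

def free_energy_of_folding(f_folded, temperature):
    """Standard free energy of folding, -RT ln K_Fd, with K_Fd = fF / fU."""
    k_fd = f_folded / (1.0 - f_folded)
    return -R * temperature * math.log(k_fd)

def fit_line(xs, ys):
    """Ordinary linear least squares; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical parameters used only to generate synthetic observations.
TEMPERATURE = 298.15            # K
TRUE_DG_H2O = -20000.0          # J mol^-1
TRUE_M = -5000.0                # J mol^-1 M^-1, sign convention of Equation 13-23
Y_FOLDED, Y_UNFOLDED = 1.0, 0.2 # flat baselines (a simplifying assumption)

def synthetic_signal(denaturant):
    """Signal of a two-state mixture at one concentration of denaturant (M)."""
    dg = TRUE_DG_H2O - TRUE_M * denaturant          # Equation 13-23
    k_fd = math.exp(-dg / (R * TEMPERATURE))
    f_folded = k_fd / (1.0 + k_fd)
    return f_folded * Y_FOLDED + (1.0 - f_folded) * Y_UNFOLDED  # Equation 13-4

# Only the region of transition gives measurable fractions of both states.
concentrations = [3.0, 3.5, 4.0, 4.5, 5.0]
free_energies = [
    free_energy_of_folding(
        fraction_folded(synthetic_signal(d), Y_FOLDED, Y_UNFOLDED), TEMPERATURE
    )
    for d in concentrations
]

dg_h2o, slope = fit_line(concentrations, free_energies)
m = -slope  # Equation 13-23 is written with -m[D], so m is the negative of the slope

print("extrapolated dG(H2O) (J/mol):", dg_h2o)
print("m (J/mol/M):", m)
```

Because the synthetic observations are noise-free, the fit returns the values used to generate them. With real measurements the extrapolated intercept carries an uncertainty that grows with the distance between the region of transition and zero denaturant.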
[Figure 13–6. Axes: ΔG°Fd,[urea] (kJ mol–1) versus [Urea] (M); inset: F342 versus [Urea] (M)]

Figure 13–6: Estimation of the standard free energy of folding in the absence of denaturant by extrapolation. Cold shock protein CspB (naa = 67) from Bacillus subtilis109 was dissolved in a series of solutions of increasing molar concentrations of urea at pH 7 and 25 °C. Two final concentrations of protein, 1.35 mM and 13.5 mM, were used. The emission of fluorescence of each solution, F342, was monitored at 342 nm upon excitation at 280 nm (inset). After equilibrium was reached, the equilibrium constant for folding was estimated for each concentration of urea, and from these equilibrium constants, the respective standard free energies of folding ΔG°Fd,[urea] were calculated. These standard free energies of folding (kilojoules mole–1) are plotted as a function of the concentration of urea (molar). A line was fit to the data by linear least-squares analysis. The dashed line at zero (equilibrium constant of 1) emphasizes

[Figure 13–7. Panel A: Δε293 (mM–1 cm–1) versus [Denaturant] (M). Panel B: ΔG°Fd,[D] (kJ mol–1) versus [Denaturant] (M)]

Figure 13–7: Estimation of the standard free energy of folding by extrapolating standard free energies of folding observed in solutions of different denaturants.105 (A) Shifts of the equilibrium constants of folding. Bovine chymotrypsin (naa = 241), which had been sulfonylated with phenylmethanesulfonyl fluoride to inactivate the endopeptidase, was dissolved in solutions of different concentrations of urea (3), 1,3-dimethylurea (䉭), and guanidinium chloride (Í) at pH 4.0 and 25 °C. The several folding isomerizations were monitored by the change in extinction coefficient at 293 nm (Δε293; millimolar–1 centimeter–1), which is plotted as a function of the concentration (molar) of urea. In the respective regions of transition, the fraction of the protein in the folded state and the fraction in the unfolded state were calculated from the distance of each data point from the values of the change in absorbance for the fully folded (upper dashed lines) and the fully unfolded (lower dashed
that direct measurements of the equilibrium constant can usually lines) states. Equilibrium constants for folding were calculated
be made only over a limited range. Reprinted with permission from from these fractions for each concentration of denaturant, and
ref 109. Copyright 1995 Nature Publishing Group. from each of these equilibrium constants, standard free energies of
folding DG∞Fd,[D] were calculated at the respective concentrations of
denaturant. (B) Standard free energies of folding (kilojoules mole–1)
plotted against the respective concentrations (molar) of each
* The ionic strength of the solution must be maintained as the con- denaturant. Each of the lines was fit to the respective set of data by
centration of guanidinium chloride is decreased to retain linear linear least-squares analysis. Reprinted with permission from
behavior of DG∞Fd,[GdmCl].111 ref 105. Copyright 1988 American Chemical Society.
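
The linear extrapolation described on this page can be sketched numerically. The example below is only an illustration. It assumes the linear form ΔG°Fd,[D] = ΔG°Fd,H2O + m[D] for the relationship the text refers to as Equation 13–23; the slope symbol m, the function names, and every data point and rate constant are hypothetical and are not taken from Figures 13–6 or 13–7.

```python
import math

R_KJ = 8.314e-3  # gas constant in kJ mol^-1 K^-1


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def free_energy_of_folding(k_fd, temperature_k=298.15):
    """Standard free energy of folding (kJ/mol) from K_Fd = [folded]/[unfolded]."""
    return -R_KJ * temperature_k * math.log(k_fd)


# Hypothetical free energies of folding (kJ/mol), measurable only inside the
# transition region where folded and unfolded protein are both detectable.
urea_molar = [3.0, 3.5, 4.0, 4.5, 5.0]
dg_observed = [-3.0, -1.5, 0.0, 1.5, 3.0]

# Extrapolate to zero denaturant: the intercept estimates the value in water.
dg_water, m_value = fit_line(urea_molar, dg_observed)

# Independent check for a two-state process: K_Fd = kF / kU, using rate
# constants extrapolated to zero denaturant (hypothetical values, in s^-1).
k_fold, k_unfold = 50.0, 0.4
dg_kinetic = free_energy_of_folding(k_fold / k_unfold)

print(f"extrapolated: {dg_water:.1f} kJ/mol, m = {m_value:.1f} kJ/mol per M")
print(f"from kF/kU:   {dg_kinetic:.1f} kJ/mol")
```

The made-up rate constants were chosen so that the two estimates agree closely, which is the behavior the text reports for a two-state process. With real data the extrapolation is usually much longer than the measured range, so the uncertainty in the intercept should be examined as well.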
That the numerical value for ΔG°Fd,H2O obtained by use of Equation 13–23 is a reasonable
estimate of its actual value can also be demonstrated by evaluating the effect of pH on
its magnitude. Acid–base titration curves for the folded state of the protein and its
unfolded state in 8 M urea or 6 M guanidinium chloride can be measured directly23,24 or
calculated from its composition of amino acids,25 and an integrated form of Equation 13–3
(ref 33) can be used to calculate the variation expected in ΔG°Fd,H2O caused by changes in
pH. These calculated variations reproduced the observed variations with pH of estimates
of ΔG°Fd,H2O obtained by the extrapolation defined by Equation 13–23 for bovine
ribonuclease A23,25 and bovine chymotrypsin24 when they were unfolded in solutions of urea
and guanidinium chloride.

The observed changes in the dependence of ΔG°Fd,H2O on pH caused by site-directed
mutation of a particular amino acid in the protein also agree quantitatively with those
calculated with integrated forms of Equation 13–3. The observed values of ΔG°Fd,H2O
between pH 5 and 8 for the carboxy-terminal domain of protein L9 from the 50S subunit of
the ribosome of E. coli fell on the curve calculated with an integrated form of
Equation 13–3 by use of the values of pKa for its four histidines in the native state as
determined directly by nuclear magnetic resonance.113 When each of these histidines was
mutated in turn, the observed changes in the behavior of ΔG°Fd,H2O were again those
predicted from their individual values of pKa. In particular, the mutation of
Histidine 134, which is buried in the interior and has the lowest pKa of the histidines
in the native protein, caused the greatest change in the observed pH dependence of
ΔG°Fd,H2O. Aspartate 26 is buried in the native state of thioredoxin from E. coli, which
causes its pKa to be 7.5, a fact that destabilizes the folded state relative to the
unfolded (Equation 13–3). The destabilization calculated from the difference in the
values of pKa for just Aspartate 26 is equal to the difference in the values of ΔG°Fd,H2O
estimated by Equation 13–23 for the wild-type protein and a mutant in which Aspartate 26
is replaced by alanine.27 Differences in ΔG°Fd,H2O calculated from observed shifts in the
values of pKa for histidines in ribonuclease T1 from A. oryzae caused by mutation of
Glutamate 58 to alanine also agreed with differences in values of ΔG°Fd,H2O estimated
with the extrapolation of Equation 13–23.114

The constant fragment CL of the light chain of immunoglobulin G (Figure 11–1) is a small
protein, the folding of which as a function of the concentration of guanidinium chloride
has been measured.115 The protein contains a deeply buried cystine that is readily
reduced by dithiothreitol when it is unfolded in solutions of guanidinium chloride. The
rate of its reduction in the random coil in the absence of guanidinium chloride can be
estimated by extrapolation of its rate of reduction at higher concentrations of
guanidinium chloride. The actual rate of its reduction in the absence of guanidinium
chloride when the protein is folded is much slower than the extrapolated value. If it is
assumed that the reduction of this cystine in the absence of guanidinium chloride occurs
only when the native protein is briefly and reversibly a random coil, the difference
between the rate of reduction for the native state and that estimated for the random coil
is consistent with a value for the standard free energy of folding of –26 kJ mol⁻¹. This
is close to the value (–30 kJ mol⁻¹) obtained by extrapolation from ranges of guanidinium
chloride concentrations in which the equilibrium constant can be measured.

Measurements of proton exchange also support an extrapolation of standard free energy of
folding that is linear in the concentration of denaturant. The amido protons of peptide
bonds buried deeply in the interior of a protein, when they exchange at the EX2 limit
(Equation 12–63), often register a conformational change that is the global unfolding and
folding of the protein.116 Consequently, in these situations, Kconf (Equation 12–63)* is
actually KFd⁻¹. When the standard free energy of this conformational change revealed by
proton exchange, ΔG°HX, is monitored, it is found to be a linear function of the
concentration of the denaturant (Figure 13–8).116,117 In the case of cysteineless type I
ribonuclease H from E. coli, only Methionine 47 is buried deeply enough to respond only
to the global unfolding and folding over the range of rates that could be measured, but
the change in standard free energy of the conformational change that it monitors remains
a linear function of the concentration of guanidinium chloride to concentrations well
below those at which global folding can be monitored directly (inset in Figure 13–8). The
range of values for proton exchange that can be measured for deeply buried amido
hydrogens can be extended by raising the temperature, and at higher temperature, they
remain linear functions of the concentration of guanidinium chloride until none is left
in the solution.116 Furthermore, the standard free energies of folding in water,
ΔG°Fd,H2O, estimated from these plots of ΔG°HX as a function of the concentration of
denaturant, agree satisfactorily with values of standard free energies of folding in
water estimated from linear extrapolations of standard free energies of folding
calculated from direct measurements of the equilibrium constant for folding at higher
concentrations of denaturants (inset to Figure 13–8).118

There are several observations, however, suggesting that the standard free energy of
folding of at least some proteins that fold in a two-state process may not be a linear
function of the concentration of denaturant all

* By convention, the equilibrium constant for the conformational change producing
exchange is defined for the opening of the structure, while the equilibrium constant for
folding is defined reciprocally, namely, for the closing of the structure. Consequently,
KFd⁻¹ = Kconf and ΔG°Fd = –ΔG°HX when the conformational change being monitored by the
exchange is the global unfolding and folding of the protein.
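
The estimate from the rates of reduction of the cystine and the convention stated in the footnote rest on the same reciprocal relationship, which can be written out as a worked equation. This is a sketch rather than the author's derivation. It assumes that the native protein reacts only through a small, rapidly equilibrating population of random coil (so that Kconf is much less than 1), and it assumes a temperature of 25 °C, which the text does not state for the reduction of the cystine. The symbols k_native and k_coil for the two rates of reduction are introduced here only for illustration.

```latex
\begin{align*}
K_{\mathrm{conf}} &= K_{\mathrm{Fd}}^{-1}, \qquad
\Delta G^{\circ}_{\mathrm{Fd}} = -RT\ln K_{\mathrm{Fd}} = -\Delta G^{\circ}_{\mathrm{HX}} \\
k_{\mathrm{native}} &\approx K_{\mathrm{conf}}\,k_{\mathrm{coil}}
\;\Longrightarrow\;
\Delta G^{\circ}_{\mathrm{Fd}} \approx -RT\ln\frac{k_{\mathrm{coil}}}{k_{\mathrm{native}}} \\
\Delta G^{\circ}_{\mathrm{Fd}} &= -26\ \mathrm{kJ\,mol^{-1}},\quad
RT = 2.48\ \mathrm{kJ\,mol^{-1}}
\;\Longrightarrow\;
K_{\mathrm{Fd}} = \exp\!\left(\frac{26}{2.48}\right) \approx 3.6\times10^{4}
\end{align*}
```

Under these assumptions, the quoted value of –26 kJ mol⁻¹ corresponds to a rate of reduction in the random coil that is between 10⁴ and 10⁵ times the rate observed for the native protein.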
[Figure 13–8: plot of ΔG°HX (kJ mol⁻¹) against [GdmCl] (M).]

[Equation 13–24: expression in ΔH°Fd,m, ΔC°p,Fd, T, and Tm, including a term in
ln (T/Tm).]
tide131 is on the order of 10 s⁻¹ at 25 °C and the equilibrium constant KFd for folding
is on the order of 10⁸, then the observed rate constant for the unfolding of a native
protein to the random coil (kU = kF/KFd) must be on the order of 10⁻⁷ s⁻¹. This would
state that a protein has a 50% chance of unfolding to the random coil every 100 days at
25 °C. This is not a major problem in the life of a protein.

Measurements of the equilibrium constants Kconf for the conformational changes permitting
the exchange of amido protons at the EX2 limit (Equation 12–63) are consistent with the
proposal that the most deeply buried positions in the polypeptide backbone exchange only
upon its complete unfolding (Figure 13–8). The conformational equilibrium constants
governing the exchange rates for the less deeply buried amido protons, however, are
larger than the one for the most deeply buried positions and are spread over a range of
values.132–135 The larger values for these other conformational equilibrium constants,
which produce faster rates of exchange, are the result of conformational changes confined
only to portions of structure of the protein, for example to individual α helices or
loops of random meander,116,132,136,137 rather than the result of the fundamental
unfolding encompassing the entire structure.

These local conformational changes appear to be of two types: those involving
considerable exposure of nonpolar functional groups to the solvent, similar to the
exposure experienced during global unfolding, and those involving exposure of the amido
protons to be exchanged without any significant expansion of the local structure into the
solvent.138 The former are recognized by the increase in their equilibrium constants
produced by adding guanidinium chloride or urea; the latter, by the insensitivity of
their equilibrium constants to the addition of these denaturants.116,132,137 For example,
above 0.8 M guanidinium chloride, the α amido protons on Glutamine 105 and Alanine 110 of
cysteineless type I ribonuclease H (Figure 13–8) must exchange during a major unfolding
of the protein because the standard free energy for the conformational change permitting
their exchange decreases significantly as the concentration of guanidinium chloride
increases. At lower concentrations of guanidinium chloride, however, their respective
rates of exchange are governed by other conformational changes the equilibrium constants
for which are unaffected by the concentration of guanidinium chloride. These other
conformational changes, therefore, must not involve significant increases in the exposure
of the polypeptide to the solvent. Because they are unaffected by the concentration of
guanidinium chloride, the equilibrium constants for these other conformational changes
become larger than the equilibrium constant for the major unfolding at low concentrations
of the denaturant.

These observations indicate that in solution, in its native state, the structure of a
protein is constantly fluctuating as a result of conformational changes of various
extents, occurring in different locations, some involving isomerizations retaining
compact globular structure, others involving large, rapid expansions into the solvent
followed by a collapse back into the native state.139

Because a polypeptide can fold in the first place and because it must refold in part or
in its entirety during the span of its life, the information dictating the final native
state of the protein must be contained within its amino acid sequence. Because the
standard free energy of folding of most proteins is not a large negative number
(Table 13–2), perhaps for cause, if some of the information is lost or misinformation is
added, the protein will not fold. For example, whenever the sequence of a protein is
changed by site-directed mutation, the possibility exists that the mutant will not fold,
for reasons that will never be learned. Many site-directed mutations, however, have
little effect on the ability of the protein to fold, and in a few instances, a
site-directed mutation has been found to increase the stability of a protein. For
example, when the amino acids at positions 40–49 of lysozyme from bacteriophage T4 were
all replaced with alanines,140 its standard free energy of folding increased by only
10 kJ mol⁻¹, while the appropriate replacement of five of the amino acids in type I
ribonuclease H from E. coli129 decreased its standard free energy of folding by
20 kJ mol⁻¹.

Incomplete polypeptides often lack sufficient information to fold properly. A form of the
polypeptide of bovine ribonuclease A (naa = 124) that is missing the last six amino acids
is unable to produce a folded protein with enzymatic activity, and what structure it does
have at 20 °C is eliminated by heating to only 40 °C at pH 7.5 in the absence of
denaturants.141 This truncated polypeptide is also susceptible to endopeptidolytic
degradation, unlike the intact native protein. When the last 23 amino acids, which form
only a small number of contacts with the bulk of the folded polypeptide in its
crystallographic molecular model, are removed from the polypeptide of micrococcal
nuclease (naa = 149), the polypeptide produced is a random coil by the criteria of
circular dichroism, optical rotation, and ultraviolet absorption.142 It is also readily
digested by trypsin, unlike the native enzyme. Its residual enzymatic activity of 0.1%,
which is an intrinsic property of the shortened polypeptide,143 suggests that it can
still fold properly to form an active enzyme but that the equilibrium constant for
folding is displaced heavily (KFd ≤ 10⁻³) in the direction of the random coil. When the
first 12 amino acids and the last 9 amino acids are removed from the protein, it folds
partially to form a state in which some of its normal secondary structures are formed but
in low yield.144,145

Another set of examples of the fact that a polypeptide can fold only when all the
necessary information is present is proteins that are posttranslationally modified during
their natural maturation. In many instances, the polypeptide that folds to produce the
native state is
longer than the final product because the initial folded form is clipped, and the smaller
piece or pieces resulting from the posttranslational clipping of the polypeptide
dissociate.146 For example, subtilisin E from Bacillus subtilis folds naturally when it
is a polypeptide 352 amino acids in length. After it folds, it is posttranslationally
modified. During this process, the peptide bond after Tyrosine 77 is cleaved, and the
first 77 amino acids of the polypeptide, the prosequence, are lost. If the mature,
enzymatically active form of the protein (naa = 275) is unfolded, it will not refold; but
if the full-length polypeptide (naa = 352) is unfolded in 6 M guanidinium chloride, it
readily refolds to produce the native state.147 If the prosequence (naa = 77) is included
in the solution when the mature protein is being refolded, considerable native state is
recovered.148 The yield of enzymatic activity is low but increases as the concentration
of prosequence is increased up to a molar excess of 4-fold.149 Only when the complete
amino acid sequence of the longer polypeptide is intact, however, is there sufficient
information to produce a high yield of the mature form. Once folded and
posttranslationally modified, the mature protein is stable and biologically competent, as
long as it is not unfolded. In the case of carboxypeptidase C from Saccharomyces
cerevisiae, however, the mature posttranslationally modified form of the protein
(naa = 421) is considerably more resistant to the effects of guanidinium chloride than is
its intact precursor (naa = 512), a result suggesting that the prosequence provides
information rather than standard free energy for folding.150

There are many other examples of proteins that lose portions of their polypeptide,
usually from the amino terminus, after they have folded. This is so common that the term
proprotein is used to designate the longer polypeptide that folds, with the implication
that the cleaved, mature native state is designated as the protein. Familiar examples of
this designation are proinsulin, proalbumin, and prothrombin.

Fragments of a polypeptide, each lacking sufficient information to fold separately, can
sometimes cooperate to produce the proper native state. The first example of this was the
ability of the amino-terminal fragment of ribonuclease (Lysine 1–Alanine 20), which is
almost structureless in isolation,151 to reassume its native structure as an α helix when
combined with the remainder of the polypeptide (Serine 21–Valine 124).152 Both the
fragment Alanine 1–Arginine 126 and the fragment Glycine 49–Glutamine 149 of micrococcal
nuclease (naa = 149) are structureless in isolation.142,153 When they are mixed together,
however, they combine with each other to form two different forms of the native state
that both appear to be properly folded but together have only 10% of the nuclease
activity of the native enzyme.153

Higher yields of enzymatic activity have been observed upon combination of fragments of
ribonuclease from B. amyloliquefaciens (fragments of 36 and 74 aa; yield 30%),154
fragments of phosphoribosylanthranilate isomerase from S. cerevisiae (fragments of 174
and 59 aa; yield 50%),155 fragments of penicillin amidase from E. coli (fragments of 209
and 557 aa; yield 60%),156 and fragments of porcine 3-oxoacid CoA-transferase (fragments
of 250 and 270 aa; yield 85%).157 In the case of the ribonuclease, each fragment was a
random coil in the absence of the other, but in the cases of the isomerase and the
amidase, one of the two fragments refolded on its own to form a compact structure. The
two fragments of the transferase that were chosen for expression are structural domains
in its crystallographic molecular model, and both formed compact structures in the
absence of the other, but neither formed the structure it has in the intact protein.
None of the fragments from any of these proteins had enzymatic activity on its own, and
it was only upon mixing the two respective fragments that activity was regained.

When a protein is split into two fragments and the separated, incompetent fragments are
mixed together in the hope of regenerating the native state of the protein, the situation
is complicated by the fact that the fragments must associate with each other. For
example, the complex of the fragments of ribonuclease from B. amyloliquefaciens and the
complex of the fragments of phosphoribosylanthranilate isomerase from S. cerevisiae had
dissociation constants of 0.4 mM and 0.2 mM, respectively, so the fragments had to be
present at concentrations in excess of these dissociation constants for the full yield of
the native state to be regained.154,155

One solution to this problem of the bimolecular association of the fragments is to
perform a circular permutation158 of the protein. By genetic manipulation, the coding
sequence for the protein in the DNA is severed at a particular position, and the portion
to the 5′ side of the break is moved to the 3′ end of the remainder of the coding
sequence. The 3′ end of the 3′ fragment is joined in phase to the 5′ end of the 5′
fragment with a linking sequence of DNA encoding a segment of polypeptide long enough to
connect comfortably the carboxy terminus of the original unpermuted protein to the amino
terminus of the original unpermuted protein.159 Consequently, a protein is eligible for
circular permutation only if its amino terminus and its carboxy terminus are near each
other in its crystallographic molecular model so that after the circular permutant has
been expressed and has folded properly, its former carboxy terminus and former amino
terminus can be joined by a continuous stretch of polypeptide.

Following circular permutation, there is a break in the polypeptide elsewhere in the
native structure of the protein, formally equivalent to the break that would otherwise
produce two fragments of the protein, but the polypeptide is continuous from the former
carboxy terminus to the former amino terminus. If the break is placed at a position in
its amino acid sequence known to be a disordered loop, the circularly permuted protein
will usually fold, display almost normal enzymatic activity or biological function,
assemble into the native oligomer, and have similar standard free energies of folding to
that of the wild type.159–163 In such a situation, the disordered loop is broken by the
new amino and carboxy termini, and the former carboxy and amino termini, which are
usually disordered anyway, are now joined together, to prevent the two fragments of the
original protein from dissociating when it is unfolded.

Circular permutation can be used to examine the information necessary to fold. The
position at which the amino acid sequence of the wild-type protein is broken to produce
the new amino terminus or carboxy terminus can be varied at random, and circular
permutants that are still enzymatically active can be selected genetically. In almost all
of the enzymatically active circular permutants of aspartate carbamoyltransferase from
E. coli, the new carboxy and amino termini were found in segments of the polypeptide
between α helices and β structure in the crystallographic molecular model of the
wild-type protein.164 This result seemed reasonable at the time because such elements of
secondary structure probably could not form if there were a discontinuity within them.
When a similar analysis, however, was made of random circular permutants of
thiol:disulfide interchange protein dsbA from E. coli, the majority of the new amino and
carboxy termini of the enzymatically active permutants were located at positions in the
sequence of amino acids that in the wild-type protein are α helices or β structure. Four
of the nine α helices and three of the five β strands could be interrupted, and the
resulting circular permutants folded and were enzymatically active.165

Another approach is to place systematically the new carboxy and amino termini at each
position in the amino acid sequence of the protein and measure the enzymatic activity and
standard free energy of folding for each of the resulting circular permutants. When such
an analysis166 was performed on dihydrofolate reductase from E. coli (naa = 159), a set of
10 segments varying in length from 2 to 14 aa could be identified, the interruption of
which by introducing new amino and carboxy termini at any position led to a protein
incapable of folding and enzymatically inactive. Placing the interruption at almost any
one of the 87 positions outside these 10 forbidden regions gave a circular permutant that
could fold to produce an enzymatically active protein. As with thiol:disulfide
interchange protein dsbA, many of the permissive positions were within segments that are
α helices or β strands in the crystallographic molecular model of the wild-type protein.
The forbidden regions, however, also failed to correlate with elements of secondary
structure. These results suggest that the information necessary to fold a polypeptide may
be distributed over its sequence of amino acids by rules that are not immediately
obvious.

The fact that many, if not most, of the circular permutants of a protein can fold to
produce enzymatically and biologically active proteins and even the proper oligomers
clearly states that one piece of information that has nothing to do with the folding of a
protein is the order in which its amino acids emerge from the ribosome. If there are
portions of the protein that do fold before the complete protein emerges, those portions
are not required to fold before the complete protein emerges. It has been suggested,
however, that if a protein has domains, each domain might be required to fold as it
emerged from the ribosome during biosynthesis before the next emerged. There is no
evidence in favor of this conjecture, and proteins containing two or more domains undergo
reversible folding as readily as proteins with only one domain.44,167–169

If the reaction producing the unique native state of a folded polypeptide is an
isomerization between the random coil and that native state, the individual contributions
to the overall standard free energy change for this isomerization determine its outcome.
Neither the formation of a hydrogen bond between a donor and an acceptor in the random
coil nor the formation of an ionic interaction between a positively charged side chain
and a negatively charged side chain in the random coil can provide any net favorable
standard free energy for the folding of a protein in aqueous solution. In fact, their
formation would be unfavorable. Nor can van der Waals forces make any contribution
because the isomerization occurs in a condensed phase. Therefore, by exclusion and
perhaps for the lack of a better candidate, the hydrophobic effect has attracted the most
attention in discussions of the folding of a polypeptide.170 The hydrophobic effect
provides favorable standard free energy for the formation of the native state because
hydrophobic side chains, which are exposed to water in the random coil, are removed to
the interior of the protein during the folding.171

One of the major deficits of standard free energy in the folding of a protein results
from the requirement to unsolvate those hydrophilic functional groups destined for the
interior. This loss is due to the fact that water participates in strong interactions
with donors and acceptors of hydrogen bonds and charged functional groups and to the fact
that when charged side chains are withdrawn from water they are usually neutralized
first. The removal of even neutral hydrogen-bond donors from water, even though they may
always find an acceptor in the interior of the protein, is a significantly endothermic
transfer.172 It has already been noted, however, that the formation of a hydrogen bond
between an acceptor and a donor on a side chain, in the context of a folded polypeptide,
is usually favorable with a standard free energy of formation of around –5 kJ mol⁻¹
(Table 6–6). For example, in 52 instances in which a tyrosine was mutated to a
phenylalanine, the standard free energy of folding increased by 6 ± 4 kJ mol⁻¹ when the
tyrosine was involved in a hydrogen bond in the crystallographic molecular model of the
protein but showed no change when it was not. In 40 instances in which a threonine was
mutated to a valine, the standard free energy of folding increased by 4 ± 4 kJ mol⁻¹ when
that threonine was engaged in a hydrogen bond in the crystallographic molecular model but
showed no change when it was not.173

The reason that these hydrogen bonds between side chains in the native state have the
modestly favorable free energies of formation that they do is approximation. The
hydrophobic effect drives the condensation of the random coil that unavoidably withdraws
donors and acceptors in the backbone of the polypeptide from contact with water. These
donors and acceptors then combine to form the hydrogen bonds that define the secondary
structure. These hydrogen bonds form because these donors and acceptors can no longer
participate in hydrogen bonds with water and must do so among themselves. α Helices and
β structure appear not because they are beautiful (Figures 6–6 and 6–9) but because they
are an efficient way to provide an acceptor to most if not all of the donors pulled out
of the water by the condensation driven by the hydrophobic effect. The proper packing of
the secondary structure then can juxtapose the donor on a side chain with an acceptor. It
is this approximation, brought about by the complete, cooperative process of folding,
that is the only reason the resulting hydrogen bond has a favorable standard free energy
of formation relative to the separated donor and acceptor in the random coil.

Because the realization of this favorable standard free energy of formation results from
approximation, there are significant geometric requirements for its favorability. In
addition, if too many of the hydrophobic groups on the side chains in the interior were
replaced with properly aligned donors and acceptors to exploit these favorable increments
in standard free energy of formation, the polypeptide could not fold in the first
place.3 These are among the reasons that there are few such hydrogen bonds involving a
donor on a side chain in the interiors of proteins.171 Those few hydrogen bonds between
side chains that are found are the result of evolution by natural selection so it is not
surprising that they have favorable free energies of formation.

In addition to the unfavorable standard free energy of transfer associated with the
dehydration of hydrophilic functional groups as they are pulled into the interior of the
folded polypeptide,174 the other major deficit that must be overcome during the folding
process is the configurational entropy of the random coil. This is the positive,
intrinsic standard entropy that arises from the fact that the random coil can assume a
large number of different configurations. It represents a deficit during the folding of
the polypeptide because the native state, to a first approximation, assumes only a few
conformations. Therefore, when the random coil becomes the native state its
configurational entropy almost disappears.

At first glance, it seems that the configurational entropy of the random coil, dictated
by the sum over all of its states, should be very large because each amino acid has at
least the two dihedral angles, ψ and φ (Figure 6–2), each of which can assume a number of
values as dictated by the Ramachandran plot (Figure 6–4). This initial intuition,
however, neglects excluded volume.175 Excluded volume designates the qualification that
every configuration of the random coil in which two or more atoms would occupy the same
space at the same time is impossible and thus cannot contribute to the configurational
entropy. This is the consequence of the steric effects that produce the Ramachandran plot
itself operating over the whole polymer rather than just between neighboring amino acids.
Excluded volume makes a large contribution to diminishing the configurational entropy of
the random coil. For a polypeptide 100 amino acids in length, a set of configurations
could be generated by randomly assigning values to the dihedral angles ψ and φ within
their allowed ranges. The number of these configurations that do not superpose two or
more atoms in the polypeptide has been estimated to be only 10⁻⁴⁴ of the total number of
randomly generated configurations.176

Even though a consideration of excluded volume remarkably decreases the number of
configurations available to the random coil, there are still a large number of
configurations that are accessible. Only a small number of these configurations
constitute the compact native state of the folded polypeptide. For the native state to be
stable relative to the random coil, the configurational entropy resulting from the sum
over all of the allowed unfolded configurations must be overcome by the hydrophobic
effect realized upon the formation of the native state.

The presence of a cystine in a folded polypeptide makes a contribution to the change in
configurational entropy for the isomerization between random coil and native protein. The
polypeptide must first fold before the cysteines juxtaposed by the folding can be
oxidized to cystines.177 Because the folded native state is a prerequisite for the
formation of a proper cystine and because the formation of a naturally occurring cystine
usually has little effect on the structure or conformational freedom of the native
protein,177–179 it necessarily follows that the cystine itself cannot significantly
change the intrinsic configurational entropy of the properly folded protein and can
change its intrinsic standard enthalpy only by the standard enthalpy of formation of the
cystine. It has been demonstrated, however, that the standard enthalpy of formation for
cystine in a random coil is about the same as that estimated for the standard enthalpy of
formation of the same cystine in the native state.180 Consequently, the incorporation of
a cystine cannot affect the change in standard enthalpy of folding either. Rather, a
cystine between two cysteines that are adjacent in the native structure increases the
value of the equilibrium constant of the folding and decreases the change in
standard free energy of folding of a protein because it decreases the configurational entropy of the random coil.

The decrease in standard free energy of folding can be demonstrated experimentally by introducing a specific cross-link between two adjacent amino acids into the native structure of a protein and determining its effect on its folding. Glutamate 35 and the enol tautomer of the oxindole produced by the oxidation (Reaction 10–37) of Tryptophan 108 in lysozyme from G. gallus form an ester:

[Structure 13–3: ester linking the side chains at positions 35 and 108]

This ester introduces an intramolecular cross-link between these two amino acids¹⁸¹ that are adjacent to each other in the crystallographic molecular model of the protein. The cross-linked lysozyme has a standard free energy of folding at pH 2 in 2 M guanidinium chloride at 62 °C that is 22 kJ mol⁻¹ less than that of un-cross-linked lysozyme.¹⁸⁰ Lysine 7 and Lysine 41 in bovine ribonuclease A can be cross-linked specifically by 2-(p-nitrophenyl)-3-(3-carboxy-4-nitrophenyl)thio-1-propene (Figure 10–8).¹⁸² The difference in the standard free energy change of folding between the cross-linked and un-cross-linked ribonuclease at pH 2.0 and 40 °C¹⁸³ is –21 kJ mol⁻¹.

Many studies have incorporated single cystines by site-directed mutation between positions in the amino acid sequence of a protein that are adjacent to each other in its crystallographic molecular model. For example, cystines have been introduced into lysozyme from bacteriophage T4 (naa = 164). This protein has no cystines to begin with, so each mutant contained only one cross-link in its polypeptide.¹⁸⁴,¹⁸⁵ In one study, four different mutants were made containing cystines cross-linking positions 29, 34, 121, and 155 aa apart. In another, the same two site-directed mutants containing cystines cross-linking positions 121 and 155 aa apart were used and a third, cross-linking positions 94 aa apart, was made; and each of these mutants was submitted to the same circular permutation to produce permutants of T4 lysozyme with cystines cross-linking positions 49, 15, and 76 aa apart, respectively. Together, these manipulations gave eight mutants of the same protein, each with a cross-link producing a covalent loop in the denatured polypeptide of a different length. For each mutant, the difference in the standard free energy of folding (ΔΔG°Fd,SS) between the protein with the cystine intact and the protein with the cystine cleaved by disulfide interchange (Figure 3–20) was estimated from its melting temperature. The differences in standard free energies of folding varied from –3 to –14 kJ mol⁻¹, and the difference increased monotonically as the distance between the cysteines, and hence the length of the loop, increased.

A theoretical treatment of the expected decrease in configurational entropy caused by cross-linking a random coil, which accounts for excluded volume, predicts that the configurational entropy should decrease linearly with the natural logarithm of the distance between the cross-linked positions with a slope of 2.4.¹⁸⁶ When the experimental values of (ΔΔG°Fd,SS)/T are plotted against the natural logarithm of the distance, in number of amino acids, between the cystines in each of the mutants of T4 lysozyme, the expected relationship is observed. Furthermore, the difference in standard free energy of folding between native α-lactalbumin and α-lactalbumin in which the cystine between Cysteine 6 and Cysteine 120 has been reduced falls on the same line.¹⁸⁷

The effects of introducing cystines into other proteins, however, differ significantly from those measured in these observations. In some instances the magnitude of the difference in standard free energy of folding is much less than expected;¹⁸⁸ in others, much more.¹⁸⁹ The magnitudes of the differences in standard free energy of folding for lysozyme from G. gallus cross-linked through the oxindole ester (13–3) and bovine ribonuclease A cross-linked by 2-(p-nitrophenyl)-3-(3-carboxy-4-nitrophenyl)thio-1-propene are also greater than those observed with cystines introduced into lysozyme from bacteriophage T4.

The equilibrium constant for the folding of the constant fragment of the light chain of an immunoglobulin G, CL (Figure 11–1), at pH 7.5 and 25 °C in solutions of guanidinium chloride is decreased when its single cystine is reduced. All of the change could be accounted for by the fact that the observed rate constant for folding (kF in Equation 13–1) of the random coil with the cystine was 100-fold greater than the observed rate constant for the random coil without the cystine.¹⁹⁰ This is consistent with the conclusion that the intact, correct cystine decreases the configurational entropy of only the random coil, while retaining access to the properly folded structure, and permits the random coil to fold more rapidly. Whether or not the proper cystine was present had no effect on the observed rate constant of unfolding (kU in Equation 13–1).

It has been shown that if the favorable noncovalent standard free energies of association between a subpopulation of the monomers along a polymer are significantly more negative than their individual standard free energies of solvation, the polymer should spontaneously collapse to a globular form.¹⁹¹ Because the constraints of excluded volume are even more extreme in this compact, globular form, the number of accessible configurations and hence its configurational entropy should be much smaller. Because the hydrophobic effect is the only interaction capable of producing significant net favorable standard free energies of association among the side chains of the amino acids in a random coil, it is generally assumed that the noncovalent force that would perform the condensation leading to a globular state of a polypeptide is the hydrophobic effect exerted upon the hydrogen–carbon bonds in those side chains in the polypeptide. This view of the folding of a polypeptide could be called the condensation model. Its central proposal is that the collapse of the random coil to a condensed state decreases the configurational entropy of the polypeptide dramatically and narrows the search for the native state to a much smaller number of accessible conformations.

On the basis of this model, folding can be treated theoretically as a process in which the unfavorable loss of the configurational entropy of the random coil is balanced only by the favorable removal of hydrophobic side chains from contact with the aqueous phase.¹⁷⁶,¹⁹² The statistical treatment of the random coil developed by Flory,¹⁹³–¹⁹⁵ which takes account of excluded volume and the solvation of the monomeric units, can be expanded¹⁷⁶ to include the hydrophobic effect exerted during the sequestration of the monomers in the condensed state¹⁹⁵ and the fortuitous sequestration of the monomers in the random coil,¹⁷⁶ as well as the much smaller, but still significant, configurational entropy of the condensed polypeptide before it assumes the native state.

The process of folding is divided into two imaginary steps,¹⁷⁶ not necessarily related to the actual steps. These imaginary steps are the condensation of the random coil to a globular structure excluding water and the reconfiguration of the polymer in this condensed state to maximize the exposure of hydrophilic groups and minimize the exposure of the hydrophobic groups to the water. It is during this reconfiguration following the condensation that the donors and acceptors for hydrogen bonds that have been withdrawn away from the acceptors and donors of the water form hydrogen bonds among themselves to produce the α helices and β structure observed in the final native state of the protein. Before the condensation, water formed hydrogen bonds with those donors and acceptors.

With reasonable values both for the hydrophobic effect on the average hydrophobic amino acid (–8 kJ mol⁻¹) and for the fraction of the amino acids in the polypeptide that are hydrophobic (0.50), the formation of a unique globular state from a random coil should proceed with net negative standard free energy change for polypeptides greater than about 70 amino acids in length.¹⁹² Polypeptides less than about 70 amino acids in length should not fold because they should not be able to bury a large enough number of hydrophobic amino acids to overcome the configurational entropy of their random coils. It is the case that small, folded, cystineless, monomeric proteins of less than 70 amino acids are quite rare. Proteins composed of polypeptides shorter than 70 amino acids usually contain several cystines, are oligomeric,¹⁹⁶,¹⁹⁷ or have a relatively large hydrophobic core.¹⁹⁸ There are, however, a few small domains, for example the WW domains (35 aa)¹⁹⁹ or the peripheral subunit-binding domain from dihydrolipoyllysine-residue acetyltransferase (41 aa),²⁰⁰ that fold to form stable monomeric, well-defined native structures lacking cystines.

Although it was only for the sake of the computations that the folding of the polypeptide described in this condensation model was divided into the two steps of condensation and reconfiguration, there are stable, condensed but fluidly unstructured states of a polypeptide that seem to have the properties required of a condensed state on its way to the native state. These are the molten globules. A molten globule is a state of a polypeptide in which it has collapsed to a globular particle from the expanded random coil but remains fluid with a constantly changing conformation rather than achieving the limited set of conformations that is the native state. In such a fluid condensed state, the configurational entropy of the polypeptide should be significantly reduced relative to that of the random coil, and only a much smaller number of conformations that avoid the problems of excluded volume should be accessible.¹⁹¹ Many of these conformations should display α helices and β structure that form spontaneously.²⁰¹

Under conditions that differ significantly from those in the living system in which a particular polypeptide has evolved to fold, its native state may no longer be the most stable of the condensed conformations accessible to that polypeptide, and a number of other condensed, structured conformations may be as stable. Peculiar conditions such as low pH or the presence of denaturants, however, are necessary to prevent the polypeptide from assuming its native state, as it would do normally. It is argued that the intermediates detected in the folding of proteins under several such circumstances are examples of molten globular states and that all of these various intermediates represent a single configurational state assumed by a polypeptide that is at least as distinct as that of the random coil. This may be an overstatement. For example, two different molten globular states of apomyoglobin have been distinguished,²⁰² and there are intermediate states that do not have the properties assigned to a molten globule.²⁰³

Stable intermediates believed to be molten globules have been detected under many different circumstances. They have been observed for α-lactalbumin²⁰⁴ below pH 4.5 at concentrations of guanidinium chloride below 2.5 M; for α-lactalbumin,²⁰⁵ stripped of bound Ca²⁺, at pH 8 and guanidinium chloride concentrations between 0.5 and 2.0 M; for cytochrome c²⁰⁶,²⁰⁷ below pH 3 either at chloride concentrations greater than 0.1 M or at concentrations of O-α-D-glucopyranosyl(1–3)-β-D-fructofuranosyl-α-D-glucopyranoside greater than 0.5 M; and for carbonate dehydratase⁶⁴ at temperatures
below 60 °C and values of pH less than 3.5. The mutation of Phenylalanine 173 to an alanine in murine interleukin 6 converts the protein into a molten globule.²⁰⁸ These are all unphysiological conditions, but proteins have evolved to be entirely in their native states under physiological conditions. Consequently, it would not be surprising that, to isolate intermediates in the normal process of folding, such peculiar conditions would be required. Molten globules often become stable relative to the native state at low pH. Presumably, their fluidity permits carboxy groups that are rigidly buried in the native state to reach the surface of the globule and be exposed to the solvent, thereby lowering the standard free energy of the molten globule relative to that of the native structure (Equation 13–3).

Various physical measurements have been made of these stable intermediates identified as molten globules. The circular dichroic absorptions of the native protein between 260 and 290 nm are largely lost in the molten globule, and this loss must result from the disappearance of the unique asymmetric environments around tryptophans, tyrosines, and phenylalanines.⁶⁶,²⁰⁹ The complex nuclear magnetic resonance spectrum of the native state becomes much simpler and much more like that of the random coil upon formation of these molten globules,²¹⁰,²¹¹ as would be expected if the unique environments around each amino acid had been lost and each side chain now sampled continuously a broad range of changing environments. When the internal dynamics of one of these molten globules are examined by quasielastic neutron scattering,²¹² it is observed that the potential barriers to bond rotations in the side chains are lower than those in the native state, while diffusive motions of side chains are greater, and significantly smaller units of structure diffuse cooperatively than those diffusing in the native state. Measurements of the absorption of ultrasound also indicate that such molten globules are more fluid than the native state,²¹³ and conformational relaxations in the interior that occur in the 2 MHz range are significantly enhanced.

The majority of the circular dichroic absorption between 200 and 240 nm seen in the native states is retained in the respective molten globules, and this suggests that they contain α helices and β structure.⁶⁶,²⁰⁹ The slow proton exchange observed by nuclear magnetic resonance for buried peptide bonds in the native state increases by factors of 1000–100,000 upon the transition to one of these molten globules, even though the α amido protons of many of the same amino acids remain relatively less accessible.⁶⁵,²¹⁴ These observations suggest that some of the same elements of secondary structure but not all of them²¹⁵ remain at the same locations in the amino acid sequence of the polypeptide but open up 1000–100,000-fold more often. These accelerated rates of exchange are increased much more by adding denaturants¹³⁸ so the conformational changes within the molten globule leading to exchange of amido protons do involve some expansion of the structure into the solvent, but the effect, and hence the expansion, is much less than when the native state unfolds to the random coil. Three of the eight α helices of the native state of apomyoglobin²¹⁶ are present in its molten globule,⁶⁵ but site-directed mutations at the interfaces in the native state between these α helices have little effect on the stability of the molten globule.²¹⁷ It was concluded that although these α helices had formed, they were not packed against each other in any stable arrangement.

The accessibility of tryptophans to the solvent, as judged by quenching of fluorescence (Equation 12–41), differs significantly in these molten globules, and tryptophans that are relatively more exposed in the native state become less exposed.²¹⁸ The fluorescence intensities of the tryptophans in cytochrome c, which are quenched by the nearby heme in the native state but are fully expressed in the random coil, remain quenched in its molten globule.²¹⁰ The intrinsic viscosity, rotational relaxation times, and diffusion coefficients of the molten globules are indistinguishable from those of the corresponding native states but are different from those of the random coil.²⁰⁹,²¹⁰ All of these observations demonstrate that they are condensed, globular structures like the native state.

It is thought that these intermediates identified as molten globules represent the random coil that has collapsed to a globular state because of the hydrophobic effect, even though it is fluid and cannot assume the unique set of conformations that is the native state. In this regard, it is interesting that the majority (85%) of the change in standard heat capacity between the random coil and the native state of α-lactalbumin, which is a signature of the hydrophobic effect, is experienced in the transition between the random coil and the intermediate that has been characterized as a molten globule.²⁰⁵ The equilibrium constant for the formation from the random coil of an intermediate thought to be a molten globular state of apomyoglobin displays a significant temperature dependence, passing through a maximum between 0 and 20 °C. This observation is also consistent with a process accompanied by a large change in standard heat capacity,²⁰² but in this case, only about 50% of the overall change in standard heat capacity is realized in the transition from random coil to molten globule.

If these intermediate, molten globular states resemble intermediates on the normal kinetic pathway between the random coil and the native state, then the condensation model for the folding of a polypeptide may be an accurate rendition of the process. In this description of folding, the random coil spontaneously collapses under the influence of the hydrophobic force to form a condensed state that would be a molten globule. This molten globule would fluidly sample the limited number of conformations available to the condensed polymer until the native state, the set of conformations of lowest standard free energy, was encountered.

The alternative to the condensation model for the
folding of a polypeptide could be referred to as the nucleation model. In this view of the process, a short segment of the polypeptide or several short segments would spontaneously assume a metastable conformation similar to the conformation of that short segment or those short segments in the complete native state. This nucleus for folding would resemble the conformation of the native state in both its secondary and tertiary interactions in this restricted region, and it would represent the most independently stable region of the native state. From this nucleus, folding would rapidly spread to produce the entire native structure. Evidence for this proposal comes from the study of short segments of polypeptide that can assume structured states other than the random coil and from stable expanded states of some polypeptides.

Although almost all short segments of polypeptide have proven to be structureless, a few have been found that assume a structured state. For example, two peptides from bovine pancreatic trypsin inhibitor (naa = 58), Arginine 20–Phenylalanine 33 and Asparagine 43–Alanine 58, were chemically synthesized and joined by forming the cystine between Cysteine 30 and Cysteine 51 that occurs naturally in the native protein. This covalent complex, containing only half of the covalent structure of the full-length protein, nevertheless formed a structure²¹⁹ that had some of the structural features assumed by this region in the crystallographic molecular model of the protein. The antiparallel β sheet could be discerned in the nuclear magnetic resonance spectrum but not the α helix. The short, stable α-helical segments of polypeptide discussed earlier have also been proposed as models of nucleation points in protein folding.²²⁰

There are also stable conformations of a few polypeptides, observed under circumstances promoting denaturation, in which condensation has not occurred but elements of structure resembling those in the native state have formed. For example, there is an expanded

that assumes at equilibrium a structure containing the α helices that are present in its portion of the native structure of the protein has been proposed to represent a point of nucleation for the native state even though it is by itself a molten globule.²²³ These observations suggest that protein folding involves both condensation and nucleation, but not necessarily in that order. The question of the sequence of events in the folding of a polypeptide requires kinetic observations of the process.

Suggested Reading

Salahuddin, A., & Tanford, C. (1970) Thermodynamics of the denaturation of ribonuclease by guanidine hydrochloride, Biochemistry 9, 1342–1347.
Taniuchi, H., & Anfinsen, C.B. (1971) Simultaneous formation of two alternative enzymatically active structures by complementation of two overlapping fragments of staphylococcal nuclease, J. Biol. Chem. 246, 2291–2301.
Perl, D., Welker, C., Schindler, T., Schroder, K., Marahiel, M.A., Jaenicke, R., & Schmid, F.X. (1998) Conservation of rapid two-state folding in mesophilic, thermophilic and hyperthermophilic cold shock proteins, Nat. Struct. Biol. 5, 229–235.

Problem 13–1: As urea is added to a solution containing a protein in its native state, the protein usually begins to unfold when the concentration of urea rises above 4–5 M. This unfolding is due to the ability of urea to stabilize the unfolded state.

Consider the side chain of an amino acid that is located in the interior of a protein and cannot see the solvent when the protein is folded. From the point of view of this interior side chain, the following series of equilibria govern the unfolding process:

[Diagram of the equilibria; surviving labels: random coil in water; ΔG°transfer,interior→H2O]
(A) How are these three values of ΔG° related? What sign must each carry to explain the unfolding caused by urea?

The following is a table⁷ of the solubilities of a series of amino acids in solutions of several concentrations of urea at 25 °C.

              solubilities [g (100 g of solvent)⁻¹] at noted concentration of urea
amino acid    0 M    2 M    4 M    6 M    8 M

(B) Calculate ΔG°transfer,H2O→[urea] for each model compound in the units of joules mole⁻¹. Subtract ΔG°transfer,H2O→[urea] for glycine to estimate ΔG°transfer,H2O→[urea] for each side chain.⁷,⁸

These values you have just calculated are tabulated below.

              ΔG°transfer,H2O→denaturant (cal mol⁻¹)
              urea                           GdmCl
side chain    2 M    4 M    6 M    8 M      1 M    2 M    4 M     6 M
Ala            –0    +15    +10    +10      –10    –20    –30     –45
Valᵃ          –60    –85   –125   –160      –85   –115   –195    –265
Leu          –110   –155   –225   –295     –150   –210   –355    –480
Ileᵃ         –100   –140   –205   –265     –135   –190   –320    –430
Met          –115   –225   –325   –415     –150   –245   –400    –535
Phe          –180   –330   –470   –600     –215   –355   –580    –775
Tyr          –225   –395   –580   –735     –235   –385   –605    –770
Trp          –270   –505   –730   –920     –400   –630   –980  –1,235
Proᵃ          –75   –105   –155   –200     –100   –140   –240    –320
Thr           –40    –60    –90   –115      –65    –90   –120    –125
His          –100   –160   –205   –255     –180   –285   –385    –420
Asn          –135   –225   –330   –430     –200   –320   –490    –645
Gln           –80   –130   –190   –230     –135   –215   –315    –360

ᵃ The values for these side chains are estimates based on results for the other side chains and on results at a single concentration of denaturant.

(C) Plot ΔG°transfer,H2O→[urea] against [urea] for each of these side chains. Determine the slopes of these lines that give values for (∂ΔG°/∂[urea]) in joules (mole of side chain)⁻¹ [liter (mole of urea)⁻¹].

(D) How do these numbers correlate with your expectations in part A? Explain why the protein unfolds when [urea] rises above a certain critical level.

(E) Plot (∂ΔG°transfer,H2O→[urea]/∂[urea]) against the number of hydrogen–carbon bonds in each side chain. Is the major effect of urea to counteract the hydrophobic effect? Why?

The accessible surface area of each of these side chains has been calculated by a computer from molecular models.

model    surface area of side chain (nm²)

(F) Plot (∂ΔG°transfer,H2O→[urea]/∂[urea]) against accessible surface area, labeling each point on your curve to keep track of the side chain it represents. What is the value of ∂(∂ΔG°transfer,H2O→[urea]/∂[urea])/∂(surface area)?

Measurements of a physical property displayed by a protein can be used to obtain a value of the equilibrium constant for the transformation between the native state and the random coil at different concentrations of a denaturant. From each of these equilibrium constants, the standard free energy of folding, ΔG°Fd,[denaturant], for the reaction at that concentration of denaturant can be calculated. The figure on the next page is an example of the relationship between ΔG°Fd,[denaturant] and the concentration of denaturant for the unfolding of lysozyme promoted by urea and guanidinium chloride.

The slopes of these two lines (∂ΔG°/∂[denaturant]) are relative measures of the effectiveness of the two denaturants. By fitting a straight line to these data it is possible to obtain, by extrapolation, the standard free energy of folding in the absence of denaturant, ΔG°Fd,H2O. The following table gathers the results from four separate proteins, where [GdmCl]1/2 or [urea]1/2 is the concentration of denaturant when [F] = [U] and ΔG°Fd = 0.

                         guanidinium chloride          urea
protein                  [GdmCl]1/2   ΔG°Fd,H2O        [urea]1/2   ΔG°Fd,H2O
                         (M)          (kJ mol⁻¹)       (M)         (kJ mol⁻¹)
bovine ribonuclease A    3.01         –39              6.96        –32
lysozyme G. gallus       3.07         –24              5.21        –24
bovine chymotrypsin      1.90         –32              4.04        –35
ovine β-lactoglobulin    3.23         –52              5.01        –44

(G) From an examination of the figure and an understanding of where the two points tabulated fall
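
The linear relationship assumed in Problem 13–1 can be sketched numerically. If the standard free energy of folding depends linearly on the concentration of denaturant and equals zero at the midpoint concentration, the slope of the line follows from the two tabulated quantities alone. The short Python sketch below uses the values from the table of four proteins. The function names, the layout of the dictionary, and the assumption that the line remains straight over the whole range of concentration are illustrative choices and are not part of the original problem.

```python
# Midpoint concentration (M) and standard free energy of folding in water
# (kJ mol^-1), copied from the table of four proteins in Problem 13-1.
TABLE = {
    "bovine ribonuclease A":    {"GdmCl": (3.01, -39.0), "urea": (6.96, -32.0)},
    "lysozyme G. gallus":       {"GdmCl": (3.07, -24.0), "urea": (5.21, -24.0)},
    "bovine chymotrypsin":      {"GdmCl": (1.90, -32.0), "urea": (4.04, -35.0)},
    "ovine beta-lactoglobulin": {"GdmCl": (3.23, -52.0), "urea": (5.01, -44.0)},
}


def slope_from_midpoint(c_half, dg_water):
    """Slope (kJ mol^-1 M^-1) of the assumed line dG = dg_water + m * c.

    The line is fixed by two points: dG = dg_water at c = 0 and
    dG = 0 at c = c_half, so m = -dg_water / c_half.
    """
    return -dg_water / c_half


def dg_folding(c, c_half, dg_water):
    """Standard free energy of folding (kJ mol^-1) at concentration c (M),
    under the same assumption of a straight line."""
    return dg_water + slope_from_midpoint(c_half, dg_water) * c


# Print the slope implied by each pair of tabulated values.
for protein, entries in TABLE.items():
    for denaturant, (c_half, dg_water) in entries.items():
        m = slope_from_midpoint(c_half, dg_water)
        print(f"{protein:26s} {denaturant:6s} slope = {m:5.2f} kJ mol^-1 M^-1")
```

For each of the four proteins the slope obtained in this way is steeper for guanidinium chloride than for urea, in keeping with the statement that the slopes are relative measures of the effectiveness of the two denaturants.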
In the native protein (0.0 M guanidinium chloride), four absorptions from the protons on carbons 2 are observed. They have been assigned to Histidines 48, 119, 12, and 105, the four histidines of ribonuclease.

(A) Why does each absorption have a unique position in the spectrum of native ribonuclease?

(B) Why is there only one absorption from the protons on the carbons 2, which integrates as four protons from the protein, when it is dissolved in 3.0 M guanidinium chloride?

(C) Between 0.0 and 1.7 M guanidinium chloride the resonances shift around, but above 1.7 M the four absorptions coalesce into the one absorption. What process is the spectrometer monitoring between 1.7 and 3.0 M guanidinium chloride?

(D) What would be the position of the absorption from the protons on the carbons 2 of Nα-acetylhistidine ethyl ester in 3.0 M guanidinium chloride?

Problem 13–4: Below are listed several thermodynamic parameters that are involved in the process of protein folding.

(a) Change in standard free energy for the hydrophobic effect
(b) Standard free energy of formation for hydrogen bonds
(c) Change in standard electrostatic free energy
(d) Configurational entropy of the random coil
(e) Configurational entropy of the native state

(A) Which one is most affected by the steric constraints described in the Ramachandran plot?

Suppose proteins were held together by imine linkages rather than peptide bonds:

[Structure: imine linkage between two α carbons, with the angles φ and ψ indicated]

(B) What effect would this have on the parameter you have chosen above?

(C) How would the value of the standard free energy of folding ΔG°Fd be affected by this change?

Kinetics of Folding

The most straightforward way to initiate the folding of the random coil of a polypeptide that has been unfolded to a random coil in a concentrated solution of guanidinium chloride or urea is to dilute that solution. The dilution is performed so that the final concentration of denaturant is well below the region of transition so that the equilibrium between the folded state and the unfolded state is shifted from one heavily in favor of the random coil to one heavily in favor of the folded state (Figure 13–1).*

Often the folding of the polypeptide is complete within a few seconds so the dilution must be performed rapidly. Usually, the solution of the random coil at a high concentration of denaturant is mixed with around 10 volumes of aqueous buffer of the appropriate ionic strength and pH in a rapid mixing chamber. The chamber is designed to mix the two solutions completely in less than a millisecond as they are forced through it at high velocity under considerable pressure. There are several different ways in which the solution emerging from the mixing chamber can then be monitored. Usually, a cuvette† is attached to the mixing chamber, and the mixture from the chamber is passed through the cuvette at the high velocity developed in the mixing chamber until it fills the cuvette uniformly. Once a steady state is reached, the flow is abruptly stopped, and changes that occur in the solution in the cuvette after cessation of the flow are monitored. The mean time the solution in the cuvette has spent between being mixed and the cessation of flow coincident with the initiation of the monitoring is the dead time of the apparatus. No measurements can be made of events that occur during the dead time. In most cases, the dead time of such a stopped-flow apparatus is 1–50 ms. Changes in the absorbance, molar ellipticity, or fluorescence of the solution can be monitored continuously from the dead time onward.

Often, during the folding of a protein, significant changes in absorbance, fluorescence, or molar ellipticity or two or three of these properties occur within the dead time of the apparatus. Type I ribonuclease H from E. coli displays such behavior (Figure 13–9).²²⁵,²²⁶ When its molar ellipticity at either 220 nm (Figure 13–9A) or 292 nm (Figure 13–9B) is monitored, the signal observed after flow has stopped decays to the value for the native state in a single, apparently first-order relaxation (Equation 13–13) with a rate constant of 0.6 s⁻¹. When these time courses are extrapolated through the dead time back to the instant of mixing, however, it can be seen that 83% of the change in molar ellipticity at 220 nm and 44% of the change in molar ellipticity at 292 nm did not occur during this apparently homogeneous transformation but in one or more kinetic steps that were much more rapid than the final isomerization. These steps occurred within the dead time of the apparatus, and, consequently, they could not be resolved.

A reaction that is unresolved in a stopped-flow experiment because it is complete during the dead time

* It is also possible to begin the experiment with a solution of the complex between the polypeptide and dodecyl sulfate and then rapidly strip the dodecyl sulfate from the protein.²²⁴ The difficulty with this approach is that the polypeptide in the complex with dodecyl sulfate is completely α-helical rather than a random coil.
† A cuvette is a chamber with transparent walls through which spectrophotometric measurements can be made.
Figure 13–9: […] ribonuclease H from E. coli.225 Type I ribonuclease H (naa = 155) was dissolved in 3.3 M guanidinium chloride and 10 mM sodium acetate, pH 5.5 at 25 °C. After it had unfolded completely, it was mixed in a stopped-flow apparatus with 10 volumes of 0.65 M guanidinium chloride and 10 mM sodium acetate, pH 5.5 (final concentration = 0.9 M guanidinium chloride), and the effluent from the mixing chamber was monitored following cessation of flow (dead time = 50 ms). (A) Molar ellipticity at a wavelength of 220 nm ([θ]220) in units of degrees centimeter² (decimole of peptide bonds)–1 as a function of time (seconds). The molar ellipticity of unfolded ribonuclease H at 220 nm in 0.9 M guanidinium chloride, as determined by extrapolation at equilibrium as in Figures 13–1A and 13–7A, should be 0. (B) Molar ellipticity at a wavelength of 292 nm ([θ]292) in units of degrees centimeter² (decimole of peptide bonds)–1 as a function of time (seconds). The molar ellipticity of unfolded ribonuclease H at 292 nm in 0.9 M guanidinium chloride, again as determined by extrapolation at equilibrium, should be 20 (dashed line). (C) Fluorescence of tryptophans in the protein monitored at emission wavelengths of greater than 300 nm with excitation at 280 nm. The scale for fluorescence was not quantified, so the relative fluorescence of the unfolded protein in 0.9 M guanidinium chloride was not reported. The three sets of data were fit with first-order relaxations (smooth curves) with rate constants of (A) 0.59 s–1, (B) 0.74 s–1, and (C) 0.51 s–1 and 1.95 s–1. Reprinted with permission from ref 225. Copyright 1995 American Chemical Society.
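The extrapolation through the dead time that reveals a kinetic burst can be illustrated numerically. The following is a minimal sketch, not the procedure used in the cited work: it fits a single first-order relaxation to a synthetic time course that begins at the dead time, extrapolates the fit back to the instant of mixing, and reports the fraction of the total change that is absent from the observed phase. The native-state signal of –15,000 and the name burst_fraction are assumptions made for the example; the rate constant of 0.6 s–1, the dead time of 50 ms, the unfolded baseline of 0, and the burst of 83% are the values quoted for the molar ellipticity at 220 nm.

```python
import math

def burst_fraction(times, signal, s_unfolded, s_native):
    """Fit signal(t) = s_native + A*exp(-k*t) and extrapolate to t = 0.

    Returns (k_obs, fraction), where fraction is the part of the total
    change from s_unfolded to s_native that is already complete at the
    instant of mixing according to the extrapolated fit.
    """
    # Linearize: ln|signal - s_native| = ln|A| - k*t, then least squares.
    ys = [math.log(abs(s - s_native)) for s in signal]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    k_obs = -slope
    # Restore the sign of the amplitude A from the first data point.
    sign = 1.0 if signal[0] > s_native else -1.0
    s_at_mixing = s_native + sign * math.exp(intercept)
    fraction = (s_at_mixing - s_unfolded) / (s_native - s_unfolded)
    return k_obs, fraction

# Synthetic, noise-free time course (hypothetical native value).
s_u, s_n = 0.0, -15000.0          # unfolded baseline; assumed native signal
k_true, burst_true = 0.6, 0.83    # values quoted in the text for 220 nm
dead_time = 0.05                  # seconds
times = [dead_time + 0.05 * i for i in range(200)]
s_zero = s_u + burst_true * (s_n - s_u)
signal = [s_n + (s_zero - s_n) * math.exp(-k_true * t) for t in times]

k_obs, frac = burst_fraction(times, signal, s_u, s_n)
print(f"k_obs = {k_obs:.3f} s^-1, burst fraction = {frac:.2f}")
```

With real data the fit would be made by nonlinear least squares, because the logarithmic linearization used here amplifies noise as the signal approaches its final value.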
is a kinetic burst. The observation of a kinetic burst is interpreted to mean that one or more transformations of the random coil have occurred during the dead time and that they have produced an intermediate state. This intermediate state then turns into the native state as the reaction is monitored over time. In the case of type I ribonuclease H, this intermediate state becomes the native state in a reaction that appears by measurements of circular dichroism to display simple first-order kinetics with a rate constant of about 0.6 s–1. This latter transformation is also revealed in the change in fluorescence of the solution (Figure 13–9C).
The kinetics of the folding of apomyoglobin from Physeter catodon,227 of micrococcal nuclease from Staphylococcus aureus,228,229 of equine cytochrome c,230 of dihydrofolate reductase from E. coli,231 of equine β-lactoglobulin,232 and of equine lysozyme233,234 all display similar kinetic bursts producing intermediate states that then apparently decay through one or several first-order steps to their native states. The appearance of these intermediates during a kinetic burst and their decay during the period of measurement can be detected by their absorbance, by their molar ellipticities in the range of 220–230 nm (the far ultraviolet), by their molar ellipticities in the range of 270–290 nm (the near ultraviolet), by their fluorescence, or by the transfer of energy between donors and acceptors placed at particular positions in their amino acid sequences. They are observed upon rapid dilution to 0.4–0.8 M urea or to 0.3–0.9 M guanidinium chloride. They are formed usually within less than 10 ms at temperatures between 10 and 25 °C, and they then decay at various rates.
A certain fraction of the changes in molar ellipticity or fluorescence that occurs in each of these kinetic bursts results from the instantaneous changes in the molar ellipticity or fluorescence that occur in the random coil upon the abrupt decrease in the concentration of denaturant and that may or may not be able to be corrected for by linear extrapolation of the values at equilibrium for these physical properties from beyond the region of transition, as was done in Figure 13–1A.235 Their magnitude, however, is sufficiently large in most cases that they must reflect significant conformational changes in the unfolded state that produce real intermediates in the process of folding.
Within the range of denaturant concentration in the region of transition where the respective equilibrium constants for folding have been shifted into measurable ranges, most of these proteins display two-state behavior in the isomerization of their folding without evidence for intermediates. Why do intermediates in folding appear at lower concentrations of denaturant? The answer lies in the behavior of the observed rate constant* of folding, kF, as a function of the concentration of denaturant.
* The progress of the folding of a protein monitored spectrophotometrically can usually be fit by a rate equation for one or more sequential first-order steps. Even though this fit is probably an oversimplification of the actual events, the observed apparently uncomplicated first-order rate constants obtained by such numerical analysis will be referred to as observed rate constants.
When the logarithm of the observed rate constant for the approach to equilibrium for the folding of type I ribonuclease H from E. coli is plotted as a function of the concentration of guanidinium chloride (Figure 13–10),225,236 two-state behavior is observed in the region of transition, where the observed rate constant for unfolding dominates at high concentrations of denaturant and the observed rate constant for folding dominates at low concentrations. The logarithm of the observed rate constant for folding, however, does not display a continuous linear decrease below the region of transition as is observed in Figure 13–1B. Instead, its behavior is resolved further into two components, one dominant at intermediate concentrations of denaturant and the other at low concentrations. These two steps are distinguished by the two different slopes of the two distinct linear segments below the region of transition in Figure 13–10.* This behavior is characteristic of a change in rate-limiting step† and consequently is consistent with a kinetic mechanism for folding in which there are two or more steps and one or more intermediates.238 At concentrations of guanidinium chloride between 1.5 and 1 M, the folding of type I ribonuclease H (Figure 13–10) has an observed rate constant that is strongly dependent on the concentration of guanidinium chloride. At concentrations of guanidinium chloride between 0.2 and 0.6 M, however, the folding of the protein has an observed rate constant that is only weakly dependent on the concentration of guanidinium chloride. At concentrations of guanidinium chloride between 0.6 and 1.0 M, the change in rate-limiting step occurs from a step strongly dependent on concentration of denaturant to a later step in the process of folding that is weakly dependent.
The two observed rate constants for the two respective steps between which the rate limitation shifts are the rate constant for the production of the intermediate present following the kinetic burst and the rate constant for its decay to the native state, respectively. The observed rate constant for the formation of this intermediate, which is defined by the linear segment of greater slope, is strongly dependent on the concentration of denaturant. Because of this strong dependence, above a
Bacillus stearothermophilus,12 cytochrome c2 from Rhodobacter capsulatus,241 and human lysozyme.242
In addition to explaining the appearance of these kinetic intermediates as the concentrations of denaturant are decreased and providing further evidence for the existence of one or more intermediates in the folding of each of these polypeptides in the absence of denaturant, these observations of a change in the rate-limiting step provide a clue about the structures of the intermediates formed during a kinetic burst. Because the observed rate constants for their formation decrease significantly as the concentration of denaturant is increased while the observed rate constants for their conversion to the respective native states decrease much less significantly, each of these intermediates must be a more compact, condensed state of the polypeptide than the random coil. This follows from the proposal that the slope m of the line relating the logarithm of an observed rate constant for the folding of a polypeptide to the concentration of guanidinium chloride or urea (for example, the slopes of the linear segments in Figures 13–1B and 13–10) is a measure of the change in exposure of that polypeptide to the solvent between its initial state and the transition state of either the rate-limiting step in the transformation being monitored104,243 or of one or more of the rate-determining steps* that together establish the value of the composite rate constant244 for that transformation or the change in exposure of that polypeptide experienced during an unfavorable preequilibrium that precedes the rate-limiting step for that transformation. Consequently, either during the rate-limiting step or prior to the rate-limiting step of the transformation occurring during the kinetic burst in which the intermediate is formed from the random coil, a significant decrease in the exposure of the polypeptide to the solvent must occur.
* A rate-determining step in a reaction is any step the rate of which affects the rate of the overall reaction. In other words, if a step is rate-determining, an increase or decrease in its individual rate will cause a change in the overall rate of the reaction, but not necessarily of the same magnitude.
Further evidence that these intermediates formed during a kinetic burst are compact, condensed forms of the polypeptide is provided by studies of their scattering of X-radiation at small angles. It is possible to measure small-angle scattering of X-radiation from a sample in the cuvette of a stopped-flow apparatus. When the intermediate formed from the polypeptide of apomyoglobin during the kinetic burst227 was examined in this way, it was found that the angular dependence of its scattering (Figure 12–2) was indistinguishable from that of the native state of the protein but clearly different from that of its unfolded random coil.245 This result indicates that most if not all of the condensation required to occur between the random coil and the native state must be accomplished within the kinetic burst. Similar results were observed for the folding of bovine β-lactoglobulin.246 The rotational relaxation time of 1-anilinonaphthalene-8-sulfonate tightly bound to the intermediate formed in the kinetic burst during the folding of dihydrofolate reductase from E. coli is almost identical to that of the same probe bound to the native state, a result also suggesting that most if not all of the condensation of the polypeptide has already occurred in this isomerization.247 Increases in energy transfer by resonance between donors and acceptors covalently attached to particular positions in a polypeptide also indicate that it condenses during one of these isomerizations occurring in a kinetic burst.228
In addition to being compact, the intermediates formed during a kinetic burst contain secondary structure. This conclusion follows from the fact that large changes in molar ellipticity in the far ultraviolet, similar to those accompanying the formation of β structure and α helices (Figure 12–10), usually accompany the burst (Figure 13–9).226,227,231,233,248,249 A random coil has a slight positive molar ellipticity in the range from 210 to 230 nm while both β structure and α helices have significant, negative molar ellipticities (Figure 12–10), so the changes observed are decreases in molar ellipticity in this range. In fact, the molar ellipticity of the polypeptide of bovine β-lactoglobulin at 222 nm actually decreases to a level 2-fold lower than that of the native state during the kinetic burst before increasing to the proper value during the formation of the native state.232 This result suggests that extra α-helical secondary structure is transiently forming in the intermediate and then disappearing as the native state forms.
The positions in the amino acid sequence that participate in the secondary structure formed in these intermediates can be defined by measurements of the exchange of specific amido protons from the peptide backbone. To perform such measurements, the unfolded polypeptide as a random coil in a concentrated solution of denaturant is passed in turn through a series of mixing chambers (Figure 13–11).227 In a typical experiment, the unfolded polypeptide in 1H2O is rapidly diluted into aqueous buffer prepared in 2H2O at a pH low enough to suppress proton exchange for the time being (Figure 12–31), and folding commences. After various millisecond intervals, during which the folding of the protein has progressed normally, the pH of the solution is increased, usually to a level greater than 9, in a second rapid mixing chamber to initiate the rapid and complete exchange of all amido protons still exposed to the deuterated solvent (Figure 12–31). The pH and duration of this period of rapid exchange are set so that it is long enough for exposed amido protons to exchange completely but not long enough for buried amido protons to exchange significantly. Finally, in a third rapid mixing chamber the pH is dropped again to slow the exchange and permit folding to be completed in the absence of further exchange.
In the final folded protein, most of the amido protons are well protected from further exchange by its secondary and tertiary structure. Consequently it can be submitted to two-dimensional nuclear magnetic resonance spectroscopy (Figure 12–33) for the periods of time necessary to obtain two-dimensional spectra and determine which amido protons had become protected from exchange during the time spent folding before rapid exchange was initiated. Those positions in the amino acid sequence of the protein that have lost their protons during the experiment are those that were accessible when the jump in pH occurred; those that have not are those that had become protected. The times spent in the various steps and the levels of pH established in each step of these triple mixing experiments vary,233,250–252 but the intentions of initiating folding, of performing rapid exchange after folding has progressed for a certain period, and of then locking the information in the native state of the protein remain the same.*
* In most of these experiments the protein is unfolded in 2H2O and then folded in 1H2O, and the gain of protons rather than their loss is monitored.
Although most of the amido protons in an intermediate formed during a kinetic burst exchange rapidly during the respective jumps in pH, many are already protected from exchange by the structures of these intermediates.226,233,239,249,250 Because amido protons protected from exchange during the kinetic burst are found in segments within which each of a string of consecutive positions in the amino acid sequence is protected, it is assumed that these segments of continuous protection represent either α helices or β structure that have already formed in the intermediate.
Proteins during the folding of which intermediates do not accumulate in a kinetic burst nevertheless will often have similar intermediate states that form more slowly, in the milliseconds following the dead time. For example, an intermediate forms during the refolding of cytochrome c with a rate constant of 50 s–1 at 10 °C that is compact and contains several elements of secondary structure but lacks the complete secondary and tertiary structure of the native protein.251 These intermediates also have strings of consecutive amido protons protected from exchange.251,252
It seems that some of the same secondary structures found in the final native state of the protein are already assembled in their entirety in these early kinetic intermediates formed either during a kinetic burst (Figure 13–11) or in the period immediately following the dead time. To a certain extent, this impression is illusory. Because the secondary structures in the native state are used to store the information about the protection that occurred upon the formation of the intermediate, this information is automatically divided into segments bounded by those elements of native secondary structure. Any information about the regions of the polypeptide outside of these segments of native secondary structure has been automatically erased. Amido protons in these erased regions may have been protected in the intermediate but not in the native state. It is also possible to identify amido protons on particular amino acids that participate in hydrogen bonds in an intermediate formed in a kinetic burst by examining the effect of the concentration of denaturant on the exchange of amido protons within the native structure of a protein.253 Again, however, the fact that amido protons formed in the intermediate identified in this way coincide with elements of secondary structure in the crystallographic molecular model may be only a consequence of the fact that exchange from the native protein is being monitored.
The fact that all of the positions within a particular element of native secondary structure register similar levels of protection suggests but cannot prove that many of the same elements of secondary structure found in the native state are formed as discrete units early in the process of folding. There are, however, exceptions to this correspondence. For example, only a portion of the amino acids in helix B of apomyoglobin is protected from exchange in the intermediate formed during the kinetic burst (Figure 13–11).
Many if not most of the intermediates observed during kinetic bursts in stopped-flow experiments are similar if not identical to stable molten globules of the same polypeptide that are observed at equilibrium under unphysiological concentrations of denaturant, at unphysiological temperatures, at low pH, in the absence of salt, or with some combination of these perturbations. It has already been noted that, as is a molten globule, these kinetic intermediates are condensed conformations of the polypeptide. In addition, by varying the wavelength at which the kinetic measurements are made, it has been possible to demonstrate that the molar ellipticities at several wavelengths, both in the near ultraviolet and in the far ultraviolet, match the values in the circular dichroic spectrum of a stable molten globular state of the same polypeptide formed at equilibrium, usually under acidic conditions (Figure 13–12).226,232 Furthermore, the protection factors for the amido protons in the peptide backbone buried during the formation of an intermediate in a kinetic burst239 are often in the range of those observed for a molten globule at equilibrium rather than in the range of the much larger protection factors observed for amido protons locked within the secondary and tertiary structure of the native state. The effects of site-directed mutations of particular isoleucines to leucines and valines or particular leucines to isoleucines and valines in the hydrophobic core of the crystallographic molecular model of dihydrofolate reductase from E. coli were dramatically different on the yield of the intermediate observed in the kinetic burst than on the stability of the native state, a result suggesting that the packing of the native structure had not yet been established in the intermediate, as is the case in a molten globule.254
The most explicit evidence that these intermediates
Figure 13–11: Sequestration of amido protons during the folding of apomyoglobin as followed by rapid mixing.227 A series of three rapid
mixing chambers fed by four syringes was assembled in a cold room at 5 °C. Apomyoglobin from P. catodon was dissolved in 6 M urea and
10 mM sodium acetate, pH 6.1, in 1H2O and stood until it was fully unfolded. Flow through the mixing chambers was then initiated. The solu-
tion was diluted in the first mixing chamber 8.5-fold with 10 mM sodium acetate, pH 6.1, in 2H2O. This mixture then travelled in the tubing
connecting the first mixing chamber to the second for various periods of time (milliseconds). In the second rapid mixing chamber, the solu-
tion was mixed with an equal volume of a solution containing the buffers tris(hydroxymethyl)methylammonium ion, N-(ethylsulfonato)mor-
pholinium ion, and acetate ion at a final ionic strength of 0.2 M, pH 10.2 in 2H2O, which immediately brought the pH of the mixture to pH 10.2.
This second mixture passed through a piece of tubing in which it spent 20 ms in transit to the third mixing chamber. In the third rapid mixing
chamber it was diluted by mixing with a solution of the same ionic buffers at an ionic strength of 0.25 M, pH 1.9 in 2H2O, that adjusted the
final pH to 5.6. The effluent from the last mixing chamber was directed into a solution of hemin to turn the apomyoglobin to myoglobin and
lock the protein in its native state. The amplitudes of the absorptions from the amido protons in two-dimensional nuclear magnetic reso-
nance spectra (Figure 12–33) of the final solutions were determined for each sample. The different absorptions had been previously assigned
to specific amido protons in the amino acid sequence of the protein.216,588 The percentage that each position in a set of representative posi-
tions was occupied (percentage occupancy) by a proton is plotted as a function of the time (milliseconds) the polypeptide was allowed to
fold before the protons were exchanged with deuterons at pH 10.2. The amido protons are grouped according to the secondary structure they
occupy in the crystallographic molecular model of myoglobin.216,588 The eight α helices in the crystallographic molecular model are designated in alphabetical order from the amino terminus and the CD loop is the segment of random meander between helix C and helix D.
Occupancies of the amides of Leucine 29, Isoleucine 30, and Phenylalanine 33 from helix B are plotted in the upper panel; those of Isoleucine
28 and Arginine 31 from helix B are shown in the lower panel. Reprinted with permission from ref 227. Copyright 1993 American Association
for the Advancement of Science.
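The concentrations that result from the first two mixing steps described in this caption follow from simple dilution arithmetic, sketched below. The sketch assumes that "diluted 8.5-fold" means that the final volume is 8.5 times the volume of the sample, that volumes are additive, and that protons contributed by the solutes can be neglected; the function name dilute is hypothetical. The third mixing step is omitted because its mixing ratio is not stated.

```python
def dilute(concentration, fold):
    """Concentration after dilution to `fold` times the original volume."""
    return concentration / fold

urea_start = 6.0  # M, from the caption

# First mixing chamber: 8.5-fold dilution into 2H2O buffer (folding begins).
urea_during_folding = dilute(urea_start, 8.5)
h1_fraction_folding = dilute(1.0, 8.5)   # fraction of solvent that is 1H2O

# Second mixing chamber: equal volume of pH 10.2 buffer in 2H2O (exchange pulse).
urea_during_pulse = dilute(urea_during_folding, 2.0)
h1_fraction_pulse = dilute(h1_fraction_folding, 2.0)

print(f"urea during folding: {urea_during_folding:.2f} M")
print(f"urea during pulse:   {urea_during_pulse:.2f} M")
print(f"1H2O in solvent during pulse: {100 * h1_fraction_pulse:.1f}%")
```

Under these assumptions the polypeptide folds in about 0.7 M urea, within the range of 0.4–0.8 M urea in which kinetic bursts are reported, and the pulse of exchange takes place in solvent that is predominantly 2H2O.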
formed during the kinetic bursts are similar if not identical to well-characterized molten globules of the same polypeptide observed at equilibrium is the correspondence in the specific amido protons protected during their formation with the respective amido protons protected in the respective molten globule. For example, in the intermediate formed during the kinetic burst in the folding of apomyoglobin, amido protons in positions in the amino acid sequence that form the first (A), a portion of the second (B), the seventh (G), and the eighth (H) α helices in the native state of the protein216 are protected, but those that form the rest of the second, the third (C), and the fifth (E) α helices are not protected (Figure 13–11). This is the same pattern of protection observed in the molten globule of this polypeptide that is the dominant state at equilibrium between pH 4 and 5.65 Likewise, in the intermediate formed in the kinetic burst in the folding of the cysteineless version of type I ribonuclease H from E. coli, the pattern in which protected amido protons are distributed over its sequence of amino acids226 closely matches the pattern in which those amido protons are protected in the molten globule that predominates at equilibrium at levels of pH less than 2.255 When amino acids in regions that have been observed to be protected in the intermediate formed during the kinetic burst are mutated, the yield of the intermediate decreases; but when amino acids that have not been observed to be protected are mutated, the yield of the intermediate is unaffected.256
Apomyoglobin and type I ribonuclease H both seem to form an intermediate that already contains some but not all of the specific secondary structures that will eventually end up in their native states. The kinetic intermediate of β-lactoglobulin, however, which is also a molten globule, displays a circular dichroic spectrum indicating that it contains significant amounts of α helix
other intermediate precedes the molten globular intermediate.
Events that occur within even shorter intervals can be monitored by temperature jump. This approach exploits the fact that at a low temperature, the stability of a protein increases as the temperature is raised (Figure 13–5). A solution of protein in a concentration of denaturant within the region of transition is brought to a low temperature, which increases the concentration of the denatured state at the expense of the folded state. The temperature of the solution is then jumped by the rapid application of heat, and the solution stabilizes at a higher temperature within a hundred nanoseconds. The approach to the new equilibrium now favoring the folded state is then monitored. When the folding of equine cytochrome c264 and that of barstar from B. amyloliquefaciens265 are examined after a temperature jump of 10 °C, relaxations with rate constants of 11,000 s–1 and 3000 s–1 are observed, similar to those observed for other proteins by continuous flow. A much faster relaxation with a rate constant of 200,000 s–1 at 10 °C is observed for apomyoglobin, and this rate constant is significantly affected by the viscosity of the solvent, an observation suggesting that it represents the collapse of the denatured state,266 and a relaxation with a similar rate constant has been observed for the folding of bovine ribonuclease A.267 That these very rapid relaxations, however, monitor the initial global collapse of the random coil has been questioned,268 and it is possible that the initial collapse of these proteins usually occurs much more slowly, with rate constants in the range below 10,000 s–1.
When it is dissolved in 0.01 M HCl, equine cytochrome c is in a denatured state that is not a random coil but is at least as expanded.269,270 Upon jumping of the pH to 4.0, the protein refolds from this expanded state. The refolding can be followed from 45 μs to 1 ms by continuous flow (Figure 13–13).270 It appears to pass through two clearly resolved steps with observed rate constants of 17,000 s–1 and 2300 s–1. No kinetic burst is observed. During the first step 60% of the fluorescence from Tryptophan 59, the only tryptophan in the protein, is quenched by the covalently attached heme as the expanded conformation collapses, but no secondary structure forms beyond the residual α helix found in the expanded denatured state.259 During the second step, about 70% of the α-helical content of the native state is regained, to produce a molten globule.
Figure 13–13: Folding of cytochrome c monitored by continuous flow.270 Equine cytochrome c was dissolved in 0.01 M HCl and deionized by molecular exclusion chromatography in 0.01 M HCl. It was then mixed in a rapid mixing chamber with 10 volumes of 50 mM sodium acetate and 50 mM sodium phosphate, pH 5.1. The final pH of the mixture was 4.5. The effluent from the mixing chamber was passed through a 0.25 mm × 0.25 mm channel in a quartz block at 0.62 mL s–1 (0.99 mm ms–1). The block was illuminated with light at 280 nm wavelength, and fluorescence emission at a wavelength greater than 324 nm was measured as a function of the distance along the channel. The relative fluorescence, on a scale on which that of the denatured polypeptide in 0.01 M HCl is 1.0 and that of the folded native state at pH 4.5 is 0.0, is presented as a function of time (microseconds). The data are fit with the solid curve, which is the sum of two first-order exponentials with rate constants of 17,000 s–1 and 2300 s–1 and amplitudes of 0.60 and 0.29. The dead time of the apparatus was measured directly and found to be 45 μs. Reprinted with permission from ref 270. Copyright 1998 Nature Publishing Group.
What seems to be the same second step, involving the formation of the same level of α-helical content, occurs more slowly (50 s–1 at 10 °C) when the random coil in 4.4 M guanidinium chloride is diluted to 0.7 M guanidinium chloride, so it is possible to determine by rapid proton exchange that both the amino- and carboxy-terminal α helices of the native structure form during this step, but not the α helices between positions 60 and 80 in the amino acid sequence (Figure 7–9).251 The molten globule formed during this second step, however, displays none of the molar ellipticity at 420 nm indicative of the asymmetric environment of the native structure surrounding the heme.248 Finally the native state arises in a biphasic process with rate constants of 2.5 s–1 and 0.25 s–1 at 10 °C251 or 8 s–1 and 0.8 s–1 at 25 °C.248
The expanded denatured state of equine cytochrome c in 0.01 M HCl has an α-helical content that is 20% that of the native state,259 and although the α-helical content does not increase during its collapse, the earliest observed collapsed intermediate does contain this amount of α helix. Moreover, when the folding is performed by diluting the random coil of cytochrome c, which contains no α helix, from 4.4 to 0.4 M guanidinium chloride, there is still no kinetic burst and all of the
change in fluorescence at times less than 1 ms can be resolved into two steps with rate constants of 21,000 s⁻¹ and 730 s⁻¹ at 22 °C.270 The initial collapsed state, however, again has 20–30% of the α-helical content of the native state.230,259

These observations raise the question of whether or not the polypeptide of a protein is able to collapse hydrophobically in an isomerization that involves no formation of secondary structure. Is there a purely hydrophobic collapse? Certainly, all of the condensed kinetic intermediates observed during the folding of polypeptides, when they are assayed for secondary structure, do contain it. Furthermore, the fastest events in protein folding involving condensation of the random coil to form a globular state usually have rate constants of less than 20,000 s⁻¹, often much less. For example, in the case of the immunoglobulin binding domain of protein L from Peptococcus magnus, the condensation of the random coil to a globular state271 occurs in a first-order reaction with a rate constant of only 0.12 s⁻¹. Yet there are measurements indicating that the purely hydrophobic collapse of a polypeptide should have a rate constant of at least 10⁶ s⁻¹ at 20 °C,272 and theoretical treatments264 suggest that the rate constant should be 10⁷ s⁻¹. Furthermore, a purely hydrophobic collapse should have no energy of activation, but the observed relaxations assigned to the collapses of denatured states do. Explanations are required for both the slower than expected rate constants observed for these condensations and the fact that most if not all of the initial condensed states contain significant secondary structure. Even the extremely rapid relaxation of apomyoglobin with a rate constant of 200,000 s⁻¹ observed by temperature jump nevertheless seems to involve an intermediate with significant secondary structure.266,268

Suppose that kinetically, a purely hydrophobic collapse occurs before the formation of any of the secondary structure characteristic of a molten globule, and that the mechanism for folding is

$$\mathrm{U} \underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}} \mathrm{C} \xrightarrow{k_{2}} \mathrm{MG} \tag{13–25}$$

where C is the hydrophobically collapsed state and MG is the subsequent molten globule. If both steps were intrinsically fast reactions, if k₂ were greater than k₁, and if k₋₁ were greater than k₂, the reactions would be coupled, and little purely hydrophobically collapsed state would be observed. It has just been noted, however, that k₁, the hydrophobic collapse, should be much faster than the observed rate for the overall formation of the molten globule.

If k₋₁ >> k₂, then the unfolded state and the hydrophobically collapsed state are in a rapid preequilibrium that precedes the step in which the secondary structure, which distinguishes the molten globule from the hydrophobically collapsed state, is formed. The rate equation for the formation of the molten globule from the unfolded state by this mechanism is

$$[\mathrm{MG}] = [\mathrm{protein}]_{\mathrm{TOT}}\left\{1 - \exp\left[-\left(\frac{k_{2}K_{\mathrm{cpse}}}{1 + K_{\mathrm{cpse}}}\right)t\right]\right\} \tag{13–26}$$

where [protein]_TOT is the total concentration of protein and K_cpse is the equilibrium constant for the hydrophobic collapse:

$$K_{\mathrm{cpse}} = \frac{[\mathrm{C}]}{[\mathrm{U}]} = \frac{k_{1}}{k_{-1}} \tag{13–27}$$

Equation 13–26 defines a first-order formation of the molten globule with an observed rate constant

$$k_{\mathrm{obs}} = \frac{k_{2}K_{\mathrm{cpse}}}{1 + K_{\mathrm{cpse}}} \tag{13–28}$$

Upon initiation of the reaction, the equilibrium between the unfolded state and the hydrophobically collapsed state would be established immediately, and the molten globule would appear in a kinetically first-order reaction. If the equilibrium constant between the unfolded state and the hydrophobically collapsed state is less than 1, no purely hydrophobically collapsed state should be observed, and none is.

It is reasonable that this equilibrium constant should be less than 1. Few if any polypeptides should be able to bury enough hydrogen–carbon bonds upon their hydrophobic collapse to overcome both the unfavorable loss of the configurational entropy of the random coil and the unfavorable loss of solvation arising from the inescapable transfer of donors and acceptors of hydrogen bonds from the water into the interior of the collapsed state. Certainly the small values for standard free energy of folding (Table 13–2) suggest that this must be the case. Many of these unbonded donors and acceptors buried during the collapse, however, do become occupied within the secondary structure of the molten globule when it forms. These buried hydrogen bonds within α helices and β structure of the molten globule, because they are formed in the absence of water, stabilize it relative to a hydrophobically collapsed state lacking any internal hydrogen bonds. Consequently, the molten globule with its characteristic secondary structure can be the first intermediate observed even though the mechanism of folding passes obligatorily through a hydrophobically collapsed state lacking any secondary structure.

This explanation, however, is inconsistent with the observation that the reciprocals of the observed rate con-
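The preequilibrium mechanism of Equations 13–25 through 13–28 can be explored numerically. The following is only a minimal sketch: the value of k₂ and the values of K_cpse are hypothetical, chosen for illustration and not taken from any measurement, and the function names are invented here. It shows that when K_cpse is well below 1, the observed rate constant is approximately k₂K_cpse and only a small share of the protein that has not yet become molten globule sits in the collapsed state C.

```python
import math


def k_obs(k2: float, K_cpse: float) -> float:
    """Observed first-order rate constant of Equation 13-28."""
    return k2 * K_cpse / (1.0 + K_cpse)


def fraction_mg(t: float, k2: float, K_cpse: float) -> float:
    """[MG]/[protein]_TOT at time t according to Equation 13-26."""
    return 1.0 - math.exp(-k_obs(k2, K_cpse) * t)


def collapsed_share(K_cpse: float) -> float:
    """Share of the (U + C) pool that is C, from K_cpse = [C]/[U] (Equation 13-27)."""
    return K_cpse / (1.0 + K_cpse)


if __name__ == "__main__":
    k2 = 1.0e4  # s^-1; hypothetical value, for illustration only
    for K in (1e-3, 1e-2, 1e-1):  # hypothetical equilibrium constants
        print(
            f"K_cpse={K:g}  k_obs={k_obs(k2, K):.3g} s^-1  "
            f"C/(U+C)={collapsed_share(K):.3g}  "
            f"half-time={math.log(2) / k_obs(k2, K):.3g} s"
        )
```

With these illustrative numbers, lowering K_cpse tenfold lowers k_obs almost tenfold, so a slow observed relaxation does not by itself imply that the collapse step is slow.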
B. amyloliquefaciens affect the observed rate constant for the production of the native state from the intermediate formed during the kinetic burst, while others affect the stability of that intermediate.276 The observed rate constant (20 s⁻¹ at 20 °C) for the principal relaxation observed in both molar ellipticity and fluorescence following the kinetic burst during the folding of lysozyme from bacteriophage T4 is affected significantly by site-directed mutations in the carboxy-terminal half of the polypeptides but not by mutations in the amino-terminal half even though the mutations in both halves affect the stability of the native state.277 Presumably the process being registered as an apparent single step by both circular dichroism and fluorescence is the folding of only the one half of the protein and not the other.

It is more common to observe two or more steps rather than just one during the transition between a kinetic molten globular intermediate and the native state. For example, four additional steps can be discerned following the formation of the first kinetic intermediate during the folding of β-lactoglobulin;260 two, during the folding of human lysozyme;242 two, during the folding of cytochrome c;230,251 and two, during the folding of micrococcal nuclease from S. aureus.278 Again, the number of phases detected often depends on how many spectral properties have been monitored. For example, five additional steps in the refolding of dihydrofolate reductase from E. coli were discerned following the formation of a molten globular intermediate if molar ellipticity at 220 nm, molar ellipticity at 235 nm, absorbance, and intrinsic fluorescence were all monitored.231

Some if not most of these multiple steps may each involve the assembly of a particular portion of the final secondary structure of the native state.278,279 Intermediates formed after the formation of the initial molten globule, however, often have all of the secondary structure of the native state, but some of that secondary structure is in a less stable state than it will eventually assume upon complete folding,230,251 as if the rigid conformation of the native structure maintained by the proper packing of the secondary structures locks in place only in the final step or one of the final steps in the process. The last or almost the last property to appear in the folding of a protein is the spatial arrangement of the constellation of side chains responsible for its function.258,280

The order in which different elements of secondary structure form and lock into place during the transition from the initial molten globular intermediate to the final native state can be inferred from studies of native-state proton exchange. The amido protons of the peptide bonds buried in the native state of a protein and defining its secondary structure exchange with deuterons in the solvent at different rates. When standard free energies for the conformational equilibria leading to their exposures are followed as a function of the concentration of a denaturant, it is observed that they generally coalesce into groups as the concentration of denaturant is increased (Figure 13–14).116,117 These groups are groups of amido protons involved in particular secondary structures in the native protein. For example, in the crystallographic molecular model of cytochrome c (Figure 7–9), Arginine 91 through Lysine 100 (Figure 13–14A) are within one α helix; Methionine 65 through Asparagine 70 (Figure 13–14B) are within another α helix; and the ε amido proton of Tryptophan 59 and the α amido protons of Leucine 64, Lysine 60, Phenylalanine 36, and Glycine 37 are within a cluster of adjacent hydrogen bonds (Figure 13–14C). The coalescence of the respective sets of standard free energies of exposure indicates that each of these elements of secondary structure opens to exchange its amido protons cooperatively.

There are indirect observations suggesting that these openings of the structure, which are occurring all the time in the native protein, actually occur sequentially.281 For example, the cluster of hydrogen bonds involving the α amido proton of Lysine 60 must open before the α helix containing Leucine 68, which must open before the α helix containing Leucine 98 during the exchange of the amido protons in the latter α helix. Furthermore, as was the case with the exchange of the amido proton at Methionine 47 in cysteineless type I ribonuclease H from E. coli (Figure 13–8), the exchange of the amido protons of the secondary structure that is the last to open, namely the α helix containing Leucine 98 in cytochrome c, tracks the global unfolding of the protein.

All of these observations suggest that the exchange of amido protons in the native state of a protein reveals, in reverse, the steps in the formation of secondary structure and the locking in of that secondary structure as the elements pack next to each other during the normal folding of the protein. In other words, during the folding of cytochrome c, the α helix containing Leucine 98 forms before that containing Leucine 68, which forms before the cluster of hydrogen bonds involving the α amido proton of Lysine 60.

The slowest steps in the folding of a protein from its random coil are often the isomerizations of peptide bonds to the amino-terminal side of prolines. In the crystallographic molecular models of proteins, about 6% of the peptide bonds on the amino-terminal sides of proline are cis peptide bonds282,283 and the rest are trans peptide bonds (Equation 6–1). These are geometric isomers of each other. The peptide bond on the amino-terminal side of a proline at a particular position in the amino acid sequence of a particular protein usually will be either cis in every molecule of the native state or trans in every molecule of the native state. In the random coil, however, every proline is free to adopt either geometric isomer, and the cis and trans isomers slowly come to equilibrium. In dipeptides, the equilibrium constants between cis and trans isomers of proline vary with pH, but they fall between 10 and 1.5 in favor of the trans isomer. The more
prolines that must be one or the other isomer before the native state can be achieved, the more random coils with at least one incorrect isomer of proline will be present in the solution.

When bovine ribonuclease A is added to 5 M guanidinium chloride at pH 2.3, the unfolding of the polypeptide is very rapid (<10 s), and the unfolding produces a random coil, cross-linked by its four cystines.131 If the solution containing this random coil is diluted within 15 s to 1.3 M guanidinium chloride, pH 6.4, at 25 °C, all of the polypeptide (> 95%)131 refolds to the native state, capable of full enzymatic activity,284 in an uncomplicated first-order relaxation285 with a rate constant of about 10 s⁻¹. This result demonstrates that a random coil that a moment ago was native ribonuclease can refold rapidly, and with no obvious complications, back into native ribonuclease.

If rapidly unfolded ribonuclease, however, is allowed to sit as a random coil in a solution of guanidinium chloride over a period of 10 min, the kinetics of refolding are split into several phases, one with the same

folding form and the slowly folding forms of the random coil of ribonuclease consistent with them being due entirely to isomerization of peptide bonds on the amino-terminal sides of prolines in the sequence. The observed rate constant for the approach to the equilibrium between the rapidly folding form and slowly folding

[Figure 13–14, panels A and B: plot not reproduced; the curves in panel A are labeled L98, A96, Y97, L94, R91, E92, D93, and K100.]
Figure 13–14: Rates of exchange of particular amido protons along the polypeptide backbone of native equine cytochrome c as a function of the concentration of guanidinium chloride.116 Equine cytochrome c was dissolved at p²H 7 in ²H₂O at the noted concentrations of guanidinium chloride. After different intervals, samples were removed, the pH was adjusted to 5 to slow the rates of exchange, and a two-dimensional nuclear magnetic resonance spectrum was gathered. The amplitudes of each of the peaks in each of the respective spectra were tabulated as a function of the time spent at p²H 7 in ²H₂O before the pH was lowered, and rates of exchange were calculated from the decreases in these amplitudes as a function of time. Each exchange was at the EX2 limit, and the equilibrium constant for the formation of the conformation exposing each amido proton at each concentration of guanidinium chloride was calculated (Equation 12–63). From these equilibrium constants, standard free energies for the exposures of each proton, ΔG°_HX, were calculated. These standard free energies of exposure (kilojoules mole⁻¹) are plotted as a function of the concentration (molar) of guanidinium chloride (GdmCl). The standard free energies of exposure coalesced into specific groups as the concentration of guanidinium chloride was raised. Separate plots for three of these groups are presented: (A) the amido protons in the α helix containing Arginine 91 (R91) to Lysine 100 (K100); (B) the α helix containing Methionine 65 (M65) to Asparagine 70 (N70); and (C) a cluster of hydrogen bonds containing the ε amido proton of Tryptophan 59 and the α amido protons of Leucine 64 (L64), Lysine 60 (K60), Phenylalanine 36 (F36), and Glycine 37 (G37). In each successive panel, the lines for the most extensively protected amido protons from the previous plots are drawn as a dashed line to identify the discrete groups. Reprinted with permission from ref 116. Copyright 1995 American Association for the Advancement of Science.
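The conversion described in the legend, from a measured rate of exchange to a standard free energy of exposure, can be sketched as follows. Equation 12–63 is not reproduced in this section, so the sketch assumes the relation commonly used at the EX2 limit, in which the equilibrium constant for opening is the ratio of the observed rate constant of exchange to the intrinsic rate constant of exchange for a fully exposed amido proton. The function name, the temperature, and both rate constants in the usage note are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def delta_g_hx(k_observed: float, k_intrinsic: float, temperature_k: float = 293.15) -> float:
    """Standard free energy of exposure (kJ mol^-1) at the EX2 limit.

    Assumes K_open = k_observed / k_intrinsic (valid only when K_open << 1)
    and delta G = -R T ln(K_open). This is the usual EX2 treatment, taken
    here as an assumption rather than as a transcription of Equation 12-63.
    """
    if k_observed <= 0 or k_intrinsic <= 0:
        raise ValueError("rate constants must be positive")
    k_open = k_observed / k_intrinsic
    return -R_KJ * temperature_k * math.log(k_open)


if __name__ == "__main__":
    # Hypothetical numbers: a proton exchanging 10^7 times more slowly
    # than it would if it were fully exposed.
    print(f"{delta_g_hx(1.0e-6, 10.0):.1f} kJ mol^-1")
```

A proton protected by a factor of 10⁷ at about 20 °C corresponds to a free energy of exposure of roughly 39 kJ mol⁻¹, which falls within the range spanned by the axis ticks that survive from the plot (20 to 50).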
forms of the random coil288 and the observed rate constant for the formation of the native state from the slowly folding forms of the random coil of ribonuclease289 are both increased by strong acid, as are the rate constants for the cis–trans isomerization in dipeptides of proline.286 The standard enthalpy of activation for the approach to the equilibrium between the rapidly folding form and the slowly folding forms of the random coil is between 75 and 90 kJ mol⁻¹, either at low pH or in 5 M guanidinium chloride,288,290 which compares favorably to the values for the standard enthalpy of activation (80–90 kJ mol⁻¹) for the cis–trans isomerization of dipeptides of proline.286 Neither the rate of the approach to cis–trans equilibrium of dipeptides of proline nor the approach to the equilibrium between the rapidly folding form and the slowly folding forms of the random coil is affected by the concentration of guanidinium chloride.291

In the crystallographic molecular model of bovine ribonuclease A, the peptide bonds on the amino-terminal sides of Proline 93 and Proline 114 are cis peptide bonds. The amount of the peptide bond amino-terminal to Proline 93 that is in the cis form in the random coil can be monitored by its insensitivity to endopeptidolytic cleavage by Xaa-Pro dipeptidase.287 In 8.5 M urea at 10 °C, 70% of this peptide bond is cis. When the urea is diluted to 0.3 M, the 30% of this peptide bond that is trans slowly and completely reverts to the cis isomer with a rate constant of 0.01 s⁻¹, as the polypeptide folds. Under these conditions, 30% of the random coil refolds in the slowest phase and 30% of the activity of the enzyme is regained in this slowest phase, both with a rate constant of 0.01 s⁻¹. Proline 93 is preceded by Tyrosine 92, and the fluorescence from this tyrosine tracks the slow isomerizations that produce fully native enzyme during the refolding of the equilibrated random coil after a decrease in the concentration of guanidinium chloride.289 It was presumed that the slow process monitored by the fluorescence of Tyrosine 92 is the state of isomerization of the peptide bond between Tyrosine 92 and Proline 93.

If both Proline 93 and Proline 114 are mutated, the former to an alanine and the latter to a glycine, the stability of the native protein is decreased significantly, but its random coil is still able to fold to produce a protein that is enzymatically active.292 The folding of this double mutant, when monitored by molar ellipticity, is a single first-order reaction with a rate constant of 0.07 s⁻¹ in 0.4 M guanidinium chloride at 10 °C with no evidence for any slower phase.

It has been concluded from all of these observations that all of the slow isomerizations of the random coil of bovine ribonuclease A that in turn produce the slowly folding forms from the rapidly folding form are isomerizations of peptide bonds on the amino-terminal sides of prolines from the isomer found in the native state and that the most disruptive isomerization of the random coil is to a trans proline at position 93. When the equilibrium mixture of rapidly folding and slowly folding random coils of ribonuclease is diluted into conditions favorable to folding, some of the slowly folding random coils assume nativelike conformations in which the critical peptide bonds on the amino-terminal sides of prolines are nevertheless the incorrect isomer. One intermediate, I₁, is sufficiently folded to trap almost 20 amido protons in stable hydrogen bonds,293 and another, I_N, is compactly folded by several criteria,294 including its insensitivity to digestion by pepsin.295 These compact intermediates, however, differ from the native state and can be distinguished from it by having the incorrect isomers at particular prolines.289

A similar but more dramatic effect of the slow isomerizations of prolines is observed in the folding of ribonuclease T1 from A. oryzae. In the crystallographic molecular model of this even shorter protein (104 aa), the peptide bonds preceding both Proline 39 and Proline 55 are cis.296 If the native protein is unfolded in 6.0 M guanidinium chloride, pH 1.6, and the resulting random coil is diluted after 5 s to 1.0 M guanidinium chloride, pH 5.0, 80% refolds in a single first-order relaxation with a rate constant of 6 s⁻¹.297 As it sits in 6.0 M guanidinium chloride, however, the percentage of the fast-folding isomer decreases to 3% and the approach to this equilibrium has a rate constant of around 0.05 s⁻¹. From an analysis of the kinetics of this loss of the rapidly folding state of the random coil in both wild-type protein and protein in which Proline 55 had been mutated to asparagine, it could be concluded that the random coil with both prolines in the cis isomer folds to the native state with full enzymatic activity at 6 s⁻¹, that the isomerization of cis-Proline 55 to trans-Proline 55 has a rate constant of 0.05 s⁻¹ and a cis–trans equilibrium constant of 0.16, and that the isomerization of cis-Proline 39 to trans-Proline 39 has a rate constant of 0.02 s⁻¹ and a cis–trans equilibrium constant of 0.1. At equilibrium, 78% of the random coils have both prolines in the trans conformation.

Nevertheless, when this mixture of geometric isomers of the random coil is diluted to 1 M guanidinium chloride, pH 5.0, at least 70% of the random coils collapse at a rate of 50 s⁻¹ to molten globules in which the central β sheet has formed298 and the α helix of the native state then forms within these molten globules with a rate constant of 20 s⁻¹. When the same equilibrium mixture is diluted to 0.15 M guanidinium chloride, the entire far-ultraviolet circular dichroic spectrum of the native protein is regained in a few seconds.299 The resulting condensed, molten globular states, however, do not become native protein until both Proline 55 and Proline 39 have become cis. The isomerizations that produce the proper cis isomers, and hence the native state, proceed in these molten globular states with rate constants between 0.01 s⁻¹ and 0.0003 s⁻¹ at 10 °C (Figure 13–15A).299 The isomerization of Proline 39 is retarded significantly by the formation of these partially folded molten globules, and this retardation is in part responsible for the slowest rate constant of 0.0003 s⁻¹ for 66% of the protein.300
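The figure of 78% quoted above for the random coil of ribonuclease T1 can be checked with a short calculation. This sketch rests on two assumptions that are not stated explicitly in the text: that each quoted cis–trans equilibrium constant is the ratio [cis]/[trans], and that the two prolines isomerize independently of each other. The function name is invented for the illustration.

```python
def isomer_populations(k_pro55: float, k_pro39: float) -> dict:
    """Equilibrium populations of the four cis/trans combinations.

    Each argument is taken to be K = [cis]/[trans] for one proline
    (an assumption), and the two prolines are treated as independent.
    """
    cis55 = k_pro55 / (1.0 + k_pro55)
    cis39 = k_pro39 / (1.0 + k_pro39)
    return {
        "trans55_trans39": (1.0 - cis55) * (1.0 - cis39),
        "cis55_trans39": cis55 * (1.0 - cis39),
        "trans55_cis39": (1.0 - cis55) * cis39,
        "cis55_cis39": cis55 * cis39,
    }


if __name__ == "__main__":
    populations = isomer_populations(0.16, 0.1)
    for name, value in populations.items():
        print(f"{name}: {100.0 * value:.1f}%")
```

Under these assumptions the population with both prolines trans comes to about 78%, in agreement with the text. The same assumptions put the population with both prolines cis near 1%, somewhat below the 3% of fast-folding isomer quoted above, so the calculation should be read as approximate.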
In the laboratory, the foldings of many polypeptides have slow phases with rate constants between 0.1 s⁻¹ and 0.002 s⁻¹ at 25 °C that are attributed to proline isomerization.228,240,286,301–306 Many of these attributions have been validated by demonstrating that when particular prolines in the polypeptide are eliminated by site-directed mutation, the slow phases disappear.228,240,301–303 A problem with this approach is that if the proline that must be mutated is critical to the structure of the protein, in particular if it is cis in the native state, its removal often destabilizes the protein significantly and alters the kinetics of folding.307 For example, the double mutant of bovine ribonuclease A folds about 100-fold more slowly than the wild-type polypeptide with both Proline 93 and Proline 114 in the cis isomer.292

In the folding of small proteins that contain only one or two domains, most if not all of the steps that have rate constants less than 0.1 s⁻¹ result from required isomerizations of one or more prolines. The rate constants for the isomerization of proline between cis and trans isomers in a random coil are in the range from 0.1 s⁻¹ to 0.002 s⁻¹ at 25 °C.297 The isomerization of trans to cis is always slower by a factor of 2–10 because of the equilibrium constants, so if the proline is cis in the native state, which because of its peculiar geometry is usually the more critical for proper folding, any step during folding that involves the formation of that cis isomer will be quite slow (Figure 13–15).

A related set of isomerizations proceeds at a more rapid rate than those for cis-prolines. In the fully equilibrated random coil, about 0.0015 of the peptide bonds amino-terminal to amino acids other than proline are in the cis conformation.308 Although this is a small fraction, in a protein with 100 amino acids only 86% of the random coils will be all trans at equilibrium. If the native state has a cis peptide bond amino-terminal to an amino acid other than proline or if a significant percentage of its positions cannot tolerate cis peptide bonds during its folding, a fraction of the random coils will be in geometric isomers incapable of folding to the native state. The monocationic or monoanionic forms of dipeptides approach the equilibrium between their cis and trans isomers at a rate constant of about 1 s⁻¹ at 25 °C,309 and a slow phase with a rate constant of 2.5 s⁻¹ at 25 °C involving 5% of the random coils during the folding of α-amylase inhibitor HOE-467A (n_aa = 74) from Streptomyces tendae has been attributed to random coils in which critical peptide bonds are in the incompatible cis isomer.308

Most of the time, the isomerizations of only a few prolines, usually to the cis isomer, are required steps in the complete folding of a protein. The isomerizations of many of the peptide bonds amino-terminal to prolines in a protein have no effect on its kinetics of folding,310 and the folding of a number of proteins shows no slow phases that result from proline isomerization.242 In fact, the native states of some proteins have two conformations in slow equilibrium with each other, one in which a proline is in the cis isomer and the other in which it is in the trans isomer.304,311 Because the slow isomerizations of the prolines significantly complicate the kinetics of folding, most of the proteins chosen for detailed studies of folding are those that do not have prolines that have to isomerize.

This choice is appropriate because in the cytoplasm and the extracytoplasmic spaces in which all polypeptides normally fold, as opposed to the laboratory, the isomerizations between the cis and trans isomers of the peptide bonds to the amino-terminal sides of prolines are catalyzed by peptidylprolyl isomerases. The first enzyme with this catalytic activity that was purified to homogeneity312,313 was assayed by its ability to catalyze the cis–trans isomerization in N-glutaryl-Ala-Ala-Pro-Phe 4-nitrophenylanilide.314 This particular enzyme is able to increase the rates of the slowest phases in the refolding of, among other proteins, ribonuclease A,315,316 the light chain of immunoglobulin,317 human acylphosphatase,306 and type III collagen.318

In most of these instances, the increases observed in the rate constants for these slowest phases in folding were relatively unremarkable (less than a factor of 10) even at high concentrations of the peptidylprolyl isomerase. It has been found, however, that there are many different isoforms of peptidylprolyl isomerase in a given organism; for example, there are at least 25 different peptidylprolyl isomerases encoded by the human genome, and several are often found within the same cell.319,320 It is possible that if the folding protein were matched with its proper peptidylprolyl isomerase under conditions resembling those encountered by the folding protein in the cytoplasm, much more effective rates of catalysis would be observed, but peptidylprolyl isomerases from bacteria, fungi, and animals are about as effective when catalyzing the folding of the same protein.321 Within an intact mitochondrion, the increase in the rate of folding catalyzed by endogenous peptidylprolyl isomerase was observed to be a factor of only 2–6.322 It is possible that the effect of peptidylprolyl isomerases on the rate of folding in the cytoplasm and the various extracytoplasmic spaces is usually modest.

Peptidylprolyl isomerases have been isolated from bacteria,323 fungi,324 and plants325 as well as from animals. The peptidylprolyl isomerase associated with the ribosome in E. coli appears to be one of the most effective.326

The extent of exposure of the peptide bonds amino-terminal to proline is an important factor in the efficiency with which peptidylprolyl isomerases can function299,327 because they must be able to find the bond before they can isomerize it. For example, peptidylprolyl isomerase is able to catalyze the faster of the isomerizations of proline that must occur during the folding of ribonuclease T1 to its native state, but not the slowest (Figure 13–15). This slowest step involves the isomerization of Proline 39, which has already been slowed considerably from its rate in the random coil297 by being
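The statement above that only 86% of the random coils of a protein with 100 amino acids are all trans follows from treating each peptide bond as independent. The sketch below makes that assumption explicit. It also counts 100 peptide bonds for simplicity, although a polypeptide of 100 amino acids has 99 peptide bonds and any bonds preceding prolines would be excluded; the function name is invented here.

```python
def fraction_all_trans(n_bonds: int, p_cis: float = 0.0015) -> float:
    """Fraction of random coils in which none of n_bonds peptide bonds is cis.

    Assumes every bond is cis with the same probability p_cis,
    independently of all the other bonds.
    """
    if n_bonds < 0 or not 0.0 <= p_cis <= 1.0:
        raise ValueError("n_bonds must be >= 0 and p_cis must lie in [0, 1]")
    return (1.0 - p_cis) ** n_bonds


if __name__ == "__main__":
    for n in (50, 100, 300):
        print(f"{n} bonds: {100.0 * fraction_all_trans(n):.0f}% all trans")
```

For 100 bonds the result is about 86%, matching the text, and the fraction falls steadily as the polypeptide becomes longer.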
the rate-limiting step, the rate constants of any rate-determining steps preceding the rate-limiting step, and the equilibrium constants of any unfavorable preequilibria that precede the rate-limiting step (Equations 13–28 and 13–30). Favorable preequilibria preceding the rate-limiting step would require the formation of observable intermediates in the reaction.

The following facts indicate that, immediately upon completion of the rate-limiting step in one of these foldings that appears to proceed through a single kinetic step, a molten globular intermediate containing significant secondary structure has formed from the random coil. The observed rate constants of these foldings are significantly affected by the concentration of denaturant (Figure 13–1B); they increase by factors as large as 10,000 between a solution containing a concentration of denaturant within the region of transition and one containing no denaturant.331 This fact requires that considerable accessible surface area be lost either in the transition state of the rate-limiting step, during rate-determining steps preceding the rate-limiting step, or during preequilibria preceding the rate-limiting step. The apparent molar activation volume for one of these foldings is positive and even larger than the positive change in molar volume between random coil and native state,337 an observation indicating that the transition state for the rate-limiting step or for rate-determining steps that precede it is globular or that an intermediate formed in an unfavorable preequilibrium is globular, as is the native state, but is somewhat less compact than the native state. All338 or most339 of the change in standard heat capacity between random coil and native state has occurred by the time the transition state in the rate-limiting step appears, an observation indicating that the hydrophobic functional groups buried in the native state are buried during the rate-limiting step or before it. The effect of pH on the observed first-order rate constant for one of these foldings indicates that some carboxylates have been sequestered from the solvent, but far fewer than those sequestered in the native state, by the time the transition state of the rate-limiting step has formed.339 In the case of the carboxy-terminal domain of protein L9 from the 50S subunit of the ribosome from E. coli, the mutation of Histidine 134, the histidine in the protein with the lowest pK_a in the native state, caused the most dramatic change in the dependence of the rate constant k_f on pH, a result suggesting that this same histidine is also the most buried by the time the transition state of the rate-limiting step has formed and that at this point the protein resembles the native state.113

It is the effects of site-directed mutations on the observed rate constants of these foldings appearing to proceed in a single kinetic step which indicate that secondary structure has formed before or during the rate-limiting step. For example, when one or the other of the two α helices in acylphosphatase is stabilized by mutating one of its amino acids to alanine, the observed rate constant for its folding increases if the mutation is in the second α helix but not if it is in the first.340 When the two α helices of transcriptional repressor arc from bacteriophage P22 were destabilized by mutations of alanines to glycines, the observed rate constants of folding decreased significantly,341 while alanine to glycine mutations in only one of the α helices in the amino-terminal domain of the repressor from bacteriophage λ affect the observed rate constant of its folding.342 The magnitudes of the changes in the observed rate constants of folding relative to the change in the standard free energy of folding for mutations at 37 positions in chymotrypsin inhibitor 2A suggest that some elements of secondary structure have formed by the time the transition state of the rate-limiting step has been reached but that they are less stable than they are in the native state.343 The effects of site-directed mutation on the folding of the WW domain from peptidylprolyl isomerase Pin 1, however, indicate that its folding is most sensitive to changes in what is an external loop in its crystallographic molecular model,344 an observation suggesting that secondary structure in the native state may not correspond to secondary structure in a molten globular intermediate.

All of the observations discussed so far can be explained as effects either on the rate-limiting step or on one or more unfavorable preequilibria preceding the rate-limiting step. It is the case, however, that the observed rate constant for at least one of these foldings that appears to proceed in a single kinetic step is linearly related to the inverse of the viscosity of the solvent.273 This fact requires that the condensation of a conformation or set of conformations of the random coil occur during the rate-limiting step, not before it. This conclusion follows from the fact that the viscosity of the solvent must affect both contraction and expansion of a polypeptide equally and hence an equilibrium constant for contraction not at all.

The observed rate constants for these foldings that appear to proceed in a single kinetic step span a wide range from 0.2 s⁻¹ (28 °C) to 120,000 s⁻¹ (37 °C). The fact that these rate constants span such a wide range and the fact that nevertheless they all seem to register condensations producing molten globular intermediates suggest that their rate-limiting steps are not hydrophobic collapse, for which the observed rate constant would be related solely to the length of the polypeptide and would not vary so dramatically,272 but hydrophobic collapse of a conformation or set of conformations of the random coil that are present at low occupancy; the lower that occupancy, the slower the observed rate constant (Equation 13–30). Presumably, the subset of conformations of the random coil that are present at such low levels of occupancy are those that contain sufficient secondary structure to condense to a stable molten globule. Consistent with this presumption is the fact that the observed rate constant for at least one of these foldings that appears to occur in a single step is significantly
increased by the addition of low concentrations (0.006 mole fraction) of trifluoroethanol or 1,1,1,3,3,3-hexafluoro-2-propanol.330 These cosolvents are known to stabilize secondary structure in unfolded polypeptides.

The apparent single step registered in each of these foldings is equivalent to the step producing the molten globular intermediates formed during the kinetic burst in the foldings of proteins that proceed through multiple steps. In the case of the proteins folding in an apparent single step, the steps following the molten globular intermediate are kinetically silent. The advantage of the foldings in a single step is that attention can be directed exclusively to this one step, uncomplicated by the later ones; the disadvantage is that the later steps cannot be studied at all.

What seems to occur during folding of a polypeptide can be summarized. The random coil has conformations within its ensemble that are evanescently present at low concentrations and that contain sufficient secondary structure to stabilize a molten globular intermediate. One or more of these minor conformations of the random coil collapses hydrophobically to form one or more molten globules.* Within these molten globules, some of the secondary structure of the native state is present and the rest forms in a series of steps before or as it locks into its proper packing to produce the native state. Superimposed on this progression are the slow isomerizations of the peptide bonds amino-terminal to prolines, each of which has the potential to decrease the observed rate constant of any one of these steps dramatically but only for those isomers of the polypeptide that happen to contain an incompatible isomer of proline. If isomerizations of peptide bonds amino-terminal to prolines were not or are not involved, the folding of a polypeptide would be or is usually complete within less than 10 s at 25 °C, which is remarkably fast for such a complicated process.

* If the effects of solutes increasing the viscosity of the solution have been misinterpreted, it is also possible that hydrophobic collapse is the unfavorable preequilibrium preceding the formation of sufficient secondary structure to stabilize the molten globule or molten globules.

Up to this point, for the sake of clarity, the transformations between each of the major intermediate states encountered during the folding of a protein have been presented as if they were uncomplicated first-order reactions proceeding in single steps just as the bimolecular nucleophilic displacement of an iodide in the alkylation of a thiolate anion (Equation 3–17) is an uncomplicated second-order reaction proceeding in a single step through a single transition state. It is certainly the case, however, that in a process as complicated as protein folding none of these transformations between the denatured state, major intermediate states, and the final native state proceeds in a single step. Consequently, the transformations only appear to be single steps, and the observed rate constants are only apparent rate constants. For example, the kinetic progress of the folding of cytochrome c′ from Rhodopseudomonas palustris, although apparently proceeding through four steps when the absorbance of the heme is followed at 440 nm, can be fit more exactly by a set of 80 rate constants spanning a range from 10⁵ to 10⁻² s⁻¹.345 This observation by itself is not surprising. Were the kinetic measurements of a simple chemical reaction that definitely had only four steps to be fit with an equation derived for 80 steps, the fit for 80 steps would necessarily be better than the fit for four steps. Nevertheless, this exercise suggests that there are more than just four steps involved in the folding of this cytochrome c′. Careful examination of Figure 13–13 also suggests that a mechanism involving more than two steps in the range monitored would yield a rate equation more successful at fitting the actual data.

It has been shown that the apparently first-order rates of the transformations between apparently discrete intermediate states in the folding of a protein and the behavior of those rates as a function of the concentration of denaturant are more successfully fit by kinetic models involving a large number of intermediate states in sequence.236 It has already been noted that the number of transitions observed during the folding of a polypeptide usually increases as more and more physical properties are monitored during the same process, so observations based on only one physical property usually fail to register all of the discrete steps in the process of folding. It has also been proposed that the differences observed among the effects of denaturant on the rates of exchange of different amido protons along the backbone of a polypeptide (Figures 13–8 and 13–14) are evidence for the existence of a continuum of intermediate states in the process of folding.346 These various intermediate states are in preequilibrium with each other prior to a rate-limiting step, represent a set of rate-determining steps, are in a steady state with each other so that the observed first-order rate constant is actually a composite rate constant,244 or are related to each other in some combination of these three possibilities. Because it is so unlikely that any one of these apparently first-order transformations is actually a simple single step, a discussion of the transition state associated with the observed first-order rate constant for any one of these transformations is meaningless.

Another complication is the existence of multiple, parallel pathways in the folding of a protein. The most obvious example of such a situation results from the isomerization of prolines. Each geometric isomer of the random coil at each proline that is required to have a particular configuration for proper folding folds along its own distinct pathway (Figure 13–15). In the case of lysozyme from G. gallus347–349 and dihydrofolate reductase from E. coli,350,351 however, there are two and four parallel pathways, respectively, through which these foldings progress, none of which are distinguished by
differences in the isomerization of prolines. At one or more points in the process of folding, some of the molecules assume one state and the others assume another state, and from there on, each of the two populations proceeds through a different sequence of steps to form the same final native state of the protein. In the case of dihydrofolate reductase, the existence of parallel pathways may be related to the fact that there are two different conformations of the native state in equilibrium with each other that are both substantially occupied under most circumstances.352,353

There are also kinetic dead-ends that can complicate the process of folding. For example, during the folding of equine cytochrome c from its random coil, the wrong side chains can form the fifth and sixth ligands to the covalently bound heme,354 and they must dissociate and the proper side chains must associate before folding can proceed successfully.355

One of the most consequential kinetic dead-ends is aggregation of the folding polypeptides. When, in the laboratory, the concentration of urea or guanidinium chloride is rapidly lowered by dilution, the hydrophobic side chains on a random coil are no longer favorably solvated. They can either promote the desired intramolecular hydrophobic collapse, or they can associate with the hydrophobic side chains on other unsolvated polypeptides to form an undesired intermolecular aggregate, which just as successfully removes them from contact with the water. When the intermolecular aggregation is reversible so that the aggregate is in equilibrium with an unaggregated, denatured intermediate produced during some step in the process of proper folding, the aggregation has the effect of lowering the concentration of that intermediate on the pathway to the native state and decreasing the rate of folding. In this situation, the amount of intermolecular aggregate decreases as the total concentration of protein is decreased, and the rate of folding increases356 as the concentration of protein is decreased. The aggregate at the dead end can also appear to be an intermediate in the folding process, forming rapidly and then disappearing as the folding depletes the solution of the intermediate in equilibrium with the aggregate.357

When the intermolecular aggregation of the folding polypeptide is irreversible, however, it competes with proper folding, and the final yield of the native state, rather than its rate of formation, is decreased accordingly. Often such irreversible aggregation can be minimized by lowering the total concentration of protein,358,359 but in many instances the folding protein passes through an intermediate so prone to aggregation that very little if any native state forms at any concentration of protein. This catastrophic problem is solved within the cell by the chaperones.

A chaperone is a protein that intercepts and suppresses the unproductive, nonspecific, irreversible intermolecular aggregation of a folding polypeptide so that it can fold intramolecularly to achieve its native state.360,361 There are two species of chaperones that are responsible for most of this suppression of aggregation and protection of proper folding. These two species are represented by chaperonin 60 (the product of the groEL gene) and heat shock protein 70 (the product of the dnaK gene), respectively, from E. coli. Chaperonins 60 have been purified from fungi,362 chloroplasts,363 mitochondria,364 and eukaryotic cytoplasm,365 and heat shock proteins 70 are also universally distributed. The most obvious distinction between these two different species of chaperones, other than their unrelated amino acid sequences, is that chaperonins 60 are all large oligomers of 14–16 subunits arranged with the respective symmetries of the point group 722 (D₇) or the point group 822 (D₈),366–368 while heat shock proteins 70 are monomers or dimers.

The chaperone about which the most is known is chaperonin 60 from E. coli. It is a homotetradecamer with symmetry of the point group 722 (D₇) enclosing a large central cavity (Figure 13–16)366,369–371 that is divided into two halves (upper and lower heptamers in Figure 13–16) at its middle by a thick septum formed from the irregular coalescence of the 14 carboxy-terminal segments, each 23 amino acids long, which are disordered in the crystallographic molecular model.372 Each of the 14 subunits in turn is divided into three domains (arranged one on top of the other and consequently difficult to distinguish in the view presented in Figure 13–16). The apical domains are symmetrically arrayed around the central 7-fold rotational axis of symmetry to form the entrances to the upper and lower cavities, the equatorial domains surround the central septum, and each of the intermediate domains connects its apical domain to its respective equatorial domain.

Chaperonin 60 prevents proteins that aggregate irreversibly during their folding in its absence from doing so in its presence.373,374 It accomplishes this task by recognizing and associating with intermediates in the pathway of folding that are prone to aggregation.375 By itself, chaperonin 60 forms a tight complex with such an intermediate.376,377 If this were the only capability of chaperonin 60, the formation of this tight complex would interrupt the folding of a protein and prevent its completion. Consequently, the bound intermediate must dissociate so that folding can proceed. If, however, the bound intermediate were to dissociate from chaperonin 60 unchanged, it would then proceed to aggregate as it was about to do just before it was bound. Consequently, the structure of the bound intermediate must be altered so that it dissociates in a form that is not prone to aggregation. The standard free energy necessary to promote the dissociation of the bound intermediate and to change its structure before it dissociates is provided by the binding and hydrolysis of MgATP.

Figure 13–16: Chaperonin 60 and chaperonin 10 from E. coli. The α-carbon skeleton of the tetradecamer of chaperonin 60 (upper structure) is drawn from the crystallographic molecular model of the protein in a complex with 14 molecules of MgKATP.369 The tetradecamer is viewed down the 7-fold rotational axis of symmetry. The seven identical subunits (naa = 547) in the upper heptamer are drawn with line segments of different widths. Those in the lower heptamer are drawn with gray lines. In the crystallographic molecular model, the carboxy-terminal 22 amino acids of each subunit are disordered, unresolved, and, consequently, absent from the model. They extend in the plane of the page from the 14 carboxy termini of the crystallographic molecular model protruding into the center of the central cavity and are thought to coalesce372 to form the thick equatorial septum observed in image reconstructions from electron micrographs370 that divides the central cavity into upper and lower halves. The seven 2-fold rotational axes of symmetry of the point group 722 (D₇) are in the plane of the page and run through the center of the drawing so the structures of the upper and lower cavities are identical to each other. The α-carbon skeleton of chaperonin 10 is drawn below that of chaperonin 60. It is drawn from the crystallographic molecular model of the complex between chaperonin 60 and chaperonin 10.371 Chaperonin 10, a symmetrical heptamer of seven identical subunits (naa = 97), is a cap that can sit upon the top of the chaperonin 60 with its 7-fold rotational axis of symmetry coincident with the 7-fold rotational axis of symmetry of the tetradecamer and seal the respective cavity. It is drawn so that a –135° flip around the x axis puts the cap on the upper cavity of the chaperonin 60.

The structure of the form of the folding protein recognized and bound by chaperonin 60 has not been clearly defined. In theory, chaperonin 60 should recognize only intermediates in the process of folding that are prone to aggregation and ignore intermediates that are not, but when it is added in the absence of MgATP to solutions of some proteins that are in the process of folding, their folding slows dramatically378,379 or ceases,380 as if almost all or all of the states of these proteins other than the native state are recognized and bound. With other proteins, the rate of their folding is unaffected or slightly accelerated upon the addition of chaperonin 60,381 but these are usually proteins that are not susceptible to aggregation during their folding. Chaperonin 60 also forms complexes when added to solutions of certain native proteins,382,383 presumably by recognizing denatured forms in equilibrium with the native state and shifting that equilibrium by sequestering those denatured forms.384 Site-directed mutants in which hydrophobic amino acids have been replaced with alanine or glycine are recognized and bound by chaperonin 60 less successfully during their folding,378 suggesting that it is exposed hydrophobic side chains that elicit the attention of the chaperone.
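
The suggestion that chaperonin 60 forms complexes with native proteins by sequestering the small population of denatured forms in equilibrium with the native state can be illustrated with a short numerical sketch. The two-state model, the function name, and every numerical value below are assumptions chosen only for illustration; none of them is taken from the experiments cited in the text.

```python
import math


def fraction_sequestered(k_unfold, k_dissoc, protein_total, chaperone_total):
    """Return the fraction of all protein held in the chaperone complex.

    Assumed model (illustrative only):
        N <=> D          k_unfold = [D]/[N]
        D + C <=> DC     k_dissoc = [D][C]/[DC]
    Concentrations and k_dissoc share one unit (for example, molar).
    """
    # Fraction of the unbound protein that is denatured at equilibrium.
    frac_denatured = k_unfold / (1.0 + k_unfold)
    # Only D binds, so the native state acts as a reservoir that weakens
    # the apparent affinity for the total pool of unbound protein.
    k_apparent = k_dissoc / frac_denatured
    # Mass balance gives x**2 - b*x + P*C = 0 for x = [DC]; the smaller
    # root is the physical one, written here in a cancellation-free form.
    b = protein_total + chaperone_total + k_apparent
    disc = math.sqrt(b * b - 4.0 * protein_total * chaperone_total)
    complex_conc = 2.0 * protein_total * chaperone_total / (b + disc)
    return complex_conc / protein_total


# Hypothetical values: a native state favored 10,000 to 1, a 1 nM complex,
# 1 uM protein, and 2 uM chaperone binding sites.
bound = fraction_sequestered(1e-4, 1e-9, 1e-6, 2e-6)
print(f"fraction of protein sequestered: {bound:.3f}")
```

With these assumed numbers, a substantial fraction of the protein ends up in the complex even though only about 0.01% of the unbound protein is denatured at any instant, which is the sense in which tight binding of a rare form can shift the equilibrium away from the native state.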
After it has been bound by chaperonin 60, a folding protein is usually in a molten globular state in the complex.385 This conclusion follows from the fact that the rates of the exchange of its amido protons are rapid.380,382,383,386,387 With some proteins, unlike what is observed in the usual molten globular state, the exposure of all amido protons is increased to the same extent,383,388 a result suggesting that the association of the folding protein with the chaperone has disrupted all secondary structure. With other proteins exposure of amido protons varies along the sequence,380 a result suggesting that some secondary structure is preferentially preserved in the bound form of the protein. With yet other proteins almost no difference in exchange rates between the bound form of the protein and the native state is observed,389 a result suggesting that the bound protein is similar in its structure to the native state. It has also been observed that a tridecapeptide that is structureless in solution is an α helix when bound to chaperonin 60.390

Because it is not known which conformation or set of conformations of a folding protein is recognized and bound by chaperonin 60, it is possible that the structure of that folding protein once it has been bound is identical to the structure that it had when it was recognized and bound, but it is also possible that the structure of the protein has been altered significantly during its binding to the chaperone to produce the conformation or set of conformations that are observed in the complex. This alteration in structure caused by the binding itself would be at least a portion of the alteration required so that a form of the protein that is not prone to aggregation can dissociate from the chaperone.
The folding protein is bound by one or more of the apical domains that surround the entrances to each of the two central cavities in chaperonin 60 (Figure 13–16).370,391 In some cases, the bound protein protrudes outward from the cavity;391 in other cases, it protrudes into the cavity and fills it.392,393 Whether the bound protein ends up within the cavity or protruding away from it is in part a function of its size;369,394 the larger proteins protrude outward because they do not fit within. A folding polypeptide larger than 600 aa is too large to fit in the cavity.395

When the apical domains of chaperonin 60 are detached genetically and expressed separately, they may396 retain the ability to recognize and bind forms of a folding protein that are prone to aggregation.397 A site in the crystallographic molecular model of a detached apical domain to which an extended hydrophobic segment of polypeptide has bound has been tentatively assigned as the site for binding of the folding protein to the apical domain.369,398

The next step in the rescue of a folding protein from aggregation performed by chaperonin 60 is its dissociation from this bound state. This dissociation requires the addition of MgATP. When MgATP is added to a complex of chaperonin 60 and denatured protein, it promotes the dissociation of that protein, permitting its folding to recommence.399–405 The dissociation of the denatured protein from the complex is more rapid than the hydrolysis of the MgATP400,404 and also occurs when analogues of ATP that cannot be hydrolyzed are used instead of ATP,376,402,403,406 so it is the binding of MgATP and not its hydrolysis that promotes the dissociation. The binding of MgATP to chaperonin 60 occurs at a site located in each equatorial domain,407 while the folding protein is bound to the apical domain. These two sites are 3.2 nm apart at their closest approach, so no direct interaction between them is possible. Consequently, there must be two global conformations of chaperonin 60 in equilibrium with each other, one of which binds denatured protein at the apical domain strongly and MgATP at the equatorial zone weakly and the other of which binds denatured protein weakly and MgATP strongly, and the binding of MgATP must shift the equilibrium between these two conformations in favor of the one that binds denatured protein weakly.244,403,404,407

In the conformation of chaperonin 60 that is stabilized by the binding of MgATP, an α helix in the apical domain that forms part of the site at which the folding protein is thought to bind398 is rotated by 102° relative to its orientation in the conformation of the protein that is the more stable in the absence of MgATP.369 This rotation sequesters a set of hydrophobic side chains and makes the site considerably less hydrophobic, perhaps promoting the dissociation of the folding protein.

The subsequent slow hydrolysis of the MgATP (0.06 s⁻¹ at 25 °C),408 at all seven of the sites in one half of the protein,409 serves to regenerate the conformation of the chaperone that can again recognize intermediates prone to aggregation. During the folding of a protein that requires significant assistance to avoid aggregation, several cycles of binding to chaperonin 60 and release are needed before the native state is achieved.404,410 Each cycle of binding and release requires that the MgATP be hydrolyzed to regenerate the competent conformation of the chaperone, so the presence of a folding protein in need of assistance elicits an ATPase activity from chaperonin 60.404

The dissociation of the folding protein from chaperonin 60 that is promoted by the binding of MgATP is accelerated by the addition of chaperonin 10.402 Chaperonin 10 is a homoheptamer of subunits 97 aa in length that can sit as a cap upon the seven apical domains of chaperonin 60 and seal off the respective cavity from the solution (Figure 13–16).371,393,411 Its addition is required for the maximum yield of the native state of some folding proteins prone to aggregation; with other proteins that are prone to aggregation, its addition increases the rate of their folding.

When the folding protein is too large to fit in the cavity, chaperonin 10 nevertheless increases the rate of folding or the yield of properly folded protein or both of these outcomes but does so by binding to the opposite end of chaperonin 60 from the end at which the folding protein was bound.394 When the folding protein is small enough, chaperonin 10 binds at the end to which it is associated and traps it within the respective cavity of chaperonin 60,408,412 where it is definitely protected from intermolecular aggregation because it is alone. In the presence of chaperonin 10, a trapped folding protein, although it has been dissociated from its binding site on chaperonin 60 by the binding of MgATP, remains associated with the complex of chaperonin 60 and chaperonin 10. Only after the MgATP has been hydrolyzed is the folding protein released from the complex along with the chaperonin 10.408,409,413,414 Because the hydrolysis of the MgATP is so slow, however, this means that the complex remains intact for, on the average, about 15 s, which is a long period of time in the folding of a protein and during which considerable progress towards the native state can be achieved.

The requirement that chaperonin 60 alter the structure of the folding protein and release it in a form not prone to aggregation seems to be accomplished both before and after MgATP is bound. The alterations in the structure of the folding protein before MgATP is bound have already been described, but it has been noted that the addition of MgATP and chaperonin 10 to the complex between a folding protein and chaperonin 60 further increases the rates of amido proton exchange of the folding protein,415 a result suggesting that significant structural alterations are made to the folding protein after the binding of MgATP but before it dissociates from its binding site on chaperonin 60.

The chaperones of the species of heat shock proteins 70, for which heat shock protein 70 from E. coli is the paradigm, also rescue folding proteins from aggregation by binding them and releasing them in associations and dissociations that are coupled to the binding and
hydrolysis of MgATP,416 but the details of the process are much less clear. It is the complex between heat shock protein 70 and MgATP that binds the folding protein in a rapidly reversible equilibrium, and upon hydrolysis of the MgATP, the folding protein is locked onto the chaperone.417–419 The binding of the next molecule of MgATP rapidly releases the bound protein into the solution, presumably in an altered conformation. In a crystallographic molecular model of the complex between heat shock protein 70 and a peptide thought to mimic the bound folding protein, the peptide is bound in an unstructured extended conformation.420 Heat shock protein 40 increases the rate at which the ATPase recycles heat shock protein 70 among its various conformations,419 as does the chaperonin 10 with chaperonin 60. There is, however, no cavity in any of these proteins similar to that in the tetradecamer of chaperonin 60.

As a polypeptide folds to its native state in the cytoplasm of a cell, the high concentration of reduced glutathione (3–5), or some other mercaptan with the same function,421 prevents its cysteines from forming adventitious cystines, and for the same reason, the cysteines of the native protein remain reduced throughout its lifetime. Most of the proteins, however, that are excreted from a cell into the extracellular spaces contain cystines. In a eukaryotic cell, these cystines are formed as these soon-to-be-excreted proteins fold within the lumen of the endoplasmic reticulum. In the lumen of the endoplasmic reticulum, the ratio of oxidized glutathione to reduced glutathione is much higher (0.5)422 than it is in the cytoplasm (0.02). The native conformation of the protein juxtaposes the two cysteines that will form a correctly paired cystine.423 This juxtaposition increases the equilibrium constant for the formation of that cystine dramatically,424 so the ambient ratio of oxidized to reduced glutathione in the endoplasmic reticulum is sufficient to form a cystine from the two cysteines once they have been juxtaposed:423,425

$$\text{–}\overset{\mathrm{HS\quad SH}}{\mathrm{C_A\quad C_B}}\text{–} \;+\; \mathrm{GSSG} \;\rightleftharpoons\; \text{–}\overset{\mathrm{S{-}S}}{\mathrm{C_A\quad C_B}}\text{–} \;+\; 2\,\mathrm{GSH} \qquad (13\text{–}31)$$

where $\mathrm{C_A}$ and $\mathrm{C_B}$ are the two cysteines juxtaposed, GSSG [...]

[...] centration in the lumen of the endoplasmic reticulum.425 This enzyme fulfills two roles in the formation of correct cystines during the folding of a protein. It breaks cystines to reverse their incorrect formation, and it oxidatively couples pairs of adjacent cysteines to form cystines.427 Consequently, it catalyzes the rapid rearrangement of cystines in a folding protein until the correct partners are joined to produce the native state of the protein. In fact, there are some proteins, for example mammalian pancreatic human insulin-like growth factor428 and bovine ribonuclease A,61 that cannot fold stably unless all of their cystines have formed correctly, so their folding is required to proceed in tandem with the rearrangement of their cystines by protein disulfide-isomerase until the combination of the correct tertiary structure and the correct pairing of cysteines as cystines is reached. It is both the packing of the tertiary structure and the properly paired cystines that create their native states.

Protein disulfide-isomerase (490 aa) contains two domains (amino acids 5–100 and 350–440) that are homologous to each other and to thioredoxin (110 aa).429 As is the case with thioredoxin, each domain contains a pair of cysteines in the sequence –VEFYAPWCGHCK–. It is these pairs of cysteines found in protein disulfide-isomerase that are responsible for the catalysis of the rearrangement and formation of cystines in a folding protein. Two shorter proteins, thiol:disulfide interchange protein DsbA (218 aa) and thiol:disulfide interchange protein DsbC (216 aa), each with only one pair of cysteines in the sequences –LEFFSFFCPHCY– and –TVFTDITCGYCH–, respectively, fulfill the roles of protein disulfide-isomerase for proteins excreted from bacteria. The former can form cystines, but the latter is the enzyme responsible for the shuffling of incorrectly formed cystines.430–433

The two domains in each molecule of protein disulfide-isomerase, which each contain an identical pair of cysteines, are similar in their catalytic abilities and act independently.425 The second cysteine in each of these pairs of cysteines is responsible for breaking and rearranging cystines by disulfide interchange during the folding of a protein:

$$\overset{\mathrm{HS\quad SH}}{\text{–CGHC–}} \;+\; \text{–}\overset{\mathrm{S{-}S}}{\mathrm{C_A\quad C_B}}\text{–} \;\rightleftharpoons\; \overset{\mathrm{HS\quad S{-}S\quad SH}}{\text{–CGHC–}\;\;\text{–}\mathrm{C_A\quad C_B}\text{–}} \;\rightleftharpoons\; \cdots \qquad (13\text{–}32)$$
[...] merase by disulfide interchange with oxidized glutathione

$$\overset{\mathrm{HS\quad SH}}{\text{–CGHC–}} \;+\; \mathrm{GSSG} \;\rightleftharpoons\; \overset{\mathrm{S{-}S}}{\text{–CGHC–}} \;+\; 2\,\mathrm{GSH} \qquad (13\text{–}33)$$

is able to convert a pair of juxtaposed cysteines in a folding protein into a cystine (reverse of the reactions on the lower right and top of Equation 13–32).425 Protein disulfide-isomerase also catalyzes the formation of a mixed disulfide between glutathione and a cysteine on a folding protein,434 which is also a central intermediate435 in the formation of cystines by oxidized glutathione (Equation 13–31).425

Because protein disulfide-isomerase contains one pair of cysteines in each of its two domains, and because most folding proteins have enough cysteines to give rise to two or more cystines while they are folding, at the proper molar ratios protein disulfide-isomerase and a folding protein form a precipitate,436 just as an antigen and a set of polyclonal immunoglobulins form a precipitate at equivalence. The existence of this precipitate serves to demonstrate the importance of the mixed disulfide in Equation 13–32 in the reactions catalyzed by protein disulfide-isomerase.

In order to participate in any of these reactions, in particular the formation of the mixed disulfide between it and the folding protein, protein disulfide-isomerase must be able to find the cystine or the cysteine on the folding protein. This necessity requires both that the folding protein be molten enough to expose unpaired cysteines or mispaired cystines on its surface and that the reaction between protein disulfide-isomerase and those exposed cysteines and cystines be rapid and efficient.61,423 In a role that may be connected to this search for incorrect cystines, protein disulfide-isomerase also acts as a chaperone.437 Once the proper cystines are formed, however, they can be buried without penalty. It is probably the burial of the correctly paired cystines within the proper native structure that terminates the rearrangements of the cystines catalyzed by protein disulfide-isomerase. The same problem of accessibility is faced by peptidylprolyl isomerase, and it is interesting that protein disulfide-isomerase and peptidylprolyl isomerase function synergistically to catalyze the folding of a protein.438

The histidine between the two cysteines in each domain of protein disulfide-isomerase, as opposed to a proline at the homologous position in thioredoxin, causes the cystine in the former protein to have a reduction potential 30–40 mV more positive than that in the latter.439,440 This higher reduction potential causes a cystine in protein disulfide-isomerase to be more effective at forming cystines in folding proteins by disulfide interchange than a cystine in thioredoxin would be. The homologous cystine in thiol:disulfide interchange pro-

In order to perform both the rearrangement of cystines and their formation most efficiently during its catalysis of the folding of a protein (Equation 13–32), protein disulfide-isomerase must be poised at a level of reduction potential where only a portion of its cysteines are cystines, and in the laboratory, the optimal potential for the catalysis of the folding of a protein is reached at a ratio of oxidized glutathione to reduced glutathione of 0.2–0.5.427 This ratio is not significantly different from the ratio found in the endoplasmic reticulum.

The proteins the foldings of which have been discussed so far have been fairly small and in each case the entire protein has folded as a unit to produce the native state. The folding of a larger protein is usually complicated not only by the larger number of prolines it contains but also by the fact that larger proteins usually contain domains, which often fold independently of each other. For example, lysozyme, itself not a large protein, nevertheless contains two structural domains. On the basis of measurements of amido proton exchange, it was demonstrated that one of these domains folds more rapidly than the other,441 and the completion of the foldings of the two domains is followed by a step in which they become properly oriented and associate correctly with each other to form the native state.349 The rates of the final slow steps in the foldings of both aspartate kinase–homoserine dehydrogenase from E. coli359 and D-octopine dehydrogenase from Pecten jacobaeus442 are inversely proportional to the viscosity of the solvent and are thought to represent the association of independently folding domains. The transfer of energy by resonance between donors and acceptors positioned at several locations on phosphoglycerate kinase from S. cerevisiae has been used to follow the changes in the distances between these locations during the unfolding of the protein.443 To the extent that unfolding is the reverse of folding, the fact that the first step in the unfolding of the protein is the dissociation of its two domains suggests that the last step in its folding is the association of these two domains.

The results of these experiments suggest that, in larger proteins, the individual domains fold independently as if they were the small unitary proteins that have been discussed in detail until there is sufficient structure developed for them to recognize each other and associate. Following their association, the information developed during this association may or may not dictate further folding to reach the final native state depending on the intimacy and interdependence of their interaction.

Suggested Reading

Jennings, P.A., & Wright, P.E. (1993) Formation of a molten globule intermediate early in the kinetic folding pathway of apomyoglobin, Science 262, 892–896.
Raschke, T.M., & Marqusee, S. (1997) The kinetic folding intermediate of ribonuclease H resembles the acid molten globule and partially unfolded molecules detected under native conditions,
tein DsbA from E. coli is also exceptionally reactive.431 Nat. Struct. Biol. 4, 298–304.
710 Folding and Assembly
Mayr, L.M., Odefey, C., Schutkowski, M., & Schmid, F.X. (1996) Kinetic analysis of the unfolding and refolding of ribonuclease T1 by a stopped-flow double-mixing technique, Biochemistry 35, 5550–5561.

Problem 13–5: The figure displays the behavior of the observed rate constants, kobs, in units of seconds⁻¹ for folding and unfolding of human lysozyme. The rate constants for folding were obtained by rapidly diluting the unfolded polypeptide from a solution of 4.5 M guanidinium chloride to the noted final concentration; and those for unfolding, by rapidly mixing the native protein with a solution of guanidinium chloride to produce the noted final concentration.

[Figure for Problem 13–5: kobs (s⁻¹) on a logarithmic axis.]

intermediate states in the absence of guanidinium chloride.

(H) Is the isomerization of peptide bonds amino-terminal to prolines involved in the folding of human lysozyme? How did you decide?

(I) How does the value of kU change as the concentration of guanidinium is increased up to 5 M?

(J) In the transition region in which the equilibrium constant for folding can be measured, what are the relative values of the rate constants kF and kU?

Problem 13–6: Using the notation of Equations 13–31 through 13–33, write a set of equations that describes the possible ways that protein disulfide-isomerase could catalyze the formation of the mixed disulfide between glutathione and a cysteine on a folding protein.
Assembly of Oligomeric Proteins
When oligomeric proteins are dissolved in solutions of

Figure 13–17: Assembly of yeast phosphoglycerate mutase following dilution from 4 to 0.1 M guanidinium chloride.445 (A) Far-ultraviolet circular dichroic spectra of native enzyme, native enzyme in 0.1 M guanidinium chloride, and enzyme in 4 M guanidinium chloride; all measurements were made at a concentration of protein of 1.7 mg mL⁻¹. Molar ellipticity ([θ]) in units of degree centimeter² (decimole of peptide bonds)⁻¹ is presented as a function of wavelength (nanometers). (B) Regain of molar ellipticity at 225 nm upon dilution from 4 to 0.1 M guanidinium chloride. Ellipticity is presented as a function of time (minutes) in relative units where 0% is the molar ellipticity of fully unfolded protein and 100% is the molar ellipticity of the native protein (see panel A). (C–E) Assembly of the oligomer. At the noted times after initiation of folding by dilution from 4 to 0.1 M guanidinium chloride, samples were removed and cross-linked quantitatively with 1% glutaraldehyde for 2 min, and the complexes between the resulting covalent oligomers and dodecyl sulfate were submitted to electrophoresis on gels of polyacrylamide in the presence of dodecyl sulfate. The amounts of monomer, dimer, and tetramer were assessed by scanning the stained gels for absorbance. The relative amounts of monomer, dimer, and tetramer (as a percentage of the sum of the three amounts) are plotted as a function of the time (hours) between dilution of the guanidinium chloride and the addition of the glutaraldehyde. The final concentrations of protein were (C) 11, (D) 21, and (E) 37 mg mL⁻¹. The solid curves were drawn in all three panels with integrated rate equations based on Equation 13–34 with k₁ = 6.25 × 10³ M⁻¹ s⁻¹, k₋₁ = 6.0 × 10⁻³ s⁻¹, and k₂ = 2.75 × 10⁴ M⁻¹ s⁻¹. The temperature for all of these experiments was 20 °C, and they were run at pH 7.5. Reprinted with permission from ref 445. Copyright 1983 Journal of Biological Chemistry.
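The caption of Figure 13–17 says that its solid curves came from integrated rate equations with the three rate constants listed. A minimal numerical sketch of that kind of calculation is given below. Equation 13–34 is not reproduced in this section, so the scheme used here (two monomers in reversible equilibrium with a dimer, followed by irreversible association of two dimers into a tetramer) is an assumption inferred from the surrounding text, as are the factor-of-two conventions in the rate laws, the explicit Euler integration, and the function name `simulate`. The starting concentrations in the usage note are hypothetical.

```python
# Rate constants quoted in the caption of Figure 13-17.
K1 = 6.25e3         # M^-1 s^-1, monomer + monomer -> dimer
K_MINUS_1 = 6.0e-3  # s^-1, dimer -> two monomers
K2 = 2.75e4         # M^-1 s^-1, dimer + dimer -> tetramer


def simulate(m0, t_end, dt=1.0):
    """Integrate the ASSUMED scheme 2 M <=> D, 2 D -> T by explicit Euler steps.

    m0    : initial molar concentration of monomer (hypothetical input)
    t_end : time in seconds at which the composition is reported
    Returns the fractions of polypeptide present as monomer, dimer, and
    tetramer, which correspond to the relative amounts plotted in panels C-E.
    """
    m, d, tet = m0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        v1 = K1 * m * m - K_MINUS_1 * d  # net rate of dimer formation from monomer
        v2 = K2 * d * d                  # rate of tetramer formation from dimer
        m += -2.0 * v1 * dt              # two monomers are consumed per dimer
        d += (v1 - 2.0 * v2) * dt        # two dimers are consumed per tetramer
        tet += v2 * dt
    return m / m0, 2.0 * d / m0, 4.0 * tet / m0
```

Calling `simulate(4e-7, 3600.0)` returns the three fractions one hour after dilution for a hypothetical 0.4 µM solution of monomer. Repeating the call at several total concentrations shows the concentration dependence expected when bimolecular steps control the appearance of tetramer.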
with full enzymatic activity at these concentrations of protein (<50 mg mL⁻¹).446 In this oligomerization, the association of two dimers to form the tetramer is the rate-limiting step in the reaction (Equation 13–34), but in the oligomerization of the tetramerization domain (naa = 30) of human cellular tumor antigen p53, it is the association of two monomers to form a dimer that is the rate-limiting step, and the association of the two dimers to form the tetramer is so rapid that it is kinetically silent.447

The assembly of a dimer from its dissociated random coils is an even simpler reaction. Porcine mitochondrial malate dehydrogenase is an α₂ dimer that can be reversibly unfolded in several different ways. After random coils, αU, of the α polypeptide are transferred to a solution at neutral pH, coincident with the dilution of the denaturant, the reappearance of enzymatic activity shows the same time course regardless of the mode of the original denaturation (Figure 13–18).448 The time course displays two phases, a lag followed by an increase. The increase in activity has a second-order dependence on the concentration of protein. The lag is unaffected by the concentration of protein and is a first-order process. The results can be explained quantitatively with the following mechanism:

$$2\,\alpha_{\mathrm{U}} \xrightarrow{k_{\mathrm{F}}} 2\,\alpha_{\mathrm{F}} \xrightarrow{k_{2}} (\alpha_{\mathrm{F}})_{2} \tag{13–35}$$

if only the dimer, (αF)₂, and not the folded monomer, αF, is enzymatically active. At pH 7.6 and 20 °C, kF = 0.0006 s⁻¹ and k₂ = 30,000 M⁻¹ s⁻¹. The value of kF is too small to be the rate constant for the rapid refolding of a polypeptide with the correct proline isomers to form a structure with no domains. The rate is probably slow because isomerizations of peptide bonds amino-terminal to prolines are required or because the two structural domains of the monomer observed in the crystallographic molecular model of the protein449 associate slowly or because both of these problems must be overcome before the monomer has regained sufficient native structure to recognize another monomer and dimerize with it.

In the case of the reassembly of the dimer of aspartate transaminase from E. coli, there are two consecutive, slow, first-order, unimolecular steps which produce a molten globular monomer that is enzymatically inactive but that has sufficient native structure to dimerize. This monomer displays a circular dichroic spectrum in the far ultraviolet similar to that of the native protein. Its dimerization and the formation of the final enzymatically active native state are rapid, and because these later steps follow the slow rate-limiting formation of the molten globular monomer, they are kinetically silent.450 In the folding and assembly of the dimer of steroid Δ-isomerase from Pseudomonas testosteroni, however, all three steps, the unimolecular formation of an enzymatically inactive monomeric intermediate (60 s⁻¹ at 25 °C), its bimolecular association (60,000 M⁻¹ s⁻¹ at 25 °C), and the formation of the final enzymatically active native structure (0.017 s⁻¹ at 25 °C) could be resolved kinetically.451 These experiments demonstrate that a monomeric intermediate does not have to assume its
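The mechanism of Equation 13–35 accounts for a lag followed by a concentration-dependent rise in activity, and a short numerical sketch can make that behavior concrete. The code below uses the two rate constants quoted in the text for malate dehydrogenase. The function name `fold_and_dimerize`, the explicit Euler integration, the factor-of-two convention in the dimerization rate law, and the starting concentrations in the usage note are all assumptions made for illustration.

```python
# Rate constants quoted in the text for Equation 13-35 (pH 7.6, 20 degrees C).
K_F = 6.0e-4  # s^-1, first-order folding of the random coil
K_2 = 3.0e4   # M^-1 s^-1, second-order dimerization of folded monomers


def fold_and_dimerize(u0, t_end, dt=1.0):
    """Integrate unfolded -> folded monomer -> dimer by explicit Euler steps.

    u0    : initial molar concentration of random coil (hypothetical input)
    t_end : time in seconds at which the composition is reported
    Returns the fractions of polypeptide present as random coil, folded
    monomer, and dimer. Only the dimer is taken to be enzymatically active,
    so the last fraction stands in for the recovered activity.
    """
    unfolded, folded, dimer = u0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        v_fold = K_F * unfolded          # unimolecular step, independent of concentration
        v_dim = K_2 * folded * folded    # bimolecular step, rate of dimer formation
        unfolded -= v_fold * dt
        folded += (v_fold - 2.0 * v_dim) * dt  # two monomers are consumed per dimer
        dimer += v_dim * dt
    return unfolded / u0, folded / u0, 2.0 * dimer / u0
```

Comparing the dimer fraction from `fold_and_dimerize(1e-7, 100.0)` with that from `fold_and_dimerize(1e-7, 200.0)` shows the lag: doubling the time raises the yield by much more than a factor of two, because folded monomer must accumulate before dimer can form. Raising the hypothetical starting concentration increases the yield at a fixed time, as expected for a second-order step.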
an unstable monomeric state that is competent to dimerize. This unstable monomer is then captured and stabilized by the favorable dimerization.

The assembly of a trimer (Figure 9–11) is somewhat more complicated than that of either a tetramer or a dimer because the addition of the third subunit to the dimer is quite different from the initial combination of two monomers to form the dimer. The catalytic subunit of aspartate carbamoyltransferase (Figure 9–37) is an α₃ trimer. The assembly of trimers of the catalytic subunit from random coils of the α polypeptide is a first-order process with no evident intermediates and a rate constant of 2 × 10⁻⁴ s⁻¹ at 0 °C.454 It seems that again a slow isomerization of the partially folded, monomeric α polypeptide is the rate-limiting step in the assembly from random coils. To circumvent the barrier presented by this isomerization to the kinetic observation of intermediates in the process, native α₃ trimer was dissociated into globular rather than unfolded α monomers with thiocyanate ion (S=C=N⁻),455 which is a milder denaturant than either urea or guanidinium ion. It is an anion that salts in protein as does urea or guanidinium but not so vigorously. The enzymatically inactive α monomers that result retain most of the circular dichroic ellipticity and ultraviolet absorption of the native α₃ trimers and have a frictional ratio f/f₀ of 1.27.455 These globular α monomers assemble readily to form α₃ trimers after the dilution of the thiocyanate.

When the assembly was followed by quantitative cross-linking, α monomers turned directly into α₃ trimers with no evidence for the formation of any α₂ dimers (<3%). The appearance of α₃ trimers was coincident with the return of enzymatic activity. Both of these processes, however, were strictly second-order in the concentration of α monomer:455

$$\frac{d[\alpha_{3}]}{dt} = k_{\mathrm{obs}}[\alpha]^{2} \tag{13–36}$$

A mechanism consistent with both of these results is

$$3\,\alpha \xrightarrow{k_{1}} \alpha + \alpha_{2} \xrightarrow{k_{2}} \alpha_{3} \tag{13–37}$$

where k₂ >> k₁. When the third monomer adds to the dimer, two interfaces form simultaneously, and this reaction could have a much lower standard free energy of activation than the formation of the dimer itself. Because the second step in Equation 13–37 is so much faster than the first, no α₂ dimer accumulates. The first step in Equation 13–37, however, a bimolecular reaction, is the rate-limiting step.

When homooligomeric proteins are assembled from random coils, the observations are consistent with the first step in the process being the folding of the random coil to a globular structure. In many instances, this structure is loosely folded. For example, it may be sensitive to endopeptidolytic cleavage. This globular monomer either combines directly with other globular monomers to form the oligomer or it undergoes one or more isomerizations before it is competent to assemble. These isomerizations may be isomerizations of peptide bonds amino-terminal to critical prolines or rearrangements of domains, but these possibilities have not been validated. The competent monomers then assemble in simple, reasonable bimolecular steps to form the enzymatically active oligomer. When the reactions can be observed, the rate constants measured for these bimolecular steps are between 10⁴ and 10⁶ M⁻¹ s⁻¹ at 25 °C,445,448,455 several orders of magnitude below diffusion-controlled rates for the collision of molecules of this size. Therefore, they proceed with significant standard free energies of activation.

Whether or not enzymatic activity is displayed by the various compact intermediates in this process seems to be a property of the individual protein. Both α monomer and α₂ dimer of phosphoglycerate mutase have enzymatic activity.446 Fumarate hydratase, an (α₂)₂ tetramer, can be denatured to random coils and reassembled to an α₂ dimer. The α₂ dimer is enzymatically inactive until it assembles to the (α₂)₂ tetramer.456 Porphobilinogen synthase from Pisum sativum, an [(α₂)₂]₂ octamer, can be disassembled to α₂ dimers by dilution. Only the octamer and the (α₂)₂ tetramer are enzymatically active.457 The single active site of HIV-1 retropepsin from human immunodeficiency virus type 1 is formed from both subunits of the α₂ dimer, so it is not surprising that the α monomer has no enzymatic activity.458 When fructose-bisphosphate aldolase, an (α₂)₂ tetramer, is denatured to random coils that are then transferred to a solution at pH 5.5, the random coils fold to form α monomers that have the sedimentation coefficient of a globular protein of their length and the circular dichroic spectra and ultraviolet spectra of the native protein. Their enzymatic activity cannot be determined because these α monomers oligomerize too rapidly to (α₂)₂ tetramers when mixed with substrates,459 but they must bind those substrates for their assembly to be affected by them.

The assembly of heterooligomers constructed from several copies of each of two or more different polypeptides is somewhat more complex than that of homooligomers. When the assembly of a heterooligomer is studied, the reactants employed are the globular, homooligomeric subunits, such as catalytic subunits and regulatory subunits of aspartate carbamoyltransferase (Figure 9–37). It is generally assumed, in the absence of any evidence, that under physiological circumstances the homooligomeric subunits assemble first and then combine to form the heterooligomer. Only those proteins formed from two or more polypeptides translated from different messenger RNAs are of interest. Proteins containing different polypeptides arising from the post-
translational cleavage of identical larger polypeptides fold and assemble as simple homooligomers before the posttranslational modification occurs.

Alkanal monooxygenase (FMN-linked) from Vibrio harveyi is an αβ heterodimer in which the α subunit (naa = 355) and the β subunit (naa = 324) are homologous and superposable, but neither of these subunits will form a homodimer. When they are expressed separately, each is a globular monomer containing about 50–60% of the α helix of a monomer in the native heterodimer but no discernible tertiary structure as judged from their nuclear magnetic resonance spectra. These properties are consistent with these two monomers being molten globules. Nevertheless, when they are mixed together, these α monomers and β monomers heterodimerize and form native alkanal monooxygenase.460 Either the two molten globular forms dimerize and then assume their native states while in the dimer, or the fully native states of the α monomer and the β monomer are present in the separate solutions of each at indetectably low equilibrium concentrations, and it is these forms that dimerize while the dimerization pulls the equilibria in the direction of the native states.

Tryptophan synthase from E. coli is another simple example of the assembly of a heterooligomer.461 This protein is an αβ₂α heterotetramer.462 When it is dissociated into its components, the products are α monomers and β₂ dimers, and both can be obtained in globular, folded states. When α monomer is mixed with excess β₂ dimer, the major product is the complex αβ₂*. It forms in a reaction the kinetics of which are consistent with the mechanism

$$\alpha + \beta_{2} \underset{k_{\mathrm{D}}}{\overset{k_{\mathrm{R}}}{\rightleftharpoons}} \alpha\beta_{2} \underset{k_{\mathrm{b}}}{\overset{k_{\mathrm{f}}}{\rightleftharpoons}} \alpha\beta_{2}^{*} \tag{13–38}$$

where the complex αβ₂* is an isomerized form of the initial intermediate αβ₂. The rate constants kR, kD, kf, and kb for this process at 25 °C are 1 × 10⁶ M⁻¹ s⁻¹, 3 s⁻¹, 6 s⁻¹, and 0.001 s⁻¹, respectively. When excess α monomer is then added to the αβ₂* complex, the next step in the assembly has kinetic behavior consistent with the mechanism

$$\alpha + \alpha\beta_{2}^{*} \underset{k_{\mathrm{D}}'}{\overset{k_{\mathrm{R}}'}{\rightleftharpoons}} \alpha_{2}\beta_{2}^{*} \underset{k_{\mathrm{b}}'}{\overset{k_{\mathrm{f}}'}{\rightleftharpoons}} \alpha_{2}\beta_{2}^{**} \tag{13–39}$$

and the rate constants kR′, kD′, kf′, and kb′ for this process at 25 °C are 1.6 × 10⁶ M⁻¹ s⁻¹, 26 s⁻¹, 16 s⁻¹, and 0.002 s⁻¹, respectively.

Each time an interface forms between an α monomer and one of the two β monomers in the β₂ dimer of tryptophan synthase, an isomerization of the structure of either the participating β monomer or the conjoined α monomer, or both, occurs, producing the asterisked conformer. The isomerizations producing the conformers αβ₂* or αβ₂** are too rapid to be isomerizations of prolines. They presumably represent rearrangements of the structures after the association of the α and β subunits and are similar to the changes that permit the tetramer of phosphoglycerate mutase to resist endopeptidolytic degradation or that permit enzymatically inactive subunits to regain full enzymatic activity after oligomers such as malate dehydrogenase, aspartate transaminase, and fumarate hydratase reach the native stoichiometry. The equilibrium constant (Kis = kf/kb) for the isomerization following the addition of the first α monomer (Kis = 6000) is nearly the same as that following the addition of the second α monomer (Kis′ = 8000), indicating that the same local adjustments are occurring after each α monomer adds in turn.

Aspartate carbamoyltransferase is constructed from two catalytic C subunits that are α₃ trimers and three regulatory R subunits that are β₂ dimers. From its crystallographic molecular model (Figure 9–37), it is clear that only certain steps are possible in its assembly from separated C subunits and R subunits (Figure 13–19).463 The intermediates that appear during the assembly of the intact (α₃)₂(β₂)₃ heterododecamer (C₂R₃ heteropentamer) have been followed by quenching the assembly of radioactive catalytic or regulatory subunits and their unradioactive complements with large excesses of unradioactive catalytic or succinylated catalytic subunits, respectively.463 The specific radioactivity of the various mosaic oligomers, which, because of the excess negative

Figure 13–19: Intermediates in the assembly of aspartate carbamoyltransferase.463 Catalytic (C) or regulatory (R) subunits that had been made radioactive by iodinating their tyrosines with ¹²⁵I (Reaction 10–33) were mixed with excess unradioactive R or C subunits, respectively, to initiate assembly. At various times, the assembly reaction was quenched with either excess C subunit or excess succinylated C subunit to cap off partially formed complexes and scavenge all unreacted R subunit. From examining the specific radioactivity of complexes separated by electrophoresis, an estimate of the relative concentrations of all of the intermediates in the process of assembly at each time point could be made. The changes in these relative concentrations with time were used to formulate the assembly diagram displayed in the figure. Four of the steps in this process are rapidly reversible: C + R ⇌ CR, CR + R ⇌ CR₂, CR₂ + R ⇌ CR₃, and CR + C ⇌ C₂R. In contrast, processes forming the complexes C₂R₂ and C₂R₃ are essentially irreversible because these complexes are so stable. Reprinted with permission from ref 463. Copyright 1980 Journal of Biological Chemistry.
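The equilibrium constants quoted for tryptophan synthase follow directly from the rate constants listed with Equations 13–38 and 13–39, and a few lines of arithmetic make the connection explicit. The sketch below computes Kis = kf/kb for each step. As an additional illustration that is not stated in the text, it also computes the apparent dissociation constant for a binding step followed by an isomerization, Kd,app = (kD/kR)/(1 + Kis), which is the standard result for a two-step scheme of this form. The function names are hypothetical.

```python
def isomerization_constant(k_f, k_b):
    """Equilibrium constant K_is = k_f / k_b for the isomerization step."""
    return k_f / k_b


def apparent_dissociation_constant(k_r, k_d, k_f, k_b):
    """Apparent dissociation constant (M) for binding followed by isomerization.

    k_r : association rate constant (M^-1 s^-1)
    k_d : dissociation rate constant (s^-1)
    Both bound forms, before and after isomerization, count as bound, so the
    dissociation constant of the first step is divided by (1 + K_is).
    """
    first_step = k_d / k_r
    return first_step / (1.0 + isomerization_constant(k_f, k_b))


# Rate constants from the text at 25 degrees C, in the order k_R, k_D, k_f, k_b.
FIRST_ALPHA = (1.0e6, 3.0, 6.0, 0.001)    # Equation 13-38
SECOND_ALPHA = (1.6e6, 26.0, 16.0, 0.002)  # Equation 13-39

K_IS_FIRST = isomerization_constant(FIRST_ALPHA[2], FIRST_ALPHA[3])
K_IS_SECOND = isomerization_constant(SECOND_ALPHA[2], SECOND_ALPHA[3])
KD_APP_FIRST = apparent_dissociation_constant(*FIRST_ALPHA)
KD_APP_SECOND = apparent_dissociation_constant(*SECOND_ALPHA)
```

The two isomerization constants reproduce the values of 6000 and 8000 given in the text. The apparent dissociation constants show how strongly the isomerization tightens each complex relative to the initial encounter.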
charge (Equation 10–27) on the succinylated C subunits, can be separated from each other by electrophoresis, permits the concentrations of the various intermediates at the time the reaction was quenched to be calculated.

When a limiting concentration of C subunit is mixed with various excesses of R subunit, equilibrium mixtures of the intermediates CR, CR₂, and CR₃ are formed. Subsequent addition of excess C subunit causes CR₃ to be trapped as CR₃C, the intact native protein, and CR₂ to be trapped as CR₂C.463 When excess C subunit is mixed with a limiting concentration of R subunit, the only two products other than unreacted C subunit are CR₃C and CR₂C, with the former in the majority.464,465 The complex CR₂C can be isolated as a stable protein. When it is combined with R subunit, it produces CR₃C in a clean bimolecular reaction.464 In these experiments, most of the intermediates in the general scheme (Figure 13–19) have been directly observed, and rate constants and equilibrium constants for their interconversion have been established.466 Most of the steps in the scheme seem to occur simultaneously, and different pathways become more or less important as concentrations of the subunits are changed.

The pyruvate dehydrogenase complex of E. coli is composed of three different polypeptide chains, α, β, and γ. The protein can be resolved into these three independent components. These are the dihydrolipoyllysine-residue acetyltransferase core, the pyruvate dehydrogenase (acetyl-transferring) subunits, and the dihydrolipoyl dehydrogenase subunits. The dihydrolipoyllysine-residue acetyltransferase core is an octahedral α₂₄ oligomer (Figure 9–23), pyruvate dehydrogenase (acetyl-transferring) is a β₂ dimer, and dihydrolipoyl dehydrogenase is a γ₂ dimer. No association can be detected between the β₂ dimers of pyruvate dehydrogenase (acetyl-transferring) and the γ₂ dimers of dihydrolipoyl dehydrogenase.467 Therefore, the α₂₄ oligomer of the dihydrolipoyllysine-residue acetyltransferase serves as the point of attachment of the other components.

Unlike the closely related dihydrolipoyllysine-residue succinyltransferase from the 2-oxoglutarate dehydrogenase complex, which can associate with only six β₂ dimers of oxoglutarate dehydrogenase (succinyl-transferring) at saturation because of a steric effect,468 the empty α₂₄ oligomer of the dihydrolipoyllysine-residue acetyltransferase from E. coli can associate with up to 24 β₂ dimers of pyruvate dehydrogenase (acetyl-transferring).467,469 Presumably, in the saturated complex, one of the two faces on each of the 24 β₂ dimers occupies one of the 24 equivalent faces of the octahedral α₂₄ oligomer with no steric hindrance. The empty α₂₄ oligomer of the dihydrolipoyllysine-residue acetyltransferase can also associate with as many as 20 γ₂ dimers of dihydrolipoyl dehydrogenase in the absence of pyruvate dehydrogenase (acetyl-transferring).469

When both β₂ dimers of pyruvate dehydrogenase (acetyl-transferring) and γ₂ dimers of dihydrolipoyl dehydrogenase are added together to the dihydrolipoyllysine-residue acetyltransferase core, substoichiometric amounts of each are bound,469 presumably because of steric crowding. Certainly the native protein, which has an average of about 12 γ₂ dimers of dihydrolipoyl dehydrogenase and an average of somewhat less than 24 β₂ dimers of pyruvate dehydrogenase (acetyl-transferring),467 appears to be a crowded structure (Figure 13–20).470 When a preformed complex containing an average of 12 γ₂ dimers of dihydrolipoyl dehydrogenase for each α₂₄ oligomer of dihydrolipoyllysine-residue acetyltransferase is mixed with increasing amounts of pyruvate dehydrogenase (acetyl-transferring), about 22 β₂ dimers of pyruvate dehydrogenase (acetyl-transferring) bind to the α₂₄ oligomers at saturation, and the overall enzymatic activity increases in direct proportion to the number bound.467 All of these results suggest that β₂ dimers of pyruvate dehydrogenase (acetyl-transferring) and γ₂ dimers of dihydrolipoyl dehydrogenase add at random to the respective faces on the α₂₄ oligomer of dihydrolipoyllysine-residue acetyltransferase, at least under the circumstances of these experiments, until there is no more room left around the core. What is not clear is whether the dimers of dihydrolipoyl dehydrogenase and pyruvate dehydrogenase (acetyl-transferring) add at random to the core during normal assembly within the cell until no more can fit or there is some ordered sequence that determines the final stoichiometry.

Figure 13–20: Electron micrographs of (A) the pyruvate dehydrogenase complex from E. coli and (B) the core of dihydrolipoyllysine-residue acetyltransferase from the same protein.470 Both specimens were adsorbed onto a thin, supported layer of amorphous carbon on an electron microscopic grid and negatively stained with sodium methylphosphotungstate. Magnification is 300000×. The complete complex was purified directly from a homogenate of the bacteria; the acetyltransferase core was prepared from the complete complex by stripping away dihydrolipoyl dehydrogenase and pyruvate dehydrogenase (acetyl-transferring). Reprinted with permission from ref 470. Copyright 1971 Cold Spring Harbor Laboratory.

The 30S subunit of a ribosome from E. coli (Figure 11–5) is composed of a single strand of 16S ribosomal RNA (nnuc = 1541)471 and 21 polypeptides that, when
folded and assembled, constitute its 21 subunits. When the ribosomal RNA and the separated individual polypeptides are mixed together, they spontaneously reassemble in high yield to form 30S subunits that are fully competent to participate in protein biosynthesis.472 The assembly of the intact 30S subunit from its components (Figure 13–21)473 proceeds through an explicit sequence of steps beginning with the binding of a few of the subunits to the 16S ribosomal RNA itself. As the assembly progresses, the binding to the 16S ribosomal RNA of the subunits earlier in the sequence of events or the binding of polypeptides to complexes between the 16S ribosomal RNA and other polypeptides creates sites to which subunits later in the sequence of events can attach (Figure 13–21). If a polypeptide is added to the mixture before all of the subunits that must precede it have been incorporated, it will not bind to the partially assembled 30S subunit. An assembly map, necessarily of greater complexity but describing a similar hierarchically ordered process, has been drawn for the assembly of the 50S ribosomal subunit of E. coli from 23S ribosomal RNA, 5S ribosomal RNA, and 31 polypeptides.474

From the crystallographic molecular model of the 30S subunit,475 some inferences can be drawn to explain the order in which its subunits are incorporated into the assembling particle (Figure 13–21). None of the polypeptides except S4, S8, S17, and S15 can add until other subunits have been incorporated. Subunits S6 and S18 form an intimate complex in one corner of the complete 30S subunit, and subunits S10, S14, and S3 form an intimate complex in another corner. These close associations explain the interdependences between the additions of the polypeptides of these subunits during assembly. Most of the subunits of the intact 30S subunit, however, have little if any contact with each other in the final particle.

The last polypeptides to add to the assembling particle, S3, S10, S14, S11, S18, S5, S12, and S9 (Figure 13–21), all form contacts with at least one and as many as five double helices of RNA also contacted by the subunit or subunits that must precede them onto the particle. These relationships suggest that the preceding subunit or subunits control the orientations of these double helices so that the site for the binding of the following subunit among these double helices is either created or stabilized.

The earlier subunits to add to the assembling particle, however, subunits S19, S13, S7, and S20, do not share either direct contacts or contacts with double helices in common with the subunits that must precede them. This fact suggests that global conformational changes of the 16S ribosomal RNA are effected by the earliest polypeptides to add, namely those of subunits S4, S8, S20, and S7, to create the distant sites for the polypeptides of subunits S19, S13, S7, and S20.

Not only does the structure of the 16S ribosomal RNA seem to adjust upon the association of the individual subunits but also the conformations of the separated subunits seem to adjust, sometimes dramatically, upon their association. One polypeptide that seems to be almost structureless before it associates with the 16S ribosomal RNA is polypeptide S4 (naa = 203). The nuclear magnetic resonance spectrum of polypeptide S4 under the conditions in which assembly takes place is almost indistinguishable from its spectrum in 8 M urea, which is the spectrum of the sum of the amino acids from which it is composed.476 This result indicates that, when alone in solution, polypeptide S4 cannot assume a unique native state. The circular dichroic spectrum476 and frictional ratio (f/f₀ = 1.7), however, are not those of a fully random coil (f/f₀ = 2.4 for naa = 245),477 and an explanation of these results and those from nuclear magnetic resonance spectroscopy would be that the polypeptide in solution is rapidly passing through an array of loosely folded conformations, none of which is unique. When it is bound to the 16S ribosomal RNA, the subunit S4 assumes a defined structure with seven α helices and

Figure 13–21: Assembly diagram for the 30S subunit of the ribosome from E. coli.473 The sequence of events was determined by mixing, in various combinations, the 21 purified polypeptides with the purified 16S RNA and assaying for formation of a complex or complexes among the components. For example, only polypeptides S15, S17, S4, and S8 would bind alone to 16S ribosomal RNA. Polypeptide S20 will form a complex with 16S RNA only when subunits S4 and S8 have been incorporated. Polypeptide S13 binds to 16S RNA only when subunits S4, S8, and S20 have been incorporated, and so forth. Upon binding each polypeptide becomes a subunit of the assembling 30S ribosomal subunit. Reprinted with permission from ref 473. Copyright 1974 Journal of Biological Chemistry.
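The hierarchical rule described for Figure 13–21, that a polypeptide binds only after every subunit that must precede it has been incorporated, can be written as a small dependency check. The sketch below encodes only the prerequisites stated explicitly in the caption: S4, S8, S15, and S17 bind to 16S RNA alone; S20 requires S4 and S8; S13 requires S4, S8, and S20. The remaining polypeptides are left out because their prerequisites are shown only in the diagram, which is not reproduced here. The names `PREREQUISITES`, `can_bind`, and `assemble` are hypothetical.

```python
# Prerequisites stated in the caption of Figure 13-21. An empty set means the
# polypeptide binds directly to 16S ribosomal RNA. Other polypeptides of the
# 30S subunit are deliberately omitted from this sketch.
PREREQUISITES = {
    "S4": set(),
    "S8": set(),
    "S15": set(),
    "S17": set(),
    "S20": {"S4", "S8"},
    "S13": {"S4", "S8", "S20"},
}


def can_bind(polypeptide, incorporated):
    """Return True if every required subunit is already on the particle."""
    return PREREQUISITES[polypeptide] <= set(incorporated)


def assemble(order_of_addition):
    """Add polypeptides one at a time in the given order.

    A polypeptide offered before its prerequisites are present does not bind,
    mirroring the observation in the text. Returns the list of incorporated
    subunits and the list of polypeptides that failed to bind.
    """
    incorporated, rejected = [], []
    for polypeptide in order_of_addition:
        if can_bind(polypeptide, incorporated):
            incorporated.append(polypeptide)
        else:
            rejected.append(polypeptide)
    return incorporated, rejected
```

For example, `assemble(["S13", "S4", "S8", "S20"])` incorporates S4, S8, and S20 but rejects S13, because S13 was offered before its three prerequisites were in place. Offering the same polypeptides in the order S4, S8, S20, S13 incorporates all four.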
Assembly of Helical Polymeric Proteins 717
[Figure 13–22. Panel A labels: α-helical coiled coil; central domain; terminal domains; ring; S–S; CHO; GHRP; GPRP; β COO⁻; γ COO⁻; endopeptidolytic cleavage. Panel B: monomer → dimer → polymer.]
Figure 13–22: Assembly of fibrin from fibrinogen. (A) Skeletal has the amino-terminal sequence GHRP–. A synthetic
drawing in stereo of the polypeptide backbones of the subunits in peptide of the sequence GPRP can inhibit completely the
the crystallographic molecular model of fibrinogen from polymerization of fibrin monomers to fibrin polymer.484
G. gallus.481 The molecule is an (abg)2 heterohexamer with a rota-
tional axis of symmetry normal to the plane of the page passing