Definitions of enzyme function for the structural genomics era


Patricia C Babbitt

Questions are being asked about how enzyme function is described at the molecular level and about the strengths and weaknesses of the EC system for this purpose. A new approach to describing enzyme function has been proposed that might improve our capabilities for functional inference for members of enzyme superfamilies.

Addresses
Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry, University of California, 513 Parnassus Avenue, San Francisco, CA, 94143-0446, USA
e-mail: babbitt@cgl.ucsf.edu

Current Opinion in Chemical Biology 2003, 7:230–237
This review comes from a themed section on Biocatalysis and biotransformation
Edited by Tadhg Begley and Ming-Daw Tsai

1367-5931/03/$ – see front matter
© 2003 Elsevier Science Ltd. All rights reserved.

DOI 10.1016/S1367-5931(03)00028-0

Abbreviations
CMLE  carboxymuconate lactonizing enzyme
EC    Enzyme Commission
MLE   muconate lactonizing enzyme
PDB   Protein Data Bank

Introduction
The ability to predict functions for the protein products of sequenced genomes is required for understanding the relationships between structure and function, for applying that knowledge to important problems in protein engineering and drug design, and for understanding the molecular basis of disease. Yet, although large numbers of genomes have been sequenced and annotated, many predicted protein products of unknown function remain. Complicating the annotation problem, a still unknown but probably significant level of misannotation exists across the sequence databases [1–5], further compromising our ability to understand protein function at the molecular level. As the structural genomics projects move into high gear [6–9], we can expect increasing numbers of three-dimensional protein structures to become available whose functions are either uncertain or entirely unknown. The promise of structural genomics will be blunted without more effective approaches for predicting protein function from sequence and structural information. For protein products with insufficient similarity to proteins of characterized function, functional inference from sequence and structure remains a difficult problem.

This review focuses on some of the problems associated with describing the molecular functions of enzymes and relating those descriptions to sequence and structural information in a way that is useful for functional inference. We describe recent large-scale attempts to correlate sequence and structural information with enzyme function and cite a few examples of individual enzyme superfamilies whose study has provided special insight into the problem. In the context of these observations, the current system for describing enzyme function, the Enzyme Commission (EC) system, is evaluated. Finally, a new approach to describing enzyme function is proposed that may be useful for improving our capabilities for functional inference at the molecular level.

Evaluation of the EC system for describing enzyme function
In the EC system, enzymes are named according to the overall transformations they perform. Each enzyme name is associated with a four-digit numerical code describing each distinct transformation [10]. The Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB) updates the system regularly using specific rules that have been developed for naming enzymes and assigning them EC numbers (see http://www.chem.qmul.ac.uk/iubmb/enzyme/). The EC system now describes most of the enzymes known, although as increasingly large numbers of new enzymes are discovered in the genome projects, the number of enzymes without an EC designation continues to rise.

The EC system represents a hierarchy of functional classification in which each digit of the four-digit system represents a different level of granularity in describing chemical function. The first digit describes an overall class of enzyme reaction (e.g. transferases, hydrolases, isomerases), whereas the second and subsequent digits indicate the subclass, sub-subclass, and specific serial number assigned to each individual enzyme. For each of the six major EC classes (first digit), subclasses and sub-subclasses may have somewhat different meanings. For example, in class 1 (oxidoreductases), the second (subclass) digit describes the substrate type upon which the EC 1 enzymes act; for class 5 enzymes (isomerases), the second (subclass) digit describes different general types of isomerases (e.g. EC 5.1, racemases and epimerases; EC 5.2, cis-trans isomerases; EC 5.3, intramolecular oxidoreductases). Because of its long-term use as a gold standard and its ease of use in computational efforts, the EC system is used almost universally to describe enzyme function in databases of sequence, structural and metabolic pathway information.
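The four-level hierarchy described above lends itself to a simple computational treatment. The following Python is a minimal sketch, not part of any IUBMB resource: the names `parse_ec` and `shared_levels` are illustrative assumptions, and a dash is treated as an unassigned field. It parses an EC number into class, subclass, sub-subclass and serial number, and counts how many leading levels two designations share.

```python
from typing import Optional, Tuple

# (class, subclass, sub-subclass, serial number); None marks an unassigned "-" field.
ECNumber = Tuple[Optional[int], ...]


def parse_ec(text: str) -> ECNumber:
    """Parse a string such as 'EC 5.5.1.1' or '5.3.3.-' into four fields."""
    text = text.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    fields = text.split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four dot-separated fields, got {text!r}")
    return tuple(None if field == "-" else int(field) for field in fields)


def shared_levels(a: str, b: str) -> int:
    """Count leading EC levels on which two designations agree (0 to 4).

    Comparison stops at the first mismatch or unassigned field.
    """
    count = 0
    for x, y in zip(parse_ec(a), parse_ec(b)):
        if x is None or y is None or x != y:
            break
        count += 1
    return count
```

For example, `shared_levels("EC 5.5.1.1", "EC 5.5.1.2")` returns 3, because the two designations differ only in the serial number, whereas `shared_levels("4.2.1.11", "5.1.2.2")` returns 0 because the classes already differ. Such a count measures agreement in overall reaction only; it says nothing about mechanism or structure.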

Despite these significant advantages, the EC system also has several disadvantages for use in genome-era analyses [3,11]. For example, EC designations presume a 1:1:1 relationship between gene, protein, and reaction, which does not take into account the fact that enzymes can have more than one function or that oligomeric proteins generated from the protein products of different genes may perform a single function. Perhaps the most important disadvantage of the EC system for describing function in the era of structural genomics is that it does not provide any information about the mechanism, especially in a way that can be effectively associated with sequence or structural patterns of conservation. Although it has been noted that similarity among sequences is strongly correlated with similarities in mechanism, this is most frequently so at the level of a common partial reaction, rather than at the level of an overall transformation such as described by the EC system [2,11–15]. In effect, because the underlying conceptual framework of the EC system was developed before sequence or structural information was available, it is not well suited to mapping mechanistically important functional characteristics to enzyme structure, particularly in terms of families and superfamilies of homologous proteins.

The consequences of this deficit can be evaluated by examining the results of several recent attempts to correlate enzyme functions to families and superfamilies of protein sequences using the EC system. At high levels of sequence similarity, EC numbers correlate well, as might be expected. Here, descriptions of overall chemical reactions as reflected in enzyme names and their associated EC numbers can usually be used for transfer of functional annotation in highly similar proteins. In a recent assessment of functional variation across the members of homologous enzyme superfamilies represented in the Protein Data Bank (PDB), Todd et al. [16] found, in general agreement with others [3], that even variation in the fourth digit of the EC number is rare above a sequence identity threshold of 40%. However, exceptions to this rule do occur, with one recent report describing the sequences of two enzymes that perform different overall transformations, melamine deaminase and atrazine chlorohydrolase (Figure 1), to be 98% identical [17]. Consistent with this example, other groups have found the correlation between the level of sequence similarity and the level of EC classification similarity to be somewhat problematic, particularly with respect to the unevenness of the correlations [18,19]. According to these studies, reliable annotations are difficult to achieve using simple sequence identity cutoffs. Numerous other examples can be cited from the general literature in which sequence identities >40% represent proteins with different EC numbers and thus different overall functions. Conversely, homologs with much lower sequence identities sometimes have the same or very similar EC codes. For example, the overall sequence identity of the enzymes catalyzing the o-succinylbenzoate synthase reaction across several bacterial species is generally well below the 40% identity cutoff [20].

Figure 1
Chemical reactions of melamine deaminase and atrazine chlorohydrolase.

The correlation between EC number and sequence/structure conservation below the 30–40% sequence identity cutoff is much more problematic. This represents a level generally associated by structural biologists with superfamily classification [21]. At the superfamily level, where the member proteins have often diverged to perform different overall functions, EC families often do not correlate well with structural families, suffering from two general problems: first, the EC system may cluster structurally dissimilar proteins as functionally similar; and second, the EC system may cluster structurally similar proteins as functionally dissimilar. Both pose significant problems for transfer of functional annotation.

Figure 2
Chemical reactions of muconate lactonizing enzyme (MLE, EC 5.5.1.1) and carboxymuconate lactonizing enzyme (CMLE, EC 5.5.1.2).

An early example of the first problem is provided by the enzymes muconate lactonizing enzyme (MLE) and carboxymuconate lactonizing enzyme (CMLE), whose reactions are illustrated in Figure 2. These two enzymes,
members of the catechol and protocatechuate branches, respectively, of the β-ketoadipate pathway, have nearly identical EC numbers, differing only in the last digit, which in this case denotes a difference only in substrate specificity. The similarities in their overall reactions as well as their analogous roles in parallel branches of the same metabolic pathway presumably led to their receiving nearly identical EC designations. Because these numbers were assigned before the determination of their sequences, it was not possible to ascertain whether or not they were structurally related. Once their sequences were determined, however, it became clear that they evolved not only from different superfamilies but from different fold classes as well [22]. For MLE and CMLE, it is clearly incorrect to infer structural similarity from similarities in EC number. Thus, this example provides a caution for the general inference of structural similarity on the basis of common function. In MLE and CMLE, nature developed different structural (and mechanistic) strategies to perform essentially the same overall reaction. While it is difficult to assess how widespread this problem may be, several large-scale studies have recently provided additional examples of this type of disconnection between EC designations and protein superfamilies [23–25]. In one of these studies, Galperin and co-workers [23] found, for 105 EC numbers, enzymes without detectable sequence similarity to one another. In 34 of the cases examined, these enzymes could be shown to belong to different folds, similar to the MLE/CMLE example. As the sequence and structure databases continue to grow, we expect that many additional examples of such 'analogous' (but not homologous) enzymes will be described.

Perhaps more problematic for transferring functional information from sequence or structure relationships is the second problem described above, that EC designations often differ for structurally similar proteins. In two large-scale studies focusing on all of the protein superfamilies in the PDB [16] and, in greater detail, on the superfamilies of the (α/β)8 (TIM barrel) fold [25], Thornton and colleagues found many superfamilies containing more than one EC designation, even at the level of the first digit. Of the 167 protein superfamilies studied from the PDB, nearly half showed variation in their EC classification; for 22 of these superfamilies, the EC designation was not conserved at any level. Extensive details of this study are available at http://www.biochem.ucl.ac.uk/bsm/FAM-EC/. In the study on (α/β)8 barrels, Nagano et al. [25] found that 61 EC numbers are represented in the 21 superfamilies described. One of these superfamilies, the FMN-oxidoreductase/PP-binding proteins, contains enzymes representing four of the six primary EC classes! Moreover, when homologous sequences were added into their analyses, the number of functions associated with a given (α/β)8 superfamily often doubled. Additional evidence for disparities between EC functional class and structural similarity has been recorded by other investigators [24,26].

How should we define enzyme function at the superfamily level?
If the EC system does not map well to the structural similarities seen at the superfamily level, what alternatives can be suggested? It would seem that development of more structurally contextual mappings between structure and function requires a better conceptual understanding of how conserved elements of function explicitly map to conserved elements of structure. Several recent studies have looked in detail at this problem, and from these investigations, several theoretical models of protein evolution have emerged [15]. Because these models have been described in some detail elsewhere, including in the companion paper in this issue by Gerlt and Raushel [27], only two are described briefly here.

The first model, based originally on the work of Horowitz [28,29], can be termed 'substrate-constrained' and describes the case in which substrate specificity is conserved in divergent evolution. This model predicts that the conserved elements of structure that can be mapped to conserved elements of function are associated with the ability to bind common ligands. A recent study of enzymes in Escherichia coli suggests that the occurrence of this model may be relatively uncommon [30]. The second model, termed 'chemistry-constrained evolution', builds on the earlier observations of Petsko et al. [31], and predicts that conserved elements of structure that can be mapped to conserved elements of function are chemical capabilities (e.g. a specific partial reaction or chemical capability) [2,12–15,27]. Other investigators have described related concepts for the association of function and structure through evolutionary divergence and detailed the prevalence of the 'chemistry-constrained' model in genomes that have been examined [11,16,25,30,32,33].

We and our collaborators have examined the chemistry-constrained model using specific superfamilies, including the enolase [34,35], crotonase [36], haloacid dehalogenase [37], vicinal oxygen chelate [38], and amidohydrolase [27] superfamilies, and discussed the importance of this model for developing new ways of describing enzyme function likely to be useful for functional inference. Many other superfamilies that appear to follow one (or sometimes more) of these evolutionary models have been described.

A lesson that has emerged from such studies, particularly with respect to the chemistry-constrained model of superfamily evolution, is that describing enzyme function in terms of the overall transformation, such as with the EC designations, is not very useful for prediction of function
of unknown reading frames with no close homologs in the sequence databases. Thus, if the closest homolog for a newly sequenced protein has diverged to perform a different overall function than that of the unknown sequence in question, annotation transfer on the basis of this overall reaction will be wrong. Examples of superfamilies that illustrate this problem are provided in Table 1, which shows that several different overall reactions (as exemplified by their EC numbers) can be associated with a given superfamily. Yet for each superfamily set, all of the homologous member structures can be explicitly associated with a single fundamental chemical capability. For one of these superfamilies, the haloacid dehalogenase superfamily, the structural correlates to the fundamental chemical capability are given in Figure 3. Figure 3a shows the conserved sequence motif that contains the aspartic acid (labeled as D10 in Figures 3a and 3b) that functions as an active site nucleophile in the C–X, P–C and P–O bond cleavage reactions catalyzed by members of the superfamily [37]. Figure 3b shows a superposition of several conserved active site residues, including the critical aspartic acid nucleophile, for some structurally characterized members of the superfamily. Members of this superfamily are so divergent that their sequences cannot easily be aligned, except around the conserved active site residues whose sidechains are shown in Figure 3b (see [39] for a sequence alignment showing all of these active site motifs).

Table 1
EC numbers and chemical capabilities associated with some members of the enolase, crotonase and haloacid dehalogenase superfamilies.

Superfamily: Enolase
Fundamental partial reaction/chemical capability: Metal-dependent abstraction of α-protons of carboxylic acids to form stabilized enolate intermediates
EC number    Overall reaction
4.2.1.6      Galactonate dehydratase
4.2.1.11     Enolase
4.2.1.40     Glucarate dehydratase
4.3.1.2      Methylaspartate ammonia-lyase
5.1.2.3      Mandelate racemase
5.5.1.1      Muconate lactonizing enzyme
6.2.1.26     o-Succinylbenzoate-CoA synthase

Superfamily: Crotonase
Fundamental partial reaction/chemical capability: Stabilization of oxyanion intermediates derived from thioesters
EC number    Overall reaction
3.1.2.4      3-Hydroxyisobutyryl-CoA hydrolase
3.4.21.92    ATP-dependent Clp protease
3.8.1.6      4-Chlorobenzoyl-CoA dehalogenase
4.1.1.41     Methylmalonyl-CoA decarboxylase
4.1.3.36     Naphthoate synthase
4.2.1.17     Enoyl-CoA hydratase (crotonase)
5.3.3.-      Δ3,5,Δ2,4-dienoyl-CoA isomerase

Superfamily: Haloacid dehalogenase
Fundamental partial reaction/chemical capability: Hydrolysis, phosphoryl group transfer via hydrolytic nucleophilic substitution
EC number    Overall reaction
3.1.3.3      Phosphoserine phosphatase
3.1.3.15     Histidinol phosphatase
3.11.1.1     Phosphonatase
3.1.3.18     Phosphoglycolate phosphatase
3.8.1.2      Haloacid dehalogenase
5.4.2.6      β-Phosphoglucomutase

Figure 3

(a)
                            10
Cu++ATPase.Ec        LDTVVF D KTGTLTEG
Cu++ATPase.Hs        VKVVVF D KTGTITHG
Ca++ATPase.At        ATTICS D KTGTLTTN
Urf.Mj               KVAIVF D SAGTLVKI
PhosSerPhos.Hs       ADAVCF D VDSTVIRE
2-DO-6-PPhos.Sc      VDLCLF D LDGTIVST
DL-Gly-3-Phos.Sc     INAALF D VDGTIIIS
Phosphon.Pa          LQAAIL D WAGTVVDF
Phosphon.St          IHAVIL D WAGTTVDF
Phosphon.Bc          IEAVIF D WAGTTVDY
PhosGlycolPhos.Rs    MPGVVF D LDGTLVHS
NtermDom.IGPD.Pp     VQALLL D MDGVMAEV
B-PhosGlucoMut.Ll    FKAVLF D LDGVITDT
HaloAcidDehal.PspYL  IKGIAF D LYGTLFDV
NtermDomEpoxHyd.Hs   LRAAVF D LDGVLALP
EnolasePhos.Ko       IRAIVT D IEGTTSDI

(b) [Structural superposition; the active site nucleophile is labeled D10.]

Conserved motifs important for function in the haloacid dehalogenase superfamily. (a) Multiple sequence alignment showing the motif containing the important active site nucleophile (labeled as D10 in the alignment) common to the fundamental chemistry of members of the haloacid dehalogenase superfamily. Modified from [37] with permission (copyright, Biochemistry 1998), from which sequence accession numbers can be obtained. (b) Structural superposition showing several members of the haloacid dehalogenase superfamily. The active site nucleophile designated in Figure 3a is also labeled as D10 in the superposition. Structures shown are haloacid dehalogenase from Xanthobacter autotrophicus (white), PDB 1qq5 [47]; phosphonoacetaldehyde hydrolase from Bacillus cereus (cyan), PDB 1fez [39]; phosphoserine phosphatase from Methanococcus jannaschii (green), PDB 117min [48]; probable phosphatase from Haemophilus influenzae (magenta), PDB 1kle [49]; β-phosphoglucomutase from Lactococcus lactis (yellow), PDB 1lvh; and polynucleotide kinase from T4 phage (orange), 1ltq [50]. Structural superposition and visualization for Figures 3 and 4 were generated using Chimera [51], available from http://www.cgl.ucsf.edu/chimera.
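The conserved motif in Figure 3a can be checked mechanically. The sketch below, in Python, transcribes the sixteen aligned segments from the figure, confirms that aspartate occupies the same column in every row, and tests a simple regular expression anchored at that column. The pattern `D..[GS][TV]` is an assumption read off this excerpt only, not a published signature for the superfamily, and the column index 6 is the 0-based position within the excerpt rather than the D10 residue numbering. The function names are likewise illustrative.

```python
import re

# Aligned segments transcribed from Figure 3a: six residues, the conserved
# aspartate, then eight residues.
HAD_SEGMENTS = {
    "Cu++ATPase.Ec": "LDTVVFDKTGTLTEG",
    "Cu++ATPase.Hs": "VKVVVFDKTGTITHG",
    "Ca++ATPase.At": "ATTICSDKTGTLTTN",
    "Urf.Mj": "KVAIVFDSAGTLVKI",
    "PhosSerPhos.Hs": "ADAVCFDVDSTVIRE",
    "2-DO-6-PPhos.Sc": "VDLCLFDLDGTIVST",
    "DL-Gly-3-Phos.Sc": "INAALFDVDGTIIIS",
    "Phosphon.Pa": "LQAAILDWAGTVVDF",
    "Phosphon.St": "IHAVILDWAGTTVDF",
    "Phosphon.Bc": "IEAVIFDWAGTTVDY",
    "PhosGlycolPhos.Rs": "MPGVVFDLDGTLVHS",
    "NtermDom.IGPD.Pp": "VQALLLDMDGVMAEV",
    "B-PhosGlucoMut.Ll": "FKAVLFDLDGVITDT",
    "HaloAcidDehal.PspYL": "IKGIAFDLYGTLFDV",
    "NtermDomEpoxHyd.Hs": "LRAAVFDLDGVLALP",
    "EnolasePhos.Ko": "IRAIVTDIEGTTSDI",
}

NUCLEOPHILE_COLUMN = 6  # 0-based column of the conserved Asp in this excerpt
CANDIDATE_MOTIF = re.compile(r"D..[GS][TV]")  # hypothetical, read off the rows above


def nucleophile_is_conserved(segments, column=NUCLEOPHILE_COLUMN):
    """Return True if every aligned segment has Asp (D) at the given column."""
    return all(seq[column] == "D" for seq in segments.values())


def rows_missing_motif(segments, pattern=CANDIDATE_MOTIF, column=NUCLEOPHILE_COLUMN):
    """List the rows where the pattern does not match starting at the column."""
    return [name for name, seq in segments.items() if not pattern.match(seq, column)]
```

With the rows as transcribed, `nucleophile_is_conserved(HAD_SEGMENTS)` is true and `rows_missing_motif(HAD_SEGMENTS)` is empty. The phosphoserine phosphatase row is the only one with serine rather than glycine at the fourth motif position, which is why the pattern allows `[GS]`. A pattern this short would match many unrelated proteins, so it illustrates the mapping between a conserved residue and a conserved chemical capability rather than serving as a search tool.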

The value of structure–function mapping for developing new definitions of enzyme function
From the ideas described here emerges a general approach for functional annotation of new superfamily members. First, sequence and structural relationships at the superfamily level should be determined and conserved structural characteristics identified from multiple alignments of sequence and structure. Next, the partial reactions/catalytic capabilities of functionally characterized members of the superfamily should be deduced and those that are common to all members of the superfamily identified. Explicit mappings can then be made between conserved elements of structure and conserved elements of function. From this information, the model or models of protein evolution that apply to the superfamily can be identified. If the superfamily follows the 'substrate-constrained' model, new superfamily sequences are likely to bind the specific ligand (or moiety of a ligand) bound by other members of the superfamily. If the superfamily follows the 'chemistry-constrained' model, a new superfamily sequence can be expected to perform the partial reaction/chemical capability associated with the other members of the superfamily, even though it may perform a very different overall reaction and bind ligands different from those bound by any other member of the superfamily. For one such superfamily, the enolase superfamily, the usefulness of this approach has recently been expanded for in vitro protein engineering (described in the companion paper in this volume by Gerlt and Raushel [27]) and for finding structurally distant superfamily members using only the information from the conserved active site architecture.

In this latter experiment, we asked whether the active site residues associated with the fundamental partial reaction common to all enolase superfamily members, abstraction of a proton α to a carboxylate (see Babbitt et al. [34] for details), could provide sufficient information to find homologous structures in available databases. Similar in concept to several other published approaches using active site templates to find structural homologs [40–43], this experiment used only the information from the positions of the α carbons and side-chain centroids for five active site residues conserved across all of the divergent superfamily members (Figure 4). Using the SPASM algorithm [44], several enolase superfamily member templates were generated and used to search the PDB for active site homologs. The results showed that the best of these templates could find all of the other divergent superfamily members with both high specificity and sensitivity (EC Meng and PC Babbitt, unpublished results), providing additional support for the notion that the tight coupling between conserved functional characteristics and conserved structural characteristics in enzyme superfamilies represents a fundamental property of those enzymes. Definition of such structure–function paradigms provides a powerful basis for inference of function, especially for 'chemistry-constrained' superfamilies.

Figure 4
Structural superposition showing conserved active site residues of several members of the enolase superfamily. Structures shown are mandelate racemase from Pseudomonas putida (white), PDB 1mdr [52]; muconate lactonizing enzyme from P. putida (cyan), PDB 1muc [53]; galactonate dehydratase from E. coli (green) [54]; o-succinylbenzoate synthase from E. coli (magenta), PDB 1fhv [55]; enolase from Saccharomyces cerevisiae (yellow), PDB 1ebh [56]; and β-methylaspartate ammonia-lyase from Clostridium tetanomorphum (orange), PDB 1kcz [57].

Looking forward: problems and unanswered questions
Although the concepts associated with the structure–function paradigm described here have been shown to be useful for functional inference and analysis for the small number of enzyme superfamilies upon which they
have been tested thus far, it will be important to evaluate this approach for functional inference on many more examples. To this end, we have begun a collaboration with the UCSF Resource for Biocomputing, Visualization, and Informatics to develop a Structure–Function Linkage Database to capture superfamily information in terms of the sequences, structures and overall and partial reactions for a large number of enzyme superfamilies. This effort is also intended to facilitate the development of new enzyme function descriptors that are appropriate for mapping structure to function and that can be handled computationally. However, for such definitions to be useful beyond the limits of the investigation of detailed molecular function, it will be important that they be integrated with computationally tractable descriptions of higher levels of organization (e.g. overall chemical function, metabolic paths/networks, cells and organisms). At the next highest level of organization for enzyme function, metabolic pathways, the Gene Ontology system [45] and other computational resources for functional analysis all use the EC system effectively. Consistent with this observation, in a recent attempt at exploring the evolution of enzymes in metabolism using a network approach, Alves et al. [46] found the EC system to be useful for connecting sequence information to substrates and products. Thus, although the EC system can be inappropriate for mapping conserved functional characteristics to conserved elements of structure in divergent enzyme superfamilies, it remains essential for describing overall chemical functions and relating those overall functions to higher-order biological organization. Development of a new system based on structure–function mappings at the finer level suggested here can best be envisaged as an addition to, not a replacement for, the EC system. Ultimately, such a system would be useful for structure–function mapping at the individual enzyme, family and superfamily levels.

Conclusions
Although the EC system is useful for describing enzyme function at the level of overall reactions, these definitions are problematic for correlating structure and function at a finer level. Recent research has shown that more refined definitions of enzyme function, described at the level of the partial chemical reactions that make up the overall reactions described by the EC, provide more useful mappings between structural and functional elements conserved across superfamilies. Development of a new system for describing enzyme function at this level of granularity is needed for the inference of functional properties from sequence and structural similarities.

Acknowledgements
We thank Elaine C Meng, PhD, for making Figures 3b and 4 and for helpful discussions. The research in the author's laboratory is supported by NIH GM60595 and NC RR01081.

References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest

1. Brenner SE: Errors in genome annotation. Trends Genet 1999, 15:132-133.
2. Gerlt JA, Babbitt PC: Can sequence determine function? Genome Biol 2000, 1:reviews0005.1-0005.10.
3. Wilson CA, Kreychman J, Gerstein M: Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. J Mol Biol 2000, 297:233-249.
A useful discussion of major issues in functional annotation for the genome projects.
4. Devos D, Valencia A: Intrinsic errors in genome annotation. Trends Genet 2001, 17:429-431.
5. Gilks WR, Audit B, De Angelis D, Tsoka S, Ouzounis CA: Modeling the percolation of annotation errors in a database of protein sequences. Bioinformatics 2002, 18:1641-1649.
6. Brenner SE: A tour of structural genomics. Nat Rev Genet 2001, 2:801-809.
7. Teichmann SA, Murzin AG, Chothia C: Determination of protein function, evolution and interactions by structural genomics. Curr Opin Struct Biol 2001, 11:354-363.
8. Burley SK, Bonanno JB: Structuring the universe of proteins. Annu Rev Genomics Hum Genet 2002, 3:243-262.
9. Chance MR, Bresnick AR, Burley SK, Jiang JS, Lima CD, Sali A, Almo SC, Bonanno JB, Buglino JA, Boulton S et al.: Structural genomics: a pipeline for providing structures for the biologist. Protein Sci 2002, 11:723-738.
10. International Union of Biochemistry and Molecular Biology: Nomenclature Committee: Enzyme Nomenclature 1992: Recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology on the Nomenclature and Classification of Enzymes. Edited by Webb EC. San Diego: Published for the International Union of Biochemistry and Molecular Biology by Academic Press; 1992.
11. Riley M: Systems for categorizing functions of gene products. Curr Opin Struct Biol 1998, 8:388-392.
12. Babbitt PC, Gerlt JA: Understanding enzyme superfamilies: chemistry as the fundamental determinant in the evolution of new catalytic activities. J Biol Chem 1997, 272:30591-30594.
13. Gerlt JA, Babbitt PC: Mechanistically diverse enzyme superfamilies: the importance of chemistry in the evolution of catalysis. Curr Opin Chem Biol 1998, 2:607-612.
14. Babbitt PC, Gerlt JA: New functions from old scaffolds: how nature reengineers enzymes for new functions. Adv Protein Chem 2000, 55:1-28.
15. Gerlt JA, Babbitt PC: Divergent evolution of enzymatic function: mechanistically diverse superfamilies and functionally distinct suprafamilies. Annu Rev Biochem 2001, 70:209-246.
A recent review of the major models describing the evolution of enzyme function from a structural perspective. This paper includes numerous examples of mechanistically diverse superfamilies and introduces a model for 'functionally distinct suprafamilies' in which only active site architecture, but not a common functional correlate, is conserved.
16. Todd AE, Orengo CA, Thornton JM: Evolution of function in protein superfamilies, from a structural perspective. J Mol Biol 2001, 307:1113-1143.
An extensive and thoughtful assessment of functional variation across a large set of enzyme superfamilies and the relationships of EC number similarity to structural similarity. Accompanied by a link to an extensive on-line compilation of supporting material.
17. Seffernick JL, de Souza ML, Sadowsky MJ, Wackett LP: Melamine deaminase and atrazine chlorohydrolase: 98 percent identical but functionally different. J Bacteriol 2001, 183:2405-2410.

18. Devos D, Valencia A: Practical limits of function prediction. lactonizing enzyme compared with mandelate racemase and
Proteins 2000, 41:98-107. enolase. Proc Natl Acad Sci USA 1998, 95:10396-10401.
19. Rost B: Enzyme function less conserved than anticipated. 36. Holden HM, Benning MM, Haller T, Gerlt JA: The crotonase
J Mol Biol 2002, 318:595-608. superfamily: divergently related enzymes that catalyze
different reactions involving acyl coenzyme a thioesters.
20. Palmer DR, Garrett JB, Sharma V, Meganathan R, Babbitt PC, Acc Chem Res 2001, 34:145-157.
Gerlt JA: Unexpected divergence of enzyme function and
sequence: ‘N-acylamino acid racemase’ is o-succinylbenzoate 37. Baker AS, Ciocci MJ, Metcalf WW, Kim J, Babbitt PC, Wanner BL,
synthase. Biochemistry 1999, 38:4252-4258. Martin BM, Dunaway-Mariano DD: Insights into the mechanism
of catalysis of the P-C bond cleaving enzyme
21. Lo Conte L, Brenner SE, Hubbard TJ, Chothia C, Murzin AG: phosphonoacetaldehyde hydrolase derived from gene
SCOP database in 2002: refinements accommodate structural sequence analysis and mutagenesis. Biochemistry 1998,
genomics. Nucleic Acids Res 2002, 30:264-267. 37:9305-9315.
22. Williams SE, Woolridge EM, Ransom SC, Landro JA, Babbitt PC, 38. Armstrong RN: Mechanistic diversity in a metalloenzyme
Kozarich JW: 3-Carboxy-cis, cis-muconate lactonizing enzyme superfamily. Biochemistry 2000, 39:13625-13632.
from Pseudomonas putida is homologous to the class II
fumarase family: a new reaction in the evolution of a 39. Morais MC, Zhang W, Baker AS, Zhang G, Dunaway-Mariano D,
mechanistic motif. Biochemistry 1992, 31:9768-9776. Allen KN: The crystal structure of bacillus cereus
phosphonoacetaldehyde hydrolase: insight into catalysis of
23. Galperin MY, Walker DR, Koonin EV: Analogous enzymes: phosphorus bond cleavage and catalytic diversification within
independent inventions in enzyme evolution. Genome Res 1998, the HAD enzyme superfamily. Biochemistry 2000,
8:779-790. 39:10385-10396.
24. Hegyi H, Gerstein M: The relationship between protein structure 40. Artymiuk PJ, Poirrette AR, Grindley HM, Rice DW, Willett P:
and function: a comprehensive survey with application to the A graph-theoretic approach to the identification of three-
yeast genome. J Mol Biol 1999, 288:147-164. dimensional patterns of amino acid side-chains in protein
structures. J Mol Biol 1994, 243:327-344.
25. Nagano N, Orengo CA, Thornton JM: One fold with many
 functions: the evolutionary relationships between TIM barrel 41. Russell RB: Detection of protein three-dimensional side-chain
families based on their sequences, structures and functions. patterns: new examples of convergent evolution. J Mol Biol
J Mol Biol 2002, 321:741-765. 1998, 279:1211-1227.
A careful and extensive study of the functional capabilities of the (ba)
barrel fold including a survey of types and position in sequence and 42. Wallace AC, Borkakoti N, Thornton JM: TESS: a geometric
structure of catalytic residues. Includes a thoughtful discussion of the hashing algorithm for deriving 3D coordinate templates for
evolutionary origins of the fold and, based on these analyses, suggests searching structural databases. Application to enzyme active
collapsing some of the superfamilies in this fold class into supergroups. sites. Protein Sci 1997, 6:2308-2323.
Accompanied by an extensive discussion of the mappings between EC
43. Di Gennaro JA, Siew N, Hoffman BT, Zhang L, Skolnick J, Neilson
numbers and superfamily members.
LI, Fetrow JS: Enhanced functional annotation of protein
26. Nahum LA, Riley M: Divergence of function in sequence-related sequences via the use of structural descriptors. J Struct Biol
 groups of Escherichia coli proteins. Genome Res 2001, 2001, 134:232-245.
11:1375-1381. 44. Kleywegt GJ: Recognition of spatial motifs in protein structures.
Provides an analysis of functional divergence using E. coli as a model J Mol Biol 1999, 285:1887-1897.
system.
45. The Gene Ontology Consortium: Creating the gene ontology
27. Gerlt JA, Raushel FM: Evolution of function in (b/a)8-barrel resource: design and implementation. Genome Res 2001,
enzymes. Curr Opin Chem Biol 2003, 7:in press. 11:1425-1433.
28. Horowitz NH: On the evolution of biochemical syntheses. 46. Alves R, Chaleil RAG, Sternberg MJE: Evolution of enzymes in
Proc Natl Acad Sci USA 1945, 31:153-157. metabolism: a network perspective. J Mol Biol 2002,
320:751-770.
29. Horowitz NH: The evolution of biochemical syntheses -
retrospect and prospect. In Evolving Genes and Proteins. Edited 47. Ridder IS, Rozeboom HJ, Kalk KH, Dijkstra BW: Crystal
by Bryson V, Vogel HJ. New York: Academic Press; 1965:15. structures of intermediates in the dehalogenation of
haloalkanoates by L-2-haloacid dehalogenase. J Biol Chem
30. Teichmann SA, Rison SC, Thornton JM, Riley M, Gough J, Chothia 1999, 274:30672-30678.
C: The evolution and structural anatomy of the small molecule
metabolic pathways in Escherichia coli. J Mol Biol 2001, 48. Wang W, Cho HS, Kim R, Jancarik J, Yokota H, Nguyen HH,
311:693-708. Grigoriev IV, Wemmer DE, Kim SH: Structural characterization of
the reaction pathway in phosphoserine phosphatase:
31. Petsko GA, Kenyon GL, Gerlt JA, Ringe D, Kozarich JW: crystallographic ‘snapshots’ of intermediate states. J Mol Biol
On the origin of enzymatic species. Trends Biochem Sci 1993, 2002, 319:421-431.
18:372-376.
49. Parsons JF, Lim K, Tempczyk A, Krajewski W, Eisenstein E,
32. Jensen RA: Enzyme recruitment in evolution of new function. Herzberg O: From structure to function: YrbI from Haemophilus
Annu Rev Microbiol 1976, 30:409-425. influenzae (HI1679) is a phosphatase. Proteins 2002, 46:393-404.
33. Todd AE, Orengo CA, Thornton JM: Plasticity of enzyme active 50. Galburt EA, Pelletier J, Wilson G, Stoddard BL: Structure of a tRNA
 sites. Trends Biochem Sci 2002, 27:419-426. repair enzyme and molecular biology workhorse: T4
Interesting discussion of variations in enzyme active sites in related polynucleotide kinase. Structure 2002, 10:1249-1260.
proteins, including an interesting set in which related enzymes use
different functional groups, unconserved in position in the linear 51. Huang CC, Couch GS, Pettersen EF, Ferrin TE: Chimera: an
sequence, to mediate the same mechanistic roles. Accompanied by a extensible molecular modeling application constructed using
proposal for how this conundrum might have evolved. standard components. Pac Symp Biocomput 1996, 1:724.
34. Babbitt PC, Hasson M, Wedekind JE, Palmer DJ, Lies MA, Reed 52. Landro JA, Gerlt JA, Kozarich JW, Koo CW, Shah VJ, Kenyon GL,
GH, Rayment I, Ringe D, Kenyon GL, Gerlt JA: The enolase Neidhart DJ, Fujita S, Petsko GA: The role of lysine 166 in the
superfamily: a general strategy for enzyme-catalyzed mechanism of mandelate racemase from Pseudomonas
abstraction of the a-protons of carboxylic acids. putida: mechanistic and crystallographic evidence for
Biochemistry 1996, 35:16489-16501. stereospecific alkylation by (R)-alpha-phenylglycidate.
Biochemistry 1994, 33:635-643.
35. Hasson MS, Schlichting I, Moulai J, Taylor K, Barrett W, Kenyon GL,
Babbitt PC, Gerlt JA, Petsko GA, Ringe D: Evolution of an enzyme 53. Helin S, Kahn PC, Guha BL, Mallows DG, Goldman A: The refined
active site: the structure of a new crystal form of muconate X-ray structure of muconate lactonizing enzyme from

Current Opinion in Chemical Biology 2003, 7:230–237 www.current-opinion.com


Definitions of enzyme function for the structural genomics era Babbitt 237

Pseudomonas putida PRS2000 at 1.85 A resolution. J Mol Biol Escherichia coli in complex with Mg2þ and o-succinylbenzoate.
1995, 254:918-941. Biochemistry 2000, 39:10662-10676.
54. Wieczorek SW, Kalivoda KA, Clifton JG, Ringe D, Petsko GA, 56. Wedekind JE, Reed GH, Rayment I: Octahedral coordination at
Gerlt JA: Evolution of enzymatic activities in the enolase the high-affinity metal site in enolase: crystallographic analysis
superfamily: identification of a ‘‘new’’ general acid catalyst in of the MgII–enzyme complex from yeast at 1.9 Å resolution.
the active site of D-galactonate dehydratase from Escherichia Biochemistry 1995, 34:4325-4330.
coli. J Am Chem Soc 1999, 121:4540-4541.
57. Asuncion M, Blankenfeldt W, Barlow JN, Gani D, Naismith JH:
55. Thompson TB, Garrett JB, Taylor EA, Meganathan R, Gerlt JA, The structure of 3-methylaspartase from Clostridium
Rayment I: Evolution of enzymatic activity in the enolase tetanomorphum functions via the common enolase chemical
superfamily: structure of o-succinylbenzoate synthase from step. J Biol Chem 2002, 277:8306-8311.
