
CHAPTER 6
Databases for Bioenergy-Related Enzymes
Yanbin Yin
Department of Biological Sciences, Northern Illinois University, DeKalb, IL, USA
email: yyin@niu.edu

OUTLINE

Plant Biomass 95
Bioenergy-Related Enzymes and Regulation 96
Databases and Web Servers 98
  CAZy Database 98
  CAT and dbCAN 101
  FOLy Database 102
  Purdue Cell Wall Genomics and UC-Riverside Cell Wall Navigator Databases 102
  Plant Coexpression Network Databases: PlaNet and ATTED 102
Future Perspectives 103
References 103

PLANT BIOMASS

The major components of plant biomass are carbohydrate-rich cell walls, composed of different biopolymers such as polysaccharides and lignins as well as some minor structural wall glycoproteins (Somerville et al., 2004). Biomass used for biofuel production is primarily derived from secondary cell walls. For example, wood cells from poplar trees contain a thin layer of primary cell wall and multiple layers of much thicker and tougher secondary cell wall. Cells of all plant tissues have primary cell walls, whereas secondary cell walls appear only in developed cells that have stopped growing (Cosgrove, 2005). The chemical compositions of primary and secondary cell walls differ significantly (Mohnen et al., 2008). The primary cell wall contains no lignins, and its polysaccharides include celluloses, hemicelluloses (primarily xyloglucans and mannans in dicots and xylans in monocots) and pectins. Secondary cell walls, in contrast, contain a higher percentage of celluloses, different hemicellulosic polysaccharides (primarily xylans in both dicots and monocots) and lignins. For example, wood secondary cell walls contain 35–50% celluloses, 25–30% hemicelluloses (mostly xylans) and 15–30% lignins (Himmel et al., 2007).

Note that, unlike celluloses, hemicelluloses and pectins each refer to a collection of complex polysaccharides, mostly with side chains. Hemicelluloses comprise four major groups (xyloglucans, mannans, xylans and mixed-linkage glucans), while pectins comprise three major groups (galacturonans, rhamnogalacturonan I and rhamnogalacturonan II). None of these groups refers to a single type of polysaccharide; each usually covers polysaccharides that share the same backbone structure (sugars and linkages) but carry different side chains. Together with celluloses and lignins, these polysaccharides form very complex and heterogeneous cell wall structures. In particular, celluloses are wrapped by hemicelluloses and buried in a lignin network, which makes them inaccessible to enzymes, so the degradation efficiency is very low unless a costly chemical pretreatment is applied.

Although cellulose is a simple polymer of glucose linked by β-1,4-glucosidic bonds, the chemical composition of hemicelluloses and pectins is remarkably complex (Somerville et al., 2004), for three reasons: (1) 14 different monosaccharides (sugar units) are found in hemicelluloses and pectins (Pauly and Keegstra, 2008b); (2) the possible glycosidic linkages between two sugars are extremely diverse, as in theory two sugars can be connected through any of their hydroxyl groups; and (3) sugars in the polysaccharides can be further modified by, e.g. methylation, acetylation or esterification.

Bioenergy Research: Advances and Applications 95
http://dx.doi.org/10.1016/B978-0-444-59561-4.00006-1 Copyright © 2014 Elsevier B.V. All rights reserved.

Lignins, however, are complex heterogeneous phenolic polymers that are chemically very distinct from polysaccharides. They are formed from three major monomers, the hydroxyphenyl (H), guaiacyl (G) and syringyl (S) units, which are derived from coumaryl alcohol, coniferyl alcohol and sinapyl alcohol, respectively (Boerjan et al., 2003). All the biopolymers in plant cell walls are cross-linked and interwoven (Somerville et al., 2004; Himmel et al., 2007) to form very complex and heterogeneous structures, which are believed to make cell walls recalcitrant to enzymatic degradation.

In addition, cell wall compositions and structures vary from tissue to tissue, because the cell wall biosynthetic enzymes are differentially regulated and expressed in different tissues. Furthermore, different plants, especially those from distant evolutionary clades, have very distinct cell wall biopolymer compositions. For example, grasses generally have a significantly higher percentage of xylans than trees (Pauly and Keegstra, 2008a).

For biofuel production, polysaccharides, especially celluloses, are favored because their degradation releases fermentable sugars. Lignins, being phenolic polymers, do not give rise to sugars and are meant to be removed during biofuel production. To develop transgenic plants with modified cell wall compositions (i.e. higher cellulose and lower lignin content), we need a better understanding of how plant cell wall polysaccharides and lignins are synthesized.

BIOENERGY-RELATED ENZYMES AND REGULATION

Because of the overwhelming complexity of cell wall polymers in terms of chemical composition, linkages and structure, plant biomass formation and microbial degradation involve a surprisingly large number of genes in plants and microbes, respectively. For example, presumably every different glycosidic bond in the polysaccharides is formed by a different enzyme. It is estimated that about 10% of the genes in the Arabidopsis genome are involved in cell wall synthesis and modification (Yong et al., 2005). These roughly 2000 genes encode enzymes for sugar and lignin precursor synthesis, polysaccharide and lignin synthesis and modification, and lignin-polysaccharide cross-linking, as well as transcription factors (TFs), signaling proteins, etc.

The most important enzymes are clearly those involved in polysaccharide and lignin synthesis. To form polysaccharides, glycosyltransferases (GTs) use activated sugar donors, nucleoside diphosphate sugars (NDP-sugars), as substrates to build glycosidic bonds between two sugars. Except for celluloses, cell wall polysaccharides are mostly synthesized in the Golgi apparatus, where GTs, NDP-sugar biosynthetic enzymes and sugar transporters are located and work together. Glycoside hydrolases (GHs), on the other hand, break glycosidic bonds through hydrolysis reactions to release sugars from polysaccharides. In plants, GHs are often used to modify existing polysaccharides, e.g. when plant cells are growing, whereas in microbes GHs are the most critical enzymes for degrading plant biomass. Not all GH and GT enzymes are involved in cell wall polysaccharide metabolism; many of them are involved in the metabolism of storage polysaccharides, glycoproteins, glycolipids and other glycoconjugates that are not relevant to plant cell walls.

All GT and GH enzymes are categorized by the CAZy (Carbohydrate-Active enZymes) database (CAZyDB) (Cantarel et al., 2009), which provides a general classification scheme for all carbohydrate-active enzymes (CAZymes) and is widely accepted by the carbohydrate research community. So far, only a limited number of enzymes have been biochemically or genetically characterized as being involved in plant cell wall synthesis or modification, and many of them belong to large GH and GT families. For example, the GT2 family is known to include cellulose synthases and some hemicellulose backbone synthases (Lerouxel et al., 2006), such as mannan synthases (Dhugga et al., 2004; Liepman et al., 2005), putative xyloglucan synthases (Cocuron et al., 2007) and mixed-linkage glucan synthases (Burton et al., 2006). For the synthesis of xylan, the most abundant hemicellulose, proteins of GT43, GT47 and GT8 are likely to be involved (Zhong et al., 2005; Brown et al., 2007; Lee et al., 2007; Pena et al., 2007; Persson et al., 2007; York and O'Neill, 2008; Brown et al., 2009; Wu et al., 2009).

Some of these known cell wall-related CAZyme families are included in Purdue's Cell Wall Genomics database (Yong et al., 2005) and UC-Riverside's (UCR) Cell Wall Navigator database (Girke et al., 2004), while more families are discussed in the literature or remain to be characterized for their roles in biomass-related polysaccharide formation and degradation. For example, a few recent papers (Scheller and Ulvskov, 2010; Driouich et al., 2012) and a book (Ulvskov, 2011) updated our knowledge of the GT families involved in cell wall synthesis: GT2, 8, 31, 34, 37, 43, 47, 61, 64, 75 and 77. There are likely more GT families yet to be identified as cell wall related (CWR), e.g. GT92 (Liwanag et al., 2012).
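The GT families named above are written as CAZy family labels: a class prefix followed by a sequential number. As a minimal sketch, assuming labels of the form `GT2` or `GH5_4` (the `_N` subfamily suffix is our assumption about the notation), the following Python code parses such labels and flags the GT families that this section lists as cell wall related. The names `parse_family`, `is_cell_wall_gt`, `CazyFamily` and `CELL_WALL_GT` are hypothetical illustrations, not part of any CAZyDB or dbCAN interface, and the pattern accepts only the five classes described in this chapter.

```python
import re
from typing import NamedTuple, Optional

# Assumed label format: one of the five CAZy classes described in this chapter
# (GH, GT, PL, CE, CBM), a family number, and an optional "_N" subfamily suffix.
_FAMILY_LABEL = re.compile(r"^(GH|GT|PL|CE|CBM)(\d+)(?:_(\d+))?$")

# GT families named in the text as cell wall related (CWR); GT92 is the
# more recently proposed example (Liwanag et al., 2012).
CELL_WALL_GT = {2, 8, 31, 34, 37, 43, 47, 61, 64, 75, 77, 92}


class CazyFamily(NamedTuple):
    cazy_class: str                   # e.g. "GT"
    family: int                       # sequential number; implies no activity
    subfamily: Optional[int] = None   # e.g. 4 in the hypothetical "GH5_4"

    def label(self) -> str:
        """Rebuild the textual label, e.g. 'GT2' or 'GH5_4'."""
        base = f"{self.cazy_class}{self.family}"
        return base if self.subfamily is None else f"{base}_{self.subfamily}"


def parse_family(label: str) -> CazyFamily:
    """Split a family label into class, family number and optional subfamily."""
    match = _FAMILY_LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a recognised CAZy family label: {label!r}")
    cazy_class, family, subfamily = match.groups()
    return CazyFamily(cazy_class, int(family), int(subfamily) if subfamily else None)


def is_cell_wall_gt(label: str) -> bool:
    """True if the label names one of the GT families listed as CWR in the text."""
    fam = parse_family(label)
    return fam.cazy_class == "GT" and fam.family in CELL_WALL_GT


if __name__ == "__main__":
    labels = ["GT2", "GT43", "GH5_4", "CE8", "GT1"]
    print([lab for lab in labels if is_cell_wall_gt(lab)])  # ['GT2', 'GT43']
```

Because the family number is only a sequential identifier, a match here merely flags a candidate CWR protein; it says nothing about the protein's actual biochemical activity.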
Lignins are complex heterogeneous polymers rich in aromatic rings. The monolignol synthesis pathway, which starts from phenylalanine to synthesize the G, S and H units, is relatively well understood: about 10 characterized gene families encode most of the enzymes in the pathway (Humphreys and Chapple, 2002; Boerjan et al., 2003; Vanholme et al., 2008; Xu et al., 2009; Zhong and Ye, 2009; Li and Chapple, 2010; Weng and Chapple, 2010; Carpita, 2012). All these lignin synthesis-related enzymes have been extensively reviewed in the literature and are included in Purdue's Cell Wall Genomics database. How the units are transported to the outside of the cell and assembled into lignin polymers is less understood, but some candidate transporters and two major enzyme families, peroxidases and laccases, have been suggested (McCaig et al., 2005; Liu et al., 2011; Zhang et al., 2011; Alejandro et al., 2012; Carpita, 2012; Handford et al., 2012; Sibout and Hofte, 2012).

As with all other metabolic pathways, biomass formation and degradation are under strict regulation. However, regulatory mechanisms are even more difficult to elucidate than enzymatic activities. In plants, only a handful of TFs are known to regulate cell wall synthesis. The most studied process is the regulation of lignin biosynthesis (Zhong and Ye, 2009; Zhong et al., 2010; Zhao and Dixon, 2011; Wang and Dixon, 2012). Members of the NAC, WRKY and MYB TF families, among a few others, have been shown to directly or indirectly control monolignol synthesis. Some of these TFs are global regulators of the entire secondary cell wall synthesis program, including the synthesis of celluloses and xylans, suggesting that the different biopolymers in biomass are synthesized not independently but in a coordinated way. Moreover, genetically modifying the regulation of cell wall biosynthesis represents a very promising way to improve the desired traits of bioenergy crops. For example, Wang et al. showed that a mutation in a WRKY TF could rewire the regulatory network of secondary cell wall synthesis and increase biomass production by 50% in Arabidopsis (Wang et al., 2010). Similarly, micro ribonucleic acids (miRNAs) are excellent targets for controlling cell wall synthesis (Fu et al., 2012), although they are less discussed in the literature. Clearly, identifying novel transcriptional regulators, both TFs and miRNAs, and building the regulatory network of cell wall synthesis is the ultimate goal for elucidating the mechanism of biomass formation.

Recently, a few plant journals have published special issues on plant cell wall research: Plant Physiology (McCann and Rose, 2010), Current Opinion in Plant Biology (Pauly and Keegstra, 2008b), Frontiers in Plant Science (Debolt and Estevez, 2012) and Molecular Plant (which has a cell wall biology category). In particular, a number of review articles published in these special issues and a few book chapters (Table 6.1) give overviews of the latest progress in specific areas of cell wall research and are very useful for pointing to the original research papers that report the characterization of specific CWR genes.

TABLE 6.1 Selected Publications for CWR Genes

| Category | Publications |
|---|---|
| Lignins | (Humphreys and Chapple, 2002; Boerjan et al., 2003; Vanholme et al., 2008; Xu et al., 2009; Zhong and Ye, 2009; Li and Chapple, 2010; Weng and Chapple, 2010; Carpita, 2012) |
| Celluloses | (Somerville, 2006; Joshi and Mansfield, 2007; Gu and Somerville, 2010; Endler and Persson, 2011; Lei et al., 2012) |
| Xylans | (York and O'Neill, 2008; Carpita, 2012; Doering et al., 2012) |
| Hemicelluloses | (Sandhu et al., 2009; Scheller and Ulvskov, 2010; Carpita, 2012; Driouich et al., 2012) |
| Pectins | (Mohnen, 2008; Harholt et al., 2010; Driouich et al., 2012) |
| NDP-sugars | (Reiter and Vanzin, 2001; Seifert, 2004; Reiter, 2008; Bar-Peled and O'Neill, 2011) |
| TFs* | (Zhong and Ye, 2009; Zhong et al., 2010; Zhao and Dixon, 2011; Carpita, 2012; Wang and Dixon, 2012) |
| Transporters | (Liu et al., 2011; Zhang et al., 2011; Alejandro et al., 2012; Carpita, 2012; Handford et al., 2012; Sibout and Hofte, 2012) |
| miRNAs | (Sun; Sun et al.; Fu et al., 2012) |
| GTs* | (Yokoyama and Nishitani, 2004; Scheller and Ulvskov, 2010; Driouich et al., 2012; Harholt et al., 2012) |
| Other CAZymes | (Yokoyama and Nishitani, 2004; Minic, 2008) |
| DUF* families | (Carpita, 2012; Hansen et al., 2012; Harholt et al., 2012) |
| Bioinformatics | (Egelund et al., 2004; Yong et al., 2005; Penning et al., 2009; Michel et al., 2010) |

* TF, transcription factor; GT, glycosyltransferase; DUF, domain of unknown function.

In terms of degradation, cell wall polysaccharides are degraded by microbial GHs and other CAZymes that are defined and categorized in CAZyDB. Lignins are also mostly degraded by microbes, particularly by certain fungi (Dashtban et al., 2009). Enzymes involved in lignin degradation include fungal laccases and peroxidases, which are categorized in the FOLy (Fungal Oxidative Lignin enzymes) database (Levasseur et al., 2008). Note that these two families are not restricted to fungi. Instead, both belong to large protein families with many homologs in various organisms such as plants, animals and bacteria, which have slightly different biochemical activities (Welinder, 1992). As mentioned above, these enzyme families are also used for lignin polymerization
in plants. There is also increasing evidence that such enzymes are used for lignin degradation in bacteria (Claus, 2003; Li et al., 2009; Bugg et al., 2011a, 2011b).

Notably, many cell wall biosynthesis-related gene families also have their roots in bacteria (Royo et al., 2000; Nobles and Brown, 2004; Emiliani et al., 2009; Yin et al., 2009; Weng and Chapple, 2010; Yin et al., 2010, 2011; Popper et al., 2011). In other words, although carbohydrate- and lignin-rich cell walls are almost unique to plants, the biosynthetic machinery evolved from ancient gene families that were already present in early prokaryotes. On the degradation side, microbes are responsible for breaking down biomass, while plants also contain homologs of many microbial degrading enzymes such as GHs and peroxidases. Plants evidently inherited these enzymes for different purposes: to modify existing polysaccharides or to complete the lignification process.

As in plants, less is known about the regulation of polysaccharide- and lignin-degrading enzymes in microbes than about the enzymes themselves. Unlike plants, the microbes involved in biomass degradation are taxonomically diverse, spanning from eukaryotic fungi (e.g. Neurospora crassa) to prokaryotic bacteria (e.g. Clostridium thermocellum). As a result, the regulatory systems of these divergent organisms are often poorly conserved; for example, many of the TFs found in fungi are not present in bacteria and vice versa. Furthermore, numerous model microbes are used for bioenergy research, and the transcriptional regulators of cellulases, hemicellulases and ligninases are highly dispersed in the literature (e.g. Aro et al., 2005; Portnoy et al., 2011; Coradetti et al., 2012; Sun et al., 2012). All of this makes the curation and annotation of the regulators and their target cis-elements very difficult. Recently, global gene expression data (e.g. microarray) and other omics data have been generated to help study the regulation of biomass degradation (Nataf et al., 2010; Raman et al., 2011; Riederer et al., 2011; Yang et al., 2012), which represents the future trend of understanding biofuel production at the systems biology level. As with the regulators of cell wall synthesis, there is a lack of web-based bioinformatics databases that include the regulatory genes for bioenergy-related degradation enzymes.

DATABASES AND WEB SERVERS

Our knowledge of cell wall biosynthesis and biodegradation has steadily increased over the past decades, although there is still a long way to go to fully understand these extremely complex processes. So far our knowledge is at best fragmented, and the characterized genes often come from various organisms and are dispersed in the literature. Therefore, dedicated, centralized and frequently updated databases of bioenergy-related genes are crucial to guide the development of transgenic biofuel crops and the annotation of newly sequenced genomes and metagenomes in search of novel enzymes.

Like other areas of biological research, bioenergy research has benefited from bioinformatics. Table 6.2 lists bioinformatics databases and web-based resources developed specifically for bioenergy-related enzymes, as well as some general bioinformatics web resources that are valuable for bioenergy research. Here we summarize a few selected resources that are particularly useful.

TABLE 6.2 Bioenergy-Related Databases

| Category | Database | Description | References |
|---|---|---|---|
| General | CAZyDB | General carbohydrate-active enzyme database; a classification scheme with five classes (GH, GT, CE, PL and CBM) and over 300 families; supported by the biochemical literature; links to proteins in GenBank and UniProt and to structures in PDB; subfamilies for selected families; Enzyme Commission annotation and biochemically curated function annotation | (Cantarel et al., 2009) |
| | CAT | CAZyme Analysis Toolkit; allows CAZyme annotation of user-submitted data using BLAST (Basic Local Alignment Search Tool) and Pfam-based searches | (Park et al., 2010) |
| | dbCAN | Web server for automated CAZyme annotation; accepts predicted protein sequences from newly sequenced genomes and metagenomes and returns a table and a graphical diagram of the matched CAZyme domains; users can also download HMMs representing all CAZyme domains to perform annotation locally by running HMMER3 (hmmer.org) | (Yin et al., 2012) |
| Plant biomass formation | Survey of databases for cell wall synthesis | A paper reviewing nine public databases | (Cao et al., 2010) |
| | XTH World | Xyloglucosyl transferase gene nomenclature; gene structure and literature in Arabidopsis, rice and tomato | |
| | MAIZEWALL | Cell wall gene catalogue, expression data and literature in maize | (Guillaumie et al., 2007) |
| | coreCarb | PlaNet family tool; Arabidopsis CAZyme proteins; sequence, expression, regulon (coexpressed genes) and phylogeny data | (Mutwil et al., 2009) |
| | GolgiP | Web server for prediction of Golgi-localized proteins | (Chou et al., 2010) |
| | Purdue cell wall genomics | General classification scheme for plant cell wall synthesis, with mutant and spectrotype information; also includes lignin synthesis and modification genes, NDP-sugar synthesis genes, signaling proteins, etc.; phylogeny of gene families in Arabidopsis, rice, maize and sorghum; literature | (Yong et al., 2005) |
| | UC-Riverside cell wall navigator | Classification scheme for plant cell wall synthesis proteins similar to that of the Purdue cell wall database, but without lignin-related and signaling proteins; includes sequence, literature and microarray expression data; primarily for Arabidopsis and rice, but also general proteins from UniProt | (Girke et al., 2004) |
| | Stanford cellulose | Designed for the CesA (cellulose synthase) superfamily and homologs; deprecated web site | (Richmond and Somerville, 2000) |
| | Rice GT | Part of the rice phylogenomics database; rice GT protein phylogeny, sequences, expression, mutants, orthologs, BLAST | (Cao et al., 2008) |
| | Csl families | Web site supplemental to Yin et al. (2009); protein sequences, alignments and phylogeny of the CesA superfamily in fully sequenced plant and algal genomes | (Yin et al., 2009) |
| | pDAWG | CAZymes in fully sequenced plant and algal genomes based on searches against CAZyDB; phylogeny, predicted subcellular localization and protein-protein interaction data; BLAST | (Mao et al., 2009) |
| | PPI for cell wall | Protein-protein interaction graphs for cell wall-related proteins in Arabidopsis | (Zhou et al., 2010a) |
| | PlaNet | General coexpression database for seven plant species, with comparisons among them; highest reciprocal rank-based coexpression and clustering using the Heuristic Cluster Chiseling Algorithm (HCCA; Mutwil et al.); gene-centered display of the coexpression network for a queried gene | (Mutwil et al., 2011) |
| | Cell wall coexpression database | Biclustering coexpression analysis of cell wall-related genes from the Purdue cell wall genomics database; coexpression modules and graphs generated using Cytoscape (Shannon et al., 2003); cis-regulatory elements identified in promoter regions of genes in the same module | (Wang et al., 2012) |
| | ATTED | General coexpression database with predicted cis-regulatory elements for Arabidopsis and rice; mutual rank-based; also identifies conserved coexpression links and refers to protein-protein interaction data | (Obayashi et al., 2009) |
| Biomass degradation | GAS db | Glycosyl hydrolase AnnotationS (GAS) database; GH data identified from UniProt and JGI metagenomes based on CAZyDB and Pfam searches; features graphical domain diagrams and comparison between two selected bacteria | (Zhou et al., 2010b) |
| | FOLyDB | Fungal enzymes for lignin degradation; 10 families of lignin oxidases and auxiliary enzymes; proteins from GenBank, UniProt and PDB | (Levasseur et al., 2008) |
| | PeroxiBase | General peroxidase database including peroxidases (EC 1.11.1.x) from over 1000 organisms; lignin-related peroxidases are a subset of the database | (Fawal et al., 2012) |
| | LccED | General database of laccases and their homologs in the multicopper oxidase superfamily | (Sirim et al., 2011) |
| Misc | Biofuel feedstock genomic resource (BFGR) | Database of 54 plant species with sequenced genomes or a significant amount of EST (expressed sequence tag) data; integrated data including expression, orthologs and paralogs, pathway prediction and functional information | (Childs et al., 2012) |
| | BESC-KB | Knowledgebase of the DOE Bioenergy Science Center; a web portal to a number of computational tools and databases dedicated to bioenergy research and developed within the center | (Syed et al., 2012) |
| | Pathway-genome database of poplar | Populus trichocarpa metabolic pathways generated automatically with Pathway Tools; the NDP-sugar biosynthetic pathways were manually curated by experts | (Nag et al., 2012) |
| | JGI IMG/M | Joint Genome Institute's integrated microbial genomes and metagenomes web site | (Markowitz et al., 2012) |
| | Phytozome | JGI's plant genome web site; currently hosts most sequenced plant genomes | (Goodstein et al., 2012) |

CAZy Database

For bioenergy research, CAZymes are obviously the most important enzymes. Since the early 1990s, the CAZyDB team has been classifying CAZyme proteins from GenBank, UniProt and PDB into protein families and annotating them. CAZyDB is the original database that defined over 300 CAZyme protein families over the past 20 years, and it is the most comprehensive such database, providing high-quality manual annotation based on knowledge extracted from the literature (Cantarel et al., 2009). Its Web site is updated every few weeks, mainly by assigning new proteins from public databases to existing CAZyme families through sequence similarity searches, or by creating new families when newly characterized CAZyme proteins (supported by published papers) do not belong to any existing family. The functional annotation of some families (e.g. known activities) is also updated when relevant literature appears.

Currently the database comprises five classes of protein families: in addition to GHs and GTs, there are carbohydrate esterases (CEs), polysaccharide lyases (PLs) and carbohydrate-binding modules (CBMs). As mentioned above, GTs build polysaccharides and glycoconjugates, while GHs break them down. CEs and PLs also act on carbohydrate molecules, but through different mechanisms or on different chemical bonds. CBMs, as their name indicates, are structural modules that recognize and bind different sugars. Currently the five classes contain 94 GT families, 131 GH families, 16 CE families, 22 PL families and 64 CBM families. Each class also has an unclassified family for proteins that are annotated as belonging to the class but cannot be assigned to any existing family in that class. Each family is named with the class name followed by a sequential number, e.g. GT2. Note that the name does not indicate any biochemical activity of the family, because families are defined solely by sequence similarity: in many cases one family contains proteins characterized with different biochemical activities. Recent efforts from the CAZyDB team suggest that further classification of families into subfamilies could be useful, as a subfamily may contain proteins with the same activity (Stam et al., 2006; Lombard et al., 2010; Aspeborg et al., 2012).

CAZyDB's annotation has also evolved over the past 20 years. Among the 327 CAZyme families as of December 2012, 10 are deprecated: they were created during the lifetime of CAZyDB but later deleted because they were shown to be unrelated to carbohydrate metabolism or for other reasons. To keep the existing nomenclature unchanged, these family names remain in the system but are marked as deleted on their Web pages. Other examples include the CE10 family, whose Web page has not been updated since 2002 because, after the family was created, most CE10 members were shown not to take carbohydrates as substrates; and CBM33, which was thought to be a carbohydrate-binding module but was later shown likely to be an oxidase family.

For a decade, CAZyDB has provided an HTML page for each family listing its member proteins and associated functional information. In recent updates, CAZyDB added a Web page for each genome, listing the GenBank protein accession numbers of that genome together with the CAZyme family assignment of each protein; this set is termed the "CAZyome" of an organism. So far, almost 2400 genomes, spanning eukaryotes, prokaryotes and viruses, have been annotated in CAZyDB. Such genome-scale annotation of CAZyme proteins is reportedly done semiautomatically (Coutinho and
Henrissat, 2011). An automated domain module-based search is performed first on the back end, followed by manual curation to remove false positives and recover false negatives. This process is accurate and of high quality but time consuming, because it is done manually and requires expert knowledge. Indeed, such genome annotation can only be done in collaboration with the CAZyDB team, which is often invisible to, and outside the control of, users such as the people who sequenced the genome. Over the past 10 years, the CAZyDB team has performed expert CAZyme annotation for dozens of genome sequencing projects, leading to many collaborative genome annotation papers.

CAT and dbCAN

With more and more bioenergy-related plant and microbial genomes as well as environmental metagenomes being sequenced, there is an urgent need for automated CAZyme annotation. Although such annotation will not be as accurate as the expert annotation from CAZyDB, it is expected to be much faster, and users can control the annotation as they wish. Moreover, all newly sequenced genomes now rely on generic protein domain/family databases such as Pfam (Finn et al., 2006), InterPro (Hunter et al., 2009) and the Conserved Domain Database (CDD; Marchler-Bauer et al., 2009) for automatic genome annotation. Annotation from these databases is often too general and far from the exact function; the precise function still needs to be determined experimentally. However, most genome annotators are still interested in such genome-scale annotation, as it gives them a quick summary of what the genome encodes, how large the gene families are and how these compare with other genomes.

In fact, even CAZyDB's manual annotation (assigning proteins to existing CAZyme families) of newly sequenced genomes is unlikely to be 100% correct. Because every new genome contains a high percentage of proteins that have not been experimentally studied, manual curation still relies largely on additional bioinformatics analyses, such as BLAST searches against public protein sequence databases (e.g. UniProt; Bairoch et al., 2005) and domain databases (e.g. Pfam), followed by inspection of the top matches.

With this in mind, automated CAZyme annotation remains very useful, particularly for a quick, general overview of how many and which CAZymes a newly sequenced genome encodes. Using the annotated CAZyme proteins and the classification scheme of CAZyDB as the foundation, two bioinformatics resources published since 2010 support automated CAZyme annotation of a protein sequence dataset predicted from a genome or metagenome. The CAZyme Analysis Toolkit (CAT) (Park et al., 2010) allows a BLAST search against CAZyme proteins annotated by CAZyDB as well as a Pfam domain-based search. The simple BLAST search cannot accurately annotate the prevalent multidomain CAZyme proteins; the Pfam domain-based search solves this problem. These Pfam domains are either given by CAZyDB on the CAZyme family Web pages or identified as corresponding to a CAZyme family by an association rule built by CAT. However, CAZyDB links only 142 (46%) of the over 300 CAZyme families to Pfam domains. In fact, many of the Pfam domains were originally created after CAZyDB.

We recently developed dbCAN (Yin et al., 2012) to define a signature domain model for every CAZyme family. In addition to the 142 CAZyme families annotated with a Pfam domain, we associated other CAZyme families with functional domains in CDD, a broader, general protein domain database. In this way we found a CDD domain for 248 CAZyme families. For the remaining families, we performed literature curation by reading the relevant biochemical papers linked to these families by CAZyDB. Finally, we extracted the domain regions from all member proteins annotated in CAZyDB and built a multiple sequence alignment (MSA) for each CAZyme family. These MSAs were further processed and represented as hidden Markov models (HMMs), statistical models widely used in bioinformatics (e.g. by Pfam) to represent protein sequence alignments.

As of June 2012, dbCAN has 320 HMMs representing 317 CAZyme families and three cellulosome modules. All these HMMs are freely available, so that users can run hmmscan, the domain-based search tool of the HMMER 3.0 package (hmmer.org), to annotate their genomes or metagenomes on a local computer, exactly as people perform Pfam, InterPro or CDD annotation. For users who do not know how to run hmmscan on a Linux PC, we offer a web server (http://csbl.bmb.uga.edu/dbCAN/annotate.php) where sequences can be submitted for annotation. The 320 CAZyme family-specific HMMs are our key contribution to the carbohydrate research community and ideally should be included in general protein domain/family databases such as Pfam in the future.

In addition to the web server, dbCAN provides a database that displays precomputed CAZyme homologs found in a number of protein databases. Specifically, starting from the 320 dbCAN HMMs, we scanned public metagenome datasets such as NCBI-env-nr, CAMERA (Seshadri et al., 2007), JGI metagenomes (Markowitz et al., 2012), human gut metagenomes (MetaHIT) (Qin et al., 2010) and cow rumen metagenomes (Hess et al., 2011), as well as plant (Goodstein et al., 2012), bacterial and fungal genomes. Tests on
Arabidopsis thaliana (plant) and C. thermocellum (bacterium), using CAZyDB as the positive set, suggest that the automated CAZyme annotation is fairly accurate (A. thaliana: sensitivity = 96.3%, precision = 78.8% and average = 87.6%; C. thermocellum: sensitivity = 99.3% and precision = 89.4%). In particular, sensitivity exceeds 95% for both organisms, meaning that dbCAN annotation rarely misses true CAZyme proteins.

FOLy Database

Inspired by CAZyDB, Levasseur et al. developed a new database named FOLy for the classification of ligninases in fungi (Levasseur et al., 2008). These enzymes are critical for breaking down the lignins in biomass but are not included in CAZyDB. Like CAZyDB, FOLyDB starts from biochemically characterized proteins or structures and recruits their homologs from the GenBank, UniProt and PDB databases. Based on sequence similarity, three lignin oxidase families and seven lignin-degrading auxiliary enzyme families were created, each containing biochemically characterized proteins together with their sequence homologs. FOLyDB likewise relies on expert manual curation of the continually growing literature, adding newly characterized proteins to create new families and populate the database. Like CAZyDB, it is not designed for automated genome annotation, but BLAST and Pfam domain-based searches against the annotated proteins in FOLyDB have been widely used to identify ligninases in newly sequenced genomes.

Purdue Cell Wall Genomics and UC-Riverside Cell Wall Navigator Databases

Unlike the general protein family databases above, Purdue's Cell Wall Genomics (Yong et al., 2005) and UCR's Cell Wall Navigator (Girke et al., 2004) databases are specifically designed for plant cell wall biosynthesis. Rather than using sequence similarity-based classification, both databases categorize proteins by their physiological roles in cell wall synthesis. UCR's database has five categories: monosaccharide synthesis, polysaccharide synthesis, reassembly, structural proteins and glycoprotein synthesis, all essentially centered on the carbohydrate molecules of the cell wall. Purdue's database is even broader, with six categories: pathways for substrate generation; polysaccharide synthases; secretion and targeting pathways; assembly/architecture and growth; differentiation and secondary wall formation; and signaling and response pathways. In particular, Purdue's database also includes lignin synthesis and polymerization proteins as well as signaling proteins involved in cell wall synthesis. Both databases also provide the references and literature associated with each category to support their classification. Proteins in the two databases come primarily from model organisms such as Arabidopsis and rice. However, it is easy to use these sequences as queries to search for their homologs and even orthologs in other plants. The protein accessions and classification compiled by the two databases provide an excellent overview of what the entire plant cell wall community has achieved in understanding cell wall synthesis.

Plant Coexpression Network Databases: PlaNet and ATTED

As we mentioned earlier, the regulation of biomass formation and degradation is also extremely important. Studies of this regulation have benefited greatly from coexpression analysis of microarray data and, more recently, of high-throughput RNA sequencing data. Numerous tools, Web-based or stand-alone, allow coexpression analysis either with user-submitted genes as queries or by general browsing. Many online tools even offer prebuilt coexpression networks, in which nodes represent genes and edges represent coexpression relationships. These coexpression networks are very informative, as they suggest candidate genes involved in the same metabolic pathways as the user's genes of interest, or potential regulators of those genes (Ruprecht and Persson, 2012). Among the many such tools available, the PlaNet (Mutwil et al., 2011) family of tools (coreCarb (Mutwil et al., 2009), AraNet, GeneCat) stands out in the plant cell wall field: these tools were developed by cell wall researchers, have Web-based interfaces and have been shown to be effective in suggesting new genes for cell wall synthesis (Mutwil et al., 2009; Ruprecht et al., 2011).

PlaNet places query genes in network graphs at three levels: (1) the node vicinity network of coexpressed genes, containing the query gene, the genes within two coexpression steps of it and the links among them; (2) a larger coexpression cluster containing the query gene and its coexpressed genes, produced by a heuristic clustering algorithm; and (3) the top-level meta-network, in which nodes represent whole coexpression clusters instead of individual genes. PlaNet also has a module called NetworkComparer that allows comparative analysis of gene expression networks across seven plant species. Such comparative coexpression analysis has recently become very popular, as it can help compensate for missing data in a single species, reduce false-positive coexpression calls from a single species, and make it possible to study the conservation of coexpression networks from an evolutionary perspective (Movahedi et al., 2012).

An earlier tool, ATTED-II (Obayashi et al., 2009), is also well known; it has been developed by plant biologists since 2003 as a database for Arabidopsis tissue-specific
(ATTED) expression. Compared with PlaNet, ATTED-II provides richer annotation and many useful links to external resources on the query gene page. Its network graphs are also easier to read: they contain fewer genes (only a fixed number of top-ranked coexpressed genes, for better visualization), show gene function information (biologically meaningful gene names) and label TFs. In addition, ATTED-II provides predicted cis-regulatory elements in the upstream regions of coexpressed genes. However, ATTED-II currently includes only two plants, Arabidopsis and rice, and the gene locus page is available only for Arabidopsis.

FUTURE PERSPECTIVES

As a perspective for the future development of bioenergy-related databases, we ask: what do we need from newly developed databases? Nucleic Acids Research has published a prestigious annual Database Special Issue for the past 20 years. Most of the databases published there that target a particular class of proteins, such as plant TFs (Guo et al., 2008), peroxidases (Fawal et al., 2012), transporters (Saier et al., 2006) and peptidases (Rawlings et al., 2012), provide the following data or functionalities: (1) a general classification of the targeted protein families, manually collected references, and a list of characterized proteins curated from the literature and/or predicted member proteins; (2) secondary data derived from further in-depth bioinformatics analysis, such as computer-based functional annotation (e.g. Gene Ontology or protein domain annotation), sequence alignments, phylogenetic trees and predicted protein structures; (3) simple Web-based BLAST search against the sequence database and keyword text search; (4) long-term maintenance with regular updates of new data; and (5) plenty of documentation such as help, FAQ or tutorial pages. These could be considered criteria for a good protein family database.

Although the plant biomass formation-related databases listed in Table 6.2 are all very useful, none of them sufficiently integrates the various functional omics data. Biologists working on one model plant often want to use these data to study their genes of interest, e.g. searching fully sequenced plant genomes for orthologs, mining transcriptome data (microarray and RNA-seq) for expression profiles, or finding coexpressed genes and scanning their upstream regions for candidate cis-regulatory motifs. At present, all these analyses have to be done with individual bioinformatics tools or servers, which often require expert knowledge to run and to interpret. In addition, many of the databases are outdated, and none of them includes all CWR genes. For example, Purdue's database is an excellent resource, but it does not include many of the newly characterized CWR genes, such as members of the NAC, WRKY and MYB TF families shown to control lignin synthesis; many newly characterized CAZyme families such as GT43, GT61 and GT75; transporters for NDP-sugars and monolignols; miRNAs; and DUF (domain of unknown function) families. It also offers little annotation data and no search functionality.

Future plant CWR gene databases should therefore aim to include all experimentally characterized CWR genes from any organism, together with their sequences and functional descriptions collected from the published literature, e.g. those listed in Table 6.1. Such a list of characterized genes could be highly useful for annotating sequenced bioenergy plants such as switchgrass, poplar, maize, sorghum and Eucalyptus grandis. The CWR gene repertoires of these species will be highly valuable to the bioenergy research community, as researchers are trying to select candidate CWR genes to knock down or overexpress when developing transgenic lines of these plants. Gene families for CWR genes and other extensive secondary bioinformatics data should also be included in such databases, particularly phylogenies (used to distinguish orthologs among homologs), predicted cis-regulatory elements, conserved coexpression network modules of known CWR genes, expression profiles, coexpressed gene lists including noncoding RNAs, genomic locations, gene neighborhoods, epigenomic data, protein–protein interactions, structures, subcellular locations, single-nucleotide polymorphisms, indels, etc. Similar databases that include the above bioinformatics-derived data types should also be developed for plant CAZymes, because CAZyDB currently covers only 2 (A. thaliana and Oryza sativa) of the more than 40 sequenced plant and green algal genomes, not to mention the many incomplete genomes and transcriptomes (ESTs and RNA-seq data).

References

Alejandro, S., Lee, Y., Tohge, T., Sudre, D., Osorio, S., Park, J., Bovet, L., Lee, Y., Geldner, N., Fernie, A.R., Martinoia, E., 2012. AtABCG29 is a monolignol transporter involved in lignin biosynthesis. Curr. Biol. 22, 1207–1212.
Aro, N., Pakula, T., Penttila, M., 2005. Transcriptional regulation of plant cell wall degradation by filamentous fungi. FEMS Microbiol. Rev. 29, 719–739.
Aspeborg, H., Coutinho, P.M., Wang, Y., Brumer 3rd, H., Henrissat, B., 2012. Evolution, substrate specificity and subfamily classification of glycoside hydrolase family 5 (GH5). BMC Evol. Biol. 12, 186.
Bairoch, A., Apweiler, R., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., Martin, M.J., Natale, D.A., O'Donovan, C., Redaschi, N., Yeh, L.S., 2005. The universal protein resource (UniProt). Nucleic Acids Res. 33, D154–D159.
Bar-Peled, M., O'Neill, M.A., 2011. Plant nucleotide sugar formation, interconversion, and salvage by sugar recycling. Annu. Rev. Plant Biol.
Boerjan, W., Ralph, J., Baucher, M., 2003. Lignin biosynthesis. Annu. Rev. Plant Biol. 54, 519–546.
Brown, D.M., Goubet, F., Wong, V.W., Goodacre, R., Stephens, E., Dupree, P., Turner, S.R., 2007. Comparison of five xylan synthesis mutants reveals new insight into the mechanisms of xylan synthesis. Plant J. 52, 1154–1168.
Brown, D.M., Zhang, Z., Stephens, E., Dupree, P., Turner, S.R., 2009. Characterization of IRX10 and IRX10-like reveals an essential role in glucuronoxylan biosynthesis in Arabidopsis. Plant J. 57, 732–746.
Bugg, T.D., Ahmad, M., Hardiman, E.M., Rahmanpour, R., 2011a. Pathways for degradation of lignin in bacteria and fungi. Nat. Prod. Rep. 28, 1883–1896.
Bugg, T.D., Ahmad, M., Hardiman, E.M., Singh, R., 2011b. The emerging role for bacteria in lignin degradation and bio-product formation. Curr. Opin. Biotechnol. 22, 394–400.
Burton, R.A., Wilson, S.M., Hrmova, M., Harvey, A.J., Shirley, N.J., Stone, B.A., Newbigin, E.J., Bacic, A., Fincher, G.B., 2006. Cellulose synthase-like CslF genes mediate the synthesis of cell wall (1,3;1,4)-beta-D-glucans. Science 311, 1940–1942.
Cantarel, B.L., Coutinho, P.M., Rancurel, C., Bernard, T., Lombard, V., Henrissat, B., 2009. The carbohydrate-active enZymes database (CAZy): an expert resource for glycogenomics. Nucleic Acids Res. 37, D233–D238.
Cao, P.J., Bartley, L.E., Jung, K.H., Ronald, P.C., 2008. Construction of a rice glycosyltransferase phylogenomic database and identification of rice-diverged glycosyltransferases. Mol. Plant 1, 858–877.
Cao, P.J., Jung, K.H., Ronald, P.C., 2010. A survey of databases for analysis of plant cell wall-related enzymes. BioEnergy Res. 3, 108–114.
Carpita, N.C., 2012. Progress in the biological synthesis of the plant cell wall: new ideas for improving biomass for bioenergy. Curr. Opin. Biotechnol. 23, 330–337.
Childs, K.L., Konganti, K., Buell, C.R., 2012. The biofuel feedstock genomics resource: a web-based portal and database to enable functional genomics of plant biofuel feedstock species. Database (Oxford) bar061.
Chou, W.C., Yin, Y., Xu, Y., 2010. GolgiP: prediction of golgi-resident proteins in plants. Bioinformatics 26, 2464–2465.
Claus, H., 2003. Laccases and their occurrence in prokaryotes. Arch. Microbiol. 179, 145–150.
Cocuron, J.C., Lerouxel, O., Drakakaki, G., Alonso, A.P., Liepman, A.H., Keegstra, K., Raikhel, N., Wilkerson, C.G., 2007. A gene from the cellulose synthase-like C family encodes a beta-1,4 glucan synthase. Proc. Natl. Acad. Sci. USA 104, 8550–8555.
Coradetti, S.T., Craig, J.P., Xiong, Y., Shock, T., Tian, C., Glass, N.L., 2012. Conserved and essential transcription factors for cellulase gene expression in ascomycete fungi. Proc. Natl. Acad. Sci. USA 109, 7397–7402.
Cosgrove, D.J., 2005. Growth of the plant cell wall. Nat. Rev. Mol. Cell Biol. 6, 850–861.
Coutinho, P.M., Henrissat, B., 2011. Annotating carbohydrate-active enzymes in plant genomes: present challenges. In: Ulvskov, P. (Ed.), Annual Plant Reviews: Plant Polysaccharides, Biosynthesis and Bioengineering. Wiley-Blackwell, Oxford, UK, pp. 93–107.
Dashtban, M., Schraft, H., Qin, W., 2009. Fungal bioconversion of lignocellulosic residues; opportunities & perspectives. Int. J. Biol. Sci. 5, 578–595.
Debolt, S., Estevez, J.M., 2012. Current challenges in plant cell walls: editorial overview. Front. Plant Sci. 3, 232.
Dhugga, K.S., Barreiro, R., Whitten, B., Stecca, K., Hazebroek, J., Randhawa, G.S., Dolan, M., Kinney, A.J., Tomes, D., Nichols, S., Anderson, P., 2004. Guar seed beta-mannan synthase is a member of the cellulose synthase super gene family. Science 303, 363–366.
Doering, A., Lathe, R., Persson, S., 2012. An update on xylan synthesis. Mol. Plant 5, 769–771.
Driouich, A., Follet-Gueye, M.L., Bernard, S., Kousar, S., Chevalier, L., Vicre-Gibouin, M., Lerouxel, O., 2012. Golgi-mediated synthesis and secretion of matrix polysaccharides of the primary cell wall of higher plants. Front. Plant Sci. 3, 79.
Egelund, J., Skjot, M., Geshi, N., Ulvskov, P., Petersen, B.L., 2004. A complementary bioinformatics approach to identify potential plant cell wall glycosyltransferase-encoding genes. Plant Physiol. 136, 2609–2620.
Emiliani, G., Fondi, M., Fani, R., Gribaldo, S., 2009. A horizontal gene transfer at the origin of phenylpropanoid metabolism: a key adaptation of plants to land. Biol. Direct 4.
Endler, A., Persson, S., 2011. Cellulose synthases and synthesis in Arabidopsis. Mol. Plant 4, 199–211.
Fawal, N., Li, Q., Savelli, B., Brette, M., Passaia, G., Fabre, M., Mathe, C., Dunand, C., 2012. PeroxiBase: a database for large-scale evolutionary analysis of peroxidases. Nucleic Acids Res.
Finn, R.D., Mistry, J., Schuster-Bockler, B., Griffiths-Jones, S., Hollich, V., Lassmann, T., Moxon, S., Marshall, M., Khanna, A., Durbin, R., Eddy, S.R., Sonnhammer, E.L., Bateman, A., 2006. Pfam: clans, web tools and services. Nucleic Acids Res. 34, D247–D251.
Fu, C., Sunkar, R., Zhou, C., Shen, H., Zhang, J.Y., Matts, J., Wolf, J., Mann, D.G., Stewart Jr., C.N., Tang, Y., Wang, Z.Y., 2012. Overexpression of miR156 in switchgrass (Panicum virgatum L.) results in various morphological alterations and leads to improved biomass production. Plant Biotechnol. J. 10, 443–452.
Girke, T., Lauricha, J., Tran, H., Keegstra, K., Raikhel, N., 2004. The cell wall navigator database. A systems-based approach to organism-unrestricted mining of protein families involved in cell wall metabolism. Plant Physiol. 136, 3003–3008.
Goodstein, D.M., Shu, S., Howson, R., Neupane, R., Hayes, R.D., Fazo, J., Mitros, T., Dirks, W., Hellsten, U., Putnam, N., Rokhsar, D.S., 2012. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 40, D1178–D1186.
Gu, Y., Somerville, C., 2010. Cellulose synthase interacting protein: a new factor in cellulose synthesis. Plant Signaling Behav. 5, 1571–1574.
Guillaumie, S., San-Clemente, H., Deswarte, C., Martinez, Y., Lapierre, C., Murigneux, A., Barriere, Y., Pichon, M., Goffner, D., 2007. MAIZEWALL. Database and developmental gene expression profiling of cell wall biosynthesis and assembly in maize. Plant Physiol. 143, 339–363.
Guo, A.Y., Chen, X., Gao, G., Zhang, H., Zhu, Q.H., Liu, X.C., Zhong, Y.F., Gu, X., He, K., Luo, J., 2008. PlantTFDB: a comprehensive plant transcription factor database. Nucleic Acids Res. 36, D966–D969.
Handford, M., Rodriguez-Furlan, C., Marchant, L., Segura, M., Gomez, D., Alvarez-Buylla, E., Xiong, G.Y., Pauly, M., Orellana, A., 2012. Arabidopsis thaliana AtUTr7 encodes a golgi-localized UDP-glucose/UDP-galactose transporter that affects lateral root emergence. Mol. Plant 5, 1263–1280.
Hansen, S.F., Harholt, J., Oikawa, A., Scheller, H.V., 2012. Plant glycosyltransferases beyond CAZy: a perspective on DUF families. Front. Plant Sci. 3, 59.
Harholt, J., Sorensen, I., Fangel, J., Roberts, A., Willats, W.G., Scheller, H.V., Petersen, B.L., Banks, J.A., Ulvskov, P., 2012. The glycosyltransferase repertoire of the spikemoss Selaginella moellendorffii and a comparative study of its cell wall. PLoS One 7, e35846.
Harholt, J., Suttangkakul, A., Vibe Scheller, H., 2010. Biosynthesis of pectin. Plant Physiol. 153, 384–395.
Hess, M., Sczyrba, A., Egan, R., Kim, T.W., Chokhawala, H., Schroth, G., Luo, S., Clark, D.S., Chen, F., Zhang, T., Mackie, R.I., Pennacchio, L.A., Tringe, S.G., Visel, A., Woyke, T., Wang, Z., Rubin, E.M., 2011. Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331, 463–467.
Himmel, M.E., Ding, S.Y., Johnson, D.K., Adney, W.S., Nimlos, M.R., Brady, J.W., Foust, T.D., 2007. Biomass recalcitrance: engineering plants and enzymes for biofuels production. Science 315, 804–807.
Humphreys, J.M., Chapple, C., 2002. Rewriting the lignin roadmap. Curr. Opin. Plant Biol. 5, 224–229.
Hunter, S., Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Binns, D., Bork, P., Das, U., Daugherty, L., Duquenne, L., Finn, R.D., Gough, J., Haft, D., Hulo, N., Kahn, D., Kelly, E., Laugraud, A., Letunic, I., Lonsdale, D., Lopez, R., Madera, M., Maslen, J., McAnulla, C., McDowall, J., Mistry, J., Mitchell, A., Mulder, N., Natale, D., Orengo, C., Quinn, A.F., Selengut, J.D., Sigrist, C.J., Thimma, M., Thomas, P.D., Valentin, F., Wilson, D., Wu, C.H., Yeats, C., 2009. InterPro: the integrative protein signature database. Nucleic Acids Res. 37, D211–D215.
Joshi, C.P., Mansfield, S.D., 2007. The cellulose paradox - simple molecule, complex biosynthesis. Curr. Opin. Plant Biol. 10, 220–226.
Lee, C., O'Neill, M.A., Tsumuraya, Y., Darvill, A.G., Ye, Z.H., 2007. The irregular xylem9 mutant is deficient in xylan xylosyltransferase activity. Plant Cell Physiol. 48, 1624–1634.
Lei, L., Li, S., Gu, Y., 2012. Cellulose synthase complexes: composition and regulation. Front. Plant Sci. 3, 75.
Lerouxel, O., Cavalier, D.M., Liepman, A.H., Keegstra, K., 2006. Biosynthesis of plant cell wall polysaccharides - a complex process. Curr. Opin. Plant Biol. 9, 621–630.
Levasseur, A., Piumi, F., Coutinho, P.M., Rancurel, C., Asther, M., Delattre, M., Henrissat, B., Pontarotti, P., Asther, M., Record, E., 2008. FOLy: an integrated database for the classification and functional annotation of fungal oxidoreductases potentially involved in the degradation of lignin and related aromatic compounds. Fungal Genet. Biol. 45, 638–645.
Li, J., Yuan, H., Yang, J., 2009. Bacteria and lignin degradation. Front. Biol. China 4, 29–38.
Li, X., Chapple, C., 2010. Understanding lignification: challenges beyond monolignol biosynthesis. Plant Physiol. 154, 449–452.
Liepman, A.H., Wilkerson, C.G., Keegstra, K., 2005. Expression of cellulose synthase-like (Csl) genes in insect cells reveals that CslA family members encode mannan synthases. Proc. Natl. Acad. Sci. USA 102, 2221–2226.
Liu, C.J., Miao, Y.C., Zhang, K.W., 2011. Sequestration and transport of lignin monomeric precursors. Molecules 16, 710–727.
Liwanag, A.J., Ebert, B., Verhertbruggen, Y., Rennie, E.A., Rautengarten, C., Oikawa, A., Andersen, M.C., Clausen, M.H., Scheller, H.V., 2012. Pectin biosynthesis: GALS1 in Arabidopsis thaliana is a beta-1,4-galactan beta-1,4-galactosyltransferase. Plant Cell.
Lombard, V., Bernard, T., Rancurel, C., Brumer, H., Coutinho, P.M., Henrissat, B., 2010. A hierarchical classification of polysaccharide lyases for glycogenomics. Biochem. J. 432, 437–444.
Mao, F.L., Yin, Y.B., Zhou, F.F., Chou, W.C., Zhou, C., Chen, H.L., Xu, Y., 2009. pDAWG: an integrated database for plant cell wall genes. BioEnergy Res. 2, 209–216.
Marchler-Bauer, A., Anderson, J.B., Chitsaz, F., Derbyshire, M.K., DeWeese-Scott, C., Fong, J.H., Geer, L.Y., Geer, R.C., Gonzales, N.R., Gwadz, M., He, S., Hurwitz, D.I., Jackson, J.D., Ke, Z., Lanczycki, C.J., Liebert, C.A., Liu, C., Lu, F., Lu, S., Marchler, G.H., Mullokandov, M., Song, J.S., Tasneem, A., Thanki, N., Yamashita, R.A., Zhang, D., Zhang, N., Bryant, S.H., 2009. CDD: specific functional annotation with the conserved domain database. Nucleic Acids Res. 37, D205–D210.
Markowitz, V.M., Chen, I.M., Chu, K., Szeto, E., Palaniappan, K., Grechkin, Y., Ratner, A., Jacob, B., Pati, A., Huntemann, M., Liolios, K., Pagani, I., Anderson, I., Mavromatis, K., Ivanova, N.N., Kyrpides, N.C., 2012. IMG/M: the integrated metagenome data management and comparative analysis system. Nucleic Acids Res. 40, D123–D129.
McCaig, B.C., Meagher, R.B., Dean, J.F., 2005. Gene structure and molecular analysis of the laccase-like multicopper oxidase (LMCO) gene family in Arabidopsis thaliana. Planta 221, 619–636.
McCann, M., Rose, J., 2010. Blueprints for building plant cell walls. Plant Physiol. 153, 365.
Michel, G., Tonon, T., Scornet, D., Cock, J.M., Kloareg, B., 2010. The cell wall polysaccharide metabolism of the brown alga Ectocarpus siliculosus. Insights into the evolution of extracellular matrix polysaccharides in Eukaryotes. New Phytol. 188, 82–97.
Minic, Z., 2008. Physiological roles of plant glycoside hydrolases. Planta 227, 723–740.
Mohnen, D., 2008. Pectin structure and biosynthesis. Curr. Opin. Plant Biol. 11, 266–277.
Mohnen, D., Bar-Peled, M., Somerville, C.R., 2008. Cell wall polysaccharide synthesis. In: Himmel, M.E. (Ed.), Biomass Recalcitrance: Deconstructing the Plant Cell Wall for Bioenergy. Blackwell Publishing, pp. 94–159.
Movahedi, S., Van Bel, M., Heyndrickx, K.S., Vandepoele, K., 2012. Comparative co-expression analysis in plant biology. Plant Cell Environ. 35, 1787–1798.
Mutwil, M., Klie, S., Tohge, T., Giorgi, F.M., Wilkins, O., Campbell, M.M., Fernie, A.R., Usadel, B., Nikoloski, Z., Persson, S., 2011. PlaNet: combined sequence and expression comparisons across plant networks derived from seven species. Plant Cell 23, 895–910.
Mutwil, M., Ruprecht, C., Giorgi, F.M., Bringmann, M., Usadel, B., Persson, S., 2009. Transcriptional wiring of cell wall-related genes in Arabidopsis. Mol. Plant 2, 1015–1024.
Mutwil, M., Usadel, B., Schutte, M., Loraine, A., Ebenhoh, O., Persson, S. Assembly of an interactive correlation network for the Arabidopsis genome using a novel heuristic clustering algorithm. Plant Physiol. 152, 29–43.
Nag, A., Karpinets, T.V., Chang, C.H., Bar-Peled, M., 2012. Enhancing a pathway-genome database (PGDB) to capture subcellular localization of metabolites and enzymes: the nucleotide-sugar biosynthetic pathways of Populus trichocarpa. Database (Oxford) bas013.
Nataf, Y., Bahari, L., Kahel-Raifer, H., Borovok, I., Lamed, R., Bayer, E.A., Sonenshein, A.L., Shoham, Y., 2010. Clostridium thermocellum cellulosomal genes are regulated by extracytoplasmic polysaccharides via alternative sigma factors. Proc. Natl. Acad. Sci. USA 107, 18646–18651.
Nobles, D.R., Brown, R.M., 2004. The pivotal role of cyanobacteria in the evolution of cellulose synthases and cellulose synthase-like proteins. Cellulose 11, 437–448.
Obayashi, T., Hayashi, S., Saeki, M., Ohta, H., Kinoshita, K., 2009. ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res. 37, D987–D991.
Park, B.H., Karpinets, T.V., Syed, M.H., Leuze, M.R., Uberbacher, E.C., 2010. CAZymes analysis toolkit (CAT): web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database. Glycobiology 20, 1574–1584.
Pauly, M., Keegstra, K., 2008a. Cell-wall carbohydrates and their modification as a resource for biofuels. Plant J. 54, 559–568.
Pauly, M., Keegstra, K., 2008b. Physiology and metabolism 'Tear down this wall'. Curr. Opin. Plant Biol. 11, 233–235.
Pena, M.J., Zhong, R., Zhou, G.K., Richardson, E.A., O'Neill, M.A., Darvill, A.G., York, W.S., Ye, Z.H., 2007. Arabidopsis irregular xylem8 and irregular xylem9: implications for the complexity of glucuronoxylan biosynthesis. Plant Cell 19, 549–563.
Penning, B.W., Hunter 3rd, C.T., Tayengwa, R., Eveland, A.L., Dugard, C.K., Olek, A.T., Vermerris, W., Koch, K.E., McCarty, D.R., Davis, M.F., Thomas, S.R., McCann, M.C., Carpita, N.C., 2009. Genetic resources for maize cell wall biology. Plant Physiol. 151, 1703–1728.
Persson, S., Caffall, K.H., Freshour, G., Hilley, M.T., Bauer, S., Poindexter, P., Hahn, M.G., Mohnen, D., Somerville, C., 2007. The Arabidopsis irregular xylem8 mutant is deficient in glucuronoxylan and homogalacturonan, which are essential for secondary cell wall integrity. Plant Cell 19, 237–255.
Popper, Z., Michel, G., Herve, C., Domozych, D.S., Willats, W.G., Tuohy, M.G., Kloareg, B., Stengel, D.B., 2011. Evolution and diversity of plant cell walls: from algae to flowering plants. Annu. Rev. Plant Biol. 62, 567–590.
Portnoy, T., Margeot, A., Seidl-Seiboth, V., Le Crom, S., Ben Chaabane, F., Linke, R., Seiboth, B., Kubicek, C.P., 2011. Differential regulation of the cellulase transcription factors XYR1, ACE2, and ACE1 in Trichoderma reesei strains producing high and low levels of cellulase. Eukaryotic Cell 10, 262–271.
Qin, J., Li, R., Raes, J., Arumugam, M., Burgdorf, K.S., Manichanh, C., Nielsen, T., Pons, N., Levenez, F., Yamada, T., Mende, D.R., Li, J., Xu, J., Li, S., Li, D., Cao, J., Wang, B., Liang, H., Zheng, H., Xie, Y., Tap, J., Lepage, P., Bertalan, M., Batto, J.M., Hansen, T., Le Paslier, D., Linneberg, A., Nielsen, H.B., Pelletier, E., Renault, P., Sicheritz-Ponten, T., Turner, K., Zhu, H., Yu, C., Jian, M., Zhou, Y., Li, Y., Zhang, X., Qin, N., Yang, H., Wang, J., Brunak, S., Dore, J., Guarner, F., Kristiansen, K., Pedersen, O., Parkhill, J., Weissenbach, J., Bork, P., Ehrlich, S.D., 2010. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464, 59–65.
Raman, B., McKeown, C.K., Rodriguez Jr., M., Brown, S.D., Mielenz, J.R., 2011. Transcriptomic analysis of Clostridium thermocellum ATCC 27405 cellulose fermentation. BMC Microbiol. 11, 134.
Rawlings, N.D., Barrett, A.J., Bateman, A., 2012. MEROPS: the database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res. 40, D343–D350.
Reiter, W.D., 2008. Biochemical genetics of nucleotide sugar interconversion reactions. Curr. Opin. Plant Biol. 11, 236–243.
Reiter, W.D., Vanzin, G.F., 2001. Molecular genetics of nucleotide sugar interconversion pathways in plants. Plant Mol. Biol. 47, 95–113.
Richmond, T.A., Somerville, C.R., 2000. The cellulose synthase superfamily. Plant Physiol. 124, 495–498.
Riederer, A., Takasuka, T.E., Makino, S., Stevenson, D.M., Bukhman, Y.V., Elsen, N.L., Fox, B.G., 2011. Global gene expression patterns in Clostridium thermocellum as determined by microarray analysis of chemostat cultures on cellulose or cellobiose. Appl. Environ. Microbiol. 77, 1243–1253.
Royo, J., Gimez, E., Hueros, G., 2000. CMP-KDO synthetase: a plant gene borrowed from gram-negative eubacteria. Trends Genet. 16, 432–433.
Ruprecht, C., Mutwil, M., Saxe, F., Eder, M., Nikoloski, Z., Persson, S., 2011. Large-scale co-expression approach to dissect secondary cell wall formation across plant species. Front. Plant Sci. 2, 23.
Ruprecht, C., Persson, S., 2012. Co-expression of cell-wall related genes: new tools and insights. Front. Plant Sci. 3, 83.
Saier Jr., M.H., Tran, C.V., Barabote, R.D., 2006. TCDB: the transporter classification database for membrane transport protein analyses and information. Nucleic Acids Res. 34, D181–D186.
Sandhu, A.P., Randhawa, G.S., Dhugga, K.S., 2009. Plant cell wall matrix polysaccharide biosynthesis. Mol. Plant 2, 840–850.
Scheller, H.V., Ulvskov, P., 2010. Hemicelluloses. Annu. Rev. Plant Biol. 61, 263–289.
Seifert, G.J., 2004. Nucleotide sugar interconversions and cell wall biosynthesis: how to bring the inside to the outside. Curr. Opin. Plant Biol. 7, 277–284.
Seshadri, R., Kravitz, S.A., Smarr, L., Gilna, P., Frazier, M., 2007. CAMERA: a community resource for metagenomics. PLoS Biol. 5, e75.
Sirim, D., Wagner, F., Wang, L., Schmid, R.D., Pleiss, J., 2011. The laccase engineering database: a classification and analysis system for laccases and related multicopper oxidases. Database (Oxford) bar006.
Somerville, C., 2006. Cellulose synthesis in higher plants. Annu. Rev. Cell Dev. Biol. 22, 53–78.
Somerville, C., Bauer, S., Brininstool, G., Facette, M., Hamann, T., Milne, J., Osborne, E., Paredez, A., Persson, S., Raab, T., Vorwerk, S., Youngs, H., 2004. Toward a systems approach to understanding plant cell walls. Science 306, 2206–2211.
Stam, M.R., Danchin, E.G., Rancurel, C., Coutinho, P.M., Henrissat, B., 2006. Dividing the large glycoside hydrolase family 13 into subfamilies: towards improved functional annotations of alpha-amylase-related proteins. Protein Eng. Des. Sel. 19, 555–562.
Sun, G. MicroRNAs and their diverse functions in plants. Plant Mol. Biol. 80, 17–36.
Sun, J., Tian, C., Diamond, S., Glass, N.L., 2012. Deciphering transcriptional regulatory mechanisms associated with hemicellulose degradation in Neurospora crassa. Eukaryotic Cell 11, 482–493.
Sun, Y.H., Shi, R., Zhang, X.H., Chiang, V.L., Sederoff, R.R. MicroRNAs in trees. Plant Mol. Biol. 80, 37–53.
Syed, M.H., Karpinets, T.V., Parang, M., Leuze, M.R., Park, B.H., Hyatt, D., Brown, S.D., Moulton, S., Galloway, M.D., Uberbacher, E.C., 2012. BESC knowledgebase public portal. Bioinformatics 28, 750–751.
Ulvskov, P., 2011. In: Ulvskov, P. (Ed.), Annual Plant Reviews: Plant Polysaccharides, Biosynthesis and Bioengineering. Wiley-Blackwell, Oxford, UK.
Vanholme, R., Morreel, K., Ralph, J., Boerjan, W., 2008. Lignin engineering. Curr. Opin. Plant Biol. 11, 278–285.
Wang, H., Avci, U., Nakashima, J., Hahn, M.G., Chen, F., Dixon, R.A., 2010. Mutation of WRKY transcription factors initiates pith secondary wall formation and increases stem biomass in dicotyledonous plants. Proc. Natl. Acad. Sci. USA 107, 22338–22343.
Wang, H.Z., Dixon, R.A., 2012. On-off switches for secondary cell wall biosynthesis. Mol. Plant 5, 297–303.
Wang, S., Yin, Y., Ma, Q., Tang, X., Hao, D., Xu, Y., 2012. Genome-scale identification of cell-wall related genes in Arabidopsis based on co-expression network analysis. BMC Plant Biol. 12, 138.
Welinder, K.G., 1992. Superfamily of plant, fungal and bacterial peroxidases. Curr. Opin. Struct. Biol. 2, 388–393.
Weng, J.K., Chapple, C., 2010. The origin and evolution of lignin biosynthesis. New Phytol. 187, 273–285.
Wu, A.M., Rihouey, C., Seveno, M., Hornblad, E., Singh, S.K., Matsunaga, T., Ishii, T., Lerouge, P., Marchant, A., 2009. The Arabidopsis IRX10 and IRX10-LIKE glycosyltransferases are critical for glucuronoxylan biosynthesis during secondary cell wall formation. Plant J. 57, 718–731.
Xu, Z., Zhang, D., Hu, J., Zhou, X., Ye, X., Reichel, K.L., Stewart, N.R., Syrenne, R.D., Yang, X., Gao, P., Shi, W., Doeppke, C., Sykes, R.W., Burris, J.N., Bozell, J.J., Cheng, M.Z., Hayes, D.G., Labbe, N., Davis, M., Stewart Jr., C.N., Yuan, J.S., 2009. Comparative genome analysis of lignin biosynthesis gene families across the plant kingdom. BMC Bioinform. 10 (Suppl. 11), S3.
Yang, S., Giannone, R.J., Dice, L., Yang, Z.K., Engle, N.L., Tschaplinski, T.J., Hettich, R.L., Brown, S.D., 2012. Clostridium thermocellum ATCC27405 transcriptomic, metabolomic and proteomic profiles after ethanol stress. BMC Genomics 13, 336.
Yin, Y., Chen, H., Hahn, M.G., Mohnen, D., Xu, Y., 2010. Evolution and function of the plant cell wall synthesis-related glycosyltransferase
Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., family 8. Plant Physiol. 153, 1729e1746.
Ramage, D., Amin, N., Schwikowski, B., Ideker, T., 2003. Cyto- Yin, Y., Huang, J., Gu, X., Bar-Peled, M., Xu, Y., 2011. Evolution of
scape: a software environment for integrated models of biomole- plant nucleotide-sugar interconversion enzymes. PLoS One 6,
cular interaction networks. Genome Res. 13, 2498e2504. e27995.
Sibout, R., Hofte, H., 2012. Plant cell biology: the ABC of monolignol Yin, Y., Huang, J., Xu, Y., 2009. The cellulose synthase superfamily in
transport. Curr. Biol. 22, R533eR535. fully sequenced plants and algae. BMC Plant Biol. 9, 99.
Yin, Y.B., Mao, X.Z., Yang, J.C., Chen, X., Mao, F.L., Xu, Y., 2012. dbCAN: a web resource for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 40, W445–W451.
Yokoyama, R., Nishitani, K., 2004. Genomic basis for cell-wall diversity in plants. A comparative approach to gene families in rice and Arabidopsis. Plant Cell Physiol. 45, 1111–1121.
Yong, W., Link, B., O’Malley, R., Tewari, J., Hunter, C.T., Lu, C.A., Li, X., Bleecker, A.B., Koch, K.E., McCann, M.C., McCarty, D.R., Patterson, S.E., Reiter, W.D., Staiger, C., Thomas, S.R., Vermerris, W., Carpita, N.C., 2005. Genomics of plant cell wall biogenesis. Planta 221, 747–751.
York, W.S., O’Neill, M.A., 2008. Biochemical control of xylan biosynthesis - which end is up? Curr. Opin. Plant Biol. 11, 258–265.
Zhang, B., Liu, X., Qian, Q., Liu, L., Dong, G., Xiong, G., Zeng, D., Zhou, Y., 2011. Golgi nucleotide sugar transporter modulates cell wall biosynthesis and plant growth in rice. Proc. Natl. Acad. Sci. USA 108, 5110–5115.
Zhao, Q., Dixon, R.A., 2011. Transcriptional networks for lignin biosynthesis: more complex than we thought? Trends Plant Sci. 16, 227–233.
Zhong, R., Lee, C., Ye, Z.H., 2010. Evolutionary conservation of the transcriptional network regulating secondary cell wall biosynthesis. Trends Plant Sci. 15, 625–632.
Zhong, R., Peña, M.J., Zhou, G.K., Nairn, C.J., Wood-Jones, A., Richardson, E.A., Morrison 3rd, W.H., Darvill, A.G., York, W.S., Ye, Z.H., 2005. Arabidopsis fragile fiber8, which encodes a putative glucuronyltransferase, is essential for normal secondary wall synthesis. Plant Cell 17, 3390–3408.
Zhong, R., Ye, Z.H., 2009. Transcriptional regulation of lignin biosynthesis. Plant Signaling Behav. 4, 1028–1034.
Zhou, C., Yin, Y., Dam, P., Xu, Y., 2010a. Identification of novel proteins involved in plant cell-wall synthesis based on protein-protein interaction data. J. Proteome Res. 9, 5025–5037.
Zhou, F., Chen, H., Xu, Y., 2010b. GASdb: a large-scale and comparative exploration database of glycosyl hydrolysis systems. BMC Microbiol. 10, 69.