
mc2509.qxd 11/09/1999 2:41 PM Page 542

Archaeal genomics
Terry Gaasterland
Four euryarchaeal genomes have been completely sequenced and are publicly available: Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Pyrococcus horikoshii and Archaeoglobus fulgidus. Four more genome sequences, two crenarchaeal and two pyrococci, will soon be released. In addition, seven more archaeal genome sequencing projects are under way, including two halophiles, two Thermoplasma, and a methanogen. These projects cover all branches of the archaeal domain and will lead to new insights into archaeal metabolism, DNA processing, and evolutionary relationships with the Bacteria and Eukarya.

Addresses
Laboratory of Computational Genomics, 1230 York Avenue, The Rockefeller University, New York, New York 10021, USA; e-mail: gaasterl@genomes.rockefeller.edu

Current Opinion in Microbiology 1999, 2:542–547
1369-5274/99/$ - see front matter © 1999 Elsevier Science Ltd. All rights reserved.

Abbreviations
Mb million basepairs
ORF open reading frame
rRNA ribosomal RNA

Introduction
Since the 1977 discovery of the Archaea [1], their existence has challenged biologists to rethink the phylogenetic relationships between organisms. The placement of the Archaea as a third domain of life, which shattered the traditional view of the prokaryote/eukaryote dichotomy, was initially argued on the basis of the careful analysis of small-subunit ribosomal RNA (rRNA). Besides the RNA evidence, organisms from the newly discovered phylogenetic domain exhibited compelling characteristics that set them apart from Eubacteria (e.g. methanogenesis, cell membranes with ether bonds, and cell walls without peptidoglycan). At the same time, a large proportion of their metabolism seems to be shared with bacterial organisms, yet their transcription and translation apparatus as well as DNA metabolism are shared more closely with eukaryotic organisms.

In 1993, the Department of Energy (DOE) funded the sequencing of three complete archaeal genomes: Methanococcus jannaschii, as a collaboration between The Institute for Genomic Research (TIGR) and the University of Illinois at Urbana-Champaign [2]; Methanobacterium thermoautotrophicum, as a collaboration between Genome Therapeutics and Ohio State University [3]; and Pyrococcus furiosus, at the University of Utah and the University of Maryland [4]. The first two archaeal genome sequences were released in 1996 and 1997, respectively, and were soon complemented by the release of genome sequences of Archaeoglobus fulgidus by TIGR [5] and Pyrococcus horikoshii by University of Tokyo, NITE and NIBHT in Japan [6••,7]. At the time of writing, eight archaeal genomes are completely or nearly completely sequenced, and sequencing of seven more genomes is under way.

Sequencing these archaeal genomes has led to insights into the mosaic nature of relationships between prokaryotic proteins, particularly among metabolic, transport, and DNA processing proteins. These insights in turn have led to a deeper molecular understanding of differences and similarities between organisms in the major archaeal phenotypes. The genome sequencing projects are being followed up by proteome projects that involve whole-genome two-dimensional gel electrophoresis, mass spectrometry, and high-throughput protein structure determination.

The Archaea
Four phenotypes characterize the Archaea broadly and distinguish them from Bacteria. Archaeal organisms include methanogens, sulfur-reducing thermophiles, sulfur-dependent thermophiles, and halophiles. Genomes from all four categories are being sequenced or have already been released. The methanogens (M. jannaschii and M. thermoautotrophicum) are strict anaerobes and use variations of methanogenesis to convert CO2, methyl compounds, or acetate to methane. Methanogenesis serves as a form of anaerobic respiration. In the sulfur-reducing thermophiles (Archaeoglobus fulgidus), oxidized sulfur species act as electron acceptors for anaerobic respiration. Sulfur-dependent Archaea (Sulfolobus solfataricus) can grow autotrophically using elemental sulfur as an energy source. The halophiles (Halobacterium salinarum) require a high-salt environment. In all Archaea, membranes are made from ether-linked lipids bonded to glycerol, and thus differ substantially from bacterial membranes. As in the eukaryotes, archaeal cell walls contain no peptidoglycan, again setting them apart from bacteria. The Thermoplasma differ somewhat from the other Archaea: they have no cell wall, and their cell membranes contain tetraether lipids with mannose and glucose subunits.

Phylogenetic analysis of small-subunit rRNA sequences distinguishes two archaeal sub-domains: the Euryarchaeotes and the Crenarchaeotes [1]. The euryarchaeotes include methanogens, halophiles, and sulfur-reducing thermophiles. Although methanogenesis is uniform in two of the three major methanogenic euryarchaeal lineages, variations occur within the Methanomicrobiales lineage [8]. The latest branching methanogenic euryarchaeal lineage, Methanomicrobiales, gave rise to the extreme halophiles and sulfate-reducing Archaea. The crenarchaeotes share a 16S rRNA signature

Figure 1

[Stacked bar chart. Y-axis: percent of genomic open reading frames (0% to 100%). X-axis: genomes. Legend: Orphan, A, B, BA, AE, BE, BAE. Credit: Current Opinion in Microbiology.]

Phylogenetic distribution of ORFs across nine genomes. An ORF is labeled as BAE, BA, BE, A, AE, B or orphan as follows: the label contains a B, A, or E if its translated protein matches a translated ORF from any other genome in the Bacteria, Archaea or Eukaryotes, respectively. The label for an ORF contains the phylogenetic domain of its own genome if its translation matches at least one ORF in any other genome. ORFs that have matches in no other genome, within this group of nine genomes, are ‘orphans’. Most ORFs in the two reduced Mycoplasma genomes match ORFs in other genomes. The proportions of B (bacterial only) and A (archaeal only) ORFs are influenced by the proximity of another genome; for example, E. coli and H. influenzae are closer to each other than any other pair of genomes, as are the two Mycoplasma. Of particular interest is the distribution of BAE (phylogenetically universal), BA (bacterial–archaeal), AE (archaeal–eukaryotic), and BE (bacterial–eukaryotic) ORFs: the complement of AE ORFs is much smaller in Archaea than the BA or BAE complement. Similarly, in yeast, the complement of BE and BAE ORFs is much smaller than the AE ORFs. Before the sequencing of complete genomes, the AE proportion was expected to be much larger.
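The labeling rule described in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `orf_label` and `label_distribution` are hypothetical, and the sketch assumes that the set of domains in which each ORF has a match in some other genome has already been computed (the similarity search and its thresholds are not specified in the caption and are not modelled here).

```python
from collections import Counter

# Letter order used in the figure labels: Bacteria, Archaea, Eukaryotes.
DOMAIN_ORDER = "BAE"


def orf_label(own_domain, matched_domains):
    """Return a Figure 1 style label for one ORF.

    own_domain      -- "B", "A" or "E": the domain of the ORF's own genome.
    matched_domains -- domains of the OTHER genomes in which the translated
                       ORF has at least one match (assumed precomputed).
    """
    matched = set(matched_domains)
    if not matched:
        # No match in any other genome of the comparison set.
        return "orphan"
    # The ORF's own domain is part of the label once any match exists.
    letters = matched | {own_domain}
    return "".join(d for d in DOMAIN_ORDER if d in letters)


def label_distribution(own_domain, matches_per_orf):
    """Fraction of ORFs in one genome that carry each label."""
    counts = Counter(orf_label(own_domain, m) for m in matches_per_orf)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

For an archaeal genome, `orf_label("A", {"B", "E"})` gives `"BAE"`, while an ORF whose only matches are in another archaeal genome is labeled `"A"`. Note that under this rule the A and B fractions depend on which genomes are in the comparison set, which is the proximity effect the caption points out.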

that places them at a deeper branch point than the euryarchaeotes within the archaeal domain. Crenarchaeotes are in many instances sulfur-dependent thermophiles, and were initially regarded as more homogeneous than the euryarchaeotes [9]. However, isolation of small-subunit rRNA from the open environment and the discovery of Crenarchaeum symbiosum have led to the characterization of deeply divergent lineages of low-temperature Crenarchaeota [10,11].

Comparative genomics
The archaeal challenge to phylogeny has continued with each new release of a completely sequenced archaeal genome. Like the bacterial genomes, the archaeal genomes contain a significant proportion (25–68%, depending on comparison methods) of putative coding regions with no similarity to any sequence in any other organism. A significant proportion of putative coding regions have matches only among other archaeal genomes (10–15%); a large number of the corresponding protein sequence families have no known function [12•]. Of the coding regions shared exclusively between Archaea and the eukaryotes yeast or Caenorhabditis elegans (25–30% of archaeal coding regions), a large proportion (30% of the archaeal-eukaryotic coding regions, ‘AE’ genes) encode proteins involved in transcription, translation, and DNA metabolism [3,13••]. This is consistent with pre-genome observations. However, the other 70% of the AE coding regions encode proteins from all protein functional categories, and a substantial number are of unknown function. Also striking is the small size of the complement of coding regions shared among archaeal and bacterial genomes to the exclusion of the eukaryotes (genes characteristic of prokaryotes, or ‘BA’ genes) and the small size of the complement of phylogenetically universal genes (genes with a counterpart in each of the three phylogenetic domains, or ‘BAE’ genes). As evident in the public annotations of the organisms, the ORFs of known function in the subset of archaeal coding regions shared exclusively with bacterial organisms (‘BA’ ORFs) primarily encode pathways of bacterial metabolism [2,3,5,14•,15•]. The subset of genes shared among all three domains (‘BAE’) fails to comprise a coherent subset of metabolism; nor does it encode consistently complete metabolic pathways [13••]. Figure 1 shows the phylogenetic distribution of ORFs in nine genomes [12•].

The existence of large numbers of archaeal genes shared exclusively with eukaryotes and larger numbers of genes shared with bacteria is difficult to interpret. One view is that the archaeal genomes are a mosaic, reflecting a combination of eukaryotic and bacterial features together with new features unique to the Archaea [16]. Another interpretation, consistent with the small-subunit rRNA tree, suggests that the Archaea were involved in the formation of the eukaryotes, as were bacterial organisms [1,17,18•]. A complementary view suggests that substantial transfers of genes among prokaryotes occurred prior to the formation of the eukaryotic cell [19•]. The debate has led to proposals that classification methods return in part to morphological features such as properties of the cell wall, argued compellingly by Gupta [20].

The completion of the genome sequences from two deep-branching thermophilic Bacteria, Aquifex aeolicus and Thermotoga maritima, provided data to explore the degree to which these bacterial organisms exhibit evidence of

lateral transfer between Archaea and Bacteria [21••]. For example, 108 Thermotoga genes have orthologues exclusively in Aquifex and Archaea, 81 of which occur in 15 clusters in the genome [22••]. This number is much higher for Aquifex and Thermotoga than for the other bacterial genomes; these thermophilic genomes encode more proteins that are close to Archaea than do the other bacterial genomes. The clustering of these genes in the genome indicates that they may be present by virtue of lateral transfer.

The complete genomes also provide data to infer properties of proteins that must have been present in a common ancestor, as well as properties that may pinpoint the basis of divergence. For example, the structure of the EgsA (enantiomeric glycerophosphate synthetase) protein from M. thermoautotrophicum and the functionally related protein NAD(P)-linked sn-glycerol-3-phosphate dehydrogenase (G-3-P) from Escherichia coli have no sequence similarity, yet share a structure and have analogous functions [18•]. Since both proteins participate in membrane formation in their respective organisms, one may hypothesize that Archaea and Bacteria differentiated because of changes in membrane composition.

The genomes
The complete and nearly complete sequencing of archaeal genomes will provide data to examine the dichotomy between ‘eukaryotic Archaea’ and ‘bacterial Archaea’ in much more detail than can currently be done. So far, the four released archaeal genomes have all been euryarchaeal organisms. Two crenarchaeote genome sequences are complete or nearly complete: Sulfolobus solfataricus (3.1 million basepairs [Mb]), which was initiated in Canada and is being completed through a Canadian and European consortium, and Pyrobaculum aerophilum (2.2 Mb), which is being sequenced at Caltech with UCLA. At the time of writing, 80% of the S. solfataricus genome sequence is publicly available (http://niji.imb.nrc.ca/sulfolobus), and the Pyrobaculum genome sequence is complete but not yet released. In addition to the completely sequenced euryarchaeote P. horikoshii genome [6••,7], two very closely related pyrococci are also nearly completely sequenced: Pyrococcus abyssi (1.8 Mb), by a French genome project, and P. furiosus (2.1 Mb), mentioned earlier as one of the original DOE projects.

In addition to these eight archaeal genomes, another seven archaeal genome sequencing projects are under way. These include Aeropyrum pernix K1 (1.7 Mb) at the Biotechnology Center in Japan; Crenarchaeum symbiosum (2–3 Mb) at Diversa Corporation; Halobacterium sp. NRC-1 (2.5 Mb) at the University of Washington and the University of Massachusetts; Halobacterium salinarum (4 Mb) and Thermoplasma acidophilum (1.7 Mb) at the Max-Planck-Institute for Biochemistry; Methanosarcina mazei (2.8 Mb) at the Goettingen Genomics Laboratory; and Thermoplasma volcanium GSS1, a collaborative effort between seven research institutions funded by the Japanese Agency of Industrial Science and Technology. Together, these 15 archaeal genomes cover all known major branches of the domain Archaea (see http://genomes.rockefeller.edu/genomelist for links to each project).

Notable differences among Archaea
Completion of the 2.2 Mb genome sequence of the sulfur-metabolizing A. fulgidus provided genomic data from the first non-methanogenic archaeon. Prediction of transmembrane domains in A. fulgidus and a follow-up annotation of M. jannaschii (1.7 Mb) [5,23] indicated that the two organisms have substantial differences in their regulatory, transport, and sensory functions. For example, of 61 Archaeoglobus ORFs annotated as transport and binding proteins, 30 transport branched-chain amino acids, 10 are ABC transporters, six are proline permeases, and four are spermidine/putrescine transporters. Of the 68 transport-and-binding Methanococcus ORFs, seven transport branched-chain amino acids, 27 are ABC transporters, and 24 transport cations. The profile of transporters helps to characterize precisely the differences between a sulfur reducer (Archaeoglobus) and a methanogen.

The 1.8 Mb M. thermoautotrophicum genome revealed that the order of orthologous genes is not preserved with respect to M. jannaschii. Even so, 19% of the ORFs from M. thermoautotrophicum have counterparts in M. jannaschii with 50% or more sequence identity. Proteins of unknown function in this large complement of highly similar proteins are likely to be related to methanogenesis. Comparison of P. horikoshii to available sequence from P. furiosus shows considerable gene rearrangement and divergence in gene content [6••]; even so, orthologous genes are about 70% identical.

Certain genes encoding tRNA molecules and tRNA synthetases are substantially different or missing entirely in Archaea. Specifically, lysyl-, asparaginyl-, cysteinyl- and glutaminyl-tRNA synthetases were absent in several initial archaeal genome annotations [24]. The selenocysteinyl-tRNA gene could not be identified in M. thermoautotrophicum [3]. Cysteinyl- and lysyl-tRNA synthetases unique to Archaea have since been identified [25,26]. A surprise was the observation that lysyl-tRNA synthetase from Borrelia burgdorferi, the bacterium causing Lyme disease, is of the archaeal class I-type tRNA synthetase, a unique phenomenon so far among the Bacteria [27]. The new lysyl-tRNA synthetase was completely different from other known lysyl-tRNA synthetases, indicating that other ‘missing’ functions that are highly conserved in the Bacteria and eukaryotes may have a different molecular origin in Archaea. The discovery of the archaeal type of lysyl-tRNA synthetase in B. burgdorferi, a bacterial organism, indicates the possibility of lateral transfer. Since the protein is dissimilar from its functional counterparts in other bacterial organisms, it becomes a candidate for anti-spirochete therapeutics. The

archaeal form of lysyl-tRNA synthetase in B. burgdorferi opens the possibility that alternative forms of this and the other ‘missing’ archaeal tRNA synthetases may also have undergone lateral transfer from archaeal to bacterial organisms. Alternatively, both forms may have been present in a common ancestor and may be more ubiquitous than previously thought or more frequently lost.

Novel operons
A striking feature of the archaeal genomes has been their novel organization of closely located genes. For example, the S. solfataricus genome contains a cluster of nine histidine biosynthesis genes that occur in a different order from previously known his operons [28,29]. Both Sulfolobus and Pyrobaculum contain novel pairs of closely located neighboring ORFs that encode members of the same metabolic pathway [15•,30]. P. furiosus contains a number of new putative operons compared with P. horikoshii, including gene clusters for maltose and trehalose transport; phosphate uptake; the urea and TCA cycles; as well as tryptophan, aromatic amino acid, arginine, isoleucine, and valine metabolism.

The archaeal proteome: structural genomics
Since thermophilic archaeal proteins are stable at high temperatures, they are easier to isolate than mesophilic proteins after over-expression in E. coli. This fact has led several structural genomics pilot projects to adopt archaeal genomes as the source organism. The goal of structural functional genomics is to achieve high-throughput structure determination of proteins across entire genomes. A pilot project at UC Berkeley and Lawrence Livermore National Laboratory seeks to solve structures in M. jannaschii [31•,32,33••,34,35]. This project has produced high-confidence folds for 33 sequences of unknown function in M. genitalium, M. jannaschii, and Haemophilus influenzae [31•]. The same project has also solved the structure of a small heat-shock protein from M. jannaschii, which will help us to understand how small heat-shock proteins protect substrates from denaturation [33••]. A structural genomics project at UCLA and Los Alamos National Laboratory has grown out of the P. aerophilum genome sequencing project, and recently solved the structure of the translation initiation factor 5a [36]. A project in Toronto is focusing on M. thermoautotrophicum, and a project at Argonne National Laboratory [37], although focused on the discovery of new protein folds, uses thermophiles as the sequence source whenever possible.

A key to the success of structural functional genomics will be the ability to generate high-quality predicted protein structures based on alignments with sequences of known structures. Several groups have initiated databases of predicted structures, including one [38•] that includes predictions for M. jannaschii. So far, genome-wide surveys of secondary structure and integral membrane domains give consistent distributions across all organisms [39••,40•] but differ substantially from distributions among published protein structures, implying biased selection among currently solved structures.

The archaeal proteome: two-dimensional gel isolation of proteins
As a follow-up to the sequencing of complete archaeal genomes, the US Department of Energy has established the Archaeal Proteomics Project as a collaboration between Argonne National Laboratory, the University of Georgia, and the University of Illinois. This project will use two-dimensional gel electrophoresis and matrix-assisted laser desorption ionization mass spectrometry to purify and characterize proteins expressed in Archaea under varying conditions, with a focus on conditions that will reveal pathways related to environmental bioremediation [41]. The project uses subcellular fractionation to compartmentalize archaeal proteins. In addition to identifying components of potential new pathways, this work will help to confirm actual coding regions that are unique to archaeal organisms.

DNA conformation, control, and repair
Having the full sequence of a genome enables studies that relate biochemical or structural properties to patterns or positions genome-wide. One example of such a study involves the discernment of DNA sequence patterns associated with archaeal nucleosome positioning [42•], which revealed that histone assemblies preferentially center on (CTG)6 and (CTG)8 repeats. This discovery builds on earlier work [43] that characterized archaeal histones as tetramers that bind to approximately 60 basepairs. Archaeal nucleosome structure appears homologous to that of eukaryal nucleosomes, yet remains stable and functional at high temperatures [44,45••,46]. Computational comparison of 16 prokaryotic genomes has led to the observation that hairpin formation occurs much less frequently in four archaeal and eight bacterial genomes, supporting the theory that the Archaea use hairpins to control transcription termination differently from E. coli, H. influenzae, Bacillus subtilis and Chlamydia trachomatis [47•]. The Archaea exhibit less dinucleotide bias on coding strands than do bacterial organisms [48], indicating possible differences in DNA repair mechanisms and complicating the computational detection of the origin of replication.

Conclusions
Each gene in an organism has its own phylogenetic background. The complete genome sequences are giving us evidence for collective events as well as individual transfers and cross-overs. For example, the complement of proteins shared most closely between the thermophilic Bacteria and the Archaea is arranged in large clusters within the bacterial genomes. In contrast, the class I-type lysyl-tRNA synthetase that occurs in several Archaea and in B. burgdorferi participates in pathways that are otherwise usual for the respective organisms. Key events, such as the differentiation of membrane composition proteins, may mark the differentiation between Bacteria and Archaea. The 15 complete archaeal genomes will lead to a much

deeper understanding of the phylogenetic domain Archaea and the evolution of life on Earth.

Acknowledgements
Many thanks to Christoph W Sensen for his comments, corrections, and improvements to this paper.

References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest

1. Woese C: The Archaea: their history and significance. In The Biochemistry of Archaea (Archaebacteria). Edited by Kates M. New York: Elsevier Science Publishers; 1993:v-xxxix.

2. Bult C, White O, Olsen G, Zhou L, Fleischmann R, Sutton G, Blake J, FitzGerald L, Clayton R, Gocayne J, Kerlavage A et al.: Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science 1996, 273:1058-1073.

3. Smith D, Doucette-Stamm L, Deloughery C, Lee H, Dubois J, Aldredge T, Bashirzadeh R, Blakely D, Cook R, Gilbert K et al.: Complete genome sequence of Methanobacterium thermoautotrophicum deltaH: functional analysis and comparative genomics. J Bacteriol 1997, 179:7135-7155.

4. Weiss R, Dunn D, Stump M, Yeh R, Cherry J, Robb F: The genome sequence of a hyperthermophilic archaeon: Pyrococcus furiosus. In Microbial Genome Project Section, DOE Human Genome Program Contractor-Grantee Workshop VII: 1999 Jan 12–16, Oakland, CA. Washington, DC: Department of Energy; 1999:157.

5. Klenk H, Clayton R, Tomb J, White O, Nelson K, Ketchum K, Dodson R, Gwinn M, Hickey E, Peterson J et al.: The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature 1997, 390:364-370.

6. •• Kawarabayasi Y, Sawada M, Horikawa H, Haikawa Y, Hino Y, Yamamoto S, Sekine M, Baba S, Kosugi H, Hosoyama A et al.: Complete sequence and gene organization of the genome of a hyper-thermophilic archaebacterium, Pyrococcus horikoshii OT3. DNA Research 1998, 5:55-76.
The complete genome of the hyperthermophilic archaeon Pyrococcus horikoshii, presented in this paper, has many duplicated open reading frames (ORFs). 22% of encoded ORFs have no similarity to sequences in any other organism.

7. Gonzalez J, Masuchi Y, Robb F, Ammerman J, Maeder D, Yanagibayashi M, Tamaoka J, Kato C: Pyrococcus horikoshii sp. nov., a hyperthermophilic archaeon isolated from a hydrothermal vent at the Okinawa Trough. Extremophiles 1998, 2:123-130.

8. Danson M: Central metabolism of the Archaea. In The Biochemistry of Archaea (Archaebacteria). Edited by Kates M. New York: Elsevier Science Publishers; 1993:1-24.

9. Woese C, Kandler O, Wheelis M: Towards a natural system of organisms: proposal for the domains archaea, bacteria, and eucarya. Proc Natl Acad Sci USA 1990, 87:4576-4579.

10. DeLong E: Archaea in coastal marine environments. Proc Natl Acad Sci USA 1992, 89:5685-5689.

11. Fuhrman J, McCallum K, Davis A: Novel major archaebacterial group from marine plankton. Nature 1992, 356:148-149.

12. • Gaasterland T, Ragan M: Constructing multigenome views of whole microbial genomes. J Microb Comp Genomics 1998, 3:177-192.
This paper presents a methodology for comparing complements of proteins encoded in whole microbial genomes. The approach enables queries that seek correlations between conservation of genomic distribution and conservation of functional category.

13. •• Gaasterland T, Ragan M: Phyletic and functional patterns of ORF distribution among prokaryotes. J Microb Comp Genomics 1998, 3:199-217.
The methodology introduced in [12•] is applied to the analysis of whole genomes from prokaryotic organisms, with yeast as an outlier organism. The analysis reveals that proteins that occur only in Bacteria or only in Archaea are not restricted to any functional category. Surprisingly, the complement of proteins shared exclusively among both Bacteria and Archaea (prokaryotes) is very small compared to the complements restricted to Bacteria and Archaea, or shared among prokaryotes and yeast.

14. • Ragan M, Gaasterland T: A prokaryotic view of the yeast genome. J Microb Comp Genomics 1998, 3:219-235.
Using the methodology introduced in [12•], the authors analyze the complement of yeast proteins that have counterparts in prokaryotic organisms. These proteins cover most of central metabolism, yet the non-prokaryotic proportion of yeast is relatively large. The complement of proteins shared exclusively among Archaea and yeast is overabundant in proteins involved in transcription, translation and DNA processing. However, these categories cover only 25 of those proteins. The paper analyzes the distribution of bacterial counterparts to yeast mitochondrial biogenesis nuclear proteins and yeast mitochondrial proteins, and finds that matches to bacterial organisms are broadly distributed and inclusive.

15. • Sensen C, Charlebois R, Chow C, Clausen I, Curtis B, Doolittle W, Duguet M, Erauso G, Gaasterland T, Garrett R et al.: Completing the sequence of the Sulfolobus solfataricus P2 genome. Extremophiles 1998, 2:305-312.
This paper lays out the underlying strategy for accomplishing a distributed genome sequencing project and annotating the genome. An analysis of protein function, repeat structure, and insertion elements is given for the largest archaeal genome undergoing sequencing.

16. Koonin E, Mushegian A, Galperin M, Walker D: Comparison of archaeal and bacterial genomes: computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea. Mol Microbiol 1997, 25:617-637.

17. Feng D, Cho G, Doolittle R: Determining divergence times with a protein clock: update and reevaluation. Proc Natl Acad Sci USA 1997, 94:13028-13033.

18. • Koga Y, Kyuragi T, Nishihara M, Sone N: Did archaeal and bacterial cells arise independently from noncellular precursors? A hypothesis stating that the advent of membrane phospholipid with enantiomeric glycerophosphate backbones caused the separation of the two lines of descent. J Mol Evol 1998, 46:54-63.
The authors relate the structure of the EgsA (enantiomeric glycerophosphate synthase) protein from M. thermoautotrophicum to the functionally related protein NAD(P)-linked sn-glycerol-3-phosphate (G-3-P) dehydrogenase from E. coli. The complete lack of sequence similarity indicates that the stereostructure of cell membrane phospholipids has been since the last common ancestor. This leads to a hypothesis that Archaea and Bacteria differentiated by membrane composition: G-1-P in Archaea and G-3-P in Bacteria.

19. • Rivera M, Jain R, Moore J, Lake J: Genomic evidence for two functionally distinct gene classes. Proc Natl Acad Sci USA 1998, 95:6239-6244.
The complements of proteins from whole genomes in each phylogenetic domain are analyzed to search for distinct classes of proteins. The authors demonstrate that prokaryotic genes diverge into two groups: information processing genes and non-informational metabolic genes.

20. Gupta R: What are archaebacteria: life’s third domain or monoderm prokaryotes related to Gram-positive bacteria? A new proposal for the classification of prokaryotic organisms. Mol Microbiol 1998, 29:695-707.

21. •• Deckert G, Warren P, Gaasterland T, Young W, Lenox A, Graham D, Overbeek R, Snead M, Keller M, Aujay M et al.: The complete genome of the hyperthermophilic bacterium Aquifex aeolicus. Nature 1998, 392:353-358.
The functional annotation of the first complete genome of a hyperthermophilic bacterium is presented. The early-branching bacterium Aquifex aeolicus shares more genes exclusively with archaeal organisms than do other bacterial organisms. Even so, the metabolic and structural genes remain broadly bacterial.

22. •• Nelson K, Clayton R, Gill S, Gwinn M, Daniel R, Haft H, Hickey E, Peterson J, Nelson W, Ketchum K et al.: Evidence for lateral gene transfer between archaea and bacteria from genome sequence of Thermotoga maritima. Nature 1999, 399:323-329.
This paper reports on the complete genome of the early-branching thermophile Thermotoga maritima. A large complement of genes involved in degradation of sugars and plant polysaccharides is shared exclusively with the thermophilic bacterium Aquifex aeolicus and Archaea. This genome has a larger percentage of genes shared exclusively with Archaea than does Aquifex aeolicus.

23. Kyrpides N, Olsen G, Klenk H, White O, Woese C: Methanococcus jannaschii genome: revisited. J Microb Comp Genomics 1996, 1:329-338.

24. Kim H, Vothknecht U, Hedderich R, Celic I, Soll D: Sequence divergence of seryl-tRNA synthetases in archaea. J Bacteriol 1998, 180:6446-6449.

25. Brown J, Robb F, Weiss R, Doolittle W: Evidence for the early these yeast proteins had experimentally determined structures. A database
divergence of tryptophanyl- and tyrosyl-tRNA synthetases. J Mol system makes the results available on the World Wide Web at
Evol 1997, 45:9-16. http://guitar.rockefeller.edu/modbase.
26. Ibba M, Morgan S, Curnow A, Pridmore D, Vothknecht U, Gardner W, 39. Wallin E, von Heijne G: Genome-wide analysis of integral
Lin W, Woese C, Soll D: A euryarchaeal lysyl-tRNA synthetase: •• membrane proteins from eubacterial, archaean, and eukaryotic
resemblance to class I synthetases. Science 1997, 278:1119- organisms. Protein Science 1998, 7:1029-1038.
1122. This paper presents a statistical analysis of integral membrane proteins
encoded in the complete genomes of organisms from all phylogenetic
27. Ibba M, Bono J, Rosa P, Soll D: Archaeal-type lysyl-tRNA synthetase domains. One-fifth to one-third of encoded proteins are predicted to be
in the Lyme disease spirochete Borrelia burgdorferi. Proc Natl membrane proteins, and most are predicted to have more positively charged
Acad Sci USA 1997, 94:14383-14388. residues in the cytoplasmic segments than in the external segments.
28. Charlebois R, Sensen C, Doolittle W, Brown J: Evolutionary analysis 40. Gerstein M: Patterns of protein-fold usage in eight microbial
of the hisCGABdFDEHI gene cluster from the archaeon • genomes: a comprehensive structural census. Proteins 1998,
Sulfolobus solfataricus P2. J Bacteriol 1997, 179:4429-4432. 33:518-534.
29. Fani R, Mori E, Tamburini E, Lazcano A: Evolution of the structure This paper reports on the distribution of known folds across proteins from
and chromosomal distribution of histidine biosynthetic genes. complete genomes. The approach counted protein sequences with suffi-
Orig Life Evol Biosph 1998, 28:555-570. cient sequence similarity to sequences with known structure to indicate
conservation of structure. The resulting fold distribution is distinct from that
30. Fitz-Gibbon S, Choi A, Miller J, Stetter K, Simon M, Swanson R, Kim in the protein databank (PDB) of protein molecule structures.
U: A fosmid-based genomic map and identification of 474 genes
of the hyperthermophilic archaeon Pyrobaculum aerophilum. 41. Giometti C, Tollaksen S, Liang X, Adams M, Holden J, Menon A, Schut
Extremophiles 1997, 1:36-51. G, Reich C, Olsen G: Archaeal proteomics. In Microbial Genome
Project Section, DOE Human Genome Program Contractor-Grantee
31. Dubchak I, Muchnik I, Kim S: Assignment of folds for proteins of Workshop VII: 1999 Jan 12—16, Oakland, CA. Washington, DC:
• unknown function in three microbial genomes. J Microb Department of Energy; 1999:150.
Comp Genomics 1998, 3:171-175.
The authors use a neural-network based approach to predict protein struc- 42. Sandman K, Reeve J: Archaeal nucleosome positioning by CTG
tures for protein sequences of unknown function encoded in M. genitalium, • repeats. J Bacteriology 1999, 181:1035-1038.
M. jannaschii, and H. influenzae. The method was able to produce high-con- The authors demonstrate that archaeal histones assemble on genomic DNA
fidence folds for 33 sequences of unknown function. preferentially in (CTG)6 and (CTG)8 repeats. Archaeal histones also recog-
nize eukoryotic nucleosome positioning signals.
32. Kim K, Hung L, Yokota H, Kim R, Kim S: Crystal structures of
eukaryotic translation initiation factor 5a from Methanococcus 43. Pereira S, Reeve J: Histones and nucleosomes in Archaea and
jannaschii at 1.8 Å resolution. Proc Natl Acad Sci USA 1998, Eukarya: a comparative analysis. Extremophiles 1998, 2:141-148.
95:10419-10424. 44. Bell S, Jaxel C, Nadal M, Kosa P, Jackson S: Temperature, template
33. Kim K, Kim R, Kim S: Crystal structure of a small heat-shock topology, and factor requirements of archaeal transcription. Proc
•• protein. Nature 1998, 394:595-599. Natl Acad Sci USA 1998, 95:15218-15222.
A new structure for a small heat-shock protein (sHSP) from M. jannaschii 45. Li W, Grayling R, Sandman K, Edmondson S, Shriver J, Reeve J:
has been solved as a part of a high-throughput structural genomics proto- •• Thermodynamic stability of archaeal histones. Biochemistry 1998,
type project. This sHSP structure helps to understand how small heat-shock 37:10563-10572.
proteins protect substrates from denaturation. A comprehensive analysis of the effect of temperature, salt and pH on the
34. Kim K, Yokota H, Kim R, Kim S: Cloning, expression, and stability of archaeal histones yields insights into the relationships between
crystallization of a hyperthermophilic protein that is homologous stability and structural features. Unfolding is found to be 90 reversible with
to the eukaryotic translation initiation factor, eIF5A. Protein recovery of function. The paper discusses how structural features may lead
Science 1997, 6:2268-2270. to differences in stability.

35. Zarembinski T, Hung L, Mueller-Dieckmann H, Kim K, Yokota H, Kim R, 46. Pereira S, Grayling R, Lurz R, Reeve J: Archaeal nucleosomes. Proc
Kim S: Structure-based assignment of the biochemical function of Natl Acad Sci USA 1997, 94:12633-12637.
a hypothetical protein: a test case of structural genomics. Proc 47. Washio T, Sasayama J, Tomita M: Analysis of complete genomes
Natl Acad Sci USA 1998, 95:15189-15193. • suggests that many prokaryotes do not rely on hairpin formation
36. Peat T, Newman J, Waldo G, Berendzen J, Terwilliger T: Structure of in transcription termination. Nucleic Acids Research 1998,
translation initiation factor 5a from Pyrobaculum aerophilum at 26:5456-5463.
1.75 Å resolution. Structure 1998, 6:1207-1214. The authors compute free energies of mRNA hairpin structures near stop
codons across complete annotated bacterial and archaeal genomes. Four of
37. Gaasterland T: Structural genomics: bioinformatics in the driver’s 16 bacterial genomes show a significant increase in strong hairpins at 30 bp
seat. Nat Biotech 1998, 16:625-627. downstream from stop condons. The other bacterial genomes and the
archaeal genomes show no indication of hairpins corresponding to down-
38. Sáchez R, Šali A: Large-scale protein structure modeling of the stream regions of stop codons. This indicates that hairpin formation at tran-
• Saccharomyces cerevisiae genome. Proc Natl Acad Sci USA 1998, scription termination sites may be restricted to a subset of organisms.
95:13597-13602.
The authors have carried out a genome-wide prediction of structures for pro- 48. Mrazek J, Karlin S: Strand compositional asymmetry in bacterial
teins encoded in the yeast genome. They were able to build homology mod- and large viral genomes. Proc Natl Acad Sci USA 1998,
els for the 3D structure for domains in 1071 proteins in yeast. Only 40 of 95:3720-3725.
