Abstract
Introduction
All plastids trace their origin to a primary endosymbiotic
event in which an ancestral cyanobacterium was integrated
into a previously nonphotosynthetic eukaryote more than
a billion years ago. The ancient alga resulting from this primary endosymbiotic event evolved into the Glaucophyta,
Rhodophyta (red algae), and Viridiplantae (green algae and
terrestrial plants), which together are referred to as the
Plantae or Archaeplastida (Reyes-Prieto et al. 2007; Gould
et al. 2008). After the primitive green and red algae were
established, plastids then spread into other lineages of eukaryotes through secondary endosymbiotic events in which
a red or a green alga was integrated into a previously nonphotosynthetic eukaryote (Reyes-Prieto et al. 2007; Gould
et al. 2008).
Reminiscent of their free-living ancestor, plastids have
retained the bulk of their bacterial biochemistry but lost
~90% of the bacterial genome. Over time, many genes
essential to plastid function have relocated from the ancestral plastid genome to the nucleus, and the gene products
are now targeted from the eukaryotic cytosol into
plastids (Timmis et al. 2004). However, certain proteins
of cyanobacterial origin have evolved to function in cellular
compartments other than plastids (Abdallah et al. 2000;
© The Author 2009. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please
e-mail: journals.permissions@oxfordjournals.org
Research article
Plastids including chloroplasts arose from a cyanobacterial endosymbiont and have retained their own genome, but the
size has been reduced to less than one-tenth of the original bacterial genome. Over time, genes essential to plastid function
have been transferred from the ancestral plastid genome to the nucleus, and the gene products are now targeted into the
plastid from the host cytosol. However, phylogenetic analyses have suggested that the functions of certain original proteins
encoded by the endosymbiont genome have been replaced by nucleus-encoded proteins of noncyanobacterial origin and
that several proteins have been newly added to maintain and control plastids. In order to evaluate the rate and origin of
noncyanobacterial proteins that have contributed to the establishment of the plastid proteome, we performed
phylogenetic analyses of plastid-targeted proteins that are shared by the red alga Cyanidioschyzon merolae (455 proteins)
and the Viridiplanta Arabidopsis thaliana (744 proteins). Our results show that approximately 40% of the plastid proteome
common to red algae and green plants originated from genes of both the ancestral eukaryotic host and various lineages of
bacteria (eubacteria) other than cyanobacteria. The replacement or addition of components was frequently observed for
most of the plastid functions except for the light reaction of photosynthesis and the translation and degradation of
proteins in the plastid. These results suggest that a considerable amount of bacterial metagenomic material, as well as the
genomes of the host and the endosymbiont, has contributed to the establishment of the plastid before the split of the red
and green algae.
MBE
were used as queries against the local database. The sequence set of A. thaliana contained only the 1,226 nucleus-encoded proteins that were detected in the plastid
by proteomics (Zybailov et al. 2008). All PhyloGenie-based
Blast hits with an e-value less than 0.0001 were taken to
build the hidden Markov model alignments. All other parameters were kept at their default values to automatically
construct a phylogenetic tree for each C. merolae query
protein. When the topology of a phylogenetic tree showed
a species monophyly as defined below (supported by a local
bootstrap value >50%), the query C. merolae protein was
extracted as a possible counterpart of A. thaliana plastid-targeted
proteins for further extensive analyses. The tree
topologies selected were monophyletic for 1) Plantae, 2)
Plantae and stramenopiles other than oomycetes, 3) Plantae and cyanobacteria, 4) Plantae, stramenopiles other than
oomycetes, and cyanobacteria, 5) Plantae and certain bacterial groups other than cyanobacteria, 6) Plantae, stramenopiles other than oomycetes, and certain bacterial groups
other than cyanobacteria, 7) Plantae and archaea, or 8)
Plantae, stramenopiles other than oomycetes, and archaea,
where Plantae proteins should contain those of both C.
merolae and A. thaliana.
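The selection rule above can be sketched in code. This is a minimal illustration, not the authors' implementation: the group labels, the function name `classify_clade`, and the treatment of "certain bacterial groups other than cyanobacteria" as a single label are all assumptions made for the example. It takes the taxonomic groups and species found in a clade, together with its local bootstrap value, and returns the number (1 to 8) of the accepted topology, or `None` when the clade does not qualify.

```python
# Hypothetical group labels; the text names the groups but not any encoding.
PLANTAE = "Plantae"
STRAMENOPILES = "Stramenopiles (non-oomycete)"
CYANOBACTERIA = "Cyanobacteria"
OTHER_BACTERIA = "Other bacteria"
ARCHAEA = "Archaea"

# The eight accepted clade compositions, numbered as in the text.
ACCEPTED = {
    frozenset({PLANTAE}): 1,
    frozenset({PLANTAE, STRAMENOPILES}): 2,
    frozenset({PLANTAE, CYANOBACTERIA}): 3,
    frozenset({PLANTAE, STRAMENOPILES, CYANOBACTERIA}): 4,
    frozenset({PLANTAE, OTHER_BACTERIA}): 5,
    frozenset({PLANTAE, STRAMENOPILES, OTHER_BACTERIA}): 6,
    frozenset({PLANTAE, ARCHAEA}): 7,
    frozenset({PLANTAE, STRAMENOPILES, ARCHAEA}): 8,
}

# Plantae must be represented by both species.
REQUIRED_SPECIES = {"Cyanidioschyzon merolae", "Arabidopsis thaliana"}


def classify_clade(groups, species, bootstrap, min_support=50.0):
    """Return the accepted topology number (1-8), or None if the clade fails.

    groups    -- taxonomic group labels present in the clade
    species   -- species names present in the clade
    bootstrap -- local bootstrap value of the clade, in percent
    """
    if bootstrap <= min_support:  # the text requires a value above 50%
        return None
    if not REQUIRED_SPECIES <= set(species):
        return None
    return ACCEPTED.get(frozenset(groups))
```

A clade containing any group outside these eight combinations (for example, oomycetes) falls through to `None`, which mirrors the exclusion of oomycetes in every criterion.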
The extraction of C. merolae proteins potentially monophyletic with A. thaliana plastid-targeted proteins by Blast
analyses was performed as follows: Blast searches against
the local database were performed using C. merolae
nucleus-encoded proteins as queries and hits (e-value
< 0.0001) were extracted. When a query gave no hit or
a small number of hits, a PSI-Blast search (Altschul et al.
1997) against the nonredundant protein sequence database of NCBI was performed. When top hits were occupied
by species as defined above for the PhyloGenie analyses, the
query C. merolae protein was extracted as a possible counterpart of A. thaliana plastid-targeted proteins for further
extensive analyses. The automated PhyloGenie and the
Blast analyses extracted 248 and 327 C. merolae candidate
proteins, respectively (385 in total).
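The e-value filter used in the Blast step can be illustrated with a short sketch. The input layout is an assumption: the code expects 12-column tab-separated Blast output with the e-value in the 11th column, which may differ from the output format actually used in the study. The function name `significant_hits` and the sequence identifiers in the usage example are hypothetical.

```python
E_VALUE_CUTOFF = 1e-4  # threshold stated in the text (e-value < 0.0001)


def significant_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Map each query ID to the subject IDs whose e-value is below the cutoff.

    Assumes tab-separated lines with the query ID in column 1, the subject ID
    in column 2, and the e-value in column 11. Blank lines and lines starting
    with '#' are skipped.
    """
    hits = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue < cutoff:
            hits.setdefault(query, []).append(subject)
    return hits


# Usage with made-up identifiers and scores.
example = [
    "queryA\tsubject1\t45.0\t200\t100\t2\t1\t200\t1\t198\t1e-30\t120",
    "queryA\tsubject2\t25.0\t80\t60\t1\t5\t84\t3\t82\t0.5\t30",
]
print(significant_hits(example))  # {'queryA': ['subject1']}
```

Queries that return an empty or very short hit list from this filter are the ones the text passes on to a PSI-Blast search.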
Automated Extraction of Candidates for Nucleus-Encoded Proteins That Are Shared by Primary
and Secondary Plastid-Containing Eukaryotes
The extraction of C. merolae proteins that are potentially
monophyletic with those of primary and secondary
plastid-containing eukaryotes was performed using PhyloGenie and Blast searches with the local database. The criteria for the extraction were as follows for monophyly in
phylogenetic trees constructed by PhyloGenie (by a local
bootstrap value >50%) or top hits of Blast searches: 1)
Plantae and stramenopiles other than oomycetes, 2) Plantae, stramenopiles other than oomycetes, and cyanobacteria, 3) Plantae, stramenopiles other than oomycetes, and
certain bacterial groups other than cyanobacteria, or 4)
Plantae, stramenopiles other than oomycetes, and archaea,
in which the Plantae proteins contain both C. merolae and
A. thaliana. The automated PhyloGenie and the Blast
searches extracted 288 and 396 C. merolae candidate proteins, respectively (419 in total).
Phylogenetic Analyses
The above automated searches extracted 535 C. merolae
proteins as possible candidates for plastid-targeted proteins shared by A. thaliana. We further conducted extensive phylogenetic analyses on these proteins.
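The candidate counts reported for each search, together with the combined totals, imply how many proteins each pair of searches recovered in common. The following sketch works through that inclusion-exclusion arithmetic; it assumes that each "in total" figure counts unique proteins, and the function name `shared_count` is introduced here only for illustration.

```python
def shared_count(n_first, n_second, n_union):
    """Inclusion-exclusion: size of the overlap given two set sizes and their union."""
    return n_first + n_second - n_union


# First strategy: PhyloGenie (248) and Blast (327), 385 in total.
print(shared_count(248, 327, 385))  # 190

# Second strategy: PhyloGenie (288) and Blast (396), 419 in total.
print(shared_count(288, 396, 419))  # 265

# Both strategies combined: 385 and 419 candidates, 535 in total.
print(shared_count(385, 419, 535))  # 269
```

Under that assumption, roughly half of the 535 candidates were recovered by both strategies.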
The 535 proteins were analyzed by Blast searches against
the nonredundant NCBI protein database. The Blast or
PSI-Blast results revealed that homologs of 28 proteins
are evident in only Plantae or Plantae and other plastid-containing eukaryotes (including A. thaliana). The 28 proteins were scored as putative plastid-targeted proteins
specific to plastid-containing eukaryotes (supplementary
table 2, Supplementary Material online). In the Blast or
PSI-Blast results of 215 proteins, the top hits were occupied
by proteins of plastid-containing eukaryotes (including A.
thaliana) and at least 10 cyanobacterial species. The 215
proteins were scored as putative plastid-targeted proteins
of cyanobacterial origin (supplementary table 2, Supplementary Material online).
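The prescreening described above sorts the 535 candidates into three bins: 28 proteins specific to plastid-containing eukaryotes, 215 of cyanobacterial origin, and the remainder (292), which proceed to full phylogenetic analysis. A rough sketch of such a rule is given below. The text does not define how many hits count as "top hits", so the sketch makes an assumption: it takes the leading run of hits that belong to plastid-containing eukaryotes or cyanobacteria. The labels and the function name `prescreen` are hypothetical.

```python
PLASTID_EUK = "plastid_eukaryote"
CYANO = "cyanobacteria"


def prescreen(hits, min_cyano_species=10):
    """Assign a candidate to one of three bins from its ranked hit list.

    hits -- list of (species, group) tuples, best hit first.
    """
    # Case 1: every homolog comes from plastid-containing eukaryotes,
    # and A. thaliana is among them.
    if hits and all(group == PLASTID_EUK for _, group in hits):
        if any(sp == "Arabidopsis thaliana" for sp, _ in hits):
            return "plastid-eukaryote specific"

    # Case 2 (assumed reading of "top hits"): the leading run of hits
    # consists only of plastid-containing eukaryotes and cyanobacteria.
    run = []
    for sp, group in hits:
        if group not in (PLASTID_EUK, CYANO):
            break
        run.append((sp, group))
    has_arabidopsis = any(sp == "Arabidopsis thaliana" for sp, _ in run)
    cyano_species = {sp for sp, group in run if group == CYANO}
    if has_arabidopsis and len(cyano_species) >= min_cyano_species:
        return "cyanobacterial origin"

    return "phylogenetic analysis"


# The three bins reported in the text account for all candidates.
assert 28 + 215 + 292 == 535
```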
The remaining 292 C. merolae proteins were subjected to
phylogenetic analyses by both the maximum-likelihood
method and Bayesian inference. We excluded the sequences of parasitic eukaryotes, which often cause long branch
attraction due to unusual nucleotide substitutions. To facilitate samplings of homologous sequences, we prepared
a protein database of 106 species (supplementary table 3,
Supplementary Material online). The database included sequences of 16 cyanobacteria, 58 other bacteria, 11 archaea,
and 21 eukaryotes. The 292 proteins were subjected to PhyloGenie-based Blast to obtain sets of homologous sequences.
In addition, each protein was also subjected to a Blast search
against the nonredundant protein database of NCBI to obtain additional homologous sequences. In order to facilitate
the phylogenetic analyses, repetitive samplings of closely related sequences of closely related species were omitted.
The sequences were aligned by using ClustalX 2.0 (Larkin
et al. 2007). For 16 C. merolae proteins, reasonable alignments were not available because the sequences were
too divergent from the relevant homologs (e.g., RNA
polymerase sigma subunits), because sequence similarity
was evident only in a limited region, or because the sequences consisted
of repeats of a short domain (e.g., pentatricopeptide repeat
proteins). Therefore, these proteins were omitted in further
phylogenetic analyses (supplementary table 2, Supplementary Material online). Sequences that were too divergent or
sequences with long gaps were excluded to avoid artifacts
in tree construction. Alignments were manually refined by
using SeaView (Galtier et al. 1996), and ambiguous sites
were excluded. A Neighbor-Joining tree was constructed
for each alignment by ClustalX 2.0, and sequences with very
long branches were omitted from further phylogenetic analyses (those within a clade containing putative plastid-targeted proteins are indicated in parentheses in
supplementary table 2, Supplementary Material online).
One hundred replicates of bootstrap analyses by the
maximum-likelihood method were performed using
RAxML 7.0.4 (Stamatakis 2006), with the Whelan and
FIG. 1. Schematic representation of the plastid evolution and two strategies to estimate nucleus-encoded plastid proteins that are shared by the
Viridiplanta Arabidopsis thaliana and the red alga Cyanidioschyzon merolae. In the first strategy, we searched C. merolae nucleus-encoded
proteins that are monophyletic with A. thaliana nucleus-encoded proteins that were identified in plastids by proteomics (Zybailov et al. 2008).
In the second strategy, we searched C. merolae proteins that are shared by A. thaliana and plastid-containing stramenopiles but are missing in
plastid-less stramenopiles.
FIG. 2. The number of estimated nucleus-encoded plastid-targeted proteins shared by Arabidopsis thaliana and Cyanidioschyzon merolae.
Candidates for C. merolae proteins that are monophyletic with 1,226 A. thaliana proteins detected in plastids (Zybailov et al. 2008) or that are
shared by A. thaliana and plastid-containing stramenopiles were extracted by automated Blast searches (Altschul et al. 1997) and phylogenetic
analyses by PhyloGenie (Frickey and Lupas 2004). The 535 C. merolae proteins extracted by the automated searches were further examined by
phylogenetic analyses using the maximum-likelihood method (RaxML) (Stamatakis 2006) and the Bayesian inference (MrBayes) (Ronquist and
Huelsenbeck 2003). In total, 455 C. merolae proteins corresponding to 744 A. thaliana counterparts were identified as putative plastid-targeted
proteins. Arc, Archaea; Bac, Bacteria other than cyanobacteria; Cyano, Cyanobacteria; Om, Oomycetes; Pl, Plantae; and St, Stramenopiles.
Plastid   333   277   357
Others     16    12    24
N/D       253   279   363
Total     602   568   744
FIG. 3. Classification of the estimated origin of the plastid proteome. The putative plastid-targeted proteins shared by Cyanidioschyzon merolae
(446) and Arabidopsis thaliana (728 in total and 355 localized in plastids with experimental support) were classified by estimated origin. The
percentages in parentheses indicate those in the total identified proteins (446 in C. merolae, 728 in A. thaliana, or 357 in A. thaliana proteins,
which have been shown to be localized in plastids by experimental investigation).
FIG. 4. Comparison of the estimated origin of nucleus-encoded plastid proteins across functional categories. (A) Four hundred forty-six
Cyanidioschyzon merolae proteins were classified by functional categories (supplementary table 1, Supplementary Material online). For
comparison, the left bar shows the functional classification of proteins encoded in the plastid genome in both C. merolae and Arabidopsis
thaliana (58 proteins). (B) The contents of cyanobacterial and noncyanobacterial proteins in each functional group. Proteins of cyanobacterial
origin are shown in green. Proteins of noncyanobacterial origin are shown in yellow (cyanobacterial homologs were detected by Blast searches;
e-value < 0.0001) or red (cyanobacterial homologs were not detected).
Supplementary Material
Supplementary tables 1–4 and Supplementary figure 1
are available at Molecular Biology and Evolution online
(http://www.mbe.oxfordjournals.org/).
Acknowledgments
We thank M. Matsuzaki, S. Maruyama, and K. Misawa for
comments. This work was supported by Grant-in-Aid for
Scientific Research from Japan Society for the Promotion
of Science (to S.M.).
References
Abdallah F, Salamini F, Leister D. 2000. A prediction of the size and
evolutionary origin of the proteome of chloroplasts of
Arabidopsis. Trends Plant Sci. 5:141–142.
Adam Z, Clarke AK. 2002. Cutting edge of chloroplast proteolysis.
Trends Plant Sci. 7:451–456.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W,
Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation
of protein database search programs. Nucleic Acids Res.
25:3389–3402.
Armbrust EV, Berges JA, Bowler C, et al. (45 co-authors). 2004. The
genome of the diatom Thalassiosira pseudonana: ecology,
evolution, and metabolism. Science. 306:79–86.
Bowler C, Allen AE, Badger JH, et al. (77 co-authors). 2008. The
Phaeodactylum genome reveals the evolutionary history of
diatom genomes. Nature. 456:239–244.
Moustafa A, Beszteri B, Maier UG, Bowler C, Valentin K,
Bhattacharya D. 2009. Genomic footprints of a cryptic plastid
endosymbiosis in diatoms. Science. 324:1724–1726.
Moustafa A, Reyes-Prieto A, Bhattacharya D. 2008. Chlamydiae has
contributed at least 55 genes to Plantae with predominantly
plastid functions. PLoS ONE. 3:e2205.
Nozaki H, Takano H, Misumi O, et al. (18 co-authors). 2007. A 100%-complete sequence reveals unusually simple genomic features
in the hot-spring red alga Cyanidioschyzon merolae. BMC Biol.
5:28.
Obornik M, Green BR. 2005. Mosaic origin of the heme biosynthesis
pathway in photosynthetic eukaryotes. Mol Biol Evol.
22:2343–2353.
Race HL, Herrmann RG, Martin W. 1999. Why have organelles
retained genomes? Trends Genet. 15:364–370.
Reyes-Prieto A, Bhattacharya D. 2007. Phylogeny of Calvin cycle
enzymes supports Plantae monophyly. Mol Phylogenet Evol.
45:384–391.
Reyes-Prieto A, Weber AP, Bhattacharya D. 2007. The origin and
establishment of the plastid in algae and plants. Annu Rev Genet.
41:147–168.
Rice DW, Palmer JD. 2006. An exceptional horizontal gene transfer
in plastids: gene replacement by a distant bacterial paralog and
evidence that haptophyte and cryptophyte plastids are sisters.
BMC Biol. 4:31.
Richards TA, Dacks JB, Campbell SA, Blanchard JL, Foster PG,
McLeod R, Roberts CW. 2006. Evolutionary origins of the eukaryotic
shikimate pathway: gene fusions, horizontal gene transfer, and
endosymbiotic replacements. Eukaryot Cell. 5:1517–1531.
Ronquist F, Huelsenbeck JP. 2003. MrBayes 3: Bayesian phylogenetic
inference under mixed models. Bioinformatics. 19:1572–1574.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based
phylogenetic analyses with thousands of taxa and mixed models.
Bioinformatics. 22:2688–2690.
Sun Q, Zybailov B, Majeran W, Friso G, Olinares PD, van Wijk KJ.
2009. PPDB, the Plant Proteomics Database at Cornell. Nucleic
Acids Res. 37:D969–D974.
Tanaka R, Tanaka A. 2007. Tetrapyrrole biosynthesis in higher
plants. Annu Rev Plant Biol. 58:321–346.
Timmis JN, Ayliffe MA, Huang CY, Martin W. 2004. Endosymbiotic
gene transfer: organelle genomes forge eukaryotic chromosomes.
Nat Rev Genet. 5:123–135.
Tyler BM, Tripathy S, Zhang X, et al. (53 co-authors). 2006.
Phytophthora genome sequences uncover evolutionary origins
and mechanisms of pathogenesis. Science. 313:1261–1266.
Tyra HM, Linka M, Weber AP, Bhattacharya D. 2007. Host origin of
plastid solute transporters in the first photosynthetic eukaryotes. Genome Biol. 8:R212.
Vothknecht UC, Soll J. 2005. Chloroplast membrane transport:
interplay of prokaryotic and eukaryotic traits. Gene. 354:99–109.
Yang Y, Glynn JM, Olson BJ, Schmitz AJ, Osteryoung KW. 2008.
Plastid division: across time and space. Curr Opin Plant Biol.
11:577–584.
Yoon HS, Hackett JD, Pinto G, Bhattacharya D. 2002. The single,
ancient origin of chromist plastids. Proc Natl Acad Sci U S A.
99:15507–15512.
Zhaxybayeva O, Gogarten JP, Charlebois RL, Doolittle WF, Papke RT.
2006. Phylogenetic analyses of cyanobacterial genomes: quantification of horizontal gene transfer events. Genome Res.
16:1099–1108.
Zybailov B, Rutschow H, Friso G, Rudella A, Emanuelsson O, Sun Q,
van Wijk KJ. 2008. Sorting signals, N-terminal modifications and
abundance of the chloroplast proteome. PLoS ONE. 3:e1994.