Eukaryotic and Eubacterial Contributions to the Establishment of Plastid Proteome Estimated by Large-Scale Phylogenetic Analyses
Kenji Suzuki and Shin-ya Miyagishima*
Initiative Research Program, Advanced Science Institute, RIKEN, Wako, Saitama, Japan

K.S. and S.M. equally contributed to this study.


*Corresponding author: E-mail: smiyagi@riken.jp.
Associate editor: Charles Delwiche

© The Author 2009. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please
e-mail: journals.permissions@oxfordjournals.org
Mol. Biol. Evol. 27(3):581–590. 2010 doi:10.1093/molbev/msp273
Advance Access publication November 12, 2009
Research article

Abstract
Plastids including chloroplasts arose from a cyanobacterial endosymbiont and have retained their own genome, but the
size has been reduced to less than one-tenth of the original bacterial genome. Over time, genes essential to plastid function
have been transferred from the ancestral plastid genome to the nucleus, and the gene products are now targeted into the
plastid from the host cytosol. However, phylogenetic analyses have suggested that the functions of certain original proteins
encoded by the endosymbiont genome have been replaced by nucleus-encoded proteins of noncyanobacterial origin and
that several proteins have been newly added to maintain and control plastids. In order to evaluate the rate and origin of
noncyanobacterial proteins that have contributed to the establishment of the plastid proteome, we performed
phylogenetic analyses of plastid-targeted proteins that are shared by the red alga Cyanidioschyzon merolae (455 proteins)
and the Viridiplanta Arabidopsis thaliana (744 proteins). Our results show that approximately 40% of the plastid proteome
common to red algae and green plants originated from genes of both the ancestral eukaryotic host and various lineages of
bacteria (eubacteria) other than cyanobacteria. The replacement or addition of components was frequently observed for
most of the plastid functions except for the light reaction of photosynthesis and the translation and degradation of
proteins in the plastid. These results suggest that a considerable amount of bacterial metagenomic material, as well as the
genomes of the host and the endosymbiont, has contributed to the establishment of the plastid before the split of the red
and green algae.

Key words: plastid, endosymbiosis, proteome, horizontal gene transfer.

Introduction
All plastids trace their origin to a primary endosymbiotic
event in which an ancestral cyanobacterium was integrated
into a previously nonphotosynthetic eukaryote more than
a billion years ago. The ancient alga resulting from this primary endosymbiotic event evolved into the Glaucophyta,
Rhodophyta (red algae), and Viridiplantae (green algae and
terrestrial plants), which together are referred to as the
Plantae or Archaeplastida (Reyes-Prieto et al. 2007; Gould
et al. 2008). After the primitive green and red algae were
established, plastids then spread into other lineages of eukaryotes through secondary endosymbiotic events in which
a red or a green alga was integrated into a previously nonphotosynthetic eukaryote (Reyes-Prieto et al. 2007; Gould
et al. 2008).
Reminiscent of their free-living ancestor, plastids have
retained the bulk of their bacterial biochemistry but lost
~90% of the bacterial genome. Over time, many genes
essential to plastid function have relocated from the ancestral plastid genome to the nucleus, and the gene products
are now targeted from the eukaryotic cytosol into
plastids (Timmis et al. 2004). However, certain proteins
of cyanobacterial origin have evolved to function in cellular
compartments other than plastids (Abdallah et al. 2000;
Timmis et al. 2004). For example, some terrestrial plants
possess plastid and cytosolic isoenzymes of phosphoglycerate kinase both of which are descended from the cyanobacterial ancestor (Martin and Schnarrenberger 1997).
Conversely, a portion of the proteins that are targeted to plastids do not seem to have been acquired from cyanobacteria,
but rather, to have evolved from the host eukaryotic genome
(Abdallah et al. 2000; Timmis et al. 2004). Such evolutionary
chimerism has been found in metabolic pathways in the plastid (e.g., the Calvin cycle, shikimate pathway for amino acid
synthesis, and methylerythritol 4-phosphate pathway for
isoprenoid synthesis) (Martin and Schnarrenberger 1997;
Richards et al. 2006; Reyes-Prieto and Bhattacharya 2007;
Matsuzaki et al. 2008). In these cases, the function of certain
cyanobacterial proteins has been replaced by preexisting and
functionally equivalent host-encoded proteins, a process
known as endosymbiotic gene replacement. In addition,
new components have been added to plastids since the endosymbiotic event and are involved in protein targeting into
plastids from the cytosol (Vothknecht and Soll 2005), regulating plastid division (Yang et al. 2008) and transporting metabolites across the plastid envelope membranes (Tyra et al.
2007).
The above features of the plastid proteome suggest that
the proteome has undergone rearrangement by a reduction
of the endosymbiont genome and the introduction of host
genome products. Recent studies have also suggested that
dozens of genes acquired by horizontal gene transfer
(HGT), especially from Chlamydiae (Huang and Gogarten
2007, 2008; Moustafa et al. 2008), contributed to the
rearrangement of the plastid proteome in addition to eukaryotic genes. Although it is uncertain how the noncyanobacterial genes have been integrated into the nuclear
genome of Plantae, phagocytic digestion of bacteria and
subsequent integration of bacterial genes into the host genome or an ancient endosymbiotic association between
Plantae and bacteria are proposed (Huang and Gogarten
2007, 2008; Moustafa et al. 2008). These observations led
us to examine the ratio, origin, and functions of noncyanobacterial proteins that have been introduced into the
plastid proteome.
In this study, we analyzed the evolutionary origin of
nucleus-encoded plastid proteins that are shared by the
red alga Cyanidioschyzon merolae and the Viridiplanta Arabidopsis thaliana to evaluate the ancient rearrangement of
the plastid proteome. The plastid-targeted proteins were
estimated based on results of biochemical proteomics of
A. thaliana and comparative analyses of the primary and
secondary plastid-containing eukaryotes. Phylogenetic
analyses show that approximately one-half of the plastid
proteome was built by replacement or addition early in
plastid evolution. It is also suggested that the extent of
the rearrangement varies among functions and that certain
components of translation and protein degradation have
been protected from the replacement. The analyses show
that the majority of noncyanobacterial proteins have been
introduced by HGT from various groups of bacteria, suggesting that proteins encoded in the nuclear genome, regardless of origin, had the opportunity to acquire functions
in the plastid.

Materials and Methods


Automated Extraction of Nucleus-Encoded C. merolae Proteins Potentially Monophyletic with A. thaliana Plastid-Targeted Proteins
Cyanidioschyzon merolae nucleus-encoded proteins
(amino acid sequences) that are potentially monophyletic
with A. thaliana nucleus-encoded plastid proteins were
extracted by Blast (Altschul et al. 1997) searches and automated phylogenetic analyses using PhyloGenie (Frickey and
Lupas 2004) with a local database containing 1,226 A. thaliana proteins, which had been detected in the plastid fraction by large-scale proteome analyses (Zybailov et al. 2008).
The local database contained deduced amino acid sequences of 28 bacteria (including 16 cyanobacteria); 7 archaea;
5 primary plastid-containing eukaryotes (Plantae), including A. thaliana and the red alga C. merolae; 5 stramenopiles,
including 2 oomycetes; and 10 other eukaryotes without
plastids (supplementary table 1, Supplementary Material
online).
For the analyses with PhyloGenie, C. merolae nucleus-encoded proteins (4,773 proteins) (Matsuzaki et al. 2004)
were used as queries against the local database. The sequence set of A. thaliana contained only the 1,226 nucleus-encoded proteins that were detected in the plastid
by proteomics (Zybailov et al. 2008). All PhyloGenie-based
Blast hits with an e-value less than 0.0001 were taken to
build the hidden Markov model alignments. All other parameters were kept at their default values to automatically
construct a phylogenetic tree for each C. merolae query
protein. When the topology of a phylogenetic tree showed
a species monophyly defined as indicated below (by a local
bootstrap value >50%), the query C. merolae protein was
extracted as a possible counterpart of A. thaliana plastid-targeted
proteins for further extensive analyses. The tree
topologies selected were monophyletic for 1) Plantae, 2)
Plantae and stramenopiles other than oomycetes, 3) Plantae and cyanobacteria, 4) Plantae, stramenopiles other than
oomycetes, and cyanobacteria, 5) Plantae and certain bacterial groups other than cyanobacteria, 6) Plantae, stramenopiles other than oomycetes, and certain bacterial groups
other than cyanobacteria, 7) Plantae and archaea, or 8)
Plantae, stramenopiles other than oomycetes, and archaea,
where Plantae proteins should contain those of both C.
merolae and A. thaliana.
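The eight accepted clade compositions can be read as a lookup on the set of taxonomic groups found in a clade. The following is a minimal sketch under stated assumptions, not the authors' PhyloGenie configuration: the group labels, the function name, and the input format are hypothetical, and "certain bacterial groups other than cyanobacteria" is collapsed into a single label for brevity.

```python
from typing import Iterable, Optional

# Hypothetical group labels for the taxa found in one clade of a tree.
PLANTAE = "plantae"
STRAMENOPILE = "stramenopile"    # plastid-containing stramenopiles only
OOMYCETE = "oomycete"            # labeled separately so that it disqualifies a clade
CYANO = "cyanobacteria"
OTHER_BACTERIA = "other_bacteria"
ARCHAEA = "archaea"

# The eight accepted clade compositions, numbered as in the text.
ACCEPTED = {
    frozenset({PLANTAE}): 1,
    frozenset({PLANTAE, STRAMENOPILE}): 2,
    frozenset({PLANTAE, CYANO}): 3,
    frozenset({PLANTAE, STRAMENOPILE, CYANO}): 4,
    frozenset({PLANTAE, OTHER_BACTERIA}): 5,
    frozenset({PLANTAE, STRAMENOPILE, OTHER_BACTERIA}): 6,
    frozenset({PLANTAE, ARCHAEA}): 7,
    frozenset({PLANTAE, STRAMENOPILE, ARCHAEA}): 8,
}

# Plantae must be represented by both species.
REQUIRED_SPECIES = {"C. merolae", "A. thaliana"}


def accepted_topology(groups: Iterable[str],
                      species: Iterable[str],
                      local_bootstrap: float,
                      min_support: float = 50.0) -> Optional[int]:
    """Return the topology number (1-8) for a clade, or None if it is rejected."""
    if local_bootstrap <= min_support:           # support must exceed 50%
        return None
    if not REQUIRED_SPECIES <= set(species):     # both Plantae species required
        return None
    return ACCEPTED.get(frozenset(groups))       # None for any other composition


print(accepted_topology({PLANTAE, CYANO},
                        {"C. merolae", "A. thaliana", "Synechocystis sp."},
                        72.0))  # 3
```

A clade that also contains an oomycete sequence, or that lacks either of the two Plantae species, is rejected in this sketch regardless of its support value.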
The extraction of C. merolae proteins potentially monophyletic with A. thaliana plastid-targeted proteins by Blast
analyses was performed as follows: Blast searches against
the local database were performed using C. merolae
nucleus-encoded proteins as queries and hits (e-value
< 0.0001) were extracted. When a query gave no hit or
a small number of hits, a PSI-Blast search (Altschul et al.
1997) against the nonredundant protein sequence database of NCBI was performed. When top hits were occupied
by species as defined above for the PhyloGenie analyses, the
query C. merolae protein was extracted as a possible counterpart of A. thaliana plastid-targeted proteins for further
extensive analyses. The automated PhyloGenie and the
Blast analyses extracted 248 and 327 C. merolae candidate
proteins, respectively (385 in total).

Automated Extraction of Candidates for Nucleus-Encoded Proteins That Are Shared by Primary and Secondary Plastid-Containing Eukaryotes
The extraction of C. merolae proteins that are potentially
monophyletic with those of primary and secondary
plastid-containing eukaryotes was performed using PhyloGenie and Blast searches with the local database. The criteria for the extraction were as follows for monophyly in
phylogenetic trees constructed by PhyloGenie (by a local
bootstrap value >50%) or top hits of Blast searches: 1)
Plantae and stramenopiles other than oomycetes, 2) Plantae, stramenopiles other than oomycetes, and cyanobacteria, 3) Plantae, stramenopiles other than oomycetes, and
certain bacterial groups other than cyanobacteria, or 4)
Plantae, stramenopiles other than oomycetes, and archaea,
in which the Plantae proteins contain both C. merolae and
A. thaliana. The automated PhyloGenie and the Blast
searches extracted 288 and 396 C. merolae candidate proteins, respectively (419 in total).

Phylogenetic Analyses
The above automated searches extracted 535 C. merolae
proteins as possible candidates for plastid-targeted proteins shared by A. thaliana. We further conducted extensive phylogenetic analyses on these proteins.
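The candidate counts reported for the automated searches imply how much the individual searches overlapped. As a worked illustration (the helper name is hypothetical, and the overlaps are derived here from the reported totals rather than stated in the text), inclusion-exclusion gives:

```python
def overlap(n_a: int, n_b: int, n_union: int) -> int:
    """Size of the intersection, from |A n B| = |A| + |B| - |A u B|."""
    shared = n_a + n_b - n_union
    if not 0 <= shared <= min(n_a, n_b):
        raise ValueError("counts are not consistent with two overlapping sets")
    return shared


# PhyloGenie (248) and Blast (327) candidates, 385 in total (first strategy)
print(overlap(248, 327, 385))  # 190
# PhyloGenie (288) and Blast (396) candidates, 419 in total (second strategy)
print(overlap(288, 396, 419))  # 265
# First strategy (385) and second strategy (419), 535 in total
print(overlap(385, 419, 535))  # 269
```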
The 535 proteins were analyzed by Blast searches against
the nonredundant NCBI protein database. The Blast or
PSI-Blast results revealed that homologs of 28 proteins
are evident in only Plantae or Plantae and other plastid-containing eukaryotes (including A. thaliana). The 28 proteins were scored as putative plastid-targeted proteins
specific to plastid-containing eukaryotes (supplementary
table 2, Supplementary Material online). In the Blast or
PSI-Blast results of 215 proteins, the top hits were occupied
by proteins of plastid-containing eukaryotes (including A.
thaliana) and at least 10 cyanobacterial species. The 215
proteins were scored as putative plastid-targeted proteins
of cyanobacterial origin (supplementary table 2, Supplementary Material online).
The remaining 292 C. merolae proteins were subjected to
phylogenetic analyses by both the maximum-likelihood
method and Bayesian inference. We excluded the sequences of parasitic eukaryotes, which often cause long branch
attraction due to unusual nucleotide substitutions. To facilitate samplings of homologous sequences, we prepared
a protein database of 106 species (supplementary table 3,
Supplementary Material online). The database included sequences of 16 cyanobacteria, 58 other bacteria, 11 archaea,
and 21 eukaryotes. The 292 proteins were subjected to PhyloGenie-based Blast to obtain sets of homologous sequences.
In addition, each protein was also subjected to a Blast search
against the nonredundant protein database of NCBI to obtain additional homologous sequences. In order to facilitate
the phylogenetic analyses, redundant samples of closely related sequences from closely related species were omitted.
The sequences were aligned by using ClustalX 2.0 (Larkin
et al. 2007). For 16 C. merolae proteins, reasonable alignments were not available because the sequences were
too divergent from the relevant homologs (e.g., RNA
polymerase sigma subunits), because sequence similarity
was evident only in a limited region, or because the sequences consisted
of repeats of a short domain (e.g., pentatricopeptide repeat
proteins). Therefore, these proteins were omitted from further
phylogenetic analyses (supplementary table 2, Supplementary Material online). Sequences that were too divergent or
sequences with long gaps were excluded to avoid artifacts
in tree construction. Alignments were manually refined by
using SeaView (Galtier et al. 1996), and ambiguous sites
were excluded. A Neighbor-Joining tree was constructed
for each alignment by ClustalX 2.0, and sequences with very
long branches were omitted from further phylogenetic analyses (those within a clade containing putative plastid-targeted proteins are indicated in parentheses in
supplementary table 2, Supplementary Material online).
One hundred replicates of bootstrap analyses by the
maximum-likelihood method were performed using
RAxML 7.0.4 (Stamatakis 2006), with the Whelan and
Goldman (WAG) matrix of the amino acid replacements
assuming a proportion of the invariant positions and four
gamma-distributed rates (WAG + I + G4 model). The
Bayesian inference was performed with the program
MrBayes 3.1.2 (Ronquist and Huelsenbeck 2003) with
the WAG + I + G4 model. For each MrBayes consensus
tree, 500,000–2,000,000 generations were completed.
As a result, 186 groups of C. merolae proteins (corresponding to 212 proteins) were supported by the criteria
as described above (i.e., monophyly with A. thaliana
plastid-localized proteins or monophyly with proteins of
primary and secondary plastid-containing eukaryotes)
by both local bootstrap values >80 and Bayesian posterior
probabilities .0.95. In total, 455 C. merolae proteins (corresponding to 744 A. thaliana counterparts) were identified as putative plastid-targeted proteins.
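The counts given in this section appear to add up as follows. The sketch below is a bookkeeping check only; the variable and function names are hypothetical, and the breakdown of the final 455 into three classes is inferred from the reported numbers rather than stated explicitly in the text.

```python
def well_supported(bootstrap: float, posterior: float) -> bool:
    """Dual support criterion: ML bootstrap > 80 and Bayesian posterior > 0.95."""
    return bootstrap > 80 and posterior > 0.95


extracted = 535                 # candidates from the automated searches
plastid_specific = 28           # homologs only in plastid-containing eukaryotes
cyanobacterial_by_blast = 215   # top hits include at least 10 cyanobacteria

# Candidates passed on to maximum-likelihood and Bayesian analyses
to_phylogeny = extracted - plastid_specific - cyanobacterial_by_blast
unalignable = 16                # no reasonable alignment obtained
analyzed = to_phylogeny - unalignable
supported = 212                 # proteins in the 186 well-supported groups

total_putative = plastid_specific + cyanobacterial_by_blast + supported

print(to_phylogeny)    # 292
print(analyzed)        # 276
print(total_putative)  # 455
```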

Evaluation of the A. thaliana Protein Localization Data
Experimental support for the subcellular localization for
the 744 A. thaliana proteins was searched in the Plant Proteomics Database (Sun et al. 2009) and the Arabidopsis
Information Resource (http://www.arabidopsis.org/). Because proteomic analyses by biochemical fractionation are often contaminated by nonplastid proteins (although
some combinatorial approaches remove such contamination; Sun et al. 2009), the localization of A. thaliana proteins
was evaluated as follows: 1) Direct evidence was considered first: immunostaining, assays using
the green fluorescent protein or other tags conjugated either to the entire protein or to its N-terminal portion (containing a putative transit peptide), in vitro import assays
into organelles, or immunoblotting (showing enrichment of
the protein in an isolated organelle fraction). When one or
more of these experiments suggested the localization of a protein in a cellular compartment or a set of compartments,
that localization was assigned to the A. thaliana protein (supplementary table 2, Supplementary Material online). 2)
When such data were not available, proteomic results
based on cell fractionation/organelle isolation were
searched. When two or more independent analyses suggested
the same localization, the localization was assigned to the
protein. 3) When these data were not available and when
experimental data indicated that the activity of the protein
was localized exclusively in plastids, the protein was indicated as CpA in supplementary table 2 (Supplementary
Material online), but these were not counted for the estimation of contamination.
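The three-tier evaluation above can be sketched as a simple decision procedure. This is an illustrative reading only: the function name, argument names, and the representation of evidence as lists of compartment labels are assumptions, not part of the described method.

```python
from collections import Counter
from typing import Iterable, Optional


def assign_localization(direct_assays: Iterable[str],
                        proteomics_reports: Iterable[str],
                        activity_only_in_plastid: bool = False) -> Optional[str]:
    """Assign a localization label following the three tiers of evidence.

    direct_assays: compartments suggested by immunostaining, tagged-protein
        assays, in vitro import assays, or immunoblotting (tier 1).
    proteomics_reports: one compartment label per independent
        fractionation-based proteomic analysis (tier 2).
    activity_only_in_plastid: enzymatic activity found exclusively in
        plastids (tier 3, reported as "CpA").
    """
    # Tier 1: any direct experiment is sufficient
    direct = sorted(set(direct_assays))
    if direct:
        return "/".join(direct)

    # Tier 2: at least two independent analyses must agree
    counts = Counter(proteomics_reports)
    agreed = sorted(loc for loc, n in counts.items() if n >= 2)
    if agreed:
        return "/".join(agreed)

    # Tier 3: activity-based evidence, flagged separately
    if activity_only_in_plastid:
        return "CpA"

    return None  # not determined (N/D)


print(assign_localization([], ["plastid", "plastid", "mitochondrion"]))  # plastid
```

In this sketch a protein with a single proteomic report and no other evidence remains undetermined, which corresponds to the N/D class.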
When one or more A. thaliana proteins
monophyletic with a C. merolae protein were localized
in compartments other than plastids and no other A. thaliana members were localized in plastids, these proteins
were considered as nonplastid proteins (nine C. merolae
proteins; supplementary table 2, Supplementary Material
online). Even when one or more A. thaliana proteins
monophyletic with a C. merolae protein were localized
in compartments other than plastids, when other A. thaliana members were localized in plastids, these proteins
were included in the classifications.

FIG. 1. Schematic representation of the plastid evolution and two strategies to estimate nucleus-encoded plastid proteins that are shared by the
Viridiplanta Arabidopsis thaliana and the red alga Cyanidioschyzon merolae. In the first strategy, we searched C. merolae nucleus-encoded
proteins that are monophyletic with A. thaliana nucleus-encoded proteins that were identified in plastids by proteomics (Zybailov et al. 2008).
In the second strategy, we searched C. merolae proteins that are shared by A. thaliana and plastid-containing stramenopiles but are missing in
plastid-less stramenopiles.

Results and Discussion


Estimation of Plastid-Targeted Proteins That Are Shared by the Red Alga C. merolae and the Viridiplanta A. thaliana
In order to explore the extent to which noncyanobacterial
proteins have been integrated into the plastid proteome,
we performed exhaustive phylogenetic analyses of plastid-targeted nucleus-encoded proteins. To detect the integration early in plastid evolution, we analyzed proteins shared
by red algae and Viridiplantae that branched soon after the
establishment of the plastid (Yoon et al. 2002). Currently,
most of the available experimental data on protein subcellular localization is from terrestrial plants, especially A.
thaliana, and C. merolae is the only red alga for which
the complete genome sequence is available (Matsuzaki
et al. 2004; Nozaki et al. 2007). Therefore, we searched
for the putative plastid-targeted proteins that are shared
by A. thaliana and C. merolae by two approaches (fig. 1).
In the first approach, we searched in C. merolae nucleus-encoded proteins (4,773 amino acid sequences) for counterparts of A. thaliana nucleus-encoded proteins that were
identified in plastids by proteomics (Zybailov et al. 2008)
(1,226 proteins). As a result, we identified 359 C. merolae
proteins monophyletic with 414 of the 1,226 A. thaliana
proteins (fig. 2). In addition, the phylogenetic searches
(phylogenetic trees of the C. merolae 359 proteins, supplementary table 2; supplementary fig. 1, Supplementary
Material online) suggested that the 359 C. merolae proteins
are also monophyletic with an additional 188 A. thaliana
proteins, which were not included in the original 1,226 proteins (fig. 2; the details are described in Materials and Methods). These 188 proteins were also treated as putative
plastid-targeted proteins for further analyses.
FIG. 2. The number of estimated nucleus-encoded plastid-targeted proteins shared by Arabidopsis thaliana and Cyanidioschyzon merolae.
Candidates for C. merolae proteins that are monophyletic with 1,226 A. thaliana proteins detected in plastids (Zybailov et al. 2008) or that are
shared by A. thaliana and plastid-containing stramenopiles were extracted by automated Blast searches (Altschul et al. 1997) and phylogenetic
analyses by PhyloGenie (Frickey and Lupas 2004). The 535 C. merolae proteins extracted by the automated searches were further examined by
phylogenetic analyses using the maximum-likelihood method (RAxML) (Stamatakis 2006) and the Bayesian inference (MrBayes) (Ronquist and
Huelsenbeck 2003). In total, 455 C. merolae proteins corresponding to 744 A. thaliana counterparts were identified as putative plastid-targeted
proteins. Arc, Archaea; Bac, Bacteria other than cyanobacteria; Cyano, Cyanobacteria; Om, Oomycetes; Pl, Plantae; and St, Stramenopiles.

In the second approach, we predicted plastid proteins
based on secondary endosymbiosis using information from
the whole genomes of stramenopiles (fig. 1; Armbrust et al.
2004; Tyler et al. 2006; Bowler et al. 2008). Evolutionary
studies have suggested that the plastids of stramenopiles,
such as diatoms and brown algae, were obtained through
a secondary endosymbiosis in which a red alga was
integrated into another nonphotosynthetic eukaryote,
and the red algal contents other than the plastid were
degraded (Reyes-Prieto et al. 2007; Gould et al. 2008).
As in the primary endosymbiosis event, the red algal
genes were transferred to the nuclear genome of stramenopiles, and the gene products are now targeted into
the plastid (Li et al. 2006). Certain stramenopiles, such
as oomycetes, do not have plastids (or have nonphotosynthetic
plastids), suggesting either that secondary endosymbiosis occurred in an ancestor of the plastid-containing stramenopiles after the plastid-less stramenopiles branched or
that some stramenopiles have secondarily lost the plastid
(i.e., a common ancestor of stramenopiles had plastids) (Cavalier-Smith 2002; Harper and Keeling 2003; Reyes-Prieto et al. 2007). Recent whole-genome analyses of oomycetes
(Phytophthora sojae and Phytophthora ramorum) identified a large number of genes with a putative heritage from
a red algal ancestor, supporting the latter scenario (Tyler
et al. 2006). Whatever the case, we expected that the stramenopile proteins that are descended from primary
plastid-containing algae, and that are missing in the plastid-less members, such as oomycetes, are functional in
the plastid. Therefore, we collected the proteins that are
shared by C. merolae, A. thaliana, and the plastid-containing stramenopiles (whole-genome information of the stramenopiles Thalassiosira pseudonana, Phaeodactylum
tricornutum, and Aureococcus anophagefferens [Armbrust
et al. 2004; Bowler et al. 2008] was integrated into the analyses). Because recent genome comparison analyses suggested that most endosymbiont-derived diatom proteins
appear to be of a green algal origin instead of a red algal
origin (Moustafa et al. 2009), the branching order of red
algal, Viridiplantae, and stramenopile proteins was not
considered in this study. As a result, we identified 366 C.
merolae nucleus-encoded proteins corresponding to 568
A. thaliana proteins that are monophyletic with proteins of plastid-containing stramenopiles (fig. 2). It should be noted
that oomycetes might have lost proteins of primary algal
origin that have functioned in cellular compartments other
than plastids. Thus, the above method might have also extracted nonplastid proteins, but the
ratio of the nonplastid protein contamination turned out to be quite low,
as described below.

Table 1. Subcellular Localization of the 744 Arabidopsis thaliana Proteins.

Method for Identification        Plastid   Others   N/D   Total
Based on proteomics                  333       16   253     602
Comparison with stramenopiles        277       12   279     568
Total                                357       24   363     744

NOTE. Localization of the 744 A. thaliana proteins was evaluated based on
reported experimental information. The 357 plastid proteins include those
targeted to both plastids and other cellular compartments by dual targeting.
Others indicates proteins that have been localized in cellular compartments other
than plastids. N/D, not determined.

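The counts in Table 1 can be checked for internal consistency. The sketch below (with hypothetical key names) verifies that each row sums to its total and derives, as an inference not stated in the text, how many proteins were identified by both approaches.

```python
# Table 1 counts: localization classes per identification method
table1 = {
    "proteomics":    {"plastid": 333, "others": 16, "nd": 253, "total": 602},
    "stramenopiles": {"plastid": 277, "others": 12, "nd": 279, "total": 568},
    "combined":      {"plastid": 357, "others": 24, "nd": 363, "total": 744},
}

# Each row: plastid + others + N/D should equal the row total
for name, row in table1.items():
    assert row["plastid"] + row["others"] + row["nd"] == row["total"], name

# Proteins found by both methods, per column, by inclusion-exclusion
shared = {
    key: table1["proteomics"][key] + table1["stramenopiles"][key] - table1["combined"][key]
    for key in ("plastid", "others", "nd", "total")
}
print(shared)  # {'plastid': 253, 'others': 4, 'nd': 169, 'total': 426}
```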
In total, from the two approaches, we collected 455 C. merolae proteins (9.5% of the total nucleus-encoded proteins)
corresponding to 744 A. thaliana counterparts as putative
plastid-targeted proteins (fig. 2). The number corresponds
to 25–35% of the 2,000–3,000 A. thaliana proteins that are
predicted to be targeted into the plastid (by prediction of
the existence of plastid transit peptides) (Abdallah et al.
2000). The ratio is consistent with the above results that
414 (~30%) of the 1,226 A. thaliana proteins (detected
in isolated plastids) have C. merolae counterparts. These
results suggest that at least ~30% of the genes encoding
the A. thaliana plastid proteome had already existed in the
common ancestor of Viridiplantae and Rhodophyta, although there is likely to be some underestimation due
to lineage-specific gene loss.
Of 744 A. thaliana proteins collected, the localization of
381 proteins was assigned with experimental support, and
357 proteins (94%) were localized in plastids (including
those targeted to both plastids and other cellular compartments by dual targeting) (table 1). These results indicate
that the pool of the proteins collected above contains
a low level (6%) of nonplastid protein contamination, at
least for A. thaliana. In addition, in 103 sequence alignments (the putative plastid proteins of bacterial origin with
N-terminal ends that are well conserved in bacteria), we
detected an N-terminal extension (>10 amino acid residues) in 98 sequences, suggesting that most of the proteins
in our data set are also targeted into the plastid and/or
mitochondria in C. merolae.
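The proportions quoted in the preceding paragraphs can be recomputed from the stated counts. The helper below is a hypothetical convenience function, and the one-decimal values are shown only to make the rounding in the text explicit.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(pct(455, 4773))  # 9.5  C. merolae nucleus-encoded proteins collected
print(pct(414, 1226))  # 33.8 A. thaliana plastid proteins with C. merolae counterparts
print(pct(357, 381))   # 93.7 experimentally localized proteins found in plastids
print(pct(98, 103))    # 95.1 alignments showing an N-terminal extension
```

The remaining 24 of the 381 experimentally localized proteins account for the nonplastid contamination of about 6% reported above.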

Evolutionary Origin of Plastid-Targeted Proteins


The above searches identified 455 C. merolae proteins (corresponding to 744 A. thaliana proteins) (fig. 2). Of these,
nine C. merolae proteins were considered to be nonplastid contaminants based on the experimental
information of the A. thaliana counterparts (supplementary table 2, Supplementary Material online), and these
proteins were omitted from further analyses and classification. Among the remaining 446 C. merolae proteins (corresponding
to 728 A. thaliana proteins) that were analyzed by
phylogenetic analyses (supplementary table 2; supplementary fig. 1, Supplementary Material online), 177 proteins
(40%) were suggested to be of noncyanobacterial origin
(by maximum-likelihood bootstrap values >80 and Bayesian posterior probabilities >0.95), and 54 proteins (12%)
were found to have no cyanobacterial homologs (Blast
search, e-value < 0.0001) (Altschul et al. 1997) (fig. 3). Similar results were obtained for the 728 A. thaliana proteins as
well as the 357 A. thaliana proteins, which have been
shown to be localized in plastids by experimental investigation (fig. 3). Thus, the result is little influenced by contamination with nonplastid proteins or by species
differences.
The C. merolae proteins of noncyanobacterial origin
were further classified into those of bacterial (23%) and eukaryotic (8%) origin or those specific to plastid-containing
eukaryotes (6%). Only two proteins (1%) were suggested to
be of archaeal origin (fig. 3). The only difference between
C. merolae and A. thaliana is that the proteins specific to
plastid-containing eukaryotes are more evident in A. thaliana plastid proteome (21% or 34% of noncyanobacterial
proteins in total or experimentally plastid-localized proteins, respectively). This difference between C. merolae
and A. thaliana largely relies on the difference in the number of light-harvesting proteins of the photosystems. These
results suggest that a significantly greater number of proteins of bacterial origin than eukaryotic origin were recruited to function in plastids after endosymbiosis.
Consistent with previous reports detecting 15 (Huang
and Gogarten 2007) or 20 (Moustafa et al. 2008) nucleus-encoded plastid proteins of chlamydial origin in red and
green algae, our analyses suggest that 17 kinds of proteins
(18 C. merolae and 27 A. thaliana proteins) are of chlamydial origin. In addition, our results further show that the chlamydial contribution is 4% of the plastid proteome shared
by red algae and Viridiplantae, and 7% of the proteins are
contributed by proteobacteria.
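The whole-number percentages in this section follow from the protein counts given in the text. A brief check, using hypothetical variable names and only counts that are stated explicitly:

```python
c_merolae_total = 446    # C. merolae proteins retained for classification
a_thaliana_total = 728   # corresponding A. thaliana proteins

shares = {
    "noncyanobacterial origin": round(100 * 177 / c_merolae_total),
    "no cyanobacterial homolog": round(100 * 54 / c_merolae_total),
    "chlamydial origin (C. merolae)": round(100 * 18 / c_merolae_total),
    "chlamydial origin (A. thaliana)": round(100 * 27 / a_thaliana_total),
}

for label, value in shares.items():
    print(f"{label}: {value}%")
# noncyanobacterial origin: 40%
# no cyanobacterial homolog: 12%
# chlamydial origin (C. merolae): 4%
# chlamydial origin (A. thaliana): 4%
```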

Distribution of Noncyanobacterial Proteins Across Functional Categories
To explore whether there has been functional selection on
the rearrangement of the plastid proteome, the distribution of noncyanobacterial proteins was examined across
functional categories. The rearrangement is the most evident in the metabolic pathways, where approximately two-thirds are suggested to be of noncyanobacterial origin in
pathways related to lipids, amino acids, and carbohydrates
(fig. 4). Although the phylogenetic analyses showed integration of noncyanobacterial proteins in these metabolic pathways, the analyses also indicate that cyanobacteria have
homologs of these proteins, genes of which have not been
retained in Plantae after the endosymbiotic event (supplementary table 2; supplementary fig. 1, Supplementary
Material online). Thus, nucleus-encoded noncyanobacterial
proteins have replaced endosymbiont genome-encoded
cyanobacterial proteins in these metabolic pathways.
FIG. 3. Classification of the estimated origin of the plastid proteome. The putative plastid-targeted proteins shared by Cyanidioschyzon merolae
(446) and Arabidopsis thaliana (728 in total and 355 localized in plastids with experimental support) were classified by estimated origin. The
percentages in parentheses indicate those in the total identified proteins (446 in C. merolae, 728 in A. thaliana, or 357 in A. thaliana proteins,
which have been shown to be localized in plastids by experimental investigation).

The only exception to the replacement in metabolic
pathways is in the synthetic pathways of photosynthetic pigments. This conservation of cyanobacterial genes is simply
because cyanobacteria are the only bacteria performing oxygenic photosynthesis. In chlorophyll synthesis, replacement was detected in the part of the process shared by
heme synthesis, as previously suggested (Obornik and
Green 2005), whereas all the proteins for the subsequent
chlorophyll synthesis (the Chl branch) (Tanaka and Tanaka
2007) are suggested to be of cyanobacterial origin (fig. 4;
supplementary table 2, Supplementary Material online).
Similarly, the replacement is evident in the methylerythritol
4-phosphate pathway in the plastid for several types of
isoprenoid synthesis (Lange et al. 2000; Matsuzaki et al.
2008), but most of the proteins involved in the subsequent
carotenoid synthesis are of cyanobacterial origin (fig. 4;
supplementary table 2, Supplementary Material online).
Consistent with these results, proteins from prokaryotes other
than cyanobacteria were not found among the components
of photosystems, although there are proteins specific to
photosynthetic eukaryotes (35%), such as the chlorophyll-binding proteins of the light-harvesting complexes
(Durnford et al. 1999).
Previous studies showed that the replacement of genes by
HGT occurs less frequently in informational genes (those
involved in transcription, translation, and related processes)
than in operational genes (those involved in housekeeping)
in prokaryotes, including cyanobacteria (Jain et al. 1999;
Zhaxybayeva et al. 2006). Our analyses of the plastid proteome showed that the translation,
processing, and degradation of plastid proteins are conservative, in that
the majority of the proteins involved are of cyanobacterial origin
(fig. 4; the only exception in the case of translation is translation elongation factor G, which is of alpha-proteobacterial
origin). In contrast, a great many replacements by proteins of
noncyanobacterial origin were detected in proteins involved
in RNA processing and aminoacyl-tRNA synthesis. Most of
these aminoacyl-tRNA synthetases were shown to be dually
targeted to plastids and mitochondria (Duchene et al. 2005),
raising the possibility that mitochondrial proteins in the
nonphotosynthetic eukaryotic ancestor had the capacity
to function in plastids. However, our phylogenetic studies
suggest that three aminoacyl-tRNA synthetases (those for
valine, leucine, and glycine) were acquired in the ancestor
of Plantae by HGT of bacterial genes (supplementary table
2, Supplementary Material online).
In contrast to proteins directly involved in translation,
proteins that are well conserved in prokaryotes and whose putative functions have been implicated in ribosomal biogenesis (Comartin and Brown 2006) (not included in fig. 4)
appear to have been largely replaced after the endosymbiotic event. For example, our phylogenetic studies suggest
that PrmA, which methylates the ribosomal protein
L11, and three (HflX, Obg, and BipA/TypA) of six small
ribosome-binding GTPases originated from bacteria other
than cyanobacteria (YchF, YlqF, and EngA are suggested to
be of cyanobacterial origin; supplementary table 2, Supplementary Material online).

Summary and Implications


More than one half of the plastid proteins of noncyanobacterial origin have cyanobacterial homologs (figs. 3 and 4),

FIG. 4. Comparison of the estimated origin of nucleus-encoded plastid proteins across functional categories. (A) Four hundred forty-six
Cyanidioschyzon merolae proteins were classified by functional categories (supplementary table 1, Supplementary Material online). For
comparison, the left bar shows the functional classification of proteins encoded in the plastid genome in both C. merolae and Arabidopsis
thaliana (58 proteins). (B) The contents of cyanobacterial and noncyanobacterial proteins in each functional group. Proteins of cyanobacterial
origin are shown in green. Proteins of noncyanobacterial origin are shown in yellow (cyanobacterial homologs were detected by BLAST searches;
e-value < 0.0001) or red (cyanobacterial homologs were not detected).

suggesting that there has been considerable replacement in
the plastid proteome, with similar biochemical activities retained. Overall, our analyses suggest that proteins encoded
in the nuclear genome, regardless of origin (i.e., genes of eukaryotic or cyanobacterial origin and those acquired by HGT), had the opportunity to acquire functions in the plastid before the split of
the red algae and Viridiplantae, during the course of the
reduction of the plastid genome contents. The exception to this is the conservation of cyanobacterial genes in the nuclear genome for the light reaction
of photosynthesis. This is simply because oxygenic photosynthesis is specific to cyanobacteria and plastids. However,
our analyses indicate that the translation, folding, and degradation of plastid proteins also exhibit relatively conserved
cyanobacterial traits (fig. 4). Similarly, most of the genes
that are still encoded in the plastid genome encode proteins for translation and photosystems (Race et al. 1999;
fig. 4A). Because protein synthesis, protein degradation,
and their balance are important for fine-tuning the
amounts of photosystems in response to environmental changes
(Adam and Clarke 2002; Miura et al. 2007), the conservation of cyanobacterial proteins in translation and protein degradation might have been important for integrating
oxygenic photosynthesis into the ancestral eukaryote.
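The three-way color scheme described in the legend of figure 4B can be written out as a small decision rule. The following is only a minimal sketch in Python, assuming that the phylogenetic origin of each protein has already been estimated and that the best BLAST e-value against cyanobacterial sequences is available; the function name, argument names, and the use of None for "no hit" are hypothetical and are not taken from the original analysis.

```python
from typing import Optional

# E-value threshold stated in the figure 4 legend.
E_VALUE_CUTOFF = 1e-4


def figure4_color(is_cyanobacterial_origin: bool,
                  best_cyano_evalue: Optional[float]) -> str:
    """Assign the figure 4B color class to one plastid protein.

    is_cyanobacterial_origin: outcome of the phylogenetic estimate
        (taken as given here; it is not computed by this sketch).
    best_cyano_evalue: best BLAST e-value against cyanobacterial
        sequences, or None if no hit was found (hypothetical convention).
    """
    if is_cyanobacterial_origin:
        return "green"
    # Noncyanobacterial origin, but a cyanobacterial homolog exists.
    if best_cyano_evalue is not None and best_cyano_evalue < E_VALUE_CUTOFF:
        return "yellow"
    # Noncyanobacterial origin and no detectable cyanobacterial homolog.
    return "red"
```

Under this rule, the "yellow" class corresponds to the replacements discussed above, in which a protein of noncyanobacterial origin has taken over an activity for which a cyanobacterial homolog exists.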
It should be noted that the analyses are based on information obtained from extant organisms. The
genomic content of the bacterial ancestor of the plastid
might therefore have been considerably different from that of extant cyanobacteria, and the difference
might have resulted in the miscategorization of some plastid proteins as being of noncyanobacterial origin in this study. However, studies have shown a monophyletic pattern for most
of the plastid-encoded genes and extant cyanobacteria,
with only a few exceptions horizontally introduced in certain specific lineages after the establishment of the plastid
(Rice and Palmer 2006) (e.g., rbcL in red algae, rpl36 in cryptophytes and haptophytes). These results suggest
little difference in gene content between the plastid ancestor
and extant cyanobacteria. Another point that should be
considered is that most of the genes in the plastid genome
encode proteins for translation and the light reaction. The
C. merolae plastid genome contains 17 metabolic genes
with counterparts encoded in the nuclear genome in A. thaliana (supplementary table 4, Supplementary Material online).
Our analyses showed 13 genes to be of cyanobacterial origin
and the other 4 genes (24% of 17), which belong to the
same operon (the men operon for phylloquinone synthesis), to
be of other origin, as shown previously (Gross et al. 2008)
(supplementary table 4, Supplementary Material online). It
is still not clear whether these four genes were horizontally
transferred into the plastid genome after the establishment
of the plastid or represent a difference between the bacterial
ancestor of the plastid and extant cyanobacteria. Even if the
latter were the case, a much higher ratio of noncyanobacterial proteins was detected in the nuclear genome for the plastid metabolic pathways (fig. 4). These results suggest that
most of the noncyanobacterial plastid proteome has been
acquired since the endosymbiotic event.
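As a quick check of the proportions for the plastid-encoded metabolic genes, the counts given in the text (17 metabolic genes in the C. merolae plastid genome, 13 of them of cyanobacterial origin) can be recomputed directly. This is a minimal sketch in Python; the helper name percent is illustrative and not part of the original analysis.

```python
def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole)


# Counts taken from the text (supplementary table 4 of the paper).
metabolic_genes = 17
cyanobacterial = 13
other_origin = metabolic_genes - cyanobacterial  # the men operon genes

share_other = percent(other_origin, metabolic_genes)
share_cyano = percent(cyanobacterial, metabolic_genes)
print(f"{other_origin} of {metabolic_genes} genes ({share_other}%) are of other origin")
print(f"{cyanobacterial} of {metabolic_genes} genes ({share_cyano}%) are of cyanobacterial origin")
```

The four genes of other origin therefore make up roughly a quarter of the plastid-encoded metabolic genes, which is the baseline against which the higher noncyanobacterial ratio among nucleus-encoded proteins is compared.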
More than one-half of noncyanobacterial plastid proteins are suggested to be of bacterial origin (fig. 3). Some
plastid proteins of alpha-proteobacterial origin might be
derived from the bacterial ancestor of mitochondria. However, these proteins are absent from plastid-less eukaryotes,
suggesting that they were more likely obtained by the
ancestral Plantae through HGT. In addition, our results show that
many other proteins are descended, by HGT, from bacterial groups
other than cyanobacteria or alpha-proteobacteria.
It is well established that HGT events between different bacterial groups have contributed to prokaryotic evolution,
whereas analyses of genomic sequences have indicated that
bacterial genes have rarely been transferred into the genomes of
multicellular eukaryotes (Timmis et al. 2004; Keeling and
Palmer 2008). By contrast, most of the protist genomes that
have been examined contain a significant number of genes
that have been transferred from bacteria (Keeling and
Palmer 2008), suggesting that phagocytosis and digestion
of bacteria can lead to the transfer of bacterial genes into the nuclear genome (the "you are what you eat" hypothesis) (Doolittle
1998). In this regard, we have identified nine possible HGT events that introduced genes for
noncyanobacterial plastid proteins into the choanoflagellate
Monosiga brevicollis, which feeds on algae and bacteria (supplementary table 2, Supplementary Material online; corresponding to 5% of 178 groups of proteins in total).
However, extant Plantae are not capable of phagocytosis,
although their single-celled ancestor might have fed on
bacteria. In addition, there would have been other routes
for incorporating bacterial genes into the nuclear genome, such
as viral infection and bacterial type IV secretion, which can
deliver DNA into eukaryotes (Fronzes et al. 2009).
Although the mechanism by which the bacterial genes
have been introduced into plant nuclear genomes remains
to be elucidated, our results suggest that a significant
number of bacterial genes were recruited into the plastid
during the early stage of plant evolution. Organelle-to-nucleus DNA transfer and DNA transfer from Agrobacterium species into the nuclear genome are still operating in
extant terrestrial plants (Timmis et al. 2004; Keeling and
Palmer 2008). However, the transferred DNA fragments
are rapidly lost by mutation and fragmentation without
activation of the genes (Timmis et al. 2004; Keeling and
Palmer 2008). By contrast, during the establishment of plastids,
products of horizontally transferred bacterial DNA, as well
as of the cyanobacterial endosymbiont-derived nuclear DNA,
would have had a chance to function in the cyanobacterial endosymbiont in the eukaryotic cell in parallel with the massive
reduction of the endosymbiont genome. The chance for
the integrated sequences to acquire a promoter and a transit
peptide would simply have relied on the frequency of
integration and subsequent relocation in the nuclear genome before fixation and persistence of the
genes. Because our results suggest that the ratio of noncyanobacterial proteins to cyanobacteria-derived proteins
varies among functions, there should have been functional
selection on the fixation of the nuclear genes for plastid-targeted proteins.

Supplementary Material
Supplementary tables 1–4 and supplementary figure 1
are available at Molecular Biology and Evolution online
(http://www.mbe.oxfordjournals.org/).

Acknowledgments
We thank M. Matsuzaki, S. Maruyama, and K. Misawa for
comments. This work was supported by Grant-in-Aid for
Scientific Research from Japan Society for the Promotion
of Science (to S.M.).

References
Abdallah F, Salamini F, Leister D. 2000. A prediction of the size and
evolutionary origin of the proteome of chloroplasts of
Arabidopsis. Trends Plant Sci. 5:141–142.
Adam Z, Clarke AK. 2002. Cutting edge of chloroplast proteolysis.
Trends Plant Sci. 7:451–456.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W,
Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation
of protein database search programs. Nucleic Acids Res.
25:3389–3402.
Armbrust EV, Berges JA, Bowler C, et al. (45 co-authors). 2004. The
genome of the diatom Thalassiosira pseudonana: ecology,
evolution, and metabolism. Science 306:79–86.
Bowler C, Allen AE, Badger JH, et al. (77 co-authors). 2008. The
Phaeodactylum genome reveals the evolutionary history of
diatom genomes. Nature 456:239–244.



Cavalier-Smith T. 2002. Chloroplast evolution: secondary symbiogenesis and multiple losses. Curr Biol. 12:R62–R64.
Comartin DJ, Brown ED. 2006. Non-ribosomal factors in ribosome
subunit assembly are emerging targets for new antibacterial
drugs. Curr Opin Pharmacol. 6:453–458.
Doolittle WF. 1998. You are what you eat: a gene transfer ratchet
could account for bacterial genes in eukaryotic nuclear
genomes. Trends Genet. 14:307–311.
Duchene AM, Giritch A, Hoffmann B, Cognat V, Lancelin D,
Peeters NM, Zaepfel M, Marechal-Drouard L, Small ID. 2005.
Dual targeting is the rule for organellar aminoacyl-tRNA
synthetases in Arabidopsis thaliana. Proc Natl Acad Sci U S A.
102:16484–16489.
Durnford DG, Deane JA, Tan S, McFadden GI, Gantt E, Green BR.
1999. A phylogenetic assessment of the eukaryotic light-harvesting antenna proteins, with implications for plastid
evolution. J Mol Evol. 48:59–68.
Frickey T, Lupas AN. 2004. PhyloGenie: automated phylome
generation and analysis. Nucleic Acids Res. 32:5231–5238.
Fronzes R, Christie PJ, Waksman G. 2009. The structural biology of
type IV secretion systems. Nat Rev Microbiol. 7:703–714.
Galtier N, Gouy M, Gautier C. 1996. SEAVIEW and PHYLO_WIN:
two graphic tools for sequence alignment and molecular
phylogeny. Comput Appl Biosci. 12:543–548.
Gould SB, Waller RF, McFadden GI. 2008. Plastid evolution. Annu
Rev Plant Biol. 59:491–517.
Gross J, Meurer J, Bhattacharya D. 2008. Evidence of a chimeric genome
in the cyanobacterial ancestor of plastids. BMC Evol Biol. 8:117.
Harper JT, Keeling PJ. 2003. Nucleus-encoded, plastid-targeted
glyceraldehyde-3-phosphate dehydrogenase (GAPDH) indicates
a single origin for chromalveolate plastids. Mol Biol Evol.
20:1730–1735.
Huang J, Gogarten JP. 2007. Did an ancient chlamydial endosymbiosis facilitate the establishment of primary plastids? Genome
Biol. 8:R99.
Huang J, Gogarten JP. 2008. Concerted gene recruitment in early
plant evolution. Genome Biol. 9:R109.
Jain R, Rivera MC, Lake JA. 1999. Horizontal gene transfer among
genomes: the complexity hypothesis. Proc Natl Acad Sci U S A.
96:3801–3806.
Keeling PJ, Palmer JD. 2008. Horizontal gene transfer in eukaryotic
evolution. Nat Rev Genet. 9:605–618.
Lange BM, Rujan T, Martin W, Croteau R. 2000. Isoprenoid
biosynthesis: the evolution of two ancient and distinct pathways
across genomes. Proc Natl Acad Sci U S A. 97:13172–13177.
Larkin MA, Blackshields G, Brown NP, et al. (13 co-authors). 2007.
Clustal W and Clustal X version 2.0. Bioinformatics. 23:2947–2948.
Li S, Nosenko T, Hackett JD, Bhattacharya D. 2006. Phylogenomic
analysis identifies red algal genes of endosymbiotic origin in the
chromalveolates. Mol Biol Evol. 23:663–674.
Martin W, Schnarrenberger C. 1997. The evolution of the Calvin
cycle from prokaryotic to eukaryotic chromosomes: a case study
of functional redundancy in ancient pathways through
endosymbiosis. Curr Genet. 32:1–18.
Matsuzaki M, Kuroiwa H, Kuroiwa T, Kita K, Nozaki H. 2008. A cryptic
algal group unveiled: a plastid biosynthesis pathway in the oyster
parasite Perkinsus marinus. Mol Biol Evol. 25:1167–1179.
Matsuzaki M, Misumi O, Shin IT, et al. (42 co-authors). 2004.
Genome sequence of the ultrasmall unicellular red alga
Cyanidioschyzon merolae 10D. Nature 428:653–657.
Miura E, Kato Y, Matsushima R, Albrecht V, Laalami S, Sakamoto W.
2007. The balance between protein synthesis and degradation in
chloroplasts determines leaf variegation in Arabidopsis yellow
variegated mutants. Plant Cell. 19:1313–1328.

Moustafa A, Beszteri B, Maier UG, Bowler C, Valentin K,
Bhattacharya D. 2009. Genomic footprints of a cryptic plastid
endosymbiosis in diatoms. Science. 324:1724–1726.
Moustafa A, Reyes-Prieto A, Bhattacharya D. 2008. Chlamydiae has
contributed at least 55 genes to Plantae with predominantly
plastid functions. PLoS ONE. 3:e2205.
Nozaki H, Takano H, Misumi O, et al. (18 co-authors). 2007. A 100%-complete sequence reveals unusually simple genomic features
in the hot-spring red alga Cyanidioschyzon merolae. BMC Biol.
5:28.
Obornik M, Green BR. 2005. Mosaic origin of the heme biosynthesis
pathway in photosynthetic eukaryotes. Mol Biol Evol.
22:2343–2353.
Race HL, Herrmann RG, Martin W. 1999. Why have organelles
retained genomes? Trends Genet. 15:364–370.
Reyes-Prieto A, Bhattacharya D. 2007. Phylogeny of Calvin cycle
enzymes supports Plantae monophyly. Mol Phylogenet Evol.
45:384–391.
Reyes-Prieto A, Weber AP, Bhattacharya D. 2007. The origin and
establishment of the plastid in algae and plants. Annu Rev Genet.
41:147–168.
Rice DW, Palmer JD. 2006. An exceptional horizontal gene transfer
in plastids: gene replacement by a distant bacterial paralog and
evidence that haptophyte and cryptophyte plastids are sisters.
BMC Biol. 4:31.
Richards TA, Dacks JB, Campbell SA, Blanchard JL, Foster PG,
McLeod R, Roberts CW. 2006. Evolutionary origins of the eukaryotic
shikimate pathway: gene fusions, horizontal gene transfer, and
endosymbiotic replacements. Eukaryot Cell. 5:1517–1531.
Ronquist F, Huelsenbeck JP. 2003. MrBayes 3: Bayesian phylogenetic
inference under mixed models. Bioinformatics. 19:1572–1574.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based
phylogenetic analyses with thousands of taxa and mixed models.
Bioinformatics 22:2688–2690.
Sun Q, Zybailov B, Majeran W, Friso G, Olinares PD, van Wijk KJ.
2009. PPDB, the Plant Proteomics Database at Cornell. Nucleic
Acids Res. 37:D969–D974.
Tanaka R, Tanaka A. 2007. Tetrapyrrole biosynthesis in higher
plants. Annu Rev Plant Biol. 58:321–346.
Timmis JN, Ayliffe MA, Huang CY, Martin W. 2004. Endosymbiotic
gene transfer: organelle genomes forge eukaryotic chromosomes.
Nat Rev Genet. 5:123–135.
Tyler BM, Tripathy S, Zhang X, et al. (53 co-authors). 2006.
Phytophthora genome sequences uncover evolutionary origins
and mechanisms of pathogenesis. Science 313:1261–1266.
Tyra HM, Linka M, Weber AP, Bhattacharya D. 2007. Host origin of
plastid solute transporters in the first photosynthetic eukaryotes. Genome Biol. 8:R212.
Vothknecht UC, Soll J. 2005. Chloroplast membrane transport:
interplay of prokaryotic and eukaryotic traits. Gene 354:99–109.
Yang Y, Glynn JM, Olson BJ, Schmitz AJ, Osteryoung KW. 2008.
Plastid division: across time and space. Curr Opin Plant Biol.
11:577–584.
Yoon HS, Hackett JD, Pinto G, Bhattacharya D. 2002. The single,
ancient origin of chromist plastids. Proc Natl Acad Sci U S A.
99:15507–15512.
Zhaxybayeva O, Gogarten JP, Charlebois RL, Doolittle WF, Papke RT.
2006. Phylogenetic analyses of cyanobacterial genomes: quantification of horizontal gene transfer events. Genome Res.
16:1099–1108.
Zybailov B, Rutschow H, Friso G, Rudella A, Emanuelsson O, Sun Q,
van Wijk KJ. 2008. Sorting signals, N-terminal modifications and
abundance of the chloroplast proteome. PLoS ONE. 3:e1994.
