The Hornwort Genome and Early Land Plant Evolution. 2020
https://doi.org/10.1038/s41477-019-0588-4
Hornworts, liverworts and mosses are three early diverging clades of land plants, and together comprise the bryophytes. Here, we report the draft genome sequence of the hornwort Anthoceros angustus. Phylogenomic inferences confirm the monophyly of bryophytes, with hornworts sister to liverworts and mosses. The simple morphology of hornworts correlates with low genetic redundancy in plant body plan, while the basic transcriptional regulation toolkit for plant development has already been established in this early land plant lineage. Although the Anthoceros genome is small and characterized by minimal redundancy, expansions are observed in gene families related to RNA editing, UV protection and desiccation tolerance. The genome of A. angustus bears the signatures of horizontally transferred genes from bacteria and fungi, in particular of genes operating in stress-response and metabolic pathways. Our study provides insight into the unique features of hornworts and their molecular adaptations to live on land.
Land plants (Embryophyta) probably originated in the early Palaeozoic1, initiating the colonization of the terrestrial habitat. Because bryophytes (hornworts, liverworts and mosses) emerged from the early split in the diversification of land plants, they are key to the study of early land plant evolution (Supplementary Note 1.1). Unlike other extant land plants, the vegetative body of bryophytes is the haploid gametophyte, the sporophyte is always unbranched and permanently attached to the maternal plant, and both generations lack lignified vascular tissue2. Bryophytes occur in nearly all terrestrial habitats on all continents but are absent from marine environments3.

With only 200–250 species worldwide, the diversity of hornworts is much lower than that of the other six extant lineages of embryophytes (angiosperms, gymnosperms, ferns, lycophytes, mosses and liverworts)4. Long considered sister to all other land plants, or sister to all extant vascular plants, hornworts have recently been resolved as sister to the setaphytes (that is, the mosses and liverworts) within monophyletic bryophytes1,5–8. Still, hornworts possess a series of distinct features9. For instance, most hornworts have chloroplasts with CO2-concentrating pyrenoids, which have not been found in any other land plants but are widespread among green algae10. Other unusual features of hornworts include the persistent basal meristem in the sporophyte and mucilage-filled cavities for colonial symbionts on the gametophyte11. Most hornworts form tight symbiotic relationships with cyanobacteria12 and fungal endophytes (especially Glomeromycota and Mucoromycotina)13.

Here, we present the draft genome of A. angustus Steph. (Anthocerotaceae) (see Methods, Supplementary Figs. 1 and 2, and Supplementary Note 1.2). Completion of this high-quality hornwort genome complements previously sequenced representatives of the mosses (Physcomitrella patens14) and liverworts (Marchantia polymorpha15) and provides a unique opportunity to revisit bryophyte phylogeny, early land plant evolution and the adaptation of plants to live on land.

Genome assembly and annotation
We sequenced the genome of A. angustus (a single individual of unknown sex from the dioecious species) using a combination
1 State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China.
2 University of Chinese Academy of Sciences, Beijing, China.
3 PubBio-Tech Services Corporation, Wuhan, China.
4 Key Laboratory of Southern Subtropical Plant Diversity, Fairy Lake Botanical Garden, Shenzhen & Chinese Academy of Science, Shenzhen, China.
5 BGI-Shenzhen, Shenzhen, China.
6 Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization at College of Landscape Architecture, Fujian Agriculture and Forestry University, Fuzhou, China.
7 Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium.
8 VIB Center for Plant Systems Biology, Ghent, Belgium.
9 Department of Biology, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, USA.
10 Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA.
11 Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China.
12 Center for Plant Diversity and Systematics, Institute of Botany, Jiangsu Province and Chinese Academy of Sciences, Nanjing, China.
13 Sino–Africa Joint Research Center, Chinese Academy of Sciences, Wuhan, China.
14 College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou, China.
15 Center for Microbial Ecology and Genomics, Department of Biochemistry, Genetics and Microbiology, Pretoria, South Africa.
16 College of Horticulture, Nanjing Agricultural University, Nanjing, China.
17 Fujian Colleges and Universities Engineering Research Institute of Conservation and Utilization of Natural Bioresources, College of Forestry, Fujian Agriculture and Forestry University, Fuzhou, China.
18 These authors contributed equally: Jian Zhang, Xin-Xing Fu, Rui-Qi Li, Xiang Zhao, Yang Liu, Ming-He Li, Arthur Zwaenepoel.
19 These authors jointly supervised this work: Shou-Zhou Zhang, Yves Van de Peer, Zhong-Jian Liu, Zhi-Duan Chen.
*e-mail: shouzhouz@126.com; yves.vandepeer@psb.vib-ugent.be; zjliu@fafu.edu.cn; zhiduan@ibcas.ac.cn
[Figure 1, panels a–c. a: Venn diagram of gene families in A. angustus, Setaphyta, Tracheophyta, Charophyta and Chlorophyta. b: species tree with gene-family gain/loss on the branches; the per-species counts are tabulated below. c: duplication events plotted against KS (0–5) for Anthoceros angustus, Anthoceros angustus versus Marchantia polymorpha, Anthoceros angustus versus Physcomitrella patens, and Marchantia polymorpha versus Physcomitrella patens.]

Clade         Species                      Gene families  Orphans  Genes
Tracheophyta  Arabidopsis thaliana         12,301         4,206    27,411
Tracheophyta  Genlisea aurea               10,060         4,434    17,685
Tracheophyta  Vitis vinifera               12,399         2,172    25,676
Tracheophyta  Oryza sativa                 11,950         4,647    27,912
Tracheophyta  Phalaenopsis equestris       11,640         9,156    29,431
Tracheophyta  Zostera marina               10,470         3,888    20,421
Tracheophyta  Amborella trichopoda         11,800         8,326    26,846
Tracheophyta  Picea abies                  8,821          5,748    26,437
Tracheophyta  Selaginella moellendorffii   9,585          4,384    22,273
Bryophyta     Physcomitrella patens        9,566          9,722    35,796
Bryophyta     Marchantia polymorpha        8,944          6,292    19,287
Bryophyta     Anthoceros angustus          8,141          2,997    14,629
Charophyta    Chara braunii                6,252          5,220    22,776
Charophyta    Klebsormidium nitens         8,302          3,807    16,044
Chlorophyta   Volvox carteri               7,934          3,978    14,434
Chlorophyta   Chlamydomonas reinhardtii    8,669          5,991    17,741
Chlorophyta   Ulva mutabilis               5,363          5,289    12,924
Chlorophyta   Coccomyxa subellipsoidea     5,850          2,614    9,629
Chlorophyta   Chlorella variabilis         5,857          2,629    9,780
Fig. 1 | Comparative genomic analysis of A. angustus and 18 other plant species. a, Comparison of the number of gene families identified by OrthoMCL.
The Venn diagram shows the shared and unique gene families in A. angustus, Setaphyta, Tracheophyta, Charophyta and Chlorophyta. The gene-family number
is listed in each of the components. b, Gene-family gain (+)/loss (−) among 19 green plants. The numbers of gained (blue) and lost (red) gene families are
shown above the branches. The boxed number indicates the gene-family size at each node. The number of gene families, orphans (single-copy gene families)
and number of predicted genes is indicated next to each species. c, Comparison of whole paranome, anchor pair and one-to-one orthologue distribution of the
number of synonymous substitutions per synonymous site (KS) across the three bryophyte species (P. patens, M. polymorpha and A. angustus).
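The gene total listed for A. angustus in Fig. 1b (14,629) can be combined with the percentages quoted in the text (approximately 7.90% PPR genes; at least 9.82% of protein-coding genes in tandem clusters) to get a feel for the absolute numbers involved. The following is a minimal sketch, assuming that both percentages are taken over the 14,629 predicted genes; the function name `implied_count` is illustrative and not from the paper.

```python
# Back-of-the-envelope conversion of reported percentages into gene counts.
# Assumption: both percentages refer to the 14,629 predicted genes of
# A. angustus listed in Fig. 1b.

TOTAL_GENES = 14_629


def implied_count(percent: float, total: int = TOTAL_GENES) -> int:
    """Return the number of genes implied by a percentage of the gene set."""
    return round(total * percent / 100)


ppr_genes = implied_count(7.90)      # PPR genes implied by 7.90%
tandem_genes = implied_count(9.82)   # tandem-cluster genes implied by 9.82%

print(f"PPR genes implied by 7.90%: {ppr_genes}")
print(f"Tandem-cluster genes implied by 9.82%: {tandem_genes}")
```

Because the source gives rounded percentages, the implied counts are approximate and should be read as orders of magnitude rather than exact gene numbers.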
to that of the other two bryophyte genomes (Supplementary Fig. 15, Supplementary Table 21 and Supplementary Note 4.1). The diversity of TF genes in extant plants is rather stable (Supplementary Fig. 15) and resulted from two ancient bursts of TF families during the diversification of green plants: one concomitant with the origin of streptophytes and the other with the transition to land15,21. In plants, genes encoding TFs are among the most highly retained following polyploidy22, a pattern reflected in the comparison of the three bryophyte genomes14,15. A. angustus and M. polymorpha, whose genomes did not undergo WGDs, hold a small number of TF genes compared with P. patens, which experienced at least one WGD in its ancestry, resulting in a substantially larger number of TF genes (Supplementary Fig. 15). This supports the hypothesis that WGD is an important mechanism for expansion of TF families23.

Phylogenetic analyses of 24 gene families contributing to the development of plant body plans or adaptation to the terrestrial environment, including 16 TF gene families24,25 (Fig. 2a, Supplementary Figs. 16–54, Supplementary Table 22 and Supplementary Note 4.2), confirm that a considerable number of genes, such as genes involved in gametophyte or sporophyte development, haploid–diploid transition, meristem development, filamentous growth, photomorphogenesis and auxin signalling (Fig. 2), composed the genetic toolkit of plants before the conquest of land26. In particular, the TF genes for filamentous growth and auxin signalling arose in charophyte green algae27,28 (Fig. 2b), which are thought to be the closest living relatives to extant land plants, implying the preliminary establishment of a relatively more complex body plan in these basal streptophytes for plant terrestrial adaptation29. Furthermore, a set of genes underlying key morphological innovations for terrestrial adaptation probably evolved along with the colonization of land30,31 (Fig. 2b), such as SMF and ICE for stomatal development (Supplementary Figs. 29 and 30), APB, CLE and CLV1 for 3D growth (Supplementary Figs. 36 and 50–52), and VNS for water-conducting-cell development (Supplementary Fig. 38). The sporophyte morphology of bryophytes is relatively simple, and many of the genes involved in the elaborate regulation of embryogenesis32, such as FUS3, LEC1, LEC2, NF-YA1/9 and NF-YA3/5/6/8, are absent in A. angustus, Marchantia and Physcomitrella (Fig. 2a and Supplementary Figs. 39–41). The ABI3 genes that mainly function in embryo maturation and seed desiccation tolerance in flowering plants are present in bryophytes, and have roles in desiccation tolerance in their vegetative tissues33.

In A. angustus, most genes involved in the development of plant body plans have a single copy, and a few A. angustus TF gene families even lost a subset of duplicates (Fig. 2a and Supplementary Figs. 16–52). For example, in the bHLH family, the class I RSL gene that controls the development of rhizoids and root hairs, thought to have been important for the colonization of land34, is present in the A. angustus genome, whereas the class II RSL genes responsible for regulating protonema differentiation in P. patens or root hair elongation in A. thaliana by auxin35 are absent (Supplementary Fig. 27 and Supplementary Note 4.2). The lack of class II RSL genes in A. angustus might be related to the morphological simplification of this species with respect to tip-growing filamentous structures2. For the KNOX genes from the homeobox gene family, the A. angustus genome retains one class II KNOX gene for haploid-to-diploid morphological transition36, but lacks class I KNOX genes (Supplementary Fig. 23), whose activity is necessary for seta extension in the sporophytes in P. patens37. The absence of this gene might be linked to the absence of setae in hornworts2. The genome of A. angustus also holds few type II MIKCC MADS-box, class B ARF, NCARF and short PIN genes, as a result of gene losses suggested by our phylogenetic analysis (Supplementary Figs. 17, 42, 45 and Supplementary Note 4.2). The class II RSL, class B ARF, NCARF and short PIN genes all have auxin-related functions (Supplementary Note 4.2). Since these auxin-related genes were consistently lost in A. angustus, this hornwort species possesses the simplest auxin molecular toolkit among all investigated land plants so far38. Thus, like the liverwort M. polymorpha15, A. angustus exhibits low redundancy for genes shaping the plant body plan (Fig. 2b). Such a limited toolkit may be characteristic of the ancestor to bryophytes and hence, perhaps, of the earliest land plants with a dominant thalloid gametophyte, and provides a foundation for explaining the architectural simplicity of these plants. By contrast, the genome of P. patens, which develops a leafy stem, has the most TF genes involved in the development of plant body plans among the compared bryophytes (Fig. 2b). Although the genome of A. angustus seems poor in genes composing the network underlying the development of its body plan, the TF gene families linked to responses to terrestrial environmental stimuli exhibit lineage-specific gene expansions in A. angustus, namely, the LISCL genes for mycorrhizal signalling in the GRAS gene family39 (Supplementary Fig. 53) and the clade SIP1 for ABA signalling under water stress in the Trihelix gene family40 (Supplementary Fig. 54).

Gene-family expansion
Besides two TF gene families, the A. angustus genome harbours a variety of other uniquely expanded gene families (Supplementary Fig. 55). The genome comprises a very large number of pentatricopeptide repeat (PPR) genes for plant organellar RNA processing41, accounting for approximately 7.90% of the predicted protein-coding genes. The expanded PPR genes are PLS-class PPR genes (Supplementary Fig. 55, Supplementary Tables 23 and 24 and Supplementary Note 5.1). Most of the PLS-class PPR proteins in A. angustus were predicted to be localized in the mitochondrion or chloroplast (Supplementary Table 24). The expansion of the PLS-class PPR genes correlates with the large number of RNA editing sites estimated in the organellar genomes of A. angustus (Supplementary Table 23). Our findings add further support to the hypothesis that an increase in the number of both RNA editing sites and PPR genes (especially the PLS-class PPR) occurred after the separation of land plants from green algae41,42 (Supplementary Table 23). The reduced number of PPR genes and absence of RNA editing in marchantiid liverworts are most probably secondary losses (Supplementary Table 23), as the organellar RNA editing and plant-specific extensions of PPR genes were also found in jungermanniid liverworts43. Through RNA editing, the PPR proteins could act as ‘repair’ factors that alleviate DNA damage caused by increased UV exposure in terrestrial environments41. Other stress-response gene families have also expanded in A. angustus, such as cupin and cytochrome P450 (CYP) (Supplementary Fig. 55). Two groups of cupin (PF00190) proteins, monocupins and bicupins, can be recognized on the basis of the number of cupin domains44. In A. angustus, the cupin gene family has undergone a significant expansion (Supplementary Table 25) such that it comprises more bicupin genes than any other plant (Fig. 3a, Supplementary Figs. 56 and 57, Supplementary Table 25 and Supplementary Note 5.2). Expansion of the cupin gene family in A. angustus resulted mainly from tandem gene duplications (Fig. 3b,c and Supplementary Note 5.2). Since bicupins (that is, 11S and 7S seed storage proteins) are desiccation-tolerant proteins in higher land plants44, the large number of bicupin genes in A. angustus could indicate adaptation for coping with drought stress in the terrestrial environment. The large number of A. angustus-specific monocupin genes are homologous to the P. patens PpGLP6 gene (XP_001782709.1) (Supplementary Fig. 57 and Supplementary Note 5.2), which encodes a protein with manganese-containing extracellular superoxide dismutase (SOD) activity to respond to oxidative stress in terrestrial environments45. The CYP genes for primary and secondary metabolism have also expanded in A. angustus (Supplementary Fig. 55 and Supplementary Note 5.3). For instance, the subfamilies CYP71 and CYP85 contain 56 and 46 genes, respectively (Supplementary Figs. 58–61 and Supplementary Tables 26 and 27). The A. angustus CYP genes were assigned to 28 KEGG pathways, of which ‘flavonoid 3′-monooxygenase/flavonoid 3′,5′-hydroxylase’ and ‘abscisic acid 8′-hydroxylase’ were the most representative (Supplementary Table 28). Within the CYP71 gene subfamily, genes homologous to flavonoid 3′-hydroxylase (monooxygenase) (F3′H) or flavonoid 3′,5′-hydroxylase (F3′5′H) genes that are involved in flavonoid biosynthesis46 are highly expanded in A. angustus (Supplementary Fig. 59 and Supplementary Note 5.3). Because flavonoids have an important role in UV-B protection46, the expansion of flavonoid-biosynthesis-related genes in A. angustus might
[Figure 2, panels a and b. a: heatmap of TF gene copy numbers (colour key: 0, 1, 2, 3, ≥4) in ten green plants (A. thaliana, A. trichopoda, S. moellendorffii, P. patens, M. polymorpha, A. angustus, C. braunii, K. nitens, C. reinhardtii and V. carteri), grouped as vascular plants, mosses, liverworts, hornworts, charophytes and chlorophytes. Columns: Family, Clade, Function. Families and clades: MADS-box (MIKCc, MIKC*, M), TCP (TCP-I, TCP-II), LFY, RWP-RK (RKD), Homeobox (BELL, KNOX-II, KNOX-I, WOX, HD-Zip-III), bHLH (RSL-I, RSL-II, LRL, PIF, SMF, ICE), bZIP (HY5), NF-YC (NF-YC2/3/9), B3-ARF (ARF-A, ARF-B, ARF-C), AP2 (euANT/APB), NAC (VNS), B3-LAV (ABI3, FUS3, LEC2), NF-YB (LEC1) and NF-YA (NF-YA1/9, NF-YA3/5/6/8). Functions: gametophyte/sporophyte development, haploid–diploid transition, meristem development, filamentous growth, photomorphogenesis, auxin signalling, stomatal development, 3D growth, water-conducting cell development and embryogenesis. b: schematic from the green algal ancestor through the move to land to bryophytes and vascular plants, with the labels single-copy TFs, gene loss and simple gametophyte for hornworts (A. angustus), and multicellular sporophyte, embryogenesis, 3D patterning, stomata, vasculature and dominant sporophyte.]
Fig. 2 | Major TFs for plant body plan and evolutionary innovations within plants. a, Overview for the number of major TFs for plant body plan in ten
green plants. Colour key on the upper left of the heatmap denotes the TF numbers. b, Major innovations in plants and evolutionary features of three
bryophyte lineages.
a
Species No. of bicupins No. of monocupins Total
Klebsormidium nitens 9 15 24
Anthoceros angustus 31 48 79
Physcomitrella patens 0 20 20
Selaginella moellendorffii 13 50 63
Picea abies 15 31 46
Amborella trichopoda 8 36 44
Arabidopsis thaliana 10 33 43
Oryza sativa 21 43 64
[Figure 3, panels b and c: phylogenetic trees of bicupin (b) and monocupin (c) genes, with tandemly duplicated A. angustus gene clusters drawn on Scaffold6, Scaffold15, Scaffold22, Scaffold24 and Scaffold64 (positions in kb). Gene identifiers shown: AANG004394, AANG004395, AANG009253, AANG009255, AANG009258, AANG009264, AANG009265, AANG013591, AANG009342, AANG009343, AANG008040, AANG013594, AANG008042, AANG008775, AANG008776, AANG008777, AANG011559, AANG006112, AANG006111, AANG011560, AANG006497, AANG011562, AANG006501, AANG011563, AANG011564, AANG001908, AANG001910, AANG001913, AANG005009, AANG001914, AANG005010, AANG001916, AANG001917, AANG005011.]
Fig. 3 | Expansion of cupin gene family in A. angustus. a, A summary of the number of cupin genes from nine species based on a Pfam search of cupin_1
domain (PF00190). b,c, Phylogenetic trees show cupin genes in nine plant genomes: bicupins (b) and monocupins (c). The colour of each branch
corresponds to the background colour for each species in a. The tandem duplicated gene clusters are ordered and shown on scaffolds of the A. angustus
genome. The scale bars in the trees show the number of amino acid substitutions per site.
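The text distinguishes monocupins from bicupins by the number of cupin domains per protein, and Fig. 3a tabulates the resulting counts. The sketch below illustrates that rule and re-checks the table totals. The counts are copied from the Fig. 3a rows available here; the function `classify_cupin` and its input (a per-protein count of cupin_1 domain hits) are illustrative assumptions, not the authors' pipeline.

```python
# Counts from Fig. 3a: species -> (bicupins, monocupins, reported total).
CUPIN_COUNTS = {
    "Klebsormidium nitens": (9, 15, 24),
    "Anthoceros angustus": (31, 48, 79),
    "Physcomitrella patens": (0, 20, 20),
    "Selaginella moellendorffii": (13, 50, 63),
    "Picea abies": (15, 31, 46),
    "Amborella trichopoda": (8, 36, 44),
    "Arabidopsis thaliana": (10, 33, 43),
    "Oryza sativa": (21, 43, 64),
}


def classify_cupin(n_cupin_domains: int) -> str:
    """Classify a protein by its number of cupin_1 (PF00190) domain hits.

    Hypothetical helper: one domain -> monocupin, two domains -> bicupin.
    Other domain counts are outside the two groups described in the text.
    """
    if n_cupin_domains == 1:
        return "monocupin"
    if n_cupin_domains == 2:
        return "bicupin"
    raise ValueError(f"unexpected number of cupin domains: {n_cupin_domains}")


# Each reported total should equal bicupins + monocupins.
for species, (bi, mono, total) in CUPIN_COUNTS.items():
    assert bi + mono == total, species

# Species with the most bicupin genes among the tabulated rows.
most_bicupins = max(CUPIN_COUNTS, key=lambda s: CUPIN_COUNTS[s][0])
print(most_bicupins, CUPIN_COUNTS[most_bicupins][0])
```

In practice the domain counts would come from a Pfam search of each predicted proteome; that step is deliberately left out of the sketch.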
again represent a molecular adaptation to life in the terrestrial environment. Among the CYP85 genes, the genes homologous to abscisic acid 8′-hydroxylase genes involved in abscisic acid catabolism during drought stress response47 are also uniquely abundant in A. angustus (Supplementary Fig. 60 and Supplementary Note 5.3), and may account for the high desiccation tolerance of A. angustus. Like the cupin gene family, many of the above expanded gene families occur in tandem arrays (Supplementary Table 29). At least 9.82% of protein-coding genes in A. angustus form ‘tandem’ clusters in the genome (Supplementary Table 30 and Supplementary Note 5.4), compared with only 1% in P. patens14 and 5.9% in M. polymorpha15.

CO2-concentrating mechanism
Hornworts are the only extant land plant lineage harbouring a pyrenoid-based CO2-concentrating mechanism (CCM) similar to that of green algae9,48 (Supplementary Note 6.1), for which the key components have been identified49. To clarify whether the CCM components of green algae have orthologues in hornworts and other land plants, we searched the A. angustus genome and other plant genomes or transcriptomes with reference to the CCM genes from the chlorophyte green alga Chlamydomonas reinhardtii49,50 (Supplementary Figs. 62–71 and Supplementary Note 6.2). A. angustus and all other green plants harbour orthologues of CAH1/2, whose expression is modulated by external inorganic carbon concentration; of CemA, which maintains stromal pH balance; of LCI11, which mediates the entry of HCO3− into the thylakoid lumen; and of RCA1 and RBCS1/2, which regulate CO2 fixation by Rubisco (Supplementary Figs. 62, 65 and 69–72). By contrast, orthologues of CCP1/2, which mediate the entry of HCO3− into the chloroplast stroma, and of EPYC1, which regulates CO2 fixation by Rubisco, were only present in chlorophyte green algae (Supplementary Figs. 67 and 72 and Supplementary Note 6.2). The three inorganic carbon transporters (HLA3, LCI1 and LCIA-like genes) only occur in bryophytes and green algae, whereas the A. angustus genome lacks the related orthologues (Supplementary Figs. 63, 66 and 72 and Supplementary Note 6.2). Unexpectedly, the three kinds of carbonic anhydrases (CAH3, CAH9 and LCIB/C), which are essential components of CCM, are conserved in non-angiosperm land
Fig. 4 | Phylogenetic affinities of genes horizontally transferred to A. angustus. a, Phylogenetic tree of glyoxalase (PF13468). b, Phylogenetic tree of
NAD-binding dehydrogenase (PF08635). c, Phylogenetic tree of glucuronyl hydrolase (PF07470). d, Phylogenetic tree of DNA methyltransferase
(PF02870 and PF01035). The stars indicate that the Anthoceros sequence or bryophyte sequences formed a monophyletic clade with homologues of
putative HGT donor, reflecting Anthoceros-specific or bryophyte-specific HGT events. Maximum-likelihood bootstrap support values ≥50% are shown
above the branches. Red, hornworts and other bryophytes; cyan, green algae; grey, metazoan; orange, stramenopiles; blue, bacteria; yellow, fungi; purple,
archaea. Homologues from kingdoms other than that of the putative HGT donors are used as the outgroup. The scale bars in the trees show
the number of amino acid substitutions per site.
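The HGT analysis in the text contrasts the GC content of HGT-derived genes (57.58%) with that of non-HGT genes (53.26%). As a minimal sketch of how such a comparison can be computed, the snippet below pools coding sequences and reports the GC percentage of each set. The function names and the toy sequences are invented for illustration; they are not data or code from the study.

```python
from typing import Iterable


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


def pooled_gc_percent(seqs: Iterable[str]) -> float:
    """GC percentage over all sequences in a set, pooled base by base."""
    return 100.0 * gc_content("".join(seqs))


# Toy sequences (hypothetical), standing in for two gene sets.
hgt_like = ["ATGGCGGCCGGC", "ATGCCGGGC"]
background = ["ATGAAATTTGGC", "ATGTTAGCA"]

print(f"HGT-like set GC:   {pooled_gc_percent(hgt_like):.2f}%")
print(f"Background set GC: {pooled_gc_percent(background):.2f}%")
```

Pooling by base weights long genes more heavily than short ones; averaging per-gene GC values is an equally plausible choice, and the source does not state which was used.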
plants and green algae (Supplementary Figs. 62, 64, 68 and 72). The A. angustus genome retains the orthologues of both LCIB/C and CAH3 genes, but has no copy of CAH9 (Supplementary Fig. 72). Besides green algae, the essential CCM components occur in both hornworts and other non-angiosperm land plants that lack pyrenoids (Supplementary Fig. 72). This implies that the CCM could be an ancestral mechanism of CO2 fixation by plants, and pyrenoids for CCM are homologous between hornworts and green algae, whereas both CCM components and pyrenoids have undergone multiple losses in land plants in response to atmospheric changes in terrestrial environments10,48.

Horizontal gene transfer
Horizontal gene transfer (HGT) from bacteria or fungi has been reported for both the moss P. patens51 and the liverwort M. polymorpha15. Consistent with those observations, the taxonomic distribution of BLASTP hits following careful phylogenetic analysis and manual inspection suggested that 19 genes from 14 families originated from HGTs from either bacteria or fungi (Supplementary Fig. 6 and Supplementary Note 7.1). Bacterial donors account for nine gene families: Actinobacteria (three gene families), Alphaproteobacteria (two gene families), Bacteroidetes (two gene families), Firmicutes (one gene family) and Verrucomicrobia
(one gene family). Five families were acquired from fungi, belonging to Ascomycota, Basidiomycota, hornwort-symbiotic Chytridiomycota or Mucoromycota13 (Fig. 4a,b, Supplementary Figs. 73–84 and 86 and Supplementary Table 31). The detection of specific HGT in all three fully sequenced bryophytes is remarkable, and is probably related to the fact that these organisms form symbioses with diverse bacteria and fungi, which, together with the weakly protected tissues in the early developmental stages in the life cycle of these plants, provide the possibility for HGT51. In addition, we found that two families originating from HGT from bacteria are shared by the three bryophyte lineages, and one originating from an HGT from fungi is shared between hornworts and liverworts only (Fig. 4c,d, Supplementary Figs. 85 and 86, Supplementary Table 31 and Supplementary Note 7.2). The HGT genes mentioned above (SCUO value 0.2127) exhibit a significantly more biased codon-usage pattern than non-HGT genes (SCUO value 0.1595) (Supplementary Fig. 87a), which may be linked to their higher GC content (57.58%) than non-HGT genes (53.26%) (Supplementary Fig. 87b).

The HGT-derived genes in A. angustus mainly contribute to metabolic processes, oxidation–reduction and stress response (Supplementary Table 31). Some transferred genes related to carbohydrate metabolism are predicted to encode glucuronyl (AANG011893) and glycosyl hydrolases (AANG004297) (Fig. 4c, Supplementary Fig. 79 and Supplementary Table 31), which function in cell wall synthesis and modification and might extend the metabolic flexibility of A. angustus in changing environments52. The Alphaproteobacteria-derived gene AANG004679 encodes glyoxalase, which is related to drought stress tolerance53 (Fig. 4a). The Actinobacteria-derived DNA methyltransferase genes that are present only in the three groups of bryophytes are related to DNA repair54 (Fig. 4d). The hornworts and liverworts share the fungi-derived terpene synthase-like (MTPSL) genes (Supplementary Fig. 85). Terpene synthases are pivotal enzymes for the biosynthesis of terpenoids, which serve as chemical defences against herbivores and pathogens55. Some horizontally transferred genes in A. angustus, such as NAD-binding dehydrogenase (Fig. 4b) and MTPSL genes (Supplementary Fig. 85), underwent subsequent gene duplications. The results suggest that the acquisition of foreign genes might have provided additional means for environmental adaptation during evolution of the hornwort lineage.

Conclusions
As land pioneers, the three bryophyte groups form a well-supported monophyletic lineage, with hornworts sister to liverworts and mosses. The genome of the hornwort A. angustus shows no evidence of WGDs and low genetic redundancy for networks underlying plant body plan, which may be congruent with an overall simple body plan. Hornworts have retained the essential components of CCM found in green algae in response to the atmospheric changes in terrestrial environments. Meanwhile, the gene inventory in A. angustus expanded mainly through tandem duplication and HGT. In particular, the expansion of specific gene families and the acquisition of foreign genes have provided additional metabolic abilities in hornworts that probably facilitated their survival in a terrestrial environment. Together, our results indicate how the draft genome of A. angustus provides a useful model for studying early land plant evolution and the mechanism of plant terrestrial

The sporangium was opened and the spores were homogenized and spread onto the 1/2 KnopII agar medium57 in Petri dishes (Supplementary Fig. 1b). The culture temperature was between 21 °C and 25 °C. Spores germinated within a couple of days, and then the sporelings started to grow. After approximately three to four weeks, the gametophyte started to grow (Supplementary Fig. 1c,d). Since spores are aposymbiotic, we did not observe colonization of the mucilage-filled cavities by cyanobacteria on the A. angustus gametophyte during the sterile culture. A gametophyte from a single spore was selected and cultured by asexual propagation. The tissue yielded from subculture was used for genome and RNA sequencing. We tried to induce sexual reproduction by dropping the growth temperature of gametophyte cultures to 10 °C and 16 °C, respectively; however, they have not yet produced reproductive organs. Therefore, the sequenced A. angustus is indeed a single-sex individual, which was sequenced at the gametophyte phase of its life cycle.

Genomic DNA was isolated using the Plant DNAzol reagent for genomic DNA extraction (Life Technologies) according to the manufacturer’s protocols. For whole-genome shotgun sequencing, ten sequencing libraries with insert sizes ranging from 170 bp to 40 kb were generated (Supplementary Table 1). Sequencing libraries were constructed using a library construction kit (Illumina). All libraries were sequenced on the Illumina HiSeq 2000 platform. Raw sequencing reads were trimmed with Trimmomatic (v.0.33)58. Only high-quality reads with a total length of 126,532,381,412 bp were used for further analysis (Supplementary Table 1). For Oxford Nanopore sequencing, we constructed a genomic DNA library using the ONT 1D ligation sequencing kit (SQK-LSK108) according to the manufacturer’s instructions. The sequencing used a single 1D flow cell on a PromethION sequencer (Oxford Nanopore Technologies). A total of 63,614,292,295 bp raw reads were generated, of which 36,070,452,175 bp were retained for further analysis after filtering and trimming (Supplementary Table 3).

Total RNA was extracted using the PureLink Plant RNA reagent (Life Technologies) and further purified using TRIzol reagent (Invitrogen). For transcriptome sequencing (RNA sequencing), libraries with insert sizes ranging from 200 bp to 500 bp were constructed using the mRNA-Seq Prep Kit (Illumina) and then sequenced using the Illumina HiSeq 2000 platform. For small-RNA sequencing, the library was generated from the RNA sample using the TruSeq Small RNA Preparation kit (Illumina) and sequenced on the Illumina HiSeq 2500 platform.

Decontamination. The GC content versus k-mer frequency distribution pattern of the Illumina raw reads (Supplementary Table 1) after trimming presented two large groups: one group with a low k-mer frequency (<50) and a wide GC content distribution range (median number at 0.7), and the other group with a high k-mer frequency (60–165) and a concentrated GC content distribution range (median number at 0.5) (Supplementary Fig. 2a). The BLASTN results against the NCBI nucleotide database revealed that the former sequences were mainly from a variety of bacteria and the latter were the real genome sequences of A. angustus. We also investigated the k-mer distributions of the raw reads from the other two published hornwort genomic sequences, A. agrestis (accession: ERX714368)59 and Anthoceros punctatus (accession: SRX538621)60, and found a similar distribution pattern to that of A. angustus, containing two groups, one for the contaminant sequences and the other for sequences of the plant itself (Supplementary Fig. 2c,d). Because external bacterial contaminations from the laboratory cause A. angustus to turn yellow and die during culturing, and all three Anthoceros species grown in axenic culture still show the same bacterial contamination problems (Supplementary Fig. 2a,c,d), we infer that these bacterial contaminations are from symbiotic bacteria of Anthoceros that might accompany spores hiding in the sterilized sporangium. Furthermore, we performed the DAPI staining analysis61 to investigate the distribution of symbiotic bacteria in A. angustus. The gametophytes were stained with 0.2 mg l−1 DAPI (4′,6-diamidino-2-phenylindole dihydrochloride; Sigma, cat. no. D9564) for five minutes. The stained gametophytes were washed three times, and then observed using confocal microscopy. The bacterial micro-colonies were observed on the outer surface, as well as in the intercellular space of the gametophytes of A. angustus (Supplementary Fig. 3). Based on the GC content versus k-mer frequency distribution pattern of the Illumina raw reads and the result of the DAPI staining, we inferred that a certain amount of bacterial sequence remained in the genome sequencing data of A. angustus. In order to isolate them, we performed a series of decontamination steps. After generating the k-mer frequency, we chose the high-abundance k-mer depth (60–165) and retained the corresponding reads for further analysis. This treatment yielded filtered reads with a total length of 17,099,027,576 bp (Supplementary Table 2). The distribution pattern of GC content versus k-mer frequency of the A. angustus filtered reads is
adaptation. depicted in Supplementary Fig. 2b, which shows an entire group with a sequencing
depth of approximately 150×. Furthermore, we performed error correction for
Methods filtered Nanopore reads using decontaminated Illumina reads by Nextdenovo
Sample preparation and sequencing. The natural populations of A. angustus (v.2.0)62, resulting in 9,247,957,448 bp corrected reads (Supplementary Table 3).
Steph. were collected from Jinping County, Yunnan Province, China. The voucher Through MEGABLAST against the NCBI nucleotide database,
specimen has been deposited at the herbarium, Institute of Botany, Chinese we further removed 5,463,972,682 bp prokaryotic sequences or organellar
Academy of Sciences, Beijing, China with collection number W1879-2010-01-18. sequences, and finally got 3,783,984,766 clean reads with a sequencing depth of
The sporophytes of A. angustus were detached from the gametophytes, sterilized in approximately 35× (Supplementary Table 3). A total of approximately
10% sodium hypochlorite and subsequently rinsed with distilled water56. 185× coverage was obtained finally.
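
The k-mer depth filter described in the decontamination steps can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the k-mer length `k=17`, the forward-strand-only counting and the use of a per-read median depth are all assumptions made for illustration. Only the retained depth window (60–165) comes from the text.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-mer of seq (forward strand only, for simplicity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def filter_reads_by_kmer_depth(reads, k=17, low=60, high=165):
    """Keep reads whose median k-mer frequency lies in [low, high].

    k is a hypothetical choice; low/high mirror the 60-165 window
    reported for the high-abundance (plant) group.
    """
    # Count how often each k-mer occurs across the whole read set.
    counts = Counter(km for read in reads for km in kmers(read, k))
    kept = []
    for read in reads:
        depths = [counts[km] for km in kmers(read, k)]
        # Reads shorter than k have no k-mers and are dropped.
        if depths and low <= median(depths) <= high:
            kept.append(read)
    return kept
```

In practice a dedicated k-mer counter would replace the in-memory `Counter`, and `gc_content` would be plotted against depth to reproduce the kind of two-group pattern described for Supplementary Fig. 2a.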
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.
Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
Data analysis The software used is as follows: BLASTP (ncbi-BLAST v2.2.28), BLASTN (ncbi-BLAST v2.2.28), TBLASTN (ncbi-BLAST v2.2.28),
Nextdenovo (v2.0), Pilon (v1.22), SSPACE (v3.0), BUSCO (v3), Trimmomatic (v0.33), Trinity (v2.5.1), TransDecoder (v5.0.2), Tandem
Repeats Finder (v4.09), RepeatMasker (v4.1.0), LTR_FINDER (v1.0.2), PILER (v1.3.4), RepeatModeler (v1.0.3), AUGUSTUS (v2.5.5),
GlimmerHMM (v3.0.1), GeneWise (v2.4.1), MAKER (v1.0), TopHat (v2.1.1), Cufflinks (v2.2.1), miREvo (v1.2), tRNAscan-SE (v1.3.1),
INFERNAL (v1.1), OrthoMCL (v2.0), MAFFT (version 7), TranslatorX (v0.9), RAxML (v7.2.3), PAML (v4.7), PHYLIP (v3.695), wgd (v3.0),
I-ADHoRe 3.0, iTAK (version 1.7), Mesquite (version 3.51), HMMER (v3.1b2), CIPRES Science Gateway (v3.3), Genesis (v3.0), CodonO.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers.
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability
The A. angustus genome project has been deposited at the NCBI under the BioProject number PRJNA543716. The genome sequencing data were deposited in the
Sequence Read Archive (SRA) database under the accession number SRR9696346. The A. angustus transcriptome project has been deposited at the NCBI under
BioProject PRJNA543724. The transcriptome sequencing data were deposited in the Sequence Read Archive (SRA) database under the accession number
SRR9662965. The assembled genome sequences, gene models and miRNA data are available via DRYAD (https://doi.org/10.5061/dryad.msbcc2ftv). All data that
support the findings of this study are also available from the corresponding authors upon request.
Data exclusions Lines 416-456, 466-472: Prokaryotic and organellar sequences were removed from the sequencing data and the pre-assembled
genome data, because both were present in the genome sequencing data. Excluding contamination from foreign DNA sequences and
organellar sequences is a prerequisite for accurate genome assembly. After selection of high-abundance k-mer reads, error correction and a
MEGABLAST check, 3.78 Gb of high-quality clean Nanopore reads remained for A. angustus genome assembly.
Lines 521-522: We excluded annotations characterized only as hypothetical/predicted protein, since such proteins cannot be considered
functionally annotated.
Lines 542-543: During the comparative analysis, we chose the longest transcript to represent each gene and removed mitochondrial and
chloroplast genes, since the genome datasets used include multiple transcripts and organellar genes that might complicate the comparative
analysis.
Lines 617-618: The mean gene family size was calculated for all gene families, excluding orphans and species-specific families, since these
genes are unique to individual species and do not have orthologs in other species for comparison.
Lines 624-627: During the gene family expansion identification, transposon-derived gene families were removed, since the distribution of such
families is likely to be a consequence of the gene models derived from a repeat-masked genome sequence and therefore may be artefactual.
Lines 638-640: Sequences without supporting transcript evidence were excluded from the HGT candidates, since these sequences might
be contaminants rather than genuine HGT genes.
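
The exclusion rule for the comparative analysis (one longest transcript per gene, with organellar genes removed) can be sketched as follows. This is only an illustration under stated assumptions: the record layout `(gene_id, transcript_id, sequence, is_organellar)` and the function name are hypothetical and do not come from the study.

```python
def longest_transcript_per_gene(transcripts):
    """Pick the longest transcript for each nuclear gene.

    transcripts: iterable of (gene_id, transcript_id, sequence, is_organellar)
    tuples (a hypothetical layout). Mitochondrial and chloroplast genes are
    skipped. Ties keep the transcript seen first.
    Returns a dict mapping gene_id -> (transcript_id, sequence).
    """
    best = {}
    for gene_id, transcript_id, sequence, is_organellar in transcripts:
        if is_organellar:
            continue  # organellar genes are excluded from the comparison
        current = best.get(gene_id)
        if current is None or len(sequence) > len(current[1]):
            best[gene_id] = (transcript_id, sequence)
    return best
```

Keeping a single representative per gene avoids counting alternative transcripts as separate family members when gene family sizes are compared across species.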
Replication The spore germination experiment and the DAPI staining experiment were each repeated three times independently.
Randomization We picked spores randomly for the germination experiments. We selected regions of the gametophytes randomly for DAPI staining.
Blinding We sequenced a single hornwort plant, and no control group was involved. Blinding is not applicable to this study.
Clinical data