
The Plant Journal (2018) 95, 487–503 doi: 10.1111/tpj.13964

Structural variation and rates of genome evolution in the grass family seen through comparison of sequences of genomes greatly differing in size
Jan Dvorak1,*, Le Wang1, Tingting Zhu1, Chad M. Jorgensen1, Karin R. Deal1, Xiongtao Dai2, Matthew W. Dawson2,
Hans-Georg Müller2, Ming-Cheng Luo1, Ramesh K. Ramasamy1, Hamid Dehghani1,3, Yong Q. Gu4, Bikram S. Gill5,
Assaf Distelfeld6, Katrien M. Devos7,8, Peng Qi7,8, Frank M. You9, Patrick J. Gulick10 and Patrick E. McGuire1

1Department of Plant Sciences, University of California, Davis, CA, USA,
2Department of Statistics, University of California, Davis, CA, USA,
3Department of Plant Breeding, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran,
4Crop Improvement & Genetics Research, USDA-ARS, Albany, CA, USA,
5Department of Plant Pathology, Kansas State University, Manhattan, KS, USA,
6School of Plant Sciences and Food Security, Tel Aviv University, Tel Aviv, Israel,
7Institute of Plant Breeding, Genetics and Genomics (Department of Crop & Soil Sciences), University of Georgia, Athens, GA, USA,
8Department of Plant Biology, University of Georgia, Athens, GA, USA,
9Agriculture & Agri-Food Canada, Morden, MB, Canada, and
10Department of Biology, Concordia University, Montreal, QC, Canada

Received 18 February 2018; revised 4 May 2018; accepted 8 May 2018; published online 16 May 2018.
*For correspondence (e-mail jdvorak@ucdavis.edu).

SUMMARY
Genes annotated in the Aegilops tauschii pseudomolecules were used in homology searches against genes annotated in the pseudomolecules of tetraploid wild emmer wheat, Brachypodium distachyon, sorghum and rice. Similar searches were performed with genes annotated in the rice pseudomolecules. Matrices of collinear genes and rearrangements in their order were constructed. Optical BioNano genome maps were constructed and used to validate rearrangements unique to the wild emmer and Ae. tauschii genomes. The most common rearrangements were short paracentric inversions and short intrachromosomal translocations. Intrachromosomal translocations outnumbered segmental intrachromosomal duplications. The densities of paracentric inversion lengths were approximated by exponential distributions in all six genomes. Densities of collinear genes along the Ae. tauschii chromosomes were highly correlated with meiotic recombination rates but those of rearrangements were not, suggesting different causes of the erosion of gene collinearity and evolution of major chromosome rearrangements. Frequent rearrangements sharing breakpoints suggested that chromosomes have been rearranged recurrently at some sites. The distal 4 Mb of the short arms of rice chromosomes Os11 and Os12 and corresponding regions in the sorghum, B. distachyon and Triticeae genomes contain clusters of interstitial translocations including from 1 to 7 collinear genes. The rates of acquisition of major rearrangements were greater in the large wild emmer wheat and Ae. tauschii genomes than in the lineage preceding their divergence or in the B. distachyon, rice and sorghum lineages. It is suggested that synergy between large quantities of dynamic transposable elements and annual growth habit has been the primary cause of the fast evolution of the Triticeae genomes.

Keywords: synteny, collinearity, annual, translocation, inversion, chromosome rearrangement.

INTRODUCTION
The rates of gene evolution are typically measured in the numbers of nucleotides substituted per unit of time, which is embodied in the molecular clock concept (Hasegawa et al., 1985). Measuring the rate of change at the level of the entire nuclear genome is more complicated because the eukaryotic genome is not a homogeneous entity. It

© 2018 The Authors 487


The Plant Journal © 2018 John Wiley & Sons Ltd
488 Jan Dvorak et al.

consists of genes, transposable elements, satellites and other components, each evolving at its own pace and by its own mechanism.

Even evolution of gene space may not be homogeneous. Individual genes are subjected to duplications, deletions, errors in recombination and transpositions leading to gene copy number variation (Massa et al., 2011) and loss of collinearity of individual genes between homoeologous chromosomes (Luo et al., 2017). Homoeologous chromosomes are also subjected to rearrangements of blocks of collinear genes, such as inversions, translocations, deletions and duplications, resulting in structural variation. We will consider erosion of collinearity and chromosome rearrangements as two discrete forms of genome evolution throughout this paper.

Initial studies of plant genome evolution were based on comparative genetic maps. Although the utility of genetic maps was limited by their low resolution, they nevertheless provided many important insights into genome evolution in plants (Gale and Devos, 1998; Ming et al., 1998; Brubaker et al., 1999; Paterson et al., 2004; Luo et al., 2009, 2013; Wu and Tanksley, 2010). Today, genetic and physical maps have been largely replaced by genome sequences (Bowers et al., 2003; Tang et al., 2008; Paterson et al., 2009; Town et al., 2011). However, with the sole exception of conifer genomes (Birol et al., 2013; Nystedt et al., 2013; Neale et al., 2014, 2017b; Stevens et al., 2016), all sequenced genomes have until recently been relatively small. For example, the largest grass genome with a reference-quality genome sequence has been the paleotetraploid, 2.3-Gb maize genome (Schnable et al., 2009). The recent publications of genome sequences of barley (Mascher et al., 2017), Aegilops tauschii (Luo et al., 2017; Zhao et al., 2017) and tetraploid wild emmer wheat (Triticum turgidum ssp. dicoccoides, genomes AABB) (Avni et al., 2017), with genome sizes ranging from 4.3 to 12 Gb, represent a paradigm shift in genome sequencing. These genome sequences make possible comparisons of related genomes greatly differing in size with the hope that such comparisons may provide insights into plant genome evolution.

Barley, Ae. tauschii and wild emmer wheat are members of the tribe Triticeae of the grass family (Poaceae). The latter two are the closest wild relatives of common (bread) wheat (T. aestivum, genomes AABBDD). Aegilops tauschii is the progenitor of the wheat D genome (Kihara, 1944; McFadden and Sears, 1946; Nesbitt and Samuel, 1996; Wang et al., 2013) and wild emmer wheat is the progenitor of domesticated tetraploid wheat (T. turgidum, genomes AABB). Hybridization of domesticated tetraploid wheat with Ae. tauschii produced hexaploid bread wheat (Dvorak et al., 2012). The genomes of wild emmer wheat and Ae. tauschii together are therefore equivalent to the genome of bread wheat but are preferable over the bread wheat genome for comparative genomics because they have not been impacted by artificial selection that accompanied wheat domestication and improvements.

The tetraploid genome of wild emmer wheat consists of the A genome, which was contributed by diploid einkorn wheat T. urartu and the B genome, which was contributed by an extinct or undiscovered diploid species closely related to Ae. speltoides (Dvorak and Zhang, 1990; Dvorak et al., 1993). The hexaploid genome of bread wheat is thus a product of convergence of three diploid Triticeae lineages, those of the A, B and D genomes. Their divergence was estimated to have occurred between 1.4 and 4.5 million years ago (MYA) (Huang et al., 2002; Dvorak and Akhunov, 2005; Gornicki et al., 2014; Middleton et al., 2014; Bernhardt et al., 2017), although a time as ancient as 7 MYA has also been suggested (Marcussen et al., 2014). An average divergence time of 3 MY will be used throughout this study. An array of approaches has been used to reconstruct the phylogeny of the three lineages. Phylogenetic trees placing the A-genome divergence basally (Hsiao et al., 1995; Yamane and Kawahara, 2005; Li et al., 2015; Bernhardt et al., 2017; Jorgensen et al., 2017) dominate these efforts.

Comparative genomics has suggested three variables that may affect the rates of plant genome evolution: (i) the genome size and the content of dynamic transposable elements (TEs), (ii) homologous recombination (HR) and (iii) generation length (life history):

(i) The comparison of a high-resolution genetic map of Ae. tauschii, a species with a large genome (4.3 Gb), with the sequences of small genomes of rice (Oryza sativa) and sorghum (Sorghum bicolor), 0.43 and 0.73 Gb, respectively (Goff et al., 2002; Paterson et al., 2009), uncovered 50 inversions and translocations (Luo et al., 2009). Of these, 2, 8 and 40 occurred in the rice, sorghum and Ae. tauschii lineages, respectively, indicating that the large Ae. tauschii genome has been acquiring inversions and translocations more quickly than the smaller rice and sorghum genomes. Comparisons of the complete genome sequence of the three species produced similar results, although many more rearrangements were discovered (Luo et al., 2017). In these comparisons, the numbers of rearrangements mirrored the genome sizes and it was therefore tempting to attribute the rate of genome evolution to genome size (Luo et al., 2009). It was recently pointed out that if this were true the conifer genomes, which are even larger than the Triticeae genomes, should be evolving faster than the Triticeae genomes (Luo et al., 2017). However, conifers are well known for slow genome evolution (Nystedt et al., 2013; Neale et al., 2017a). Revealing in this context was the comparison of unique k-mer plots of the Ae. tauschii genome with

Rates of genome evolution in grasses 489

those of other genomes including that of the much larger pine genome (Luo et al., 2017). The fast evolving Ae. tauschii genome had a lower percentage of unique k-mer sequences than any other genome including the pine genome, indicating that Ae. tauschii TEs show an unprecedented homogeneity. This analysis suggested that it is the large content of homogeneous TEs, i.e. TEs that have been subjected to recent amplification, which is critical for the rate of genome evolution (Luo et al., 2017).

(ii) Meiotic HR is distally located in the Triticeae genomes and its rate precipitously declines in the proximal direction (Akhunov et al., 2003; Luo et al., 2013, 2017; Avni et al., 2017). The distal, high HR regions of Triticeae genomes are enriched for non-collinear genes that originated by gene duplications and gene deletions (Luo et al., 2013, 2017). Meiotic HR rate and the content of Ae. tauschii genes collinear with genes in the B. distachyon, rice and sorghum genomes were negatively correlated (Luo et al., 2013, 2017). Likewise, the rates of gene deletions and gene duplications along the centromere–telomere axes of the chromosomes of diploid and polyploid Triticeae species were correlated with recombination rates (Dvorak and Akhunov, 2005).

(iii) The relationship between generation length and the rate of the molecular clock is well known (Kohne, 1970; Martin and Palumbi, 1993; Andreasen and Baldwin, 2001; Tuskan et al., 2006; Smith and Donoghue, 2008; Luo et al., 2015). The same variable may also affect the rate of genome evolution. Luo et al. (2015) compared several criteria of genome divergence in three dicotyledonous woody perennials and three dicotyledonous herbs. The genomes of the woody perennials appeared to be diverging more slowly than those of the herbs for each criterion used.

Here, we used 38 775 high-confidence (HC) genes annotated in the Ae. tauschii pseudomolecules (Luo et al., 2017) in homology searches against genes annotated in the pseudomolecules of the A and B genomes of the wild emmer wheat (Avni et al., 2017) in order to assess gene collinearity and discover rearrangements that have taken place in these genomes since their divergence. Based on a wild emmer genome size of 12 Gb (Avni et al., 2017) and the sizes of individual wheat chromosome arms (Dvorak et al., 1984), we estimate the size of the wheat A genome to be about 5.2 Gb and that of the B genome about 6.3 Gb. The results of these homology searches were combined with similar searches against the small B. distachyon genome (0.27 Gb; International Brachypodium Genome Initiative, 2010) and those of rice and sorghum (Luo et al., 2017), to quantify changes in gene collinearity and chromosome rearrangements among the six genomes. The genomes represented three grass subfamilies: Panicoideae (sorghum), Oryzoideae (rice) and Pooideae (B. distachyon, Ae. tauschii and wild emmer wheat) and the study covered an evolutionary time frame spanning more than 50 million years.

When needed, rearrangements were subjected to validation with optical BioNano genome (BNG) maps (Hastie et al., 2013). A BNG map consists of contigs assembled from overlaps among high-molecular-weight DNA molecules digested with a single-strand restriction endonuclease (nickase). The nicks are labeled with fluorescent nucleotides and distances between them are optically measured on stretched DNA molecules (Xiao et al., 2007). Because a BNG map is produced independently of genome sequencing, it can be used to validate a genome sequence assembly (Hastie et al., 2013; Luo et al., 2017).

Validated rearrangements were assigned to branches of a grass phylogenetic tree (International Brachypodium Genome Initiative, 2010) and the rates at which rearrangements accumulated in individual branches per MY were computed. To assess the extent to which these rate estimates were affected by the phylogenetic distances among compared genomes, a dataset of homology searches with 42 004 HC genes annotated in the rice pseudomolecules against genes annotated in the Ae. tauschii, B. distachyon and sorghum pseudomolecules, produced earlier (Luo et al., 2017) and augmented here with similar homology searches against genes annotated in the wild emmer pseudomolecules (Avni et al., 2017), was analyzed. The rates of genomic changes in terms of the accumulation of major rearrangements per MY were computed and compared with those obtained using the Ae. tauschii HC genes as queries in homology searches. The two sets of estimates of rates of genomic changes were jointly considered in relation to genome size, recombination rate and life history including generation time.

RESULTS

Genome comparisons

In each genome comparison it was determined for each target gene whether it was in a collinear position relative to its neighbors on the pseudomolecule. If the determination was affirmative, the gene was considered to be at a collinear location in that genome comparison. If a block of collinear genes along homoeologous chromosomes was in a different orientation or place, that genomic change was considered a chromosome rearrangement and its cause (inversion, translocation, duplication, etc.) was determined.

Gene collinearity

To quantify gene collinearity, 38 775 HC genes annotated in the Ae. tauschii pseudomolecules v4.0 (Luo et al., 2017) were used as queries in homology searches against genes


annotated in the pseudomolecules of the A and B genomes of wild emmer wheat and results were combined with similar searches against genes annotated in the pseudomolecules of B. distachyon, rice and sorghum performed earlier (Luo et al., 2017). The top hits in wild emmer wheat, B. distachyon, rice and sorghum were recorded and genes collinear with genes along the Ae. tauschii pseudomolecules in these five genomes were color coded in a resulting matrix (Table S1). The highest numbers of genes collinear with the Ae. tauschii HC genes were in the wild emmer wheat A and B genomes, respectively, 19 370 and 19 415 genes (50.0 and 50.0%) and the lowest number was in the sorghum genome, 13 066 (33.7%) genes (Table 1).

To assess to what extent these numbers were affected by the choice of the query genome, a dataset based on homology searches with 42 004 HC genes annotated in the rice pseudomolecules v7.0 against genes annotated in the Ae. tauschii, B. distachyon and sorghum pseudomolecules was downloaded from Luo et al. (2017) and combined with similar searches against genes annotated in the wild emmer pseudomolecules performed here. Results obtained with this dataset (Table S2) and those obtained using genes annotated in the Ae. tauschii pseudomolecules as queries (Table S1) were similar (Table 1). The numbers of collinear genes obtained when the rice genes were used as queries and the A-, B- and Ae. tauschii genome genes were used as targets ranged from 13 520 (32.2%) to 12 828 (30.5%). These numbers were close to 13 401 (34.6%) genes found to be collinear in the homology searches using the Ae. tauschii genes as queries and rice genes as targets (Table 1).

Table 1 Numbers and percentages of genes collinear in the target genomes with genes annotated in the Ae. tauschii or O. sativa pseudomolecules

                       Ae. tauschii genes      O. sativa genes
Target genome          No.        %            No.        %
Wild emmer A           19 370     50.0         13 013     31.0
Wild emmer B           19 415     50.0         12 828     30.5
Ae. tauschii           –          –            13 520     32.2
B. distachyon          14 601     37.9         16 863     40.1
O. sativa              13 401     34.6         –          –
S. bicolor             13 066     33.7         16 503     39.3
Query genes (total)    38 775                  42 004

The numbers of genes collinear with the Ae. tauschii genes along wild emmer pseudomolecules were quantified in non-overlapping windows of 200 genes. Expressing gene collinearity per gene rather than per Mb made the collinearity estimates along chromosomes independent of the variation in gene density that exists in the Ae. tauschii genome (Luo et al., 2017). Disregarding the centromeric regions (gaps in the profiles), the highest collinearity was in the proximal, low-recombination regions of the Ae. tauschii pseudomolecules, about 70% of the genes and declined towards the distal, high-recombination regions to about 20 to 30% of the genes (Figure 1a). The numbers of collinear genes along the wild emmer wheat pseudomolecules were negatively correlated with meiotic HR rates computed by Luo et al. (2017). For the wild emmer A and B genomes, Pearson correlation coefficients were r = 0.59 and 0.58, respectively (P < 0.0001, N = 181). In homoeologous groups 1, 2 and 3 and the long arms of 7A and 7B, the numbers of genes collinear with the Ae. tauschii pseudomolecules were significantly greater in the B genome than in the A genome (Figure 1a). The differences between the A- and B-genome homoeologues in the remaining homoeologous groups were not statistically significant.

Dot plots were constructed to visualize synteny of the Ae. tauschii pseudomolecules with those of wild emmer wheat (Figure 1b). Except for 4A and the tips of 5AL and 7BS, which harbor reciprocal translocations, dot plots aligned the homoeologous pseudomolecules along their entire lengths. Paracentric inversions appeared to be the principal type of chromosome rearrangement shown by the dot plots. They were apparent in arms 1AS, 1BL, 2AL, 4AL, 4BL, 5BL, 7AL and 7BS. Duplications were apparent between the long arms of the chromosomes of homoeologous groups 1 and 3 and 2 and 6 and were consistent with major self-synteny blocks within the Ae. tauschii genome (Luo et al., 2017) due to a pan-grass whole genome duplication (WGD) (Paterson et al., 2004).

In total, 1989 rearrangements were detected in the six genomes. For 1126 of these, the presence of the rearrangement could be assessed in all genomes, making it possible to assign it to a specific phylogenetic branch (Table S1). This could not be accomplished for the remaining 863 rearrangements and these were not assigned to a phylogenetic branch. A vast majority of these unassigned rearrangements involved two (686) or three (80) genes. Only 46 rearrangements that could not be assigned to a branch were major rearrangements, involving four or more genes. Based on this analysis, only rearrangements involving four or more genes were considered in the estimation of genome evolution rates.

The rearrangements assigned to branches of the phylogenetic tree were classified as inversions, intrachromosomal translocations and interstitial or terminal interchromosomal translocations, duplications and nested chromosome insertions (NCIs) and by the number of collinear genes involved (2, 3 and >3). An NCI is an insertion of an entire chromosome into another chromosome (Luo et al., 2009). For each inversion and translocation, the rearrangement type, the starting nucleotides of the first and last Ae. tauschii HC gene involved in the rearrangement and the branch of the grass phylogenetic tree in which the


Figure 1. Gene collinearity.
(a) Quantification of collinearity along the Ae. tauschii pseudomolecules (horizontal axes) with genes annotated in the A- and B-genome pseudomolecules of wild emmer wheat. Each profile presents the number of collinear genes in a series of 200-gene non-overlapping windows (vertical axis). The P-value (paired t-test) for the entire chromosome is indicated at the lower right for each pair of pseudomolecule profiles. In addition, for the profiles of 7A and 7B, the P-values of the short and long arms are also shown. If P < 0.05, the B-genome profile was more closely related to that of the Ae. tauschii genome profile than was the A-genome profile. The gaps in the profiles are the locations of centromeric regions, in which collinearity was not investigated.
(b) Dot plots comparing the 14 wild emmer wheat pseudomolecules with the seven of Ae. tauschii. Each dot consists of a sequence of three or more collinear genes. The plots are oriented with the tips of the short arms to the left (x-axis) and bottom (y-axis). Large gaps in the profiles are centromeric regions.
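
The windowed profile described for Figure 1a, and its correlation with recombination rate, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: `window_counts` and `pearson_r` are hypothetical helper names, the collinearity flags and HR values are toy data, the window is 3 genes instead of the 200 used in the paper, and a trailing partial window is simply dropped.

```python
from math import sqrt

def window_counts(collinear_flags, window=200):
    """Count collinear genes (flag 1) in consecutive non-overlapping windows.

    Assumption of this sketch: a trailing window shorter than `window` is dropped.
    """
    n_full = len(collinear_flags) // window
    return [sum(collinear_flags[i * window:(i + 1) * window]) for i in range(n_full)]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Toy data (hypothetical): 1 = gene is collinear, 0 = it is not,
# ordered from a proximal to a distal chromosome region.
flags = [1, 1, 1,  1, 1, 0,  1, 0, 0,  0, 0, 0]
counts = window_counts(flags, window=3)

# Toy mean HR rate per window (hypothetical units), rising towards the distal end.
hr_rate = [0.1, 0.5, 1.0, 2.0]
r = pearson_r(counts, hr_rate)
print(counts, round(r, 2))
```

Counting per window of genes rather than per Mb mirrors the choice made in the text: the profile then does not depend on how gene density varies along the chromosome.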

rearrangement originated can be found in Table S1. There were 176, 158 and 125 rearrangements unique to the A and B genomes of wild emmer wheat and the genome of Ae. tauschii, respectively.

To determine if the large numbers of rearrangements unique to the three Triticeae genomes were real or errors in genome sequence assemblies, unique rearrangements involving three or more collinear genes in the A, B and Ae. tauschii genomes were validated with BNG maps (Figure 2a). Rearrangements involving only two collinear genes could not be validated because they were usually below the resolution of the BNG map. They were excluded from the subsequent analyses. A BNG map for wild emmer wheat accession Zavitan, used in wild emmer wheat sequencing, was constructed (Table 2) to validate wild emmer wheat rearrangements. BNG maps for Ae. tauschii accessions AS75 and CIae 1 were constructed (Table 2) and added to the existing BNG maps for Ae. tauschii accessions AL8/78 and CIae 23 (Luo et al., 2017). Thus, to date, Ae. tauschii rearrangements have been validated with four BNG maps.

Of the 421 wild emmer wheat and Ae. tauschii rearrangements subjected to validation with BNG maps, 297 were validated (Figure 2b). Rearrangements that failed validation were left in Table S1, but their failure to be validated was indicated and they were excluded from further analyses. Altogether, 938 inversions, 161 translocations, 13 NCIs, three deletions and 11 segmental duplications remained assigned to individual branches of the grass phylogenetic tree (Tables S1 and S3). The length measurements (Table S1) of all these rearrangements among the five taxa in the phylogenetic tree were gauged on the distances between genes in the Ae. tauschii pseudomolecules.

Inversions

Of the inversions involving ≥3 collinear genes, 167 involved three collinear genes and 460 involved >3


Figure 2. Validation of inversions with a BioNano genome (BNG) map.
(a) Examples of confirmed (top) and not confirmed (bottom) inversions in the wild emmer wheat A genome. If an inversion deduced by comparison with other genome sequences is confirmed, the sequence scaffold and BNG map will be identical as the inversion will be present in both. If an inversion is not confirmed, the scaffold and BNG map will differ, since the inversion is an artefact of assembly and is present only in the sequence scaffold. The green rectangles symbolize genome fragments and the blue rectangles symbolize BNG contigs. The red lines between the sequence fragments and the BNG contigs connect corresponding restriction sites and the blue numbers are sequence coordinates in Mb. The four vertical red arrows in each alignment indicate the positions of genes used for rearrangement validation. The two internal genes are within a putative inversion close to the breakpoints and the two external genes are outside of the inversion. In the top alignment, the order of restriction sites in the region of the 4A pseudomolecule marked by the four genes was collinear with the order of restriction sites in the BNG contig, thus validating the inversion detected in the sequence. The bottom alignment shows a disagreement between the sequence and BNG map. The order of restriction sites in the region of the pseudomolecule marked by the four genes was inverted relative to their order in the BNG contig, indicating an error in the assembly of the 3A pseudomolecule.
(b) Percentages of validated inversions in the wild emmer wheat A and B genomes and the Ae. tauschii (Aet) genome. Percentage values sharing the same letter (above each column) are not significantly different at the 5% significance level (paired t-test with Bonferroni correction).
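
The validation logic in Figure 2a reduces to asking whether matched restriction sites appear in the same order in the sequence scaffold and in the BNG contig. The Python below is a minimal sketch of that idea only, assuming site matches are already available as coordinate pairs; `sites_collinear` is a hypothetical helper, the coordinates are invented toy values, and real optical-map alignment involves scoring and tolerances that are not modeled here.

```python
def sites_collinear(site_pairs):
    """Return True if matched restriction sites keep one consistent order.

    site_pairs: (scaffold_position, bng_position) tuples for sites matched
    between a sequence scaffold and a BNG contig. A fully ascending or a
    fully descending order both count as collinear, because a contig may
    align to the scaffold in reverse orientation.
    """
    bng_order = [bng for _, bng in sorted(site_pairs)]
    steps = [b - a for a, b in zip(bng_order, bng_order[1:])]
    return all(s > 0 for s in steps) or all(s < 0 for s in steps)

# Toy coordinates in Mb (hypothetical, not taken from the paper).
agreeing = [(10.0, 1.0), (10.4, 1.4), (10.9, 1.9), (11.5, 2.5)]
# The two middle sites are swapped on the map: scaffold and map disagree.
disagreeing = [(10.0, 1.0), (10.4, 1.9), (10.9, 1.4), (11.5, 2.5)]

# Scaffold and map agree, so an inversion seen in the scaffold is supported.
inversion_confirmed = sites_collinear(agreeing)
# Scaffold and map differ, pointing to an assembly artefact in the scaffold.
assembly_error_suspected = not sites_collinear(disagreeing)
print(inversion_confirmed, assembly_error_suspected)
```

Because the map is built independently of the sequence, agreement supports a rearrangement seen in the scaffold, while disagreement within the region marked by the flanking genes suggests an assembly artefact.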

Table 2 Characteristics of BNG maps of wild emmer wheat and Ae. tauschii

Parameter                               Zavitan      AS75         CIae 1
Nickase enzyme                          Nt.BspQI     Nt.BspQI     Nt.BspQI
No. DNA molecules put into contigs      3 338 157    1 197 206    1 179 029
N50 DNA molecule length (Kb)            340.5        349.8        343.3
Minimum DNA molecule length (Kb)        180          180          180
Total length of DNA molecules (Gb)      1101         403.8        393.2
No. of genome equivalents               110          96           93
No. contigs                             7098         3603         5507
Total length of map (Mb)                10 675       4292         4690
N50 of map (Mb)                         2.23         1.72         1.12

Zavitan is an accession of wild emmer wheat and AS75 (China) and CIae 1 (Pakistan) are accessions of Ae. tauschii ssp. tauschii.

collinear genes (Table S1). There were 115, 91, 55, 103, 36 and 50 paracentric inversions involving ≥3 collinear genes unique to the A and B genomes of wild emmer wheat and the genomes of Ae. tauschii, B. distachyon, rice and sorghum, respectively (Tables S3 and 3). The mean lengths of these genome-unique inversions ranged from 8.0 Mb (median = 1.0 Mb) in the sorghum genome to 1.6 Mb (median = 0.6 Mb) in the Ae. tauschii genome, indicating that most were short (Figure 3). These measurements were based on the Ae. tauschii pseudomolecules and are therefore comparable. The median inversion lengths were shorter in the large genomes than in the small genomes, but the sample size was insufficient to test statistically the hypothesis. The densities of the genome-unique paracentric inversion lengths were closely approximated by exponential distributions in all six genomes (Figure 3). In the rice genome, this relationship was accounted for by a single distribution and in the remaining five genomes two distributions were necessary (Figure 3).

The following analyses were made to assess the distribution of paracentric inversions in relation to meiotic HR rates. First, the location of the most distal gene within each

Table 3 Numbers and rates of inversions and intrachromosomal translocations for branches of the grass phylogenetic tree

Branch    Time (MY)  Inv. 2 genes     Inv. 3 genes    Inv. >3 genes   Intrachr. transloc. >3 genes  All minor rearrang.  All major rearrang.
A         3          36 (12.00b,c)    25 (8.33b,c)    91 (30.33c)     13 (4.33b)                    69 (23.00b)          107 (35.67c)
B         3          32 (10.67a,b)    25 (8.33b)      65 (21.67c)     11 (3.67b)                    76 (25.67b)          81 (27.00c)
Aet       3          49 (16.33c)      11 (3.67a,b)    44 (14.67b)     7 (2.33a,b)                   64 (21.33b)          61 (20.33b)
AB        ?          3                2               6               0                             5                    8
AetA      ?          3                2               4               0                             5                    4
AetB      ?          1                0               10              1                             1                    12
AetBA     32         28 (0.88a)       32 (1.00a)      76 (2.38a)      9 (0.28a)                     69 (2.16a)           98 (3.06a)
AetABBd   12         12 (1.00a)       15 (1.25a)      29 (2.42a)      4 (0.33a)                     28 (2.33a)           36 (3.00a)
Bd        35         61 (1.73a,b)     21 (0.60a)      82 (2.34a)      15 (0.43a)                    91 (2.60a)           105 (3.00a)
Os        47         28 (0.60a)       16 (0.34a)      20 (0.43a)      4 (0.06a)                     46 (0.98a)           28 (0.60a)
Sb        53         58 (1.09a)       17 (0.32a)      33 (0.62a)      16 (0.30a)                    77 (1.45a)           54 (1.02a)
Total                309              158             444             77                            520                  592

inv., inversion; intrachr., intrachromosomal; transloc., translocation; rearrang., rearrangements.
Rates are per million years (in parentheses) for each type of structural change. Rates sharing the same letter suffix are not significantly different at the 5% significance level. See Figure 4 for identity of phylogenetic tree branches. Time estimates (column 2) are means of ranges reported in (International Brachypodium Genome Initiative, 2010). For the divergence time of the A, B and Ae. tauschii genomes, see Introduction.
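
The rates in parentheses in Table 3 are counts divided by the branch time in MY. A minimal Python sketch of that arithmetic follows, using the "All major rearrang." column for five branches; `rate_per_my` is a hypothetical helper name, and rounding to two decimals is an assumption made to match the table's presentation.

```python
def rate_per_my(count, branch_time_my):
    """Rearrangements accumulated per million years on one branch."""
    return count / branch_time_my

# (number of major rearrangements, branch time in MY), copied from Table 3.
major = {
    "A": (107, 3),
    "Aet": (61, 3),
    "AetBA": (98, 32),
    "Os": (28, 47),
    "Sb": (54, 53),
}

rates = {branch: round(rate_per_my(n, t), 2) for branch, (n, t) in major.items()}
for branch, rate in rates.items():
    print(f"{branch}: {rate:.2f} major rearrangements per MY")
```

Dividing by branch time is what makes the short Triticeae branches (3 MY) comparable with the much longer rice and sorghum branches (47 and 53 MY).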

inversion involving >3 collinear genes in the Ae. tauschii of a breakpoint to the level of nucleotide sequences. To
and the wild emmer wheat A and B genomes was deter- study this phenomenon, we scored double inversions con-
mined on the Ae. tauschii pseudomolecules. The number sisting of two inversions, one nested within the other, while
of inversion starting points in the three genomes per 200- sharing a breakpoint. For instance, a region of wild emmer
gene window was counted and used as a variable in com- pseudomolecule 1A involving markers AET1Gv20576000
puting the Pearson correlation coefficient. The other variable was the average HR rate in the 200-gene window (Luo et al., 2017). No correlation between HR rate and the distribution of the inversions in the three genomes was detected (r = 0.03, P = 0.64, N = 181).
The problem with this analysis is that there were 863 rearrangements, mostly inversions, which could not be assigned to a lineage. The failure to include these inversions in the correlation analysis could have biased the data and caused the absence of correlation. To assess that, the unassigned inversions that appeared to be located in the B. distachyon, rice and sorghum genomes were excluded and the rest were subdivided into those involving two collinear genes and those involving >2 collinear genes. Each group was combined with the original dataset and Pearson correlation coefficients were re-computed. For the dataset including unassigned inversions involving two collinear genes r = 0.32 (P < 0.0001, N = 181) and for the dataset including inversions involving >2 collinear genes r = 0.15 (P = 0.04, N = 181). The increase in the size of correlation coefficients indicated that long inversions (4 or more genes) do not correlate with HR but short inversions, principally those involving two genes, do correlate with HR.
The rearrangement dataset in Table S1 showed cases in which two different inversions shared a common breakpoint, defined as an interval between neighboring collinear genes, one external and one internal to an inversion breakpoint. Because the data were based on comparison of diverged genomes it was impossible to refine the location

and AET1Gv20583900 is inverted and markers that are ascending on the 1D pseudomolecule are descending on the 1A pseudomolecule. We name this inversion InvA (AET1Gv20576000-83900) using a naming convention described in Experimental procedures. The progression of 3 and 4 collinear markers at each end of this inversion is ascending, indicating the presence of two additional inversions. Each of these inversions, InvA (AET1Gv20576000-200) and InvA (AET1Gv20583000-900), shares one breakpoint with inversion InvA (AET1Gv20576000-83900), suggesting that the breakpoints that produced the major inversion also produced the two additional inversions. We analyzed 363 inversions involving >3 collinear genes for the existence of these double inversions with the restriction that the nested inversion involved ≥3 collinear genes. We found 22 (6%) such cases (Table 4).

Translocations

In total, 79 interstitial intrachromosomal translocations, 19 interstitial interchromosomal translocations, 9 terminal interchromosomal translocations and 13 NCIs were allocated to branches of the phylogenetic tree (Table S3). Most of the intrachromosomal translocations involving 3 or more collinear genes were short. In the A and B genomes of wild emmer wheat and the Ae. tauschii genome, they averaged about 7.6 Mb (median = 0.6 Mb), 2.1 Mb (median = 0.5 Mb) and 1.7 Mb (median = 0.1 Mb), respectively (too few of them were detected in the remaining genomes to make their statistical treatment meaningful).
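The correlation analysis described earlier in this section, which pairs the number of inversions with the average HR rate in each 200-gene window, can be illustrated with a minimal sketch. The function name and the toy window values below are hypothetical and are not taken from the study; only the Pearson formula itself is standard.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: one value per 200-gene window (not from the paper).
inversions_per_window = [0, 2, 1, 3, 5, 4]
hr_rate_per_window = [0.1, 0.4, 0.3, 0.7, 1.2, 0.9]
r = pearson_r(inversions_per_window, hr_rate_per_window)
```

In the study the two variables had N = 181 windows; the sketch does not compute a P value.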

© 2018 The Authors


The Plant Journal © 2018 John Wiley & Sons Ltd, The Plant Journal, (2018), 95, 487–503
494 Jan Dvorak et al.

[Figure 3: six histograms of inversion length with fitted curves; panels A, B and Aet (top row) and Bd, Os and Sb (bottom row); x-axis: Inversion length in Mb (0 to 150); y-axis: Count (0 to 125). Values printed in the panels:]

Panel   μ1 (Mb)   μ2 (Mb)   α      Mean (Mb)   Median (Mb)
A       0.93      8.87      0.71   3.22 b      1.05
B       0.93      11.91     0.87   2.48 ab     0.88
Aet     0.80      8.44      0.89   1.61 a      0.63
Bd      0.21      24.49     0.98   2.60 b      1.25
Os      -         -         -      2.37 ab     1.28
Sb      0.96      36.01     0.80   8.01 b      1.03
Figure 3. Paracentric inversion lengths.


Densities of the lengths of inversions involving 3 or more collinear genes measured in Mb in the wild emmer wheat A and B genomes and the genomes of Aegilops tauschii (Aet), Brachypodium distachyon (Bd), rice (Os) and sorghum (Sb). The inversion lengths were scaled and gauged on the distances between genes in the Ae. tauschii pseudomolecules. The curves in red show mixtures of two exponential distributions fitted to the data. A single exponential distribution fit the rice data. The mean inversion length (top number) and the median (bottom number) are given for each graph. A Kolmogorov–Smirnov test indicated that the data deviated significantly from normality (P ≤ 0.01). They were therefore transformed (see Experimental procedures) and analyzed with ANOVA for a completely randomized design which showed a significant genome effect (P ≤ 0.10). Means followed by the same letter do not differ at the 5% significance level (Duncan’s test). Also given in each graph are the means (μ1 and μ2) for the mixture components and the mixture proportion (α).
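The two-component exponential mixture named in the caption can be written down in a short sketch. It assumes, as an interpretation rather than something the caption states, that α is the weight of the component with mean μ1 and that (1 − α) is the weight of the component with mean μ2. The function names are hypothetical.

```python
import math


def mixture_pdf(x, mu1, mu2, alpha):
    """Density of alpha * Exp(mean=mu1) + (1 - alpha) * Exp(mean=mu2) at x."""
    if x < 0:
        return 0.0
    first = alpha * math.exp(-x / mu1) / mu1
    second = (1.0 - alpha) * math.exp(-x / mu2) / mu2
    return first + second


def mixture_mean(mu1, mu2, alpha):
    """Mean of the mixture: the weighted average of the component means."""
    return alpha * mu1 + (1.0 - alpha) * mu2


# Parameters printed in the A-genome panel of Figure 3 (values in Mb).
# Assumption: alpha weights the short-inversion component (mu1).
mean_a = mixture_mean(0.93, 8.87, 0.71)
```

Under that assumption the mixture mean for the A genome is about 3.23 Mb, close to the mean of 3.22 Mb printed in the same panel.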

An intrachromosomal translocation of a chromosome segment could either leave the original segment in place or not, depending on the nature of the translocation process and subsequent recombination of the chromosome. The former results in a segmental intrachromosomal duplication whereas the latter is a true translocation. Resolution was sufficient only for the analysis of translocations unique to the wild emmer wheat and the Ae. tauschii genomes. Of the 26 characterized, 7 (28%) were intrachromosomal duplications whereas the rest were true translocations.
There were three reciprocal translocations 4DS/5DL, 4AL/5AL and 4AL/7BS, all terminal, in the wild emmer and Ae. tauschii genomes. The 4DS/5DL reciprocal translocation, TAetBA (AET4Gv20018000-9500; AET5Gv20867400-600), was shared by the three Triticeae genomes. The 4AL/5AL and 4AL/7BS reciprocal translocations, TA(AET4Gv20754000-5100; AET5Gv21126000-200) and TAB(AET5Gv21238100-9800; AET7Gv20264500-800), respectively, were unique to the wild emmer genome.
Clusters of translocations were observed between rice chromosomes Os11 and Os12, sorghum chromosomes Sb5 and Sb8, Triticeae chromosomes 4 and 5 and within B. distachyon chromosome Bd4. The translocations involved regions corresponding to the distal 4 Mb of rice chromosomes Os11 and Os12. The numbers of the translocations ranged from 84 in sorghum to four in the three Triticeae genomes (Table 5). The translocated segments were


uniformly small, involving from 1 to 7 collinear genes (Table S4). The density of the number of genes per translocation was approximated by Weibull distribution (P < 0.01). Distribution fits for individual chromosomes are in Table S4. The number of genes per translocated segment decreased in the proximal direction. Pearson correlation coefficients of the number of collinear genes per translocated segment and the distance from the terminus ranged from 0.38 to 0.17 (Table S4). These translocations were considered exceptional and were not included in rearrangement quantifications.

Table 4 Double inversions indicating recurrent chromosome breaking in individual branches of the phylogenetic tree

Branch     N     Recurrent   Percent
A          77    7           9.1
B          59    2           3.4
Aet        39    1           2.6
AB         4     0           0.0
AetA       1     0           0.0
AetB       9     1           11.1
AetBA      57    6           10.5
AetABBd    22    1           4.5
Bd         56    2           3.6
Os         13    0           0.0
Sb         26    2           7.9
Total      341   22

Phylogeny of the A, B and Ae. tauschii genomes

We employed 460 inversions involving >3 collinear genes in the reconstruction of the phylogeny of the three Triticeae genomes. The resulting tree had a topology similar to the tree reported earlier (International Brachypodium Genome Initiative, 2010) with the difference that this one included three Triticeae genomes. The branching indicated that the A genome diverged prior to the divergence of the B and Ae. tauschii genomes (Figure 4).
This pattern of dichotomous genome divergence predicted that the numbers of synonymous nucleotide substitutions (Ks) of the Ae. tauschii alleles within inversions shared with the B-genome (category AetB) will be smaller than Ks between the same Ae. tauschii alleles and the A-genome alleles. For inversions shared by the Ae. tauschii and the A genome, the Ks values should not differ. This hypothesis was tested using two AetA inversions involving eight genes and seven AetB inversions involving 107 genes (Table S5). No evidence was found for differences between the Ks values in the AetA inversions. For the AetB inversions, Ks values between the Ae. tauschii and the B-genome haplotypes were significantly smaller than the Ks values between the Ae. tauschii haplotypes and the corresponding A-genome alleles (Table 6). The Ks values were therefore consistent with the divergence of the A, B and Ae. tauschii genomes by bifurcations as shown in Figure 4.

Rates of genome evolution

Inversions, translocations and duplications involving >3 collinear genes and NCIs were combined into a class of major rearrangements (Tables 3 and S3). The greatest number of them was present in the A genome (107) and the smallest number was in the rice genome (28) (Table 3). To estimate the rate of change in each branch of the phylogenetic tree, the number of major rearrangements was divided by the length of a branch in MY (Table 3). The highest rates were in the A-, B- and Ae. tauschii genome branches, 35.7, 27.0 and 20.3 rearrangements per MY, respectively, and the lowest rates were in the rice and sorghum branches, 0.6 and 1.0 major rearrangements per MY, respectively (Figure 4). The rates in the subfamily Pooideae were higher than those in Panicoideae and Oryzoideae (Figure 4 and Table 3). The rate in the B. distachyon branch was 3.0 major rearrangements per MY, which was similar to the rate in the Pooideae branches preceding the divergence of Brachypodieae and Triticeae (3.0 major rearrangements per MY) and preceding the radiation of the A, B and Ae. tauschii genomes (3.1 major rearrangements per MY).
The slowest rates of genomic change were in the branches most distant from Ae. tauschii (Figure 4). This fact raised a legitimate concern whether the ability to assign a rearrangement to a phylogenetic branch did not decline with the phylogenetic distance and was not the cause of the variation in the rates among the branches of the tree. We therefore reversed the direction of the analysis and analyzed collinearity of Ae. tauschii, wild emmer, B. distachyon and sorghum genes using the 42 004 rice

Table 5 Numbers of translocations and translocated genes in chromosome segments corresponding to the distal ends of short arms of rice chromosomes Os11 and Os12 in the three Triticeae genomes and the genomes of B. distachyon, rice and sorghum

                 Translocations (no.)                    Genes (no.)
Genome           Os12 homoeologue   Os11 homoeologue     Os12 homoeologue   Os11 homoeologue
Triticeae        0                  4                    0                  4
B. distachyon    26                 11                   35                 13
Rice             28                 47                   54                 104
Sorghum          35                 49                   59                 103


genes as queries in homology searches, as described in the section Gene collinearity (Table S2). In total, 415 major rearrangements were assigned to the phylogenetic tree branches, which was significantly fewer than the 594 major rearrangements assigned to the branches of the tree using the Ae. tauschii genes as queries (P = 0.001, two-tailed paired t-test, N = 11). The rates were the most different in the three Triticeae branches (Figure 4). In the wild emmer A-genome branch and the Ae. tauschii branch the reductions were statistically significant (P = 0.008 and P = 0.01, respectively, two-tailed paired t-test, N = 7). In the rest of the tree, the differences were minor and not statistically significant except for the branch preceding the divergence of Triticeae and Brachypodieae (branch AetABBd) (P = 0.01, two-tailed paired t-test, N = 7). Despite these differences, the analyses were consistent in showing more than an order of magnitude faster rate of genomic change in the Triticeae genomes compared to the rice and sorghum genomes and faster rate in the Pooideae branches than in the Panicoideae and Oryzoideae branches.

Figure 4. Rates of genome evolution.
Rates of genomic change in terms of the number of major chromosome rearrangements per MY during the grass family radiation. The phylogenetic tree is from International Brachypodium Genome Initiative (2010) and rebuilt to include the wild emmer and Ae. tauschii genomes. A and B represent the wild emmer wheat genomes and Aet, Bd, Os and Sb are the Ae. tauschii, B. distachyon, rice and sorghum genomes, respectively. Bootstrap confidence is specified by bold black numbers to the right of branches. The numbers in red are the rates of acquisition of major rearrangements by individual branches. The top red numbers are rates based on using genes annotated in the Ae. tauschii genome as queries in homology searches and the bottom red numbers are rates based on using genes annotated in the rice genome as queries in homology searches. Divergence time in MY is given for each node. The grass subfamily is indicated for each branch.

DISCUSSION

Rates of genome evolution

Searches for homology using the Ae. tauschii HC genes as queries uncovered 594 major rearrangements that could be allocated into specific branches of the grass phylogenetic tree. Of them, 247 (41.7%) were unique to one of the three Triticeae genomes and have evolved in the past 3 MY. Similar results were obtained using rice genes as queries in searches for homology. They uncovered 415 major rearrangements that could be allocated into the branches of the phylogenetic tree and 139 (33.5%) of them were unique to one of the three Triticeae genomes. Because major rearrangements unique to the Triticeae genomes were validated with BNG maps in both datasets, it is unlikely that this excess of rearrangements in the Triticeae genomes was caused by errors in sequence

Table 6 Average Ks values for genes within shared inversions

                                                                Ks
Inversion   Chrom.   Start (bp)     End (bp)       Genes (no.)   A versus B   D versus A   D versus B
AetA        1D       4 200 706      4 378 346      5             0.165        0.105        0.110
AetA        1D       60 675 934     60 758 258     3             0.088        0.080        0.083
AetB        1D       54 678 481     55 665 632     8             0.091        0.083        0.088
AetB        2D       77 874 727     78 733 204     4             0.082        0.068        0.062
AetB        2D       629 018 334    629 304 845    4             0.084        0.087        0.084
AetB        3D       96 990 346     109 275 480    46            0.128        0.116        0.089
AetB        3D       109 641 478    118 437 345    41            0.115        0.100        0.067
AetB        4D       17 690 222     17 830 447     5             0.135        0.125        0.095
AetB        7D       73 724 397     73 980 729     2             0.161        0.124        0.082
AetA Mean                                                        0.137a       0.096a       0.100a
AetB Mean                                                        0.118a       0.105a       0.073b

The inversions are identified by chromosome location and starting and ending coordinates. The Ks values are for collinear A-, B- and Aet-genome genes located within the inversions shared by the Ae. tauschii branch and the A-genome branch (AetA) and by the Ae. tauschii branch and the B-genome branch (AetB). Means with the same letter suffix in a row do not differ significantly at the 5% probability level.


assembly. Furthermore, the same excess of major rearrangements in the Ae. tauschii lineage was evident from genetic maps (Luo et al., 2009, 2013).
In a previous study (Luo et al., 2017), the rearrangements unique to the Ae. tauschii genome were allocated to the 35-MY time span, covering the time since the divergence of Triticeae and Brachypodieae to the present. That produced a rate of 5.1 major rearrangements per MY. By inserting the wild emmer wheat into the analysis, the 35-MY timeline was subdivided into 32 MY prior to and 3 MY after the radiation of the A, B and Ae. tauschii genomes. The rate prior to the radiation was 3.1 major rearrangements per MY (2.5 using rice genes as queries). A similar rate, 3.0 major rearrangements per MY (1.9 using rice genes as queries), was estimated in the branch preceding the divergence of the Triticeae and Brachypodieae lineages. However, the rate since the radiation was 20.3 major rearrangements per MY (15.0 using rice genes as queries) in the Ae. tauschii lineage. Even if the time of radiation of the three genomes is pushed back to 7 MYA, using a conservative time estimate (Marcussen et al., 2014), the rate (9.4 major rearrangements per MY) would still be greater than the rate preceding radiation of the three genomes. Equally fast rates, 35.7 and 27.0 major rearrangements per MY (16.3 and 15.0, respectively, using rice genes as queries), were observed in the wild emmer wheat A and B genomes, respectively.
The sizes of the three Triticeae genomes ranged from 4.3 Gb for Ae. tauschii to 6.3 Gb for the wild emmer B genome. The Ae. tauschii and wild emmer genomes were estimated to contain about 84.4 and 82.2% of TEs (Avni et al., 2017; Luo et al., 2017). Yet, it seems unlikely that the genome sizes and the content of TEs were the sole causes of the great increase in the rates of genome evolution after their radiation 3 MYA. All Triticeae genomes are large (Dvorak, 2009) and likely contain similar percentages of TEs. Fast genome evolution would therefore be expected for the entire tribe if the rates were exclusively a function of genome size and TE content. If the rate recorded for the Ae. tauschii branch, 20.3 major rearrangements per MY, were applicable to the entire tribe, the Ae. tauschii lineage would be expected to accumulate 203 major rearrangements over the 10-MY period, the estimated divergence time of the wheat and barley lineages and the age of Triticeae (Dvorak and Akhunov, 2005; Bernhardt et al., 2017). This number exceeds the total number of 175 major rearrangements that the Ae. tauschii lineage accumulated since the divergence of Triticeae and Brachypodieae 35 MYA. This simple reasoning suggests that the rate of genome evolution observed in the Ae. tauschii genome accelerated recently, as the rate estimates suggest.
What factor(s) in addition to genome size could have caused this rate acceleration? Aegilops tauschii, wild emmer wheat and the donors of its A and B genomes are annuals whereas their progenitors were perennial (Hsiao et al., 1995). In annual grasses, the generation time is typically 1–2 years (Wilson et al., 1990). Estimates of the generation time in perennial grasses are hard to obtain and undoubtedly vary among species and among populations. An anecdotal number for a population of Brachypodium sylvaticum, a perennial grass, suggests that the generation time is at least an order of magnitude greater than that of its annual relatives (Haeggstrom and Skyten, 1996). Long generation time is accompanied by a slow molecular clock rate (Wilson et al., 1990; Tuskan et al., 2006; Smith and Donoghue, 2008; Luo et al., 2015) and a slow rate of genome evolution (Luo et al., 2015). A dependence of genome evolution on generation length was also observed in mammals. The quickest rate of the accumulation of chromosome breaks was observed in the mouse lineage, i.e., in a lineage involving species with a short generation time (Murphy et al., 2005). Quantification of meiotic chromosome pairing in Triticeae interspecific hybrids indicated that genomic change has been slow in perennial Triticeae compared to that in the annual Triticeae (Dvorak and Zhang, 1992). Another factor that could have affected the rate of genome evolution in the three Triticeae genomes is self-pollination, although no relevant data on this relationship are available. Based on currently available evidence we suggest that the rapid genome evolution in Ae. tauschii and wild emmer was caused by synergy between large quantities of active TEs (Luo et al., 2017) and an annual reproductive cycle.
This hypothesis may also account for the relatively quick rate of evolution of the small B. distachyon genome. The preceding and present analyses of rearrangements in the B. distachyon genome showed that the rate of change of the B. distachyon genome was intermediate between the quick rate in the Ae. tauschii lineage and the slow rate in the rice and sorghum lineages (International Brachypodium Genome Initiative, 2010; Luo et al., 2013). Evidence was provided that the small B. distachyon genome was derived from a larger genome (Luo et al., 2013). Brachypodium distachyon is a self-pollinating annual (Gordon et al., 2015) and the intermediate rate of evolution of its genome is thus consistent with this hypothesis. However, rice species having the A genome, which includes O. sativa, are annual (Kellogg, 2009). So are many sorghum species (Price et al., 2005). Yet, rice and sorghum genomes are evolving slowly. This shows that an annual reproductive cycle alone leads to rates that are slower than those observed in the A-, B- and D-genome Triticeae.
Synthesis of these observations and those reviewed in Introduction suggests that genome size matters only if it is accompanied by large quantities of active TEs. Likewise, generation length (annual versus perennial life cycle) seems to play a role if it is accompanied by a large and dynamic pool of TEs.
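The rate arithmetic used in this section (number of major rearrangements divided by branch length in MY, and the 10-MY extrapolation for the Ae. tauschii lineage) can be reproduced with a small sketch. The function names are hypothetical; the numbers 20.3, 10 and 175 are the ones quoted in the text.

```python
def rate_per_my(n_rearrangements, branch_length_my):
    """Major rearrangements per million years (MY) along one branch."""
    if branch_length_my <= 0:
        raise ValueError("branch length must be positive")
    return n_rearrangements / branch_length_my


def expected_rearrangements(rate, time_span_my):
    """Rearrangements expected if a constant rate held over the whole span."""
    return rate * time_span_my


# Ae. tauschii branch rate (20.3 per MY) applied to the 10-MY age of Triticeae.
expected_10my = expected_rearrangements(20.3, 10)

# Total reported for the Ae. tauschii lineage over the 35 MY since the
# divergence of Triticeae and Brachypodieae.
observed_35my = 175

# True when the constant-rate extrapolation overshoots the observed total.
rate_accelerated_recently = expected_10my > observed_35my
```

The extrapolated value (about 203) exceeds the 175 rearrangements accumulated over the much longer 35-MY span, which is the comparison the text uses to argue for a recent acceleration.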


Causes of genome evolution

Of the 592 major rearrangements detected using Ae. tauschii genes as queries and assigned to the phylogenetic tree, 444 (75%) were inversions, almost all paracentric, and the rest were translocations, duplications and NCIs. Of translocations, 77 (13%) were intrachromosomal. Since paracentric inversions are by definition intrachromosomal rearrangements, 88% of all rearrangements were intrachromosomal events. A mechanism by which rearrangements originate must be consistent with this reality. Intrachromosomal rearrangements may hypothetically originate by: (i) two double-strand DNA breaks (DSBs) joined by the non-homologous end-joining (NHEJ) mechanism in an inverted orientation; (ii) repair of a single DSB by intrachromosomal HR; or (iii) alternative transposition.
If the DSBs in mechanism (i) were independent of each other and homogeneously distributed along a chromosome arm, the average length of a paracentric inversion and intrachromosomal translocation would be about a third of the chromosome arm, since two random uniform breaks divide a chromosome arm into three interchangeable segments. This expected length is much longer than what we found. Thus (i) seems an unlikely mechanism to account for the lengths of most of the inversions and translocations.
An attractive feature of mechanism (ii) is that it could account for the origin of both paracentric inversions and intrachromosomal translocations. Intrachromosomal HR of repeats in the opposite orientation is expected to result in an inversion and intrachromosomal HR of repeats in the same orientation is expected to result in excision of a circular intermediate that can be inserted somewhere else. Thus, if a DSB occurred in a TE and if the members of the TE family were distributed homogeneously along a chromosome arm and targeted for intrachromosomal HR sequentially (first the closest, next the second closest, etc.), the likelihood of a TE being chosen for recombination would be multiplicative and exponentially decline with the distance along the chromosome arm. The fact that mixtures of two exponential distributions provided significantly better fits to the inversion length data than a single distribution in five of the six genomes would be consistent with this hypothesis only if inversions were generated by mechanism (ii) and TEs were non-homogeneously distributed along the chromosome arm, which has been shown (Luo et al., 2017). A difficulty with this hypothesis is that it requires HR taking place within the same chromatid. However, DSB repair by HR appears to be limited to S and G2 phases in the cell cycle and uses sister chromatids; DSBs in G1 are repaired by NHEJ (Rothkamm et al., 2003; Ira et al., 2004; Knoll et al., 2014; Orthwein et al., 2014; Shibata, 2017). HR of inverted duplicated sequences was shown to produce dicentric chromosomes and complex rearrangements rather than inversions in yeast (VanHulle et al., 2007).
Mechanism (iii), alternative transposition, takes place between termini of two DNA transposons (Zhang et al., 2006; Huang and Dooner, 2008) located in the same chromatid. If one TE is defective and the termini of the two transposons face in opposite orientation, transposition results in translocations and inversions (Zhang et al., 2009). A ring intermediate (Zhang et al., 2009) can be reinserted resulting in intrachromosomal duplications and translocations. Another important aspect of alternative transposition as the prime mechanism for the origin of major chromosome rearrangements is its independence from meiotic HR. In contrast, erosion of collinearity correlated highly with meiotic HR in this and a previous study (Luo et al., 2017). We therefore suggest that meiotic HR errors are the primary cause of collinearity erosion. Intrachromosomal HR and alternative transposition are the primary causes of major chromosome rearrangements. The recurrent breaking of the chromosome due to alternative transposition (Zhang et al., 2009) is consistent with our finding inversions with common breakpoints.
These considerations do not apply to clusters of translocations present in the distal 4 Mb of rice chromosomes Os11 and Os12, since they evolved by a different process. The two chromosomes are homoeologous and originated by the pan-grass WGD (Paterson et al., 2004). In contrast with the rest of the chromosomes duplicated by the WGD, this homoeologous region recurrently recombined producing clusters of interchromosomal translocations. Similar clusters of translocations are detectable between sorghum chromosomes Sb5 and Sb8, which correspond to Os11 and Os12, respectively, and within B. distachyon chromosome Bd4, which is a compound chromosome partly corresponding to Os11 and Os12 (International Brachypodium Genome Initiative, 2010). It was originally suggested that the translocations originated by recurrent gene conversions (Wang et al., 2011). Most of these translocations contain a single gene, which is consistent with gene conversions, but some contain from two up to seven collinear genes. Translocations involving several genes are too long to be produced by gene conversion, which has led to suggesting crossovers as the mechanism of their origin (Wicker et al., 2015). Since the densities of translocations with 1–7 genes can be approximated by either Logistic or Weibull functions it is possible that translocations involving several genes originated by multiple independent gene conversions. A decline in the number of genes per translocation with distance from the chromosome terminus is apparent in the rice, sorghum and B. distachyon genomes. This pattern is more consistent with gene conversions rather than crossovers as the cause, since lower crossover frequencies in the proximal regions are expected to produce larger translocations, not smaller as was observed.
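The expectation stated for mechanism (i), that two independent, uniformly distributed breaks yield a segment averaging about a third of the chromosome arm, can be checked numerically. The following is a minimal Monte Carlo sketch with a hypothetical function name; it treats the arm as the unit interval and is not part of the authors' analysis. The exact expectation of the distance between two independent uniform points on [0, 1] is 1/3.

```python
import random


def mean_length_between_two_breaks(n_trials=100_000, seed=0):
    """Average distance between two uniform random breaks on an arm of length 1."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    total = 0.0
    for _ in range(n_trials):
        first_break = rng.random()
        second_break = rng.random()
        # The inverted or translocated segment lies between the two breaks.
        total += abs(first_break - second_break)
    return total / n_trials


simulated_fraction = mean_length_between_two_breaks()
```

Multiplying the simulated fraction by the length of a chromosome arm in Mb gives the mean inversion length expected under mechanism (i), which can then be compared with the observed means in Figure 3.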


Rearrangement dataset and its practical utility

Most inversions are unique evolutionary events that rarely revert to the ancestral state. That makes inversions an exceptional and currently largely unexploited tool for phylogenetic reconstructions. We show here that the boundaries of many inversions are clearly distinguishable even after 50 MY of evolution. In inversion heterozygotes, meiotic recombination by crossovers is suppressed within the inverted haplotype because odd-number crossover products are inviable. Even-number crossover recombination produces viable products only if it involves two of the four chromatids. Since multiple crossovers are largely precluded by crossover interference in short genetic intervals, recombination between the inverted and structurally wild-type haplotypes is effectively suppressed in short inversions. The haplotypes of the same inversion shared by different phylogenetic lineages must therefore be of the same age and all differences between them must have evolved since the inversion origin. These properties of inversions are important for timing recent speciation events and were employed here to validate the phylogeny of the three Triticeae genomes.
To date, DNA phylogenetic analyses have primarily relied on analyses of nucleotide sequences, with their own set of assets and drawbacks. The rapidly increasing number of reference-quality genome sequences opens the door to the use of structural variation in phylogenetic studies. The dataset of structural rearrangements provided here (Tables S1 and S2) will be useful for other types of analyses, such as phylogenetic reconstructions in the grass family, comparative functional genomics (by indicating putative orthologues across the grass family) and comparative gene mapping.
Absence of structural variation is a critical prerequisite for success in projects involving recombination between homoeologous chromosomes. Tribe Triticeae involves over 300 species (Löve, 1984). Most of them can be hybridized with wheat and therefore represent a valuable genetic resource for wheat genetics and improvement. Structural variation between genomes is the greatest obstacle to gene introgression in the tribe. Inspection of the dataset of structural rearrangements provided in Tables S1 and S2 and results of their validation with BNG maps may assist targeting specific genes and specific regions for introgression to wheat.

EXPERIMENTAL PROCEDURES

Gene collinearity and structural chromosome analyses

The following genome sequences were analyzed: Ae. tauschii v4.0 (Luo et al., 2017), wild emmer wheat (Avni et al., 2017), B. distachyon, v3.1 (International Brachypodium Genome Initiative, 2010), rice (Oryza sativa) v7.0 (Chantret et al., 2005) and sorghum (Sorghum bicolor), v3.1 (Paterson et al., 2009). The methodology used for gene collinearity analysis has been described earlier (Luo et al., 2017) and only essential information will be provided here. The amino acid sequences of 38 775 HC genes located on the Ae. tauschii pseudomolecules (Luo et al., 2017) were downloaded from http://aegilops.wheat.ucdavis.edu/ATGSP/annotation/ and the amino acid sequences of 65 012 HC genes of wild emmer wheat were downloaded from http://wewseq.wixsite.com/consortium (Avni et al., 2017). Those of B. distachyon v3.1, rice v7.0 and sorghum v3.1 were downloaded from Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html). BLASTP homology searches using amino acid sequences of the Ae. tauschii HC genes as queries and amino acid sequences of wild emmer wheat, B. distachyon, rice and sorghum used as targets were performed. A default BLASTP parameter setting was used. The top alignment score was recorded.
Collinearity of top hits in the wild emmer wheat A and B genomes and the genomes of B. distachyon, rice and sorghum, ordered according to the order of HC genes along a Ae. tauschii pseudomolecule, was analyzed as follows. Three or more genes were considered collinear if the starting nucleotides of the top hits followed an ascending or descending order and distances between them were <0.5 Mb on the B. distachyon, rice and sorghum pseudomolecules or <5 Mb on the Ae. tauschii and wild emmer wheat pseudomolecules. Noncollinear genes interrupting a sequence of collinear genes were allowed. If an Ae. tauschii gene was homologous to tandem duplicated genes on a target pseudomolecule, only one of the duplicated genes was recorded as collinear, provided that it was in a collinear position on the pseudomolecule.
A spreadsheet dataset showing collinearity of each of the 38 775 Ae. tauschii HC genes along the wild emmer wheat A- and B-genome pseudomolecules and B. distachyon, rice and sorghum pseudomolecules was constructed (Table S1). Cells containing collinear genes were colored whereas those that were not collinear were left colorless. Changes in gene order due to inversions or translocations were thus indicated by changes in cell color. For each rearrangement, three things were recorded: its starting and ending genes (positions in bp) on the Ae. tauschii pseudomolecule, the branch of the grass phylogenetic tree (Figure 4) in which the rearrangement had taken place and the type of rearrangement. Rearrangements were coded as follows: A – inversion of two genes, B – inversion of three genes, C – inversion of >3 genes, D – translocation of two genes within a chromosome, E – translocation of three genes within a chromosome, F – translocation of >3 genes within a chromosome, iT – interstitial translocation between chromosomes, T – terminal translocation, Dup – duplication of a segment, Del – deletion of a segment.
A similar analysis was performed using 42 004 HC genes annotated on the rice pseudomolecules v7.0 as queries in searches against genes annotated on the Ae. tauschii, wild emmer wheat, B. distachyon and sorghum pseudomolecules (Table S2).
To quantify collinearity, Ae. tauschii genes collinear with a specific genome were counted and expressed as percent of all Ae. tauschii genes. Collinearity along a pseudomolecule was expressed as the number of Ae. tauschii genes collinear with genes on a target pseudomolecule per non-overlapping window of 200 Ae. tauschii genes. Paired t-tests with Bonferroni correction using the percentages of collinear genes in the same 200-gene intervals as variables were used to test statistical significance of differences between genomes (Figure 1a).
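The collinearity criterion given in this section (three or more genes whose top-hit start positions ascend or descend, with neighbouring hits closer than 0.5 Mb or 5 Mb depending on the target genome) can be sketched as follows. This is a simplified illustration with hypothetical names, not the authors' pipeline: it scans hits on a single target pseudomolecule and, unlike the published procedure, does not tolerate noncollinear genes interrupting a run and does not handle tandem duplicates.

```python
def collinear_runs(starts, max_gap, min_genes=3):
    """Find runs of consecutive top hits that are strictly ascending or descending.

    starts    -- hit start positions (bp) on one target pseudomolecule,
                 listed in the order of the query genes
    max_gap   -- neighbouring hits must be closer than this (bp)
    min_genes -- minimum number of genes for a run to count as collinear
    Returns a list of (first_index, last_index) pairs into `starts`.
    """
    runs = []
    i = 0
    n = len(starts)
    while i < n - 1:
        step = starts[i + 1] - starts[i]
        if step == 0 or abs(step) >= max_gap:
            i += 1
            continue
        direction = 1 if step > 0 else -1
        j = i + 1
        # Extend the run while the direction and the gap limit both hold.
        while j + 1 < n:
            next_step = starts[j + 1] - starts[j]
            if next_step * direction > 0 and abs(next_step) < max_gap:
                j += 1
            else:
                break
        if j - i + 1 >= min_genes:
            runs.append((i, j))
            i = j + 1
        else:
            i += 1
    return runs


# Hypothetical hit positions; 500 000 bp is the limit quoted in the text for
# the B. distachyon, rice and sorghum pseudomolecules.
example_starts = [1_000_000, 1_200_000, 1_350_000, 9_000_000,
                  8_900_000, 8_700_000, 8_650_000, 20_000_000]
example_runs = collinear_runs(example_starts, max_gap=500_000)
```

In this toy input the first three hits form an ascending run and the next four form a descending run, which is the pattern an inversion produces in the spreadsheet described above.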


Inversion and translocation lengths

As the compared genomes differed in their sizes, the length of each rearrangement involving 3 or more collinear genes was expressed in terms of the length of the rearrangement on the Ae. tauschii pseudomolecule. The lengths were measured from the position of the first to the last Ae. tauschii collinear gene located within the rearrangement.

Mean paracentric inversion lengths were computed for each of the six genomes and statistical significance among the means was analyzed as follows. Most parametric statistical methods include the assumption that the sample is drawn from a population where the variables are normally distributed. A Kolmogorov–Smirnov test to check the normality of the distribution using SPSS (ver. 22; IBM Corp, 2013, https://www.ibm.com/analytics/data-science/predictive-analytics/spss-statistical-software) indicated that the data deviated significantly from normality (P ≤ 0.01). This was visualized by histograms, which all showed positive skewness (skewed to the right). The data were transformed by the Box-Cox transformation (Box and Cox, 1964). The transformed data were normally distributed.

Analysis of variance (ANOVA) of transformed paracentric inversion lengths in a completely randomized design (CRD) showed significant genome effects (P ≤ 0.10). Duncan’s test of means comparison was used to group the mean paracentric inversion length with α < 0.05.

An exponential function was fitted to each histogram (counts of paracentric inversions on y-axis and their length on x-axis). Exponential functions were estimated in two steps. In the first step, an exponential distribution was fitted to the paracentric inversion length data by maximum likelihood estimation (MLE). In the second step, the estimated exponential distribution was multiplied by the total number of paracentric inversions. A likelihood ratio test was conducted to test the null hypothesis that inversion lengths follow an exponential distribution against the alternative hypothesis that the lengths follow a mixture of exponential distributions. This null hypothesis was rejected for all species but rice (P < 0.0004). For the species where the null hypothesis was rejected, a mixture of two exponential distributions was fitted by MLE, which produced estimates for the means of the component distributions μ1 and μ2, as well as the mixture proportion α (Figure 3).

Phylogenetic tree and bootstrap confidence on its nodes

The presence/absence of each paracentric inversion involving >3 collinear genes (category C) among the six genomes was converted to binary format, 1 for the presence and 0 for the absence of an inversion in a genome. Dice dissimilarity index was implemented in the DARwin v6.0.15 program (Perrier and Jacquemoud-Collet, 2006) to calculate the pairwise genetic distance between genomes. Dice dissimilarity index measures were used to draw a Weighted Neighbour-Joining (NJ) tree (Figure 4) with DARwin v6.0.15. Bootstrap analysis (10 000 replicates) was performed using DARwin v6.0.15 on the NJ tree. The FigTree v1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/) program was used to edit the NJ tree.

Dot-plot

The annotated primary transcripts and corresponding protein sequences in the emmer wheat genome assembly for accession Zavitan and the Ae. tauschii AL8/78 assembly v4.0 were downloaded. Only the first transcript for each gene was retrieved. A BLASTP search was conducted using the Ae. tauschii proteins as queries and the emmer wheat proteins as targets. The top two hits with an E-value <1e-5 were recorded. Homologous gene pairs identified by BLASTP were used to detect syntenic blocks using the software MCScanX (Wang et al., 2012). Collinear segments for all possible pairs of chromosomes were detected using a match score of 50, a gap penalty of 1, an E-value threshold of 1e-05 and a minimum of three genes and maximum gap size between two consecutive proteins of 25 to declare a collinear block. The pairwise comparative dot-plot using the MCScanX output was drawn using R (Figure 1b).

De novo BioNano genome map assembly

High-molecular-weight (HMW) DNA was isolated from young leaves of wild emmer wheat accession Zavitan, Aegilops tauschii ssp. tauschii accession AS75 and Ae. tauschii ssp. tauschii acc. CIae 1 by Amplicon Express (Pullman, WA, USA). The nicking endonuclease Nt.BspQI (New England BioLabs, Ipswich, MA, USA) was used to nick HMW DNA molecules. The nicked DNA molecules were labeled and stained according to the instructions of the IrysPrep Reagent Kit (BioNano Genomics), as described in detail in Luo et al. (2017). The DNA sample was loaded onto the nanochannel array of an IrysChip (BioNano Genomics) and was automatically imaged by the Irys system (BioNano Genomics). Raw DNA molecules >20 kb were collected and converted into .bnx files by AutoDetect software to obtain basic labelling and DNA length information. Molecules >180 kb that passed quality control were aligned, clustered and assembled into BNG contigs using the BioNano Genomics assembly pipeline as described in previous publications (Lam et al., 2012; Cao et al., 2014). The P-value thresholds used for pairwise assembly, extension/refinement and final refinement stages were adjusted according to the genome sizes (1e-5/genome size in Mb). The initial BNG maps were then visually checked for chimeric contigs, which were dissociated.

Ks analysis

Seven Ae. tauschii paracentric inversions shared with the wild emmer wheat B genome (category AetB) involving 107 genes and two Ae. tauschii paracentric inversions shared with the wild emmer wheat A genome (category AetA) involving eight genes were analyzed (Table S5). Amino acid sequences of compared genes were aligned using ClustalW (Thompson et al., 1994), which aligned and delimited the CDS. Amino acid sequences were then replaced by nucleotide sequence and the number of synonymous nucleotide substitutions were counted for each compared gene pair. Ks values were calculated using BioPerl module Bio::Align::DNAStatistics.
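Replacing aligned amino acids with their codons, and then classifying codon differences, can be illustrated with a short Python sketch. It is only an illustration under stated assumptions and is not the estimator computed by Bio::Align::DNAStatistics. The function names are hypothetical, the standard genetic code is assumed, sequences are assumed to be upper-case DNA, and codon pairs that differ at more than one site are simply skipped. The counts are therefore raw observed differences, not Ks values.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def thread_cds(aligned_protein, cds):
    """Replace each residue of a gapped protein string with the matching
    codon of its CDS; alignment gaps become '---'."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    out, k = [], 0
    for ch in aligned_protein:
        if ch == "-":
            out.append("---")
        else:
            out.append(codons[k])
            k += 1
    return out


def count_single_site_differences(codons_a, codons_b):
    """Count codon pairs differing at exactly one site, split into
    synonymous and nonsynonymous differences. Gapped, identical and
    multi-site pairs are skipped (simplification of this sketch)."""
    syn = nonsyn = 0
    for a, b in zip(codons_a, codons_b):
        if "-" in a or "-" in b or a == b:
            continue
        if sum(x != y for x, y in zip(a, b)) != 1:
            continue
        if CODON_TABLE[a] == CODON_TABLE[b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn
```

Threading the CDS onto the protein alignment keeps codons in frame, which is why the alignment is made at the amino acid level first. A proper Ks estimate additionally normalizes by the number of synonymous sites and corrects for multiple substitutions.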
For a statistical analysis, mixed model ANOVA was performed to assess significance of variation in Ks values among the A, B and Ae. tauschii genomes, using Ks values between genes within an inversion as replications and different inversions as independent sampling units. The null hypothesis of zero mean difference was tested. The difference in mean divergence was then tested by two-sided conditional t-test, Bonferroni corrected.

Sequence alignments on a BNG map and inversion validation

To compare a nucleotide sequence with a BNG map, the nucleotide sequences were digested in silico with the restriction endonuclease Nt.BspQI by using Knickers (BioNano Genomics). The alignment of a nucleotide sequence with the BNG map or
alignments between BNG maps were computed with RefAligner (BioNano Genomics). The alignments were visualized in IrysView (BioNano Genomics) and those of interest were saved. Software packages for these operations were obtained from BioNano Genomics (https://bionanogenomics.com/support/software-downloads/).

Each paracentric inversion unique to the Ae. tauschii (Luo et al., 2017) or wild emmer wheat A or B genomes (Avni et al., 2017) and involving three collinear genes and >3 collinear genes (categories B and C, respectively, in Tables S1 and S2) was validated (Figure 2). Coordinates (in bp) on the pseudomolecule of collinear genes flanking a breakpoint were used to identify the breakpoint region on the BNG contig and a pseudomolecule.

Naming structural variants

We used the following abbreviations for rearrangements: inversion (Inv), translocation (T), duplication (Dup), deletion (Del) and nested chromosome insertion (NCI). We employed the HC gene set v2.0 annotated in the Ae. tauschii genome v4.0 as a reference for the grass family and the branches of the grass phylogenetic tree as indicators of rearrangement location and origin. Thus, InvAetA (AET1Gv20186400-900) is an inversion shared by the Ae. tauschii (Aet) and wheat A genomes starting with HC Ae. tauschii gene AET1Gv20186400 and ending with NC gene AET1Gv20186900. For translocations, Ae. tauschii genes flanking a breakpoint are given, e.g. TAetBA (AET4Gv20018000-9500; AET5Gv20867400-600) is a reciprocal translocation with breakpoints on 4D between genes AET4Gv20018000 and AET4Gv20019500 and 5D between genes AET5Gv20867400 and AET5Gv20867600. Once defined, the translocation can be abbreviated as, e.g., T(4AL;5AL).

Availability of data and materials

All data generated in this study are included in this published article and Supporting Information files. Published Aegilops tauschii sequences on which these analyses are based are available from this link http://aegilops.wheat.ucdavis.edu/ATGSP/data.php.

ACKNOWLEDGEMENT

This publication is based upon work supported by the US National Science Foundation (NSF) under grant number IOS-1238231.

CONFLICT OF INTEREST

The authors declare that they have no competing interests.

AUTHORS’ CONTRIBUTIONS

J.D., M.-C.L., K.R.D., C.M.J., T.Z., L.W., H.-G.M., P.E.M., B.S.G., K.M.D., Y.Q.G. and A.D. planned the work. M.-C.L., K.R.D. and T.Z. performed BNG mapping and analyses; J.D., L.W., T.Z., H.-G.M., X.D., K.M.D., P.Q., P.E.M., Y.Q.G., R.K.R., H.D., F.M.Y. and P.J.G. performed the analyses of genome structure and evolution. H.-G.M., X.D., M.W.D., R.K.R. and H.D. performed statistical analyses. J.D. organized and managed the contributions to this publication and was primary author. All authors read and approved the final manuscript.

SUPPORTING INFORMATION

Additional Supporting Information may be found in the online version of this article.
Table S1. Collinearity with Aegilops tauschii pseudomolecules of best blast hit targets in the wild emmer A- and B-genome pseudomolecules and those of Brachypodium distachyon, Oryza sativa and Sorghum bicolor.
Table S2. Collinearity with Oryza sativa pseudomolecules of best blast hit targets in the wild emmer A- and B-genome pseudomolecules and those of Aegilops tauschii, Brachypodium distachyon and Sorghum bicolor.
Table S3. Synopsis of rearrangements, identified using Aegilops tauschii genes as blast queries, allocated to individual branches of the phylogenetic tree (Figure 4).
Table S4. Analysis of translocations between Oryza sativa chromosomes Os11 and Os12, within Brachypodium distachyon chromosomes Bd4 and between Sorghum bicolor chromosomes Sb5 and Sb8, wild emmer wheat chromosomes 4B and 5B.
Table S5. Ka and Ks values in inversions shared between Aegilops tauschii and either the A or B genomes of wild emmer wheat.

REFERENCES

Akhunov, E.D., Goodyear, J.A., Geng, S. et al. (2003) The organization and rate of evolution of the wheat genomes are correlated with recombination rates along chromosome arms. Genome Res. 13, 753–763.
Andreasen, K. and Baldwin, B.G. (2001) Unequal evolutionary rates between annual and perennial lineages of checker mallows (Sidalcea, Malvaceae): evidence from 18S-26S rDNA internal and external transcribed spacers. Mol. Biol. Evol. 18, 936–944.
Avni, R., Nave, M., Barad, O. et al. (2017) Wild emmer genome architecture and diversity elucidate wheat evolution and domestication. Science, 357, 93–97.
Bernhardt, N., Brassac, J., Kilian, B. and Blattner, F.R. (2017) Dated tribe-wide whole chloroplast genome phylogeny indicates recurrent hybridizations within Triticeae. BMC Evol. Biol. 17, 141.
Birol, I., Raymond, A., Jackman, S.D. et al. (2013) Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data. Bioinformatics, 29, 1492–1497.
Bowers, J.E., Chapman, B.A., Rong, J.K. and Paterson, A.H. (2003) Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature, 422, 433–438.
Box, G.E.P. and Cox, D.R. (1964) An analysis of transformations. J. Roy. Stat. Soc. Series B, 26, 211–234.
Brubaker, C.L., Paterson, A.H. and Wendel, J.F. (1999) Comparative genetic mapping of allotetraploid cotton and its diploid progenitors. Genome, 42, 184–203.
Cao, H.Z., Hastie, A.R., Cao, D.D. et al. (2014) Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology. Gigascience, 3, 34.
Chantret, N., Salse, J., Sabot, F. et al. (2005) Molecular basis of evolutionary events that shaped the hardness locus in diploid and polyploid wheat species (Triticum and Aegilops). Plant Cell, 17, 1033–1045.
Dvorak, J. (2009) Triticeae genome structure and evolution. Plant Genet. Genomics, 7, 685–711.
Dvorak, J. and Akhunov, E.D. (2005) Tempos of deletions and duplications of gene loci in relation to recombination rate during diploid and polyploid evolution in the Aegilops-Triticum alliance. Genetics, 171, 323–332.
Dvorak, J. and Zhang, H.B. (1990) Variation in repeated nucleotide sequences sheds light on the phylogeny of the wheat B and G genomes. Proc. Natl Acad. Sci. USA, 87, 9640–9644.
Dvorak, J. and Zhang, H.B. (1992) Molecular tools for study of the phylogeny of diploid and polyploid species of Triticeae. In 1st Int. Symp. on Triticeae. Helsingborg: Hereditas, pp. 37–42.
Dvorak, J., McGuire, P.E. and Mendlinger, S. (1984) Inferred chromosome morphology of the ancestral genome of Triticum. Plant Syst. Evol. 144, 209–220.
Dvorak, J., di Terlizzi, P., Zhang, H.B. and Resta, P. (1993) The evolution of polyploid wheats: identification of the A genome donor species. Genome, 36, 21–31.
Dvorak, J., Deal, K.R., Luo, M.C., You, F.M., von Borstel, K. and Dehghani, H. (2012) The origin of spelt and free-threshing hexaploid wheat. J. Hered. 103, 426–441.


Gale, M.D. and Devos, K.M. (1998) Plant comparative genetics after 10 years. Science, 282, 656–659.
Goff, S.A., Ricke, D., Lan, T.H. et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp japonica). Science, 296, 92–100.
Gordon, S.P., Liu, L. and Vogel, J.P. (2015) The genus Brachypodium as a model for perenniality and polyploidy. In Plant Genetics and Genomics: Crop Models (Vogel, J.P., ed). Switzerland: 2015 Springer International Publishing, pp. 313–325.
Gornicki, P., Zhu, H.L., Wang, J.W., Challa, G.S., Zhang, Z.Z., Gill, B.S. and Li, W.L. (2014) The chloroplast view of the evolution of polyploid wheat. New Phytol. 204, 704–714.
Haeggstrom, C.A. and Skyten, R. (1996) Flowering and individual survival of a population of the grass Brachypodium sylvaticum in Nato, Aland islands, SW Finland. Ann. Bot. Fenn. 33, 1–10.
Hasegawa, M., Kishino, H. and Yano, T.A. (1985) Dating of the human ape splitting by a molecular clock of mitochondrial-DNA. J. Mol. Evol. 22, 160–174.
Hastie, A.R., Dong, L.L., Smith, A. et al. (2013) Rapid genome mapping in nanochannel arrays for highly complete and accurate de novo sequence assembly of the complex Aegilops tauschii genome. PLoS ONE, 8, e55864.
Hsiao, C., Chatterton, N.J., Asay, K.H. and Jensen, K.B. (1995) Phylogenetic-relationships of the monogenomic species of the wheat tribe, Triticeae (Poaceae), inferred from nuclear rDNA (internal transcribed spacer) sequences. Genome, 38, 211–223.
Huang, J.T. and Dooner, H.K. (2008) Macrotransposition and other complex chromosomal restructuring in maize by closely linked transposons in direct orientation. Plant Cell, 20, 2019–2032.
Huang, S., Sirikhachornkit, A., Su, X., Faris, J., Gill, B.S., Haselkorn, R. and Gornicki, P. (2002) Genes encoding plastid acetyl-CoA carboxylase and 3-phosphoglycerate kinase of the Triticum/Aegilops complex and the evolutionary history of polyploid wheat. Proc. Natl Acad. Sci. USA, 99, 8133–8138.
International Brachypodium Genome Initiative (2010) Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature, 463, 763–768.
Ira, G., Pellicioli, A., Balijja, A. et al. (2004) DNA end resection, homologous recombination and DNA damage checkpoint activation require CDK1. Nature, 431, 1011–1017.
Jorgensen, C., Luo, M.-C., Ramasamy, R., Dawson, M., Gill, B.S., Korol, A.B., Distelfeld, A. and Dvorak, J. (2017) A high-density genetic map of wild emmer wheat from the Karaca Dağ region provides new evidence on the structure and evolution of wheat chromosomes. Front. Plant Sci. 8, 1798.
Kellogg, E.A. (2009) The evolutionary history of Ehrhartoideae, Oryzeae and Oryza. Rice, 2, 1–14.
Kihara, H. (1944) Discovery of the DD-analyser, one of the ancestors of Triticum vulgare (Japanese). Agric. Hortic. 19, 13–14.
Knoll, A., Fauser, F. and Puchta, H. (2014) DNA recombination in somatic plant cells: mechanisms and evolutionary consequences. Chromosome Res. 22, 191–201.
Kohne, D. (1970) Evolution of higher-organism DNA. Q. Rev. Biophys. 3, 327–375.
Lam, E.T., Hastie, A., Lin, C. et al. (2012) Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly. Nat. Biotech. 30, 771–776.
Li, L.F., Liu, B., Olsen, K.M. and Wendel, J.F. (2015) A re-evaluation of the homoploid hybrid origin of Aegilops tauschii, the donor of the wheat D-subgenome. New Phytol. 208, 4–8.
Löve, A. (1984) Conspectus of the Triticeae. Feddes Rep. 95, 425–521.
Luo, M.C., Deal, K.R., Akhunov, E.D. et al. (2009) Genome comparisons reveal a dominant mechanism of chromosome number reduction in grasses and accelerated genome evolution in Triticeae. Proc. Natl Acad. Sci. USA, 106, 15780–15785.
Luo, M.C., Gu, Y.Q., You, F.M. et al. (2013) A 4-gigabase physical map unlocks the structure and evolution of the complex genome of Aegilops tauschii, the wheat D-genome progenitor. Proc. Natl Acad. Sci. USA, 110, 7940–7945.
Luo, M.C., You, F.M., Li, P.C., Wang, J.R., Zhu, T.T., Dandekar, A.M., Leslie, C.A., Aradhya, M., McGuire, P.E. and Dvorak, J. (2015) Synteny analysis in Rosids with a walnut physical map reveals slow genome evolution in long-lived woody perennials. BMC Genomics, 16, 707.
Luo, M.C., Gu, Y.Q., Puiu, D. et al. (2017) Genome sequence of the progenitor of the wheat D genome Aegilops tauschii. Nature, 551, 498–502.
Marcussen, T., Sandve, S.R., Heier, L. et al. (2014) Ancient hybridizations among the ancestral genomes of bread wheat. Science, 345, 1250092.
Martin, A.P. and Palumbi, S.R. (1993) Body size, metabolic-rate, generation time and the molecular clock. Proc. Natl Acad. Sci. USA, 90, 4087–4091.
Mascher, M., Gundlach, H., Himmelbach, A. et al. (2017) A chromosome conformation capture ordered sequence of the barley genome. Nature, 544, 427–433.
Massa, A.N., Wanjugi, H., Deal, K.R. et al. (2011) Gene space dynamics during the evolution of Aegilops tauschii, Brachypodium distachyon, Oryza sativa and Sorghum bicolor genomes. Mol. Biol. Evol. 28, 2537–2547.
McFadden, E.S. and Sears, E.R. (1946) The origin of Triticum spelta and its free-threshing hexaploid relatives. J. Hered. 37, 81–89, 107–116.
Middleton, C.P., Senerchia, N., Stein, N., Akhunov, E.D., Keller, B., Wicker, T. and Kilian, B. (2014) Sequencing of chloroplast genomes from wheat, barley, rye and their relatives provides a detailed insight into the evolution of the Triticeae tribe. PLoS ONE, 9, e85761.
Ming, R., Liu, S.C., Lin, Y.R. et al. (1998) Detailed alignment of Saccharum and Sorghum chromosomes: comparative organization of closely related diploid and polyploid genomes. Genetics, 150, 1663–1682.
Murphy, W.J., Larkin, D.M., Everts-van der Wind, A. et al. (2005) Dynamics of mammalian chromosome evolution inferred from multispecies comparative maps. Science, 309, 613–617.
Neale, D.B., Wegrzyn, J.L., Stevens, K.A. et al. (2014) Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies. Genome Biol. 15, R59.
Neale, D.B., Martínez-García, P.J., De La Torre, A.R., Montanari, S. and Wei, X.X. (2017a) Tree genome sequencing: novel insights into plant biology. Annu. Rev. Plant Biol. 68, 457–483.
Neale, D.B., McGuire, P.E., Wheeler, N.C. et al. (2017b) The Douglas-fir genome sequence reveals specialization of the photosynthetic apparatus in Pinaceae. G3, 7, 3157–3167.
Nesbitt, M. and Samuel, D. (1996) From staple crop to extinction? The archaeology and history of hulled wheats. In Hulled Wheats. Promoting the Conservation and Use of Underutilized and Neglected Crops. 4. Proc. 1st Internatl. Workshop on Hulled Wheats (Padulosi, S., Hammer, K. and Heller, J., eds). Castelvecchio Pacoli, Tuscany, Italy: International Plant Genetic Resources Institute, Rome, Italy, pp. 41–100.
Nystedt, B., Street, N.R., Wetterbom, A. et al. (2013) The Norway spruce genome sequence and conifer genome evolution. Nature, 497, 579–584.
Orthwein, A., Fradet-Turcotte, A., Noordermeer, S.M., Canny, M.D., Brun, C.M., Strecker, J., Escribano-Diaz, C. and Durocher, D. (2014) Mitosis inhibits DNA double-strand break repair to guard against telomere fusions. Science, 344, 189–193.
Paterson, A.H., Bowers, J.E. and Chapman, B.A. (2004) Ancient polyploidization predating divergence of the cereals and its consequences for comparative genomics. Proc. Natl Acad. Sci. USA, 101, 9903–9908.
Paterson, A.H., Bowers, J.E., Bruggmann, R. et al. (2009) The Sorghum bicolor genome and the diversification of grasses. Nature, 457, 551–556.
Perrier, X. and Jacquemoud-Collet, J.P. (2006) DARwin software http://darwin.cirad.fr/darwin.
Price, H.J., Dillon, S.L., Hodnett, G., Rooney, W.L., Ross, L. and Johnston, J.S. (2005) Genome evolution in the genus Sorghum (Poaceae). Ann. Bot. 95, 219–227.
Rothkamm, K., Kruger, I., Thompson, L.H. and Lobrich, M. (2003) Pathways of DNA double-strand break repair during the mammalian cell cycle. Mol. Cell. Biol. 23, 5706–5715.
Schnable, P.S., Ware, D., Fulton, R.S. et al. (2009) The B73 Maize Genome: complexity, diversity and dynamics. Science, 326, 1112–1115.
Shibata, A. (2017) Regulation of repair pathway choice at two-ended DNA double-strand breaks. Mutat. Res. 803, 51–55.
Smith, S.A. and Donoghue, M.J. (2008) Rates of molecular evolution are linked to life history in flowering plants. Science, 322, 86–89.
Stevens, K.A., Wegrzyn, J.L., Zimin, A. et al. (2016) Sequence of the sugar pine megagenome. Genetics, 204, 1613–1626.
Tang, H.B., Bowers, J.E., Wang, X.Y., Ming, R., Alam, M. and Paterson, A.H. (2008) Perspective – Synteny and collinearity in plant genomes. Science, 320, 486–488.


Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.
Town, C., Schmidt, R. and Bancroft, I. (2011) Comparative genome analysis at the sequence level in the Brassicaceae. Genet. Genomics Brassicaceae, 9, 171–194.
Tuskan, G.A., DiFazio, S., Jansson, S. et al. (2006) The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science, 313, 1596–1604.
VanHulle, K., Lemoine, F.J., Narayanan, V., Downing, B., Hull, K., McCullough, C., Bellinger, M., Lobachev, K., Petes, T.D. and Malkova, A. (2007) Inverted DNA repeats channel repair of distant double-strand breaks into chromatid fusions and chromosomal rearrangements. Mol. Cell. Biol. 27, 2601–2614.
Wang, X.Y., Tang, H.B. and Paterson, A.H. (2011) Seventy million years of concerted evolution of a homoeologous chromosome pair, in parallel, in major Poaceae lineages. Plant Cell, 23, 27–37.
Wang, Y.P., Tang, H.B., DeBarry, J.D. et al. (2012) MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49.
Wang, J.R., Luo, M.C., Chen, Z.X., You, F.M., Wei, Y.M., Zheng, Y.L. and Dvorak, J. (2013) Aegilops tauschii single nucleotide polymorphisms shed light on the origins of wheat D-genome genetic diversity and pinpoint the geographic origin of hexaploid wheat. New Phytol. 198, 925–937.
Wicker, T., Wing, R.A. and Schubert, I. (2015) Recurrent sequence exchange between homeologous grass chromosomes. Plant J. 84, 747–759.
Wilson, M.A., Gaut, B. and Clegg, M.T. (1990) Chloroplast DNA evolves slowly in the palm family (Arecaceae). Mol. Biol. Evol. 7, 303–314.
Wu, F.N. and Tanksley, S.D. (2010) Chromosomal evolution in the plant family Solanaceae. BMC Genomics, 11, 182.
Xiao, M., Phong, A., Ha, C. et al. (2007) Rapid DNA mapping by fluorescent single molecule detection. Nucleic Acids Res. 35, e15.
Yamane, K. and Kawahara, T. (2005) Intra- and interspecific phylogenetic relationships among diploid Triticum-Aegilops species (Poaceae) based on base-pair substitutions, indels and microsatellites in chloroplast noncoding sequences. Am. J. Bot. 92, 1887–1898.
Zhang, J.B., Zhang, F. and Peterson, T. (2006) Transposition of reversed Ac element ends generates novel chimeric genes in maize. PLoS Genet. 2, 1535–1540.
Zhang, J.B., Yu, C.H., Pulletikurti, V., Lamb, J., Danilova, T., Weber, D.F., Birchler, J. and Peterson, T. (2009) Alternative Ac/Ds transposition induces major chromosomal rearrangements in maize. Gene Dev. 23, 755–765.
Zhao, G.Y., Zou, C., Li, K. et al. (2017) The Aegilops tauschii genome reveals multiple impacts of transposons. Nat. Plants, 3, 946–955.
