Title: Plant Genomic Sequencing: Bridging the Diversity Gap

Author: Kaitlyn Tom
Summary:
Here I investigate the past and present literature to explore the evolutionary gains that have been
made in the field of plant genomics in the past 15 years with regard to genomes that have been
sequenced and functions of important gene families (MADS-box, KNOX, SPL). As sequencing
becomes more cost effective, efficient, and accessible, sequencing data will continue to
accumulate. A focus on comparative analyses of this data in the coming years is essential and
may lead to more evolutionary breakthroughs in the near future, helping to close existing
phylogenetic gaps in plant evolution.
Introduction:
Since the sequencing of Arabidopsis thaliana in 2000, there has been an exponential increase in
plant sequencing, with 223 plant genomes sequenced in the past 15 years, including an array of
mosses, liverworts, lycophytes, gymnosperms, and angiosperms (NCBI, 2015; IBG, 2015). The
distribution across lineages is provided in Table 1.
Plant Lineage | Number of Species Sequenced
Angiosperm | 202 (Arabidopsis thaliana, etc.)
Gymnosperm | 2 (Picea abies, Pinus taeda)
Lycophyte | 1 (Selaginella moellendorffii)
Moss | 1 (Physcomitrella patens)
Algae | 17 (Cyanidioschyzon merolae, etc.)
Table 1: Number of species from each plant class sequenced from years 2000-2015
This sequencing boom may be attributed to the development of more efficient, more accurate, and
more cost-effective sequencing and assembly technologies, which have made genome sequencing
increasingly accessible and feasible.
Before whole genome sequencing, one could clone and sequence individual genes of interest to
learn about their properties. Whole genome sequencing and analysis of Arabidopsis thaliana in
2000, however, provided additional means of studying genes and plant evolution on the
molecular level, helping to strengthen previous phylogenetic inferences while shaping new ones.
The first round of plant genome sequencing enabled the first generation of functional genomics
that helped to define the roles of hundreds of genes, provided access to sequence-based markers
for breeding, and glimpses into plant evolutionary history. Each new genome that is sequenced
and analyzed continues to reveal new aspects of genome biology that can help to close the
phylogenetic gap within available plant genomes.
In this literature review, I explore the past and present literature to summarize the field of plant
genomics, its transformation in the past 15 years, and gains that have been made from earlier
sequencing projects. Further, I address gaps that exist within the field as well as measures that
may minimize the given disparities.
Important Gains from Plant Sequencing:

Here I address the evolutionary advances that have been made in the past 15 years within major
plant groups such as green algae, mosses, lycophytes, gymnosperms, and angiosperms, based on
whole genome sequencing and analysis. A chronological synopsis of the data acquired from past
sequencing projects within each major plant group has been provided (Table 2).
The genomic sequencing of Arabidopsis thaliana in 2000 revolutionized the field of plant
genomics and paved the way for a collection of other plant genomes to be sequenced and
analyzed (AGI, 2000). Using overlapping BAC clones and Sanger whole genome sequencing
and assembly methods, researchers finally assembled the 125 megabase genome, uncovering
more than 25,000 protein encoding genes from 11,000 gene families in the process (AGI, 2000).
Further analysis and comparison of the genome to existing sequenced genomes (Escherichia
coli, Synechocystis sp., Saccharomyces cerevisiae, C. elegans, Drosophila, and a non-redundant
protein set of Homo sapiens) revealed much about the evolution of A. thaliana, flowering plants,
and the broader class of green plants. Key discoveries about the evolution of A. thaliana from
this project include the conservation of many genes amongst all organisms, the involvement of a
whole genome duplication, genes of bacterial origin (showing that many genes of the
endosymbiotic ancestor of the plastid have been transferred to the nucleus, contributing to
diverse functions such as photoautotrophic growth and signaling), new members of receptor
families, the independent evolution of several families of transcription factors, and cellular
components for plant-specific functions. The gains from this sequencing project have been used
as stepping-stones for a diversity of fields beyond plant biology such as agricultural science,
evolutionary biology, bioinformatics, combinatorial chemistry, functional and comparative
genomics, and molecular medicine (AGI, 2000).
Several years after this sequencing project, in 2007, the genome of Chlamydomonas reinhardtii, a
green alga whose lineage diverged from that of land plants over one billion years ago, was
sequenced, revealing much about the transition to land (Merchant, 2007). Comparisons to the
proteomes of humans and Arabidopsis yielded key evolutionary discoveries: 1226 gene families,
proteins similar to those in animal flagellar proteomes (supporting the loss of flagella in
angiosperms and their maintenance in animals), and proteins homologous to those in plants but
not animals that function in chloroplasts (Merchant, 2007). Hypotheses for the origin of these
plant-like proteins suggest that they were present in the common plant-animal ancestor and
retained in Chlamydomonas and angiosperms but lost or diverged in animals, that they entered
Chlamydomonas by horizontal transfer, or that they arose after the divergence of animals (but
before the divergence of Chlamydomonas) (Merchant, 2007).
In 2007, further advances were made in plant genome sequencing with the publication of the whole
genome sequence of the moss Physcomitrella patens, an early non-vascular land plant. Compared
to the genomes of flowering plants (separated by more than 400 million years) and unicellular
aquatic algae, the moss genome revealed important evolutionary changes related to the move to
land, such as an increase in gene family complexity, loss of genes associated with aquatic living
(e.g., flagellar arms), gain in genes involved in tolerating terrestrial stresses (e.g., variation
in temperature and water availability), and the development of the auxin and abscisic acid
signaling pathways used in directing multicellular growth and response to dry environments
(Rensing, 2007).

Four years later, the genome of an early vascular plant, the lycophyte Selaginella
moellendorffii, was sequenced, elucidating the transition to a sporophyte-dominant life cycle
(Banks, 2011). Comparing Selaginella’s genome to those of the green alga Chlamydomonas, the moss
Physcomitrella, and 15 angiosperm species, researchers found that the transition from a
gametophyte- to a sporophyte-dominated life cycle required far fewer new genes than the
transition from a non-seed vascular to a flowering plant, which required three times more new
genes. Other important finds included modifications in posttranscriptional gene regulation,
including small RNA regulation of repetitive elements, absence of the trans-acting small
interfering RNA pathway, and extensive RNA editing of organellar genes (Banks, 2011).
The next plant lineage to be sequenced was the gymnosperms, beginning with the Norway spruce,
Picea abies, in 2013. Sequencing and analysis of its very large 20-gigabase genome revealed
numerous repetitive transposable elements, long introns, a low recombination frequency, and no
evidence of a whole genome duplication (WGD). These findings provide
insight into the limited evolution of reproductive barriers and morphological diversity within
gymnosperms and their divergence from angiosperms (Nystedt, 2013).
Helping to further explain the evolution of flowering plants, Amborella trichopoda, the sole
surviving sister species to all other angiosperms, was sequenced in 2013. Sequencing revealed a
WGD predating the diversification of angiosperms, possibly giving rise to novel genes critical to
the flowering process. Comparisons to the genomes of other angiosperms, such as Arabidopsis, also
led to the discovery of gene order conservation (synteny) and the presence of floral regulatory
orthologs that first appeared in the ancestral angiosperm (Amborella Genome Project, 2013).

Species Name | Plant Class | Year Sequenced | Key Findings
Arabidopsis thaliana | Angiosperm | 2000 | WGD, gene families, functions, gene conservation, bacterial origin
Chlamydomonas reinhardtii | Green Algae | 2007 | Horizontal gene transfer, flagellar traces
Physcomitrella patens | Moss | 2007 | Gain in gene complexity, hormone signals direct growth
Selaginella moellendorffii | Lycophyte | 2011 | Fewer genes for transition to land, posttranscriptional regulation
Picea abies | Gymnosperm | 2013 | No WGD suspected, long introns, repeat transposable elements
Amborella trichopoda | Angiosperm | 2013 | WGD, gene conservation in floral regulatory pathways
Table 2: Summary of Key Findings from Previous Sequencing Projects
Important Gene Families:
From each genome that has been sequenced, researchers have uncovered new gene functions,
allowing them to expand upon what was previously known about certain gene families. Here, I
present three important gene families that have been found within the literature to play critical
roles within plant development and evolution.
1. Minichromosome maintenance 1, agamous, deficiens, and serum response factor (MADS-box):
MADS-box genes encode transcription factors that play a diversity of roles in regulating
flowering plant development and the identity of floral organs (Theissen, 2000). Over 100
MADS-box genes are known today (Parenicova, 2003), and they can be categorized into at least 9
different classes. The best studied are those involved in the ABCE model of flower development,
which determine floral organ identity. Class A genes include AP1 and AP2 and are involved in
sepal (sep) and petal (pet) development. Class B genes include PI and AP3 and control petal and
stamen (sta) identity, while class C genes, such as AG, regulate stamen and carpel (ca)
development. Class E genes include AGL2, 4, and 9 and control petal, stamen, and carpel
development. In addition to these floral organ identity genes, several other MADS-box genes have
been well studied and found to play important roles in flower development (Table 3). Flowering
locus C (FLC), for example, which arose from a single ancestral tandem duplication in a common
ancestor of extant seed plants (Ruelens, 2013), negatively regulates (represses) flowering
(Michaels, 1999). CAL, another MADS-box gene, shows high sequence similarity to AP1, indicating
redundancy in function (floral meristem identity) (Becker, 2003). The transcription of MADS-box
genes outside flowers and fruits, as well as an increasing number of mutant and transgenic
flowering plants, suggests that this gene family is also involved in regulation during
vegetative development (Alvarez-Buylla, 2000; Huang, 1995; Ma, 1991; Rounsley, 1995; Theissen,
2000).
The existence of MADS-box genes across gymnosperms, ferns, and mosses (non-flowering plants)
further demonstrates that the role of these genes in plants is not limited to the development of
fruits or flowers and that they may have evolved earlier than was originally thought (Henschel,
2002; Münster, 1997; Tandre, 1995). Molecular evolutionary analyses suggest that the most recent
common ancestor of modern-day MADS-box genes evolved much earlier than the Cambrian explosion,
approximately 650 mya (Nam, 2003).
2. Knotted1-Like Homeobox (KNOX):
Another well-studied gene family that has contributed to our current understanding of plant
evolution is KNOX. Comprising two classes, KNOX I and KNOX II, KNOX genes belong to a large
family of transcription factors called homeobox genes (Mukherjee, 2009), which play an
important role in developmental regulation. Within Arabidopsis, several KNOX I genes have been
identified (Table 3), most notably Shootmeristemless (STM), which maintains stem cell
pluripotency in the shoot apical meristem (SAM); Kn1-like in Arabidopsis thaliana 2 (KNAT2) and
Kn1-like in Arabidopsis thaliana 6 (KNAT6), both of which maintain the boundaries of the SAM
during embryogenesis; and Brevipedicellus (BP), which also maintains SAM activity (post
embryogenesis) by repressing KNAT2 and KNAT6 (Hay, 2010). Though class II KNOX genes have not
been studied as extensively as class I genes, roles in transcriptional repression have been
suggested (Tsuda, 2015). Comparative analysis of KNOX amino acid sequences across a variety of
plant groups (e.g., mosses, lycophytes, angiosperms, gymnosperms, and green algae) reveals that
KNOX proteins are conserved and found in each of the observed land plants as well as in green
algae, with the greatest KNOX diversity found in angiosperms. Green plant KNOX genes were
found to be most similar to class I KNOX in higher plant forms, suggesting that class I genes are
the ancestral form (Gao, 2015). Further, the finding that green algae have only one KNOX gene
while bryophytes (the first land plants) have five supports results from previous studies that
suggest a KNOX gene duplication (Mukherjee et al., 2009).
3. Squamosa-promoter binding protein-like (SPL):
The independent evolution of heterospory within several clades of vascular land plants
(lycophytes, ferns, seed plants) indicates a selective advantage to the production of two
separate, endosporic sporangia in the transition to an environment above ground with little
surrounding water. One gene that is hypothesized to play a role in the evolution of this trait is
Nozzle (NZZ), a
gene proposed to belong to the SPL gene family (Schiefthaler, 1999), encoding regulatory
transcription factors involved in plant growth and development (Klein, 1996) (Table 3). Studied
extensively in Arabidopsis, NZZ has been found to play a role in both male (pollen sac) and
female (nucellus) sporangia development (Schiefthaler, 1999). Further analysis of NZZ gives
genetic and molecular support to the proposal that heterosporous seed plants evolved from a
homosporous ancestor, and that NZZ orthologues may exist in homosporous, basal vascular
plants (Bateman, 1994; Schiefthaler, 1999). According to the Plant Transcription Factor
Database, analysis of the Selaginella genome showed the presence of 2 orthologous NZZ genes.
As heterospory evolved separately in lycophytes (Selaginella, Isoetes), ferns (aquatic), and
angiosperms, sequencing and analysis of additional tracheophytes such as ferns may tell us more
about these independent events and the evolution of heterospory. Several other important genes
within the SPL gene family include AtSPL14, AtSPL3, AtSPL4, and AtSPL5 (Table 3). In
Arabidopsis, AtSPL14 has been found to play a role in the developmental phase change from
juvenile to adult growth, with the mutant transitioning to the flowering reproductive phase much
faster than the wild type (Stone et al., 2005). AtSPL3, 4, and 5, on the other hand, have been
found to act in the opposite manner by regulating a prolonged vegetative phase and delayed
flowering (Wu, 2006).
Gene Name | Gene Family | Gene Function
AP1 | MADS-box | floral meristem identity gene, Class A homeotic gene (sep, pet)
AP2 | MADS-box | Class A homeotic gene (sep, pet)
PI, AP3 | MADS-box | Class B homeotic genes (pet, sta)
AG | MADS-box | Class C homeotic gene (sta, ca)
AGL 2,4,9 | MADS-box | Class E homeotic genes (pet, sta, ca)
CAL | MADS-box | floral meristem identity gene
FLC | MADS-box | floral repression
STM | KNOX | maintain SAM stem cell pluripotency
KNAT 2 | KNOX | SAM boundary maintenance during embryogenesis
KNAT 6 | KNOX | SAM boundary maintenance during embryogenesis
BP | KNOX | maintains SAM activity post embryogenesis, represses KNAT2,6
NZZ | SPL | early sporangia development
AtSPL14 | SPL | shorten vegetative phase, promote reproductive phase
AtSPL3,4,5 | SPL | prolong vegetative phase, delay reproductive phase
Table 3: Arabidopsis Genes and Gene Function(s) Listed by Family
Future Directions:
Plant genomic sequencing has made great strides in the past 15 years. But where will the field
lead next? With newer, more efficient and cost effective sequencing techniques having been
developed in the past decade, sequencing has become increasingly affordable and accessible.
Thus, with these improvements, the field will likely see an exponential increase in sequencing
projects. Ideally, this sharp increase in the number and availability of sequencing data through
genomic databases will lead to a rise in comparative studies and eventually more breakthroughs
in plant and genomic evolution.
A heightened emphasis will also be placed on sequencing crop organisms, a trend we are already
seeing today. As the world’s population continues to grow, researchers will place greater focus on
sequencing projects that reveal the molecular networks controlling plant traits that would
improve, for instance, crop yields, to meet the human need for food and energy (Lee, 2014).
Existing Gaps:
Despite recent improvements made within the field, several gaps persist. While there has been an
overall increase in total plants sequenced in the past decade due to improvements in technology,
the number of angiosperms sequenced exceeds, by far, the number of species from all other plant
classes combined. In addition to this lack of plant species diversity, the mass accumulation of
sequencing data in recent years has outpaced its subsequent annotation and analysis. Currently
there is much more data than researchers can analyze or make sense of. Annotation and analysis
are especially critical, as they allow us to make biological sense of sequencing data for use in
comparative studies.
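As a rough check on the scale of this imbalance, the counts in Table 1 can be tallied directly. The short Python sketch below uses only the numbers reported in that table; the variable names are my own and carry no meaning beyond this illustration.

```python
# Species counts per lineage, copied from Table 1 (2000-2015).
sequenced = {
    "Angiosperm": 202,
    "Gymnosperm": 2,
    "Lycophyte": 1,
    "Moss": 1,
    "Algae": 17,
}

total = sum(sequenced.values())
# Percentage of all sequenced genomes contributed by each lineage.
shares = {lineage: 100 * n / total for lineage, n in sequenced.items()}
non_angiosperm = total - sequenced["Angiosperm"]

for lineage, pct in shares.items():
    print(f"{lineage:<11}{sequenced[lineage]:>4}  {pct:5.1f}%")
print(f"Non-angiosperm genomes combined: {non_angiosperm} of {total}")
```

By this tally, roughly nine in ten of the sequenced genomes are angiosperms, while all other lineages together contribute 21 of the 223 genomes.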
Of the plants sequenced to date, an overwhelming majority have been angiosperms, which
comprise nearly 90% of all living plant species (Neubauer, 2012). While efforts to sequence
non-angiosperms are ongoing, the bias towards sequencing angiosperm crop species may hamper
funding for these sequencing projects. Most plants chosen for sequencing to date are
economically important, with seventy-three percent of plant genome publications up until 2013
having been on eudicot crop species (Michael, 2013). This bias in sequencing data is
problematic, as data acquired predominantly from angiosperm sequencing projects can lead to
skewed phylogenetic and evolutionary comparisons. While individuals from other plant lineages
have been sequenced, this data alone is not enough to make strong inferences about plant
evolution. Re-sequencing of these plants and/or sequencing of their close relatives may help to
bridge the diversity gap and perhaps tell us more about the evolution of their traits in relation
to other plant classes.
The second gap that persists within the field involves data analysis. The mass accumulation of
sequencing data in the past few years has overwhelmed researchers, who have struggled to
analyze and make biological sense of such large quantities. To facilitate analysis of this data,
several innovative computational tools and bioinformatics approaches have been developed. In
the coming years, more comparative analyses of genomic sequences will need to be carried out
to extract relevant biological information from the abundance of sequenced data.
Evolutionary Implications of Comparative Genomics:
The next decade marks a new era of plant genomics, weighted towards comparative analyses of
genomes that have already been sequenced. Comparisons of whole genomes as well as specific
gene sequences across species can help researchers to identify regions of conservation as well as
regions that have changed over time, allowing inferences to be made about not only the evolution
of gene structure, but also gene function. These analyses can provide insight on evolutionary
consequences such as rates of evolution in certain genes and gene families, gene loss or retention
(due to duplication), and chromosomal rearrangements, all of which can contribute to
morphological change (Rahman, 2009). This concept of changes or modifications at the
molecular level affecting the evolution of developmental processes in different organisms and
phenotype forms the core of evolutionary developmental biology (“Evo-devo”) (Hall, 2012).
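To make the idea of identifying regions of conservation concrete, the following Python sketch scores each column of a small multiple sequence alignment and reports runs of fully conserved columns. It is a minimal illustration under stated assumptions: the sequences are invented toy data, the function names and thresholds are hypothetical, and real comparative analyses rely on dedicated alignment and scoring tools rather than this simple majority count.

```python
from collections import Counter

def column_conservation(alignment):
    """Per column, the fraction of sequences sharing the most common residue."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]  # gaps never count as matches
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores

def conserved_regions(scores, threshold=1.0, min_length=3):
    """Half-open (start, end) index pairs of runs scoring at or above threshold."""
    regions, start = [], None
    for i, score in enumerate(scores + [-1.0]):  # sentinel closes a final run
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    return regions

# Toy alignment (invented): three sequences that differ at one position.
toy_alignment = [
    "ATGGCTAAGT",
    "ATGGCAAAGT",
    "ATGGC-AAGT",
]
scores = column_conservation(toy_alignment)
regions = conserved_regions(scores)
print(regions)
```

In this toy case the variable column splits the alignment into two conserved blocks; on real data, such blocks are the candidates one would then inspect for conserved gene structure or function.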
Genomic databases, such as Phytozome and PlantGDB, can aid comparative genomic studies by
providing access to every green plant gene at the levels of gene sequence, structure, family, and
organization, making analyses convenient and accessible to the broader science community.
Using the tools available, one can search for orthologous sequences, entire genomes, gene
annotations, and information on gene function (Goodstein, 2012).
In order for this approach to be effective, however, accurate gene annotations or identifications
of gene regions are needed. Several approaches for sequence annotation, which differ in accuracy
and efficiency, are described below. Though it is likely that these approaches will become
antiquated in the near future, their use has been well supported in the literature to this point
and is worth mentioning.
Methods for Genome Analysis and Annotation:
Genome annotations are generally divided into two phases: a computation phase, and an
annotation phase. During computation, expressed sequence tags (ESTs) and proteins are aligned
to the genome and ab initio gene predictions are carried out. Ab initio predictors use statistical
and computational models rather than external evidence to identify genes and other elements
such as promoters, splice sites, introns, exons, and translation initiation and termination sites.
While ab initio predictors provide a fast and easy way to identify genes, needing no external
evidence, they do not predict UTRs or alternatively spliced transcripts (Benson, 2005). Long
introns (Salamov, 2000) and frame shift errors (Mathe, 2002) can also be difficult for them to
handle. Additionally, since this approach relies on genomic traits specific to particular
organisms (e.g., Caenorhabditis elegans, D. melanogaster, Arabidopsis thaliana, humans, and
mice), its predictions are less accurate if the genome being annotated isn’t closely related to
these organisms (Yandell, 2012). However, some ab initio tools, such as AUGUSTUS, FGENESH, and
SNAP (Table 6), can use external evidence to make more accurate predictions (Brejova, 2005;
Guigo, 2006; Stanke, 2006; Leong, 2015; Salamov, 2000).
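The idea of finding genes from the sequence alone, without external evidence, can be illustrated in its most reduced form with an open reading frame (ORF) scan. The Python sketch below is only a stand-in for the concept: it is not how AUGUSTUS, FGENESH, or SNAP work, since those tools rely on far richer statistical models. It ignores introns and the reverse strand, and the function name, parameters, and toy sequence are my own assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(sequence, min_codons=3):
    """Return (start, end, frame) for forward-strand ORFs.

    Coordinates are 0-based and half-open; each ORF runs from an ATG
    to the first in-frame stop codon, which is included.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close it at the first in-frame stop
    return orfs

# Toy sequence (invented) containing one short ORF in frame 2.
toy_sequence = "CCATGGCTAAATGACC"
for start, end, frame in find_orfs(toy_sequence):
    print(frame, start, end, toy_sequence[start:end])
```

Even this toy version shows why purely intrinsic methods struggle with long introns and frame shifts: a single inserted or deleted base moves the coding signal into a different frame, and the scan loses the gene.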
Following computation, gene annotations may be generated through programs called annotation
pipelines, which today primarily annotate protein-coding genes. After repeats have been
identified using homology-based and de novo tools, and a repeat library has been created, repeat
masking is completed as a means of protecting gene annotations from false evidence. Next,
annotation pipelines add UTRs and align identified protein sequences to the genome assembly.
Once filtering and clustering have been carried out, similar sequences (determined by BLAST) are
realigned with the target genome for greater exon precision (Yandell, 2012).
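The repeat masking step can be sketched in a few lines of Python. The example assumes that repeat coordinates have already been produced by an upstream repeat-finding tool and are given as half-open intervals. The distinction between soft masking (lowercase) and hard masking (replacement with N) is a common convention that I assume here rather than something taken from the sources cited above, and the function name and toy data are hypothetical.

```python
def mask_repeats(sequence, repeat_intervals, hard=False):
    """Mask half-open (start, end) repeat intervals in a sequence.

    Soft masking lowercases repeat bases; hard masking replaces them with 'N'.
    """
    masked = list(sequence.upper())
    for start, end in repeat_intervals:
        if not 0 <= start <= end <= len(masked):
            raise ValueError(f"interval ({start}, {end}) is outside the sequence")
        for i in range(start, end):
            masked[i] = "N" if hard else masked[i].lower()
    return "".join(masked)

# Toy genome fragment (invented) with a simple AT repeat at positions 4-11.
toy_genome = "ATGCATATATATGGCC"
toy_repeats = [(4, 12)]

soft = mask_repeats(toy_genome, toy_repeats)
hard = mask_repeats(toy_genome, toy_repeats, hard=True)
print(soft)
print(hard)
```

Masked bases stay in place so that coordinates are unchanged, which lets downstream alignment and prediction steps skip or down-weight repeats without shifting gene positions.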
Examples of notable pipelines include MAKER-P, PASA, and others listed in Table 6.
Recently developed, MAKER-P has been shown to be relatively fast, easy to use, and highly
accurate (Law, 2015). New features include better parallelization for large, repeat-rich plant
genomes and annotation of non-coding and pseudogene regions (Campbell, 2014). PASA has also
been reported to produce similarly accurate predictions (Kwan, 2009). Though a full run by an
annotation pipeline may take longer than an ab initio gene finder, it can serve as a starting point
for annotation curation and further analyses downstream (Yandell, 2012).
Using data from predictions and alignment, several approaches exist today for automated
annotation of a genome, three of which I summarize here and in Table 6. The simplest approach,
discussed above, runs a single ab initio predictor. Though such predictors work quickly and are
easy to use, they do not guarantee accuracy, predict UTRs, or generate a complete gene model
(Yandell, 2012). However, when paired with external evidence, predictors such as AUGUSTUS,
FGENESH, and SNAP can predict with relatively high accuracy in smaller genomes (Brejova, 2005;
Guigo, 2006; Stanke, 2006; Leong, 2015; Salamov, 2000).
A second approach is also rather simple and involves running a series of ab initio gene finders
on the genome and then selecting a single prediction using a chooser algorithm. Examples include
JIGSAW, EVidenceModeler (EVM), and Evigan, the latter two of which can automatically select the
best set of exons in order to minimize error (Haas, 2008; Liu, 2008). This approach produces
intermediate accuracy, speed, and gene model completeness (Yandell, 2012).
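The predict-and-choose idea can be sketched as follows. This Python example is a deliberately naive chooser: it scores each candidate gene model by the fraction of its exon bases covered by evidence alignments and keeps the best one. It is not the algorithm used by JIGSAW, EVM, or Evigan. The scoring rule, the function names, the predictor labels, and the coordinates are all assumptions made for illustration, and the evidence intervals are assumed not to overlap one another.

```python
def overlap(a, b):
    """Number of bases shared by two half-open (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def evidence_support(exons, evidence):
    """Fraction of predicted exon bases covered by evidence alignments."""
    predicted = sum(end - start for start, end in exons)
    if predicted == 0:
        return 0.0
    covered = sum(overlap(exon, ev) for exon in exons for ev in evidence)
    return covered / predicted

def choose_prediction(predictions, evidence):
    """Return the name of the prediction with the highest evidence support."""
    return max(predictions, key=lambda name: evidence_support(predictions[name], evidence))

# Hypothetical exon predictions from three predictors for the same locus.
toy_predictions = {
    "predictor_a": [(100, 200), (300, 400)],
    "predictor_b": [(100, 220), (310, 400)],
    "predictor_c": [(50, 200)],
}
# Hypothetical evidence alignments (e.g., from ESTs or proteins).
toy_evidence = [(100, 200), (300, 400)]

best = choose_prediction(toy_predictions, toy_evidence)
print(best, evidence_support(toy_predictions[best], toy_evidence))
```

A chooser of this kind selects whole gene models. Selecting the best set of exons across predictors, as described above for EVM and Evigan, would require combining exons from different models rather than picking one model outright.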
The most common approach, however, is to feed the alignment evidence to gene predictors at run
time for greater accuracy and then use a chooser to determine the best prediction. Annotation
pipelines may then add UTRs to these predictions. Though time consuming and laborious, a full
run by an annotation pipeline offers the greatest accuracy and the most complete gene model.
MAKER and PASA represent just two of the many annotation pipelines in existence today.
Newer versions of these pipelines have been developed for improved speed, accuracy, and ease
of use (Law, 2015).
Method of Annotation | Examples | How it Works | Advantages | Disadvantages
1) Predict | AUGUSTUS, SNAP, FGENESH | Run a single ab initio predictor | Fast, requires little effort (no external evidence) | Less accurate, less complete gene model
2) Predict and choose | JIGSAW, EVM, Evigan | Run a series of ab initio predictors, then use chooser algorithm | Average accuracy, speed, completion of gene model | Average accuracy, speed, completion of gene model
3) Full scale annotation pipeline | MAKER, PASA, Gnomon, NCBI | Feed alignments to gene predictors at run time, use chooser, pipeline adds UTRs | Highly accurate, more complete gene model | Time consuming, labor intensive
Table 6: Different Approaches for Genome Analysis and respective Pros and Cons.
Conclusions:
The field of plant genomics is constantly changing, as an increasing number of genomes are
sequenced and re-sequenced with newer technologies. In order to keep up with these advances and
the increasing availability of data, researchers will need to place a greater emphasis on
comparative analyses, as these can reveal much about the evolution of gene structure and
function across different plant species and resolve pressing evo-devo questions that have puzzled
plant evolutionary biologists for years.
References:
1. Arabidopsis Genome Initiative (AGI) (2000) Analysis of the genome sequence of the
flowering plant Arabidopsis thaliana. Nature, 2000 Dec 14 408 (6814):796-815
2. Amborella Genome Project et al. (2013) The Amborella genome and the evolution of
flowering plants. Science. 2013 Dec 20 342(6165): 1241089.
3. Banks et al. (2011) The Selaginella genome identifies genetic changes associated with the
evolution of vascular plants. Science. 2011 May 20 332(6032):960-963. Epub 2011 May.
4. Bateman, R. M., and W. A. DiMichele. 1994. Heterospory: the most iterative key
innovation in the evolutionary history of the plant kingdom. Biol. Rev. 69:345–417
5. Becker, A. "The Major Clades of MADS-box Genes and Their Role in the Development
and Evolution of Flowering Plants." Molecular Phylogenetics and Evolution 29.3 (2003):
464-89. Web.
6. Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., and Wheeler, D.L. (2005)
GenBank. Nucleic Acids Res. 33, D34–D38.
7. Brejová B. PhD Thesis. Canada: University of Waterloo; 2005. Evidence combination in
hidden Markov models for gene prediction.
8. Campbell MS, Law M, Holt C, Stein JC, Moghe GD, Hufnagel DE, Lei J,
Achawanantakun R, Jiao D, Lawrence CJ, et al. (2014) MAKER-P: a tool kit for the
rapid creation, management, and quality control of plant genome annotations. Plant
Physiol 164: 513–524
9. E.R Alvarez-Buylla, S.J Liljegren, S Pelaz, S.E Gold, C Burgeff, G.S Ditta, F
Vergara-Silva, M.F Yanofsky. MADS-box gene evolution beyond flowers: expression in pollen,
endosperm, guard cells, roots and trichomes. Plant J., 24 (2000), pp. 457–466
10. Gao, Jie, Xue Yang, Wei Zhao, Tiange Lang, and Tore Samuelsson. "Evolution,
Diversification, and Expression of KNOX Proteins in Plants." Frontiers in Plant Science
6 (2015): n. pag. Web.
11. "Genome Information by Organism." National Center for Biotechnology Information.
U.S. National Library of Medicine, 2015. Web. 07 Nov. 2015.
12. Goodstein, David M. et al. “Phytozome: A Comparative Platform for Green Plant
Genomics.” Nucleic Acids Research 40.Database issue (2012): D1178–D1186. PMC.
Web. 3 Dec. 2015.
13. Guigó R, et al. EGASP: the human ENCODE Genome Annotation Assessment Project.
BMC Genome Biol 2006;7(Suppl. 1):S2.1-31.
14. Haas, Brian J., Steven L. Salzberg, Wei Zhu, Mihaela Pertea, Jonathan E. Allen, Joshua
Orvis, Owen White, C. Robin Buell, and Jennifer R. Wortman. "Automated Eukaryotic
Gene Structure Annotation Using EVidenceModeler and the Program to Assemble
Spliced Alignments." Genome Biology (2008). BioMed Central. Web. 20 Nov. 2015.
15. Hall BK. Evolutionary developmental biology (Evo-Devo): Past, present, and future.
Evolution: Education and Outreach. 2012;5:184–193.
16. Hay, A., and M. Tsiantis. "KNOX Genes: Versatile Regulators of Plant Development and
Diversity." Development 137.19 (2010): 3153-165. Web.
17. Henschel, K., R. Kofuji, M. Hasebe, H. Saedler, T. Münster, and G. Theissen. 2002. Two
ancient classes of MIKC-type MADS-box genes are present in the moss Physcomitrella
patens. Mol. Biol. Evol. 19:801-814.
18. Huang H, Tudor M, Weiss C.A, Hu Y, Ma H. The Arabidopsis MADS-box gene AGL3 is
widely expressed and encodes a sequence-specific DNA-binding protein. Plant Mol.
Biol., 28 (1995), pp. 549–567
19. Klein J., Saedler H., Huijser P. (1996). A new family of DNA binding proteins includes
putative transcriptional regulators of the Antirrhinum majus floral meristem identity gene
SQUAMOSA. Mol. Genet. Genomics 250, 7–16.
20. Kwan, Alan L et al. “Improving Gene-Finding in Chlamydomonas
reinhardtii:GreenGenie2.” BMC Genomics 10 (2009): 210. PMC. Web. 20 Nov. 2015.
21. Law, MeiYee et al. “Automated Update, Revision, and Quality Control of the Maize
Genome Annotations Using MAKER-P Improves the B73 RefGen_v3 Gene Models and
Identifies New Genes.” Plant Physiology 167.1 (2015): 25–39. PMC. Web. 20 Nov. 2015.
22. Lee, Insuk. “A Showcase of Future Plant Biology: Moving towards next-Generation Plant
Genetics Assisted by Genome Sequencing and Systems Biology.” Genome Biology 15.5
(2014): 305. PMC. Web. 19 Nov. 2015.
23. Leong, Ivone US et al. “Assessment of the Predictive Accuracy of Five in Silico
Prediction Tools, Alone or in Combination, and Two Metaservers to Classify Long QT
Syndrome Gene Mutations.” BMC Medical Genetics 16 (2015): 34. PMC. Web. 20 Nov.
2015.
24. Liu, Q., A. J. Mackey, D. S. Roos, and F. C. N. Pereira. "Evigan: A Hidden Variable
Model for Integrating Gene Evidence for Eukaryotic Gene Prediction." Bioinformatics
24.5 (2008): 597-605. Web.
25. Ma, H. Yanofsky, M.F, Meyerowitz, E.M. AGL1-AGL6, an Arabidopsis gene family with
similarity to floral homeotic and transcription factor genes. Genes Dev., 5 (1991), pp.
484–495
26. Mathe, C., Sagot, M.F., Schiex, T., and Rouze, P. (2002) Current methods of gene
prediction, their strengths and weaknesses. Nucleic Acids Res. 30, 4103–4117.
27. Merchant et al. (2007) The Chlamydomonas genome reveals the evolution of key animal
and plant functions. Science. 2007 Oct 12 318(5848):245-250.
28. Michael, Todd P., and Scott Jackson. "The First 50 Plant Genomes." The Plant Genome
6.2 (2013): 0. Crop Science Society of America. 11 Oct. 2015.
29. Michaels, S. D.& Amasino, R. M. FLOWERING LOCUS C encodes a novel MADS
domain protein that acts as a repressor of flowering. Plant Cell 11, 949– 956 (1999).
30. Mukherjee, K., L. Brocchieri, and T. R. Burglin. "A Comprehensive Classification and
Evolutionary Analysis of Plant Homeobox Genes." Molecular Biology and Evolution
26.12 (2009): 2775-794. Web.
31. Münster T., J. Pahnke, A. Di Rosa, J. T. Kim, W. Martin, H. Saedler, G. Theißen. 1997.
Floral homeotic genes were recruited from homologous MADS-box genes preexisting in
the common ancestor of ferns and seed plants. Proc. Natl. Acad. Sci. USA 94:2415-2420.
32. Nam, J. "Antiquity and Evolution of the MADS-Box Gene Family Controlling Flower
Development in Plants." Molecular Biology and Evolution 20.9 (2003): 1435-447. Web. 2
Dec. 2015.
33. Neubauer, Raymond L. "Voyages into Homeostasis." Evolution and the Emergent Self:
The Rise of Complexity and Behavioral Versatility in Nature. New York: Columbia UP,
2012. 29. Print.
34. Nystedt et al. (2013) The Norway spruce genome sequence and conifer genome
evolution. Nature. 2013 May 30 497(7451): 579-584. Epub 2013 May 22.
35. Pařenicová, Lucie et al. “Molecular and Phylogenetic Analyses of the Complete MADS-Box Transcription Factor Family in Arabidopsis: New Openings to the MADS World.”
The Plant Cell 15.7 (2003): 1538–1551. PMC. Web. 2 Dec. 2015.
36. "PlantTFDB - Plant Transcription Factor Databases @ CBI, PKU." PlantTFDB - Plant
Transcription Factor Databases @ CBI, PKU. Center for Bioinformatics, Peking
University, China, n.d. Web. 16 Nov. 2015.
37. Rahman, Mehboob-ur. "Comparative Genomics in Crop Plants." Springer. N.p., 12 Oct.
2009. Web. 04 Dec. 2015.
38. Rensing et al. (2007) The Physcomitrella genome reveals evolutionary insights into the
conquest of land by plants. Science. 2008 Jan 4 319(5859):64-69. Epub 2007 Dec 13.
39. Rounsley, S.D, Ditta, G.S, Yanofsky, M.F. Diverse roles for MADS box genes in
Arabidopsis development. Plant Cell, 7 (1995), pp. 1259–1269
40. Ruelens, Philip, Ruud A. De Maagd, Sebastian Proost, Günter Theißen, Koen Geuten,
and Kerstin Kaufmann. "FLOWERING LOCUS C in Monocots and the Tandem Origin
of Angiosperm-specific MADS-box Genes." Nature Communications 4
(2013): n. pag. Web.
41. Salamov, A.A. and Solovyev, V.V. (2000) Ab initio gene finding in Drosophila genomic
DNA. Genome Res. 10, 516–522.
42. Schiefthaler, Ursula et al. “Molecular Analysis of NOZZLE, a Gene Involved in Pattern
Formation and Early Sporogenesis during Sex Organ Development in Arabidopsis
thaliana.” Proceedings of the National Academy of Sciences of the United States of
America 96.20 (1999): 11664–11669. Print.
43. Soltis DE, Ma H, Frohlich MW, Soltis PS, Albert VA, et al. (2007) The floral genome: an
evolutionary history of gene duplication and shifting patterns of gene expression. Trends
Plant Sci 12:358–367
44. Somers, Daryl J., Peter Langridge, and J. P. Gustafson. Plant Genomics: Methods and
Protocols. Vol. 513. New York: Humana, 2009. Print.
45. Stanke, Mario et al. “Gene Prediction in Eukaryotes with a Generalized Hidden Markov
Model That Uses Hints from External Sources.” BMC Bioinformatics 7 (2006): 62. PMC.
Web. 20 Nov. 2015.
46. Stone J. M., Liang X., Nekl E. R., Stiers J. J. (2005). Arabidopsis AtSPL14, a plant-specific SBP-domain transcription factor, participates in plant development and
sensitivity to fumonisin B1. Plant J. 41, 744–754.
47. Usadel Lab. "Published Plant Genomes." Published Plant Genomes. Institute of Bio- and
Geosciences (IBG), Forschungszentrum Jülich, 2015. Web. 20 Nov. 2015.
48. Tandre K, Albert VA, Sundas A, Engström P. Conifer homologues to genes that control
floral development in angiosperms. Plant Mol Biol. 1995;27:69–78.
49. Theißen, G. Shattering developments. Nature, 404 (2000), pp. 711–713
50. Tsuda K., Hake S. (2015). Diverse functions of KNOX transcription factors in the diploid
body plan of plants. Curr. Opin. Plant Biol. 27, 91–96.
51. Wu, G. "Temporal Regulation of Shoot Development in Arabidopsis thaliana by miR156
and Its Target SPL3." Development 133.18 (2006): 3539-547. Web.
52. Yandell M, Ence D. (2012) A beginner’s guide to eukaryotic genome annotation. Nat Rev
Genet 13: 329–342