ferent organisms, even though NGS had to contend with several obstacles, such as large genome size, high GC content and homopolymer stretches. Different strategies for overcoming these obstacles were eventually developed, and various sequencing methods were introduced or further refined, creating a scientific trend that continues to this day. These high-throughput sequencing methods have fostered numerous valuable discoveries in molecular and evolutionary biology, medicine, forensics, agriculture, etc. Among the applications of NGS to plant species, the broadest and most prominent is whole genome sequencing (WGS). WGS aims to reveal the full sequence of plant genomes, their genetic make-up and the genetic background of traits desirable in agricultural production. WGS is especially useful for producing draft genomes of plants that are sequenced for the first time (3). Researchers are predominantly interested in staple cereal, vegetable and fruit species because of the obvious economic gain such research may bring. However, many species of interest have extremely large and complex genomes, which make de novo sequencing of the whole genome labor-intensive and occasionally impracticable. For this reason, researchers seeking specific plant genome data often turn to alternatives such as sequencing only one or several chromosomes of interest, transcriptome sequencing or exome sequencing (4). Nevertheless, several dozen draft plant genomes have been published to date, including a number of plants of great agricultural and industrial significance (Table 1).

| Common name | Species | Genome size (Mb) | Year | Reference |
| --- | --- | --- | --- | --- |
| Bread wheat | Triticum aestivum | 17,000 | 2014 | [28] |
| Canola | Brassica rapa | 516 | 2011 | [52] |
| Cassava | Manihot esculenta | 770 | 2012 | [31] |
| Cotton | Gossypium raimondii | 880 | 2012 | [26] |
| Flax | Linum usitatissimum | 373 | 2012 | [22] |
| Potato | Solanum tuberosum | 844 | 2011 | [21] |
| Sugar beet | Beta vulgaris | 758 | 2014 | [33] |
| Tobacco | Nicotiana tabacum | 4,500 | 2014 | [47] |

Table 1. Published draft genomes of several agriculturally and industrially significant plants

Apart from the direct use of various cultivars for the production of food and beverages, animal feed, fabric, ropes, etc., pharmaceutics and cosmetics are significant areas of industrial production. These areas involve numerous plants whose genomes have not yet been elucidated. One such plant is immortelle (Helichrysum arenarium), a perennial plant widespread on the Adriatic coast of Croatia, Bosnia and Herzegovina and Montenegro (5, 6, 7). Immortelle essential oil has a high market value, and its production requires low investment. As a result, agricultural production of this plant has expanded rapidly in the aforementioned regions, especially in inland Herzegovina. There, producers achieved average annual revenues of 19,715.58 BAM (approximately 10,000 EUR) per hectare of immortelle (8). Elucidating the genome of this plant to better understand its desirable agricultural traits would undoubtedly have a great impact on the economy of the region.

This paper compares and contrasts the three commercial technologies most commonly applied to WGS of plants, namely Roche/454, ABI/SOLiD and Solexa/Illumina. It also presents the application areas of each in plant WGS in light of genotyping the immortelle plant.

2. MATERIALS AND METHODS
This article is descriptive in character and presents a systematic review of the literature on sequencing technologies used for sequencing plant genomes. For the purpose of this study, we examined the published draft genomes of plants. Since publicly available data only cover research published up to 2014, more recent research was not included in the study. The criteria for paper selection were: a) the paper must introduce a draft genome of a plant species; b) the paper must be in English; c) the sequencing method used must be disclosed in a freely available form; d) the sequencing method must be Roche/454, ABI/SOLiD, Illumina/Solexa or any combination of those (combinations with the traditional Sanger method were also included); e) the results must indicate the size of the obtained draft genome.

Applying these criteria, 45 papers (9-53) were selected, presenting 47 plant draft genome sequences, and the frequency of application of each NGS platform was determined. The papers and corresponding plant genomes are listed in Appendix 1. The frequencies were subsequently correlated with the technical characteristics (Table 2) and cost of sequencing (Table 3) in order to deduce the best approach to sequencing immortelle (Helichrysum arenarium).

| Platform | Roche/454 | ABI/SOLiD | Solexa/Illumina |
| --- | --- | --- | --- |
| Sequencing mechanism | Pyrosequencing | Sequencing by ligation | Sequencing by synthesis |
| Read length | 700 bp | 75 bp | 2x101 bp |
| Accuracy | 99.9% | 99.94% | 98% |
| Reads | 1 M | 1.2-1.4 G | 3 G |
| Output data | 0.7 Gb | 120 Gb | 600 Gb |
| Run time | 24 h | 14 d | 3-10 d |
| Major advantage | Read length and speed | Accuracy | High throughput |
| Major disadvantage | Error rate | Short read assembly | Short read assembly |

Table 2. Technical properties of the three sequencing platforms

| Platform | Roche/454 | ABI/SOLiD | Solexa/Illumina |
| --- | --- | --- | --- |
| Instrument price | Instrument 500,000 USD; 7,000 USD per run | Instrument 495,000 USD; 15,000 USD/100 Gb | Instrument 690,000 USD; 6,000 USD per 30x human genome |
| Automation of library preparation | Available | Available | Available |
| Cost/million bases | 10 USD | 0.13 USD | 0.07 USD |

Table 3. Cost of the three sequencing platforms
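This paper's single-run price estimates follow from a simple product. The genome size in megabases (millions of bases) is multiplied by the cost per million bases from Table 3, which amounts to reading every base once (1x coverage). A minimal Python sketch of that arithmetic follows. The function names and the optional `coverage` multiplier are hypothetical additions for illustration. Like the estimates themselves, the sketch ignores reagents, consumables and labor.

```python
"""Rough single-run sequencing cost from genome size (sketch, not the authors' code).

Multiplies a genome size in megabases (millions of bases) by the cost per
million bases from Table 3. With coverage=1.0 this is one pass over the genome;
reagents, consumables and labor are not included.
"""

# Cost per million bases in USD, as listed in Table 3.
COST_PER_MB_USD = {
    "Roche/454": 10.0,
    "ABI/SOLiD": 0.13,
    "Solexa/Illumina": 0.07,
}


def run_cost_usd(genome_size_mb: float, platform: str, coverage: float = 1.0) -> float:
    """Estimated cost in USD of sequencing genome_size_mb megabases at the given coverage.

    coverage is a hypothetical depth multiplier (1.0 = every base read once).
    """
    if genome_size_mb <= 0 or coverage <= 0:
        raise ValueError("genome size and coverage must be positive")
    return genome_size_mb * coverage * COST_PER_MB_USD[platform]


def cost_range_usd(low_mb: float, high_mb: float, platform: str,
                   coverage: float = 1.0) -> tuple[float, float]:
    """(min, max) cost in USD for a genome size range, rounded to cents."""
    return (round(run_cost_usd(low_mb, platform, coverage), 2),
            round(run_cost_usd(high_mb, platform, coverage), 2))


if __name__ == "__main__":
    # Genome size range in Mb, e.g. the 8,000-18,000 Mb estimate discussed in the text.
    for name in COST_PER_MB_USD:
        low, high = cost_range_usd(8_000, 18_000, name)
        print(f"{name}: {low:,.0f}-{high:,.0f} USD")
```

Run as a script with the 8,000-18,000 Mb genome size range used later in this paper, the sketch prints three estimates: 80,000-180,000 USD for Roche/454, 1,040-2,340 USD for ABI/SOLiD and 560-1,260 USD for Solexa/Illumina. Because cost scales linearly with coverage, passing `coverage=30` multiplies every figure by 30.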
3. RESULTS
The papers included in the study were first examined in terms of the NGS platform used (or combination of platforms, including combinations with the traditional Sanger method), as shown in Table 4 and Figure 1 below.
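Each percentage in Table 4 is the number of genomes in a platform category divided by the 47 draft genomes in the sample. The Python sketch below recomputes these shares from the Table 4 counts. It also counts the genomes that involved Illumina/Solexa or Sanger sequencing at all. The short category labels are hypothetical shorthand introduced here. Matching platforms by substring of those labels is a convenience of this sketch, not part of the study's method.

```python
"""Recompute per-platform shares of the 47 draft plant genomes (Table 4).

Sketch only: genome counts are copied from Table 4; the short category labels
are hypothetical shorthand introduced for this example.
"""

# Draft genomes per sequencing platform (or combination), from Table 4.
GENOMES = {
    "Illumina/Solexa only": 19,
    "Sanger + Roche/454 + Illumina/Solexa": 10,
    "Sanger + Illumina/Solexa": 6,
    "Sanger + Roche/454": 5,
    "Roche/454 + Illumina/Solexa": 5,
    "Roche/454 + ABI/SOLiD + Illumina/Solexa": 2,
}


def share_percent(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


def genomes_using(component: str, table: dict[str, int]) -> int:
    """Sum genomes whose category label mentions the given platform."""
    return sum(n for label, n in table.items() if component in label)


if __name__ == "__main__":
    total = sum(GENOMES.values())  # 47 draft genomes
    for label, n in GENOMES.items():
        print(f"{label}: {n} genomes, {share_percent(n, total):.2f}%")

    illumina_any = genomes_using("Illumina/Solexa", GENOMES)
    illumina_combined = illumina_any - GENOMES["Illumina/Solexa only"]
    print("Illumina/Solexa, any use:", illumina_any)
    print("Illumina/Solexa combined with other methods:",
          illumina_combined, f"({share_percent(illumina_combined, total):.2f}%)")
    print("Sanger, any use:", genomes_using("Sanger", GENOMES))
```

Rounded to two decimals, the computed shares agree with Table 4 except in two rows. The Illumina/Solexa-only row gives 40.43% versus 40.42%, and the Roche/454 + ABI/SOLiD + Illumina/Solexa row gives 4.26% versus 4.25%; the published figures there appear to be truncated rather than rounded. The totals come out as 42 genomes involving Illumina/Solexa (23 of them in combination, 48.94%) and 21 involving Sanger sequencing.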
| Platform | Number of papers | Number of genomes | Percentage of total number of genomes |
| --- | --- | --- | --- |
| Illumina/Solexa | 18 | 19 | 40.42% |
| Combination of Sanger sequencing, Roche/454 and Illumina/Solexa | 10 | 10 | 21.28% |
| Combination of Sanger sequencing and Illumina/Solexa | 5 | 6 | 12.77% |
| Combination of Sanger sequencing and Roche/454 | 5 | 5 | 10.64% |
| Combination of Roche/454 and Illumina/Solexa | 5 | 5 | 10.64% |
| Combination of Roche/454, ABI/SOLiD and Illumina/Solexa | 2 | 2 | 4.25% |

Table 4. Frequency of NGS platforms used among draft plant genomes published until 2014

Illumina/Solexa is by far the most frequently used platform (with different models of the technology). It was used either as the sole NGS platform (19 genomes from 18 papers, or 40.42%) or in combination with other platforms and techniques, such as Roche/454, ABI/SOLiD and Sanger sequencing. In total, 42 draft plant genomes were published using the Illumina/Solexa platform to some extent. Roche/454 has been used in combination with Sanger sequencing, Illumina/Solexa and ABI/SOLiD platforms. ABI/SOLiD was used in only two publications, in combination with Roche/454 and Illumina/Solexa, and has by far the lowest overall inclusion in plant genome sequencing projects, 4.25%.

Table 5. Sizes of plant genomes published using different NGS platforms (or combinations of platforms)

The results demonstrate that combining different platforms is a very common practice: 59.58% of the published draft genomes were obtained by combining several (up to three) different sequencing approaches. Combining the platforms appears
to be due to practical and infrastructural reasons, and the practice is employed on a wide range of genome sizes. In any case, it is undoubtedly helpful in mitigating some of the setbacks that each platform has on its own. Although this study focuses on NGS platforms, a significant portion of sequencing projects still involves the traditional Sanger sequencing approach. We found that 21 draft genomes (from 20 papers) were obtained with the help of Sanger sequencing, always in combination with NGS platforms. Accordingly, 44.68% of published plant draft genomes have been elucidated using Sanger sequencing to some extent.

Genome size is another parameter taken into consideration when selecting sequencing techniques for a research project. Table 5 and Figure 2 present the most frequently used NGS platforms (or platform combinations) in terms of the size of the genomes deduced through their use. Figure 2 clearly demonstrates that genome size does not determine researchers' choice of sequencing platform, although it is an important consideration when assembling the sequence (54). The Illumina/Solexa platform (marked dark blue in Figure 2) is the most frequently used and is also applied to the broadest range of genome sizes. These range from as low as 200-299 megabases to the massive 17,000-megabase genome of bread wheat. Next in breadth of application in terms of sequenced genome size is the combination of traditional Sanger sequencing, Roche/454 and Illumina/Solexa. This combination spans size classes from 0-99 Mb to 1,000-1,999 Mb. Overall, no particular trend can be detected between genome size and the sequencing platform researchers opt for.

The genome size of Helichrysum arenarium had not been reported in the literature as of the end of 2016. We therefore rely on the 2014 study of several other species of the genus Helichrysum published by Azizi et al. Since these species displayed various degrees of polyploidy, the available data confine the genome size range for the genus Helichrysum to roughly 8,000-18,000 Mb. Based on this estimate and on our results on NGS platform usage in plant genome sequencing, Illumina/Solexa is clearly the most viable choice, especially given the financial weight of such a sequencing project. The price of a single run is the genome size in megabases multiplied by the cost per million bases in Table 3. For the estimated Helichrysum arenarium genome size range, this gives 560-1,260 USD using the Illumina/Solexa platform, 1,040-2,340 USD using ABI/SOLiD and 80,000-180,000 USD using Roche/454. However, additional run costs, including the necessary chemicals, consumables, labor, etc., must be taken into consideration. Our results also indicate that combining different platforms is a widespread practice, which makes the aforementioned price estimates somewhat incomplete. This does not, however, detract from the fact that using Illumina/Solexa alone is by far the most affordable approach. A further argument for this conclusion is that Illumina/Solexa is the most widespread sequencing platform, so the majority of research institutions most likely possess a device manufactured by this company. This also makes Illumina/Solexa platforms the most readily accessible to researchers who do not have a sequencer at their disposal in the institutions they are affiliated with.

4. DISCUSSION
Examining technical properties can help in selecting a proper sequencing platform. Longer reads are preferable for accurate assembly and for interpreting repetitive sequences, such as those of many plants of interest, so the Sanger method would be the most suitable. However, it is avoided because of its high cost, time and labor requirements. Sanger sequencing is often used in combination with NGS platforms for several purposes: library sequencing, sequencing of genome portions that NGS sequences improperly, and subsequent proofreading of certain genome portions. Turktas et al. suggest that Roche/454 technology, which offers the longest reads and highest speed among NGS platforms, appears to be the method of choice for plant WGS if total sequencing cost is not considered (3). This study shows, however, that this is not the case in practice. Illumina/Solexa platforms are by far the most preferred by researchers, either on their own or in combination with other platforms. In fact, Roche/454 was paired with Sanger sequencing alone in only 5 projects and was never used as the sole sequencing tool, presumably because of its high cost. A further indication that cost is the key consideration is that researchers favored Illumina/Solexa platforms across the widest range of genome sizes (Table 5, Figure 2). The error rates of Illumina/Solexa platforms are generally compensated for by their deep coverage. Deep coverage is not, however, sufficient to avoid gaps when a repetitive sequence is longer than the read length. Schatz et al. suggest using paired-end sequencing (54), and all Illumina/Solexa platforms are capable of performing it. This may be another reason why researchers favor Illumina/Solexa to the extent presented in this study. Additionally, it has been reported on several occasions that Illumina has recently been the most successful sequencing platform in terms of market share and prevalence, to the point of near monopoly. This further explains why it is the most popular choice among scientists performing plant genome sequencing (55, 56). Azizi et al. reported genome sizes of several species of the genus Helichrysum (56), but there are no exact data on the size of the H. arenarium genome. Hence, we relied on an estimated size range when calculating the price range of a single sequencing run using the Illumina/Solexa platform. Clearly, karyotype and genome size analysis of H. arenarium would provide valuable information prior to sequencing and reduce the risk of project failure for financial reasons.

5. CONCLUSION
Among 47 published draft plant genomes, 19 were obtained using Illumina/Solexa platforms alone (40.42%), while an additional 23 sequences were obtained by combining Illumina/Solexa with other platforms and techniques (48.94%). The use of Illumina/Solexa platforms also encompasses nearly the entire range of sizes