
Journal of Northeast Agricultural University (English Edition), September 2014, Vol. 21 No. 3, 84-96
Available online at www.sciencedirect.com (ScienceDirect)

High-throughput Sequencing Technology and Its Application

Zhu Qiang-long, Liu Shi, Gao Peng, and Luan Fei-shi*

College of Horticulture, Northeast Agricultural University, Harbin 150030, China

Abstract: Gene sequencing is a powerful way to interpret life, and high-throughput sequencing technology is a revolutionary innovation in gene sequencing research. This technology is characterized by low cost and high data throughput. Currently, high-throughput sequencing technology is widely applied in multi-level research on genomics, transcriptomics and epigenomics. It has fundamentally changed the way we approach problems in basic and translational research and has created many new possibilities. This paper presents a general description of high-throughput sequencing technology and a comprehensive review of its applications in plain, concise and precise language, in order to help researchers finish their work faster and better and to help science amateurs understand the technology more easily.
Key words: high-throughput sequencing, data analysis, genome sequence, transcriptome sequence, bioinformatics
CLC number: S6 Document code: A Article ID: 1006-8104(2014)-03-0084-13

Received 29 October 2013
Supported by the National Natural Science Foundations of China (31272186; 31301791)
Zhu Qiang-long (1989-), male, Master, engaged in the research of bioinformatics and watermelon molecular breeding. E-mail: longzhu2011@126.com
* Corresponding author. Luan Fei-shi, professor, supervisor of Ph. D students, engaged in the research of watermelon and melon molecular breeding. E-mail: luanfeishi@sina.com
E-mail: xuebaoenglish@neau.edu.cn
http://publish.neau.edu.cn

Introduction

With the in-depth study of the life sciences and the further development of biotechnology, more and more scientists recognize that the whole genome sequence of a species is the fundamental basis and an important clue for revealing the nature of its life. The discovery of the DNA double helix (Watson and Crick, 1953), the cracking of the genetic code (Nirenberg et al., 1966), and the successful completion of the first complete genome map (Sanger et al., 1977) have undoubtedly become important milestones in the history of the life sciences, and have made more scientists profoundly recognize that sequencing technology plays an important role in life science research. Rapid sequencing technology has made DNA sequencing one of the most important methods of molecular analysis (Sanger et al., 1992). This technology provides important data for basic biological studies, such as the disclosure of genetic information and the regulation of gene expression. With the appearance of Roche's 454 technology (2005), Illumina's Solexa technology (2006) and ABI's SOLiD technology (2007), high-throughput sequencing technology has developed enormously, and large amounts of genetic information have been successively revealed, which allow us to explore the essence of life in detail, to uncover the huge diversity of novel genes that are currently inaccessible, to understand nucleic acid therapeutics, to better integrate biological information for a complete picture of health and disease at a personalized level, and to move to advances that we cannot yet imagine (Kahvejian et al., 2008). Therefore, a number of bioinformatics methods and software tools have been created to accelerate the wide application of high-throughput sequencing technology in research on genomics, transcriptomics and epigenetics. High-throughput sequencing technology has fundamentally changed the way we approach problems in basic and translational research and has created many new possibilities. However, it has also brought new challenges for bioinformatics: how to effectively process and analyze these massive data and extract valuable biological information from them has become the key that decides whether high-throughput sequencing technology plays a major role in scientific exploration. In this article, we intend to present a comprehensive and systematic introduction to high-throughput sequencing technology and its applications for enthusiasts of biological science in plain, concise and precise language, hoping to help researchers finish their work faster and better and to help science amateurs understand the technology more easily. Meanwhile, we take data generated from the Illumina Hiseq 2000 sequencing platform as an example to present a more complete description of the basic procedure, key methods and existing software for sequencing data generation, data processing and analysis.

History of High-throughput Sequencing Technology Development

High-throughput sequencing technology comprises the second generation sequencing technologies launched by Roche/454, Illumina/Solexa and ABI/SOLiD, which build on Sanger sequencing, and the single-molecule sequencing technologies announced by Helicos (Heliscope™) and Pacific Biosciences. It is also called deep sequencing technology (Sultan et al., 2008) or next-generation sequencing technology (NGS) (Schuster, 2008).

The 1st generation sequencing technology
In 1977, Sanger of Cambridge and Gilbert of Harvard almost simultaneously published their different methods of DNA sequencing in the same journal (Maxam and Gilbert, 1977; Sanger et al., 1977). Their inventions first opened a door for researchers to study the genetic code of life in depth, and brought hope for the development of faster and more efficient sequencing technology. The Sanger method is a dideoxy chain termination method, while the Gilbert method is a chemical degradation method. The former is more convenient and more suitable for optical and automatic detection, so it gradually replaced the latter and became the most widely applied sequencing method in the life sciences. Sanger won the 1980 Nobel Prize in chemistry for it (Sanger, 1988). Most automated DNA sequencers are based on this method. Its principle is as follows. When a nucleic acid template is replicated in the presence of DNA polymerase, a primer, and the four deoxynucleotide triphosphates (dNTPs, one of them labeled with radioactive 32P), one of the four dideoxynucleotide triphosphates (ddNTPs) is added in proportion to each of four reaction systems. Because a dideoxynucleotide has no 3'-OH, extension stops as soon as a dideoxynucleotide is appended to the end of the chain, whereas the chain continues to be extended if a deoxynucleotide is appended. Thus a series of nucleic acid fragments of different lengths, each with a dideoxynucleotide at its 3' end, is synthesized in each reaction system. After termination of the reaction, the fragments are separated by gel electrophoresis in four lanes, where neighboring fragments differ by one nucleotide. After autoradiography, the order of bases in the synthesized strand can be read according to the dideoxynucleotide at the 3' end of each fragment (Xie et al., 2010). Subsequently, a variety of DNA sequencing technologies based on this method have been developed, the most important of which is fluorescent automated sequencing technology (Fig. 1) (http://en.wikipedia.org/wiki/File:Sanger-sequencing.svg). This generation of sequencing technology played a key role in the human genome project and accelerated its completion. The sequencer

using this technology is still in use today and will continue to play an important role, because it has an obvious advantage in raw data quality and read length, and it has been used widely in different fields, especially in the sequencing of PCR products, the end sequencing of plasmids and bacterial artificial chromosomes, and Short Tandem Repeat (STR) genotyping (Zhou et al., 2010). The dependence on electrophoretic separation, however, makes it difficult to further enhance the speed of analysis and to reduce sequencing cost by miniaturization. Therefore, new technologies that break these limitations are needed.

[Figure 1 panels: ① Reaction mixture (primer and DNA template, DNA polymerase, ddNTPs with fluorochromes, dNTPs: dATP, dCTP, dGTP and dTTP); ② Primer elongation and chain termination; ③ Capillary gel electrophoresis separation of DNA fragments; ④ Laser detection of fluorochromes and computational sequence analysis (chromatograph)]

Fig. 1 Sanger (chain-termination) method for DNA sequencing


A primer is annealed to a sequence; reagents are added to the primer and template, including DNA polymerase, dNTPs, and a small amount of all four dideoxynucleotides (ddNTPs) labeled with fluorophores. During primer elongation, the random insertion of a ddNTP instead of a dNTP terminates synthesis of the chain because DNA polymerase cannot react with the missing hydroxyl. This produces chains of all possible lengths; the products are separated on a single-lane capillary gel, where the resulting bands are read by an imaging system. This produces several hundred thousand nucleotides a day, data which require storage and subsequent computational analysis.
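
The read-out logic in the Fig. 1 caption can be illustrated with a toy sketch. It assumes an idealized reaction in which exactly one terminated fragment exists for every possible length, and it ignores the primer, signal noise and electrophoresis physics. The function names and the example template are hypothetical and are not taken from any sequencing software.

```python
import random

# Watson-Crick pairing used to build the newly synthesized strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def chain_termination_fragments(template_3_to_5, seed=0):
    """Return the terminated fragments of the new strand in random order.

    The template is written 3'->5', so the new strand (read 5'->3') is its
    base-by-base complement. Each fragment ends with the labeled ddNTP
    that stopped elongation at that position.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3_to_5)
    fragments = [new_strand[:n] for n in range(1, len(new_strand) + 1)]
    random.Random(seed).shuffle(fragments)  # the reaction tube is unordered
    return fragments


def read_sequence(fragments):
    """Mimic gel separation: sort by length, then read each terminal base."""
    ordered = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ordered)


if __name__ == "__main__":
    fragments = chain_termination_fragments("TACGGATC")
    print(read_sequence(fragments))
```

Sorting by fragment length stands in for the capillary gel, and reading the last base of each fragment stands in for laser detection of the fluorescent ddNTP.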

The second generation sequencing technology
In contrast to the Sanger sequencing method, second generation sequencing technology is also called next-generation sequencing technology. It is mainly classified into three major techniques: Roche/454 pyrosequencing (2005), Illumina/Solexa sequencing by polymerase synthesis (2006) and ABI/SOLiD sequencing by ligation (2007). Compared with Sanger sequencing, the prominent feature shared by these three next-generation sequencing technologies is that they output massive amounts of data in a single run, so they are also known as high-throughput sequencing technologies (Ansorge, 2009). Their core idea is sequencing-by-synthesis. When generating a new complementary strand of cDNA, they either

added normal dNTPs and used an enzymatic cascade reaction to catalyze substrates that emit light (Roche/454), or directly incorporated fluorescently labeled dNTPs (Illumina/Solexa) or semi-degenerate primers (ABI/SOLiD), so that a fluorescent signal is released when the complementary chain is synthesized or ligated. By capturing the optical signal and converting it into a sequencing peak, the sequence information of the complementary strand can be obtained. High-throughput sequencing technology achieved massively parallel sequencing (MPSS), so the cost per base fell below that of the Sanger method, and it has been applied in multi-level research in medical science, agricultural science and life science (Aksyonov et al., 2006).

The third generation sequencing technology
Although second generation sequencing technology has greatly improved on the first generation and is more widely used in many respects, it is still built on PCR amplification. In order to reduce the bias and cost caused by PCR amplification, scientists are now developing third generation sequencing technologies that directly sequence a single molecule of DNA. The most representative technologies include Heliscope single-molecule sequencing, single molecule real-time sequencing and nanopore sequencing. The Helicos technology is a single-molecule sequencing technology based on total internal reflection microscopy (TIRM). It completely gave up the signal amplification process of sequencing platforms based on PCR amplification, but was still based on the sequencing-by-synthesis principle (Harris et al., 2008). It used new fluorescent analogs and a sensitive monitoring system capable of directly recording fluorescence from a single nucleotide, thereby overcoming the defect of other methods, which need to simultaneously test thousands of copies of the same gene to increase the signal intensity. Soon after, Pacific Biosciences developed another single-molecule sequencing technology, single molecule real-time technology (SMRT) (Eid et al., 2009). This technology takes full advantage of DNA polymerase and can be vividly described as real-time observation of DNA polymerase through a microscope. In a word, it records the entire process of DNA synthesis. The nanopore sequencing technique (Rusk, 2009) uses the subtle changes of electrostatic induction caused by different bases passing through a nanopore to identify the type of each base. It can also detect some important information, for example, whether a base is methylated or not.

Main Methods and Steps of High-throughput Data Analyses

Data generated from the Illumina Hiseq 2000 sequencing platform (Fig. 2) (http://bitesizebio.com/13546/sequencing-by-synthesis-explaining-the-illumina-sequencing-technology/) are taken as an example to present a more complete description of the basic procedure, key methods and existing software for sequencing data generation, data processing and analysis.

Statistics and filtering of raw sequence data
Through base calling, the original image data are transformed into sequence data, which are called raw data or raw reads. They are usually stored in a file in fastq format, the most original file that users receive, which stores not only the sequence of each read but also its sequencing quality. Each read in the fastq file is described by four lines:
 @WATERMALON:1:8:6:490
 CCACTGTCATGTGAACATCACAGAGACATTTCTTGA
 +
 bbbbbbbbbbbbbbbbbbbbbbbbbaaaaaaaaa_ \ \
Lines 1 and 3 are sequence names generated by the sequencing machine; line 2 is the sequence; line 4 is the quality string, in which each letter corresponds to a base in line 2; we calculate the sequencing quality of


each base in line 2 by subtracting 64 from the ASCII value of the corresponding letter in line 4 (the sequencing quality value). For example, the ASCII value of c is 99, so the corresponding sequencing quality value is 35. Sequencing quality values range from 2 to 35. After the data are output, statistics should be compiled on the reads obtained from sequencing each sample, including the length of reads per sample, the number of nucleotides, GC content and so on, which helps to assess whether the data quality meets the requirements for analysis. The original data then still need some basic pre-processing according to the result, for example, removing reads containing adaptors, removing reads with an N ratio greater than 5%, and removing low quality reads (reads in which bases with Q≤20 make up 50% or more of all bases), to obtain clean reads for further analyses.
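
The quality decoding and two of the filters described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used by the authors: the offset of 64 and the thresholds (N ratio above 5%, Q≤20 for 50% or more of bases) come from the text, while the function names, the toy reads and the omission of adaptor removal are choices made for this example.

```python
PHRED_OFFSET = 64  # offset stated in the text for this fastq encoding


def phred_scores(quality_line, offset=PHRED_OFFSET):
    """Convert a quality string into per-base quality values."""
    return [ord(letter) - offset for letter in quality_line]


def parse_fastq(lines):
    """Group non-empty lines into (name, sequence, quality) records."""
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    records = []
    for i in range(0, len(lines) - 3, 4):
        name, sequence, _plus, quality = lines[i:i + 4]
        records.append((name.lstrip("@"), sequence, quality))
    return records


def is_clean(sequence, quality, max_n_ratio=0.05, low_q=20, low_fraction=0.5):
    """Apply the N-ratio and low-quality filters described in the text."""
    if not sequence or len(sequence) != len(quality):
        return False
    if sequence.upper().count("N") / len(sequence) > max_n_ratio:
        return False  # N ratio greater than 5%
    low = sum(1 for q in phred_scores(quality) if q <= low_q)
    return low / len(sequence) < low_fraction  # reject at 50% or more


if __name__ == "__main__":
    # Hypothetical toy reads, not data from the paper.
    toy = ["@read1", "ACGTACGT", "+", "cccccccc",
           "@read2", "ACGNNCGT", "+", "cccccccc"]
    clean = [r for r in parse_fastq(toy) if is_clean(r[1], r[2])]
    print([name for name, _seq, _qual in clean])
```

Adaptor removal is left out because it depends on the adaptor sequences of the library kit, which the text does not specify.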

[Figure 2 workflow: genomic DNA; shear; select 200-300 bp fragments; attach adapters to create sequencing library; apply to flowcell; cluster generation by solid phase PCR (bridge amplification); sequencing by synthesis with reversible terminators]

Fig. 2 Sequencing method of Illumina Hiseq 2000


First, the DNA sample is prepared into a sequencing library by fragmenting it into pieces each around 200 bases long. Custom adapters are added to each end, the library is flowed across a solid surface (the flow cell) and the template fragments bind to its surface. Next, a solid phase bridge amplification PCR process (cluster generation) creates approximately one million copies of each template in tight physical clusters on the flow cell surface. Finally, data result from sequencing by synthesis with reversible terminators.

Data assembly and mapping
It mainly comprises re-sequencing, which uses a reference genome to locate reads, and de novo genome sequencing, which assembles reads without a reference. Re-sequencing read mapping refers to data assembly with a reference genome. Once the raw data are generated, all reads are first sorted by length, then mapped to the reference genome and compared with it to pick out all well-matched reads, which is important for all subsequent processing and analysis. Reference-based assembly has been widely applied in model plants with reference genomes (Birol et al., 2009; Cheung et al., 2006). The software tools most often used are BWA (Li and Durbin, 2009), SOAP2 (Li et al., 2009), Bowtie (Langmead et al., 2009), MAQ (Li et al., 2008), ZOOM (Lin et al., 2008), TopHat and Cufflinks (Trapnell et al., 2012), etc. In de novo sequencing assembly, the sequencing reads are first assembled into contigs, the contigs are then assembled into scaffolds, and N is used to fill the intermediate gaps. Finally, sequences that contain no N and cannot be extended at either end are obtained, which are called unigenes. A blastx alignment (e-value<0.00001) between the unigenes and protein databases such as Non-redundant (Nr), UniProt Knowledgebase (UniProtKB), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Cluster of Orthologous Groups (COG) is then performed, and the best alignment results are used to decide the sequence orientation of the unigenes. If results from different databases conflict with each other, the priority order Nr, UniProtKB, KEGG and COG should be followed. When a unigene aligns to none of the above databases, the software ESTScan (Iseli et al., 1999) is used to decide its sequence direction. De novo assembly provides an efficient way to quickly obtain expressed genes from short sequences when no reference sequence is available. Owing to its longer reads, Roche/454 technology was more suitable for assembly, while it was considerably more difficult for Illumina and SOLiD technology to splice short reads into a long sequence (Weber et al., 2007). In recent years, researchers have designed a variety of software tools to solve these problems and make Illumina data more suitable for assembly, and have achieved good assembly results. The three sequencing platforms have been widely applied to many non-model animals and plants (Butler et al., 2008). Currently, the most commonly used software tools are Trinity (Grabherr et al., 2011), SOAPdenovo, Velvet (http://www.ebi.ac.uk/~zerbino/velvet/), etc.

Identification and functional annotation of genes
Currently, the main principles of gene identification and functional annotation are the following: (1) Sequence searching. Its hypothesis is: sequence similarity = homology = similar function. (2) Sequence motif. When there is no significant sequence homology, it is used to find local features of the sequence. (3) COGs of proteins. (4) Subcellular localization, which predicts gene function by predicting subcellular localization. (5) Structure comparison, which first predicts the protein structure of an unknown gene and then predicts its function through structure comparison. (6) Proteomics, which predicts protein function from protein interaction networks or other biomolecular networks. Sequence searching, based on the assumption that homologous sequences have similar functions, is widely used, and most websites and software tools for annotating gene function are at present primarily based on this principle. They take full advantage of bioinformatics methods to infer the function of unknown genes by searching the unknown gene sequences (e.g. unigenes) against public databases and obtaining the annotated sequences most similar to the query sequences. The main annotated nucleotide databases are GenBank (NCBI), EMBL, DDBJ, etc.; the main protein databases are Nr, UniProtKB, TrEMBL, COG, etc. The main alignment software includes BLAST, FASTA, etc. There are mainly two ways to annotate gene function at present: Gene Ontology (GO) classification and KEGG functional classification. GO is an internationally standardized gene functional classification system which offers a dynamically updated controlled vocabulary and strictly defined concepts to comprehensively describe the properties of genes and their products in any organism. GO has three ontologies: molecular function, cellular component and biological process, which are applicable to defining and describing gene function in any species (Ashburner et al., 2000). KEGG is a database that can analyze gene products in metabolic processes and related gene functions in cellular processes. With the help of the KEGG database, we can further study the complex biological behaviors of genes (Altermann and Klaenhammer, 2005). The software tools most often used include Blast2GO (Conesa et al., 2005), WEGO (Ye et al., 2006), GoMiner (Zeeberg et al., 2003), DAVID (Dennis et al., 2003), VisANT (Hu et al., 2009), etc.

Application of High-throughput Sequencing Technology

It could be argued that the greatest transformative aspect of the Human Genome Project has not been the sequencing of the genome itself, but the resultant development of new technologies. Since 454 Company developed the first fully automatic sequencer in 2005 and opened a new era of high-throughput DNA sequencing, the technology has developed by leaps and brought genome-level research into a new era. Meanwhile, this technology has raised molecular biologists' basic knowledge of genomics to a higher level. Kahvejian et al. (2008) mentioned that high-throughput sequencing technology would allow us to move to advances that we could not yet imagine.

DNA level application
Full genome sequencing
Full genome sequencing, also known as de novo sequencing, directly sequences the whole genome of a species and then obtains its complete genome sequence by using bioinformatics methods to splice and assemble the reads. The technology is very important for a comprehensive understanding of the molecular evolution of a species and of its gene composition and regulation. It has greatly promoted whole-genome sequencing of non-model species, and has freed scientists from the obstacles that non-model organisms had a relatively poor genetic background and little basic research on their genes. Full genome sequencing has been applied in many research areas, especially in biology. Biologists can use this technology to sequence the genomes of important species, which helps them to learn gene sequence information, to elucidate the evolution of the species and to better understand the molecular mechanisms of life. Watermelon (Citrullus lanatus) is an important cucurbit crop grown throughout the world. Guo et al. (2013) used high-throughput sequencing technology and the Sanger method to complete the watermelon genome sequencing, and reported a high-quality draft genome sequence of the east Asia watermelon cultivar 97103 (2n=2×=22) containing 23 440 predicted protein-coding genes. Comparative genomics analysis provided an evolutionary scenario for the origin of the 11 watermelon chromosomes derived from a 7-chromosome paleohexaploid eudicot ancestor. Resequencing of 20 watermelon accessions representing three different C. lanatus subspecies produced numerous haplotypes and identified the extent of the genetic diversity and population structure of watermelon germplasm. Genomic regions that were preferentially selected during domestication were identified. Many disease resistance genes were also found to be lost during domestication. In addition, integrative genomic and transcriptomic analyses gave important insights into aspects of phloem-based vascular signaling in common between watermelon and cucumber and identified genes crucial to valuable fruit-quality traits, including sugar accumulation and citrulline metabolism. Meanwhile, genomic

information of more and more species has been published, such as potato (Solanum tuberosum) (Xu et al., 2011), Chinese cabbage (Brassica rapa) (Wang et al., 2011), apple (Malus domestica Borkh) (Velasco et al., 2010), and cucumber (Cucumis sativus L.) (Huang et al., 2009). In addition, by virtue of its sensitivity to trace DNA, high-throughput sequencing technology has also been widely used in paleontology research. Rasmussen et al. (2010) extracted DNA from a tuft of hair of an Eskimo who lived 4 000 years ago, sequenced the whole genome, recovered about 79% of its sequence and compared it with the modern human genome sequence, which provided important information for exploring human evolutionary history.
The whole genome re-sequencing
On April 17, 2008, U.S. scientists published in Nature the genome sequencing results of James D. Watson, the "father of DNA", which were the first whole genome re-sequencing results obtained through high-throughput sequencing technology (Wheeler et al., 2008). Whole genome re-sequencing sequences the genomes of different individuals of a species for which a reference genome is known, and then conducts differential analysis among individuals or groups. Currently, re-sequencing with a reference genome is widely applied in second generation sequencing and is rapidly becoming an effective breeding method; it has great scientific value for scanning the whole genome and detecting mutation sites associated with important traits of plants and animals. By re-sequencing, scientists in agriculture can obtain large numbers of Single Nucleotide Polymorphisms (SNPs), Insertions/Deletions (InDels), Structure Variations (SVs), and population polymorphisms. Zheng et al. (2011) carried out whole genome re-sequencing of three lines of Sorghum bicolor at a sequencing depth of 12 times, and took the American grain sorghum genome sequence as a reference for information analysis. They uncovered 1 057 018 SNPs, 99 948 InDels of 1-10 bp in length, 16 487 Presence/Absence Variations (PAVs) and 17 111 Copy Number Variations (CNVs). Meanwhile, they identified a cluster of nearly 1 500 genes with structural differences between sweet sorghum and grain sorghum. These genes were involved in sugar and starch metabolism, lignin and coumarin synthesis, nucleic acid metabolism, stress response, biological processes and DNA repair. In addition, in the field of evolution, scientists can apply population polymorphism analysis to explore evolutionary models in different species; in microbiology, genotyping by DNA sequencing has proven faster and more accurate; in the medical field, genome re-sequencing is important for discovering relationships between SNPs and major diseases.

RNA level application
Transcriptome sequencing
Transcriptome sequencing, also known as RNA-seq or mRNA-seq, enriches single-stranded mRNA from total RNA and reverse transcribes it into double-stranded cDNA, which is then used for high-throughput sequencing and subsequent analysis. The transcriptome is the foundation and starting point for studying gene function and structure. With a reference genome sequence, scientists can obtain much more information, such as gene expression, alternative splicing, optimized gene structures, and new genes, by comparing RNA-seq data with genomic DNA sequences. For species without a reference genome, de novo sequencing plays an important role in transcriptome studies, and can be used effectively to discover new genes and develop new molecular markers. Guo et al. (2011) used half a Roche/454 GS-FLX run to identify more than 5 000 Simple Sequence Repeats (SSRs). Transcriptional regulation is the most important form of regulation, and transcriptome sequencing studies built on high-throughput sequencing have gradually replaced gene chip technology as one of the main current approaches to studying gene expression at the whole-genome level. By transcriptome sequencing, researchers could get


transcript expression abundance, transcriptional loci, alternative splicing, SNPs and other important information. Zhang et al. (2010) took Oryza sativa L. ssp. indica cv. 9311 as material to study the rice transcriptome. They extracted total RNA from callus, roots at the 14-day seedling stage, shoots at the 14-day seedling stage, flag leaves at tillering stage, flag leaves at flowering stage, panicles at booting stage, panicles at flowering stage, and panicles at filling stage, then sequenced the total RNA from each sample and presented transcriptome maps of different organs of cultivated rice. They detected 7 232 novel transcript units, which have low expression abundance and tissue specificity, 23 800 alternative splicing events occurring in 33% of the rice genome, 1 356 highly reliable chimeric fusions, and 234 candidate chimeric transcripts, suggesting that transcriptional fusion is more common than expected. These data provide a solid foundation for future functional studies of the complex mechanisms of transcriptional regulation in rice. Besides, the technology has been applied in potato (Shan et al., 2013), watermelon (Grassi et al., 2013; Guo et al., 2011), pea (Liu et al., 2013), green tea (Pan et al., 2014) and so on.
Digital gene expression profiling technology
Digital Gene Expression (DGE) constructs an unbiased cDNA library of cells or tissue in a particular state. Through large-scale cDNA sequencing, collection of cDNA sequence fragments, and qualitative and quantitative analysis of the mRNA population, scientists can obtain the types and abundance of genes expressed in that cell or tissue in that state. In the past, gene expression profiling mainly relied on conventional microarray technology, which depends on known gene sequences to design probes for fluorescence labeling and hybridization and calculates expression levels from fluorescence intensity; its error was large, and it could not detect the expression of unknown genes. In contrast, DGE is more sensitive, accurate and suitable for comparative gene expression studies, and is gradually taking the place of gene chip technology, avoiding many of the inherent limitations of microarray analyses. The combination of transcriptome sequencing and DGE technologies can effectively explore new functional genes in species with or without a reference genome. Luan et al. (2011) used the DGE method to analyze gene expression variations between nonviruliferous and viruliferous whiteflies, revealed the relationship of coevolved adaptations between begomoviruses and whiteflies, and provided a road map for future investigations into the complex interactions between plant viruses and their insect vectors. In addition, Gao et al. (2014) performed DGE to investigate the gene expression profiles of the 4008 and p50 silkworm strains to provide important clues on the molecular mechanism of BmCPV invasion and the resistance mechanism of silkworms against BmCPV infection. Yan et al. (2014) applied comparative DGE and quantitative real time PCR to identify five transcripts encoding proteins putatively associated with scent biosynthesis in roses and provided a foundation for scent-related gene discovery in roses.
MicroRNA sequencing
MicroRNA is a non-coding RNA of only about 20-30 nucleotides, yet it plays a significant role in vivo. Post-transcriptional gene regulation by microRNA is a novel biological mechanism of gene regulation. In recent years, it has attracted wide attention in the scientific community, and the rise of high-throughput sequencing has brought new ideas to microRNA research. Although high-throughput sequencing is limited by short read length, it is very suitable for sequencing microRNA. Researchers can take advantage of it to predict new microRNAs, study conserved microRNAs, establish microRNA expression profiles, compare miRNA expression abundance and find other non-coding RNAs through sequencing. Wei et al. (2009) used high-throughput sequencing to study the small RNAs of locusts. By similarity searching against the miRBase database, they identified 50 conserved microRNA families, and identified 185

unique microRNA families of locust through bioinformatics analysis. Analysis of microRNA expression in gregarious and solitary locusts revealed that microRNA expression is richer in the solitary phase than in the gregarious phase, and yielded microRNA expression profiles for the two lifestyles. The technology has also been applied successfully to rice (Hu et al., 2014; Liu et al., 2014; Mittal et al., 2013; You et al., 2014) and peach (Zhenlin, 2013).

Epigenomics applications
Chromatin immunoprecipitation sequencing
Chromatin immunoprecipitation sequencing (ChIP-Seq) is a powerful tool for studying interactions between proteins and DNA in vivo. It combines the advantages of chromatin immunoprecipitation (ChIP) and high-throughput sequencing, and has been applied successfully in genome-wide studies of protein binding sites, transcription factor binding sites and specific histone modification sites. Thus, scientists can obtain genome-wide information on the DNA segments that interact with transcription factors or histones. Li et al. (2014) used ChIP-Seq to predict estrogen receptor (ER) binding sites in the human breast cancer cell line MCF7. Their results showed that E2 stimulated breast cancer cell growth through ER, which may indicate a role for ER in the occurrence and development of breast cancer. In recent years, ChIP-Seq has also been applied mainly in studies on rodents (Corbo et al., 2010; Hull et al., 2013; Rintisch et al., 2014; Triff et al., 2013) and humans (Liu and Cheung, 2014; Pinho et al., 2013; Xing et al., 2013; Zheng et al., 2013).
DNA methylation sequencing
DNA methylation is another important means of gene regulation, which can control gene expression by altering chromatin structure, DNA stability and DNA-protein interactions. Currently, at least three kinds of DNA methylation analysis techniques are based on high-throughput sequencing: methylated DNA immunoprecipitation sequencing (MeDIP-Seq) (Down et al., 2008), methyl-binding protein sequencing (MBD-Seq) and bisulfite sequencing (BS-Seq) (Cokus et al., 2008). High-throughput sequencing technology also provides an efficient solution for detecting genome-wide methylation sites. Taylor et al. (2007) applied 454 sequencing technology to reveal an association between a single nucleotide polymorphism and the methylation present in the LRP1B promoter. They concluded that this new generation of methylome sequencing would provide digital profiles of aberrant DNA methylation for individual human cancers and offer a robust method for the epigenetic classification of tumor subtypes. Currently, DNA methylation sequencing technology has yielded fruitful results in studies of CpG methylation (Li et al., 2013; Nan et al., 1998; Shanmuganathan et al., 2013) and cancer (Calcagno et al., 2013), and has provided an important alternative to conventional approaches in human brain studies (Houston et al., 2013).

Conclusions

High-throughput sequencing technology is still at an early stage of development, but we can foresee that the next few years will be a golden time for the rapid development of third-generation sequencing technology and for the coexistence of three generations of sequencing technologies. With the appearance of new sequencing technologies, the cost of sequencing will continue to decline rapidly. Development of new drugs for incurable diseases and molecular breeding technology for fine breeds will become easier and faster. Therefore, scientists in different fields will be able to spend less and less money on sequencing the genome or transcriptome of a species, achieve better experimental designs and obtain more new discoveries. In addition, how to analyze the massive sequencing data generated by high-throughput sequencing technology and extract valuable biological information from them will become a hot research topic in the future.
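
As one small example of the data analysis step mentioned above, the tag-counting idea behind DGE (sequence fragments are collected and their numbers taken as a measure of transcript abundance) can be sketched in Python. This is only an illustration under stated assumptions, not the pipeline of any study cited in this review: the gene names and tag counts are invented, and both the tags-per-million scaling and the pseudocount of 1 are common conventions that the review itself does not specify.

```python
import math
from collections import Counter


def tags_per_million(tag_counts):
    """Scale raw tag counts so that the library sums to one million."""
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    return {gene: count * 1_000_000 / total for gene, count in tag_counts.items()}


def log2_fold_change(tpm_a, tpm_b, pseudocount=1.0):
    """Return log2(B / A) per gene.

    The pseudocount (an assumption of this sketch) keeps the ratio finite
    for genes that are detected in only one of the two libraries.
    """
    genes = sorted(set(tpm_a) | set(tpm_b))
    return {
        gene: math.log2(
            (tpm_b.get(gene, 0.0) + pseudocount)
            / (tpm_a.get(gene, 0.0) + pseudocount)
        )
        for gene in genes
    }


# Hypothetical tag-to-gene assignments for two libraries (invented data).
library_a = Counter(["geneA"] * 30 + ["geneB"] * 60 + ["geneC"] * 10)
library_b = Counter(["geneA"] * 120 + ["geneB"] * 60 + ["geneD"] * 20)

tpm_a = tags_per_million(library_a)
tpm_b = tags_per_million(library_b)
lfc = log2_fold_change(tpm_a, tpm_b)

for gene, value in lfc.items():
    print(f"{gene}\t{value:+.2f}")
```

Because counts are scaled per library, libraries of different sequencing depth become comparable, and a gene with no probe on a microarray (here the hypothetical geneD) is still reported as long as its tags were sequenced.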


References

Aksyonov S A, Bittner M, Bloom L B, et al. 2006. Multiplexed DNA sequencing-by-synthesis. Analytical Biochemistry, 348(1): 127-138.
Altermann E and Klaenhammer T R. 2005. Pathway voyager: pathway mapping using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. BMC Genomics, 6: 49-55.
Ansorge W J. 2009. Next-generation DNA sequencing techniques. New Biotechnology, 25(4): 195-203.
Ashburner M, Ball C A, Blake J A, et al. 2000. Gene ontology: tool for the unification of biology. Nature Genetics, 25(1): 25-29.
Birol I, Jackman S D, Nielsen C B, et al. 2009. De novo transcriptome assembly with ABySS. Bioinformatics, 25(21): 2872-2877.
Butler J, MacCallum I, Kleber M, et al. 2008. ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Research, 18(5): 810-820.
Calcagno D Q, Gigek C O, Chen E S, et al. 2013. DNA and histone methylation in gastric carcinogenesis. World Journal of Gastroenterology, 19(8): 1182-1192.
Cheung F, Haas B J, Goldberg S M D, et al. 2006. Sequencing Medicago truncatula expressed sequenced tags using 454 life sciences technology. BMC Genomics, 7: 272-283.
Cokus S J, Feng S, Zhang X, et al. 2008. Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature, 452(7184): 215-219.
Conesa A, Gotz S, Garcia-Gomez J M, et al. 2005. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics, 21(18): 3674-3676.
Corbo J C, Lawrence K A, Karlstetter M, et al. 2010. ChIP-seq reveals the cis-regulatory architecture of mouse photoreceptors. Genome Research, 20(11): 1512-1525.
Dennis G, Sherman B T, Hosack D A, et al. 2003. DAVID: database for annotation, visualization, and integrated discovery. Genome Biology, 4(9): 12-22.
Down T A, Rakyan V K, Turner D J, et al. 2008. A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nature Biotechnology, 26(7): 779-785.
Eid J, Fehr A, Gray J, et al. 2009. Real-time DNA sequencing from single polymerase molecules. Science, 323(5910): 133-138.
Gao K, Deng X, Qian H, et al. 2014. Cytoplasmic polyhedrosis virus-induced differential gene expression in two silkworm strains of different susceptibilities. Gene, 539(2): 230-237.
Grabherr M G, Haas B J, Yassour M, et al. 2011. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nature Biotechnology, 29(7): 644-650.
Grassi S, Piro G, Lee J M, et al. 2013. Comparative genomics reveals candidate carotenoid pathway regulators of ripening watermelon fruit. BMC Genomics, 14(1): 781-793.
Guo S, Liu J, Zheng Y, et al. 2011. Characterization of transcriptome dynamics during watermelon fruit development: sequencing, assembly, annotation and gene expression profiles. BMC Genomics, 12: 454.
Guo S, Zhang J, Sun H, et al. 2013. The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions. Nature Genetics, 45(1): 51-58.
Harris T D, Buzby P R, Babcock H, et al. 2008. Single-molecule DNA sequencing of a viral genome. Science, 320(5872): 106-109.
Houston I, Peter C J, Mitchell A, et al. 2013. Epigenetics in the human brain. Neuropsychopharmacology, 38(1): 183-197.
Hu W, Wang T, Yue E, et al. 2014. Flexible microRNA arm selection in rice. Biochemical and Biophysical Research Communications, 447(3): 526-530.
Hu Z, Hung J-H, Wang Y, et al. 2009. VisANT 3.5: multi-scale network visualization, analysis and inference based on the gene ontology. Nucleic Acids Research, 37: 115-121.
Huang S, Li R, Zhang Z, et al. 2009. The genome of the cucumber, Cucumis sativus L. Nature Genetics, 41(12): 1275-1281.
Hull R P, Srivastava P K, Souza Z, et al. 2013. Combined ChIP-seq and transcriptome analysis identifies AP-1/JunD as a primary regulator of oxidative stress and IL-1 beta synthesis in macrophages. BMC Genomics, 14: 5-16.
Iseli C, Jongeneel C V and Bucher P. 1999. ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences. Proceedings International Conference on Intelligent Systems for Molecular Biology; ISMB. International Conference on Intelligent Systems for Molecular Biology, 12: 138-148.
Kahvejian A, Quackenbush J and Thompson J F. 2008. What would you do if you could sequence everything? Nature Biotechnology, 26(10): 1125-1133.
Langmead B, Trapnell C, Pop M, et al. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 10(3): 25-29.
Li H and Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25(14): 1754-1760.

Li H, Ruan J and Durbin R. 2008. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Research, 18(11): 1851-1858.
Li Q, Wang H, Yu L, et al. 2014. ChIP-seq predicted estrogen receptor biding sites in human breast cancer cell line MCF7. Tumor Biology, 35(5): 4779-4784.
Li R, Yu C, Li Y, et al. 2009. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics, 25(15): 1966-1967.
Li Z-G, Jiao Y, Li W-J, et al. 2013. Hypermethylation of two CpG sites upstream of CASP8AP2 promoter influences gene expression and treatment outcome in childhood acute lymphoblastic leukemia. Leukemia Research, 37(10): 1287-1293.
Lin H, Zhang Z, Zhang M Q, et al. 2008. ZOOM! Zillions of oligos mapped. Bioinformatics, 24(21): 2431-2437.
Liu H, Guo S, Xu Y, et al. 2014. OsmiR396d regulated OsGRFs function in floral organogenesis in rice through binding to their targets OsJMJ706 and OsCR4. Plant Physiology, 165(1): 160-174.
Liu M H and Cheung E. 2014. Estrogen receptor-mediated long-range chromatin interactions and transcription in breast cancer. Molecular and Cellular Endocrinology, 382(1): 624-632.
Liu Z, Ma L, Nan Z, et al. 2013. Comparative transcriptional profiling provides insights into the evolution and development of the zygomorphic flower of Vicia sativa (Papilionoideae). PLoS One, 8(2): 573-588.
Luan J-B, Li J-M, Varela N, et al. 2011. Global analysis of the transcriptional response of whitefly to Tomato yellow leaf curl China virus reveals the relationship of coevolved adaptations. Journal of Virology, 85(7): 3330-3340.
Maxam A M and Gilbert W. 1977. A new method for sequencing DNA. Proceedings of the National Academy of Sciences of the United States of America, 74(2): 560-564.
Mittal D, Mukherjee S K, Vasudevan M, et al. 2013. Identification of tissue-preferential expression patterns of rice miRNAs. Journal of Cellular Biochemistry, 114(9): 2071-2081.
Nan X, Ng H H, Johnson C A, et al. 1998. Transcriptional repression by the methyl-CpG-binding protein MeCP2 involves a histone deacetylase complex. Nature, 393(6683): 386-389.
Nirenberg M, Caskey T, Marshall R, et al. 1966. The RNA code and protein synthesis. Cold Spring Harbor Symposia on Quantitative Biology, 31: 11-24.
Pan J, Zhang Q, Xiong D, et al. 2014. Transcriptomic analysis by RNA-seq reveals AP-1 pathway as key regulator that green tea may rely on to inhibit lung tumorigenesis. Molecular Carcinogenesis, 53(1): 19-29.
Pinho F G, Frampton A E, Nunes J, et al. 2013. Downregulation of microRNA-515-5p by the estrogen receptor modulates sphingosine kinase 1 and breast cancer cell proliferation. Cancer Research, 73(19): 5936-5948.
Rasmussen M, Li Y, Lindgreen S, et al. 2010. Ancient human genome sequence of an extinct Palaeo-Eskimo. Nature, 463(7282): 757-762.
Rintisch C, Heinig M, Bauerfeind A, et al. 2014. Natural variation of histone modification and its impact on gene expression in the rat genome. Genome Research, 24(6): 942-953.
Rusk N. 2009. Cheap third-generation sequencing. Nature Methods, 6(4): 244-245.
Sanger F. 1988. Sequences, sequences, and sequences. Science, 280(5369): 1515-1515.
Sanger F, Nicklen S and Coulson A R. 1977. DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences of the United States of America, 74(12): 5463-5467.
Schuster S C. 2008. Next-generation sequencing transforms today's biology. Nature Methods, 5(1): 16-18.
Shan J, Song W, Zhou J, et al. 2013. Transcriptome analysis reveals novel genes potentially involved in photoperiodic tuberization in potato. Genomics, 102(4): 388-396.
Shanmuganathan R, Basheer N B, Amirthalingam L, et al. 2013. Conventional and nanotechniques for DNA methylation profiling. Journal of Molecular Diagnostics, 15(1): 17-26.
Sultan M, Schulz M H, Richard H, et al. 2008. A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science, 321(5891): 956-960.
Taylor K H, Kramer R S, Davis J W, et al. 2007. Ultradeep bisulfite sequencing analysis of DNA methylation patterns in multiple gene promoters by 454 sequencing. Cancer Research, 67(18): 8511-8518.
Trapnell C, Roberts A, Goff L, et al. 2012. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols, 7(3): 562-578.
Triff K, Konganti K, Gaddis S, et al. 2013. Genome-wide analysis of the rat colon reveals proximal-distal differences in histone modifications and proto-oncogene expression. Physiological Genomics, 45(24): 1229-1243.
Velasco R, Zharkikh A, Affourtit J, et al. 2010. The genome of the domesticated apple (Malus x domestica Borkh). Nature Genetics, 42(10): 833-840.


Wang X, Wang H, Wang J, et al. 2011. The genome of the mesopolyploid crop species Brassica rapa. Nature Genetics, 43(10): 1035-1157.
Watson J D and Crick F H. 1953. Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature, 171(4356): 737-738.
Weber A P M, Weber K L, Carr K, et al. 2007. Sampling the Arabidopsis transcriptome with massively parallel pyrosequencing. Plant Physiology, 144(1): 32-42.
Wei Y, Chen S, Yang P, et al. 2009. Characterization and comparative profiling of the small RNA transcriptomes in two phases of locust. Genome Biology, 10(1): 45-60.
Wheeler D A, Srinivasan M, Egholm M, et al. 2008. The complete genome of an individual by massively parallel DNA sequencing. Nature, 452(7189): 872-885.
Xing Y, Yang Y, Zhou F, et al. 2013. Characterization of genome-wide binding of NF-kappa B in TNF alpha-stimulated HeLa cells. Gene, 526(2): 142-149.
Xu X, Pan S, Cheng S, et al. 2011. Genome sequence and analysis of the tuber crop potato. Nature, 475(7355): 189-194.
Yan H, Zhang H, Chen M, et al. 2014. Transcriptome and gene expression analysis during flower blooming in Rosa chinensis 'Pallida'. Gene, 540(1): 96-103.
Ye J, Fang L, Zheng H, et al. 2006. WEGO: a web tool for plotting GO annotations. Nucleic Acids Research, 34: 293-297.
You J, Zong W, Du H, et al. 2014. A special member of the rice SRO family, OsSRO1c, mediates responses to multiple abiotic stresses through interaction with various transcription factors. Plant Molecular Biology, 84(6): 693-705.
Zeeberg B R, Feng W, Wang G, et al. 2003. GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biology, 4(2): 28-32.
Zhang G, Guo G, Hu X, et al. 2010. Deep RNA sequencing at single base-pair resolution reveals high complexity of the rice transcriptome. Genome Research, 20(5): 646-654.
Zheng L Y, Guo X S, He B, et al. 2011. Genome-wide patterns of genetic variation in sweet and grain sorghum (Sorghum bicolor). Genome Biology, 12(11): 114-120.
Zheng Y, Zha Y, Spaapen R M, et al. 2013. Egr2-dependent gene expression profiling and ChIP-Seq reveal novel biologic targets in T cell anergy. Molecular Immunology, 56(4): 530-536.
Zhenlin W. 2013. Identification and characterization of microRNAs and their targets in peach (Prunus persica). International Journal of Agriculture and Biology, 15(5): 1017-1020.
Zhou X G, Ren L F, Li Y T, et al. 2010. The next-generation sequencing technology: a technology review and future perspective. Science China Life Sciences, 53(1): 13-25.
