Methods in
Molecular Biology 1754
Computational
Systems Biology
Methods and Protocols
METHODS IN MOLECULAR BIOLOGY
Series Editor
John M. Walker
School of Life and Medical Sciences
University of Hertfordshire
Hatfield, Hertfordshire, AL10 9AB, UK
Edited by
Tao Huang
Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
This Humana Press imprint is published by the registered company Springer Science+Business Media, LLC part of
Springer Nature.
The registered company address is: 233 Spring Street, New York, NY 10013, U.S.A.
Contents
Preface
Contributors
1 DNA Sequencing Data Analysis
Keyi Long, Lei Cai, and Lin He
2 Transcriptome Sequencing: RNA-Seq
Hong Zhang, Lin He, and Lei Cai
3 Capture Hybridization of Long-Range DNA Fragments for High-Throughput Sequencing
Xing Chen, Gang Ni, Kai He, Zhao-Li Ding, Gui-Mei Li, Adeniyi C. Adeola, Robert W. Murphy,
Wen-Zhi Wang, and Ya-Ping Zhang
4 The Introduction and Clinical Application of Cell-Free Tumor DNA
Jun Li, Renzhong Liu, Cuihong Huang, Shifu Chen, and Mingyan Xu
5 Bioinformatics Analysis for Cell-Free Tumor DNA Sequencing Data
Shifu Chen, Ming Liu, and Yanqing Zhou
6 An Overview of Genome-Wide Association Studies
Michelle Chang, Lin He, and Lei Cai
7 Integrative Analysis of Omics Big Data
Xiang-Tian Yu and Tao Zeng
8 The Reconstruction and Analysis of Gene Regulatory Networks
Guangyong Zheng and Tao Huang
9 Differential Coexpression Network Analysis for Gene Expression Data
Bao-Hong Liu
10 iSeq: Web-Based RNA-seq Data Analysis and Visualization
Chao Zhang, Caoqi Fan, Jingbo Gan, Ping Zhu, Lei Kong, and Cheng Li
11 Revisit of Machine Learning Supported Biological and Biomedical Studies
Xiang-tian Yu, Lu Wang, and Tao Zeng
12 Identifying Interactions Between Long Noncoding RNAs and Diseases Based on Computational Methods
Wei Lan, Liyu Huang, Dehuan Lai, and Qingfeng Chen
13 Survey of Computational Approaches for Prediction of DNA-Binding Residues on Protein Surfaces
Yi Xiong, Xiaolei Zhu, Hao Dai, and Dong-Qing Wei
14 Computational Prediction of Protein O-GlcNAc Modification
Cangzhi Jia and Yun Zuo
15 Machine Learning-Based Modeling of Drug Toxicity
Jing Lu, Dong Lu, Zunyun Fu, Mingyue Zheng, and Xiaomin Luo
16 Metabolomics: A High-Throughput Platform for Metabolite Profile Exploration
Jing Cheng, Wenxian Lan, Guangyong Zheng, and Xianfu Gao
Index
Contributors
ADENIYI C. ADEOLA State Key Laboratory of Genetic Resources and Evolution, Kunming,
Yunnan, China; China-Africa Centre for Research and Education & Yunnan Laboratory
of Molecular Biology of Domestic Animals, Kunming, Yunnan, China; Animal Branch
of the Germplasm Bank of Wild Species, Kunming Institute of Zoology, Chinese Academy
of Sciences, Kunming, Yunnan, China
LEI CAI Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders
(Ministry of Education), Collaborative Innovation Center for Genetics and Development,
Bio-X Institutes, Shanghai Jiao Tong University, Shanghai, China
MICHELLE CHANG Key Laboratory for the Genetics of Developmental and Neuropsychiatric
Disorders (Ministry of Education), Collaborative Innovation Center of Genetics and
Development, Bio-X Institutes, Shanghai Jiao Tong University, Shanghai, China
JIAN CHEN State Key Laboratory of Transducer Technology, Institute of Electronics, Chinese
Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing,
China
QINGFENG CHEN School of Computer, Electronics and Information, Guangxi University,
Nanning, China; State Key Laboratory for Conservation and Utilization of Subtropical
Agro-bioresources, Guangxi University, Nanning, China
SHIFU CHEN HaploX Biotechnology, Shenzhen, Guangdong, China
XING CHEN State Key Laboratory of Genetic Resources and Evolution, Kunming, Yunnan,
China
JING CHENG Department of Medical Instrument, Shanghai University of Medicine
and Health Sciences, Shanghai, China
HAO DAI School of Life Sciences and Biotechnology, Shanghai Jiao Tong University,
Shanghai, China
ZHAO-LI DING Kunming Biological Diversity Regional Centre of Large Apparatus
and Equipments, Kunming, Yunnan, China; Public Technology Service Centre,
Kunming, Yunnan, China
BEIYUAN FAN State Key Laboratory of Transducer Technology, Institute of Electronics,
Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences,
Beijing, China
CAOQI FAN Peking-Tsinghua Center for Life Sciences, Academy for Advanced
Interdisciplinary Studies; Center for Bioinformatics, School of Life Sciences, Peking
University, Beijing, China
ZUNYUN FU State Key Laboratory of Drug Research, Drug Discovery and Design Center,
Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
JINGBO GAN Peking-Tsinghua Center for Life Sciences, Academy for Advanced
Interdisciplinary Studies; Center for Bioinformatics, School of Life Sciences, Peking
University, Beijing, China
SHAN GAO College of Life Sciences, Nankai University, Tianjin, People’s Republic of China;
Institute of Statistics, Nankai University, Tianjin, People’s Republic of China
XIANFU GAO Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology,
Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
KAI HE State Key Laboratory of Genetic Resources and Evolution, Kunming, Yunnan,
China
LIN HE Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders
(Ministry of Education), Collaborative Innovation Center for Genetics and Development,
Bio-X Institutes, Shanghai Jiao Tong University, Shanghai, China
CUIHONG HUANG HaploX Biotechnology, Shenzhen, Guangdong, China
LIYU HUANG Information and Network Center, Guangxi University, Nanning, China
TAO HUANG Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences,
Shanghai, China
CANGZHI JIA Department of Mathematics, Dalian Maritime University, Dalian, China
LEI KONG Peking-Tsinghua Center for Life Sciences, Academy for Advanced
Interdisciplinary Studies; Center for Bioinformatics, School of Life Sciences, Peking
University, Beijing, China
DEHUAN LAI School of Computer, Electronics and Information, Guangxi University,
Nanning, China
WEI LAN School of Computer, Electronics and Information, Guangxi University, Nanning,
China
WENXIAN LAN State Key Laboratory of Bio-Organic and Natural Product Chemistry,
Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, China
CHENG LI Peking-Tsinghua Center for Life Sciences, Academy for Advanced
Interdisciplinary Studies; Center for Bioinformatics, School of Life Sciences, Peking
University, Beijing, China; Center for Statistical Science, Peking University, Beijing,
China
GUI-MEI LI Kunming Biological Diversity Regional Centre of Large Apparatus and
Equipments, Kunming, Yunnan, China; Public Technology Service Centre, Kunming,
Yunnan, China
JUN LI HaploX Biotechnology, Shenzhen, Guangdong, China
BAO-HONG LIU State Key Laboratory of Veterinary Etiological Biology; Key Laboratory of
Veterinary Parasitology of Gansu Province; Lanzhou Veterinary Research Institute, Chinese
Academy of Agricultural Sciences, Lanzhou, Gansu, People’s Republic of China; Jiangsu
Co-Innovation Center for Prevention and Control of Animal Infectious Diseases and
Zoonoses, Yangzhou, People’s Republic of China
MING LIU HaploX Biotechnology, Nanshan District, Shenzhen, Guangdong, China
RENZHONG LIU HaploX Biotechnology, Shenzhen, Guangdong, China
KEYI LONG Key Laboratory for the Genetics of Developmental and Neuropsychiatric
Disorders (Ministry of Education), Collaborative Innovation Center for Genetics and
Development, Bio-X Institutes, Shanghai Jiao Tong University, Shanghai, China
DONG LU State Key Laboratory of Drug Research, Drug Discovery and Design Center,
Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China;
University of Chinese Academy of Sciences, Beijing, China
JING LU Key Laboratory of Molecular Pharmacology and Drug Evaluation (Yantai
University), Ministry of Education, Collaborative Innovation Center of Advanced Drug
Delivery System and Biotech Drugs in Universities of Shandong, School of Pharmacy, Yantai
University, Yantai, China
XIAOMIN LUO Drug Discovery and Design Center, State Key Laboratory of Drug Research,
Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
ROBERT W. MURPHY State Key Laboratory of Genetic Resources and Evolution, Kunming,
Yunnan, China; Centre for Biodiversity and Conservation Biology, Royal Ontario
Museum, Toronto, ON, Canada
GANG NI State Key Laboratory of Genetic Resources and Evolution, Kunming, Yunnan,
China; Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming,
Yunnan, China
YANG SHU Precision Medicine Center, State Key Laboratory of Biotherapy, Precision
Medicine Key Laboratory of Sichuan Province, West China Hospital, Sichuan University,
Chengdu, Sichuan, China
JUNBO WANG State Key Laboratory of Transducer Technology, Institute of Electronics,
Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences,
Beijing, China
LU WANG Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology,
Chinese Academy Science, Shanghai, China
WEN-ZHI WANG State Key Laboratory of Genetic Resources and Evolution, Kunming
Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China; Animal
Branch of the Germplasm Bank of Wild Species, Kunming Institute of Zoology, Chinese
Academy of Sciences, Kunming, Yunnan, China; Wildlife Forensics Science Services,
Kunming, Yunnan, China; Guizhou Academy of Testing and Analysis, Guiyang,
Guizhou, China
YI-YI WANG Department of Neurology, Tianjin Haihe Hospital, Tianjin, P.R. China
DONG-QING WEI School of Life Sciences and Biotechnology, Shanghai Jiao Tong University,
Shanghai, China
BING-DI XIE Department of Neurology, Tianjin Medical University General Hospital,
Tianjin, P.R. China
YI XIONG School of Life Sciences and Biotechnology, Shanghai Jiao Tong University,
Shanghai, China
HENG XU State Key Laboratory of Biotherapy, Precision Medicine Key Laboratory of
Sichuan Province, Precision Medicine Center, West China Hospital, Sichuan University,
Chengdu, Sichuan, China
MINGYAN XU HaploX Biotechnology, Shenzhen, Guangdong, China
YING XU Key Laboratory of Cell Differentiation and Apoptosis of Ministry of Education,
Department of Pathophysiology, Shanghai Jiao-Tong University School of Medicine,
Shanghai, China
YUNGANG XU Center for Systems Medicine, School of Biomedical Informatics, UTHealth at
Houston, Houston, TX, USA; Center for Bioinformatics and Systems Biology, Wake Forest
School of Medicine, Winston-Salem, NC, USA
XIANG-TIAN YU Key Laboratory of Systems Biology, Institute of Biochemistry and Cell
Biology, Chinese Academy Science, Shanghai, China
TAO ZENG Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology,
Chinese Academy Science, Shanghai, China
CHAO ZHANG PKU-Tsinghua-NIBS Graduate Program, School of Life Sciences, Peking
University, Beijing, China
HONG ZHANG Key Laboratory for the Genetics of Developmental and Neuropsychiatric
Disorders (Ministry of Education), Collaborative Innovation Center for Genetics and
Development, Bio-X Institutes, Shanghai Jiaotong University, Shanghai, China
YA-PING ZHANG State Key Laboratory of Genetic Resources and Evolution, Kunming,
Yunnan, China; Yunnan Laboratory of Molecular Biology of Domestic Animals,
Kunming, Yunnan, China; Animal Branch of the Germplasm Bank of Wild Species,
Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China;
Laboratory for Conservation and Utilization of Bio-resource and Key Laboratory for
Microbial Resources of the Ministry of Education, Yunnan University, Kunming, Yunnan,
China
GUANGYONG ZHENG Key Laboratory of Computational Biology, Bio-Med Big Data Center,
CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological
Sciences, Chinese Academy of Sciences, Shanghai, China
MINGYUE ZHENG State Key Laboratory of Drug Research, Drug Discovery and Design
Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai,
China
XIAOBO ZHOU Center for Systems Medicine, School of Biomedical Informatics, UTHealth at
Houston, Houston, TX, USA; Center for Bioinformatics and Systems Biology, Wake Forest
School of Medicine, Winston-Salem, NC, USA
YANQING ZHOU HaploX Biotechnology, Nanshan District, Shenzhen, Guangdong, China
PING ZHU Peking-Tsinghua Center for Life Sciences, Academy for Advanced
Interdisciplinary Studies; Center for Bioinformatics, School of Life Sciences, Peking
University, Beijing, China
XIAOLEI ZHU School of Life Sciences and Biotechnology, Shanghai Jiao Tong University,
Shanghai, China
YUN ZUO Department of Mathematics, Dalian Maritime University, Dalian, China
Chapter 1
DNA Sequencing Data Analysis
Keyi Long, Lei Cai, and Lin He
Abstract
Among the various kinds of biological data, the DNA sequence is undoubtedly fundamental. By obtaining
and analyzing particular DNA sequences, biologists can understand living systems more precisely. This
chapter gives an overview of DNA sequencing technology and of the analysis of its data: it introduces
DNA sequencing, several sequencing methods, and tools applied in data analysis, and it discusses the
advantages and disadvantages of each.
Key words DNA sequence, DNA sequencing, Data analysis, Sequence comparison, Methods and
tools
1 DNA Sequencing
Three essential molecules in life science are DNA, RNA, and protein;
they lay the foundation of all living creatures. Scientists around the
world work jointly to understand the mystery of life, and a great deal
of work has been done to figure out the relations between the structures
of these molecules and their properties.
For molecular biologists, information encoded in the
sequences of nucleic acid molecules is of vital importance since it
not only passes the genetic information from generation to genera-
tion but also influences function by transcription and translation.
Research at the frontiers of life science cannot be done without
obtaining and analyzing certain DNA sequences, which means
determining the particular order and number of the four bases—
adenine, guanine, cytosine, and thymine—in a strand of DNA.
Advances in recombinant DNA technology have allowed the isola-
tion of large numbers of biologically interesting fragments of
DNA [1].
1.1 Methods of DNA Sequencing
With the help of restriction endonucleases, large DNA molecules can be cut into small fragments
in an orderly fashion. Also, recombinant DNA techniques aid in purifying and characterizing
Tao Huang (ed.), Computational Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1754,
https://doi.org/10.1007/978-1-4939-7717-8_1, © Springer Science+Business Media, LLC, part of Springer Nature 2018
1.1.1 Traditional Methods
There are two basic methods of DNA sequencing: Maxam-Gilbert sequencing (also known as chemical
sequencing) and the chain termination method (also known as Sanger sequencing). The former
attaches a radioactive label to the 5′ end of the DNA and uses chemical treatment to generate
breaks at particular bases. Autoradiography then yields a series of dark bands, which represent
the radiolabeled DNA fragments. Sanger’s method, on the other hand, requires modified
di-deoxynucleoside triphosphates (ddNTPs). DNA polymerase I cannot distinguish normal
deoxynucleoside triphosphates (dNTPs) from ddNTPs, so it incorporates both. A new strand that
ends in a ddNTP lacks the 3′-OH group required for the formation of a phosphodiester bond
between two nucleotides, which stops the elongation of the DNA. By labeling the ddNTPs, we can
read off the DNA sequence [2].
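The chain termination principle can be illustrated with a small simulation. The following is only a minimal sketch under simplifying assumptions: the function names, the stop probability `p_stop`, and the number of simulated strands are hypothetical, the new strand is written in the same left-to-right order as the template, and no real gel or capillary chemistry is modeled.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def terminated_fragments(template, dd_base, p_stop=0.3, n_strands=2000, seed=0):
    """Simulate one reaction that contains a single kind of labeled ddNTP.

    Returns the set of fragment lengths at which synthesis stopped because
    the ddNTP `dd_base` was incorporated instead of the normal dNTP.
    """
    rng = random.Random(seed)
    # Simplification: the new strand is the base-by-base complement of the
    # template, written in the same direction as the template.
    new_strand = [COMPLEMENT[base] for base in template]
    lengths = set()
    for _ in range(n_strands):
        for length, base in enumerate(new_strand, start=1):
            if base == dd_base and rng.random() < p_stop:
                lengths.add(length)  # no 3'-OH group, so elongation ends here
                break
    return lengths

def read_sequence(template):
    """Combine the four reactions and order the fragments by length."""
    calls = {}
    for dd_base in "ACGT":
        for length in terminated_fragments(template, dd_base):
            calls[length] = dd_base  # label of the terminating ddNTP
    return "".join(calls[length] for length in sorted(calls))

if __name__ == "__main__":
    print(read_sequence("ATGCCGTA"))
```

Each fragment length is observed in exactly one of the four reactions, so sorting the fragments by length and reading their labels recovers the newly synthesized strand, provided every position was terminated at least once.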
Although Sanger’s method is effective in many respects, it can only read 450 bp in a single
reaction, and the process is time-consuming, which limits its use in sequencing large
fragments. After it had prevailed for decades, other methods, such as the shotgun strategy and
bridge PCR, were invented on the basis of this work and came into wide use. More importantly,
with the rapid development of science and technology, high-throughput sequencing methods have
been established; they now play an essential role in modern DNA sequencing because they can
process massive amounts of data in a short time.
1.1.2 High-Throughput (HTP) Sequencing Methods
Since the 1990s, a handful of new DNA sequencing methods have been invented; 454
pyrosequencing, Illumina (Solexa) sequencing, and SOLiD sequencing are the three most widely
used technologies. Other methods include massively parallel signature sequencing (MPSS), polony
sequencing, DNA nanoball sequencing, etc. These methods all share the characteristics of high
throughput and low cost, and together they are known as “next-generation” sequencing (NGS)
methods. The core idea of HTP methods is to sequence the DNA while the new strand is being
synthesized.
Table 1
Comparison of several high-throughput sequencing methods
Pyrosequencing 700 bp 99.9% 1 million 24 h $10 Long read size. Runs are
2.1 General Steps of DNA Sequencing Data Analysis
Generally, DNA sequencing data analysis includes these four steps:
• Trimming of overlapping sequences.
• Multiple alignments of template sequences.
• Consistency check between reading text and chromatogram peak data.
• Review and correction of software misreads.
To be more precise, DNA sequencing technology, especially Sanger sequencing, yields data in
the form of a chromatogram: a series of peaks in four different colors. Usually, after the
result file is opened in software such as Chromas Lite, red, black, green, and blue peaks are
shown, each color corresponding to a different DNA base. At both ends of the chromatogram there
are about 50 bases that are difficult to recognize. This is caused by impurities and is a
normal phenomenon.
When screening the chromatogram, we are likely to find two overlapping peaks. Such a spot may
seem to represent a heterozygous locus. However, things get more complicated when the two
overlapping peaks have different axes or when the two peaks share one axis but are of the same
height. In these cases the spot is not a heterozygous locus, since one of the peaks is an
interference peak. In most cases, an interference peak whose height is approximately half that
of a big base peak appears one or two positions before it. The closer the two peaks are, the
stronger the interference. Under these circumstances the software often makes mistakes; that is
where humans step in and correct the misreads.
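A small script could support the manual review described above by listing the positions that deserve a second look. The sketch below is a minimal illustration, assuming that the peak heights of the four channels have already been extracted for each position; the function name, the data layout, and the cutoff `min_ratio` are hypothetical and do not come from any chromatogram software. It only flags positions for a human to inspect and does not decide whether a locus is heterozygous.

```python
def flag_overlapping_peaks(peak_heights, min_ratio=0.3):
    """Flag positions where a second channel is strong relative to the main peak.

    peak_heights: list with one dict per position, mapping base -> signal height.
    min_ratio:    hypothetical cutoff for second-highest / highest signal.
    Returns a list of (position, main_base, second_base, ratio) tuples.
    """
    flagged = []
    for position, heights in enumerate(peak_heights):
        ranked = sorted(heights.items(), key=lambda item: item[1], reverse=True)
        (main_base, main_height), (second_base, second_height) = ranked[0], ranked[1]
        if main_height > 0 and second_height / main_height >= min_ratio:
            ratio = round(second_height / main_height, 2)
            flagged.append((position, main_base, second_base, ratio))
    return flagged

if __name__ == "__main__":
    trace = [
        {"A": 1000, "C": 20, "G": 10, "T": 5},   # clean A peak
        {"A": 30, "C": 900, "G": 450, "T": 10},  # second peak at half height
        {"A": 5, "C": 10, "G": 800, "T": 600},   # two peaks of similar height
    ]
    for hit in flag_overlapping_peaks(trace):
        print(hit)
```

A reviewer would then open the chromatogram at the flagged positions and apply the rules of thumb given in the text.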
When checking the output of the software, we rely on some rules, drawn from extensive
experience, that help us determine whether the results are accurate:
1. The main peak mostly sits on the right side of the interference peak.
2. The interference peak can be higher than, lower than, or of the same height as the main
peak.
Accordingly, in order to reduce misreads, we often carry out several procedures:
1. A consistency check among the reading text, the results in the gene pool, and the
chromatogram peak data must be done.
2.2 Procedure for NGS Data Analysis
2.2.1 Quality Control
When it comes to analyzing next-generation DNA sequencing (NGS) data, the situation is more
complicated, because the results are determined by the varied DNA library construction and
adaptor-adding processes. Since modern high-throughput sequencers can generate hundreds of
millions of sequences in a single run, we should perform some simple quality control checks
before analyzing these sequences to draw biological conclusions, to ensure that the raw data
look good and that there are no problems or biases in the data.
Although many sequencers generate a QC report, this is usually not enough, since it focuses
only on problems generated by the sequencer itself. FastQC is a widely used program that aims
to provide a more detailed QC report, which can spot problems that originate either in the
sequencer or in the starting library material. FastQC is used as follows:
1. Use a Linux system and install FastQC
(http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
2. Type in the command “fastqc [-o output dir] [--(no)extract] [-f fastq|bam|sam]
[-c contaminant file]”. Here “output dir” is the output path, the parameter “extract”
determines whether the output is unpacked, and the parameter “-f” gives the format of the
input.
3. Run FastQC and read the result files:
• The HTML report shows a summary of the modules that were run and a quick evaluation of
whether the results of each module seem entirely normal (green tick), slightly abnormal
(orange triangle), or very unusual (red cross).
• View the per base sequence quality. Quality is expressed as a Phred value,
Q = -10 log10(p), where p stands for the probability that a base call is wrong. The values of
the lower quartile and the median should be considered. If the value of the lower quartile
exceeds 30, the quality can be regarded as very good.
• View the per sequence quality scores. Normally, if 90% of the reads have a quality value of
more than 35, the quality can be regarded as very good.
• View the distribution of A, T, G, and C. In most cases, the amount of A/T (28%) outweighs
that of G/C (22%).
2.2.2 Data Analysis
For data analysis, we take the Illumina system as an example. Illumina offers a variety of
next-generation sequencing (NGS) data analysis software tools, including push-button tools for
DNA sequence alignment, variant calling, and data visualization. Data generated on Illumina
sequencing instruments are automatically transferred to and stored securely in BaseSpace
Sequence Hub. The analysis should proceed as follows:
Primary Analysis
1. Judge the quality of the results. If the output is of poor quality, the rest of the
analysis will be meaningless.
2. Search for your target fragments.
3. Real-time analysis and base calling are carried out by the Illumina system.
Secondary Analysis
1. After real-time analysis (RTA) in the primary analysis, use MiSeq Reporter, an online
software tool, to analyze the data.
2. After opening MiSeq Reporter, click “analysis” to see the different modules, including
A (assembly), E (enrichment), G (generate FASTQ), M (metagenomics), R (resequencing), etc.
3. Choose the analysis module you need and run the procedure.
4. Read the MiSeq Reporter report. For example, if you choose module R, then after the
resequencing procedure has run, the detailed report will show a list of samples, a table of
targets, a list of SNPs and their corresponding scores, the Q score, and the depth of
sequencing.
5. The output is in demultiplex (*.demux) and FASTQ (*.fastq) formats. You can use third-party
software programs to analyze the data further.
6. Compare the results with the reference genome.
2.3 Several Tools to Facilitate Data Analysis
2.3.1 Artemis R5
Artemis is a DNA sequence viewer and annotation tool written in Java. Users can download it
for free and run it on systems including UNIX, GNU/Linux, Macintosh, and Windows.
First, import information from EMBL and GenBank, as well as files in FASTA format. Artemis
then visualizes sequence features, next-generation data, and the results of analyses within
the context of the sequence and of its six-frame translation.
2.3.4 SSAHA2 (Sequence Search and Alignment by Hashing Algorithm)
SSAHA2 is a pairwise sequence alignment program designed for the efficient mapping of
sequencing reads onto genomic reference sequences.
It supports a range of output formats, including SAM, CIGAR, PSL, etc., and it reads data from
most sequencing platforms, such as ABI-Sanger, Roche 454, and Illumina-Solexa.
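The general idea of mapping reads by hashing can be sketched in a few lines: index every k-mer of the reference in a hash table, look up the k-mers of a read, and let the hits vote for a start position. This is a toy illustration of the principle and not SSAHA2's actual implementation; the function names and the choice of k are hypothetical, and mismatches, gaps, and the reverse strand are ignored.

```python
from collections import Counter, defaultdict

def build_kmer_index(reference, k):
    """Hash table mapping each k-mer to the list of its start positions."""
    index = defaultdict(list)
    for start in range(len(reference) - k + 1):
        index[reference[start:start + k]].append(start)
    return index

def map_read(read, index, k):
    """Let every k-mer hit vote for the reference offset it implies.

    Returns (best_offset, votes), or None if no k-mer of the read is indexed.
    """
    votes = Counter()
    for read_pos in range(len(read) - k + 1):
        for ref_pos in index.get(read[read_pos:read_pos + k], []):
            votes[ref_pos - read_pos] += 1  # where the read would start
    if not votes:
        return None
    return votes.most_common(1)[0]

if __name__ == "__main__":
    reference = "ACGTTGCATGCCGATAGCTA"
    index = build_kmer_index(reference, k=4)
    print(map_read("GCATGCCG", index, k=4))
```

Because the index is built once and each lookup is a hash-table access, many reads can be placed on the same reference without rescanning it.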
Many other tools are available to help researchers analyze the data generated. Table 2 lists
tools of different kinds.
3.1 Background
In the past decades, many manual methods have been applied to the analysis of DNA sequence
data. However, the drawback of these methods is apparent: when the amount of data is
extraordinarily large, they take a great deal of time and energy. Fortunately, computers are
well suited to solving this problem. By establishing DNA sequence databases that store massive
amounts of data, researchers are able to adopt statistical approaches for analysis.
Table 2
Several tools for data analysis
3.1.1 Two Stages of DNA Sequence Analysis
Analyzing nucleic acid sequences with computer programs can be divided into two stages:
1. The first stage is the straightforward search for sequences with known properties, which
involves position determination.
2. The second stage aims to detect subtle, less straightforward sequence patterns, including
controlling elements such as promoters. The results can be presented as catalogs of sequence
patterns.
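The first stage, locating sequences with known properties, can be as simple as reporting every position at which a known motif occurs. The sketch below is a minimal illustration; the function name is hypothetical, and the example motif GAATTC is the recognition site of the restriction enzyme EcoRI. A regular expression can stand in for a slightly degenerate pattern, but the statistical pattern detection of the second stage is beyond this example.

```python
import re

def find_sites(sequence, motif):
    """Return the 0-based start positions of all matches, overlaps included.

    `motif` is a plain base string or a regular expression over A, C, G, T.
    """
    pattern = re.compile(f"(?=({motif}))")  # lookahead keeps overlapping hits
    return [match.start() for match in pattern.finditer(sequence.upper())]

if __name__ == "__main__":
    dna = "ttGAATTCaaGAATTCgg"
    print(find_sites(dna, "GAATTC"))                 # exact motif
    print(find_sites("TATAAATATATA", "TATA[AT]A"))   # one degenerate position
```

The positions returned here are the kind of result that the text calls position determination.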
3.2 Methods and Tools
There are various widely used tools for DNA sequencing data analysis; some are more familiar
than others.
3.2.1 Two Types of DNA Sequence Alignment
DNA sequence alignment can be divided into two types:
1. Pairwise alignment: it compares only two sequences.
2. Multiple sequence alignment: it is an extension of pairwise alignment that incorporates
more than two sequences at a time.
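As a concrete instance of pairwise alignment, the sketch below implements global alignment by dynamic programming (the Needleman-Wunsch algorithm). It is an illustration rather than the method of any particular tool named in this chapter, and the scoring values for match, mismatch, and gap are arbitrary assumptions.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment of two sequences.

    Returns (score, aligned_a, aligned_b); gaps are written as '-'.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def substitution(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal, up, and left moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + substitution(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

if __name__ == "__main__":
    print(global_alignment("ACGT", "AGT"))
```

Multiple sequence alignment programs typically build on many such pairwise comparisons, which is why their running time grows quickly with the number and length of the sequences.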
Several software tools are discussed below.
3.2.2 BLAST
BLAST, the Basic Local Alignment Search Tool (site: blast.ncbi.nlm.nih.gov/Blast.cgi), is an
algorithm for comparing primary biological sequence information. Usually, you do not have to
download and install it; all you have to do is visit the website stated above.
BLAST is actually a family of programs that is widely used in bioinformatics; it enables us to
compare a query sequence with a database of sequences. These sequences can be DNA, RNA, or
protein. By selecting a particular BLAST tool and setting a certain threshold, we can identify
sequences that resemble the input sequence. For nucleic acids, there is nucleotide-nucleotide
BLAST (blastn). After entering a DNA query and setting certain parameters, we get results
showing the most similar DNA sequences.
Blastn does its job by locating short matches. Usually, there is a threshold score T. If the
score of an alignment is higher than the predetermined T, the alignment is included in the
results given by BLAST; otherwise it is left out. Therefore, choosing a proper value of T
means getting a proper number of results.
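The two ideas in this paragraph, short exact matches and a score threshold, can be illustrated with a toy search. The sketch below is not BLAST's actual algorithm or scoring scheme: the function names, the word size, the match and mismatch scores, and the way the threshold is applied are all simplifying assumptions chosen to keep the example short.

```python
def seed_hits(query, subject, word_size=4):
    """Exact word matches as (query_position, subject_position) pairs."""
    words = {}
    for i in range(len(query) - word_size + 1):
        words.setdefault(query[i:i + word_size], []).append(i)
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in words.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits

def diagonal_score(query, subject, i, j, match=1, mismatch=-2):
    """Best ungapped local score on the diagonal defined by the hit (i, j)."""
    shift = min(i, j)
    qi, sj = i - shift, j - shift  # start of the diagonal
    best = current = 0
    while qi < len(query) and sj < len(subject):
        current = max(0, current + (match if query[qi] == subject[sj] else mismatch))
        best = max(best, current)
        qi += 1
        sj += 1
    return best

def search(query, database, threshold):
    """Report (name, score) for subjects whose best score exceeds the threshold."""
    results = []
    for name, subject in database.items():
        scores = [diagonal_score(query, subject, i, j)
                  for i, j in seed_hits(query, subject)]
        if scores and max(scores) > threshold:
            results.append((name, max(scores)))
    return sorted(results, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    database = {"s1": "TTTACGTACGTTT", "s2": "GGGGACGTGGGG", "s3": "CCCCCCCC"}
    print(search("ACGTACGT", database, threshold=3))
    print(search("ACGTACGT", database, threshold=5))
```

Raising the threshold shortens the result list and lowering it lengthens the list, which is the trade-off described in the text.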
This tool is highly sensitive and can be used for several purposes: species identification,
domain location, phylogeny establishment, etc.
1. Visit the site blast.ncbi.nlm.nih.gov/Blast.cgi and choose blastn.
2. Upload your DNA sequence in a proper format such as FASTA.
3. Set proper parameters, including T.
4. Click BLAST.
5. Review your alignment results; mismatches can indicate a frameshift in the query sequence.
6. If any error exists, go back, check the sequence file, change the parameter values, and run
BLAST again.
3.2.3 CLUSTAL
Clustal is an effective tool for the multiple alignment of nucleic acid and protein sequences.
After downloading the software, we input data containing DNA sequences, set certain
parameters, and wait for the results. When multiple sequence alignment is needed, we use
Clustal X.
The accepted input formats include NBRF/PIR, FASTA, EMBL/Swiss-Prot, Clustal, GCG/MSF,
GCG9 RSF, and GDE, while the output format can be Clustal, NBRF/PIR, GCG/MSF, PHYLIP, GDE, or
NEXUS.
When Clustal is used for data analysis, the bigger the input file is, the longer the alignment
takes. The results obtained from Clustal can be used further by loading the output file into
other software such as MEGA, which will be discussed soon.
1. Download the desktop application and open it.
2. Upload a file containing DNA sequences in a proper format; at this stage, you can have a
look at the colored bases.
3. Select different tools for different purposes:
Select “do complete alignment” to run the complete procedure (pairwise alignments, guide tree,
and multiple alignment).
Select “produce guide tree only” to create a guide tree.
Select “do alignment from guide tree and phylogeny” to use a guide tree (possibly a
user-defined tree) to carry out a multiple alignment.
4. Review the results and save them in a suitable format.
5. The results can be used for further studies.
References
1. Gingeras TR, Roberts RJ (1980) Steps toward computer analysis of nucleotide sequences.
Science 209(4463):1322–1328
2. Sanger F, Air GM, Barrell BG, Brown NL, Coulson AR, Fiddes CA, Hutchison CA, Slocombe PM,
Smith M (1977) Nucleotide sequence of bacteriophage phi X174 DNA. Nature 265(5596):687–695.
https://doi.org/10.1016/0022-2836(78)90346-7
3. ten Bosch JR, Grody WW (2008) Keeping up with the next generation: massively parallel
sequencing in clinical diagnostics. J Mol Diagn 10(6):484–492.
https://doi.org/10.2353/jmoldx.2008.080027
4. Liu L, Li Y, Li S, Hu N, He Y, Pong R, Lin D, Lu L, Law M (2012) Comparison of
next-generation sequencing systems. J Biomed Biotechnol 2012:251364.
https://doi.org/10.1155/2012/251364
5. Schadt EE, Turner S, Kasarskis A (2010) A window into third-generation sequencing. Hum Mol
Genet 19(R2):R227–R240. https://doi.org/10.1093/hmg/ddq416
6. Excoffier L, Laval G, Schneider S (2005) Arlequin (version 3.0): an integrated software
package for population genetics data analysis. Evol Bioinformatics Online 1:47–50
7. Librado P, Rozas J (2009) DnaSP v5: a software for comprehensive analysis of DNA
polymorphism data. Bioinformatics 25(11):1451–1452
8. Kumar S, Nei M, Dudley J, Tamura K (2008) MEGA: a biologist-centric software for
evolutionary analysis of DNA and protein sequences. Brief Bioinform 9(4):299–306.
https://doi.org/10.1093/bib/bbn017
9. Cai L, Yuan W, Zhang Z, He L, Chou GC (2016) In-depth comparison of somatic point mutation
callers based on different tumor next-generation sequencing depth data. Sci Rep 6:36540.
https://doi.org/10.1038/srep36540
10. Huang T, Liu CL, Li LL, Cai MH, Chen WZ, Xu YF, O’Reilly PF, Cai L, He L (2016) A new
method for identifying causal genes of schizophrenia and anti-tuberculosis drug-induced
hepatotoxicity. Sci Rep 6:32571. https://doi.org/10.1038/srep32571
11. Fang S, Zhang Y, Xu M, Xue C, He L, Cai L, Xing X (2016) Identification of damaging nsSNVs
in human ERCC2 gene. Chem Biol Drug Des 88(3):441–450. https://doi.org/10.1111/cbdd.12772
12. Cai L, Deng SL, Liang L, Pan H, Zhou J, Wang MY, Yue J, Wan CL, He G, He L (2013)
Identification of genetic associations of SP110/MYBBP1A/RELA with pulmonary tuberculosis in
the Chinese Han population. Hum Genet 132:265–273. https://doi.org/10.1007/s00439-012-1244-5
Chapter 2
Abstract
RNA sequencing (RNA-seq) can be used not only to measure the expression of common or rare transcripts
but also to identify other abnormal events, such as alternative splicing, novel transcripts, and
fusion genes. In principle, RNA-seq can be carried out on almost all next-generation sequencing
(NGS) platforms, but the libraries for different platforms are not exactly the same; each platform has its own
kit to meet the specific requirements of its instrument design.
Key words Next-generation sequencing, RNA sequencing, Messenger RNA, Library construction,
Data analysis
1 Introduction
Tao Huang (ed.), Computational Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1754,
https://doi.org/10.1007/978-1-4939-7717-8_2, © Springer Science+Business Media, LLC, part of Springer Nature 2018
Hong Zhang et al.
Fig. 1 mRNA library construction workflow for Illumina (from David Corney 2013)
2 Materials
2.2 mRNA Library Construction
1. RNA Purification Beads: purify the poly-A-containing
mRNA molecules using oligo-dT-attached magnetic beads,
stored at 4 °C (Illumina, San Diego, CA).
2. Bead Washing Buffer (BWB), Elution Buffer (ELB), Bead-
Binding Buffer (BBB): 1 tube per 48 reactions, stored at
−20 °C (Illumina, San Diego, CA).
3. Elute, Prime, Fragment Mix (EPF): 1 tube per 48 reactions,
stored at −20 °C (Illumina, San Diego, CA).
4. First-Strand Master Mix (FSM): 1 tube, stored at −20 °C
(Illumina, San Diego, CA).
5. SuperScript II Reverse Transcriptase: 1 tube, stored at −20 °C.
6. Second-Strand Master Mix (SSM): 1 tube per 48 reactions,
stored at −25 °C to −15 °C (Illumina, San Diego, CA).
7. AMPure XP beads: stored at 4 °C.
8. 80% ethanol.
9. Resuspension Buffer (RSB): 1 tube, stored at −20 °C.
10. End-Repair Mix: adds the 5′-phosphate groups needed for
downstream ligation, 1 tube per 48 reactions, stored at −20 °C
(Illumina, San Diego, CA).
11. A-Tailing Mix: makes fragments compatible with adapters and
prevents self-ligation by adding a 3′-A overhang, 1 tube per
48 reactions, stored at −20 °C (Illumina, San Diego, CA).
12. Ligation Mix: joins 3′-T overhang adapters to 3′-A overhang
inserts, 1 tube per 48 reactions, stored at −20 °C (Illumina,
San Diego, CA).
13. Stop Ligation Buffer: inactivates the ligation, 1 tube per
48 reactions, stored at −20 °C (Illumina, San Diego, CA).
14. Resuspension Buffer (RSB): 1 tube, stored at −20 °C
(Illumina, San Diego, CA).
15. PCR Master Mix (PMM): 1 tube per 48 reactions, stored at
−20 °C (Illumina, San Diego, CA).
16. PCR Primer Cocktail (PPC): 1 tube per 48 reactions, stored at
−20 °C (Illumina, San Diego, CA).
17. Sequencing chip: flow cell.
18. Illumina HiSeq system.
3 Methods
3.1 Total RNA Extraction
1. Remove the tissue sample from the −80 °C freezer, and
immediately put it in a thermos cup containing liquid nitrogen
(see Note 1).
2. Remove the sample from the liquid nitrogen and put it into a
1.5 mL EP tube; add 300 μL TRIzol reagent and grind thoroughly
with an electric tissue grinder; then add 700 μL TRIzol; and
place the tube on ice for 30 min to ensure sufficient
disruption of the cells.
3. Add 200 μL chloroform, vortex, and then centrifuge at
13,000 × g for 10 min.
4. Transfer the supernatant to a new EP tube (see Note 2).
5. Add 500 μL isopropanol, vortex, place at −20 °C for 20 min,
and then centrifuge at 13,000 × g for 10 min.
6. Discard the supernatant; add 1 mL 70% ethanol solution, shake
gently for 10 s; and then centrifuge at 8000 × g for 2 min.
7. Discard the supernatant, and repeat step 6 once.
8. Discard the supernatant, centrifuge at 8000 × g for 15 s, remove
excess liquid, and place the EP tube on ice for 2 min to let the
residual ethanol evaporate.
9. Depending on the size of the pellet, add 30–200 μL ultrapure
water.
10. Determine the concentration of the RNA solution using a
NanoDrop 2000 spectrophotometer.
11. Use the Agilent 2100 Bioanalyzer system to assess RNA
integrity (see Note 3).
12. Store the RNA solution in the −80 °C freezer.
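The spectrophotometer reports the concentration directly, but the underlying arithmetic is simple and can be reproduced as a cross-check. The sketch below is illustrative and not part of the protocol. It assumes the standard approximation that one A260 unit corresponds to about 40 ng/μL of RNA at a 1 cm path length, and all readings shown are hypothetical.

```python
# Standard approximation for RNA: 1 A260 unit ~ 40 ng/uL at a 1 cm path length
RNA_NG_PER_UL_PER_A260 = 40.0


def rna_concentration_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate RNA concentration (ng/uL) from a 1 cm-normalized A260 reading."""
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("a260 must be >= 0 and dilution_factor must be > 0")
    return a260 * RNA_NG_PER_UL_PER_A260 * dilution_factor


def a260_a280_ratio(a260: float, a280: float) -> float:
    """Absorbance ratio commonly used as a rough purity indicator."""
    if a280 <= 0:
        raise ValueError("a280 must be > 0")
    return a260 / a280


# Hypothetical readings, for illustration only
a260, a280 = 12.5, 6.1
elution_volume_ul = 50.0  # hypothetical volume of ultrapure water added in step 9

concentration = rna_concentration_ng_per_ul(a260)
total_yield_ug = concentration * elution_volume_ul / 1000.0
ratio = a260_a280_ratio(a260, a280)

print(f"concentration: {concentration:.1f} ng/uL")
print(f"total yield:   {total_yield_ug:.1f} ug")
print(f"A260/A280:     {ratio:.2f}")
```

An A260/A280 ratio near 2.0 is generally taken to indicate RNA that is reasonably free of protein contamination; the ratio says nothing about integrity, which is why the Bioanalyzer check in step 11 is still needed.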
3.2 Library Construction
1. Add 2 μg of total RNA (in less than 50 μL) to a 200 μL EP
tube, dilute to 50 μL, then add 50 μL RNA Purification Beads
(see Note 4), and gently pipette the entire volume up and down
eight times to mix thoroughly.
2. Place the EP tube on a PCR thermal cycler (65 °C for 5 min,
4 °C hold) to denature the RNA.
3. Place the EP tube at room temperature for 5 min to facilitate
binding of the polyA RNA to the beads.
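The input volumes for step 1 follow from the measured RNA concentration. The sketch below is a minimal illustration, not part of the protocol: the function name `input_volumes` and the example concentrations are hypothetical, and the diluent is assumed here to be nuclease-free water, which the text does not specify.

```python
from typing import Tuple


def input_volumes(
    conc_ng_per_ul: float,
    target_ug: float = 2.0,
    final_volume_ul: float = 50.0,
) -> Tuple[float, float]:
    """Return (RNA volume, diluent volume) in uL for the library input.

    Defaults follow step 1: 2 ug of total RNA brought to 50 uL.
    Raises ValueError if the sample is too dilute to fit in the final volume.
    """
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be > 0")
    rna_ul = target_ug * 1000.0 / conc_ng_per_ul  # convert ug to ng, then divide
    if rna_ul > final_volume_ul:
        raise ValueError(
            f"{rna_ul:.1f} uL of sample needed, which exceeds {final_volume_ul} uL; "
            "the sample is too dilute for this input amount"
        )
    return rna_ul, final_volume_ul - rna_ul


# Hypothetical concentration of 500 ng/uL
rna_ul, diluent_ul = input_volumes(500.0)
print(f"RNA: {rna_ul:.1f} uL, diluent: {diluent_ul:.1f} uL")
```

With these defaults, any sample below 40 ng/μL cannot supply 2 μg within 50 μL, so the function raises an error instead of returning a negative diluent volume.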
Transcriptome Sequencing: RNA-Seq
Title: The art of music, Vol. 07 (of 14), Pianoforte and chamber
music
Language: English
Editor-in-Chief
Associate Editors
Managing Editor
CÉSAR SAERCHINGER
Modern Music Society of New York
In Fourteen Volumes
Profusely Illustrated
NEW YORK
THE NATIONAL SOCIETY OF MUSIC
Home Concert
Department Editor:
Introduction by
HAROLD BAUER
NEW YORK
THE NATIONAL SOCIETY OF MUSIC
Copyright, 1915, by
THE NATIONAL SOCIETY OF MUSIC, Inc.
[All Rights Reserved]
PREFATORY NOTE
The editor has not attempted to give within the limits of this single
volume a detailed history of the development of both pianoforte and
chamber music. He has emphasized but very little the historical
development of either branch of music, and he has not pretended to
discuss exhaustively all the music which might be comprehended
under the two broad titles.
In the part of the book dealing with chamber music the material has
been somewhat arbitrarily arranged according to combinations of
instruments. The string quartets, the pianoforte trios, quartets, and
quintets, the sonatas for violin and piano, and other combinations
have been treated separately. The selection of some works for a
more or less detailed discussion, and the omission of even the
mention of others, will undoubtedly seem unjustifiable to some; but
the editor trusts at least that those he has chosen for discussion may
illumine somewhat the general progress of chamber music from the
time of Haydn to the present day.
For the chapters on violin music before Corelli and the beginnings of
chamber music we are indebted to Mr. Edward Kilenyi, whose initials
appear at the end of these chapters.
Leland Hall
INTRODUCTION
The term Chamber Music, in its modern sense, cannot perhaps be
strictly defined. In general it is music which is fine rather than broad,
or in which, at any rate, there is a wealth of detail which can be
followed and appreciated only in a relatively small room. It is not, on
the whole, brilliantly colored like orchestral music. The string quartet,
for example, is conspicuously monochrome. Nor is chamber music
associated with the drama, with ritual, pageantry, or display, as are
the opera and the mass. It is—to use a well worn term—very nearly
always absolute music, and, as such, must be not only perfect in
detail, but beautiful in proportion and line, if it is to be effective.
With very few exceptions, all the great composers have sought
expression in chamber music at one time or another; and their
compositions in this branch seem often to be the finest and the most
intimate presentation of their genius. Haydn is commonly supposed
to have found himself first in his string quartets. Mozart’s great
quartets are almost unique among his compositions as an
expression of his genius absolutely uninfluenced by external
circumstances and occasion. None of Beethoven’s music is more
profound nor more personal than his last quartets. Even among the
works of the later composers, who might well have been seduced
altogether away from these fine and exacting forms by the
intoxicating glory of the orchestra, one finds chamber music of a rich
and special value.
In order to study intelligently the mechanics, or, if you will, the art of
touch upon the piano, and in order to comprehend the variety of
tone-color which can be produced from it, one must recognize at the
outset the fact that the piano is an instrument of percussion. Its
sounds result from the blows of hammers upon taut metal strings.
With the musical sound given out by these vibrating strings must
inevitably be mixed the dull and unmusical sound of the blow that set
them vibrating. The trained ear will detect not only the thud of the
hammer against the string, but that of the finger against the key, and
that of the key itself upon its base. The study of touch and tone upon
the piano is the study of the combination and the control of these two
elements of sound, the one musical, the other unmusical.
The pianist can acquire but relatively little control over the musical
sounds of his instrument. He can make them soft and loud, but he
cannot, as the violinist can, make a single tone grow from soft to
loud and die away to soft again. The violinist or the singer both
makes and controls tone, the one by his bow, the other by his breath;
the pianist, in comparison with them, but makes tone. Having caused
a string to vibrate by striking it through a key, he cannot even sustain
these vibrations. They begin at once to weaken; the sound at once
grows fainter. Therefore he has to make his effects with a volume of
sounds which has been aptly said to be ever vanishing.
On the other hand, these sounds have more endurance than those
of the xylophone, for example; and in their brief span of failing life the
skillful pianist may work somewhat upon them according to his will.
He may cut them exceedingly short by allowing the dampers to fall
instantaneously upon the strings, thus stopping all vibrations. He
may even prolong a few sounds, a chord let us say, by using the
sustaining pedal. This lifts the dampers from all the strings, so that
all vibrate in sympathy with the tones of the chord and reënforce
them, so to speak. This may be done either at the moment the notes
of the chord are struck, or considerably later, after they have begun
appreciably to weaken. In the latter case the ear can detect the
actual reënforcement of the failing sounds.
Moreover, the use of the pedal serves to affect somewhat the color
of the sounds of the instrument. All differences in timbre depend on
overtones; and if the pianist lifts all dampers from the strings by the
pedals, he will hear the natural overtones of his chord brought into
prominence by means of the sympathetic vibrations of other strings
he has not struck. He can easily produce a mass of sound which
strongly suggests the organ, in the tone color of which the shades of
overtones are markedly evident.
The study of such effects will lead him beyond the use of the pedal
into some of the niceties of pianoforte touch. He will find himself able
to suppress some overtones and bring out others by emphasizing a
note here and there in a chord of many notes, especially in an
arpeggio, and by slighting others. Such an emphasis, it is true, may
give to a series of chords an internal polyphonic significance; but if
not made too prominent, will tend rather to color the general sound
than to make an effect of distinct drawing.
It will be observed that in the matter of so handling the volume of
musical sound, prolonging it and slightly coloring it by the use of the
pedal or by skillful emphasis of touch, the pianist’s attention is
directed ever to the after-sounds, so to speak, of his instrument. He
is interested, not in the sharp, clear beginning of the sound, but in
what follows it. He finds in the very deficiencies of the instrument
possibilities of great musical beauty. It is hardly too much to say,
then, that the secret of a beautiful or sympathetic touch, which has
long been considered to be hidden in the method of striking the keys,
may be found quite as much in the treatment of sounds after the
keys have been struck. It is a mystery which can by no means be
wholly solved by a muscular training of the hands; for a great part of
such training is concerned only with the actual striking of the keys.
We have already said that striking the keys must produce more or
less unmusical sounds. These sounds are not without great value.
They emphasize rhythm, for example, and by virtue of them the
piano is second to no instrument in effects of pronounced,
stimulating rhythm. The pianist wields in this regard almost the
power of the drummer to stir men to frenzy, a power which is by no
means to be despised. In martial music and in other kinds of
vigorous music the piano is almost without shortcomings. But
inasmuch as a great part of pianoforte music is not in this vigorous
vein, but rather in a vein of softer, more imaginative beauty, the
pianist must constantly study how to subject these unmusical sounds
to the after-sounds which follow them. In this study he will come
upon the secret of the legato style of playing.