Professional Documents
Culture Documents
Next Generation Sequencing: Methods and Protocols
Next Generation Sequencing: Methods and Protocols
Steven R. Head
Phillip Ordoukhanian
Daniel R. Salomon Editors
Next
Generation
Sequencing
Methods and Protocols
Methods in Molecular Biology
Series Editor
John M. Walker
School of Life and Medical Sciences
University of Hertfordshire
Hatfield, Hertfordshire, AL10 9AB, UK
Editors
Steven R. Head
Next Generation Sequencing and Microarray Core Facility,
The Scripps Research Institute,
La Jolla, CA, USA
Phillip Ordoukhanian
Next Generation Sequencing and Microarray Core Facility,
The Scripps Research Institute,
La Jolla, CA, USA
Daniel R. Salomon
Department of Molecular Medicine, The Scripps Research Institute,
Center for Organ and Cell Transplantation,
La Jolla, CA, USA
Editors
Steven R. Head Phillip Ordoukhanian
Next Generation Sequencing and Microarray Next Generation Sequencing and Microarray
Core Facility Core Facility
The Scripps Research Institute The Scripps Research Institute
La Jolla, CA, USA La Jolla, CA, USA
Daniel R. Salomon
Department of Molecular Medicine
The Scripps Research Institute
Center for Organ and Cell Transplantation
La Jolla, CA, USA
This volume is dedicated to the memory of our friend, colleague, mentor, and surf buddy,
Daniel R. Salomon, M.D.
v
Preface
vii
viii Preface
characterize rare circulating CD4 T cells using a Biomark HD and profiles 96 different
genes by quantitative PCR (qPCR) and generates NGS libraries using the Biomark HD
Access Array. Chapter 8 describes a protocol for isolation, extraction, and sequencing of
single bacterial and archaeal cells using FACS, MDA, a 16S rRNA screen, and a computa-
tional approach for quality assurance.
Two other chapters (Chapters 9 and 12) have more of a computational focus. Chapter 9
uses low pass whole genome sequencing and reference-guided assembly of non-model organ-
isms for SNP discovery allowing for the genotyping of populations. Chapter 12 describes a
computational pipeline for RNAseq analysis from tissue samples containing a complex hetero-
geneous population of cell types (e.g., a blood sample). The pipeline is able to estimate cell
type composition and other statistical analysis information generated from bulk RNAseq
profiles.
Another important component to almost all NGS-related work, especially those involv-
ing microbiome studies, is contamination. To this end, Chapter 11 focuses on the best
practices and approaches for sample handling, DNA and/or RNA extraction, and library
preparation from microbial and viral samples to generate NGS libraries. Chapter 6 discusses
a novel method for generating sequencing libraries from viral RNA termed, “Clickseq,”
which uses “Click Chemistry” to attach one of the NGS adapters preventing artifactual
generation of chimeras, and consequently, greatly increasing the ability to detect rare
recombination events in viral RNA. The last chapter in the book (Chapter 16) describes
another RNAseq library prep method for profiling reverse transcription termination sites.
It is an efficient protocol and generates good NGS library yields from low RNA inputs
generated from protocols such as nascent RNA sequencing, RNA Immunoprecipitation
(RIPseq), and 5′-RACE structural probing.
Next Generation Sequencing technology has brought together disparate fields of
research from bacterial and viral studies to the study of plants, non-model organisms, and
of course human disease. This requires collaboration between the users of NGS data and
those needed to design and perform the enzymatic procedures for the preparation of
sequencing libraries and ensure the desired target is being captured, targeted, or amplified.
Other critical players are the engineers that develop microfluidics and sequencing hardware
systems or the bioinformatics experts necessary to ensure data generated is being analyzed
correctly and translated into a meaningful analysis. We hope you enjoy this book and find
it informative and a useful reference.
Preface������������������������������������������������������������������������������������������������������������������������������� vii
Contributors ���������������������������������������������������������������������������������������������������������������������� xi
1 An Integrated Polysome Profiling and Ribosome Profiling Method
to Investigate In Vivo Translatome��������������������������������������������������������������������� 1
Hyun Yong Jin and Changchun Xiao
2 Measuring Nascent Transcripts by Nascent-seq��������������������������������������������������� 19
Fei Xavier Chen, Stacy A. Marshall, Yu Deng, and Sun Tianjiao
3 Genome-Wide Copy Number Alteration Detection in Preimplantation
Genetic Diagnosis ��������������������������������������������������������������������������������������������� 27
Lieselot Deleye, Dieter De Coninck, Dieter Deforce,
and Filip Van Nieuwerburgh
4 Multiplexed Targeted Sequencing for Oxford Nanopore MinION:
A Detailed Library Preparation Procedure ��������������������������������������������������������� 43
Timokratis Karamitros and Gkikas Magiorkinis
5 Hi-Plex for Simple, Accurate, and Cost-Effective Amplicon-based
Targeted DNA Sequencing ������������������������������������������������������������������������������� 53
Bernard J. Pope, Fleur Hammet, Tu Nguyen-Dumont, and Daniel J. Park
6 ClickSeq: Replacing Fragmentation and Enzymatic Ligation
with Click-Chemistry to Prevent Sequence Chimeras����������������������������������������� 71
Elizabeth Jaworski and Andrew Routh
7 Genome-Wide Analysis of DNA Methylation in Single Cells
Using a Post-bisulfite Adapter Tagging Approach����������������������������������������������� 87
Heather J. Lee and Sébastien A. Smallwood
8 Sequencing of Genomes from Environmental Single Cells ��������������������������������� 97
Robert M. Bowers, Janey Lee, and Tanja Woyke
9 SNP Discovery from Single and Multiplex Genome Assemblies
of Non-model Organisms����������������������������������������������������������������������������������� 113
Phillip A. Morin, Andrew D. Foote, Christopher M. Hill,
Benoit Simon-Bouhet, Aimee R. Lang, and Marie Louis
10 CleanTag Adapters Improve Small RNA Next-Generation Sequencing
Library Preparation by Reducing Adapter Dimers����������������������������������������������� 145
Sabrina Shore, Jordana M. Henderson, and Anton P. McCaffrey
11 Sampling, Extraction, and High-Throughput Sequencing Methods
for Environmental Microbial and Viral Communities����������������������������������������� 163
Pedro J. Torres and Scott T. Kelley
12 A Bloody Primer: Analysis of RNA-Seq from Tissue Admixtures������������������������� 175
Casey P. Shannon, Chen Xi Yang, and Scott J. Tebbutt
13 Next-Generation Sequencing of Genome-Wide CRISPR Screens����������������������� 203
Edwin H. Yau and Tariq M. Rana
ix
x Contents
Index����������������������������������������������������������������������������������������������������������������������� 263
Contributors
xi
xii Contributors
Abstract
Recent advances in global translatome analysis technologies enable us to understand how translational
regulation of gene expression modulates cellular functions. In this chapter, we present an integrated
method to measure various aspects of translatome by polysome profiling and ribosome profiling using
purified B cells. We standardized our protocols to directly compare the results from these two approaches.
Parallel assessment of translatome with these two approaches can generate a comprehensive picture on
how translational regulation determines protein output.
Key words Translatome, Ribosome profiling, Ribo-seq, Polysome profiling, Translational regulation
of gene expression
1 Introduction
Steven R. Head et al. (eds.), Next Generation Sequencing: Methods and Protocols, Methods in Molecular Biology, vol. 1712,
https://doi.org/10.1007/978-1-4939-7514-3_1, © Springer Science+Business Media, LLC 2018
1
2 Hyun Yong Jin and Changchun Xiao
Table 1
The strength and weakness of polysome profiling and ribosome profiling
2 Materials
2.2 B Cell 1. MACS LD column (Miltenyi Biotech, San Diego, CA, USA).
Purification 2. B cell medium: DMEM-GlutaMAX, 50 mL FCS, 5 mL of
and Activation 100× Nonessential amino acids, 5 mL of 1 M HEPES, 5 mL
of 100× Penn/Strep, 0.6 mL of 55 mM β-Mercaptoethanol.
3. Lipopolysaccharides from Escherichia coli 055:B5 (LPS).
4 Hyun Yong Jin and Changchun Xiao
Purify B cells
In vitro activation
CHX treatment
Polysome Ribosome
Cytosolic fractions
profiling profiling
Split samples
15% RNase I digestion
Harvest
45% all fractions 15%
Harvest
monosome fraction only
45%
qRT-PCR or
deep sequencing
Footprint library
Deep sequencing
4. IL-4.
5. Ficoll solution.
2.3 Cytosolic 1. Hypotonic buffer: 1.5 mM KCl, 10 mM MgCl2, 5 mM Tris–
Fractionation HCl, pH 7.4, 100 μg/mL CHX.
2. Hypotonic lysis buffer: 2% sodium deoxycholate, 2% Triton
X-100, 2.5 mM DTT, 100 μg/mL CHX, 10 units of RNase
inhibitor/mL.
2.5 Ribosome 1. RNase decontamination wipes and reagent (see Note 1).
Profiling Library 2. RNase-free tubes with a low-binding affinity for nucleic acids
Preparation (see Note 2).
3. Hypotonic lysis buffer without RNase Inhibitor: 2% sodium
deoxycholate, 2% Triton X-100, 2.5 mM DTT, 100 μg/mL
CHX.
4. RNase I.
An Integrated Polysome Profiling and Ribosome Profiling Method… 5
5. RNase inhibitor.
6. miRNeasy RNA purification kit (Qiagen, Valencia, CA, USA)
(see Note 3).
7. 2× denaturing loading buffer: 98%(vol/vol) formamide with
10 mM EDTA and 300 μg/mL bromophenol blue.
8. 5× TBE solution: Combine; 27 g Tris Base, 13.7 g Boric acid,
pH 8.0, 10 mL 0.5 M EDTA and fill up to 500 mL RNase-
free water.
9. 10% polyacrylamide TBE-Urea gel: Combine; 16.8 g urea,
4 mL of 5× TBE and 16 mL RNase-free water. Heat to dis-
solve urea for 20 min. Cool down the solution to room tem-
perature and add 10 mL of 40% acrylamide/bis solution
(37.5:1), 240 μL of 10% APS, and 32 μL of TEMED. For
more detailed condition, see ref. 14.
10. 10 bp DNA ladder.
11. 26–34 nt RNA marker: 10 μM mix of NI-NI-19 and NI-NI-
20. For the detailed sequence information, see ref. 15.
12. SYBR gold, 10,000× (ThermoFisher Scientific).
13. Isopropanol (≥99.5%).
14. Glycoblue, 15 mg/mL (ThermoFisher Scientific).
2.5.1 Ribo-Zero rRNA 1. Ribo-Zero rRNA removal kit (Illumina, San Diego, CA, USA).
Depletion 2. Zymo RNA Clean and Concentrator-5 kit (Zymo Research,
Irvine, CA, USA).
Barcode_Forward 5′-aatgatacggcgaccaccgagatctacac-3′
Barcode_Reverse_01_ 5′-caagcagaagacggcatacgagatAGTCGTgtgactggagttcagacgtgtgctcttccgatct-3′
ACGACT
Barcode_Reverse_02_ 5′-caagcagaagacggcatacgagatACTGATgtgactggagttcagacgtgtgctcttccgatct-3′
ATCAGT
Barcode_Reverse_03_ 5′-caagcagaagacggcatacgagatATGCTGgtgactggagttcagacgtgtgctcttccgatct-3′
CAGCAT
Barcode_Reverse_04_ 5′-caagcagaagacggcatacgagatACGTCGgtgactggagttcagacgtgtgctcttccgatct-3′
CGACGT
Barcode_Reverse_05_ 5′-caagcagaagacggcatacgagatAGCTGCgtgactggagttcagacgtgtgctcttccgatct-3′
GCAGCT
Barcode_Reverse_06_ 5′-caagcagaagacggcatacgagatATCGTAgtgactggagttcagacgtgtgctcttccgatct-3′
TACGAT
Barcode_Reverse_07_ 5′-caagcagaagacggcatacgagatCGTCAGgtgactggagttcagacgtgtgctcttccgatct-3′
CTGACG
Barcode_Reverse_08_ 5′-caagcagaagacggcatacgagatCGTAGCgtgactggagttcagacgtgtgctcttccgatct-3′
GCTACG
Barcode_Reverse_09_ 5′-caagcagaagacggcatacgagatGCTGCAgtgactggagttcagacgtgtgctcttccgatct-3′
TGCAGC
3 Methods
3.1 Generation 1. Generate 15% sucrose solution (50 mL of 5× sucrose base buf-
of Sucrose Gradients fer, 37.5 g of sucrose, and fill up to 250 mL).
2. Generate 45% sucrose solution (50 mL of 5× sucrose base buf-
fer, 112.5 g of sucrose, and fill up to 250 mL).
3. (Optional) Filter the sucrose solution using 0.45 μm vacuum
filter.
4. Next, generate five different concentrations of gradients by
mixing 15% and 45% sucrose solution.
3.2 B Cell 1. 100 × 106 follicular B cells were purified from total spleno-
Purification cytes by depleting cells positive for CD5, CD43 or CD93
and Activation using MACS LD column according to the manufacturer’s
instruction. The purified cells were in vitro activated with
LPS (25 μg/mL) and IL-4 (5 ng/mL) in B cell medium for
aimed time points [16].
2. Before harvest, add CHX directly to the culture medium and
incubate for 15 min. This will freeze the ribosomes on mRNAs.
3. (Optional) At later time points of B cell activation, a significant
fraction of cells undergo apoptosis. Purifying live cells from the
culture may improve the results. Live cells can be purified using
Ficoll solution according to the manufacturer’s instruction.
During this process, add same final concentration of CHX to
Ficoll solution to maintain ribosome association to mRNAs.
4. Split samples into two groups at this point. One for polysome
profiling (see Subheading 3.3) and the other for ribosome pro-
filing (see Subheading 3.5.1).
8 Hyun Yong Jin and Changchun Xiao
3.4 Polysome 1. Carefully overlay the cytosolic fractions on the thawed 15–45%
Profiling and Total sucrose gradient at 4 °C.
RNA Extraction 2. Centrifuge at 40,000 rpm (274,000 × g at r-max and
from Each Fraction 121,000 × g at r-min) for 1.5 h at 4 °C, using SW41 rotor.
3. Fractionate the sucrose into 20 tubes using automated frac-
tionation machine and simultaneously measure A254 values to
estimate the overall translation profile. Each fraction will con-
tain roughly 540 μL of sucrose after fractionation. Flow rates
of 0.75 mL/min and collection exchange speed of 0.73 min/
tube are recommended.
4. (Optional) Add equal amounts of Solaris RNA (2 μL) to each
collection as a normalization control. Alternatively, ERCC
RNA can be used.
5. Add 1.5 mL of Trizol-LS to each tube (total ~2 mL solution)
and extract total RNA following the manufacturer’s instruction.
ar
ol
le
os
uc
yt
N
HDAC1
GAPDH
Fig. 2 Western blot analysis showing the efficient separation of cytosolic from
nuclear fractions. After the cytosolic fractions are harvested, remaining pellets
(nuclear fractions) were further lysed with 1% NP-40 lysis buffer with standard
procedure. Note that the nuclear marker, HDAC1, is mainly present in the nucleus
while the cytosolic marker, GAPDH, is present exclusively in cytosolic fractions
An Integrated Polysome Profiling and Ribosome Profiling Method… 9
3.5.2 RNase 1. Take the 600 μL of cytosolic fraction and add 7.5 μL of RNase
I Footprinting I (100 U/μL).
and Ribosome Recovery 2. Incubate for 45 min at room temperature with gentle mixing
on nutator.
3. To stop the RNase I digestion, add 10 μL of SUPERase
Inhibitor and tap the tubes gently.
4. Run 15–45% sucrose gradients as described in Subheading 3.4
and confirm efficient RNase I digestion (see Fig. 3).
5. Collect and combine fractions 6, 7, and 8. Fractions 9–10
include disome-protected mRNA fragments which may repre-
sent ribosome stalling [8], and are excluded from our analysis
(see Fig. 3).
6. Add 4.5 mL of Trizol-LS to the combined fractions (total
6 mL) and extract total RNA. To increase yield, the miRNeasy
RNA purification kit is recommended. Expected amounts of
RNA from 50 million B cells in each step are listed in Table 2.
A254
A254
Disome
Collection
stopped
Fr: 1 5 10 15 20 Fr: 6 7 8
Fig. 3 A254 profiles before and after RNase I digestion. Profiles are from 25.5h activated wild type B cells
Table 2
Expected RNA amount at each step, starting with 50 million 25.5 h
activated B cells
Expected RNA
Step Method amount
3.5.1 Total RNA from cytosolic fractions and ~200 μg
miRNeasy purification
3.5.2 RNase I treated, selection of ribosome ~15 μg
protected monosome from sucrose
gradient and miRNeasy purification
3.5.3 26–32 nt RNA gel purification ~1 μg
3.5.4 rRNA depletion using Ribo-zero kit ~200 ng
3.5.5 Linker ligation 90% efficiency
e
som
er no
dd o
d er la d
m
d A t e
A
la RN trea
N 4 nt I
D /3 e
bp 6nt as
10 2 RN
50bp
40bp
34bp
30bp
26bp
20bp
10bp
50bp
40bp
30bp
20bp
10bp
3.5.4 Ribo-Zero rRNA 1. Conduct the rRNA depletion using Ribo-zero rRNA removal
Depletion kit following the manufacturer’s protocol.
2. Concentrate rRNA-depleted sample using Zymo RNA Clean
and Concentrator kit according to the manufacturer’s proto-
col. At the final elution step, use 8 μL of RNase-free water,
which will give 7 μL final elute.
r
ke
ar
m
A
N ts
n tR pr
in
34 t
nt
/ oo r
er
F ke
26 k +l
in
lin
+
Fig. 5 Confirm the efficiency of linker ligation. Zymo RNA Clean and
Concentrator-5 purified linker ligated RNA marker or footprints. Un-ligated link-
ers are completely removed by the Zymo kit, therefore do not need this gel
running steps for experimental samples
r
ke
ar
m
A
RN
4 nt t
t/3 rin
r n otp
de ly of 26 fo
la
d
on t of
A er duc u ct
N od
D rim ro pr
bp T p T p
10 R R RT
3.5.7 Indexed Library 1. PCR amplification of circularized library using Phusion poly-
Generation merase according to the manufacturer’s instruction. Run several
different cycles (i.e., 4, 6, and 8 cycles) for comparison. Use dif-
ferent barcode reverse primers (Barcode_Reverse_##) for differ-
ent samples with a single forward primer (Barcode_Forward).
2. Add 6× DNA loading dye to each PCR product and prepare
1 μL 10 bp DNA ladder as control.
3. Separate the PCR products using 8% polyacrylamide TBE gel,
for 90 min at 180 V in 0.5× TBE buffer, or until the bromo-
phenol blue dye reaches to two-third of the gel (see Fig. 7).
4. Stain the gel for 3 min in 1× SYBR Gold in 0.5× TBE buffer.
5. Excise the ~175 nt PCR product from the gel and carefully
exclude slowly migrating smeared band (above 200 nt), bands
from un-extended reverse transcription (~145 nt), template
and primer (less than 100 nt) (see Fig. 7).
6. Overnight gel extraction. Add 400 μL of DNA gel extraction
buffer and freeze the sample for 30 min on dry ice. Leave the
sample overnight (15 h is recommended) with gentle mixing.
Collect the entire solution and transfer it to non-stick RNase-
free eppi tubes. Precipitate the footprints by adding 2 μL
Glycoblue and 500 μL isopropanol. Carry out precipitation for
30 min or more on dry ice. Centrifuge at maximum speed at
4 °C for 30 min using tabletop centrifuge. Completely remove
16 Hyun Yong Jin and Changchun Xiao
10bp DNA
Sample 1 Sample 2 Sample 3
(Barcode-01) (Barcode-02)(Barcode-03)
ladder
4 6 8 4 6 8 4 6 8 :PCR cycles
100bp
20bp
330bp
After gel cutting
Avoid 145nt-long,
100bp
un-extended RT product
the supernatant and air dry for 5 min. Dissolve the footprint in
15 μL of 10 mM Tris–HCl (pH 8.0).
7. Measure DNA concentration and combine equal amount from
each sample for Illumina sequencing.
3.6 Illumina 1. The end product of this protocol results in a fully functional
Sequencing sequencing library compatible with Illumina HiSeq, MiSeq,
and NextSeq sequencing platforms. The user should follow
standard procedures recommended by Illumina for adjusting
the final library concentration for loading onto an Illumina
flow-cell and sequencing appropriate for the sequencer used.
We recommend a minimum of single-end 50 bp read lengths
with six base indexing reads for downstream demultiplexing as
described here. Sequencing protocols for polysome fraction
and ribosome profiling are identical.
An Integrated Polysome Profiling and Ribosome Profiling Method… 17
4 Notes
Acknowledgments
References
1. Arava Y, Wang Y, Storey JD, Liu CL, Brown PO, tion using ribosome profiling. Science
Herschlag D (2003) Genome-wide analysis of 324(5924):218–223. https://doi.org/10.1126/
mRNA translation profiles in Saccharomyces Science.1168978
cerevisiae. Proc Natl Acad Sci U S A 5. Ingolia NT (2014) Ribosome profiling: new
100(7):3889–3894. https://doi.org/10.1073/ views of translation, from single codons to
pnas.0635171100. [pii] 0635171100 genome scale. Nat Rev Genet 15(3):205–213.
2. Lackner DH, Beilharz TH, Marguerat S, Mata J, https://doi.org/10.1038/nrg3645. [pii]
Watt S, Schubert F, Preiss T, Bahler J (2007) A nrg3645
network of multiple regulatory layers shapes gene 6. Nedialkova DD, Leidel SA (2015)
expression in fission yeast. Mol Cell 26(1):145– Optimization of codon translation rates via
155. https://doi.org/10.1016/j.mol- tRNA modifications maintains proteome integ-
cel.2007.03.002. [pii] S1097-2765(07)00147-5 rity. Cell 161(7):1606–1618. https://doi.
3. Steitz JA (1969) Polypeptide chain initiation: org/10.1016/j.cell.2015.05.022. [pii]
nucleotide sequences of the three ribosomal S0092-8674(15)00571-1
binding sites in bacteriophage R17 7. Lareau LF, Hite DH, Hogan GJ, Brown PO
RNA. Nature 224(5223):957–964 (2014) Distinct stages of the translation elon-
4. Ingolia NT, Ghaemmaghami S, Newman JRS, gation cycle revealed by sequencing ribosome-
Weissman JS (2009) Genome-wide analysis protected mRNA fragments. elife 3:e01257.
in vivo of translation with nucleotide resolu- https://doi.org/10.7554/eLife.01257
18 Hyun Yong Jin and Changchun Xiao
8. Guydosh NR, Green R (2014) Dom34 rescues 17~92. PLoS Genet 13 (2):e1006623. doi:
ribosomes in 3′ untranslated regions. Cell 10.1371/journal.pgen.1006623
156(5):950–962. https://doi.org/10.1016/j. 14. Jin HY, Gonzalez-Martin A, Miletic A, Lai M,
cell.2014.02.006. [pii] Knight S, Sabouri-Ghomi M, Head SR,
S0092-8674(14)00162-7 Macauley MS, Rickert R, Xiao C (2015)
9. Payne SH (2015) The utility of protein and Transfection of microRNA mimics should be
mRNA correlation. Trends Biochem Sci 40(1):1– used with caution. Front Genet 6:340. https://
3. https://doi.org/10.1016/j.tibs.2014.10.010. doi.org/10.3389/fgene.2015.00340
[pii] S0968-0004(14)00202-3 15. Ingolia NT, Brar GA, Rouskin S, McGeachy
10. Schott J, Reitter S, Philipp J, Haneke K, AM, Weissman JS (2012) The ribosome profil-
Schafer H, Stoecklin G (2014) Translational ing strategy for monitoring translation in vivo
regulation of specific mRNAs controls feed- by deep sequencing of ribosome-protected
back inhibition and survival during macro- mRNA fragments. Nat Protoc 7(8):1534–1550.
phage activation. PLoS Genet https://doi.org/10.1038/nprot.2012.086.
10(6):e1004368. https://doi.org/10.1371/ [pii] nprot.2012.086
Journal.Pgen.1004368. Artn E1004368 16. Jin HY, Oda H, Lai M, Skalsky RL, Bethel K,
11. Schafer S, Adami E, Heinig M, Rodrigues KE, Shepherd J, Kang SG, Liu WH, Sabouri-Ghomi
Kreuchwig F, Silhavy J, van Heesch S, Simaite M, Cullen BR, Rajewsky K, Xiao C (2013)
D, Rajewsky N, Cuppen E, Pravenec M, MicroRNA-17~92 plays a causative role in lym-
Vingron M, Cook SA, Hubner N (2015) phomagenesis by coordinating multiple onco-
Translational regulation shapes the molecular genic pathways. EMBO J 32(17):2377–2391.
landscape of complex disease phenotypes. Nat https://doi.org/10.1038/emboj.2013.178.
Commun 6:7200. https://doi.org/10.1038/ [pii] emboj2013178
ncomms8200. [pii] ncomms8200 17. Cho J, Yu NK, Choi JH, Sim SE, Kang SJ, Kwak
12. Liu BT, Han Y, Qian SB (2013) Cotranslational C, Lee SW, Kim JI, Choi DI, Kim VN, Kaang BK
response to proteotoxic stress by elongation (2015) Multiple repressive mechanisms in the
pausing of ribosomes. Mol Cell 49(3):453–463. hippocampus during memory formation. Science
https://doi.org/10.1016/J. 350(6256):82–87. https://doi.org/10.1126/
Molcel.2012.12.001 science.aac7368. [pii] 350/6256/82
13. Jin HY, Oda H, Chen P, Yang C, Zhou X, 18. Ingolia NT, Brar GA, Stern-Ginossar N, Harris
Kang SG, Valentine E, Kefauver JM, Liao L, MS, Talhouarne GJ, Jackson SE, Wills MR,
Zhang Y, Gonzalez-Martin A, Shepherd J, Weissman JS (2014) Ribosome profiling reveals
Morgan GJ, Mondala TS, Head SR, Kim P-H, pervasive translation outside of annotated protein-
Xiao N, Fu G, Liu W-H, Han J, Williamson JR, coding genes. Cell Rep 8(5):1365–1379. https://
Xiao C (2017) Differential Sensitivity of Target doi.org/10.1016/j.celrep.2014.07.045. [pii]
Genes to Translational Repression by miR- S2211-1247(14)00629-9
Chapter 2
Abstract
A complete understanding of transcription and co-transcriptional RNA processing events by polymerase
requires precise and robust approaches to visualize polymerase progress and quantify nascent transcripts
on a genome-wide scale. Here, we present a transcriptome-wide method to measure the level of nascent
transcribing RNA in a fast and unbiased manner.
1 Introduction
Steven R. Head et al. (eds.), Next Generation Sequencing: Methods and Protocols, Methods in Molecular Biology, vol. 1712,
https://doi.org/10.1007/978-1-4939-7514-3_2, © Springer Science+Business Media, LLC 2018
19
20 Fei Xavier Chen et al.
2 Materials
3 Methods
3.1 Preparation 1. Culture 1–5 × 107 cells for each nascent RNA-seq sample (see
of Cell Cultures Note 1).
2. For adherent cells, harvest the cells with trypsin-EDTA or by
scraping in PBS; for suspension cells, pellet the cells by spin-
ning at 300 × g for 5 min. Discard the supernatant.
3. Resuspend the pellet with 10 mL of ice-cold PBS and transfer
the cells to a 15 mL conical tube. Spin at 300 × g for 5 min at
22 Fei Xavier Chen et al.
4 °C to pellet cells. Repeat this wash two more times. Cell
pellet can be used directly for nuclei isolation or be flash-frozen
in liquid nitrogen and stored at −80 °C until use, with the
former recommended to avoid the interference in nuclei isola-
tion by the freezing and thawing of cells.
3.2 Isolation of Cell 1. Resuspend the cell pellet in 10 mL of ice-cold Hypotonic
Nuclei Buffer with protease inhibitor and DTT. Incubate the cell
suspension on ice for 15 min.
2. Centrifuge the cell suspension at 200
× g for 10 min at
4 °C. Discard the supernatant.
3. Resuspend the pellet in 5 mL of cold Hypotonic Buffer with
protease inhibitor and DTT.
4. Transfer cell suspension into a precooled 7 mL Dounce tissue
grinder. Homogenize the suspension with tight pestle with ~10
strokes. Check cell disruption under the microscope with Trypan
Blue staining. Aim for ~90% disruption (Disrupted cells can take
up the dye, unbroken cells cannot). Do not over-disrupt the cells.
5. Pellet the nuclei by spinning at 600 × g for 10 min at 4 °C. Discard
the supernatant that contains the cytoplasmic fraction.
6. Resuspend the pellet with 5 mL of ice-cold Nuclei Wash Buffer
with DTT, protease and RNase inhibitor. Spin at 1500 × g for
5 min at 4 °C and discard the supernatant. Repeat this wash
once more.
3.3 Purification 1. Prepare 1× NUN buffer by mixing 50% volume of 2 M urea
of Nascent RNA solution (fresh and filtered) and 50% volume of ice-cold 2×
NUN Buffer supplemented with DTT, protease and RNase
inhibitor (see Notes 2 and 3).
2. Suspend the pellet and disrupt the nuclei with 1 mL of 1× NUN
buffer by pipetting up and down vigorously for 10–15 times.
3. Immediately transfer the suspension into a 1.5 mL RNase-free
Eppendorf tube. Incubate on a nutator or rotating wheel for
5 min at 4 °C.
4. Pellet the chromatin by spinning at 1000 × g for 3 min at
4 °C. Carefully remove the supernatant using pipette instead
to vacuum to avoid losing the chromatin pellet (see Note 4).
5. Resuspend the chromatin pellet carefully with 1 mL of 1× NUN
buffer. Incubate on a nutator or rotating wheel for 5 min at
4 °C. Remove the supernatant carefully with pipette. Repeat this
wash twice more. For the last spinning, use 5000 × g for 5 min at
4 °C (see Note 5).
6. Add 1 mL of TRIzol Reagent into chromatin pellet and vortex
for 30 s. The chromatin pellet should be still visible.
Measuring Nascent Transcripts by Nascent-seq 23
3.4 Preparation 1. Check nascent RNA quality with either the Agilent RNA 6000
of Nascent RNA Nano Kit (for 5–500 ng/μL of starting material) or Pico Kit
Library (for 50–5000 pg/μL of starting material) on the 2100 Agilent
Bioanalyzer. Normally, the average size of nascent RNA is larger
than total RNA due to the lack of co-transcriptional RNA
splicing. And the amount of rRNA in nascent RNA is relatively
less than that in total RNA (Fig. 2) (see Note 6).
Fig. 1 The level of transcripts at exon and intron on Actb for nascent and total
RNA in mouse embryonic stem cells. It is normalized on the level of exon
24 Fei Xavier Chen et al.
nascent RNA
[FU]
20 25 30 35 40 45 50 55 60 65 [s]
total RNA
28s rRNA
[FU] 18s rRNA
100
50
0
20 25 30 35 40 45 50 55 60 65 [s]
3.5 Next-Generation 1. Pool uniquely indexed libraries evenly. This is achieved using
Sequencing the measured library concentration (Section 3.4, Step 4) and
the average bp size (Section 3.4, Step 5) to calculate the con-
centration (nM) of each sample. The libraries are pooled so as
to have equal amount (fmol) of each sample.
2. Denature and dilute libraries as instructed in Sequencing
documentation.
3. Sequence on an Illumina sequencer (see Note 9).
Measuring Nascent Transcripts by Nascent-seq 25
4 Notes
References
1. Holoch D, Moazed D (2015) RNA-mediated 4. Liu X, Bushnell DA, Kornberg RD (2013) RNA
epigenetic regulation of gene expression. Nat polymerase II transcription: structure and mech-
Rev Genet 16:71–84. https://doi. anism. Biochim Biophys Acta 1829:2–8.
org/10.1038/nrg3863 https://doi.org/10.1016/j.
2. Orom UA, Shiekhattar R (2013) Long noncod- bbagrm.2012.09.003
ing RNAs usher in a new era in the biology of 5. Bentley DL (2014) Coupling mRNA processing
enhancers. Cell 154:1190–1193. https://doi. with transcription in time and space. Nat Rev
org/10.1016/j.cell.2013.08.028 Genet 15:163–175. https://doi.org/10.1038/
3. Smith E, Shilatifard A (2013) Transcriptional nrg3662
elongation checkpoint control in development 6. Core LJ, Waterfall JJ, Lis JT (2008) Nascent
and disease. Genes Dev 27:1079–1088. https:// RNA sequencing reveals widespread pausing and
doi.org/10.1101/gad.215137.113 divergent initiation at human promoters. Science
26 Fei Xavier Chen et al.
Abstract
Shallow whole genome sequencing has recently been introduced for genome-wide detection of chromosomal
copy number alterations (CNAs) in preimplantation genetic diagnosis (PGD), using only 4–7 trophectoderm
cells biopsied from day-5 embryos. This chapter describes the complete method, starting from whole genome
amplification (WGA) on isolated blastomere(s), up to data analysis for CNA detection. The process is described
generically and can also be used to perform CNA analysis on a limited number of cells (down to a single cell) in
other applications. This unique description also includes some tips and tricks to increase the chance of success.
Key words Massive parallel sequencing (MPS), Shallow whole genome sequencing, Preimplantation
genetic diagnosis (PGD), Whole genome amplification (WGA), Copy number alterations (CNAs)
1 Introduction
Steven R. Head et al. (eds.), Next Generation Sequencing: Methods and Protocols, Methods in Molecular Biology, vol. 1712,
https://doi.org/10.1007/978-1-4939-7514-3_3, © Springer Science+Business Media, LLC 2018
27
28 Lieselot Deleye et al.
This kit is suited for DNA input amounts as low as 500 pg. The
needed library preparation method depends on the downstream
sequencing technology. In this protocol, Illumina NextSeq500
sequencing is described. Ion Torrent sequencing will show similar
results as demonstrated in a previous report [1].
2 Materials
2.4 Library Quality 1. Quality control using capillary gel electrophoresis: High-
Control Sensitivity DNA kit (lab-chip) and 2100 Bioanalyzer (Agilent
and Quantification Technologies).
2.
Quantification of adapter-ligated fragments: Sequencing
library qPCR quantification guide (Illumina, San Diego,
CA, USA).
3. Sequencing on an Illumina instrument (see Note 4).
3 Methods
3.1 Whole Genome 1. Isolate the necessary amount of cells in less than 2.5 μL of cell
Amplification medium or PBS in a 0.2 mL tube (see Note 5).
2. Dilute with the appropriate volume of Cell Extraction Buffer
to achieve a total sample volume of 5 μL (see Note 6).
3. Prepare a positive and negative control. Take 2.5 μL of PBS as
negative control and take 2.5 μL of the diluted positive
control.
4. Prepare an extraction cocktail for at least five samples (positive
and negative control included) (see Note 7). Combine 24 μL
of Extraction Enzyme Dilution Buffer and 1 μL of Cell
Extraction Enzyme in a 0.2 mL tube and mix by flicking the
tube. Add 5 μL of this cocktail to each sample (see Note 8).
CNA Detection in Preimplantation Genetic Diagnosis 31
Proceed immediately.
5. Prepare a pre-amp cocktail for at least five samples. Combine
24 μL of Pre-Amp buffer and 1 μL of Pre-Amp Enzyme in a
0.2 mL tube and mix by flicking the tube. Add 5 μL of the
cocktail to each sample (see Note 8). Incubate the samples in a
pre-programmed thermocycler:
2 min 95 °C
15 s 95 °C
50 s 15 °C
40 s 25 °C
12 cycles
30 s 35 °C
40 s 65 °C
40 s 75 °C
Hold 4 °C
2 min 95 °C
15 s 95 °C
1 min 65 °C 14 cycles
1 min 75 °C
Hold 4 °C
11. Add 375 μL Binding buffer to each sample and transfer this
mix to the spin column (see Note 9).
12. Centrifuge in a microfuge at 11,000 rpm (11,092 × g) for 25 s.
13. Discard the flow-through and pat the collector tube dry.
Re-use the same collector tube.
14. Wash the spin columns with 200 μL Wash buffer.
15. Centrifuge in a microfuge at 11,000 rpm (11,092 × g) for 25 s.
16. Repeat step 14.
17. Centrifuge in a microfuge at 11,000 rpm (11,092 × g) for 25 s
(see Note 10).
18. Transfer the spin columns to a new 1.5 mL tube.
19. Elute the DNA in 32 μL of the pre-warmed molecular biology
grade water (see Note 11) and incubate for 1 min at room
temperature.
20. Centrifuge in a microfuge at 11,000 rpm (11,092 × g) for 35 s.
21. Remove the spin column and store the 1.5 mL tubes at −20 °C
or perform a quality check before storing. Measure the con-
centration with Qubit (see Note 12). Analyze the sample on a
high sense Agilent lab-chip (see Note 13).
3.2 Fragmentation 1. Dilute 100 ng of the WGA product with 1/5 × TE buffer in
of Purified WGA 130 μL in microTUBES.
Product 2. Adjust the settings of the Covaris S2 for a 200 bp fragmenta-
tion: duty cycle of 10%, intensity of 5, 200 cycles/burst, and a
treatment time of 190 s (see Note 14).
3.3 Library 1. Make sure the heated lid of the thermocycler is set on 75 °C.
Preparation 2. Add 7 μL of End repair reaction buffer to empty 0.2 mL
tubes. Transfer 50 μL of each fragmented sample into a
0.2 mL tube with buffer. Add 3 μL of End prep enzyme mix
to each sample and mix by flicking the tube. Incubate the
samples immediately in the pre-programmed thermocycler
with heated lid on 75 °C:
30 min 20 °C
30 min 65 °C
Hold 4 °C
Proceed immediately.
During incubation, put a thermomixer at 20 °C and pre-program
another thermocycler at 37 °C with a heated lid at 47 °C. Make
sure the temperature of the thermocycler is already at 37 °C
when the samples are introduced. If DNA input was less than
100 ng, dilute the adapter to a finale concentration of 1.5 μM
(see Note 15).
CNA Detection in Preimplantation Genetic Diagnosis 33
30 s 98 °C
10 s 98 °C
9 cycles (see Note 22)
75 s 65 °C
5 min 65 °C
Hold 4 °C
34 Lieselot Deleye et al.
3.4 Library Quality Library quality control is performed by capillary gel electrophore-
Control sis on a Bioanalyzer High sensitivity lab-chip. The result is dis-
and Quantification played as an electropherogram (EPG) showing intensity in function
of fragment-size distribution. The library should contain DNA
fragments of around 300 bp. Figure 1a shows an EPG of a good
quality library. The other two smaller peaks represent the internal
standards (upper and lower marker). The intensity of the library
informs about the concentration of the library (see Note 25). A
peak visible around 85 bp, as the one shown in Fig. 1b, indicates
the presence of primer-dimers (see Note 26).
The quantification of adapter-ligated fragments in the library is
performed by qPCR. Only fragments carrying an adapter will bind
to the flowcell, and therefore will be sequenced. qPCR is per-
formed as recommended by Illumina. Make sure the libraries are
diluted to fit the standard curve used for qPCR. The concentration
measured after qPCR reflects the amount of DNA that can be
sequenced (see Note 27).
The libraries are further prepared for sequencing using the stan-
dard Illumina protocols. Sequencing is performed on a NextSeq500
using a high output flowcell for single read and 75 cycles. The
sequencing run will last for about 11 h.
3.5 Data Analysis Several tools and programs are available to detect CNAs using shal-
low, genome-wide massively parallel sequencing data [11, 12].
However, most of these tools only handle a certain part of the anal-
ysis. It requires bioinformatics knowledge to handle and combine
these different tools to perform CNA detection starting from raw
CNA Detection in Preimplantation Genetic Diagnosis 35
Fig. 1 Library quality control. (a) An electropherogram of a good quality library with a fragment length
of ±300 bp. (1) Lower marker, (2) Library, (3) Upper marker. (b) An electropherogram of a library with
primer-dimers (4)
Fig. 2 Creating a new project in ViVar. First, select ‘Projects’ from the top navigation bar (1), then click the ‘New
project’ button to create a new project (2). After filling out the project’s information, the newly created project
will appear in the projects list (3)
Fig. 3 Starting a new sequencing experiment in ViVar. A new experiment/analysis based on massively parallel
sequencing data can be started by clicking the ‘Next’ button in the ‘Sequencing data’ field (red circle)
Fig. 4 Setting the parameters for a new analysis in ViVar. Once a new analysis based on massively parallel
sequencing data has been created, parameters for the analysis can be specified: the project to which the
experiment should be added, the sample(s) which should be analyzed, the reference genome which should be
used in the analysis, and the bin size (1). Analysis can be started by switching the tumble button from ‘OFF’ to
‘ON’ (2) and clicking the ‘Next…’ button (3)
Fig. 5 Some of the visualizations possible with ViVar. Karyo plot (a); line view plot (b); genome heatmap (c); and
chromosome view (d)
CNA Detection in Preimplantation Genetic Diagnosis 39
Fig. 6 Visualizing results in ViVar. Results can be visualized in ViVar by ticking the checkbox in front of the name
of the experiment you want to visualize (1) and choosing the visualization of interest from the dropdown list (2)
4 Notes
20. Make sure each sample is assigned a unique index. The specific
combination of indexes is only important if less than seven
samples are pooled (see kit documentation). Dual-indexing is
used when more than 24 samples are pooled.
21. In samples where template DNA concentrations are rather low,
tRNA is added as a carrier that will adsorb to most of the tube
wall, resulting in a more concentrated template DNA in the rest
of the tube. Higher concentration increases the efficiency of
primer annealing to the template. More template will be effi-
ciently amplified and the amount of primer-dimers will decrease.
22. The number of cycles depends on the input amount of DNA.
Decrease the number of cycles if possible, to decrease amplifica-
tion bias.
23. Make sure the beads are well mixed before use. When removing
the supernatant, check your tip for beads. If beads are visible,
transfer the liquid back into the right tube and wait again until
the liquid is clear.
24. The protocol states not to dry the beads until they are cracked.
Nevertheless, based on our experience, the elution efficiency
increases when the ethanol is completely evaporated.
25. The intensity of both internal standards should be similar. If this
is not the case, the concentration measurements are not reliable.
26. Intense primer-dimer peaks should be removed from the
library. They will skew qPCR results, because they also contain
the binding place for the qPCR primers and thus yield an unre-
liable quantification of adapter-ligated fragments. The dimers
are removed by size selection on gel, as described before. The
bright band of 200–400 bp is cut from the gel and the dimers
are left behind nearly at the end of the gel. Primer-dimers
might also be avoided by decreasing the primer input during
enrichment PCR or decreasing the amount of samples pre-
pared simultaneously. A peak at approx.125 bp on the EPG
indicates the presence of adapter-dimers. They will compete
with the library-fragments for binding places on the flowcell.
Only fragments carrying the correct adapters on both ends will
bind to the flowcell. Measuring the amount of DNA from the
library that will actually bind to the flowcell is essential to over-
come over- or underclustering during the sequencing run and
can only be achieved using qPCR.
27. When the fragment length between the samples and PhiX is
different, a size correction is performed on the measured con-
centration after qPCR. PhiX has a fragment length of 500 bp,
while the samples have fragment lengths of only 300 bp. For
size correction, the measured concentration is multiplied by
the ratio of PhiX length over library length.
42 Lieselot Deleye et al.
References
1. Deleye L, Dheedene A, De CD, Sante T, 9. Deleye L, De Coninck D, Christodoulou C,
Christodoulou C, Heindryckx B et al (2015) Sante T, Dheedene A, Heindryckx B et al
Shallow whole genome sequencing is well suited (2015) Whole genome amplification with
for the detection of chromosomal aberrations in SurePlex results in better copy number altera-
human blastocysts. Fertil Steril 104:1276–1285 tion detection using sequencing data compared
2. Fiorentino F, Biricik A, Bono S, Spizzichino L, to the MALBAC method. Sci Rep 5:11711
Cotroneo E, Cottone G et al (2014) 10. de Bourcy CF, De Vlaminck I, Kanbar JN, Wang
Development and validation of a next-genera- J, Gawad C, Quake SR (2014) A quantitative
tion sequencing-based protocol for 24-chro- comparison of single-cell whole genome amplifi-
mosome aneuploidy screening of embryos. cation methods. PLoS One 9:e105585
Fertil Steril 101:1375–1382 11. Duan J, Zhang JG, Deng HW, Wang YP (2013)
3. Wells D, Kaur K, Grifo J, Glassner M, Taylor JC, Comparative studies of copy number variation
Fragouli E et al (2014) Clinical utilisation of a detection methods for next-generation sequenc-
rapid low-pass whole genome sequencing tech- ing technologies. PLoS One 8(3):e59128
nique for the diagnosis of aneuploidy in human 12. Zhao M, Wang Q, Wang Q, Jia P, Zhao Z (2013)
embryos prior to implantation. J Med Genet Computational tools for copy number variation
51:553–562 (CNV) detection using next-generation sequenc-
4. Yin X, Tan K, Vajta G, Jiang H, Tan Y, Zhang C ing data: features and perspectives. BMC
et al (2013) Massively parallel sequencing for chro- Bioinformatics 14(Suppl 11):S1
mosomal abnormality testing in trophectoderm 13. Sante T, Vergult S, Volders PJ, Kloosterman WP,
cells of human blastocysts. Biol Reprod 88:69 Trooskens G, De Preter K et al (2014) ViVar: a
5. Tan Y, Yin X, Zhang S, Jiang H, Tan K, Li J et al comprehensive platform for the analysis and visu-
(2014) Clinical outcome of preimplantation alization of structural genomic variation. PLoS
genetic diagnosis and screening using next gen- One 9:e113800
eration sequencing. Gigascience 3:30 14. Langmead B, Trapnell C, Pop M, Salzberg SL
6. De Vos A, Staessen C, De Rycke M, Verpoest W, (2009) Ultrafast and memory-efficient align-
Haentjens P, Devroey P et al (2009) Impact of ment of short DNA sequences to the human
cleavage-stage embryo biopsy in view of PGD on genome. Genome Biol 10:R25
human blastocyst implantation: a prospective 15. Scheinin I, Sie D, Bengtsson H, van de Wiel MA,
cohort of single embryo transfers. Hum Reprod Olshen AB, van Thuijl HF et al (2014) DNA
24:2988–2996 copy number analysis of fresh and formalin-fixed
7. Los FJ, Van Opstal D, van den Berg C (2004) specimens by shallow whole-genome sequencing
The development of cytogenetically normal, with identification and exclusion of problematic
abnormal and mosaic embryos: a theoretical regions in the genome assembly. Genome Res
model. Hum Reprod Update 10:79–94 24:2022–2032
8. Vanneste E, Voet T, Le CC, Ampe M, Konings P, 16. Olshen AB, Venkatraman ES, Lucito R, Wigler
Melotte C et al (2009) Chromosome instability is M (2004) Circular binary segmentation for
common in human cleavage-stage embryos. Nat the analysis of array-based DNA copy number
Med 15:577–583 data. Biostatistics 5:557–572
Chapter 4
Abstract
MinION is a small form factor sequencer recently retailed by Oxford Nanopore technologies. This lighter-
sized USB3.0-interfaced device uses innovative nanotechnology to generate extra-long reads from libraries
prepared using only standard molecular biology lab equipment. The flexibility and the portability of the
platform makes it ideal for point-of-interest and real-time surveillance applications. However, MinION’s
limited capacity is not enough for the study of specific targets within larger genomes. Apart from just PCR-
amplifying regions of interest, the capture of long reads spanning the edges of known-unknown genomic
regions is of great importance for structural studies, such as the identification of mobile elements’ integra-
tions sites, bridging over low complexity repetitive regions etc.
In this study, using MinION-kit-included and commercially available reagents, we have developed an
easy and versatile wet-lab procedure for the targeted enrichment of MinION libraries, capturing DNA
fragments of interest before the ligation of the sensitive MinION sequencing-adapters. This method allows
for simultaneous target-enrichment and barcode-multiplexing of up to 12 libraries, which can be loaded
in the same sequencing run.
1 Introduction
Steven R. Head et al. (eds.), Next Generation Sequencing: Methods and Protocols, Methods in Molecular Biology, vol. 1712,
https://doi.org/10.1007/978-1-4939-7514-3_4, © Springer Science+Business Media, LLC 2018
43
44 Timokratis Karamitros and Gkikas Magiorkinis
Fig. 1 Standard library-preparation for MinION genomic DNA sequencing. (a) Preprocessing of genomic DNA:
Shearing is performed with g-tubes usually aiming in >8 Kbp fragments generation. The fragments are end-
repaired and d-A tailed using standard library preparation modules. (b) Ligation of protein-conjugated (rhom-
bus) MinION adapters (red) is followed by the attachment of the Tether (green), which anchors-saturate the
fragments onto the nanopores flowcell. The library is conditioned and loaded to the MinION sequencer after
this step
now available and the included PCR barcode adapters can be used
instead of the standard ones—those for low-input DNA—for mul-
tiplexing the libraries in this protocol.
It is important to note that this method defers from just ampli-
fying the targets using barcoded primers and following the library
preparation for amplicon-sequencing, which would only amplify
the intended sequences. Target enrichment allows the capture of
sequences flanking the target itself, which is extremely useful for
phasing and structural studies, especially when the flanks of the tar-
get are unknown: identification of viral and other mobile elements’
integration sites, genomic rearrangements, large-scale insertions/
deletions and bridging over repetitive regions are a few examples.
This method, in conjunction with MinION’s extra long reads,
opens immense potential in this particular era, as we have shown
that the capture of flanks more than 2000 bp long is feasible [12].
2 Materials
Fig. 2 The MinION target-enrichment library-preparation process. Genomic DNA (black) containing target
sequences (orange) is sheared with G-tubes aiming in 5–6 Kbp fragments generation. PCR adapters (blue) are
ligated to the end-repaired fragments. Biotinylated baits (blue—green) (optionally PCR-generated) are hybrid-
ized with the DNA fragments. Enrichment for the targeted sequences is achieved after the capture of the
hybrids on streptavidin-coated beads and magnetic separation from irrelevant sequences. PCR-adapter-
compatible primers (blue) (optionally barcoded—light-green) amplify the captured fragments. The ligation of
the protein-conjugated sequencing-adapters is followed as normal afterward
2.2 Additional Kits 1. SeqCap Hybridization and Wash Kits (Roche Diagnostics,
Indianapolis, IN, USA).
2. QIAquick Gel Extraction Kit (Qiagen, Valencia, CA, USA).
3. Quant-iT PicoGreen dsDNA (ThermoFisher Scientific).
3 Methods
3.2 Preparation 1. Shear the genomic DNA (2 μg of DNA in 85 μL TE) in 5000–
of the Genomic DNA 6000 bp fragments using g-Tubes.
2. Blunt-end repair by adding: 85 μL of sheared DNA, 10 μL of
reaction buffer, and 5 μL of enzyme mix and incubating at
20 °C for 30 min.
3. Purify with 1× Agencourt AMPure XP beads, elute in 25 μL of
TE buffer.
4. Perform dA-tailing by adding: 25 μL of end-repaired genomic
DNA, 3 μL of reaction buffer and 2 μL of enzyme.
5. Ligate the PCR adapters (PCA—included in the Genomic
DNA Sequencing kit) by adding: 30 μL of dA-tailed Genomic
DNA, 20 μL of MinION PCR adapters, and 50 μL of 2×
Blunt/TA Ligase mastermix, incubate at room temperature
for 20 min (see Note 3).
6. Purify with 1× Agencourt AMPure XP beads, elute in 16 μL of
TE buffer.
7. Quantify the resulting PCR-adapter-ligated Genomic DNA
with PicoGreen (use 1 μL) and keep it in ice.
3.3 Hybridization 1. Follow the protocol of SeqCap Hybridization and Wash Kit
and Capture after modifying the initial hybridization mixture: add 500 ng
of genomic DNA (diluted in 15 μL TE) (final volume/2) μL
of 2× Hybridization buffer and (final volume/10) μL
Hybridization component A.
2. Denature the hybridization mix at 95 °C for 5 min, and then
add immediately 8 μL of baits (see Note 4).
3. Follow the SeqCap Hybridization and Wash Kit protocol up to
the final washing step of streptavidin beads.
4. Resuspend the pelleted beads in 48 μL of PCR-grade water.
5. Amplify the captured fragments using the provided Primer
Mix (PRM—included in the Genomic DNA Sequencing kit)
(see Note 5). Add 50 μL of LongAmp Taq 2× mastermix
(M0287S, New England BioLabs, Hitchin, UK), 2 μL of
MinION primers (barcoded if necessary) and the resus-
pended beads. Run PCR program as follows: initial denatur-
ation 95 °C for 3 min, 18 amplification cycles of 95 °C for
15 s, 62 °C for 15 s, 65 °C for 4 min, final elongation 65 °C
for 8 min and 4 °C hold (see Note 6).
6. Remove the streptavidin beads and the baits with a magnetic
rack and purify the product with 1× AMPure XP beads.
7. Quantify with Picogreen and dilute 1 μg of PCR product in
80 μL of PCR-grade water or TE.
MinION Targeted Sequencing 49
3.4 MinION The PCR-amplified captured library can now be used as template
Sequencing-Adapters for the MinION library preparation (see Note 7).
Ligation and Library 1. Add the provided internal control (CS-DNA).
Preparation
2. Follow the MinION genomic DNA library preparation proto-
col for the end-repair, dA-tailing, and the ligation of
sequencing-adapters.
3. Condition and load the library to the sequencer, reloading
library in standard intervals.
4. Bioinformatics Analysis can be performed using MinION ded-
icated tools (see Note 8).
4 Notes
Table 1
Barcode sequences used in the primers of the “PCR barcoding kit”
Primer 5′-Sequence-3′
BC1 GGTGCTGAAGAAAGTTGTCGGTGTCTTTGTGTTAACCT
BC2 GGTGCTGTCGATTCCGTTTGTAGTCGTCTGTTTAACCT
BC3 GGTGCTGGAGTCTTGTGTCCCAGTTACCAGGTTAACCT
BC4 GGTGCTGTTCGGATTCTATCGTGTTTCCCTATTAACCT
BC5 GGTGCTGCTTGTCCAGGGTTTGTGTAACCTTTTAACCT
BC6 GGTGCTGTTCTCGCAAAGGCAGAAAGTAGTCTTAACCT
BC7 GGTGCTGGTGTTACCGTGGGAATGAATCCTTTTAACCT
BC8 GGTGCTGTTCAGGGAACAAACCAAGTTACGTTTAACCT
BC9 GGTGCTGAACTAGGCACAGCGAGTCTTGGTTTTAACCT
BC10 GGTGCTGAAGCGTTGAAACCTTTGTCCTCTCTTAACCT
BC11 GGTGCTGGTTTCATCTATCGGAGGGAATGGATTAACCT
BC12 GGTGCTGCAGGTAGAAAGAAGCAGAATCGGATTAACCT
References
Abstract
Hi-Plex is a suite of methods to enable simple, accurate, and cost-effective highly multiplex PCR-based
targeted sequencing (Nguyen-Dumont et al., Biotechniques 58:33–36, 2015). At its core is the principle
of using gene-specific primers (GSPs) to “seed” (or target) the reaction and universal primers to “drive”
the majority of the reaction. In this manner, effects on amplification efficiencies across the target amplicons
can, to a large extent, be restricted to early seeding cycles. Product sizes are defined within a relatively nar-
row range to enable high-specificity size selection, replication uniformity across target sites (including in
the context of fragmented input DNA such as that derived from fixed tumor specimens (Nguyen-Dumont
et al., Biotechniques 55:69–74, 2013; Nguyen-Dumont et al., Anal Biochem 470:48–51, 2015), and
application of high-specificity genetic variant calling algorithms (Pope et al., Source Code Biol Med 9:3,
2014; Park et al., BMC Bioinformatics 17:165, 2016). Hi-Plex offers a streamlined workflow that is
suitable for testing large numbers of specimens without the need for automation.
1 Introduction
Steven R. Head et al. (eds.), Next Generation Sequencing: Methods and Protocols, Methods in Molecular Biology, vol. 1712,
https://doi.org/10.1007/978-1-4939-7514-3_5, © Springer Science+Business Media, LLC 2018
53
54 Bernard J. Pope et al.
[High]:barcoding
[High]:driving
[Low]:seeding
n-plex
Fig. 1 Schematic illustration of the Hi-Plex PCR mechanisms. In a single reaction, 5′ “heeled” GSPs represent-
ing all targeted amplicons (n amplicons) at low individual concentrations seed the reaction and relatively high
concentrations of abridged adapter primers (universal primers) (shown in red) drive PCR. Full-length adapter
primers (barcoding adapter primers) are included in the PCR only during terminal cycles
Fig. 2 Hi-Plex library structure and (optional) overlapping reads to work with
ROVER and UNDR-ROVER variant-calling software. The centre rectangle repre-
sents the target insert DNA sequence flanked by gene-specific primer (GSP) sites
(blue) and adapter sequences (green). The two reads of a pair are shown in yel-
low. The 5′ end of each read starts with its corresponding gene-specific primer
sequence. The insert size is chosen so that both reads overlap the target insert
sequence completely. The 3′ ends of reads may extend into the adapter sequence
depending on the read length and the presence/absence of insertions/deletions
in the template DNA. The diagram is not to scale. Typically, the insert sequence
will be significantly longer than the primer sequences
2 Materials
2.1 Template DNAs Template DNAs should be prepared using a high quality DNA
extraction system and quantified as accurately as possible by
Picogreen assay (or equivalent) and calibrated instrumentation (see
Note 2).
2.2 General Labware 1. PCR machine (PCR plate must be compatible with the
and Instrumentation thermocycler).
2. Qubit (ThermoFisher Scientific, Waltham, MA, USA).
3. BioAnalyzer (Agilent Technologies, Santa Clara, CA, USA).
4. 96-well PCR plate.
5. Narrow stem transfer pipette (ThermoFisher Scientific).
6.
Eppendorf low binding tubes (Eppendorf, Hamburg,
Germany).
7. MicroAmp Clear adhesive film (ThermoFisher Scientific).
56 Bernard J. Pope et al.
2.4 PCR Primers All the primer sequences are shown in the 5′ to 3′ orientation. All
the primers are suspended in low TE and can be ordered at stan-
dard desalting grade with the exception of the custom sequencing
primers, which should be HPLC purified (or equivalent).
2.4.1 GSPs Combine 5 μL of each individual GSP at 200 μM into a single GSP
pool vessel. We take this to represent an aggregate GSP pool con-
centration of 200 μM. Create a working stock of 50 μM aggregate
GSP pool by diluting an aliquot with molecular grade water.
Example forward GSP (lower case shows universal heel
sequence, upper case shows gene-specific sequence):
ctctctatgggcagtcggtgattTGGTAATAATTTTAGGACACTG-
TAGTTCCTG
Example reverse GSP (lower case shows universal heel
sequence, upper case shows gene-specific sequence):
ctgcgtgtctccgactcagGAACCAATATACAGGACAATGAGTC-
TACA
N501_TSIT_A AATGATACGGCGACCACCGAGATCTACACTAGATCGC
ccatctcatccctgcgtgtctccgactcag
N502_TSIT_A AATGATACGGCGACCACCGAGATCTACACCTCTCTAT
ccatctcatccctgcgtgtctccgactcag
N503_TSIT_A AATGATACGGCGACCACCGAGATCTACACTATCCTCT
ccatctcatccctgcgtgtctccgactcag
N504_TSIT_A AATGATACGGCGACCACCGAGATCTACACAGAGTAGA
ccatctcatccctgcgtgtctccgactcag
(continued)
Hi-Plex for Simple, Accurate, and Cost-Effective Amplicon-based Targeted DNA… 57
N505_TSIT_A AATGATACGGCGACCACCGAGATCTACACGTAAGGAG
ccatctcatccctgcgtgtctccgactcag
N506_TSIT_A AATGATACGGCGACCACCGAGATCTACACACTGCATA
ccatctcatccctgcgtgtctccgactcag
N507_TSIT_A AATGATACGGCGACCACCGAGATCTACACAAGGAGTA
ccatctcatccctgcgtgtctccgactcag
N508_TSIT_A AATGATACGGCGACCACCGAGATCTACACCTAAGCCT
ccatctcatccctgcgtgtctccgactcag
N509_TSIT_A AATGATACGGCGACCACCGAGATCTACACTCGCTAGA
ccatctcatccctgcgtgtctccgactcag
N510_TSIT_A AATGATACGGCGACCACCGAGATCTACACCTATCTCT
ccatctcatccctgcgtgtctccgactcag
N511_TSIT_A AATGATACGGCGACCACCGAGATCTACACCTCTTATC
ccatctcatccctgcgtgtctccgactcag
N512_TSIT_A AATGATACGGCGACCACCGAGATCTACACTAGAAGAG
ccatctcatccctgcgtgtctccgactcag
N513_TSIT_A AATGATACGGCGACCACCGAGATCTACACGGAGGTAA
ccatctcatccctgcgtgtctccgactcag
N514_TSIT_A AATGATACGGCGACCACCGAGATCTACACCATAACTG
ccatctcatccctgcgtgtctccgactcag
N515_TSIT_A AATGATACGGCGACCACCGAGATCTACACAGTAAAGG
ccatctcatccctgcgtgtctccgactcag
N516_TSIT_A AATGATACGGCGACCACCGAGATCTACACGCCTCTAA
ccatctcatccctgcgtgtctccgactcag
N517_TSIT_A AATGATACGGCGACCACCGAGATCTACACGATCGCTA
ccatctcatccctgcgtgtctccgactcag
N518_TSIT_A AATGATACGGCGACCACCGAGATCTACACCTCTATCT
ccatctcatccctgcgtgtctccgactcag
N519_TSIT_A AATGATACGGCGACCACCGAGATCTACACTCCTCTTA
ccatctcatccctgcgtgtctccgactcag
N520_TSIT_A AATGATACGGCGACCACCGAGATCTACACAGTAGAAG
ccatctcatccctgcgtgtctccgactcag
N521_TSIT_A AATGATACGGCGACCACCGAGATCTACACAAGGAGGT
ccatctcatccctgcgtgtctccgactcag
N522_TSIT_A AATGATACGGCGACCACCGAGATCTACACTGCATAAC
ccatctcatccctgcgtgtctccgactcag
N523_TSIT_A AATGATACGGCGACCACCGAGATCTACACGGAGTAAA
ccatctcatccctgcgtgtctccgactcag
N524_TSIT_A AATGATACGGCGACCACCGAGATCTACACAAGCCTCT
ccatctcatccctgcgtgtctccgactcag
(continued)
58 Bernard J. Pope et al.
N701_TSIT_P CAAGCAGAAGACGGCATACGAGATTCGCCTTA
ctccgctttcctctctatgggcagtcggtgat
N702_TSIT_P CAAGCAGAAGACGGCATACGAGATCTAGTACG
ctccgctttcctctctatgggcagtcggtgat
N703_TSIT_P CAAGCAGAAGACGGCATACGAGATTTCTGCCT
ctccgctttcctctctatgggcagtcggtgat
N704_TSIT_P CAAGCAGAAGACGGCATACGAGATGCTCAGGA
ctccgctttcctctctatgggcagtcggtgat
N705_TSIT_P CAAGCAGAAGACGGCATACGAGATAGGAGTCC
ctccgctttcctctctatgggcagtcggtgat
N706_TSIT_P CAAGCAGAAGACGGCATACGAGATCATGCCTA
ctccgctttcctctctatgggcagtcggtgat
N707_TSIT_P CAAGCAGAAGACGGCATACGAGATGTAGAGAG
ctccgctttcctctctatgggcagtcggtgat
N708_TSIT_P CAAGCAGAAGACGGCATACGAGATCCTCTCTG
ctccgctttcctctctatgggcagtcggtgat
N709_TSIT_P CAAGCAGAAGACGGCATACGAGATAGCGTAGC
ctccgctttcctctctatgggcagtcggtgat
N710_TSIT_P CAAGCAGAAGACGGCATACGAGATCAGCCTCG
ctccgctttcctctctatgggcagtcggtgat
N711_TSIT_P CAAGCAGAAGACGGCATACGAGATTGCCTCTT
ctccgctttcctctctatgggcagtcggtgat
N712_TSIT_P CAAGCAGAAGACGGCATACGAGATTCCTCTAC
ctccgctttcctctctatgggcagtcggtgat
N713_ TSIT_P CAAGCAGAAGACGGCATACGAGATCTTATCGC
ctccgctttcctctctatgggcagtcggtgat
N714_ TSIT_P CAAGCAGAAGACGGCATACGAGATTACGCTAG
ctccgctttcctctctatgggcagtcggtgat
N715_ TSIT_P CAAGCAGAAGACGGCATACGAGATGCCTTTCT
ctccgctttcctctctatgggcagtcggtgat
N716_ TSIT_P CAAGCAGAAGACGGCATACGAGATAGGAGCTC
ctccgctttcctctctatgggcagtcggtgat
N717_ TSIT_P CAAGCAGAAGACGGCATACGAGATGTCCAGGA
ctccgctttcctctctatgggcagtcggtgat
N718_ TSIT_P CAAGCAGAAGACGGCATACGAGATCCTACATG
ctccgctttcctctctatgggcagtcggtgat
N719_ TSIT_P CAAGCAGAAGACGGCATACGAGATAGAGGTAG
ctccgctttcctctctatgggcagtcggtgat
N720_ TSIT_P CAAGCAGAAGACGGCATACGAGATTCTGCCTC
ctccgctttcctctctatgggcagtcggtgat
N721_ TSIT_P CAAGCAGAAGACGGCATACGAGATTAGCAGCG
ctccgctttcctctctatgggcagtcggtgat
(continued)
Hi-Plex for Simple, Accurate, and Cost-Effective Amplicon-based Targeted DNA… 59
2.6 DNA Quantitation 1. Quibit ds DNA High Sensitivity Assay Kit (ThermoFisher
and Quality Scientific).
Assessment 2. BioAnalyzer reagents: Agilent High Sensitivity DNA chip kit
(Agilent Technologies).
3. 300 bp MiSeq reagent kit v2 (Illumina, San Diego, CA, USA)
(see Note 3).
4. HiSeq Install Accessories box (Illumina).
3 Methods
3.1 Primer Design Refer to Materials for example sequences. The GSPs, each com-
prised of a 3′ gene-specific portion and a 5′ universal portion,
should be designed so that the resulting amplicons vary little in
size (typically, the range of size difference would be about 30 bp).
The authors frequently design the library “inserts” (i.e., the
sequence to be queried, excluding the gene-specific primer por-
tion) to be close to 100 bp (see Note 4). The gene-specific primers
should also be designed to minimize variability in melting tem-
perature and propensity for primer-dimer formation (see Note 5).
The target melting temperature should be set to consider reaction
stringency and design constraints (see Note 6). The sequence
introduced by the 5′ universal portions of the GSPs are used to
template universal primer-driven amplification for the majority of
Hi-Plex amplification and subsequent barcoding adapter primer
extension. They are also used to template sequencing reactions,
along with additional sequences introduced by the barcoding
adapter primers (see Note 7).
3.2 Hi-Plex PCR 1. Prepare a template document to map the pattern of dual index
barcoding adapter primer-pairs to identify each DNA specimen
in the PCR plate (see Note 8).
2. Prepare a 10 μM (each barcoding adapter primer should be at
10 μM) working stock plate of barcoding adapter primer-pairs
arrayed in the designated pattern. To avoid the risk of PCR
contamination, re-use of working stock barcoding adapter
primer plates should be minimized. Higher concentration
stock plates can be stored frozen in a no-copy facility for ease
of working stock generation.
3. Importantly, the following PCR set-up steps should be con-
ducted on ice so that, when ready, the PCR plate can be trans-
ferred immediately from ice to the PCR machine block
preheated (paused) at 98 °C (see Note 9). Prepare a PCR mas-
ter mix on ice by pipetting up and down and without vortex-
ing. The following example is typical for applying Hi-Plex to
96 samples in a 96-well plate (see Note 10):
Hi-Plex for Simple, Accurate, and Cost-Effective Amplicon-based Targeted DNA… 61
Fig. 3 (a) Representative agarose gel images illustrating the excision of a variety of different multi-specimen
Hi-Plex libraries for different targets and levels of amplicon-plexity. Arrows indicate the libraries. “M” indicates a
size standard marker. (b) A representative BioAnalyser profile for a multi-specimen Hi-Plex library produced using
the current example protocol (688 amplicons for each specimen). The y-axis numbers relate to fluorescence units
(FUs). The outer peaks represent the reference markers and the central peak represents the multi-specimen
library
6. Measure the molarity of the library using the Qubit High sen-
sitivity DNA kit according to the manufacturer’s instructions
(see Note 16). Use 5 μL of library to quantitate by Qubit.
Ultimately, we are aiming to adjust the library to 450 pg/μL
which equates to the approx. 2.3 nM that our current example
protocol requires for denaturation prior to loading on the
MiSeq (see Note 17).
64 Bernard J. Pope et al.
4 Notes
Acknowledgments
References
1. Nguyen-Dumont T, Hammet F, Mahmoodi M 3. Nguyen-Dumont T, Mahmoodi M, Hammet F
et al (2015) Abridged adapter primers increase et al (2015) Hi-Plex targeted sequencing is
the target scope of hi-Plex. BioTechniques effective using DNA derived from archival dried
58:33–36 blood spots. Anal Biochem 470:48–51
2. Nguyen-Dumont T, Pope BJ, Hammet F et al 4. Pope BJ, Nguyen-Dumont T, Hammet F et al
(2013) A high-plex PCR approach for massively (2014) ROVER variant caller: read-pair overlap
parallel sequencing. BioTechniques 55:69–74 considerate variant-calling software applied to
70 Bernard J. Pope et al.
PCR-based massively parallel sequencing datas- clinically ascertained families from the breast
ets. Source Code Biol Med 9:3. https://doi. cancer family registry. Breast Cancer Res Treat
org/10.1186/1751-0473-9-3 149:547–554
5. Park DJ, Li R, Lau E et al (2016) UNDR 8. Hasmademail HN, Laiemail KN, Wenemail WX
ROVER—a fast and accurate variant caller for et al (2016) Evaluation of germline BRCA1
targeted DNA sequencing. BMC Bioinformatics and BRCA2 mutations in a multi-ethnic Asian
17:165. https://doi.org/10.1186/s12859- cohort of ovarian cancer patients. Gynecol
016-1014-9 Oncol 141(2):318–322. https://doi.org/
6. Nguyen-Dumont T, Teo ZL, Pope BJ et al 10.1016/j.ygyno.2015.11.001
(2013) Hi-Plex for high-throughput mutation 9. Nguyen-Dumont T, Pope BJ, Hammet F et al
screening: application to the breast cancer suscep- (2013) Cross-platform compatibility of
tibility gene PALB2. BMC Med Genomics 6: Hi-Plex, a streamlined approach for targeted
48. https://doi.org/10.1186/1755-8794-6-48 massively parallel sequencing. Anal Biochem
7. Nguyen-Dumont T, Hammet F, Mahmoodi M 442:127–129. https://doi.org/10.1016/j.
et al (2015) Mutation screening of PALB2 in ab.2013.07.046
Chapter 6
Abstract
We recently reported a fragmentation-free method for the synthesis of Next-Generation Sequencing
libraries called “ClickSeq” that uses biorthogonal click-chemistry in place of enzymes for the ligation of
sequencing adaptors. We found that this approach dramatically reduces artifactual chimera formation,
allowing the study of rare recombination events that include viral replication intermediates and defective-
interfering viral RNAs. ClickSeq illustrates how robust, bio-orthogonal chemistry can be harnessed in vitro
to capture and dissect complex biological processes. Here, we describe an updated protocol for the synthe-
sis of “ClickSeq” libraries.
Key words ClickSeq, RNAseq, Click-chemistry, Next-generation sequencing, Flock house virus
1 Introduction
Steven R. Head et al. (eds.), Next Generation Sequencing: Methods and Protocols, Methods in Molecular Biology, vol. 1712,
https://doi.org/10.1007/978-1-4939-7514-3_6, © Springer Science+Business Media, LLC 2018
71
72 Elizabeth Jaworski and Andrew Routh
Fig. 1 Prototypical click-chemistry reactions: (a) Copper Catalyzed Alkyne-Azide Cycloaddition (CuAAC); and (b)
copper-free Strain Promoted Alkyne-Azide Cycloaddition (SPAAC)
Fig. 2 A varied range of unnatural triazole-linked nucleic acids generated by click-ligation and demonstrated
to be bio-compatible have been reported
5’ 3’ Steps 3.1.1-3.1.5:
NNNNNN Reverse transcription supplemented with
p7
5’ 5’ 3’ AzNTPs and initiated from semi-random (6N) primer
N3 NNNNNN
containing a partial p7 adaptor sequence generates
p7
a random distribution of azido-terminated cDNAs.
5’ 5’ 3’
N3 NNNNNN
p7
5’
Azido-nucleotide (AzNTP)
NNNNNN
p7
5’
N3 NNNNNN
Step 3.1.6:
p7 RNaseH Treatment to remove template RNA
5’
N3 NNNNNN
p7
5’
Steps 3.2.1-3.2.4:
DNA Zymo Clean to remove RT-PCR reaction
components including free azido-nucleotides.
p5
NNNNNN
p7
5’
p5
N3 NNNNNN
p7
5’
NNNNNN
p7
p5
5’
p5
N
N
NNNNNN
p7
N
p5
5’
N
N NNNNNN
p7
5’
Steps 3.4.1-3.4.5:
DNA Zymo Clean to remove copper ions, DMSO
p5
N
and excess adapters to yield cleaned unnatural
N
NNNNNN
triazole-linked single-stranded DNA.
p7
N
p5
5’
N
N NNNNNN
p7
5’
p5 p7
Fig. 3 Schematic of the ‘ClickSeq’ protocol illustrating the individual steps and the Click-ligation reaction
74 Elizabeth Jaworski and Andrew Routh
2 Materials
2.3 PCR Reaction OneTaq DNA Polymerase 2× Master Mix with standard buffer
(New England Biolabs, Ipswhich, MA, USA).
2.4 Other Reagents 1. E-Gel Precast Agarose electrophoresis system with 2% Agarose
and Equipment gels (ThermoFisher Scientific).
2. Blue light Transilluminator.
3. 100 bp DNA ladder.
4. Zymo DNA Clean and Concentrator-5 (Zymo Research,
Irvine, CA, USA).
5. Zymo Gel DNA Recovery Kit (Zymo Research). (This kit is
the same as the “Clean & Concentrator” with the addition of
the agarose dissolving buffer.)
6. Qubit fluorimeter (ThermoFisher Scientific).
7. Standard Thermocyclers.
8. Standard Tabletop centrifuges.
76 Elizabeth Jaworski and Andrew Routh
2.5 Primers
and Oligos
3 Methods
3.1 Reverse 1. Input RNA: in principle, any input RNA can be used to gener-
Transcription ate RNAseq libraries. We have successfully sequenced viral
genomic RNA, total cellular RNA, poly(A)-selected RNA, and
ribo-depleted RNA. RNA should be provided in pure water,
following standard precautions to avoid RNase activity. In our
lab, we usually aim to provide 100 ng of RNA (see Note 2). No
sample fragmentation is required.
2. The reverse transcription is performed using standard proto-
cols, with the exception that the reaction is supplemented with
small amounts of azido-nucleotides (AzNTPs). Set up RT-PCR
reaction as follows for a 13 μL reaction: 10 μL of H2O, 1 μL of
dNTP:AzNTP mixture at 10 mM (see Notes 3 and 4), 1 μL of
3′Genomic Adapter-6N primer at 100 μM, 1 μL of RNA at
100 ng/μL.
3. Incubate the mixture at 95 °C for 2 min to melt RNA and
immediately cool on ice for >1 min to anneal semi-random
primer. This high melting temperature is tolerated as small
ClickSeq: Replacing Fragmentation and Enzymatic Ligation with Click-Chemistry… 77
3.2 Azido- After cDNA synthesis and RNA digestion, the azido-terminated
Terminated cDNA cDNA must be purified away from the AzNTPs present in the
Purification RT-PCR reaction mix. These small molecules will be in molar excess
of azido-terminated cDNA by many orders of magnitude and will
compete for ligation to the alkyne-modified “click-adaptor” if not
completely removed. This can be achieved in a number of ways; we
prefer to use the Zymo DNA clean protocol due to its ability to
elute in small volumes with minimal carry-over:
1. Take the 20.5 μL RT-PCR reaction, and add 140 μL of Zymo
DNA binding buffer (7:1 binding buffer:DNA).
2. Apply to silica column, and centrifuge for 30–60 s at 14,000 rpm
in a tabletop microfuge, as per the manufacturer’s protocol.
3. Wash with 200 μL of ethanol-containing wash buffer and cen-
trifuge for 30–60 s at 14,000 rpm as per the manufacturer’s
protocol. Repeat for two washes.
4. Elute by centrifugation for 60 s at 14,000 rpm into fresh non-
stick Eppendorf tubes using 10 μL of 50 mM HEPES pH 7.2
or water (see Note 6).
3.4 Click-Ligated To remove the components of the click-ligation we use the Zymo
cDNA Purification DNA clean protocol due to its ability to elute in small volumes
with minimal carry-over (see Note 12):
1. The click-ligation reaction is first diluted with 60 μL water to a
total volume of 100 μL prior to the addition of the DNA bind-
ing buffer in order to dilute the DMSO.
2. Take 100 μL click-ligation reaction, and add 700 μL Zymo
DNA binding buffer (7:1 binding buffer:DNA).
3. Apply to silica column, and centrifuge for 30–60 s at 14,000
rpm, as per the manufacturer’s protocol.
4. Wash with 200 μL ethanol-containing wash buffer and centri-
fuge for 30–60 s at 14,000 rpm as per the manufacturer’s pro-
tocol. Repeat for two washes.
5. Elute by centrifugation for 60 s at 14,000 rpm into fresh non-
stick Eppendorf tubes using 10 μL 10 mM Tris pH 7.4 or
water.
3.5 Final PCR We have screened a number of cycling conditions and have found
Amplification the following to give the best results:
1. Mix at room temperature for a 50 μL reaction: 5 μL Clean
Click-ligated DNA (in 10 mM Tris pH 7.4) (see Note 13),
2.5 μL of 3′ Indexing Primer (1 barcode/sample) at 5 μM,
2.5 μL of Universal Primer Short [UP-S] at 5 μM, 15 μL of
H2O, 25 μL of 2× One Taq Standard Buffer Master Mix.
2. Cycle on a standard thermocycler using the following steps (see
Note 14):
94 °C 1 min;
55 °C 30 s,
68 °C 10 min
[94 °C 30 s,
55 °C 30 s,
ClickSeq: Replacing Fragmentation and Enzymatic Ligation with Click-Chemistry… 79
68 °C 2 min] × 17 cycles
68 °C 5 min;
4 °C ∞
3. Purify the PCR product with another Zymo DNA clean protocol
(see Note 15):
(a) Take the 50 μL PCR reaction and add 250 μL of Zymo
DNA binding buffer (5:1 binding buffer:DNA).
(b) Apply to silica column, and centrifuge for 30–60 s at
14,000 rpm, as per the manufacturer’s protocol.
(c) Wash with 200 μL of ethanol-containing wash buffer and
centrifuge for 30–60 s at 14,000 rpm as per the manufac-
turer’s protocol. Repeat for two washes.
(d) Elute by centrifugation for 60 s at 14,000 rpm into fresh
non-stick Eppendorf tubes using 20 μL of 10 mM Tris
pH 7.4 or water.
3.6 Gel Extraction 1. Add 20 μL eluted cDNA library onto a 2% agarose precast
and Size Selection prestained e-gel. For multiple samples, run empty wells in
between each sample to prevent cross-contamination of final
libraries. Also run a 100 bp MW ladder.
2. Run using 1–2% agarose protocol for 10 min (E-Gel iBASE
Version 1.4.0; #7).
3. After run has completed, image gel on blue transilluminator
and keep image for records (e.g., Fig. 4a).
Fig. 4 The final cDNA library is analyzed by gel electrophoresis. The library should
appear as a smooth smear as per the example in (a). (b) A library of the desired
size is excised and an image is retained for records
80 Elizabeth Jaworski and Andrew Routh
3.7 Sequencing ClickSeq Libraries can be submitted for either paired-end or sin-
and ClickSeq-Specific gle-end sequencing on Illumina platforms using the adaptor
Data Preprocessing sequences described here. The first read is obtained from the
Illumina universal primer end (p5) end of the cDNA fragment
which is the location of the triazole ring in the original cDNA. The
second read starts from the indexing (p7) adaptor, which is the
site of the random priming in the RT-PCR. During data prepro-
cessing, we recommend trimming the first six nucleotides from
the beginning of both the forward and reverse reads, which cor-
respond to the random “N” nucleotides included in the sequenc-
ing adaptors (see Subheading 2.5). Additionally, in the forward
read, we have found that there is sequence bias in the 4th to 6th
nucleotides for the forward read. These nucleotides correspond to
those flanking the unnatural triazole linkage. In particular, posi-
tion 5 can be occupied with an “A” in up to 80% of the sequence
reads. This position corresponds to the base complementary to
the terminating azido--nucleotide introduced during RT-PCR,
suggesting that either AzTTP is inserted more readily than the
other azido-nucleotides during reverse transcription or that the
click-ligation reaction favors terminal azido-thymine in the cDNA.
Alternatively, it is possible that the PCR amplification step may
preferentially insert an “A” opposite the triazole-linkage regardless
ClickSeq: Replacing Fragmentation and Enzymatic Ligation with Click-Chemistry… 81
4 Notes
l
l
/µ
l
/µ
/µ
l
/µ
ng
ng
ng
ng
8
er
SO
5
% SO
SO
er
SO
SO
.4
.6
.3
9
dd
dd
-5
.3
-3
M
-4
20 M
M
M
La
-2
La
D
D
s
D
in
a b
in
in
p
%
in
p
%
%
m
0b
0b
m
m
50
30
40
10
10
10
10
5
1
Fig. 5 Optimizations of the ClickSeq protocol are demonstrated by gel electrophoresis. (a) Increasing concen-
trations of DMSO in the Click-Ligation reaction improve the final yield of the amplified cDNA library. (b)
Increasing the extension time
Acknowledgments
References
1. Birts CN et al (2014) Transcription of click- DNA): design, synthesis, and d ouble-strand
linked DNA in human cells. Angew Chem Int formation with natural DNA. Org Lett
Ed Engl 53(9):2362–2365 10(17):3729–3732
2. Chen X, El-Sagheer AH, Brown T (2014) 9. Isobe H, Fujino T (2014) Triazole-linked ana-
Reverse transcription through a bulky triazole logues of DNA and RNA ((TL)DNA and (TL)
linkage in RNA: implications for RNA RNA): synthesis and functions. Chem Rec
sequencing. Chem Commun (Camb) 50(57): 14(1):41–51
7597–7600 10. Fujino T et al (2011) Triazole-linked DNA as a
3. Dallmann A et al (2011) Structure and dynam- primer surrogate in the synthesis of first-strand
ics of triazole-linked DNA: biocompatibility cDNA. Chem Asian J 6(11):2956–2960
explained. Chemistry 17(52):14714–14717 11. Shivalingam A, Tyburn AE, El-Sagheer AH,
4. El-Sagheer AH, Sanzone AP, Gao R, Tavassoli Brown T (2017) Molecular requirements of
A, Brown T (2011) Biocompatible artificial high-fidelity replication-competent DNA back-
DNA linker that is read through by DNA bones for orthogonal chemical ligation. J Am
polymerases and is functional in Escherichia Chem Soc 139(4):1575–1583
Coli. Proc Natl Acad Sci U S A 108(28): 12. Kolb HC, Finn MG, Sharpless KB (2001) Click
11338–11343 chemistry: diverse chemical function from a few
5. el-Sagheer AH, Brown T (2011) Efficient RNA good reactions. Angew Chem Int Ed Engl
synthesis by in vitro transcription of a triazole- 40(11):2004–2021
modified DNA template. Chem Commun 13. Baskin JM et al (2007) Copper-free click chem-
(Camb) 47(44):12057–12058 istry for dynamic in vivo imaging. Proc Natl
6. Qiu J, El-Sagheer AH, Brown T (2013) Solid Acad Sci U S A 104(43):16793–16797
phase click ligation for the synthesis of very long 14. El-Sagheer AH, Brown T (2009) Synthesis
oligonucleotides. Chem Commun (Camb) and polymerase chain reaction amplification
49(62):6959–6961 of DNA strands containing an unnatural tri-
7. Sanzone AP, El-Sagheer AH, Brown T, azole linkage. J Am Chem Soc 131(11):
Tavassoli A (2012) Assessing the biocompati- 3958–3964
bility of click-linked DNA in Escherichia Coli. 15. Routh A, Head SR, Ordoukhanian P, Johnson
Nucleic Acids Res 40(20):10567–10575 JE (2015) ClickSeq: fragmentation-free next-
8. Isobe H, Fujino T, Yamazaki N, Guillot- generation sequencing via click ligation of
Nieckowski M, Nakamura E (2008) Triazole- adaptors to stochastically terminated 3′-Azido
linked analogue of deoxyribonucleic acid ((TL) cDNAs. J Mol Biol 427(16):2610–2616
ClickSeq: Replacing Fragmentation and Enzymatic Ligation with Click-Chemistry… 85
Abstract
DNA methylation is an epigenetic mark implicated in the regulation of key biological processes. Using
high-throughput sequencing technologies and bisulfite-based approaches, it is possible to obtain compre-
hensive genome-wide maps of the mammalian DNA methylation landscape with a single-nucleotide reso-
lution and absolute quantification. However, these methods were only applicable to bulk populations of
cells. Here, we present a protocol to perform whole-genome bisulfite sequencing on single cells (scBS-
Seq) using a post-bisulfite adapter tagging approach. In this method, bisulfite treatment is performed prior
to library generation in order to both convert unmethylated cytosines and fragment DNA to an appropri-
ate size. Then DNA fragments are pre-amplified with concomitant integration of the sequencing adapters,
and libraries are subsequently amplified and indexed by PCR. Using scBS-Seq we can accurately measure
DNA methylation at up to 50% of individual CpG sites and 70% of CpG islands.
Key words DNA methylation, High-throughput sequencing, Bisulfite sequencing, Single cell,
Epigenetics
1 Introduction
Methylation of cytosine bases (DNAme) is an epigenetic mark that
regulates gene expression in diverse biological contexts, including
cell fate specification and reprogramming, parental imprinting,
repression of repetitive elements, and X-chromosome inactivation
[1–4]. Development of High-throughput/Next-Generation
Sequencing technologies now enables comprehensive genome-
wide assessment of epigenetic modifications of mammalian cells and
tissues. Regarding DNA methylation (DNAme) of cytosines resi-
dues (CpGs), the gold-standard approach to investigate the distri-
bution and dynamic regulation of this epigenetic mark is whole
genome bisulfite sequencing (WGBS) which offers single-cytosine
resolution and absolute quantification of DNAme levels [5].
WGBS has now been performed on a variety of mouse and
human tissues, in particular but not only, via large consortium
such as ENCODE and NIH Roadmap Epigenomics [6–9].
Steven R. Head et al. (eds.), Next Generation Sequencing: Methods and Protocols, Methods in Molecular Biology, vol. 1712,
https://doi.org/10.1007/978-1-4939-7514-3_7, © Springer Science+Business Media, LLC 2018
87
88 Heather J. Lee and Sébastien A. Smallwood
2 Materials
1. RNase Away.
2. Low-binding PCR tubes.
3. Low-binding pipette tips (Starlab).
4. Tris–HCl 1 M solution, pH 8.0 (Gibco, 15568-025).
5. Sodium dodecyl sulfate solution 10% (Sigma, 71736).
6. EDTA solution 0.5 M (Sigma, E7889).
7. Proteinase K (Sigma, P4850).
8. H2O molecular biology grade (Gibco, 10977-035).
Genome-Wide Analysis of DNA Methylation in Single Cells Using a Post-bisulfite… 89
3 Methods
This protocol describes the preparation of scBS-Seq libraries. It is
recommended to perform a negative control (no DNA) in parallel
to detect potential contaminations or other artifacts, and a posi-
tive control with 20–50 pg of DNA. Bench, pipettes and racks
used should be thoroughly cleaned with DNA decontamination
solution (RNase away). It is necessary to use low-DNA-binding
plastic ware. A pre-PCR environment is absolutely required to
limit contamination, and the use of a PCR cabined is strongly sug-
gested (in case, UV-treat all reagents not containing oligos or
enzymes prior to use).
3.1 Cell Lysis 1. Prepare a 6.25× lysis buffer stock as follows: for 100 μL, add
6.25 μL of 1 M Tris–HCl, pH 8.0, 37.5 μL of SDS 10%,
0.625 μL of EDTA 0.5 M, 55.6 μL of H2O. This stock can be
kept at 4 °C for 4 weeks.
2. Prepare the active lysis buffer freshly by adding per cells/PCR
tubes: 6 μL of H2O, 2 μL of 6.25× lysis buffer stock, 1 μL of
proteinase K, 1 μL of Lambda DNA (60 fg per μL). It is
important to avoid vortexing the samples, and instead to care-
fully “flick” the PCR tubes with the fingers. Lambda DNA is
used to assess bisulfite conversion efficiency (see Note 1).
3. Incubate at 55 °C for 5 min, 37 °C for 50 min, 55 °C for
5 min. This cell lysate can be stored at −20 °C for 4 weeks (see
Note 2).
3.2 Bisulfite 1. Use the Imprint DNA modification kit, and prepare the DNA
Conversion modification solution according to the manufacturer’s
protocol.
2. Add 0.5 μL of balance solution to the cell lysate from
Subheading 3.1 and mix by pipetting. Incubate for 10 min at
37 °C.
3. Add 50 μL of prepared DNA modification solution, mix by
pipetting, and incubate as follows: 65 °C for 90 min, 95 °C for
3 min, 65 °C for 20 min and hold at 4 °C (see Note 3).
4. Bind the DNA to Purelink PCR micro kit columns using
240 μL of provided Binding buffer (B2). Centrifuge, discard
the supernatant, and wash the column once by centrifugation
using 500 μL of Wash Buffer (W1).
5. Add 100 μL of M-Desulphonation Buffer and incubate at
room temperature for 15 min.
6. Centrifuge the column and wash it twice using 200 μL of Wash
Buffer (W1) following the manufacturer’s recommendations.
7. Elute in 20.5 μL of 10 mM Tris–HCl, pH 8.0.
8. Proceed to Subheading 3.3.
Genome-Wide Analysis of DNA Methylation in Single Cells Using a Post-bisulfite… 91
3.3 Oligo-1 Tagging 1. Add 4.5 μL of complementary strand synthesis mix (1 μL of
dNTPs 10 mM; 2.5 μL of 10× Blue buffer; 1 μL of oligo-1
10 μM) to the sample, and mix by pipetting.
2. Incubate at 65 °C for 3 min and cool down to 4 °C.
3. Add 1 μL of Klenow exo- enzyme and mix by pipetting (see
Note 4).
4. Incubate for 5 min at 4 °C and then rise to 37 °C at a rate of
1 °C every 15 s, and incubate at 37 °C for 30 min. Pause at
4 °C.
5. Proceed to Subheading 3.4.
3.4 Pre-amplification 1. Prepare the pre-amplification mix as follows: for one sample,
add 0.1 μL dNTPs (10 mM stock), 1 μL of oligo-1 (10 μM
stock), 0.25 μL of 10× Blue buffer, 0.5 μL of Klenow exo- and
0.65 μL of H2O.
2. Incubate samples from Subheading 3.3 at 95 °C for 1 min and
cool straight to 4 °C on ice.
3. Add 2.5 μL of pre-amplification and mix by pipetting.
4. Incubate for 5 min at 4 °C and then rise to 37 °C at a rate of
1 °C every 15 s, and incubate at 37 °C for 30 min. Pause at 4 °C.
5. Repeat steps 2–4 for another three times.
6. Proceed to Subheading 3.5.
3.5 Exonuclease 1. Complete the reaction volume to 98 μL with H2O (see Note 5).
I Treatment 2. Add 2 μL of Exonuclease I and mix by pipetting.
and Purification
3. Incubate at 37 °C for 60 min. Pause at 4 °C.
4. Purify using AMPure XP beads according to the manufactur-
er’s instructions with a bead ratio of 0.8× (i.e., add 80 μL of
beads).
5. Elute from beads in 50 μL of 10 mM Tris–HCl, pH 8.0 and
transfer to new PCR tubes.
6. Proceed to Subheading 3.6.
3.7 Oligo-2 Tagging 1. Incubate samples from Subheading 3.6 at 95 °C for 1 min and
cool straight to 4 °C on ice.
2. Add 1 μL of Klenow exo- and mix by pipetting.
3. Incubate for 5 min at 4 °C and then rise to 37 °C at a rate of
1 °C every 15 s, and incubate at 37 °C for 30 min. Pause at
4 °C.
4. Elute from beads in 24 μL of 10 mM Tris–HCl, pH 8.0 and
transfer 23 μL to new PCR tubes.
5. Proceed to Subheading 3.8.
3.9 Library 1. Place the samples from Subheading 3.8 on a PCR magnet.
Purification, QC, 2. Transfer the supernatant into new PCR tubes.
and Quantification
3. Dilute the reaction by adding 50 μL of 10 mM Tris–HCl,
pH 8.0 (see Note 5).
4. Purify using AMPure XP beads according to the manufactur-
er’s instructions with a bead ratio of 0.8× (i.e., add 80 μL of
beads).
5. Elute the library in 15 μL of 10 mM Tris–HCl, pH 8.0.
3.10 Quality Control Prior to sequencing, perform quality control using the BioAnalyzer
platform according to the manufacturer’s instructions (1 μL
required). scBS-Seq libraries are generally characterized by a
unique wide peak (~300–600 bp) (see Fig. 1 and Note 7). In addi-
Genome-Wide Analysis of DNA Methylation in Single Cells Using a Post-bisulfite… 93
Fig. 1 Quality assessment of scBS-seq libraries using Bioanalyzer. (a, b) Typical Bioanalyzer profiles of good
quality and successfully sequenced scBS-seq libraries generated on two separate experiments. This highlights
the variability in protocol efficiency (these examples are on the extremes). (c, d) Below are the corresponding
bioanalyzer profiles for the experimental negative controls (cell omission). Note that size distribution shift
toward shorter fragments compared to single-cell samples, and the yield is much lower. It is nevertheless
required to sequence negative controls
3.11 Sequencing Libraries can be sequenced on the Illumina MiSeq (for test pur-
and Methylation Calls poses) or HiSeq. To obtain the maximum number of informative
CpGs, we recommend using 200 cycle paired-end read (2 × 100 bp)
sequencing kits. The number of reads required per sample depends
obviously on the biological question asked (and the genome size);
in “routine” we normally multiplex 24–48 diploid mammalian
cells per lane resulting in an average of 10–25% informative CpGs.
For subsequent bioinformatics analysis, we recommend that
sequencing quality be first assessed using “FastQC” and poor qual-
ity sequences be trimmed using “Trim Galore!”. In addition, the
first 6 bp corresponding to the oligo-1 random nucleotides can be
trimmed. Subsequently, mapping and methylation calls can be per-
formed using “Bismark” [19]; because of the PBAT strategy and
pre-amplification steps, mapping should be done in single-end/
non-directional mode. These three tools can be freely downloaded
at the Babraham Bioinformatics website (http://www.bioinfor-
matics.babraham.ac.uk/projects/).
94 Heather J. Lee and Sébastien A. Smallwood
4 Notes
References
1. Smith ZD, Meissner A (2013) DNA meth- 11. Smallwood SA, Kelsey G (2012) Genome-
ylation: roles in mammalian development. Nat wide analysis of DNA methylation in low cell
Rev Genet 14:204–220 numbers by reduced representation bisulfite
2. Ferguson-Smith AC (2011) Genomic imprint- sequencing. In: Genomic imprinting. Humana
ing: the emergence of an epigenetic paradigm. Press, Totowa, NJ, pp 187–197
Nat Rev Genet 12:565–575 12. Miura F, Enomoto Y, Dairiki R et al (2012)
3. Smallwood SA, Kelsey G (2012) De novo Amplification-free whole-genome bisulfite
DNA methylation: a germ cell perspective. sequencing by post-bisulfite adaptor tagging.
Trends Genet 28:33–42 Nucleic Acids Res 40:e136
4. Seisenberger S, Peat JR, Hore TA et al (2012) 13. Shirane K, Toh H, Kobayashi H et al (2013)
Reprogramming DNA methylation in the Mouse oocyte methylomes at base resolution
mammalian life cycle: building and breaking reveal genome-wide accumulation of non-CpG
epigenetic barriers. Philos Trans R Soc Lond methylation and role of DNA methyltransfer-
Ser B Biol Sci 368:20110330 ases. PLoS Genet 9:e1003439
5. Harris RA, Wang T, Coarfa C et al (2010) 14. Kobayashi H, Sakurai T, Imai M et al (2012)
Comparison of sequencing-based methods to Contribution of intragenic DNA methyla-
profile DNA methylation and identification tion in mouse gametic DNA methylomes to
of monoallelic epigenetic modifications. Nat establish oocyte-specific heritable marks. PLoS
Biotechnol 28:1097–1105 Genet 8:e1002440
6. Bernstein BE, Stamatoyannopoulos JA, 15. Peat JR, Dean W, Clark SJ et al (2014)
Costello JF et al (2010) The NIH road- Genome-wide bisulfite sequencing in zygotes
map Epigenomics mapping consortium. Nat identifies demethylation targets and maps the
Biotechnol 28(10):1045–1048 contribution of TET3 oxidation. Cell Rep
7. ENCODE Project Consortium (2004) The 9(6):1990–2000
ENCODE (ENCyclopedia Of DNA Elements) 16. Smallwood SA, Lee HJ, Angermueller C et al
project. Science 306(5696):636–640 (2014) Single-cell genome-wide bisulfite
8. Hon GC, Rajagopal N, Shen Y et al (2013) sequencing for assessing epigenetic heteroge-
Epigenetic memory at embryonic enhanc- neity. Nat Methods 11(8):817–820
ers identified in DNA methylation maps 17. Nashun B, Hill PW, Smallwood SA et al (2015)
from adult mouse tissues. Nat Genet Continuous histone replacement by Hira is
45(10):1198–1206 essential for normal transcriptional regulation
9. Xie W, Schultz MD, Lister R et al (2013) and de novo DNA methylation during mouse
Epigenomic analysis of multilineage differen- oogenesis. Mol Cell 60(4):611–625
tiation of human embryonic stem cells. Cell 18. Quail MA, Otto TD, Gu Y et al (2012)
153(5):1134–1148 Optimal enzymes for amplifying sequencing
10. Smallwood SA, Tomizawa S-I, Krueger F et al libraries. Nat Methods 9:10–11
(2011) Dynamic CpG island methylation land- 19. Krueger F, Andrews SR (2011) Bismark: a flex-
scape in oocytes and preimplantation embryos. ible aligner and methylation caller for bisulfite-
Nat Genet 43:811–814 Seq applications. Bioinformatics 27:1571–1572
Chapter 8
Abstract
Sequencing of single bacterial and archaeal cells is an important methodology that provides access to the
genetic makeup of uncultivated microorganisms. We here describe the high-throughput fluorescence-
activated cell sorting-based isolation of single cells from the environment, their lysis and strand displace-
ment-mediated whole genome amplification. We further outline 16S rRNA gene sequence-based screening
of single-cell amplification products, their preparation for Illumina sequencing libraries, and finally pro-
pose computational methods for read and contig level quality control of the resulting sequence data.
Key words Single amplified genomes, Fluorescence-activated cell sorting, Multiple displacement
amplification, Tagmentation, Illumina genome sequencing
1 Introduction
Steven R. Head et al. (eds.), Next Generation Sequencing: Methods and Protocols, Methods in Molecular Biology, vol. 1712,
https://doi.org/10.1007/978-1-4939-7514-3_8, © Springer Science+Business Media, LLC 2018
97
98 Robert M. Bowers et al.
2 Materials
2.3 Single-Cell Lysis 1. Qiagen REPLI-g Single Cell Kit (Qiagen, Valencia, CA, USA).
and Whole Genome 2. SYTO 13, 5 mM (ThermoFisher Scientific).
Amplification by MDA
3. Bleach (5% solution of sodium hypochlorite).
4. Spectraline XL-1500 UV Cross-linker.
5. PCR hood with UV light for decontamination.
6. Plate reader with temperature control, or a real-time thermo-
cycler, e.g., LightCycler 480 (Roche Diagnostics, Indianapolis,
IN, USA).
7. Eppendorf Safe-Lock microcentrifuge tubes, 1.5 mL.
Single-Cell Sequencing 101
3 Methods
3.1 Sample Sample processing and preservation differ depending on the type
Preparation of sample. With the current protocol, we provide a recommended
workflow (Fig. 1) for the production of SAGs from two commonly
surveyed sample types: a soil sample (Subheading 3.1.1) and a
biofilm sample (Subheading 3.1.2).
3.1.1 Sediment or Soil 1. For a sediment or soil sample, thoroughly mix 5 g of sediment
Sample or soil with 10–30 mL sterile buffer in a 50 mL falcon tube.
For soils and freshwater sediments, use 1× PBS as buffer. For
marine sediments, use sterile-filtered seawater as buffer.
2. Vortex at 14,000 rpm for 30 s to dislodge microbes from soil.
3. Centrifuge at 2000 × g for 30 s to remove large particles.
4. Collect the supernatant and proceed to Subheading 3.2.
3.1.2 Biofilm Sample 1. Using a sterile cotton swab, collect a sample of biomass into
microcentrifuge tubes already containing a sterile-filtered buf-
fer solution (e.g., 1× PBS or seawater).
2. Sonicate the sample in the tube in an ultrasonic water bath for
10 min.
102 Robert M. Bowers et al.
Fig. 1 Workflow for the preparation of sequencing libraries from environmental single cells. Environmental
samples are typically preprocessed and cryopreserved. Single cells are then isolated using high throughput
FACS, followed by lysis and MDA based whole genome amplification. MDA generated products are used as
template in 16S rRNA gene based phylogenetic screening and identification. Finally, based on the results of the
phylogenetic screen, a subset of single-cell MDA products is chosen for genome sequencing, following
Illumina’s tagmentation based Nextera XT library preparation protocol. Individual libraries may be sequenced
on the MiSeq platform or pools of many barcoded libraries on higher throughput platforms such as NextSeq
3.2 Sample For best results and to minimize cell lysis during freeze-thawing,
Preservation we recommend to cryo-protect the environmental sample and
store at −80 °C.
1. Cryopreservation of cells should be done with sterile-filtered
glycerol. Make Gly-TE stock by mixing 20 mL of 100× TE
(pH 8.0), 60 mL of DIW (deionized water), and 100 mL of
molecular-grade glycerol (see Note 1). Mix solution by vortex-
ing thoroughly and pass through a 0.2 μm filter.
2. Transfer 100 μL of prepared GlyTE stock and 1 mL of sample
solution to a sterile cryovial. Mix well by inverting.
3. Prepare several replicate vials for each sample. Store in liquid
nitrogen or at −80 °C.
3.3 Preparation As single-cell sorting and MDA are highly susceptible to exoge-
of FACS for Sterile Sort nous contamination, it is essential to have a clean cell sorter with
sterile sheath fluid. Briefly, fluidic lines in the cell sorter can be
decontaminated by running bleach through the lines, and sheath
fluid can be sterilized by UV treatment. This decontamination pro-
cess is sufficient for a clean sorting process, provided it is performed
before every run.
1. Within a clean hood, prepare 4 L of 1× PBS in two 2 L quartz
flasks. Add magnetic stir bars to the flasks and place over plates
Single-Cell Sequencing 103
to begin stirring. Also include the empty sheath fluid tank and
lid and arrange such that UV light will reach the inner surfaces.
Begin the overnight UV exposure (at least 16 h). After UV
exposure is complete, transfer the sterile sheath fluid to the
sterile tank within the clean hood (see Note 2). Reserve at least
10 mL of clean sheath fluid for later use while sorting.
2. Prepare a second sheath tank with 1 L of 10% bleach (0.5%
sodium hypochlorite final concentration) and run through the
cell sorter for 2 h to decontaminate fluidic lines.
3. Dispose of any remaining bleach and rinse the sheath tank with
sterile water. Run 1 L sterile water through the cell sorter for
30 min to rinse fluidic lines.
4. Install the dedicated clean tank with the sterile sheath fluid and
begin running.
5. Prior to sorting, sterilize microtiter plates by UV treatment for
10 min. Do not seal or cover plates during UV treatment. To
each well, add 2 μL of UV-treated, 1× PBS (see Note 3). Cover
with optical seal and perform an additional UV treatment for
10 min.
3.4 Cell Separation Due to the large diversity of types of environmental samples and
by Flow Cytometry variation in technical operation of cell sorters, we here outline a
general protocol for sorting single cells from environmental
samples.
1. To avoid clogging the nozzle of the cell sorter, filter each envi-
ronmental sample through an appropriately sized filter, relative
to the nozzle size (e.g., use a 20 μm filter when working with
a 70 μm nozzle).
2. Stain the cells in the sample with 1× SYBR green at 4 °C for
15 min in the dark (see Note 4).
3. Run the stained sample through the cell sorter and target the
selected population with a sort gate (see Note 5).
4. Sort the targeted cell population into the UV-treated optical
microtiter plates containing 2 μL 1× PBS per well (see step 5 in
Subheading 3.3). Any exogenous extracellular DNA that may
be sorted with a single cell could present as contamination in
downstream WGA. A two-step sort may be utilized to dilute
this DNA (see Note 6). We recommend sorting into 96-well or
384-well plates and including positive and negative controls
(see Note 7).
5. Seal plates with sterile foil and store at −80 °C.
3.5 Single-Cell Lysis The Qiagen REPLI-g Single Cell Kit is used for both single-cell
lysis and WGA steps. The WGA is based on the Phi29 catalyzed
MDA reaction. Refer to the manufacturer’s instructions for a
104 Robert M. Bowers et al.
3.6 Whole Genome For single-cell genome amplification, we recommend using the
Amplification Qiagen REPLI-g Single Cell Amplification Kit in conjunction with
SYTO staining for real-time amplification monitoring for the pur-
pose of quality control (Fig. 2). Further, the recently developed
WGA-X method is recommended for high GC templates [28].
1. Thaw the Qiagen REPLI-g Single Cell Amplification Kit
reagents according to the manufacturer’s instructions. Briefly,
thaw the Phi29 DNA Polymerase on ice. Thaw all other
reagents at room temperature.
Fig. 2 Real-time kinetics for the whole genome amplification of single cells and controls in a 384-well plate. Blue
lines show the amplification kinetics of the positive controls (100 cells sorted into a single well), yellow lines show
the amplification kinetics of sorted single cells and red lines represent negative controls (no cells sorted)
Single-Cell Sequencing 105
2. Prepare enough master mix for all the reactions, adding Phi29
DNA Polymerase at the end. Each reaction should contain:
4.5 μL water (provided in the Qiagen kit, specifically for use in
single-cell genomics), 14.5 μL Reaction Buffer, 1 μL Phi29
DNA Polymerase. Combine the water and Reaction Buffer first
and mix thoroughly by vortexing before the addition of the
polymerase last. Mix by vortexing and spin down (see Note 9).
3. To track amplification of single cells in real time, add SYTO 13 to
the master mix for a final concentration of 0.5 μM (see Note 10).
Mix by vortexing and spin down.
4. To each well containing 5 μL of lysed cells, add 20 μL of MDA
master mix, for a total reaction volume of 25 μL. Cover the
plate with an optical seal and centrifuge at 1000 × g for 1 min
(see Note 11).
5. Incubate the plate at 30 °C for 2–5 h in a real-time thermocycler
or plate reader, e.g., the Roche LightCycler 480 (see Note 12).
6. Heat-inactivate the Phi29 DNA Polymerase by incubating at
65 °C for 10 min. MDA products can be stored for multiple
months at −80 °C (see Note 13).
3.7 Phylogenetic Phylogenetic screening is not mandatory, but useful for the accurate
Screening identification of potential target cells for genome sequencing. We
here outline the use of universal 16S rRNA gene primers, but prim-
ers for specific functional genes may be used as well.
1. Make a 1:100 dilution of the MDA product using nuclease-
free water. Mix dilution extremely thoroughly (see Note 14).
Hand-pipetting is most effective, or alternatively, use a plate-
shaker for 15 min at the maximum setting.
2. Transfer 2 μL of diluted MDA product as template to an opti-
cal microtiter plate.
3. Thaw the PCR reagents on ice: SsoAdvanced SYBR Green
Supermix, 10 μM 926wF primer, 10 μM 1392R primer (see
Note 15).
4. Prepare enough master mix for all PCR reactions. Each reac-
tion should contain: 2.6 μL nuclease-free water, 5 μL
SsoAdvance SYBR Green Supermix (2×), 0.2 μL 926wF primer
(10 μM), 0.2 μL 1392R primer (10 μM). Mix by vortexing and
spin down.
5. To each well containing 2 μL of diluted MDA product as tem-
plate, add 8 μL of PCR master mix, for a total reaction volume
of 10 μL. Cover the plate with an optical seal and centrifuge at
1000 × g for 1 min.
6. Refer to the manufacturer’s protocol when selecting a cycling
program. Adding a melt curve step to the cycling program will
aid in analyzing and choosing PCR products for downstream
106 Robert M. Bowers et al.
3.8 Library The Illumina Nextera XT DNA Sample Preparation Kit is used to
Preparation prepare indexed paired-end libraries from single-cell MDA prod-
and Sequencing ucts for next-generation sequencing. Briefly, libraries are prepared
of SAGs using a single-step reaction where the transposase simultaneously
fragments and ligates Illumina sequencing adapters in a single
5 min reaction. This library preparation is cost-effective and
Illumina is currently the most prevalent next-generation sequenc-
ing platform.
1. Before performing any work, wipe down and clean hood sur-
faces, pipettes, and equipment with 10% bleach (see Note 16).
2. Thaw reagents for tagmentation on ice. Mix by inverting gen-
tly and spin down. To sterile microcentrifuge tubes, add: 10 μL
of Tagment DNA buffer, 5 μL of input DNA (1 ng total), and
5 μL of Amplicon Tagment Mix.
3. Mix by pipetting and spin down. Incubate on a thermocycler
at 55 °C for 5 min and hold at 10 °C.
4. Once the thermocycler reaches 10 °C, immediately neutralize
the reaction by adding 5 μL of Neutralize Tagement Buffer.
Mix by pipetting and spin down. Incubate at room tempera-
ture for 5 min.
5. Thaw the appropriate indexes and the Nextera PCR master
mix. To each reaction, add 15 μL of PCR master mix, 5 μL of
Index 1, and 5 μL of Index 2 (see Note 17). Mix by pipetting
and spin down. PCR amplify according to the manufacturer’s
protocol.
6. Perform cleanup of PCR product by using a 1.8× ratio of
AMPure XP beads (see Note 18). For example, add 90 μL of
beads to each 50 μL PCR reaction. Mix by vortexing and quick
spin.
7. Shake samples at 1800 rpm for 2 min, then incubate at room
temperature for 5 min.
8. Place tubes on a magnetic stand to pellet the beads (see Note
19). Carefully remove and discard the supernatant once the
solution has become clear. Make sure not to remove any beads.
9. Without removing tubes from the magnetic stand, add 200 μL
of freshly prepared 80% ethanol to each sample. Let samples sit
for 30 s. Remove and discard ethanol. Repeat, for a total of
two 80% ethanol washes (see Note 20).
Single-Cell Sequencing 107
Fig. 3 Library QC of a pool of barcoded Nextera XT single-cell libraries on Agilent HS Bioanalyzer. 35 bp and
10,380 bp peaks represent markers used as internal standards. When using the protocol, library insert size
distributions typically ranges from 300 to 2000 bp
10. Leave tube-caps open and let the samples air-dry on the
magnetic stand for 15 min until residual 80% ethanol has
evaporated.
11. Add 52 μL of Resuspension Buffer to each tube. Resuspend
the beads and mix thoroughly.
12. Shake the samples at 1800 rpm for 2 min, then incubate at
room temperature for 2 min.
13. Place the tubes on the magnetic stand to pellet the beads.
Once the liquid is clear, carefully transfer 50 μL of eluted PCR
product to a new tube.
14. Perform quality control on the final library. Run 1 μL of library
on the Agilent 2100 Bioanalyzer using the High Sensitivity Kit
for size and quantification (Fig. 3) (see Note 21).
15. Barcoded libraries can be pooled in equimolar ratios according
to Illumina’s pooling protocols and procedures for sequencing
on the Illumina platform. For library concentrations, refer to
the Illumina platform-specific protocol.
16. Sequence individual libraries or library pools using the Illumina
platform, such as MiSeq or NextSeq, according to the manu-
facturer’s protocol. NextSeq is specifically recommended for
library pools [28].
3.9 Assembly of SAG To ensure high-quality genomes, a few steps need to be taken
Sequences and Data before data are useable for downstream biological inference.
Quality Assurance Contaminant sequences can be identified and removed at both the
read and contig (post-assembly) levels. At the read level, data are
108 Robert M. Bowers et al.
4 Notes
Acknowledgment
References
1. Amann RI, Ludwig W, Schleifer KH (1995) 5. Thomas T, Gilbert J, Meyer F (2012)
Phylogenetic identification and in situ detec- Metagenomics – a guide from sampling to data
tion of individual microbial cells without culti- analysis. Microb Inform Exp 2:3
vation. Microbiol Rev 59:143–169 6. Stepanauskas R (2012) Single cell genomics: an
2. Eloe-Fadrosh EA, Ivanova NN, Woyke T, individual look at microbes. Curr Opin
Kyrpides NC (2016) Metagenomics uncovers Microbiol 15:613–620
gaps in amplicon-based detection of microbial 7. Clingenpeel S, Schwientek P, Hugenholtz P,
diversity. Nat Microbiol 1:15032 Woyke T (2014) Effects of sample treatments on
3. Rinke C, Schwientek P, Sczyrba A, Ivanova genome recovery via single-cell genomics. Isme
NN, Anderson IJ, Cheng J-F, Darling A, J:1–4. https://doi.org/10.1038/ismej.2014.92
Malfatti S, Swan BK, Gies EA, Dodsworth JA, 8. Marcy Y, Ouverney C, Bik EM, Lösekann T,
Hedlund BP, Tsiamis G, Sievert SM, Liu W-T, Ivanova NN, Martin HG, Szeto E, Platt D,
Eisen JA, Hallam SJ, Kyrpides NC, Hugenholtz P, Relman DA, Quake SR (2007)
Stepanauskas R, Rubin EM, Hugenholtz P, Dissecting biological ‘dark matter’ with single-
Woyke T (2013) Insights into the phylogeny cell genetic analysis of rare and uncultivated
and coding potential of microbial dark matter. TM7 microbes from the human mouth. Proc
Nature 499:431–437 Natl Acad Sci U S A 104:11889–11894
4. McLean JS, Lombardo M-J, Badger JH, 9. Gawad C, Koh W, Quake SR (2016) Single-cell
Edlund A, Novotny M, Yee-Greenbaum J, genome sequencing: current state of the sci-
Vyahhi N, Hall AP, Yang Y, Dupont CL, ence. Nat Rev Genet 17:175–188
Ziegler MG, Chitsaz H, Allen AE, Yooseph S,
Tesler G, Pevzner PA, Friedman RM, Nealson 10. Lo S-J, Yao D-J (2015) Get to understand
KH, Venter JC, Lasken RS (2013) Candidate more from single-cells: current studies of
phylum TM6 genome recovered from a hospi- microfluidic-based techniques for single-cell
tal sink biofilm provides genomic insights into analysis. Int J Mol Sci 16:16763–16777
this uncultivated phylum. Proc Natl Acad Sci U 11. Stepanauskas R, Sieracki ME (2007) Matching
S A 110:E2390–E2399 phylogeny and metabolism in the uncultured
Single-Cell Sequencing 111
marine bacteria, one cell at a time. Proc Natl Stepanauskas R (2011) Decontamination of
Acad Sci U S A 104:9052–9057 MDA reagents for single cell whole genome
12. Swan BK, Tupper B, Sczyrba A, Lauro FM, amplification. PLoS One 6:e26161
Martinez-Garcia M, González JM, Luo H, 23. Schmieder R, Edwards R (2011) Fast identifi-
Wright JJ, Landry ZC, Hanson NW, Thompson cation and removal of sequence contamination
BP, Poulton NJ, Schwientek P, Acinas SG, from genomic and metagenomic datasets.
Giovannoni SJ, Moran MA, Hallam SJ, PLoS One 6:e17288
Cavicchioli R, Woyke T, Stepanauskas R (2013) 24. Tennessen K, Andersen E, Clingenpeel S, Rinke
Prevalent genome streamlining and latitudinal C, Lundberg DS, Han J, Dangl JL, Ivanova
divergence of planktonic bacteria in the surface NN, Woyke T, Kyrpides N, Pati A (2015)
ocean. Proc Natl Acad Sci U S A 110: ProDeGe: a computational protocol for fully
11463–11468 automated decontamination of genomes. ISME
13. Zong C, Lu S, Chapman AR, Xie XS (2012) J. https://doi.org/10.1038/ismej.2015.100
Genome-wide detection of single-nucleotide 25. Parks, D. H., Imelfort, M., Skennerton, C. T.,
and copy-number variations of a single human Hugenholtz, P. & Tyson, G. W. (2015)
cell. Science 338:1622–1626 CheckM: assessing the quality of microbial
14. Landry ZC, Giovanonni SJ, Quake SR, Blainey genomes recovered from isolates, single cells,
PC (2013) Optofluidic cell selection from com- and metagenomes. https://doi.org/10.7287/
plex microbial communities for single-genome peerj.preprints.554v2
analysis. Methods Enzymol 531:61–90
26. Lasken RS, McLean JS (2014) Recent
15. Hoeijmakers W, Bártfai R, Françoijs K-J, advances in genomic DNA sequencing of
Stunnenberg HG (2011) Linear amplification microbial species from single cells. Nat Rev
for deep sequencing. Nat Protoc 6:1026–1036 Genet 15:577–584
16. Duhaime MB, Deng L, Poulos BT, Sullivan 27. Rinke C, Lee J, Nath N, Goudeau D, Thompson
MB (2012) Towards quantitative metagenom- B, Poulton N, Dmitrieff E, Malmstrom R,
ics of wild viruses and other ultra-low concen- Stepanauskas R, Woyke T (2014) Obtaining
tration DNA samples: a rigorous assessment genomes from uncultivated environmental
and optimization of the linker amplification microorganisms using FACS-based single-cell
method. Environ Microbiol 14:2526–2537 genomics. Nat Protoc 9:1038–1048
17. Kashtan N, Roggensack SE, Rodrigue S, 28. Stepanauskas R, Fergusson EA, Brown J,
Thompson JW, Biller SJ, Coe A, Ding H, Poulton NJ, Tupper B, Labonté JM, Becraft
Marttinen P, Malmstrom RR, Stocker R, ED, Brown JM, Pachiadaki MG, Povilaitis T,
Follows MJ, Stepanauskas R, Chisholm SW Thompson BP, Mascena CJ, Bellows WK,
(2014) Single-cell genomics reveals hundreds Lubys A (2017) Improved genome recovery
of coexisting subpopulations in wild and integrated cell-size analyses of individual
Prochlorococcus. Science 344:416–420 uncultured microbial cells and viral particles.
18. Marine R, McCarren C, Vorrasane V, Nasko D, Nat Commun 8:84. https://doi.org/10.1038/
Crowgey E, Polson SW, Wommack KE (2014) s41467-017-00128-z
Caught in the middle with multiple displace- 29. Bankevich A, Nurk S, Antipov D, Gurevich AA,
ment amplification: the myth of pooling for Dvorkin M, Kulikov AS, Lesin VM, Nikolenko
avoiding multiple displacement amplification SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin
bias in a metagenome. Microbiome 2:3 AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner
19. Lasken RS, Stockwell TB (2007) Mechanism of PA (2012) SPAdes: a new genome assembly
chimera formation during the multiple dis- algorithm and its applications to single-cell
placement amplification reaction. BMC sequencing. J Comput Biol 19:455–477
Biotechnol 7:19 30. Bolger AM, Lohse M, Usadel B (2014)
20. Woyke T, Xie G, Copeland A, González JM, Trimmomatic: a flexible trimmer for Illumina
Han C, Kiss H, Saw JH, Senin P, Yang C, sequence data. Bioinformatics 30:2114–2120.
Chatterji S, Cheng J-F, Eisen JA, Sieracki ME, https://doi.org/10.1093/bioinformatics/
Stepanauskas R (2009) Assembling the marine btu170
metagenome, one cell at a time. PLoS One 31. Markowitz VM, Chen I-MA, Chu K, Szeto E,
4:e5299 Palaniappan K, Pillay M, Ratner A, Huang J,
21. Blainey PC (2013) The future is now: single- Pagani I, Tringe S, Huntemann M, Billis K,
cell genomics of bacteria and archaea. FEMS Varghese N, Tennessen K, Mavromatis K, Pati
Microbiol Rev 37:407–427 A, Ivanova NN, Kyrpides NC (2014) IMG/M
22. Woyke T, Xie G, Copeland A, González JM, 4 version of the integrated metagenome com-
Han C, Kiss H, Saw JH, Senin P, Yang C, parative analysis system. Nucleic Acids Res
Chatterji S, Cheng J-F, Eisen JA, Sieracki ME, 42:D568–D573
Chapter 9
Abstract
Population genetic studies of non-model organisms often rely on initial ascertainment of genetic markers
from a single individual or a small pool of individuals. This initial screening has been a significant barrier
to beginning population studies on non-model organisms (Aitken et al., Mol Ecol 13:1423–1431, 2004;
Morin et al., Trends Ecol Evol 19:208–216, 2004). As genomic data become increasingly available for
non-model species, SNP ascertainment from across the genome can be performed directly from published
genome contigs and short-read archive data. Alternatively, low to medium genome coverage from shotgun
NGS library sequencing of single or pooled samples, or from reduced-representation libraries (e.g., cap-
ture enrichment; see Ref. “Hancock-Hanser et al., Mol Ecol Resour 13:254–268, 2013”) can produce
sufficient new data for SNP discovery with limited investment. We describe protocols for assembly of short
read data to reference or related species genome contig sequences, followed by SNP discovery and filtering
to obtain an optimal set of SNPs for population genotyping using a variety of downstream high-through-
put genotyping methods.
Key words Bioinformatics, Short read archive, Reference-guided assembly, Single-nucleotide poly-
morphism, Non-model organisms
1 Introduction
Steven R. Head et al. (eds.), Next Generation Sequencing: Methods and Protocols, Methods in Molecular Biology, vol. 1712,
https://doi.org/10.1007/978-1-4939-7514-3_9, © Springer Science+Business Media, LLC 2018
113
114 Phillip A. Morin et al.
2 Materials
Table 1
Required and optional software
3 Methods
3.1 Publicly The benefits of a reference genome assembly include longer con-
Available Genome tigs (or scaffolds), quality-checked assemblies, repeat masking,
Assembly and Short and in some cases, annotation and chromosome assignment.
Read Archive (SRA) These can all facilitate high-quality SNP discovery and selection
Data based on additional knowledge of genome position and associa-
tion with known genes or chromosomes. The NCBI web site is a
good place to search for reference genomes (http://www.ncbi.
nlm.nih.gov/genome/browse/), but the organization can be
confusing and there are important issues to be aware of. As an
example, browsing for the genus “Orcinus” results in one record
for the genome of Orcinus orca, the killer whale. This is a member
of the family Delphinidae and could be used as the reference for
other species in the family, and even in other families of cetaceans,
though potentially at the loss of coverage and some bias in genome
region coverage due to divergence. The Genome overview page for
Orcinus orca (www.ncbi.nlm.nih.gov/genome/14034) summarizes
the project information, publications, summaries, and links to the
mitochondrial and nuclear genome information (under
“Replicon Region”), and Genome Region, with links to the
assembled scaffolds. We recommend starting under the Summary,
and reviewing the BioProjects links to review the projects con-
tributing to the killer whale genome. These may include multi-
ple biosamples (different animals) and data types. Every project
is different, so these need to be reviewed to determine which
project(s) has the appropriate assembled contigs/scaffolds and
SRA data for re-assembly and SNP discovery. For this example
there are two BioProject IDs, and the project data for them are
shown in Fig. 1a and b.
Both BioProjects are based on the same, single biosample,
and link to the same assembly details, but one links to the SRA
SNP Discovery from NGS Data 117
Fig. 1 Screen captures from GenBank genome projects web page for the killer
whale genome (Orcinus orca; http://www.ncbi.nlm.nih.gov/genome/14034).
Project data table for two BioProjects associated with this genome, (a)
PRJNA189949 and (b) PRJNA167475
HiSeq 2000, paired-end) and the size of the individual files. Once
you know which files you want to use, you can download individ-
ual files from the SRA portal http://sra.dnanexus.com. At the site,
you can search for your species, select the appropriate study (if
there is more than one), and click the “related runs” button to
show all of the sequencing runs associated with that project. For
this example, the project with accession SRP015826 has seven
experiments with 16 runs total, representing the 16 short-read
archive files. The files can be selected by checking the boxes of
those you want and using the “Download” button. This generates
a web page with the download buttons for individual files, or you
can select files and click the button for “download SRA URLs” to
download a text file containing all of the URL links to facilitate
command-line transfer of the files using the “wget” command in a
Linux environment:
Example SRA URLs file: download_sra_urls.txt (for the
Orcinus orca project):
tp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-
f
instant/reads/ByRun/sra/SRR/SRR574/SRR574967/
SRR574967.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/
reads/ByRun/sra/SRR/SRR574/SRR574968/SRR574968.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/
reads/ByRun/sra/SRR/SRR574/SRR574969/SRR574969.sra
(etc.)
Transfer each of the files using “wget” and
#
# the URLs in the download_sra_urls.txt
# document:
wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/
reads/ByRun/sra/SRR/SRR574/SRR574967/SRR574967.sra
These files are often very large, and the combined files can be
several hundred GB, so the transfer may take a long time and
require a lot of storage space on your system. It is important to
determine how much storage space will be needed prior to starting
to be sure you have sufficient storage capacity for the raw files and
the assemblies. Although this is difficult to estimate and depends
on whether or not you delete file that are no longer needed (e.g.,
delete SAM files after converting to BAM format), you should
expect to use about 20 times as much disk space as the size of your
decompressed short-read archive fastq file(s).
SRA files are compressed binary files and need to be expanded
into FASTQ files prior to use in an assembly. This requires the
fastq-dump function from the SRAtoolkit software. Files size will
expand substantially, typically by 4–5× the original SRA file size.
# Convert SRA files into FASTQ files
fastq-dump SRR574967.sra --split-3
120 Phillip A. Morin et al.
3.2 Creating a De A draft genome sequence is not necessary to align sequence reads for
Novo Set of Reference SNP discovery, but some reference is necessary. In the absence of a
Contigs reference genome, a set of reference contigs can be created by de
novo assembly of short-read data. The details of de novo assembly
are not the focus of this chapter, but there are commercially available
programs (e.g., CLC Genomics Workbench (CLCbio), Geneious
(BioMatters Limited)) and freely available programs such as
SOAPdenovo2 [32], DISCOVAR de novo (Broad Institute),
SGA [33], some of which are optimized for different operating
systems or types of data (e.g., see list at http://omictools.com/
genome-assembly-category). The goal of using these packages is to
generate high-quality contigs, then use them as the reference
sequences to re-align reads for SNP discovery. Once a set of de novo
assembled contigs has been produced (in FASTA format), they can
be treated the same as a reference genome sequence for read align-
ment and SNP discovery (see Ref. 31).
When there are multiple smaller FASTQ files for a single indi-
vidual, they can be concatenated into a single file for each read
direction. This is a Linux function:
# Concatenate forward read files.
cat filename1_read1.fastq filename2_read1.fastq >
forward_reads.fastq
However, when the files are large (e.g., >10 GB), running sep-
arate alignments of each FASTQ file (or pair of files for paired-end
data) will improve speed by allowing parallel processes run on sep-
arate cores (see Note 1). The resulting alignments can be com-
bined after conversion to the smaller BAM files (see below).
Although there are many options that can be altered to vary
alignment parameters, the updated aligner “BWA-MEM” algo-
rithm [34] is implemented in BWA and has been optimized for
aligning short reads to a large genome such as human without the
need to specify alignment parameters, and performs as well as or
better than many other aligners with short read (100 bp) and lon-
ger read-length data [34].
Mitochondrial DNA can make up a significant portion of
sequence reads, so removal of reads that map to the mitogenome
can reduce file sizes and speed up subsequent analyses. A refer-
ence mitogenome FASTA file can typically be downloaded from
GenBank for the target species or a closely related species, or the
mitogenome sequence can often be assembled de novo from the
NGS reads (e.g., [35]). Map the reads to the mitogenome with
BWA, convert the SAM file to a BAM file, and sort with
SAMtools.
# Align reads (reads.fq) to indexed reference
# mitogenome # (ref_mtgenome.fasta)
bwa index ref_mtgenome.fasta
bwa mem ref_mtgenome.fasta reads.fq >
mapped_to_mt.sam
# Note, if there are paired-end reads, the two
# respective files are listed, e.g., reads_1.fq
# reads_2.fq.
# Convert the SAM file to a sorted BAM file.
samtools view -b -h mapped_to_mt.sam | samtools
sort - mapped_to_mt_sorted
# output = “mapped_to_mt_sorted.bam”
The BAM file will also retain the reads that did not map to the
mitogenome reference, i.e., the reads from the nuclear genome
that we are interested in. These unmapped reads can be extracted
to a new BAM file, and then converted to a FASTQ file.
# Copy reads that did not map to the mitogenome
# to a new file and convert from BAM file to
# FASTQ format.
122 Phillip A. Morin et al.
The output SAM files are typically very large, and can be reduced
to the binary form, or BAM file format, for subsequent analyses. The
SAM files can then be deleted, as they are no longer needed.
#Convert SAM file to BAM file and sort BAM
# file.
samtools view -@1 -T ref.fa -b aln_se1.sam |
samtools -@1 sort - aln_se1_sorted
# -@ indicates the number of threads to use.
# Default = 1. This option is only available in
# SAMtools for the ‘view’ and ‘sort’ functions.
# -b Output in the BAM format
# -T FASTA format reference file.
At this stage, the sorted BAM file contains both mapped and
unmapped sequencing reads. You may therefore want to remove
the unmapped sequencing reads to save on disk storage space and
improve the efficiency of your downstream analysis. But beware
that your unmapped reads may contain useful data, such as reads
associated with the Y-chromosome if you mapped your reads to a
female reference, or microbiome data, etc.
# Keep reads that have mapped to the reference
genome with a qscore ≥30.
samtools view -bh -F4 -q 30 individual1.bam >
individual1_q30.bam
3.4 Pipeline 1: SNP SNP discovery from alignment files (BAM) with medium to high
Discovery from High- average coverage (>10×) uses depth and quality criteria, and
Coverage Genome extracts SNPs with flanking sequence data for assay design (e.g.,
Assembly for AmpliSeq, GTseq, TaqMan, or capture-enrichment GBS). We
describe a simple SNP discovery pipeline that uses the variant
finder function of BCFtools [25] to scan an alignment (BAM file)
for SNPs that meet specified criteria. The minimum and maximum
coverage criteria are dependent on the average coverage of the
genome alignment. When coverage is moderate (i.e., 10–30×), we
set the minimum depth of coverage for a SNP at 10 reads, and a
maximum depth of 100 in order to ignore potentially collapsed
duplicated sequence in the assembly. However, when the average
depth of an alignment is high (i.e., >30×), the criteria should be
changed to reflect the range from 1/3 to twice the average cover-
age (see “Estimate coverage” in Subheading 3.8).
124 Phillip A. Morin et al.
The output files from this script are a FASTA file containing
the designated target sequences surrounding each SNP, and a gen-
otype VCF file. The VCF file format is described in detail else-
where (https://github.com/samtools/hts-specs). This file
contains information about each SNP, starting with the contig ID
(“chrom”), the SNP position, the reference and alternate alleles, a
quality score, and a list of additional “info” about the SNP, includ-
ing the depth of coverage (DP), various quality measures, the
count of forward and reverse reads for each allele used in variant
calling (DP4), and the consensus quality (FQ) (see Fig. 2). The
total number of reads in DP4 may be lower than DP because low-
quality bases are not counted. Given one sample in the alignment,
the consensus quality (FQ) is typically positive for heterozygous
loci, and negative for homozygous loci (see more details at http://
SNP Discovery from NGS Data 125
Fig. 2 Example of Variant Call Format (VCF) file format. Column headings are CHROM (contig ID), POS (SNP
position), ID (sample IDs for some data types), REF (reference allele), ALT (alternate allele), QUAL (quality score),
FILTER (filters that have been applied to the data), INFO (combined information types), FORMAT (genotype
format codes), and additional genotype or sample file information. Additional details can be found at https://
samtools.github.io/hts-specs/VCFv4.1.pdf
3.5 Pipeline 2: SNP SNPs can be called from low-coverage shotgun sequencing data
Discovery from Low- (<1×) from multiple samples using genotype likelihoods, therefore
Coverage Multiplex taking uncertainty in base-calling into account [36]. This section
Shotgun Data describes read sequence data processing, genotype likelihoods
126 Phillip A. Morin et al.
3.6 Sequence Data This section assumes that single- or paired-end read data have been
Processing de-multiplexed to separate reads by the index sequences of each
sample in the multiplex pool. SRA files downloaded from GenBank
have presumably been processed to remove adapters and filtered for
quality. For data generated specifically for SNP discovery, these
steps need to be done prior to mapping reads to the reference
genome. For each FASTQ file, ADAPTER-REMOVAL (or another
program, e.g., Trimmomatic; [38]) can be used to remove adapter
sequences from the sequencing reads and remove sequence reads
that are ≤30 bp or another length threshold following trimming.
# Remove adapter sequences.
AdapterRemoval --file1 ind1.fastq --basename
ind1trimmed --minlength 30 --trimns --
trimqualities --minquality --gzip
# ind1.fastq is the input file and ind1trimmed
# the output file.
# --trimns: Remove stretches of Ns from the
# sequencing reads in the 5' and 3' end.
# --trimqualities: consecutive stretches of low
# quality bases (the quality threshold is set
# by minquality) are removed from the 5' and 3'
# end of the reads. All bases with minquality
# or lower are trimmed. If ‘trimns’ is also
# indicated, stretches of low-quality bases
# and/or Ns are trimmed.
# --minquality: quality threshold for the
# trimming of low quality bases. Default is 2.
# --minlength 30: reads that are > 30 bp are
# kept.
# --gzip: the output file is compressed. There
# will be 2 output files: one with the
# discarded sequencing reads
# (ind1trimmed.discarded.gz) and one with the
# retained reads (ind1trimmed.truncated.gz).
# --qualitybase: specifies the platform-
SNP Discovery from NGS Data 127
The output files include a masked copy of the input file, which
contains the query sequences, with identified repeats and low com-
plexity sequences masked and replaced with “N”s. A map file is also
generated, in which coordinates in terms of contig, start and end
positions are given, which can then be used as a .bed file (see below).
The advantage of this is that reads can be mapped to the unmasked
reference and then removed, which increases coverage in regions
adjacent to repetitive elements.
# Run BEDtools to remove repeated regions.
intersectBed -abam data.bam -b RepeatMask.bed -
v | samtools view -h - | samtools view -bSh - >
data_noRepeats.bam
# The BAM file from which the repeated regions
# should be removed is given with the option
# -abam and the file containing the repeated
# regions with –b.
3.7 SNP Calling SNP calling is performed using genotype likelihoods that take
Using Genotype uncertainty into account and can be estimated in ANGSD [24]
Likelihoods for Low- from multiple samples as recommended by Nielsen et al. [36].
Coverage Alignments Uncertainty in SNP discovery in low coverage data can be the result
of sequencing, base-calling, mapping, and alignment errors. Briefly,
genotype likelihoods are calculated using probabilistic methods and
take the quality scores of the sequencing reads into account [36]. A
base is considered a SNP if the minor allele frequency is significantly
different from 0 as inferred from a likelihood ratio test [39]. The
threshold is set with the “-SNP_pval” option and P < 0.000001 is
considered as conservative. The genotype likelihood is calculated
using the SAMtools method with the “-GL1” option [24, 25] to
infer the major and minor alleles with –doMajorMinor1 and to esti-
mate major and minor allele frequencies using the EM algorithm
with –doMaf 2 [39, 40](see Note 3).
# Infer SNPs from the alignments listed in
# “data.bamlist” using genotype likelihoods in
# ANGSD.
angsd -GL 1 -out genolike_data -nThreads 2 -
doGlf 2 -doMajorMinor 1 -SNP_pval 1e-6 -doMaf 2
-minQ 30 -bam data.bamlist
# -out indicates the output file (followed by
# the filename)
# -bam indicates the input file (followed by
# the file name), which is the data.bamlist
# file created previously.
# -doGlf 2 creates a genotype likelihood beagle
# file.
# -nThreads indicates the number of threads.
# minQ 30 to set the minimum base quality
# (minQ) to 30
# Optional settings: -minMapQ 30 to set the
# minimum mapping quality (minMapQ) to 30 (if
# not filtered before).
SNP Discovery from NGS Data 129
3.8 SNP Filtering A filtering pipeline is then applied to further avoid bias linked to
next-generation-sequencing and low coverage data. The different
filters are:
–– Filter 1: discarding SNPs in regions of poor mapping quality
(MAPQ ≤ 30).
–– Filters 2 and 3: removing SNPs in regions of excessive coverage
and high numbers of individuals: Filter 2 removes SNPs that
are in regions with twice the mean coverage. As this is a rather
arbitrary threshold, we advise plotting and exploring the distri-
bution of coverage at this stage. Illumina shotgun sequencing
data typically results in a Poisson distribution of different cov-
erage depths, and so selecting the point at which the upper
bound of depth of coverage starts to deviate from this distribu-
tion can be a more formal approach to set a threshold for
determining regions of excessive coverage.
–– Filter 3 removes SNPs that are found (i.e. for which there is
coverage) in a high number of individuals. The cutoff is defined
by the upper tail of the distribution of the plot of the number
of SNPs against the number of individuals. SNPs in the upper
tail of the distribution are removed. The high coverage of these
SNPs may be because they are in unmasked repeated regions,
in nuclear mitochondrial DNA inserts (NUMTs), or other
mapping artifacts (i.e., paralogous loci). This filter is only rec-
ommended with ultra-low coverage data.
–– Filter 4: discarding SNPs that have an estimated minor allele fre-
quency (MAF) ≤ 0.05 as it is the smallest MAF that we could
theoretically have with a cutoff of ten individuals (threshold
defined earlier but it may change depending on the data ana-
lyzed). However, estimations of genotype likelihoods should
reduce the error rate. Additionally, rare variants are important for
several applications such as the inference of demographic history
based on the Site Frequency Spectrum and the estimation of sev-
eral population genetic parameters [41, 42]. Hence, depending
on the downstream analyses it may be worth keeping those SNPs.
–– Filter 5: Select only SNPs with flanking sequence of a specified length
(e.g., 150 bp on either side of the target SNP) and with no N’s
in the flanking region, to allow design of primers and/or probes.
–– Filter 6: SNP validation. (Optional) This step is not necessary
but if available, the identified SNPs could be compared to
already existing genomic resources such as published high cov-
130 Phillip A. Morin et al.
Then, the SNPs that are in regions of poor mapping quality are
removed from the genolike_data.mafs file (from the Genotype
Calling section above) using a second R script (see Script 4a below)
and the “poor_mapping_quality.txt” filter.
The genolike_data.mafs (from Subheading 3.7) file looks like
the following:
chromo position major minor unknownEM pu-EM nInd
JH472447 724 G C 0.406400 2.160509e-07 5
134 Phillip A. Morin et al.
Filter 5 (scripts 7 and 8): Select SNPs with specified length flanking
sequences for primer/probe design. This step only selects SNPs that
have at least the specified length of sequence on either side of the
SNP, without N’s.
# Scripts 7 and 8 (Appendix 9 and 10) must both
# be in the target directory along with the
# output of Script6 (“SNPs_MAF_good.txt”) and
the reference FASTA file.
./Script7_SNP_call_filter_GATK.sh
SNPs_MAF_good.txt ref.fa 300 output
# ‘300’ specifies the sequence length,
# specifying 150bp on either side of the target
# SNP.
# Substitute the name of the reference sequence
# FASTA file for “ref.fa”
# Substitute a descriptive name for “output”.
# This will be the base name for 2 output
# files, output.geno.fasta and output.geno.txt.
# The latter file contains columns for SNP
# locus (“chromo” from GATK output), position,
# major allele, minor allele, unknownEM (minor
# allele frequency), pu.EM (probability of
# SNP), nInd (number of individuals with data
# at SNP site).
# SNPs_hweExcessHet.txt.
Rscript Script11_Filter7_Remove_HWEexcessHet.R
#output = good_hwe_SNPs.txt
3.9 Applications This approach could be applied to any species for which a reference
genome or a reference of a closely related species exists to map the
reads. The main advantage of this method is that it results in the
discovery of a large number of SNPs. As this approach does not
allow simultaneous SNP discovery and genotyping, the identified
SNPs could then be target-sequenced using custom-produced SNP
arrays or target enrichment capture arrays for many applications in
population genomics. These SNPs would be particularly useful for
applications that need a large number of SNPs such as inferences of
demographic history based on the site frequency spectrum [43] or
inferences of selective sweeps [44]. Samples used for SNP discovery
should ideally span the geographical range of populations that will
be target-sequenced to avoid ascertainment bias [10].
SNP discovery in the absence of population data can result in
some false positives, where identified SNP loci are either the result
of sequencing errors, or, more commonly, assembly errors that
result in duplicated loci or repeat regions being combined into
single assemblies. The effect of including false positives that are in
fact monomorphic (false positives due to sequencing error) is rela-
tively minor, resulting typically in a small portion of loci being
uninformative. The effect of including false positives that fall in
repeated regions can be much more significant, as highly repeated
loci can heavily bias the read distribution in genotyping methods
that rely on capture enrichment or amplification of groups of loci.
Figure 3 shows the distribution of reads from a set of 384 SNP loci
genotyped from multiplexed amplicons (lifetechnologies.com/
ampliseqcustom) of 95 DNA samples from blue whales. The first
set of 384 loci was obtained from a draft blue whale genome
assembly that had not been masked for repeats. Seven of the SNPs
138 Phillip A. Morin et al.
a)
384 SNP coverage, before repeat removal
1000000
100000
10000
1000
100
10
1
1
9
17
25
33
41
49
57
65
73
81
89
97
105
113
121
129
137
145
153
161
169
177
185
193
201
209
217
b)
1000
100
10
1
1
13
25
37
49
61
73
85
97
289
109
121
133
145
157
169
181
193
205
217
229
241
253
265
277
301
313
325
337
349
361
373
Fig. 3 Distribution of total reads assembled to 384 SNP loci (a) before removal of seven repeat loci, and (b)
after removal and replacement putative repeat loci. Loci falling below the threshold coverage for SNP genotyp-
ing of any samples were excluded. The Y axis is logarithmic. Plate 1 was sequenced on an Ion Torrent (458,208
aligned reads), and plate 2 was sequenced on an Ion Proton (6,198,109 aligned reads), so the Plate 2 per-locus
read depth was scaled by the ratio of plate 2 to plate 1 total reads for comparison purposes
SNP Discovery from NGS Data 139
4 Notes
Table 2
Useful Linux commands
wget wget stands for “web get”, used for non-interactive downloading of files using HTTP,
HTTPS and FTP protocols.
nohup Creates “nohup” file and relegates screen output to that file. When used with “&” at
the end of the command line, the process is run in the background with no screen
output. This process is incompatible with some programs and functions, so it is best
to run commands from within a shell script, and use nohup/& when executing the
shell script.
top Typing “top” shows the current processes on the system, including how the percent
CPU (i.e., 100% = 1 CPU) and percent of the total memory being used by each
process. Exit with the keyboard combination “control” and “c”
df –h Shows current system status, including size, amount used, amount available, % used
control-c For a process that is active (i.e., not run in background mode), it can be cancelled with
the keyboard combination “control” and “c”
kill When a process is running in the background, it can be “killed” by using the “top”
command to find the process ID (PID), then typing “kill” and the PID
Rscript Run an R script (follow by specifying the path/name of the script)
Chmod 755 Change permissions on a file to allow everyone to read and execute the file, and the
file owner is allowed to write to the file. Follow with file name
grep Useful for extracting text. For subsetting a fasta file from a list of loci:
grep -Fwf SNP_list.txt -A1 file.fasta | grep -v '^--$' >subset_file.fasta
# file.fasta contains a list of sequence name texts that will be found and selected from
the fasta file, along with the following DNA sequence line for each locus
Acknowledgments
Appendix 1
Script1_SNP_call_filter.sh
Appendix 2
Script2_generate_genotype_blocks.py
Appendix 3
Script3a_Filter1_mapping_quality.R
Appendix 4
Script3b_Filter2_excessive_coverage.R
Appendix 5
Script4a_Filter1_Remove_poor_mapping_quality.R
Appendix 6
Script4b_Filter2_Remove_excessive_coverage.R
Appendix 7
Script5_Filter3_excessive_individuals.R
Appendix 8
Script6_Filter4_Remove_rare_SNPs.R
Appendix 9
Script7_SNP_call_filter_GATK.sh
Appendix 10
Script8_generate_genotype_blocks_GATK.py
142 Phillip A. Morin et al.
Appendix 11
Script9_Filter6_compare_SNP_datasets.R
Appendix 12
Script10_plotQC.R
Appendix 13
Script11_Filter7_Remove_HWEexcessHet.R
References
1. Goodwin S, McPherson JD, McCombie WR 8. Campbell NR, Harmon SA, Narum SR (2015)
(2016) Coming of age: ten years of next-gener- Genotyping-in-thousands by sequencing
ation sequencing technologies. Nat Rev Genet (GT-seq): a cost effective SNP genotyping
17:333–351. https://doi.org/10.1038/ method based on custom amplicon sequenc-
nrg.2016.49 ing. Mol Ecol Resour 15:855–867. https://
2. Narum SR, Campbell NR, Meyer KA, Miller doi.org/10.1111/1755-0998.12357
MR, Hardy RW (2013) Thermal adaptation 9. Aitken N, Smith S, Schwarz C, Morin PA
and acclimation of ectotherms from differing (2004) Single nucleotide polymorphism
aquatic climates. Mol Ecol 22:3090–3097. (SNP) discovery in mammals: a targeted-gene
https://doi.org/10.1111/mec.12240 approach. Mol Ecol 13:1423–1431
3. Seeb JE, Carvalho G, Hauser L, Naish K, 10. Morin PA, Luikart G, Wayne RK, SNP
Roberts S, Seeb LW (2011) Single-nucleotide Workshop Grp (2004) SNPs in ecology, evo-
polymorphism (SNP) discovery and applications lution and conservation. Trends Ecol Evol
of SNP genotyping in nonmodel organisms. 19:208–216. https://doi.org/10.1016/j.
Mol Ecol Resour 11(Suppl 1):1–8. https:// tree.2004.01.009
doi.org/10.1111/j.1755-0998.2010.02979.x 11. Hancock-Hanser B, Frey A, Leslie M,
4. Morin PA et al (2015) Geographic and temporal Dutton PH, Archer EI, Morin PA (2013)
dynamics of a global radiation and diversification Targeted multiplex next-generation sequenc-
in the killer whale. Mol Ecol 24:3964–3979. ing: advances in techniques of mitochondrial
https://doi.org/10.1111/mec.13284 and nuclear DNA sequencing for population
5. Bryant D, Bouckaert R, Felsenstein J, Rosenberg genomics. Mol Ecol Resour 13:254–268.
NA, RoyChoudhury A (2012) Inferring spe- https://doi.org/10.1111/1755-0998.12059
cies trees directly from biallelic genetic markers: 12. Faircloth BC, McCormack JE, Crawford NG,
bypassing gene trees in a full coalescent analy- Harvey MG, Brumfield RT, Glenn TC (2012)
sis. Mol Biol Evol 29:1917–1932. https://doi. Ultraconserved elements anchor thousands of
org/10.1093/molbev/mss086 genetic markers spanning multiple evolution-
6. Richards PM, Liu MM, Lowe N, Davey JW, ary timescales. Syst Biol 61:717–726. https://
Blaxter ML, Davison A (2013) RAD-Seq derived doi.org/10.1093/sysbio/sys004
markers flank the shell colour and banding loci 13. Lemmon AR, Emme SA, Lemmon EM (2012)
of the Cepaea nemoralis supergene. Mol Ecol Anchored hybrid enrichment for massively
22:3077–3089. https://doi.org/10.1111/ high-throughput phylogenomics. Syst Biol
mec.12262 61:727–744. https://doi.org/10.1093/
7. Takahashi T, Sota T, Hori M (2013) sysbio/sys049
Genetic basis of male colour dimorphism 14. Eck SH, Benet-Pages A, Flisikowski K,
in a Lake Tanganyika cichlid fish. Mol Ecol Meitinger T, Fries R, Strom TM (2009) Whole
22:3049–3060. https://doi.org/10.1111/ genome sequencing of a single Bos taurus
mec.12120 animal for single nucleotide polymorphism
SNP Discovery from NGS Data 143
discovery. Genome Biol 10:R82. https://doi. 26. Li H, Durbin R (2009) Fast and accurate short
org/10.1186/gb-2009-10-8-r82 read alignment with Burrows-Wheeler trans-
15. Pavy N, Gagnon F, Deschenes A, Boyle B, form. Bioinformatics 25:1754–1760. https://
Beaulieu J, Bousquet J (2016) Development of doi.org/10.1093/bioinformatics/btp324
highly reliable in silico SNP resource and geno- 27. DePristo MA et al (2011) A framework for
typing assay from exome capture and sequenc- variation discovery and genotyping using next-
ing: an example from black spruce (Picea generation DNA sequencing data. Nat Genet
mariana). Mol Ecol Resour 16:588–598. 43:491–498. https://doi.org/10.1038/
https://doi.org/10.1111/1755-0998.12468 ng.806
16. Aslam ML et al (2012) Whole genome 28. McKenna A et al (2010) The Genome Analysis
SNP discovery and analysis of genetic Toolkit: a MapReduce framework for analyz-
diversity in Turkey (Meleagris gallopavo). ing next-generation DNA sequencing data.
BMC Genomics 13:391. https://doi. Genome Res 20:1297–1303. https://doi.
org/10.1186/1471-2164-13-391 org/10.1101/gr.107524.110
17. Baird NA et al (2008) Rapid SNP discovery 29. R Core Team (2015) R: a language and environ-
and genetic mapping using sequenced RAD ment for statistical computing. R Foundation
markers. PLoS One 3:e3376. https://doi. for Statistical Computing, Vienna, Austria
org/10.1371/journal.pone.0003376 30. Li H et al (2009) The sequence alignment/
18. Foote AD, Morin PA (2016) Genome-wide map format and SAMtools. Bioinformatics
SNP data suggests complex ancestry of sympat- 25:2078–2079. https://doi.org/10.1093/
ric North Pacific killer whale ecotypes. Heredity. bioinformatics/btp352
https://doi.org/10.1038/hdy.2016.54 31. Card DC et al (2014) Two low coverage bird
19. Narum SR, Buerkle CA, Davey JW, Miller genomes and a comparison of reference-guided
MR, Hohenlohe PA (2013) Genotyping-by- versus de novo genome assemblies. PLoS One
sequencing in ecological and conservation 9:e106649. https://doi.org/10.1371/jour-
genomics. Mol Ecol 22:2841–2847. https:// nal.pone.0106649
doi.org/10.1111/mec.12350 32. Luo R et al (2012) SOAPdenovo2: an empiri-
20. Andrews KR, Good JM, Miller MR, Luikart G, cally improved memory-efficient short-read de
Hohenlohe PA (2016) Harnessing the power novo assembler. Gigascience 1:18. https://
of RADseq for ecological and evolutionary doi.org/10.1186/2047-217X-1-18
genomics. Nat Rev Genet 17:81–92. https:// 33. Simpson JT, Durbin R (2012) Efficient de novo
doi.org/10.1038/nrg.2015.28 assembly of large genomes using compressed
21. Koepfli KP, Paten B, Genome KCS, O’Brien SJ data structures. Genome Res 22:549–556.
(2015) The genome 10K project: a way forward. https://doi.org/10.1101/gr.126953.111
Annu Rev Anim Biosci 3:57–111. https://doi. 34. Li H (2013) Aligning sequence reads, clone
org/10.1146/annurev-animal-090414-014900 sequences and assembly contigs with BWA-
22. i5K Consortium (2013) The i5K initiative: MEM. arXiv:13033997v1 [q-bioGN]
advancing arthropod genomics for knowledge, 35. Lounsberry ZT, Brown SK, Collins PW, Henry
human health, agriculture, and the environ- RW, Newsome SD, Sacks BN (2015) Next-
ment. J Hered 104:595–600. https://doi. generation sequencing workflow for assembly
org/10.1093/jhered/est050 of nonmodel mitogenomes exemplified with
23. Schubert M, Lindgreen S, Orlando L (2016) North Pacific albatrosses (Phoebastria spp.)
AdapterRemoval v2: rapid adapter trimming, iden- Mol Ecol Resour 15:893–902. https://doi.
tification, and read merging. BMC Res Notes 9:88. org/10.1111/1755-0998.12365
https://doi.org/10.1186/s13104-016-1900-2 36. Nielsen R, Paul JS, Albrechtsen A, Song YS
24. Korneliussen TS, Albrechtsen A, Nielsen R (2011) Genotype and SNP calling from next-gen-
(2014) ANGSD: analysis of next generation eration sequencing data. Nat Rev Genet 12:443–
sequencing data. BMC Bioinformatics 15:356. 451. https://doi.org/10.1038/nrg2986
https://doi.org/10.1186/s12859-014-0356-4 37. Cammen KM, Andrews KR, Carroll EL,
25. Li H (2011) A statistical framework for SNP Foote AD, Humble E, Khudyakov JI, Louis
calling, mutation discovery, association map- M, McGowen MR, Olsen MT, Van Cise
ping and population genetical parameter esti- AM (2016) Genomic methods take the
mation from sequencing data. Bioinformatics plunge: recent advances in high-throughput
27:2987–2993. https://doi.org/10.1093/ sequencing of marine mammals. J Hered
bioinformatics/btr509 107(6):481–495
144 Phillip A. Morin et al.
38. Bolger AM, Lohse M, Usadel B (2014) bias in studies of human genome-wide polymor-
Trimmomatic: a flexible trimmer for Illumina phism. Genome Res 15:1496–1502. https://
sequence data. Bioinformatics 30:2114–2120. doi.org/10.1101/gr.4107905
https://doi.org/10.1093/bioinformatics/ 43. Excoffier L, Dupanloup I, Huerta-Sanchez E,
btu170 Sousa VC, Foll M (2013) Robust demographic
39. Kim SY et al (2011) Estimation of allele frequency inference from genomic and SNP data. PLoS
and association mapping using next-generation Genet 9:e1003905. https://doi.org/10.1371/
sequencing data. BMC Bioinformatics 12:231. journal.pgen.1003905
https://doi.org/10.1186/1471-2105-12-231 44. Chen H, Patterson N, Reich D (2010)
40. Skotte L, Korneliussen TS, Albrechtsen A Population differentiation as a test for selective
(2012) Association testing for next-gener- sweeps. Genome Res 20:393–402. https://doi.
ation sequencing data using score statistics. org/10.1101/gr.100545.109
Genet Epidemiol 36:430–437. https://doi. 45. Durvasula A, Hoffman PJ, Kent TV, Liu
org/10.1002/gepi.21636 C, Kono TJ, Morrell PL, Ross-Ibarra
41. Nielsen R (2004) Population genetic analysis of J (2016) ANGSD-wrapper: utilities for
ascertained SNP data. Hum Genomics 1:218–224 analyzing next generation sequencing
42. Clark AG, Hubisz MJ, Bustamante CD, data. Mol Ecol Resour. https://doi.
Williamson SH, Nielsen R (2005) Ascertainment org/10.1111/1755-0998.12578
Chapter 10
Abstract
Next-generation small RNA sequencing is a valuable tool which is increasing our knowledge regarding
small noncoding RNAs and their function in regulating genetic information. Library preparation protocols
for small RNA have thus far been restricted due to higher RNA input requirements (>10 ng), long work-
flows, and tedious manual gel purifications. Small RNA library preparation methods focus largely on the
prevention or depletion of a side product known as adapter dimer that tends to dominate the reaction.
Adapter dimer is the ligation of two adapters to one another without an intervening library RNA insert or
any useful sequencing information. The amplification of this side reaction is favored over the amplification
of tagged library since it is shorter. The small size discrepancy between these two species makes separation
and purification of the tagged library very difficult. Adapter dimer hinders the use of low input samples and
the ability to automate the workflow so we introduce an improved library preparation protocol which uses
chemically modified adapters (CleanTag) to significantly reduce the adapter dimer. CleanTag small RNA
library preparation workflow decreases adapter dimer to allow for ultra-low input samples (down to approx.
10 pg total RNA), elimination of the gel purification step, and automation. We demonstrate how to carry
out this streamlined protocol to improve NGS data quality and allow for the use of sample types with
limited RNA material.
Key words Small RNA, Next-generation sequencing, NGS, Library preparation, Gel-free, MicroRNA,
Adapter dimer, piwiRNA, Non coding RNA, tRNA
1 Introduction
Steven R. Head et al. (eds.), Next Generation Sequencing: Methods and Protocols, Methods in Molecular Biology, vol. 1712,
https://doi.org/10.1007/978-1-4939-7514-3_10, © Springer Science+Business Media, LLC 2018
145
146 Sabrina Shore et al.
RNA, are 18–23 nt and play a major role in gene regulation [3, 4].
Other types of sRNA are comprised of various sizes and as more
NGS data is gathered, their regulatory roles, and functions are
being further classified.
Traditional library preparation protocols for sRNA are plagued
with a specificity issue which is caused by the favored formation of
a well known side product called adapter dimer (see Fig. 1, lane a).
Adapter dimer is an “empty” product where the two adapters have
ligated to themselves without an RNA insert in the middle which
thus produces no valuable sequencing information. A difference of
20–30 nt between a tagged small RNA library and an adapter
dimer makes separation of these products very difficult [3].
Therefore, protocols for small RNA library preparation include
methods to reduce adapter dimer formation or improve size selec-
tion and purification of the desired target. The main solution to
this problem thus far has been to perform a gel extraction of the
desired tagged library. Gel purifications however result in tremen-
dous variability, loss of desired material, and inhibit the automa-
tion of this technique. Furthermore, gel purification doesn’t
completely remove all traces of adapter dimer resulting in contami-
nation of the downstream sequencing reads. The adapter dimer
challenge becomes more prominent when limiting RNA material is
available. At lower RNA inputs the adapter dimer preferentially
amplifies and dominates the PCR reaction leaving very little ampli-
fied library. For this reason, many commercially available kits rec-
ommend 100 ng and more recently 10 ng as the lowest RNA input
to be used. This higher input requirement excludes the use of
many biological type samples which are used for diagnostic testing
since their RNA content is inherently low [5, 6].
Small RNA library preparation is carried out in six major steps
(see Fig. 2): ligation of 3′ adapter to RNA [7], ligation of 5′ adapter
onto half tagged library, cDNA synthesis of full tagged library,
amplification and barcoding of final library, crude library analysis
and quantification, and library purification. This protocol is com-
patible with Illumina sequencing technology.
In this chapter, we provide a detailed protocol for improved
small RNA library preparation (CleanTag small RNA library prepa-
ration workflow) (see Note 1) which uses chemically modified
adapters to suppress adapter dimer formation (see Fig. 1, lane b)
and allows for (1) lower RNA inputs, (2) gel-free purification, and
(3) potential for automation and higher throughput. Chemical
modifications were placed on the 5′ and 3′ Illumina compatible
adapters near the ligation junctions to prevent ligation of the
adapters to one another but allow efficient ligation of the adapters
to the library. Furthermore, the modifications help to suppress
read through by the reverse transcription enzyme if any residual
adapter dimer is formed. Reagent and workflow optimizations
CleanTag Adapters Improve Small RNA Next-Generation Sequencing Library… 147
Fig. 1 Agarose gel results of a small RNA library preparation. (a) Traditional small
RNA library preparation. (b) CleanTag small RNA library preparation
2 Materials
1. Nuclease-free water.
2. 200 μL thin-walled PCR tubes.
3. 1.7 mL eppendorf tubes, pipet tips.
4. Thermal cycler with heated lid.
5. BioAnalyzer 2100 (Agilent Technologies, Santa Clara, CA,
USA) (optional, but recommended).
6. Real Time Thermal Cycler (optional).
7. Qubit fluorometer (ThermoFisher Scientific, Waltham, MA,
USA) (optional).
8. Gel electrophoresis apparatus.
9. Magnetic bead separator.
10. PAGE purified oligonucleotides (Tables 1 and 2). (abbrev:
ddC = 2΄,3΄-dideoxycytidine, rApp = adenylate, MP = meth-
ylphosphonate, 2΄OMe = 2΄-O-methyl, r = RNA).
CleanTag Adapters Improve Small RNA Next-Generation Sequencing Library… 149
2.4 PCR 1. 2× High Fidelity PCR Master Mix: Q5 High Fidelity PCR
Master Mix (New England Biolabs, Ipswich, MA, USA).
2. Forward primer, 20 μM (see Table 1).
3. Reverse Index Primer, 20 μM (see Tables 1 and 2).
2.5 Analysis (See 1. High Sensitivity DNA chip for Agilent Bioanalyzer (Agilent
Note 2) Technologies).
2. 4% Agarose EX Gels (ThermoFisher Scientific).
3. KAPA Quantitative library qPCR kit for Illumina Sequencing
(KAPA Biosystems, Wlmington, MA, USA).
4. Qubit dsDNA HS assay (ThermoFisher Scientific).
Table 1
CleanTag small RNA library preparation oligonucleotide sequences
Name Sequence 5΄ to 3΄
CleanTag 3΄ (rApp)T(MP)GGAATTCTCGGGTGCCAAGG (ddC)
Adapter
CleanTag 5΄ r[GUUCAGAGUUCUACAGUCCGACGAU (C) 2΄OMe]
Adapter
RT primer GCCTTGGCACCCGAGAATTCCA
Forward primer AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGA
a
Reverse primer CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA
(Index Primer)
a
Underlined sequence represents unique barcode that is interchangeable with sequences listed below
CleanTag Adapters Improve Small RNA Next-Generation Sequencing Library… 151
Table 2
Illumina barcoding sequences used in index primers (#1–24)
2.6.2 Gel Purification 1. Zymoclean Gel DNA Recovery Kit (Zymo Research, Irvine,
CA, USA).
2. Scalpel.
3 Methods
3.1 Ligation 1. Thaw all the necessary reagents for Subheading 3.1.
of CleanTag 3′ Adapter 2. Determine if modified 3′ adapter requires dilution based on
to RNA Template available RNA input as suggested in Note 3. If necessary dilute
adapter in nuclease-free water and mix well before use. Dilute
enough adapter for all samples plus 10% excess if preparing
master mixes (see Note 5). Each sample requires 1 μL CleanTag
3′ adapter.
3. Prepare a reaction mixture by adding the following compo-
nents into thin-walled PCR tubes in the following order: 1 μL
nuclease-free water, 1 μL CleanTag 3′ adapter, 1 μL RNase
Inhibitor, 1 μL Enzyme 1, and 5 μL Buffer 1, per reaction.
Buffer 1 is extremely viscous, pipet slowly. Keep mix cold. If
preparing multiple reactions refer to Note 5 for tips on making
a master mix and add 9 μL of the master mix to individual reac-
tion tubes. Mix the reaction by pipetting up and down and
pulse spin.
152 Sabrina Shore et al.
3.2 CleanTag 5′ 1. Thaw all the necessary reagents for Subheading 3.2.
Adapter Ligation 2. Determine if dilution of CleanTag 5′ adapter is necessary.
to Half Tagged Small Refer back to Note 3 and make any dilutions necessary in
RNA Library nuclease-free water. Be sure to prepare enough adapter for all
samples with 10% extra to account for pipetting error. Each 5′
adapter ligation reaction requires 2 μL CleanTag 5′ adapter.
3. Add reagents in the following order to the thin-walled PCR
tubes which contain the ligation mixture from Subheading
3.1: 4 μL nuclease-free water, 1 μL Buffer 2, 1 μL RNase
Inhibitor, and 2 μL Enzyme 2 per reaction. Keep cold and do
not vortex enzymes. If preparing multiple reactions a master
mix with 10% excess of each reagent can be mixed and 8 μL
aliquoted to individual reaction tubes (see Note 5). Mix the
reaction by pipetting up and down and pulse spin.
4. Heat CleanTag 5′ adapter for 2 min at 70 °C in thermal cycler
with heated lid. Immediately place on ice.
5. Add 2 μL denatured CleanTag 5′ adapter to reaction mix for a
total of 20 μL. Mix by pipetting.
6. Incubate for 1 h at 28 °C followed by 20 min at 65 °C.
7. Place full tagged library sample on ice.
3.3 cDNA Synthesis 1. Thaw all necessary reagents for Subheading 3.3.
of Full Tagged Library 2. Add 2 μL RT Primer to full tagged library product from
Subheading 3.2 for a total of 22 μL. Mix gently by pipetting
up and down.
3. Place tubes in thermal cycler with heated lid and incubate for
2 min at 70 °C. Immediately place on ice.
4. Add the following to the thin-walled PCR tubes: 1.92 μL
nuclease-free water, 5.76 μL 5× RT Buffer, 1.44 μL 10 mM
dNTP mix, 2.88 μL DTT, 1 μL RNase Inhibitor, and 1 μL RT
Enzyme per reaction for a total of 36 μL. Mix by pipetting up
and down and keep mix cold. If performing multiple reactions
CleanTag Adapters Improve Small RNA Next-Generation Sequencing Library… 153
3.4 Amplification 1. Determine proper index barcode primers to use based on the
and Barcoding of Final number of library samples to be sequenced (see Note 4).
Library 2. Determine appropriate number of PCR cycles to use based on
RNA input as suggested in Note 3.
3. Thaw necessary reagents for Subheading 3.4.
4. Add the following to the thin-walled PCR tubes from
Subheading 3.3: 40 μL 2× High Fidelity PCR Master Mix and
2 μL Forward Primer per reaction (see Note 7). Mix gently by
pipetting up and down. Keep cold. If performing multiple
library reactions a master mix with 10% excess of each reagent
can be prepared and 42 μL of master mix can be added to indi-
vidual reaction tubes.
5. Add 2 μL appropriate Index Reverse Primer to each individual
tube for a total of 80 μL. Mix gently by pipetting up and down.
6. Place tubes in thermal cycler with heated lid and run the follow-
ing cycling conditions: 98 °C for 30 s; x cycles of 98 °C for 10 s,
60 °C for 30 s, 72 °C for 15 s; 72 °C for 10 min, then 4 °C hold,
where x is number of PCR cycles as determined by user.
7. Proceed to analysis and quantification of crude library or place
samples at −20 °C until ready for use.
3.5 Crude Library Several options exist for crude library analysis and quantification
Analysis including (1) Agilent Bioanalyzer by High Sensitivity DNA Chip
and Quantification (Recommended), (2) qPCR (Library Quantification Kit for
Illumina platforms), or (3) gel electrophoresis. Visualization by gel
electrophoresis is an alternative method if Bioanalyzer or qPCR are
not available but gel images are difficult to quantify precisely and
can lead to improper pooling of samples. As well, large sample
volumes are often necessary for visualization on gel which could
consume valuable library sample before being purified and
sequenced.
3.5.1 Crude Library This option is best for visualization of all library peak sizes while
Analysis and Quantification conserving the majority of your sample for purification and
by Agilent Bioanalyzer sequencing. It is important to follow the manufacturer’s instruc-
tions for the proper use and maintenance of this instrument. Clean
electrode pins and change syringe in priming station often.
1. Vortex and spin down PCR products before use. Dilute sam-
ples 1:10 by combining 2 μL PCR product to 18 μL nuclease-
free water. Vortex well and collect liquid at bottom of tubes.
154 Sabrina Shore et al.
3.5.2 Crude Library Several kits are available for quantitative analysis of Illumina com-
Analysis by qPCR patible libraries. It is important to follow the manufacturer’s
instructions in order to obtain accurate results. Library quantifica-
tion of individual crude samples will be necessary prior to gel puri-
fication if pooling samples together.
3.6 Library Multiple options exist for library purification, two of which are: (1)
Purification bead-based size selection or (2) gel extraction. Identify the best
purification method for your experiment (see Note 9).
3.6.1 Bead-Based Size 1. Bring AMPure XP beads to room temperature for 30 min.
Selection Vortex well to ensure a homogenized mixture before use.
Transfer library samples from 0.2 mL PCR tubes into 1.7 mL
microcentrifuge tubes to account for larger volumes used in
this protocol.
2. Add 1× volume of AMPure XP beads to library. For example,
use 80 μL beads with 80 μL PCR product. Vortex well to mix
beads and DNA, then spin down briefly to collect liquid at the
bottom of the tube.
3. Incubate the tubes at room temperature for 10 min.
4. Place the tubes on magnetic rack for 4 min to separate the
beads onto the side of the tube nearest the magnetic rack. The
supernatant will become clear.
CleanTag Adapters Improve Small RNA Next-Generation Sequencing Library… 155
Fig. 3 Bioanalzyer traces of crude small RNA libraries from total human brain RNA at a) high input (1000 ng)
miRNA at 141 nt, or (b) low input (1 ng) miRNA at 142 nt
Fig. 4 Example of manual adjustments in 2100 Expert BioAnalyzer software. (a) Crude trace with miRNA library
at 154 nt and lower marker selected as the first peak. (b) Baseline correction adjustment. (c) Manually adjust
lower marker to the second peak. (d) Crude trace with miRNA library at true size of 142 nt and RT primer at
30 nt (first peak)
3.6.2 Gel Extraction 1. Pool individual samples equally into one mixture according to
Note 10.
2. Load mixture onto a 4% EX Agarose Gel (or best available gel
option). If the sample does not fit into one lane, split the sam-
ple into several lanes. Be sure to include a known marker or
ladder in one of the lanes for size reference. Run gel for 30 min
and then visualize band safely using proper eye protection
against UV.
3. Using scalpel cut out target of interest.
4. Follow Zymoclean Gel DNA Recovery Kit manufacturer’s
instructions for gel purification procedure. Note: May use
alternative gel purification kits if desired.
5. Quantify purified samples. We recommend visualizing sample
with Bioanalyzer to ensure library of interest was properly
extracted (see Fig. 5b). At this point, Qubit may be used to
quantify samples or pools (see Note 11).
4 Notes
Fig. 5 Purified small RNA libraries from total human brain RNA. (a) Bead-based purification of one sample at
1000 ng input. (b) Gel purification of six ultra-low input (100–10 pg) samples which were pooled together prior
to gel purification
CleanTag Adapters Improve Small RNA Next-Generation Sequencing Library… 159
Table 3
Optimization table based on total RNA input: adapter dilution and PCR
cycles
References
1. Lopez JP et al (2015) Biomarker discovery: 7. Song Y, Liu KJ, Wang TH (2014) Elimination
quantification of microRNAs and other small of ligation dependent artifacts in T4 RNA
non-coding RNAs using next generation ligase to achieve high efficiency and low
sequencing. BMC Med Genet 8:35 bias microRNA capture. PLoS One
2. Huttenhofer A, Brosius J, Bachellerie JP (2002) 9(4):e94619
RNomics: identification and function of small, 8. Shore S, Henderson JM, Chow FWN, Tam
non-messenger RNAs. Curr Opin Chem Biol EWT, Quintana JF, Chhatbar K, McCaffrey AP,
6(6):835–843 Zon G, Buck A, Hogrefe RI (2016) Improved
3. Head SR et al (2014) Library construction for next- small RNA library preparation for ultra-low
generation sequencing: overviews and challenges. input sample types, in NGS, SCA, SMA, and
Biotechniques 56(2):61–64. 66, 68, passim Mass Spec: research to diagnostics, San Diego,
4. Vickers KC et al (2015) Mining diverse small CA. http://www.trilinkbiotech.com/work/
RNA species in the deep transcriptome. Trends sRNALibPrep.pdf
Biochem Sci 40(1):4–7 9. Shore S, Henderson JM, Quintana JF, Chow
5. Etheridge A et al (2011) Extracellular FWN, Chhatbar K, McCaffrey AP, Zon G,
microRNA: a new source of biomarkers. Mutat Buck A, Hogrefe RI (2016) Improving NGS
Res 717(1–2):85–90 small RNA discovery in biological fluids and
other low input samples, in keystone sympo-
6. Sterling CH, Veksler-Lublinsky I, Ambros V sia—small RNA silencing: little guides, big
(2015) An efficient and sensitive method for biology (A6), Keystone, CO. http://www.
preparing cDNA libraries from scarce biological trilinkbiotech.com/work/sRNASilencing.
samples. Nucleic Acids Res 43(1):e1 pdf
Chapter 11
Abstract
The emergence of high-throughput sequencing technologies has deepened our understanding of complex
microbial communities and greatly facilitated the study of as-yet uncultured microbes and viruses. Studies
of complex microbial communities require high-quality data to generate valid results. Here, we detail cur-
rent methods of microbial and viral community sample acquisition, DNA extraction, sample preparation,
and sequencing on Illumina high-throughput platforms. While using appropriate analytical tools is impor-
tant, it must not overshadow the need for establishing a proper experimental design and obtaining suffi-
cient numbers of samples for statistical purposes. Researchers must also take care to sample biologically
relevant sites and control for potential confounding factors (e.g., contamination).
Key words 16S/18S ribosomal RNA gene sequences, DNA extraction, Microbiome, Viral purification,
Viral-like particles (VLPs)
1 Introduction
Steven R. Head et al. (eds.), Next Generation Sequencing: Methods and Protocols, Methods in Molecular Biology, vol. 1712,
https://doi.org/10.1007/978-1-4939-7514-3_11, © Springer Science+Business Media, LLC 2018
163
164 Pedro J. Torres and Scott T. Kelley
1.1 Sampling Proper sample handling is essential for avoiding microbial growth and/
Handling or death post-collection, which could significantly alter the organismal
and molecular composition and result in erroneous conclusions.
Improper handling of the samples, such as exposing them for extended
periods at room or hotter temperatures, promotes growth of aerobic
microbes while hindering anaerobic microbial survival [1]. Other
forms of improper handling, such as not using gloves or protective gear
when collecting samples, using unsterilized equipment to gather sam-
ples, or repeated use of once sterile or non-sterile equipment, may
result in permanent sample contamination. It goes without saying that
every effort should be made to use sterile technique and eliminate
potential sources of contamination during the collection phase.
2 Materials
2.5 16S rRNA 1. 515 forward primer (see Notes 2 and 3) [4].
Amplicon PCR 2. 805 reverse primers (see Notes 2 and 3) [4].
3. PCR-grade water.
4. Thin-walled PCR tubes.
5. Platinum Hot Start PCR Master Mix (2×) from ThermoFisher.
4. CsCl or sucrose.
5. Sterile 1.5 mL centrifuge tube.
3 Methods
3.1.1 Oral Microbiome Different oral microbiomes can be obtained by swabbing the gums,
Swabbing Gums, Cheeks, inside of cheeks, tongue, tooth surface, or throat, respectively [7].
Tongue, Tooth Surface, or The microbes obtained by swabbing these inner surfaces contain
Throat not only microbes found in the tongue but also those that may
adhere to the epithelial cell surfaces.
1. The swab can be any type of sterile cotton-tip collection swab
and 20 mL conical tube.
2. Instruct the patient not to eat/drink/smoke/chew gum for at
least 30 min prior to giving a sample.
3. Use the sterile swab tip to rub the inside of oral cavity. Site of
swabbing will depend on question being asked and area you are
interested at. Key is to be consistent in order to replicate results.
4. Store the samples at −80 °C until they are ready to be processed.
5. DNA can then be extracted following steps 5, 8, and 9 above.
Alternate Protocol
1. Instruct the patient not to eat/drink/smoke/chew gum for at
least 30 min prior to giving a sample.
2. Have the patient rinse the oral cavity by swishing 10 mL of
sterile water for 10 s.
3. Open 50 mL conical tube.
4. Have the patient spit the water into the 50 mL conical tube.
5. Screw the lid tightly onto the collection tube and store the oral
wash samples at −80 °C until the samples are ready to be
processed.
6. DNA can then be extracted following steps 5 and 7 above.
3.4 Sample Stool has been used in a number of studies as a proxy for investi-
Collection and DNA gating the gut microbial composition [11–14]. However, stool
Extraction of Gut subsampling can result in large variability of gut microbiome data
Microbiome due to different microenvironments harboring various taxa within
an individual stool [15]. Homogenizing the entire stool sample in
liquid nitrogen and subsampling from this prior to DNA extrac-
tion may achieve reduction of intra-sample variability. There are
numerous ways to collect stool samples and the approach will
largely depend on the model organism used in the study. Protocols
below are used for human [16] and mouse samples.
3.4.1 Human Sample 1. Instruct subjects to place a disposable commode specimen con-
Collection tainer under their toilet seat prior to bowel movement (see Note 5).
2. Using the sterile spatula or plastic spoon, put approx. 1 g into
a 50 mL conical tube and store at −80 °C until the samples are
ready to be processed.
3. If the samples are to be shipped make sure to use next day
delivery.
4. After bowel movement, stool can be wiped off of used bathroom
tissue or by directly swabbing into the largest piece of feces; just
enough to change the color of the swab should suffice.
5. Place swabs into a sterile 15 mL conical tube and store at
−80 °C, or fully submerged in ≥ 95% ethanol, until they are
ready to be processed.
6. If the samples are to be shipped make sure to use next day
delivery.
Sampling, Extraction, and High-Throughput Sequencing Methods for Environmental… 169
3.4.2 Mouse Feces In order to maximize the study group replicates, efforts should be
Sample Collection made to collect from many individual mice over time. To maintain
consistent sampling across all replicates, a daily/weekly collection
schedule is recommended.
1. Mice can individually be placed in single-use containers such as
Solo-32 ounce poly-coated paper food container with lid.
2. Keep mouse in a container for about 10 min and collect 2–3
fecal pellets into a 1.5 mL vial with sterile forceps. Be sure to
continuously switch to a clean sterile forceps between different
mouse sample collections.
3. Quickly store at −80 °C or fully submerged in ≥ 95% ethanol
until processing.
3.5 16S rRNA Once the sample DNA is prepared, PCR can be used with unique
Amplicon PCR barcoded primers to amplify the V4 hypervariable region of the
16S rRNA gene to create an amplicon library for Illumina sequenc-
ing [4] for individual samples (see Note 6).
1. Combine the following into a PCR tube: 13 μL of PCR grade
water, 10 µL Platinum Hot Start PCR Master Mix (2×) from
ThermoFisher, 0.5 μL forward primer (10 μM initial concen-
tration, 0.2 μM final), 0.5 μL reverse primer (10 μM initial
concentration, 0.2 μM final), 1 μL template DNA (5–30 ng/
μL), in a final volume of 25 μL.
2. Cycle the PCR reaction as follows:
Step1: 3min 94 °C
Step 2: 45s 94 °C
Step 3: 60s 50 °C 35 ×
Step 4: 90s 72 °C
Step 5: 10 min 72 °C
Step 6: Hold 4 °C
3. Check amplified PCR products on a 1% agarose gel (w/v) con-
taining DNA gel stain. Expected band size for 515f/806r is
approx. 300–350 bp.
4. Quantify PCR product using a NanoDrop, Picogreen, or
Qubit assay (see the manufacturer’s protocols).
170 Pedro J. Torres and Scott T. Kelley
3.6.1 Soil, Animal 1. Place the sample into appropriate buffer, e.g., PBS or saline
Tissues, or Clinical magnesium buffer [19].
Samples 2. Disrupt the sample with an electric homogenizer at approx.
5000 rpm for 30–60 s or use a mortar and pestle.
3.6.3 Density Gradients 1. Pour each density into clear ultracentrifugation tubes (e.g.,
and Ultracentrifugation Beckman Coulter tubes). For phages use 1 mL of: 1.7, 1.5,
and 1.35 g/mL CsCl densities, respectively. For other viruses
consult the literature for appropriate densities and type of gra-
dient (e.g., CsCl versus sucrose) [19].
2. After the layers are poured, slowly add identical volumes of
samples to the top of each gradient.
3. You must exactly balance the tubes prior to centrifugation.
Add or remove small drops of the sample to the centrifugation
tube until the difference between tubes is less than 1 mg.
4. For phages, centrifuge for 2 h at ~60,000 × g and
4 °C. Depending on the density of the viruses, different speeds
and times will apply. To determine the correct centrifuge speed,
consult literature sources [17–19].
5. Using a sterile 18-gauge needle, place the needle with the
mouth facing upward, just below the appropriate gradient
density where the virus is expected to localize.
6. Carefully withdraw the plunger of the syringe and pull the
desired volume into the syringe barrel. Transfer the collected
virion layers into a sterile 1.5 mL centrifuge tube.
3.6.4 Nucleic Acid Isolation of viral nucleic acids can depend on whether the researcher
Extraction is studying DNA or RNA viruses. If working with DNA viruses
from blood or tissue samples, it may be required to DNase I treat
the samples to ensure removal of background host DNA. This part
Sampling, Extraction, and High-Throughput Sequencing Methods for Environmental… 171
3.6.5 Random cDNA will need to be synthesized if working with RNA (see Note
Amplification 7). There are plenty of reverse transcription reagent kits out there.
Once you have DNA you can go ahead and amplified using the
Genomiphi amplification kit.
1. Mix 1–10 ng of template DNA with 9 μL of sample buffer, mix
and spin down.
2. Denature the sample by heating for 3 min at 95 °C.
3. Cool on ice for 3 min.
4. For each amplification reaction, combine 9 μL of reaction buf-
fer with 1 μL of enzyme, mix and place on ice, add to the
cooled sample from step before.
5. Incubate the sample for 16–18 h at 30 °C (for a maximum of
18 h, but not shorter than 6 h).
6. Inactivate the enzyme by heating the sample for 10 min at
65 °C.
7. Cool to 4 °C.
8. Clean up the amplified DNA with a modified version of a
Qiagen DNeasy Kit.
9. Add 180 μL of buffer ATL, 200 μL of buffer AL and 200 μL
of 100% ethanol to the 20 μL of reaction volume. If the sample
is too viscous heat to 55–65 °C.
10. Add to the column provided with the kit, and proceed with the
purification steps as specified in the manual.
3.6.6 Sequencing DNA generated from the above steps can be taken into any stan-
of Libraries dard Illumina compatible library preparation kit and sequenced on
an Illumina platform sequencing instrument.
172 Pedro J. Torres and Scott T. Kelley
4 Notes
1. There are other kits that may be used to isolate DNA, including
the ones that are suited for higher throughput of samples such
as, QIAamp DNA mini kit and QIAamp DNA stool kit (Qiagen,
Valencia, CA). Consult the literature for collection protocols of
fecal samples from other organisms.
2. 16S rRNA universal bacterial primers are: 515 forward primer
(5′-AAT GAT ACG GCG ACC ACC GAG ATC TAC ACT
ATG GTA ATT GTG TGC CAG CMG CCG CGG TAA-3′)
and 805 reverse primers that also contain unique 12 base pair
Golay barcodes (5′-CAA GCA GAA GAC GGC ATA CGA
GAT NNNNNNNNNNNN GTG ACT GGA GTT CAG
ACG TGT GCT CTT CCG ATC TGG ACT ACH VGG
GTW TCT AAT-3′) [4].
3. Primer set kits (NEXTflex® 16S V4 Amplicon-Seq Kit 2.0 and
NEXTflex® 16S V1-V3 Amplicon-Seq Kit) are commercially
available from BioScientific, Austin, TX, USA for amplifying
the V1–V3 and V4 variable regions of bacteria rRNA.
4. Universal 18S eukaryotic primers are: 1A (5′-AAC CTG GTT
GAT CCT GCC AGT-3′) [20] and 564R (5′-GGC ACC AGA
CTT GCC CTC-3′) [21].
5. Ensure no urine is collected in the sample.
6. Samples should be thawed on ice.
7. Depletion of host DNA extracted RNA samples can be done
using DNase treatment in RNeasy plus mini kit (Qiagen
Hilden, Germany) following the manufacturer’s instructions.
Random hexamer oligonucleotides coupled with unique bar-
codes are used for first-strand cDNA (Life Technologies, Inc.)
and second-strand DNA (New England Biolabs Beverly, MA,
USA) synthesis [22].
References
1. Nechvatal JM, Ram JL, Basson MD, environmental bacterial community samples.
Namprachan P, Niec SR, Badsha KZ et al FEMS Microbiol Ecol 83(2):468–477
(2008) Fecal collection, ambient preserva- 4. Caporaso JG, Lauber CL, Walters WA, Berg-
tion, and DNA extraction for PCR amplifica- Lyons D, Huntley J, Fierer N et al (2012)
tion of bacterial and human markers from Ultra-high-throughput microbial community
human feces. J Microbiol Methods 72(2): analysis on the Illumina HiSeq and MiSeq plat-
124–132 forms. ISME J 6(8):1621–1624
2. Hale VL, Tan CL, Knight R, Amato KR (2015) 5. Torres PJ, Fletcher EM, Gibbons SM, Bouvet
Effect of preservation method on spider mon- M, Doran KS, Kelley ST (2015) Characterization
key (Ateles geoffroyi) fecal microbiota over 8 of the salivary microbiome in patients with pan-
weeks. J Microbiol Methods 113:16–26 creatic cancer. PeerJ 3:e1373
3. Gray MA, Pratte ZA, Kellogg CA (2013) 6. Meadow JF, Altrichter AE, Kembel SW,
Comparison of DNA preservation methods for Moriyama M, O’Connor TK, Womack AM
Sampling, Extraction, and High-Throughput Sequencing Methods for Environmental… 173
et al (2014) Bacterial communities on class- human gut microbiome. Nature 505(7484):
room surfaces vary with human contact. 559–563
Microbiome 2(1):7 15. Gorzelak MA, Gill SK, Tasnim N, Ahmadi-
7. Zaura E, Keijser BJ, Huse SM, Crielaard W Vand Z, Jay M, Gibson DL (2015) Methods
(2009) Defining the healthy “core microbi- for improving human gut microbiome data by
ome” of oral microbial communities. BMC reducing variability through sample processing
Microbiol 9:259 and storage of stool. PLoS One 10(8):e0134802
8. Meadow JF, Bateman AC, Herkert KM, 16. Kumar R, Eipers P, Little RB, Crowley M,
O’Connor TK, Green JL (2013) Significant Crossman DK, Lefkowitz EJ et al (2014)
changes in the skin microbiome mediated by Getting started with microbiome analysis: sam-
the sport of roller derby. PeerJ 1:e53 ple acquisition to bioinformatics. Curr Protoc
9. Forney LJ, Gajer P, Williams CJ, Schneider Hum Genet 82:18.8.1–18.829
GM, Koenig SS, McCulle SL et al (2010) 17. Weynberg KD, Wood-Charlson EM, Suttle
Comparison of self-collected and physician- CA, Oppen MJ v (2014) Generating viral
collected vaginal swabs for microbiome analy- metagenomes from the coral holobiont. Front
sis. J Clin Microbiol 48(5):1741–1748 Microbiol 5:206
10. Dominguez-Bello MG, Costello EK, Contreras 18. Marhaver KL, Edwards RA, Rohwer F (2008)
M, Magris M, Hidalgo G, Fierer N et al (2010) Viral communities associated with healthy and
Delivery mode shapes the acquisition and bleaching corals. Environ Microbiol 10(9):
structure of the initial microbiota across multi- 2277–2286
ple body habitats in newborns. Proc Natl Acad 19. Thurber RV, Haynes M, Breitbart M, Wegley
Sci U S A 107(26):11971–11975 L, Rohwer F (2009) Laboratory procedures to
11. Gill SR, Pop M, Deboy RT, Eckburg PB, generate viral metagenomes. Nat Protoc 4(4):
Turnbaugh PJ, Samuel BS et al (2006) 470–483
Metagenomic analysis of the human distal gut 20. Medlin L, Elwood HJ, Stickel S, Sogin ML
microbiome. Science 312(5778):1355–1359 (1988) The characterization of enzymatically
12. Turnbaugh PJ, Ley RE, Mahowald MA, amplified eukaryotic 16S-like rRNA-coding
Magrini V, Mardis ER, Gordon JI (2006) An regions. Gene 71(10):491–499
obesity-associated gut microbiome with 21. Wang Y, Tian RM, Gao ZM, Bougouffa S,
increased capacity for energy harvest. Nature Qian PY (2014) Optimal eukaryotic 18S and
444(7122):1027–1031 universal 16S/18S ribosomal RNA primers and
13. Koren O, Goodrich JK, Cullender TC, Spor A, their application in a study of symbiosis. PLoS
Laitinen K, Bäckhed HK et al (2012) Host One 9(3):e90053
remodeling of the gut microbiome and meta- 22. Moser LA, Ramirez-Crvajal L, Puri V, Pauszek
bolic changes during pregnancy. Cell 150(3): SJ, Matthews K, Diley KA et al (2016) A uni-
470–480 versal next-generation sequencing protocol to
14. David LA, Maurice CF, Carmody RN, generate noninfectious barcoded cDNA librar-
Gootenberg DB, Button JE, Wolfe BE et al ies from high-containment RNA viruses. mSys-
(2014) Diet rapidly and reproducibly alters the tems 1(3):e00039–e00015
Chapter 12
Abstract
RNA sequencing is a powerful technology that allows for unbiased profiling of the entire transcriptome.
The analysis of transcriptome profiles from heterogeneous tissues, cell admixtures with relative proportions
that can vary several fold across samples, poses a significant challenge. Blood is perhaps the most egregious
example. Here, we describe in detail a computational pipeline for RNA-Seq data preparation and statistical
analysis, with development of a means of estimating the cell type composition of blood samples from their
bulk RNA-Seq profiles. We also illustrate the importance of adjusting for the potential confounding effect
of cellular heterogeneity in the context of statistical inference in a whole blood RNA-Seq dataset.
Key words RNA-Seq, Transcriptomics, Whole blood, Cellular heterogeneity, Cell type-specific
deconvolution
1 Introduction
Steven R. Head et al. (eds.), Next Generation Sequencing: Methods and Protocols, Methods in Molecular Biology, vol. 1712,
https://doi.org/10.1007/978-1-4939-7514-3_12, © Springer Science+Business Media, LLC 2018
175
176 Casey P. Shannon et al.
carried out using tissue samples, but the conclusions drawn from
them are about the behavior of average cells. The aggregate tran-
script abundance across a large population of cells is used as a proxy
for studying the behavior of individual cells (though single-cell
RNA-Seq is growing in popularity). This is reasonable if the tissue
under study is largely composed of a single-cell type. In practice,
however, this assumption is often violated. Tissues are typically cell
admixtures with relative proportions that can vary several folds
across samples.
Perhaps the most egregious example is blood. Blood is often
the tissue specimen of choice for large-scale studies of human biol-
ogy and disease. It is easily collected and provides a global picture
of the status of the immune system. It is, however, a complex tissue,
composed of many different cell types, each with highly specialized
function. The sensitivity and interpretability of analyses carried out
in blood are significantly affected by its dynamic heterogeneity.
Quantifying this heterogeneity is important, allowing us to account
for its effect or to better model interactions that may be present
between the abundance of certain transcripts, some cell types, and
the indication under investigation. Accurate enumeration of the
many component cell types that make up peripheral whole blood
can be costly, however, and further complicates the sample collec-
tion process. Often, such quantification is an afterthought. When
complete enumeration of the blood is not available, its composition
may be estimated, at least to some degree of granularity, directly
from its transcriptomic profile.
In this chapter, we introduce various approaches for doing
this and illustrate the importance of accounting for the cellular
heterogeneity present in the data. We use this admittedly more
advanced topic to introduce RNA-Seq analysis from first princi-
ples. First, we present a simple pipeline for going from the raw
files produced by the sequencing instrument to summarized read
counts suitable for analysis. Next, we take the resulting read
count data into the R statistical computing environment, imple-
ment a simple normalization scheme, and develop a means of
estimating the composition of mixed blood samples from their
RNA-Seq profiles. We do this using freely available data—RNA-
Seq reads obtained from some of the component cell types of
peripheral whole blood—to identify genes that exhibit distinct
patterns of expression across these cell types, termed marker
genes. Finally, we use a second publicly available whole blood
RNA-Seq dataset to illustrate the importance of adjusting for the
potential confounding effect of cellular heterogeneity in the con-
text of statistical inference. We hope that this treatment will be of
interest to RNA-Seq beginners and experts alike.
Cellular Deconvolution of RNA-Seq 177
cd rnaseq_workflow
In the first step of our data processing pipeline, FASTQ files are
subjected to quality control using the FastQC program. FastQC
performs simple quality control checks on raw sequence data com-
ing from high-throughput sequencing pipelines. It provides a
number of analyses that can help determine whether your data has
any problems that you should be aware of before doing any further
analysis. Guidance on interpretation of the various diagnostic plots
is beyond the scope of this chapter, but useful discussion can be
found on FastQC website.
Our Snakemake rule takes each (compressed) FASTQ file in turn
and passes it to the FastQC program, which produces an HTML-
formatted report. Running time will depend on the total number of
reads (approximately 10 min/FASTQ file for 50M reads).
5.1 Read Alignment Reads are first aligned to the reference genome sequence by the
STAR aligner. STAR is a multi-threaded, high-performance align-
ment program, purpose-built for RNA-Seq. In particular, it greatly
improves alignment accuracy for intron-spanning reads. Here, we
primarily favor STAR over alternatives, such as bowtie, for its flexi-
bility and speed. It achieves a very high alignment rate by using a
specialized in-memory reference genome index. This index can be
quite large: if working with the human genome, your system should
have >30 GB of random access memory (RAM).
5.1.1 Generating Before we can process any FASTQ files, we must first build this
the STAR Index index, so that it may subsequently be loaded into memory prior to
carrying out any alignments. This only needs to be done once
before reads can be aligned to a given reference genome sequence.
To do this, we will need:
●● The reference genome sequence in a FASTA-formatted file.
●● A gene annotation file in GTF format.
The most current human reference genome and transcriptome
sequences, and corresponding annotation files, can be obtained
from the GENCODE website. Below, we define a pair of Snakemake
rules to download these files from the Sanger Institute FTP servers
to the ~/genome/ directory on our system. This location is speci-
fied by the downloadDir parameter in the config.yaml file and can
be modified by the user.
1. Snakemake Rule
rule downloadGTF:
output:
expand("{downloadDir}gtf/gencode.v25.anno-
tation.gtf", downloadDir = config["downloadDir"])
params:
url = "ftp://ftp.sanger.ac.uk/pub/gencode/
Gencode_human/release_25/gencode.v25.annotation.
gtf.gz",
downloadDir = config["downloadDir"]
shell:
"wget -O - {params.url} | gunzip -c > {output}"
Cellular Deconvolution of RNA-Seq 181
2. Snakemake Rule
rule downloadFASTA:
output:
expand("{downloadDir}fasta/GRCh38.
p7.genome.fa", downloadDir = config["downloadDir"])
params:
url = "ftp://ftp.sanger.ac.uk/pub/gencode/
Gencode_human/release_25/GRCh38.p7.genome.fa.gz",
downloadDir = config["downloadDir"]
shell:
"wget -O - {params.url} | gunzip -c > {output}"
5.1.2 Aligning Reads Once a suitable genome index has been created, we can begin
aligning reads. STAR will take either a single, or a pair of (com-
pressed) FASTQ files for each sample, take each read or read-
pair (sometimes referred to as a fragment) in turn, and find the
optimal alignment along the reference genome sequence. Reads
(or fragments), along with location of the optimal alignment,
are returned and written to a new file, called a BAM file. This
is the binary version of a SAM file, a tab-delimited text file
standard for sequence alignment data.
We have now described all the rules required to go from raw RNA-
Seq reads contained in FASTQ files to summarized read counts for
a set of annotated genomic features. In this section, we pull
together all these rules into a single Snakefile that both powers, and
serves as a record of, our data processing pipeline, and introduce
the config.yaml file—a mechanism that allows us to customize this
pipeline for different analytical requirements.
expand("rsem/{sample}.genes.results", sample =
config["samples"]),
expand("rsem/{sample}.isoforms.results", sample
= config["samples"])
shell:
"conda env export > environment.yaml"
rule downloadGTF:
output:
expand("{downloadDir}gtf/gencode.v25.annota-
tion.gtf", downloadDir = config["downloadDir"])
params:
url = "ftp://ftp.sanger.ac.uk/pub/gencode/
Gencode_human/release_25/gencode.v25.annotation.gtf.gz",
downloadDir = config["downloadDir"]
shell:
"wget -O - {params.url} | gunzip -c > {output}"
rule downloadFASTA:
output:
expand("{downloadDir}fasta/GRCh38.p7.genome.
fa", downloadDir = config["downloadDir"])
params:
url = "ftp://ftp.sanger.ac.uk/pub/gencode/
Gencode_human/release_25/GRCh38.p7.genome.fa.gz",
downloadDir = config["downloadDir"]
shell:
"wget -O - {params.url} | gunzip -c > {output}"
rule generateRSEMIndex:
input:
gtf = expand("{downloadDir}gtf/gencode.v25.an-
notation.gtf", downloadDir = config["downloadDir"]),
fasta = expand("{downloadDir}fasta/GRCh38.
p7.genome.fa", downloadDir = config["downloadDir"])
output:
expand("{downloadDir}index/rsem/", downloadDir
= config["downloadDir"])
shell:
"rsem-prepare-reference \
-p 8 \
--star \
--gtf {input.gtf} \
{input.fasta} \
{output}GRCh38"
rule fastQC:
input:
lambda wildcards: config["samples"][wildcards.
sample]
output:
"fastQC/{sample}/"
benchmark:
184 Casey P. Shannon et al.
"benchmarks/fastQC/{sample}.benchmark.txt"
threads: 8
params:
outDir = "fastQC/{sample}/"
shell:
"zcat -c {input} | fastqc -t {threads} --out-
dir {params.outDir} stdin"
rule RSEM:
input:
fastqs = lambda wildcards: config["samples"]
[wildcards.sample],
ref = expand("{downloadDir}index/rsem/", down-
loadDir = config["downloadDir"])
output:
"rsem/{sample}.genes.results",
"rsem/{sample}.isoforms.results"
benchmark:
"benchmarks/rsem/{sample}.benchmark.txt"
threads: 8
params:
readsType = config["readsType"],
id = "{sample}"
shell:
"rsem-calculate-expression \
{params.readsType} \
--star \
--star-gzipped-read-file \
-p {threads} \
--no-bam-output \
{input.fastqs} \
{input.ref}GRCh38 \
{params.id} && \
mv -v *.genes.results *.isoforms.results
*.stat/ -t rsem/"
6.2 The config.yaml Some parameters, like directory locations or number of threads
File used, are specific to the system on which the pipeline will be run.
Others, like whether to run STAR or RSEM in paired-end mode,
may need to be modified for every new analysis. We recommend
using the Snakefile as a static description of the data processing
pipeline. Each new project can be initialized with a copy of the
static Snakefile, while the config.yaml file can be modified with
project-specific parameters. We divide the config.yaml file into two
parts: the first part contains the parameters passed to RSEM and
the STAR aligner (e.g., the path to the genome reference folder or
whether the reads are paired-end reads or not), while the second
part lists the samples to be processed, including sample IDs and
paths to their corresponding FASTQ file(s).
Cellular Deconvolution of RNA-Seq 185
6.3 Running A number of programs need to be installed and placed on the sys-
the Pipeline tem's PATH before we can run our RNA-Seq data processing
workflow. We describe below how to set up a suitable environment
6.3.1 Installing
on a Linux system, as a regular user without administrative privi-
the Necessary Software
leges, using Miniconda.
1. At the command line, install Miniconda.
# shell
# run it
bash Miniconda3-latest-Linux-x86_64.sh
# cleanup
rm Miniconda3-latest-Linux-x86_64.sh
6.3.2 A Brief Note The raw read data from an RNA-Seq experiment are delivered in the
on Storage Requirements form of FASTQ files. These are large files and usually compressed.
File size is a factor of read length and depth of coverage. FASTQ file
sizes can be computed as (Read Length × 2 + 50) × (Number of
Reads), where the 2× factor accounts for the read and accompanying
quality scores, and 50 bytes are added for the identifier. Paired end
runs carry an additional 2× multiplier. FASTQ files are typically com-
pressed, and general use compression programs, such as GZIP,
achieve a worst case compression ratio of 0.3. Thus, if our experiment
was a paired end run, with a target of 50M reads per sample, storage
requirements for the raw FASTQ files could be estimated as:
5. Move the FASTQ files from your experiment into the newly
created directory.
# shell
6.3.4 Folder Structure Here is what our home directory should look like after running the
pipeline.
|-[Miniconda3]
|-This is your working environment with all the
tools installed inside.
|-[genome]
|-[gtf]
|-gencode.v25.annotation.gtf
|-[fasta]
|-GRCh38.p7.genome.fa
|-[index]
|-[rsem]
|-This folder contains the RSEM genome
reference files.
|-[rnaseq_workflow]
|-requirements.yaml
|-Snakefile
|-config.yaml
|-[fastq]
|-sample1_1.fastq.gz
Cellular Deconvolution of RNA-Seq 187
|-sample1_2.fastq.gz
|-sample2_1.fastq.gz
|-sample2_2.fastq.gz
...
|-[fastQC]
|-[sample1]
|-stdin_fastqc.html
|-stdin_fastqc.zip
|-[samples2]
|-stdin_fastqc.html
|-stdin_fastqc.zip
...
|-[rsem]
|-sample1.genes.results
|-sample1.isoforms.results
|-[sample1.stat]
|-sample2.genes.results
|-sample2.isoforms.results
|-[sample2.stat]
...
|-[benchmarks]
|-[fastQC]
|-[rsem]
When invoked for the first time, the pipeline will call the down-
loadGTF, downloadFASTA, and generateRSEMIndex rules to
download the GTF annotation and reference genome sequence
FASTA files, and generate the necessary STAR genome index. If
the genome index already exists, however, the pipeline will skip
these rules and immediately.start to process the FASTQ files speci-
fied in the config.yaml file.
7 Analysis in R
7.1 Preparing the R The Conda environment we created and activated above installed
Environment R unto the system. The base R installation includes many useful
functions, but before we start, we need to install a few additional
packages to tackle the proposed analysis:
1. tidyr and dplyr: our preferred functions for general data
manipulation.
2. purrr: enables use of functional programming idioms with the
%>% operator popularized by dplyr.
3. readr: read (compressed) flat text file into the R environment
fast.
4. ggplot2: a powerful, high-level, graphics package for R.
5. matrixStats: fast implementation of row and column-wise
summary statistics.
6. limma: differential gene expression using flexible linear
models.
7. glmnet: fitting generalized linear models via penalized maxi-
mum likelihood.
8. CellCODE: estimate cellular composition of mixed tissue sam-
ples from their gene expression using a latent variable approach.
First, we start R.
# shell
R
# plotting
library(ggplot2)
library(cowplot)
# stats
library(limma)
library(glmnet)
7.2 Loading Read First, we read the summarized counts and associated metadata into
Counts into R R, and tidy up a bit.
# list summarized read count files
tsvs <- list.files(path = 'EGEOD60424', pattern =
'*.genes.results.gz',
recursive = T, full.names = T)
# pretty names
names(tsvs) <- gsub( '^EGEOD60424/(.+)\\.genes.+',
'\\1', tsvs)
The result is a matrix of read counts, with rows for genes and
columns for samples. limma's voom function, which we will need
to invoke before carrying out differential expression analysis,
requires that rows (genes) with zero or very low counts are
removed, so we do that next. We choose to remove any row where
fewer than three samples (corresponding to the size of our smallest
class of samples) have greater than 20 counts.
# filter on low counts
counts_above_20 <- apply(geo, 1, function(x) sum(x >
20))
geo <- geo[counts_above_20 >= 3, ]
7.3 Applying Scale It is usual to apply scale normalization to RNA-Seq read counts.
Normalization We use a simple total count normalization.
# counts per million mapped reads
geo_norm <- t(t(geo) * 1e6/colSums(geo))
asinh() %>%
prcomp(center = T, scale. = T)
# normalized - counts per million mappped reads
pca_norm <- geo_norm %>%
t() %>%
asinh() %>%
prcomp(center = T, scale. = T)
7.4 See Fig. 2 It is clear from these plots that finding genes able to distinguish
between B cells, CD4+, CD8+ T cells and NK cells may be signifi-
cantly more challenging than identifying genes distinctive of
monocytes or neutrophils.
192 Casey P. Shannon et al.
7.4.1 Using limma We first attempt this using limma. The code below is equivalent to
a one-way ANOVA for each gene except that the residual mean
squares have been moderated between genes.
# design matrix
design <- model.matrix(~ 0 + cell, data = meta)
7.4.2 Using Next, we try the approach recommended by Chikina and others.
the CellCODE Method This approach is similar to the above, but marker genes must also
pass a minimum fold-change cutoff criteria. An implementation is
available in the CellCODE package.
# let CellCODE pick marker genes
markers_cellcode <- geo_norm %>%
reshape2::melt() %>%
reshape2::acast(Var1 ~ Var2, mean) %>%
CellCODE::tagData(., cutoff = 2, max = 10)
Fig. 3 Clustered heatmap of the top 60 marker genes identified by this approach.
The expression of 60 marker genes identified by ANOVA is visualized using a
clustered heatmap
7.4.3 Using Penalized Conceptually, the identification of marker genes is related to feature
Regression with glmnet selection in the context of classification. glmnet is an R package that
fits generalized linear models via penalized maximum likelihood. It
performs feature selection and can be used for classification or
regression problems. We have previously used glmnet to derive use-
ful marker genes [10].
The code below uses glmnet to identify a minimal subset of
genes capable of classifying cell types from their gene expression
profiles.
# set seed to make cross-validation reproducible
set.seed(123)
# fit model
model <- cv.glmnet(t(geo_norm), colnames(geo_norm),
family = 'multinomial')
Fig. 4 Clusterd heatmap of the 60 marker genes identified by this approach. The
expression of 60 marker genes identified by CellCODE is visualized using a clus-
tered heatmap
7.5 Estimating Next, we will use these marker genes to infer the composition of
the Cellular whole blood samples from their RNA-Seq profiles (GSE53655)
Composition of Blood using the CellCODE approach.
First, we read in some whole blood RNA-Seq profiles:
# blood transcriptome profiles
# list summarized read count files
tsvs <- list.files(path = 'EGEOD53655', pattern =
'*.genes.results.gz',
Cellular Deconvolution of RNA-Seq 195
Fig. 5 Clustered heatmap of the 43 marker genes identified by this approach. The
expression of 43 marker genes identified by elastic net penalized regression is
visualized using a clustered heatmap
Fig. 6 Overlap between the various marker genesets is visualized using an UpSet plot. The marker genes
identified by the various approaches are largely distinct. Empty sets are omitted
196 Casey P. Shannon et al.
recursive = T, full.names = T)
# pretty names
names(tsvs) <- gsub( '^EGEOD53655/(.+)\\.genes.+',
'\\1', tsvs)
# scale normalization
exp <- t(t(exp) * 1e6/colSums(exp))
# read in metadata
y <- readr::read_tsv('E-GEOD-53655.meta.sdrf.txt')
7.5.1 Predicting Cell Next, we use the getAllSPVs function from the CellCODE pack-
Proportions Using age, to obtain surrogate proportion variables (SPVs) directly from
CellCODE the whole blood RNA-Seq profiles (Fig. 7).
# CellCODE markers
cc <- CellCODE::getAllSPVs(data = exp,
grp = y$sex,
dataTag = markers_cellcode)
pred_cellcode <- cc %>%
as.data.frame() %>%
mutate(run = colnames(exp)) %>%
gather(markers, spv, -run) %>%
select(markers, run, spv)
Cellular Deconvolution of RNA-Seq 197
Fig. 7 CellCODE-derived SPVs vs. leukocyte differential counts. Scatterplots comparing the CellCODE-derived
surrogate proportion variables (SPVs) to cell percentages obtained from a complete blood count, including
leukocyte differential. The coefficient of correlation (Pearson’s r) is shown for each cell-type (top-left of each
plot), as well as a linear best fit line (blue) with 95% confidence intervals (grey)
Fig. 8 glmnet-derived SPVs vs. leukocyte differential counts. Scatterplots comparing the glmnet-derived surrogate
proportion variables (SPVs) to cell percentages obtained from a complete blood count, including leukocyte differen-
tial. The coefficient of correlation (Pearson’s r) is shown for each cell-type (top-left of each plot), as well as a linear
best fit line (blue) with 95% confidence intervals (grey)
7.5.2 Predicting Cell We repeat the procedure, this time using the features retained by
Proportions Using elastic net as marker genes to derive the SPVs (Fig. 8).
glmnet-selected Features
# glmnet markers
pred_enet <- model %>%
predict(type = 'nonzero', s = 'lambda.1se') %>%
map(unlist, use.names = F) %>%
map(~ {
# remove low variance
x <- t(exp[.x, ])
x <- x[ , matrixStats::colVars(x) > 0]
# svd
pca <- prcomp(x, center = T, scale. = T)
# extract 1st PC
pca$x[ , 1]
}) %>%
map(as.list) %>%
198 Casey P. Shannon et al.
map(data.frame) %>%
bind_rows(.id = 'markers') %>%
gather(run, spv, -markers)
7.6 Adjusting Finally, we use the estimates we just derived to compare the results
for Cellular of a simple differential gene expression analysis, comparing males
Composition in Linear to females, with or without adjusting for the potential confound-
Models ing effect of cellular composition.
Again, we use limma to do this. We first fit a linear model that
only includes the response variable (sex) to the data.
# join to predicted cell proportions
y <- pred_enet %>%
spread(markers, spv) %>%
right_join(y, by = 'run')
# unadjusted
d_unadjusted <- model.matrix(~ sex, data = y)
unadjusted <- exp %>%
var_filter() %>%
voom(d_unadjusted) %>%
lmFit(d_unadjusted) %>%
eBayes(robust = T, trend = T)
Fig. 9 Distribution of q-values, with and without adjusting for cellular compo-
sition. The distribution of q-values (false discovery rate) obtained when iden-
tifying differentially abundant genes between males and females is visualized
for a linear model including (bottom) or not (top) cellular proportion estimates
as covariates
8 Notes
References
1. Andrews S. FastQC – a quality control tool for or without a reference genome. BMC
high throughput sequence data. At <http:// Bioinformatics 12:323
www.bioinformatics.babraham.ac.uk/proj- 4. Conda – package, dependency and environ-
ects/fastqc/> ment management for any language. At
2. Dobin A, Davis CA, Schlesinger F, Drenkow J, <http://conda.pydata.org/miniconda.html>
Zaleski C, Jha S, Batut P, Chaisson M, Gingeras 5. Köster J, Rahmann S (2012) Snakemake—a
TR (2013) STAR: Ultrafast universal RNA-seq scalable bioinformatics workflow engine.
aligner. Bioinformatics 29:15–21 Bioinformatics 28:2520–2522
3. Li B, Dewey CN (2011) RSEM: Accurate tran- 6. Chikina M, Zaslavsky E, Sealfon SC (2015)
script quantification from RNA-Seq data with CellCODE: a robust latent variable approach
Cellular Deconvolution of RNA-Seq 201
to differential expression analysis for heteroge- sequencing and microarray studies. Nucleic
neous cell populations. Bioinformatics Acids Res 43:e47–e47
31:1584–1591 9. Zou H, Hastie T (2005) Regularization and
7. Shin H, Shannon CP, Fishbane N, Ruan J, variable selection via the elastic net. J R Statist
Zhou M, Balshaw R, Wilson-McManus JE, Ng Soc B 67:301–320
RT, McManus BM, Tebbutt SJ (2014)
10. Shannon CP, Balshaw R, Ng RT, Wilson-
Variation in RNA-Seq transcriptome profiles of McManus JE, Keown P, McMaster R,
peripheral whole blood from healthy individu- McManus BM, Landsberg D, Isbel NM, Knoll
als with and without globin depletion. PLoS G, Tebbutt SJ (2014) Two-stage, in silico
ONE 9:e91041 deconvolution of the lymphocyte compart-
8. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, ment of the peripheral whole blood transcrip-
Shi W, Smyth GK (2015) Limma powers dif- tome in the context of acute kidney allograft
ferential expression analyses for RNA- rejection. PLoS ONE 9:e95224
Chapter 13
Abstract
Genome-wide functional genomic screens utilizing the clustered regularly interspaced short palindromic
repeats (CRISPR)-Cas9 system have proven to be a powerful tool for systematic genomic perturbation in
mammalian cells and provide an alternative to previous screens utilizing RNA interference technology. The
wide availability of these libraries through public plasmid repositories as well as the decreasing cost and speed
in quantifying these screens using high-throughput next-generation sequencing (NGS) allows for the adoption
of the technology in a variety of laboratories interested in diverse biologic questions. Here, we describe the
protocol to generate next-generation sequencing libraries from genome-wide CRISPR genomic screens.
1 Introduction
Steven R. Head et al. (eds.), Next Generation Sequencing: Methods and Protocols, Methods in Molecular Biology, vol. 1712,
https://doi.org/10.1007/978-1-4939-7514-3_13, © Springer Science+Business Media, LLC 2018
203
204 Edwin H. Yau and Tariq M. Rana
2 Materials
2.1 Genome-Wide 1.
GeCKOv2 library plasmid (Addgene pooled library
CRISPR-Cas9 #1000000048 or #1000000049).
Lentiviral Pooled 2.
Endura electrocompetent cells with Recovery Medium
Library (Lucigen Middleton, WI, USA).
2.1.1 Plasmid Library 3. LB agar bacterial plates.
Amplification 4. Ampicillin 100 mg/mL: made in sterile water and sterilized by
filtration (0.2 μm filter).
5. Incubator 32 °C.
6. Electroporator (i.e., Gene Pulsar II; Bio-Rad, Hercules, CA, USA).
7. Cell scraper (Corning, Corning, NY, USA).
8. DNA Maxi prep kit.
Next-Generation Sequencing of Genome-Wide CRISPR Screens 205
2.2.1 Option A 1. gDNA Lysis Buffer: 50 mM Tris, 50 mM EDTA, 1% SDS,
pH 8. To 75 mL of water add 5 mL of 1 M Tris pH 8.0, 10 mL
of 0.5 M EDTA pH 8.0, and 5 mL of 10% SDS.
2. RNase A (100 mg): add 10 mL of gDNA Lysis Buffer to
100 mg of RNase A for 10 mg/mL stock solution. Store
at 4 °C.
3. Proteinase K (Qiagen, Hilden, Germany).
4. Ammonium Acetate 7.5 M (Sigma, St. Louis, MO, USA):
Dissolve 57.81 g of ammonium acetate in water to a final vol-
ume of 100 mL. Sterilize by filtration (0.2 μm filter). Store at
4 °C.
5. Isopropanol.
6. Ethanol.
7. Incubator 55 °C.
8. Table top centrifuge.
206 Edwin H. Yau and Tariq M. Rana
2.3 PCR NGS 1. NEBNext Q5 Hot Start HiFI PCR Master Mix (New England
Library Prep Biolabs, Ipswich, MA, USA) or Herculase II Fusion DNA poly-
merase (Agilent Technologies, Santa Clara, CA, USA) (see Note 1).
2. PCR grade water.
3. Thermocycler.
4. PCR1 F primer: AATGGACTATCATATGCTTACCGTAA
CTTGAAAGTATTTCG.
5.
PCR1 R primer: TCTACTATTCTTTCCCCTGCACTG
TTGTGGGCGATGTGCGCTCTG.
6. QIAquick PCR purification kit (Qiagen) or Wizard SV PCR
cleanup kit (Promega).
7. PCR2 F :
Primer
no. lllumina P5 and lllumina Seq Stagger Inline barcode Priming site
F01 AATGATACGGCGACCACCGAG t AAGTAGAG tcttgtggaaaggacgaaacaccg
ATCTACACTCTTTCCCTACA
CGACGCTCTTCCGATCT
F02 AATGATACGGCGACCACCGAG at ACACGATC tcttgtggaaaggacgaaacaccg
ATCTACACTCTTTCCCTACA
CGACGCTCTTCCGATCT
F03 AATGATACGGCGACCACCGAG gat CGCGCGGT tcttgtggaaaggacgaaacaccg
ATCTACACTCTTTCCCTACA
CGACGCTCTTCCGATCT
F04 AATGATACGGCGACCACCGAG cgat CATGATCG tcttgtggaaaggacgaaacaccg
ATCTACACTCTTTCCCTACA
CGACGCTCTTCCGATCT
F05 AATGATACGGCGACCACCGAG tcgat CGTTACCA tcttgtggaaaggacgaaacaccg
ATCTACACTCTTTCCCTACAC
GACGCTCTTCCGATCT
F06 AATGATACGGCGACCACCGAGA atcgat TCCTTGGT tcttgtggaaaggacgaaacaccg
TCTACACTCTTTCCCTACACG
ACGCTCTTCCGATCT
F07 AATGATACGGCGACCACCGAGA gatcgat AACGCATT tcttgtggaaaggacgaaacaccg
TCTACACTCTTTCCCTACACG
ACGCTCTTCCGATCT
F08 AATGATACGGCGACCACCGAGA cgatcgat ACAGGTAT tcttgtggaaaggacgaaacaccg
TCTACACTCTTTCCCTACACGA
CGCTCTTCCGATCT
(continued)
Next-Generation Sequencing of Genome-Wide CRISPR Screens 207
Primer
no. lllumina P5 and lllumina Seq Stagger Inline barcode Priming site
F09 AATGATACGGCGACCACCGAGAT acgatcgat AGGTAAGG tcttgtggaaaggacgaaacaccg
CTACACTCTTTCCCTACACGAC
GCTCTTCCGATCT
F10 AATGATACGGCGACCACCGAGAT t AACAATGG tcttgtggaaaggacgaaacaccg
CTACACTCTTTCCCTACACGAC
GCTCTTCCGATCT
8. PCR2 R primers:
Primer
no. lllumina P7 Index barcode lllumina seq R Stagger Priming site
R01 CAAGCAGAA AAGTAGAG GTGACTGGAGTT t TCTACTATTCTTT
GACGGCA CAGACGTGTGC CCCCTGCACTGT
TACGAGAT TCTTCCGATCT
R02 CAAGCAGAA ACACGATC GTGACTGGAGTT at TCTACTATTCTTT
GACGGCAT CAGACGTGTGC CCCCTGCACTGT
ACGAGAT TCTTCCGATCT
R03 CAAGCAGAA CGCGCGGT GTGACTGGAGTT gat TCTACTATTCTTT
GACGGCAT CAGACGTGTG CCCCTGCACTGT
ACGAGAT CTCTTCCGATCT
R04 CAAGCAGAA CATGATCG GTGACTGGAGTT cgat TCTACTATTCTTT
GACGGCAT CAGACGTGTGC CCCCTGCACTGT
ACGAGAT TCTTCCGATCT
R05 CAAGCAGAA CGTTACCA GTGACTGGAGTT tcgat TCTACTATTCTTT
GACGGCAT CAGACGTGTGC CCCCTGCACTGT
ACGAGAT TCTTCCGATCT
R06 CAAGCAGAA TCCTTGGT GTGACTGGAGTT atcgat TCTACTATTCTTT
GACGGCAT CAGACGTGTGC CCCCTGCACTGT
ACGAGAT TCTTCCGATCT
R07 CAAGCAGAA AACGCATT GTGACTGGAGTT gatcgat TCTACTATTCTTT
GACGGCAT CAGACGTGTGC CCCCTGCACTGT
ACGAGAT TCTTCCGATCT
R08 CAAGCAGAA ACAGGTAT GTGACTGGAGTT cgatcgat TCTACTATTCTTT
GACGGCAT CAGACGTGTGC CCCCTGCACTGT
ACGAGAT TCTTCCGATCT
R09 CAAGCAGAA AGGTAAGG GTGACTGGAGTTC acgatcgat TCTACTATTCTTT
GACGGCAT AGACGTGTGCTC CCCCTGCACTGT
ACGAGAT TTCCGATCT
RIO CAAGCAGAAG AACAATGG GTGACTGGAGT t TCTACTATT
ACGGCATAC TCAGACGTGTG CTTTCCCC
GAGAT CTCTTCCGATCT TGCACTGT
208 Edwin H. Yau and Tariq M. Rana
3 Methods
3.1 Screen Design Forward (or functional) genomic screens can fundamentally be
carried out in two ways. One is dropout (or negative selection)
screening in which a gene knockout results in a selective disadvan-
tage in that cell such as decreased proliferation in cancer cells. The
other is enrichment (or positive selection) screening in which gene
knockout results in selective advantage for cells such as drug or
toxin resistance.
After the appropriate phenotype has been selected and a
dropout or resistance screen has been designed, two other critical
factors to take into account in the design of genome-wide screens
is (1) the depth of coverage of each sgRNA guide and (2) the num-
ber of sgRNA guides per gene. Experience from previous genome-
wide shRNA screens suggested that dropout screens require much
higher coverage than resistance screens with the recommendation
for at least 500–1000× coverage of each individual element for
dropout screens to obtain enough signal over noise [15]. Because
of the different mechanisms of CRISPR-Cas9 resulting in gene
knockouts rather than knockdown, it seems that signal-to-noise
ratio is higher than in shRNA screens and the initial CRISPR-Cas9
genomic dropout screens have been carried out at a minimum of
200–300× coverage [9, 11]. For shRNA screens, because of the
significant off-target effects of shRNAs, one typically needed to
include many shRNAs per gene (typically 5–25 shRNAs per gene)
to overcome the noise generated by off-target effects. CRISPR
screening appears to have less off-target effects, and the reason to
include multiple sgRNAs per gene for CRISPR screening appears
to be the variability in the efficiency of each individual sgRNA
rather than the off-target noise such that increasing the amount of
sgRNAs appears to increase the sensitivity of the screen rather than
the specificity [10, 16]. One of the largest hurdles in performing
genome-wide screening is providing for adequate coverage of the
library by using appropriate amounts of starting cells and maintain-
ing that coverage throughout the experiment. For example, if
using the entire human GeCKOv2 library containing 122,411
sgRNA guides, one would need to transduce 2 × 108 cells at 30%
efficiency and maintain 6 × 107 cells at each passage and harvesting
time point for each biological replicate. This amount of cells could
be prohibitive in certain situations. An alternative emerging
Next-Generation Sequencing of Genome-Wide CRISPR Screens 209
3.2.2 Lentiviral Library 1. Culture 293FT cells in DMEM with 10% FBS (D10 media)
Packaging in 293FT HEK without antibiotics, prepare a 10 cm plate about 90% confluent
Cells day of transfection.
2. Wash the cells with 3 mL room temperature PBS.
3. Aspirate PBS, add 3 mL Trypsin/EDTA and incubate at 37 °C
for 2 min.
4. Add 3 mL D10 to inactivate Trypsin/EDTA and collect the
cells by centrifugation (1000 rpm for 10 min).
5. Resuspend the cells in 5 mL pre-warmed DMEM media
without serum.
6. Plate the cells in a 10 cm tissue culture plate and place in an
incubator.
7. Prepare plasmid/PLUS mixture by adding 900 μL room tem-
perature Opti-MEM in a sterile 1.5 mL microcentrifuge tube.
Add 12 μg of amplified GeCKO library plasmid, 9 μg psPAX2
plasmid, 6 μg pMD2.G plasmid, and 48 μL of PLUS reagent.
8. Prepare lipofectamine mixture by adding 60 μL of lipo-
fectamine reagent to 900 μL of room temperature Opti-MEM
in a sterile 1.5 mL microcentrifuge tube.
9. Vortex mixtures briefly and incubate at room temperature for
5 min.
10. Combine plasmid/PLUS mixture with lipofectamine mixture.
Vortex briefly and incubate at room temperature for 15 min.
11. Add plasmid/lipofectamine mixture dropwise to cell suspen-
sion from step 6 and place in an incubator for 4–6 h.
12. Aspirate media and replace with 10 mL fresh D10 media.
13. Harvest the supernatant after 60 h into a 50 mL conical tube.
14. Remove cellular debris by centrifugation at 1600 × g at 4 °C for
10 min.
15. Filter the supernatant through a 0.45 μm syringe filter.
210 Edwin H. Yau and Tariq M. Rana
3.2.3 Functional Titration 1. Harvest target cells and count cells and resuspend cells at a
of Pooled Lentiviral CRISPR concentration of 2–3 × 106 cells/mL in appropriate media for
Library the cell type.
2. Add 1 mL cells into each well of a 12-well tissue culture plate
and add polybrene to a final concentration of 8 μg/mL per well.
3. Add different amounts of concentrated virus (usually in the
range of 0.5–20 μL for virus generated from GeCKOv2 1 plas-
mid system, titers are generally higher if using the 2 plas-
mid system) to each well.
4. Spinfect by centrifugation at 700 × g for 1–2 h at 32 °C.
5. Remove media by aspiration and replace with fresh media
without polybrene.
6. Incubate for 6 h or overnight then collect cells by washing
gently with 500 μL PBS per well.
7. Aspirate PBS and add 500 μL Trypsin/EDTA per well and
incubate until cells detach. Inactivate by adding 1 mL appro-
priate media for cell type with serum per well.
8. Centrifuge the cells at 1000 rpm for 5 min and aspirate media.
9. Resuspend in cells collected from each well in 10 mL of appro-
priate media and add 1 mL of cell suspension to duplicate wells
in a 6-well tissue culture plate. Add another 3 mL of appropri-
ate media to each well.
10. In each duplicate, add puromycin to one replicate well at a
concentration that results in no surviving cells after 3 days.
11. After 3 days of puromycin selection count viable cells using
CellTiterGlo by aspirating media and adding CellTiterGlo
(diluted 1:1 in PBS) to each well.
12. Cover the plates with foil and shake for 2 min followed by
incubation for 10 min at room temperature.
13. Read luminescence on plate reader and compare with control
untreated plate.
14. Choose the concentration of virus resulting in 20–40% of cell
survival compared to untreated replicate well (this corresponds
to a MOI of around 0.3–0.4).
15. Scale up to necessary amount of cells to conduct primary
screen by increasing the number of wells spinfected with
2–3 × 106 cells/well.
Next-Generation Sequencing of Genome-Wide CRISPR Screens 211
3.3 Harvesting 1. For each time point and replicate in the screen, collect and
Genomic DNA freeze the amount of cells required to maintain predetermined
coverage (6 × 107 cells for 500× coverage of full GeCKOv2
human library) at −20 °C in 15 mL conical tubes with up to
200 mg tissue or 3 × 107 cells per tube.
2. Thaw cells until the pellet can be dislodged easily by flicking
tube, proceed to genomic DNA extraction using salt precipita-
tion (option A) or commercial silica-membrane kits (option B).
3.3.1 Option A 1. The salt precipitation method used in the screen conducted by
the Zhang lab [17] provides high yield genomic DNA with
consistent results in sequencing. This option is more cost-
effective and simpler when extracting DNA from large amounts
of cells or tissues compared to using the commercial silica
membrane kits.
2. For 100–200 mg of tissue or 3 × 107–5 × 107 cells in a 15 mL
conical tube, add 6 mL gDNA lysis buffer and 30 μL of pro-
teinase K to tissue/cells.
3. Incubate at 55 °C overnight.
4. After complete lysing of cells/tissue, add 30 μL of 10 mg/mL
RNase A.
5. Incubate at 37 °C for 30 min, then cool on ice.
6. Add 2 mL of cold 7.5 M ammonium acetate and vortex for 20 s.
7. Centrifuge >4000 × g for 10 min.
8. Transfer the supernatant to a new tube and add 6 mL 100%
isopropanol. Mix by inverting tube 50 times.
9. Centrifuge >4000 × g for 10 min. Discard the supernatant, add
6 mL of 70% ethanol, and mix by inverting tube ten times.
10. Centrifuge >4000 × g for 5 min, discard the supernatant by
decanting and aspirating with P200 tip.
11. Air dry 10–30 min until the pellet becomes slightly translucent.
12. Resuspend in 500 μL of TE.
13. Incubate at 65 °C for 1 h, then at room temperature overnight.
14. Measure gDNA concentration on nanodrop 2000 next day.
3.3.2 Option B 1. Extract gDNA from frozen cell pellets using QIAamp Blood
midi kit according to the manufacturer’s protocol.
2. Elute gDNA in EB or TE buffer and incubate at room tem-
perature overnight.
3. Measure gDNA concentration on nanodrop 2000 next day.
3.4 PCR NGS Library 1. NGS libraries are generated by a two-step PCR where the first
Prep PCR amplifies the sgRNA region utilizing primers recogniz-
ing constant lentiviral integration sequence and a second
212 Edwin H. Yau and Tariq M. Rana
Fig. 1 Schematic representation of two step PCR for the generation of NGS libraries (Subheading 3.4)
4 Notes
Acknowledgments
References
1. Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna 4. Doudna JA, Charpentier E (2014) Genome
JA, Charpentier E (2012) A programmable editing. The new frontier of genome engineer-
dual-RNA-guided DNA endonuclease in adaptive ing with CRISPR-Cas9. Science 346:1258096
bacterial immunity. Science 337:816–821 5. Hsu PD, Lander ES, Zhang F (2014)
2. Mali P, Yang L, Esvelt KM, Aach J, Guell M, Development and applications of CRISPR-
DiCarlo JE, Norville JE, Church GM (2013) Cas9 for genome engineering. Cell
RNA-guided human genome engineering via 157:1262–1278
Cas9. Science 339:823–826 6. Sander JD, Joung JK (2014) CRISPR-Cas
3. Cong L, Ran FA, Cox D, Lin S, Barretto R, systems for editing, regulating and targeting
Habib N, Hsu PD, Wu X, Jiang W, Marraffini genomes. Nat Biotechnol 4:347–355
LA, Zhang F (2013) Multiplex genome-engi- 7. Sternberg SH, Doudna JA (2015) Expanding
neering using CRISPR/Cas systems. Science the biologist’s toolkit with CRISPR-Cas0. Mol
339:819–823 Cell 58:568–574
216 Edwin H. Yau and Tariq M. Rana
8. Wang T, Birsoy K, Hughes NW, Krupczak Panning B, Pioegh HL, Bassik MC, Qi LS,
KM, Post Y, Wei JJ, Lander ES, Sabatini DM Kampmann M, Weissman JS (2014) Genome-
(2015) Identification and characterization of scale CRISPR-mediated control of gene repres-
essential genes in the human genome. Science sion and activation. Cell 159:647–661
350:1096–1101 14. Virreira Winter S, Zychlinsky A, Bardoel BW
9. Hart T, Chandrashekhar M, Aregger M, Steinhart (2016) Genome-wide CRISPR screen reveals
Z, Brown KR, MacLeod G, Mis M, Zimmermann novel host factors required for Staphylococcus
M, Fradet-Turcotte A, Sun S, Mero P, Dirks P, aureus α-hemolysin-mediated toxicity. Sci Rep
Sidhu S, Roth FP, Rissland OS, Durocher D, 6:24242
Angers S, Moffat J (2015) High-resolution 15. Kampmann M, Bassik MC, Weissman JS
CRISPR screen reveal fitness genes and genotype- (2014) Functional genomics platform for
specific cancer liabilities. Cell 163:1515–1526 pooled screening and mammalian genetic
10. Morgens DW, Deans RM, Li A, Bassik MC interaction maps. Nat Protoc 9:1825–1847
(2016) Systematic comparison of CRISPR/ 16. Doench JG, Fusi N, Sullender M, Hegde M,
Cas9 and RNAi screens for essential genes. Nat Vaimberg EW, Donovan KF, Smith I, Tothova
Biotechnol 34:634–636 Z, Wilen C, Orchard R, Virgin HW, Listgarten
11. Shalem O, Sanjana NE, Hartenian E, Shi X, J, Root DE (2016) Optimized sgRNA design
Scott DA, Mikkelsen TS, Heckl D, Ebert to maximize activity and minimize off-target
BL, Root DE, Doench JG, Zhang F (2014) effects of CRISPR-Cas9. Nat Biotechnol
Genome-scale CRISPR-Cas9 knockout screen- 34:184–191
ing in human cells. Science 343:84–87 17. Chen S, Sanjana NE, Zheng K, Shalem O,
12. Wang T, Wei JJ, Sabatini DM, Lander ES Lee K, Shi X, Scott DA, Song J, Pan JQ,
(2014) Genetic screens in human cells using Weissleder R, Lee H, Zhang F, Sharp PA
the CRISPR-Cas9 system. Science 343:80–84 (2015) Genome-wide CRISPR screen in a
13. Gilbert LA, Horlbeck MA, Adamson B, Villalta mouse model of tumor growth and metastasis.
JE, Chen Y, Whitehead EH, Gulmaraes C, Cell 160:1246–1260
Chapter 14
Abstract
The paucity of pathogenic T cells in circulating blood limits the information delivered by bulk analysis.
Toward diagnosis and monitoring of treatments of autoimmune diseases, we have devised single-cell analysis
approaches capable of identifying and characterizing rare circulating CD4 T cells.
Key words CD4 T cells, Single-cell analysis, TCR sequences, Autoimmunity, Peripheral blood,
Microfluidics
1 Introduction
Steven R. Head et al. (eds.), Next Generation Sequencing: Methods and Protocols, Methods in Molecular Biology, vol. 1712,
https://doi.org/10.1007/978-1-4939-7514-3_14, © Springer Science+Business Media, LLC 2018
217
218 Marie Holt et al.
have revolutionized the field over the past 5 years. We have moved
from days when the efficiency of paired sequencing of α and β TCR
chains was exceptional [3], to a situation where this technique is
routine and whole exome sequencing from single cells is the new
frontier [4]. Like for FACS analysis, a small number of platforms
are now available for the genomic interrogation of single cells.
Droplet-based microfluidic is one system [5], whereas purely
microfluidic system is the other [6]. We used the latter, mainly
because it entered the field first and we committed to this approach
early on. In any case, we have used this approach to study antigen-
specific CD4+ T cells in the context of vaccines and autoimmunity.
For efficiency, and in order to collect the most information from a
single cell, we have developed a workflow that allows the profiling
of 96 genes by quantitative PCR (qPCR), as well as the sequencing
of paired α-β TCR chains from the same original reverse transcrip-
tion of pMHC-FACS sorted T cells. The technique has proven
robustness, high reproducibility, and can be used for routine analy-
sis of antigen-specific T cells [7]. For our purpose, we have been
using a Fluidigm Biomark HD system and its two main compo-
nents, the Dynamic Array for qPCR, and the Access Array for the
making of libraries for next-generation sequencing.
2 Materials
2.1 Cell Sorting 1. Allegra X-14R centrifuge (Beckman Coulter, Inc., Brea, CA,
Components USA).
2. PBS: Dulbecco’s Phosphate-Buffered Saline, 1× (Corning,
Corning, NY, USA).
3. Cell strainer, 70 μm (ThermoFisher Scientific, Waltham, MA,
USA).
4. FACS buffer: 2% FCS, 2 mM EDTA in PBS, sterile filtered.
5. RBC Lysis Buffer: 0.165 M NH4Cl in deionized water, sterile
filtered.
6. Fc block: 20 μg/mL in FACS buffer.
7. Tetramers: In a molar ratio of 1:5, biotinylated MHC mole-
cules expressed with peptides of interest are mixed in sterile
PBS with PE-labeled streptavidin (Life Technologies, Carlsbad,
CA, USA) (see Note 1). The reaction is incubated overnight,
in darkness, at room temperature.
8. Antibodies: For the samples the following antibodies are mixed
together 1:200 in FACS buffer: FITC anti-mouse CD3ε, APC/
Cy7 anti-mouse CD4, APC anti-mouse/human CD11b, APC
anti-mouse/human CD45R/B220, APC anti-mouse CD49b,
and APC anti-mouse CD8a (all from B ioLegend, San Diego,
CA, USA). For the compensations the following antibodies are
Gene Profiling and T Cell Receptor Sequencing from Antigen-Specific CD4 T Cells 219
(continued)
221
Table 1
222
(continued)
(continued)
3 Methods
3.1.1 Single-Cell Sorting 1. Harvest organs of interest from a mouse and produce single-
cell suspensions using a 70 μm cell strainer and the piston of a
3 mL sterile syringe. Harvest and wash in sterile PBS. Spin cells
down (1200 rpm (350 × g), 5 min, 20 °C) (see Note 7).
2. Wash samples once in FACS buffer and lyse red blood cells
(RBC) by incubating for 5 min in 5 mL RBC lysis buffer, fol-
lowed by another wash in FACS buffer (see Note 8).
3. Incubate cells in Fc-block for 15 min at room temperature.
Meanwhile, distribute the cells into a conical bottom 96-well
plate. Extra wells are included for compensation of each color
and a blank, in addition to a negative control (see Note 9).
4. Wash once in FACS buffer and stain samples with PE-labeled
tetramers for 1 h at room temperature in darkness. The com-
pensations and blank samples are incubated in FACS buffer
only.
5. Wash the cells three times. Samples are stained with the follow-
ing mix of antibodies: FITC anti-mouse CD3ε, APC/Cy7
anti-mouse CD4, APC anti-mouse/human CD11b, APC anti-
mouse/human CD45R/B220, APC anti-mouse CD49b, and
APC anti-mouse CD8a. The compensations are stained with
FITC anti-mouse CD3ε, APC/Cy7 anti-mouse CD4, APC
anti-mouse/human, CD45R/B220, and PE anti-mouse CD4,
respectively. The blank is resuspended in FACS buffer only.
Incubate the cells for 30 min at 4 °C, in darkness.
6. Wash the cells twice in FACS buffer, and transfer each well to
500 μL FACS buffer in 5 mL polystyrene tubes.
Gene Profiling and T Cell Receptor Sequencing from Antigen-Specific CD4 T Cells 227
3.1.2 Reverse The following protocols are based on Fluidigm’s user guide “Real-
Transcription Time PCR Analysis”. All the reactions are performed in a Thermal
and Pre-amplification Cycler and the sample plate briefly spun before and after each reac-
tion. Due to the small volumes and repetitive pipetting, only elec-
tronic pipettes are used.
1. Thaw the sample plate on ice and denature the RNA by incu-
bating for 90 s at 65 °C, followed by immediate chilling on ice
for 5 min.
2. Add 1 μL of RT Mix Solution 2 to each well and do reverse
transcription under the following conditions: 25 °C, 5 min;
50 °C, 30 min; 55 °C, 25 min; 60 °C, 5 min; 70 °C, 10 min;
final hold at 4 °C.
3. Pre-amplify the target cDNA by adding 9 μL STA Reaction
Mix to each well and run the following reaction: 95 °C, 10 min;
96 °C, 5 s and 60 °C, 4 min (20 cycles); final hold at 4 °C (see
Note 12).
4. Remove unincorporated primers by adding 6 μL exonuclease
to each well and vortex the plate for 20 s before spinning and
placing it in the Thermal Cycler: 37 °C, 30 min; 80 °C, 15 min;
final hold on 4 °C.
5. Dilute the final products by adding 54 μL 1× DNA suspension
buffer (fivefold dilution of the STA reaction), and store the
STA pre-amplified sample plate at −20 °C.
3.3 TCR Sequencing 1. Prime a 48.48 Access Array IFC by injecting 300 μL control
line fluid into each of the two accumulators, and add 500 μL
3.3.1 Access Array
1× Access Array Harvest solution to wells H1–H3, and 500 μL
1× Access Array Hydration Reagent v2 to well H4. Place the
chip in an IFC Controller AX (pre-PCR) and run the script
“Prime (151×)” (see Note 13).
2. Make a sample plate in half of a 96-well plate by distributing
7.2 μL freshly made Sample Pre-Mix Solution per well. With a
multi-channeled pipette, transfer 2 μL sample from each well
of one half of the STA pre-amplified sample plate, and 0.8 μL
from the Barcoded Internal TCR 3′ Primer Plate (see Notes 14
and 17).
3. Place adhesive seals and vortex the sample plate and the
Internal TCR 5′ Primer Plate (20×) for 20 s and centrifuge for
30 s.
4. Carefully load 4 μL from each well of the sample plate and the
5′ Primer Plate (20×) into the respective inlets of the primed
IFC. Place the chip in the same IFC Controller AX as used for
priming and run the script “Load Mix (151×)” (see Note 15).
5. Transfer the IFC to the BioMark HD, and using the BioMark
Data Collection software, start the protocol “AA No I 48x48
Standard v1” to run a thermal mix and PCR under the follow-
ing conditions: 50 °C, 2 min; 70 °C, 20 min; 95 °C, 10 min;
95 °C, 15 s, 60 °C, 30 s, and 72 °C, 60 s (10 cycles); 95 °C,
15 s, 80 °C, 30 s, 60 °C, 30 s, and 72 °C, 60 s (2 cycles);
95 °C, 15 s, 60 °C, 30 s, and 72 °C, 60 s (8 cycles); 95 °C,
15 s, 80 °C, 30 s, 60 °C, 30 s, and 72 °C, 60 s (2 cycles);
95 °C, 15 s, 60 °C, 30 s, and 72 °C, 60 s (8 cycles); 95 °C,
15 s, 80 °C, 30 s, 60 °C, 30 s, and 72 °C, 60 s (5 cycles);
72 °C, 3 min; final hold at 10 °C.
Gene Profiling and T Cell Receptor Sequencing from Antigen-Specific CD4 T Cells 229
(continued)
235
Table 4
236
(continued)
3.4 Data Analysis 1. TCR sequencing analysis is always started by the identification
of bar codes.
2. TCR chain identity is determined using the IMGT database
(http://www.imgt.org).
4 Notes
References
1. Flatz L, Roychoudhuri R, Honda M, Filali- RNAseq reveals cell adhesion molecule profiles
Mouhim A, Goulet JP, Kettaf N, Lin M, Roederer in electrophysiologically defined neurons. Proc
M, Haddad EK, Sekaly RP, Nabel GJ (2011) Natl Acad Sci U S A 113:E5222–E5231.
Single-cell gene-expression profiling reveals qual- PMCID: PMC5024636
itatively distinct CD8 T cells elicited by different 5. Brouzes E (2012) Droplet microfluidics for sin-
gene-based vaccines. Proc Natl Acad Sci U S A gle-cell analysis. Methods Mol Biol
108:5724–5729. PMCID: 3078363 853:105–139
2. Narsinh KH, Sun N, Sanchez-Freire V, Lee AS, 6. Yin H, Marshall D (2012) Microfluidics for sin-
Almeida P, Hu S, Jan T, Wilson KD, Leong D, gle cell analysis. Curr Opin Biotechnol
Rosenberg J, Yao M, Robbins RC, Wu JC 23:110–119
(2011) Single cell transcriptional profiling reveals 7. Newell EW, Davis MM (2014) Beyond model
heterogeneity of human induced pluripotent antigens: high-dimensional methods for the
stem cells. J Clin Invest 121:1217–1221. analysis of antigen-specific T cells. Nat
PMCID: 3049389 Biotechnol 32:149–157. PMCID:
3. Yoshida K, Corper AL, Herro R, Jabri B, Wilson PMC4001742
IA, Teyton L (2010) The diabetogenic mouse 8. Dash P, McClaren JL, Oguin TH 3rd, Rothwell
MHC class II molecule I-Ag7 is endowed with W, Todd B, Morris MY, Becksfort J, Reynolds
a switch that modulates TCR affinity. J Clin C, Brown SA, Doherty PC, Thomas PG (2011)
Invest 120:1578–1590. PMCID: 2860908 Paired analysis of TCRalpha and TCRbeta
4. Foldy C, Darmanis S, Aoto J, Malenka RC, chains at the single-cell level in mice. J Clin
Quake SR, Sudhof TC (2016) Single-cell Invest 121:288–295. PMCID: PMC3007160
Chapter 15
Abstract
Hi-C is a methodology developed to reveal chromosomal interactions from a genome-wide perspective.
Here, we described a protocol for generating Hi-C sequencing libraries in resting and activated human
naive CD4 T cells to investigate activation-induced chromatin structure re-arrangement in T cell activation
followed by a section reviewing the general concepts of Hi-C data analysis.
Key words Chromosomal interaction, Cell fixation, Endo-restricted enzyme digestion, De novo
ligation, Biotin-streptavidin, High-throughput sequencing
1 Introduction
Steven R. Head et al. (eds.), Next Generation Sequencing: Methods and Protocols, Methods in Molecular Biology, vol. 1712,
https://doi.org/10.1007/978-1-4939-7514-3_15, © Springer Science+Business Media, LLC 2018
239
240 Xiangzhi Meng et al.
2 Materials
3 Methods
3.1 Crosslinking Isolate peripheral blood mononuclear cells (PBMC) from healthy
donor’s blood by centrifugation through a histopaque (Sigma)
gradient with deceleration break-off. Then purify naive CD4 T
cells via the negative selection naive CD4 T cell isolation kit.
242 Xiangzhi Meng et al.
3.2 Restriction 1. Thaw the pellet on ice, meanwhile prepare the cell lysis buffer
Digestion by combining 450 μL of ice-cold Buffer 1 with 50 μL of 10×
proteinase inhibitor solution. Resuspend the pellet in the lysis
buffer and rotate for 30 min in a cold room.
2. Centrifuge the tube at 5000 rpm (2400 × g) for 10 min at
4 °C, remove the supernatant, and resuspend the pellet in
500 μL 1.2× digestion buffer (combine 60 μL of 10× digestion
buffer with 440 μL of H2O).
3. Add 15 μL of 10% SDS (final concentration is 0.3%), mix up,
and incubate at 62 °C for 10 min, followed by 37 °C for
50 min, both while shaking at 900 rpm.
4. To quench SDS, add 50 μL of 20% Triton X-100 (final concen-
tration is 2%) to the reaction and incubate at 37 °C for 60 min,
shaking at 900 rpm.
5. Bring the tube to room temperature and take a 50 μL aliquot
for the undigested control (store in −20 °C).
6. To the rest, add 200 U of HindIII restriction enzyme (10 μL
of 20 Units/μL) and incubate overnight at 37 °C with shaking
at 900 rpm.
3.3 Re-ligation 1. The next morning, inactive the retriction enzyme, incubate the
and Reverse tube at 62 °C for 20 min while shaking at 900 rpm
Crosslinking 2. Cool down the tube to room temperature and add 50 μL of
the following fill-in master mix (see Note 3):
(a) 37 μL of 0.4 mM biotin-14-dCTP.
(b) 1.5 μL of 10 mM dATP.
(c) 1.5 μL of 10 mM dGTP.
(d) 1.5 μL of 10 mM dTTP.
(e) 8 μL of 5 U/μL DNA Polymerase I, Large (Klenow)
Fragment.
Investigate Global Chromosomal Interaction by Hi-C in Human Naive CD4 T Cells 243
3.4 Digestion 1. To extract DNA from the controls, add 100 μL of phenol-
and Ligation Efficiency chloroform-Isoamyl Alcohol (25:24:1) to each tube and vor-
Assessment tex to completely mix.
2. Centrifuge the tubes at 10,000 rpm (9600 × g) for 5 min.
3. Transfer the upper aqueous phase to a new tube and run the
controls on a 1% agarose gel at 100 V for 20–30 min (see
Note 4).
3.5 DNA Extraction, 1. To extract DNA from the Hi-C samples, add 650 μL of Phenol-
DNA Shearing, Chloroform-Isoamyl Alcohol into each sample tube (1× vol-
and Size Selection ume), vortex to completely mix, and spin down at 10,000 rpm
(9600 × g) at room temperature for 5 min.
2. Transfer the upper phase (about 650 μL) to a new 2 mL tube
and add 1.3 mL (2× volume) of pure ethanol and 65 μL (0.1×
volume) of 3 M sodium acetate, then mix by inverting and
incubate at −80 °C for 2 h.
244 Xiangzhi Meng et al.
17. Leave the beads on the magnet for 5 min to allow remaining
ethanol to evaporate (do not over dry the beads, this could
result in a reduced yield).
18. To elute the DNA, remove the tube from the magnet and add
301 μL of 1× Tris buffer and gently mix by pipetting. After
incubating at room temperature for 5 min, separate the beads
on a magnet and transfer the solution to a new 1.5 mL tube.
19. Take 1 μL of size selected DNA and verify the fragment size on
the Tapestation (D1000 tape) (see Note 6).
3.6 Biotin Pull-Down Perform all the following steps in DNA low-binding tubes.
and Preparation
1. To prepare streptavidin beads for biotin pull-down, add 1 mL
of Illumina
wash buffer to 150 μL of 10 mg/mL streptavidin beads, sepa-
Sequencing rate the beads on a magnet, and discard the solution. Repeat
washing once more (total of 2 washes).
2. Remove the solution after the second wash and resuspend the
beads in 300 μL of binding buffer. Add 300 μL of size-selected
DNA to the beads and incubate at room temperature for
15 min with rotation.
3. Separate the beads on a magnet and discard the solution.
Resuspend the beads in 600 μL of wash buffer and incubate at
55 °C for 2 min with shaking at 900 rpm. Retrieve the beads
using a magnet and discard the supernatant.
4. Repeat the wash.
5. Resuspend the beads in 100 μL of 1× T4 DNA ligase buffer
and retrieve the beads on a magnet. Discard the buffer.
6. Resuspend the beads in 100 μL of end-repair and biotin-
remove master mix.
(a) 88 μL of 1× T4 DNA ligase buffer with 10 mM ATP.
(b) 2 μL of 25 mM dNTP mix.
(c) 5 μL of 10 U/μL T4 PNK.
(d) 4 μL of 3 U/μL T4 DNA Polymerase I.
(e) 1 μL of 5 U/μL DNA polymerase I, Large (Klenow)
Fragment.
7. Incubate at room temperature for 30 min, separate the beads
on a magnet, and discard the supernatant.
8. Wash the beads by adding 600 μL of wash buffer and incubate
at 55 °C for 2 min with mixing. Reclaim the beads and discard
the supernatant.
9. Resuspend the beads in 100 μL dATP attachment buffer.
Reclaim the beads and discard the supernatant.
246 Xiangzhi Meng et al.
4 Notes
Fig. 1 Electrophoresis on undigested, digested-unligated, and ligated controls. Sample in (a) was fixed by 1%
formaldehyde for 5 min and electrophoresis shows high efficiency of digestion whereas sample in (b) was
fixed by 1% formaldehyde for 10 min and the digestion efficiency is lower
Fig. 2 Tape Station results on DNA before (a) and after (b) size selection
tion at both the ends. In such cases, only one copy should be
included. Typically, less than 5% of the library of valid pairs having
redundant reads is acceptable; otherwise concern may be raised
about PCR over-amplification.
Hi-C normalization is still a challenge for bio-informaticians.
Currently, the most widely accepted Hi-C normalization method
appears as iterative correction and eigenvector decomposition
(ICE) [24]. ICE rescales the coefficients of the interaction matrix
to make all rows and columns to have an equal sum. Even though
it provides the most optimized treatment so far, limitations still
exist. First, the diagonal and nearest neighbors are excluded while
rescaling the coefficients, This is based on the assumption that any
chromatin site interacts with a locus that is not its closest neighbor.
Therefore, some true short-distance interactions may be eliminated;
second, it cannot calculate confidence intervals for the interactions;
third, it does not provide goodness-of-fit criterion, so whether ICE
has over-cleaned a matrix or not cannot be determined.
After normalization, a contact matrix could be created which
represents the overall patterns of chromosome interactions. In a
successful Hi-C experiment, a strong diagonal of interactions
between proximal genomic sites which shows overall decay over
increasing distance and clear clusters of intra- and inter-chromo-
somal interactions could be observed.
Due to the huge volume and complexity of the human genome,
most of Hi-C libraries on human cells are often under-sequenced.
For instance, the human genome has about one million HindIII
restriction sites (if HindIII is being used as endo-restricted enzyme
in the experiment), so that 100 million valid pairs of reads could
only provide 200 read ends at each HindIII restriction fragment
on average. Theoretically, every HindIII fragment could physically
interact with each of the other fragments thus only 0.0002 reads
representing any two HindIII fragments contact could be obtained.
This makes it difficult to determine whether a single observed
interaction is a significant event or random noise. One current
solution is to combine multiple biological replicates if they show
high reproducibility (Pearson’s correlation coefficient >0.9 at
1 Mb resolution) to compensate for the sequencing depth. Another
idea is that in identifying significant interactions, reads across large
regions are piled together in order to boost the total number of
reads considered.
Acknowledgment
References
1. Tessarz P, Kouzarides T (2014) Histone core cient package for normalizing large-scale Hi-C
modifications regulating nucleosome struc- data. Bioinformatics 31(6):960–962
ture and dynamics. Nat Rev Mol Cell Biol 14. Hu M et al (2012) HiCNorm: removing
15(11):703–708 biases in Hi-C data via Poisson regression.
2. Negre N et al (2011) A cis-regulatory map of the Bioinformatics 28(23):3131–3133
drosophila genome. Nature 471(7339):527–531 15. Servant N et al (2015) HiC-pro: an optimized
3. Li G et al (2012) Extensive promoter-centered and flexible pipeline for Hi-C data processing.
chromatin interactions provide a topological basis Genome Biol 16:259
for transcription regulation. Cell 148(1–2):84–98 16. Lun AT, Smyth GK (2015) diffHic: a biocon-
4. Apostolou E et al (2013) Genome-wide chro- ductor package to detect differential genomic
matin interactions of the Nanog locus in plu- interactions in Hi-C data. BMC Bioinformatics
ripotency, differentiation, and reprogramming. 16:258
Cell Stem Cell 12(6):699–712 17. Sauria ME, Phillips-Cremins JE, Corces VG,
5. Dekker J, Rippe K, Dekker M, Kleckner N Taylor J (2015) HiFive: a tool suite for easy and
(2002) Capturing chromosome conformation. efficient HiC and 5C data analysis. Genome
Science 295(5558):1306–1311 Biol 16:237
6. Weiner BM, Kleckner N (1994) Chromosome 18. Heinz S et al (2010) Simple combinations of
pairing via multiple interstitial interactions before lineage-determining transcription factors prime
and during meiosis in yeast. Cell 77(7):977–991 cis-regulatory elements required for macrophage
7. Lieberman-Aiden E et al (2009) and B cell identities. Mol Cell 38(4):576–589
Comprehensive mapping of long-range inter- 19. Hu M et al (2013) Bayesian inference of spatial
actions reveals folding principles of the human organizations of chromosomes. PLoS Comput
genome. Science 326(5950):289–293 Biol 9(1):e1002893
8. Dixon JR et al (2015) Chromatin architecture 20. Lajoie BR, Dekker J, Kaplan N (2015) The
reorganization during stem cell differentiation. Hitchhiker's guide to Hi-C analysis: practical
Nature 518(7539):331–336 guidelines. Methods 72:65–75
9. Dixon JR et al (2012) Topological domains 21. Dekker J, Marti-Renom MA, Mirny LA (2013)
in mammalian genomes identified by anal- Exploring the three-dimensional organization
ysis of chromatin interactions. Nature of genomes: interpreting chromatin interaction
485(7398):376–380 data. Nat Rev Genet 14(6):390–403
10. Rao SS et al (2014) A 3D map of the human 22. Langmead B, Trapnell C, Pop M, Salzberg SL
genome at kilobase resolution reveals principles (2009) Ultrafast and memory-efficient align-
of chromatin looping. Cell 159(7):1665–1680 ment of short DNA sequences to the human
11. Jin F et al (2013) A high-resolution map of the genome. Genome Biol 10(3):R25
three-dimensional chromatin interactome in 23. Li R, Li Y, Kristiansen K, Wang J (2008)
human cells. Nature 503(7475):290–294 SOAP: short oligonucleotide alignment pro-
12. Duan Z et al (2012) A genome-wide 3C-method gram. Bioinformatics 24(5):713–714
for characterizing the three-dimensional archi- 24. Imakaev M et al (2012) Iterative correc-
tectures of genomes. Methods 58(3):277–288 tion of Hi-C data reveals hallmarks of
13. Li W, Gong K, Li Q, Alber F, Zhou XJ (2015) chromosome organization. Nat Methods
Hi-corrector: a fast, scalable and memory-effi- 9(10):999–1003
Chapter 16
Abstract
In this chapter, we describe a method for making Illumina-compatible sequencing libraries from RNA.
This protocol can be used for standard RNAseq analysis for detecting differentially expressed genes.
In addition, this protocol is ideally suited for adapting to RIPseq, 5′-RACE, RNA structural probing,
nascent RNA sequencing, and other protocols where polymerase termination sites need to be profiled. The
utilization of solid-phase bead chemistries facilitates simple workflow and efficient library yields.
1 Introduction
Steven R. Head et al. (eds.), Next Generation Sequencing: Methods and Protocols, Methods in Molecular Biology, vol. 1712,
https://doi.org/10.1007/978-1-4939-7514-3_16, © Springer Science+Business Media, LLC 2018
253
254 Phillip Ordoukhanian et al.
5’ AAAAAAAAAAAAA
VTTTTTTTTT
reverse transcribe P7
5’ AAAAAAAAAAAAA
VTTTTTTTTT
digest excess biotin-primer P7
with exonuclease I
5’ AAAAAAAAAAAAA
VTTTTTTTTT
capture on streptavidin beads P7
wash away RNA using NaOH
P5
ppA VTTTTTTTTT
P7
ligate adapter to 3’ end of cDNA
P5
VTTTTTTTTT
P7
PCR directly from beads
Sequence
2 Materials
2.2 Reagents 1.
ILMN-primer-adapter dT: /5Biosg/GAGTTCCTTGGCA
and Consumables CCCGAGAATTCCATTTTTTTTTTTTTTTTT
TTV (see Note 5).
2. ILMN-primer-adapter random: /5Biosg/GAGTTCCTTGG
CACCCGAGAATTCCA(N:25,252,525)(N)(N)(N) (N)(N)
(N)(N) (see Note 5).
3. ILMN-ligation-adapter: /5rApp/(N:25,252,525)(N)(N)
GATCGTCGGACTGTAGAACTCTGAACGTGT/3ddC/
(see Note 5).
4. PCR primer 1: AATGATACGGCGACCACCGAGATCTAC
ACGTTCAGAGTTCTACAGTCCGACGATC.
5. PCR primer 2 with sample barcode (X = Illumina barcodes):
C A A G C A G A A G A C G G C ATA C G A G AT X X X X X X G T
GACTGGAGTTCCTTGGCACCCGAGAATTCCA.
6. Low DNA binding microfuge tubes.
7. Thin-walled PCR microfuge tubes.
8. 2% Agarose Egel EX (ThermoFisher Scientific).
9. RNaseOUT™ (ThermoFisher Scientific).
10. Dynabeads Oligo(dT)25 beads (ThermoFisher Scientific).
11. 5 M Lithium Chloride.
12. 1 M Tris–HCl pH 7.5.
13. 0.5 M EDTA.
14. High-Salt (HS) buffer: 2 M NaCl, 10 mM Tris–HCl (pH 8),
1 mM EDTA.
15. RNA Binding buffer: 20 mM Tris–HCl (pH 7.5), 1 M LiCl,
2 mM EDTA.
256 Phillip Ordoukhanian et al.
3 Methods
3.1 Poly A Selection 1. Vigorously resuspend oligo-dT beads, pipette 20 μL into
of mRNA (Optional) 0.5 mL Lo-Bind microfuge tube and place in magnetic stand,
once the beads have cleared the solution, remove the superna-
tant. Wash the beads 2 times with 100 μL RNA Binding Buffer.
Then, resuspend in 30 μL RNA binding buffer.
2. Add at least 100 ng total RNA in a volume of 10 μL to the
tube containing the 30 μL suspension of magnetic beads above
and pipette mix.
3. Place the tube in a thermocycler and incubate at 65 °C for
5 min, then lower to 4 °C. Remove the tube from thermocy-
cler and incubate at room temperature for 7 min, to further
allow the mRNA to bind to the beads. Place the tube in a
magnetic stand, remove and discard the supernatant.
4. Using pipette mixing—wash the beads two times with 100 μL
Wash Buffer 1.
5. Add 20 μL of 10 mM Tris pH 7.5, pipette mix. Heat to 80 °C
for 2 min, then hold at 25 °C.
6. Add 30 μL RNA Binding Buffer, pipette mix. Incubate at
room temperature for 7 min to allow mRNA to re-bind to
beads. Transfer the tube to magnetic rack and remove and dis-
card the supernatant.
RNAseq Library Prep to Identify Termination Sites 257
3.2 Reverse 1. To the 10 μL of eluted mRNA from Subheading 3.1 add 1 μL
Transcription ILMN-primer-adapter dT (20 μM) and 4 μL of 5× First Strand
Buffer.
2. Incubate at 95 °C, 3 min, then place on ice for at least 3 min.
3. Prepare a master mix for the total number of samples being
prepped in a separate 0.5 mL microfuge tube, containing the
following reagents (due to pipetting inaccuracies, an addi-
tional 10% in volume per sample prep per reagent is typically
used): 2 μL DTT (0.1 M),1 μL RNaseOUT™, 1 μL dNTPs
(10 mM), 1 μL Superscript III. 5 μL of this master mix is then
added to each sample from the previous step. The final vol-
ume will be 20 μL.
4. Incubate at 25 °C for 15 min, then increase to 42 °C for
40 min, and finally to 75 °C for 10 min.
5. Digest excess primer: Add 2 μL Exonuclease I and incubate at
37 °C for an additional 30 min.
6. Isolate cDNA/RNA material: Add 5 μL nuclease-free water
and 45 μL AmpureXP beads to the tube containing the reac-
tion from above. Pipette mix up and down ten times to mix.
Allow cDNA/RNA to bind to the beads for 10 min, place the
tube on a magnetic stand, and remove the supernatant. Wash
the beads 2 times with 80% ethanol while on the magnetic
stand (100 μL each). Allow the beads to air-dry at ambient
temperature for 10 min. Remove the tube from the magnetic
stand and add 25 μL 10 mM Tris–HCl (pH 8.0) to the beads,
pipette mix, and let it sit at ambient temperature for 2 min.
Return the tube to the magnetic stand and retrieve 24 μL of
cDNA/RNA material.
3.4 A-Adapter 1. Prepare a master mix for the total number of samples being
Ligation prepped in a separate 0.5 mL microfuge tube, containing the
following reagents (due to pipetting inaccuracies, an additional
10% in volume per sample prep per reagent is typically used):
2 μL 10× NEB1 buffer, 2 μL 50 mM MnCl2, 2 μL ILMN-
ligation-adapter (50 μM), 12 μL nuclease-free water, 2 μL
Thermostable App DNA/RNA Ligase.
2. To the streptavidin beads with the bound cDNA from Step 3,
Subheading 3.3 transfer 20 μL of the reaction cocktail pre-
pared above and pipette up and down until mixed well.
Incubate at 60 °C for 1 h.
3. Wash the streptavidin bead bound cDNA: 3 washes with 1×
TE buffer (100 μL each), and resuspend the streptavidin beads
in 36 μL nuclease-free water, creating a slurry, then transfer the
slurry to a fresh thin-walled PCR tube.
3.5 PCR and DNA 1. Prepare a master mix for the total number of samples being
Purification prepped in a separate 0.5 mL microfuge tube, containing the
following reagents (due to pipetting inaccuracies, an additional
10% in volume per sample prep per reagent is typically used):
2 μL PCR primer 1 (25 μM), 2 μL PCR primer 2 with sample
barcode (25 μM), and 40 μL 2 X KAPA HiFi mix and pipette
mix up and down. Transfer 44 μL of the reaction cocktail to
the thin-walled PCR tube containing the cDNA bound strep-
tavidin beads. The final volume will be 80 μL.
2. Perform thermocycling in a PCR instrument: 1 hold—95 °C,
3 min; 15 cycles—98 °C for 20 s, 60 °C for 30 s, 72 °C, 30 s;
1 hold—72 °C, 2 min.
3. Following the PCR, add 80 μL AmpureXP beads (1× strength)
directly to the PCR tube and pipette mix. Let it stand for
10 min to precipitate the DNA onto the beads. Place the PCR
tube in a magnetic stand and remove the supernatant. With the
tube still in the magnetic stand wash the beads two times with
80% ethanol solution (200 μL each). Allow the beads to air-dry
with the lid of the tube open at ambient temperature for
10 min. Remove the tube from magnetic stand and add 25 μL
10 mM Tris–HCl (pH 8.0), pipette mix. Let it stand for 2 min,
return the tube to the magnetic stand, and allow the beads to
clear the solution, transfer 24 μL to a fresh microfuge tube
(avoiding removal of any beads).
RNAseq Library Prep to Identify Termination Sites 259
4 Notes
References
1. Head SR, Komori HK, LaMere SA, Whisenant 5. Kielpinski LJ, Boyd M, Sandelin A, Vinther
T, Van Nieuwerburgh F, Salomon DR, J (2013) Detection of reverse transcrip-
Ordoukhanian P (2014) Library construction tase termination sites using cDNA ligation
for next-generation sequencing: overviews and and massive parallel sequencing. Methods
challenges. BioTechniques 56(2):61–77 Mol Biol 1038:213–231. https://doi.
2. Podnar J, Deiderick H, Huerta G, Hunicke- org/10.1007/978-1-62703-514-9_13
Smith S (2014) Next-generation sequencing 6. Li H, Lovcib MT, Kwona YS, Rosenfeld MG,
RNA-Seq library construction. Curr Protoc Fua XD, Yeo GW (2008) Determination of
Mol Biol 106:4.21.1–4.21.19 tag density required for digital transcriptome
3. Levin JZ, Yassour M, Adiconis X, Nusbaum C, analysis: application to an androgen-sensitive
Thompson DA, Friedman N, Gnirke A, Regev prostate cancer model. Proc Natl Acad Sci U S
A (2009) Comprehensive comparative analysis A 105(51):20179–20184
of strand-specific RNA sequencing methods. 7. König J, Zarnack K, Luscombe NM, Ule J (2012)
Nat Methods 7(9):709–715 Protein–RNA interactions: new genomic tech-
4. Routh A, Head SR, Ordoukhanian P, Johnson nologies and perspectives. Nat Rev Genet 13:77–
JE (2015) Fragmentation-free next-genera- 83. https://doi.org/10.1038/nrg3141
tion sequencing via click-ligation of adaptors 8. Linder B, Grozhik AV, Olarerin-George AO,
to stochastically terminated 3′-azido cDNAs. Meydan C, Mason CE, Jaffrey SR (2015)
J Mol Biol 427(16):2610–2616. https://doi. Single-nucleotide-resolution mapping of m6A
org/10.1016/j.jmb.2015.06.011 and m6Am throughout the transcriptome.
RNAseq Library Prep to Identify Termination Sites 261
Nat Methods 12:767–772. https://doi. 11. Chen F, Gao X, Shilatifard A (2015) Stably
org/10.1038/nmeth.3453 paused genes revealed through inhibition of
9. Loughrey D, Watters KE, Settle AH, Lucks JB transcription initiation by the TFIIH inhibitor
(2014) SHAPE-Seq 2.0: systematic optimization triptolide. Genes Dev 29(1):39–47. https://
and extension of high-throughput chemical prob- doi.org/10.1101/g ad.246173.114
ing of RNA secondary structure with next genera- 12. Frohman MA (1993) Rapid amplification of
tion sequencing. Nucleic Acids Res 42(21):e165 complementary DNA ends for generation of
10. Khodor YL, Rodriguez J, Abruzzi KC, Tang full-length complementary DNAs: thermal
CHA, Marr MT II, Rosbash M (2015) RACE. Methods Enzymol 218:340–356
Nascent-seq indicates widespread co-transcrip- 13. Loh E (1991) Anchored PCR: amplifica-
tional pre-mRNA splicing in drosophila. Genes tion with single-sided specificity. Methods
Dev 25:2502–2512 2:11–19
Index
Steven R. Head et al. (eds.), Next Generation Sequencing: Methods and Protocols, Methods in Molecular Biology, vol. 1712,
https://doi.org/10.1007/978-1-4939-7514-3, © Springer Science+Business Media, LLC 2018
263
Next Generation Sequencing: Methods and Protocols
264
Index
Endo-restricted enzyme digestion���������������������������� 239, 240 L
Enhancers���������������������������������������������������������������������33, 88
Environmental samples������������������������������ 97, 102, 103, 109 Lambda Phage�������������������������������������������������������������������44
Enzymes������������������������30–33, 39, 46–48, 54, 61, 67, 74, 81, Library preparation������������������ viii, 4–6, 9–16, 20, 23, 28–30,
90, 91, 94, 146, 149, 151, 152, 159, 171, 212–215, 219, 32–34, 39, 45, 46, 49, 50, 98, 99, 101, 102, 106, 107, 147,
220, 242, 247, 251 150, 151, 156, 158, 160, 161, 171, 206, 207, 211–214,
Epigenetic��������������������������������������������������������������������87, 88 229, 255, 260
Eukaryotic cells�������������������������������������������������� 71, 170, 239 Ligation�����������������������5, 9, 10, 12, 33, 40, 44, 45, 49, 71–74,
77–83, 98, 146, 149, 151, 152, 159, 240–243, 246, 248,
F 253–260
Limma�������������������������������������������������������������� 189–193, 198
Fastq������������������������������� 35, 44, 50, 115, 119–121, 125–127,
Linux�����������������������������������������114–117, 119, 121, 139, 185
179–183, 185, 186, 188, 201
FastQC������������������������������������������������93, 177, 179, 180, 201 M
Feces����������������������������������������������������������������� 163, 168, 169
Flock House virus��������������������������������������������������������74, 81 Mammalian cells������������������������������������������ 87, 93, 203, 207
Fluidigm��������������������������������������53, 218–220, 227, 237, 238 Massive parallel sequencing (MPS)������������������������������27, 28
Fluidigm Biomark HD����������������������������������������������������218 Metabolic reconstruction����������������������������������������������������99
Fluorescence-activated cell sorting (FACS)���������������������217 Metagenomics��������������������������������������������������������������������97
Functional variants�����������������������������������������������������������113 Microarray���������������������������������������������������������������� 175, 203
Microbes����������������������������������������������������� 97, 101, 164, 167
G Microbial genomes�������������������������������������������������������������97
Microbiome������������������������������������������������������ 123, 163–168
Gel excision������������������������������������������������������������������������53
Microfluidics������������������������������������������������������� viii, 98, 217
Gel-free����������������������������������������������������������������������������146
Micropipetting�������������������������������������������������������������������98
Gene expression���������������������� 2, 87, 188, 189, 193, 198, 200,
MicroRNA (miRNA)������������������������������� 145, 148, 154, 159
217, 237
MinION����������������������������������������������������������������� vii, 45, 50
Gene-specific primers (GSPs)��������������54–56, 60, 61, 64, 66,
Mobile elements�����������������������������������������������������������������45
67, 69
Monocytes������������������������������������������������������������������������191
Genetic variant������������������������������������������������������� vii, 54, 66
mRNA������������������������������1–3, 7, 9, 19, 74, 81, 256, 257, 260
Genome-Scale CRISPR Knock
Multiple annealing and looping based amplification
Out (GeCKO)�������������������������������vii, 204, 207–210, 213
cycles (MALBAC)�������������������������������������������������������28
Genotypes������������������113, 114, 124, 125, 128, 129, 133, 137
Multiple displacement amplification (MDA)��������������������28,
Github������������������������������������������������������������������������������178
98–100, 102, 103, 105, 106, 108, 109
Glmnet����������������������������������������������189, 190, 193–195, 198
Multiplex PCR�������������������������������������������������������������������53
Global run-on sequencing (GRO-seq)�������������������������������19
Mutation screening�������������������������������������������������������������53
Gut microbiome����������������������������������������������� 165, 168, 169
N
H
Nanopores�������������������������������������������������������������� vii, 45, 50
Haloplex�����������������������������������������������������������������������������54
Nascent-seq������������������������������������������������������������������19–25
Hi-C�������������������������������������������������������������������������� vii, 239,
Native elongating transcript sequencing (NET-seq)���������20,
249–251
204
Highest fidelity PCR polymerases�������������������������������������53
Neutrophils�����������������������������������������������������������������������191
Hi-Plex������������������������������������������������������������������� vii, 53–69
Next-generation sequencing (NGS)����������������vii, viii, 20, 24,
Hybridization��������������������������������������������������������� 44, 46–48
72, 74, 87, 106, 114, 121, 146, 147, 150, 151, 156,
I 158–161, 203–215, 218
NIH Roadmap Epigenomics����������������������������������������������87
Illumina NextSeq 500���������������������������������� 29, 39, 207, 214 NK cells����������������������������������������������������������������������������191
Illumina sequencing libraries����������������������������������������������28 Noncoding RNAs���������������������������������������������������������������19
In vitro���������������������������������������������������������������������� 7, 20, 71 Non-homologous end-joining (NHEJ)����������������������������203
Isoform������������������������������������������������������������� 177, 180, 182 Non-model organisms�������������������������������������������������������viii
K O
K-mer frequencies��������������������������������������������������������������99 Off-target priming�������������������������������������������������������54, 67
Next Generation Sequencing: Methods and Protocols
265
Index
Optical tweezers�����������������������������������������������������������������98 Sequence error��������������������������������������������������������������������28
Oral microbiome���������������������������������������������� 164, 166, 167 Sequence-screening������������������������������������������������������������53
Orcinus orca������������������������������������������������������� 116, 118, 119 Short guide RNAs (sgRNAs)������������203, 204, 207, 208, 211
Oxford nanopore���������������������������������������������������� vii, 45, 50 Short-hairpin RNAs (shRNAs)������������������������������� 204, 208
Shotgun metagenomics������������������������������������������������������97
P Single amplified genome (SAG)�����������98, 99, 101, 106–108
Paired-end sequencing��������������������������������������������������54, 66 Single cell amplification���������������������������������������������������104
Parental imprinting������������������������������������������������������������87 Single-cell analysis��������������������������������������������������������������vii
PCR barcoding������������������������������������������������������� 44, 49, 50 Single-cell isolation������������������������������������������������������������98
Penalized regression������������������������������������������ 190, 193–196 Single nucleotide polymorphism (SNP)�������������������113–140
Peripheral blood���������������������������������������������������������������241 Size-selection��������������30, 33, 41, 53, 54, 62–64, 79, 80, 146,
Phusion polymerase�������������������������������������������� 5, 6, 15, 214 148, 154, 157, 160, 229, 241, 243–245, 248, 250
Phylogenetic diversity���������������������������������������������������������97 Small non-coding RNAs��������������������������������������������������145
Phylogenetic screening���������� 98, 99, 101, 102, 105, 106, 109 Small RNA-sequencing (sRNA-Seq)���������������������� 145, 148
Phylogenetics�����������������������������������������97–99, 102, 109, 113 Snakemake����������������������������������������������� 177–182, 187, 201
Picoplex/SurePlex���������������������������������������������������������28, 29 Soil����������������������������������������������������������� 101, 163, 165, 170
PiwiRNA (piRNA)�������������������������������������������������� 145, 154 Spliced transcripts alignment
Pol II����������������������������������������������������������������������������������20 to a reference (STAR)���������� 177, 180, 181, 185, 188, 201
Polyacrylamide������������������������������������������������5, 6, 10, 13, 15 16S rRNA gene�����������������������������97–99, 101, 102, 105, 169
Polyadenylation������������������������������������������������������������������19 ssDNA��������������������������������������������������������������������������������74
Polymerase recruitment������������������������������������������������������19 16S/18S ribosomal RNA gene sequences��������� 165, 166, 171
Polysome���������������������������������������������������������� vii, 1–3, 9, 16 Strain-promoted Azide-Alkyne Cycloaddition
Polysome profiling����������������������������������������������������������1–17 (SPAAC)����������������������������������������������������������������71, 72
Population genetics��������������������������������������������������� 113, 129
T
Precision nuclear run-on sequencing (PRO-seq)���������������19
Preimplantation genetic diagnosis (PGD)��������������������27–41 Tagmentation����������������������������������������������������������� 102, 106
ProDeGe�������������������������������������������������������������������� 99, 108 Targeted sequencing����������������������������������������� vii, 43–50, 53
Prokaryotic adaptive immune system�������������������������������203 Target-enrichment�������������������������������������������������������������46
Python�������������������������������������������������50, 114, 115, 124, 178 T cell receptor (TCR)�����������������������������������������������217–237
T cells��������������������������������������������������������������� 239, 249, 250
R TCR sequencing��������������������������������228–231, 233, 235, 237
5’-RACE, RNA structural probing����������������������������������253 Termination��������������������������������������������viii, 19, 81, 255, 260
RADseq����������������������������������������������������������������������������114 Three-dimensional DNA fluorescence
Random amplification���������������������������������������������� 166, 171 in situ hybridization (3D–FISH)��������������������������������239
Recombination������������������������������������������������������������ viii, 74 T4 polynucleotide kinase������������������������������������������������������5
Reference-guided assembly����������������������������������������������113 Transcription������������������������� viii, 5, 6, 12–15, 19, 20, 74–77,
Repetitive elements���������������������������������������������������� 87, 127 80, 81, 146, 149, 171, 218, 219, 227, 237, 239, 255, 260
Ribo-seq�����������������������������������������������������������������������1, 253 Transcriptional initiation����������������������������������������������������19
Ribosome����������������������������������������������� vii, 3, 4, 8, 10, 13, 17 Translational regulation of gene expression��������������������������1
footprint abundance, 2 Transfer RNA (tRNA)����������������������������30, 33, 41, 145, 154
profiling, 3, 4, 8, 10, 13, 17 Translation��������������������������������������������������������������������1–3, 8
RNA interference�������������������������������������������������������������204 Translatome�������������������������������������������������3, 4, 8, 10, 13, 17
RNA sequencingviii, 9, 20, 21, 175–178, 180, 183, 185, 186, Transposase���������������������������������������������������������������� 98, 106
188, 191, 195, 197, 201 T4 RNA ligase 2���������������������������������������������������� 5, 12, 149
RNase inhibitor������������������3, 4, 9, 21, 22, 149, 151, 152, 159 Trophectoderm�������������������������������������������������������������27, 28
RNA-seq by expectation-maximization (RSEM)��� 177, 180, TruSeq Amplicon���������������������������������������������������������������53
182, 185, 201
U
R statistical computing environment������������������������ 176, 177
Unix�������������������������������������������������������������������������� 114, 115
S Unnatural triazole-linked backbones����������������������������������71
Sample preparation��������������������� 3, 24, 25, 99–102, 106, 109, Uracil����������������������������������������������������������������������������������88
226, 227 Urea�������������������������������������������������������������������������� 5, 20, 22
Sample preservation��������������������������������������������������� 98, 102 3′UTR������������������������������������������������������������������ 2, 3, 74, 81
Sanger sequencing���������������������������������������������� 98, 106, 114 5′UTR������������������������������������������������������������������������������2, 3
Next Generation Sequencing: Methods and Protocols
266
Index
V Whole genome association studies�����������������������������������113
Whole-genome sequencing�����������������������������������������������27,
Vaccines����������������������������������������������������������������������������218 44, 114
Vaginal microbiome�������������������������������������������������� 165, 168
Viral purification�������������������������������������� 165, 166, 170, 171 X
Viral-like particles (VLPs)��������������������������������������� 165, 170
X-chromosome inactivation�����������������������������������������������87
W
Y
Whole genome amplification (WGA)�������vii, 28–32, 40, 98,
Y RNA�����������������������������������������������������������������������������145
100, 103–106