
Greater than 10 kb Read Lengths Routine when Sequencing with Pacific Biosciences XL Release


Cheryl Heiner, Primo Baybayan, Susana Wang, Yan Guo, Meredith Ashby, Joan Wilson, Kevin
Travers, Jason Chin, and Jason Underwood
Pacific Biosciences, 1380 Willow Road, Menlo Park, CA 94025

Recent Developments in SMRT Sequencing

Introduction

Sequencing full-length cDNA transcripts


Workflow for full-length cDNA sequencing

Combination of new features yields long subreads, some beyond 10 kb:
- >10 kb library prep recommendations
- XL polymerase, C2 sequencing chemistry
- 1 x 120 minute collection time
- Stage start
- MagBead loading

Total RNA → polyA+ RNA → SuperScript Full Length cDNA / SMARTer PCR cDNA → PCR Optimization → Agarose Size Selection (<1 kb, 1-2 kb, 2-3 kb, >3 kb) → Large Scale PCR → SMRTbell Template Preparation

Chicken transcript library: full-pass subreads correspond with full-length reference sequences

[Figure: Fraction of sequence from subreads > x; 400 Mb rice genome (CSHL), 17 kb library]
10 kb libraries: 50% of sequence from subreads >4,800 bases
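The "fraction of sequence from subreads > x" curve, and the statement that 50% of sequence comes from subreads longer than 4,800 bases, are base-weighted summaries of a subread length distribution. The following is a minimal Python sketch of how such numbers could be computed from a list of subread lengths. The function names are hypothetical and not part of any PacBio software, and the toy lengths are not poster data.

```python
from typing import Sequence


def fraction_of_bases_above(lengths: Sequence[int], x: int) -> float:
    """Fraction of all sequenced bases that come from subreads longer than x."""
    total = sum(lengths)
    if total == 0:
        return 0.0
    return sum(n for n in lengths if n > x) / total


def length_at_base_fraction(lengths: Sequence[int], fraction: float = 0.5) -> int:
    """Largest length L such that subreads of length >= L hold at least
    `fraction` of all bases (this is the N50 when fraction = 0.5)."""
    total = sum(lengths)
    running = 0
    for n in sorted(lengths, reverse=True):  # longest subreads first
        running += n
        if running >= fraction * total:
            return n
    return 0  # empty input


if __name__ == "__main__":
    subreads = [10_000, 8_000, 6_000, 4_000, 2_000]  # toy data
    print(fraction_of_bases_above(subreads, 5_000))
    print(length_at_base_fraction(subreads))
```

Because the metric is weighted by bases rather than by read count, a small number of very long subreads can move it substantially.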

PacBio's draft cDNA sequencing protocol is now
available as a Shared Protocol on SampleNet:
http://www.smrtcommunity.com/Share/Protocol/List
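The chicken transcript result relies on full-pass subreads. These are subreads bounded by SMRTbell adapters at both ends, so each one spans the whole cDNA insert. The following is a minimal Python sketch of splitting a polymerase read at adapter hits and flagging the full-pass subreads. It assumes the adapter coordinates come from some upstream adapter finder, and the `(start, end)` input format and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_subreads(
    read_len: int, adapters: Sequence[Tuple[int, int]]
) -> List[Tuple[int, int, bool]]:
    """Split a polymerase read into subreads at adapter hits.

    adapters: sorted, non-overlapping, 0-based half-open (start, end)
    intervals of adapter hits along the read (hypothetical format).
    Returns (start, end, is_full_pass) for each subread; a subread is
    full-pass only if an adapter flanks it on both sides.
    """
    subreads = []
    prev_end = 0
    left_is_adapter = False  # the read start is not an adapter boundary
    for a_start, a_end in adapters:
        if a_start > prev_end:
            # The right boundary here is an adapter, so full-pass depends on the left.
            subreads.append((prev_end, a_start, left_is_adapter))
        prev_end = a_end
        left_is_adapter = True
    if read_len > prev_end:
        # The final segment ends at the read end, so it cannot be full-pass.
        subreads.append((prev_end, read_len, False))
    return subreads


if __name__ == "__main__":
    for start, end, full in split_subreads(100, [(20, 25), (60, 65)]):
        print(start, end, "full-pass" if full else "partial")
```

Only the middle segment in the usage example is full-pass. The first and last segments start or end where the polymerase started or stopped, not at an adapter.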

Detection of novel splice forms of a cyclin-dependent kinase
[Figure: PacBio reads]

[Figure: Subread lengths, plant and microbial libraries]

[Figure: Single-pass accuracy, consensus accuracy, and # of subreads per SMRT Cell; XL/C2 vs. C2/C2]

PacBio's SMRT Sequencing produces the longest read
lengths of any sequencing technology currently available.
There have been a number of recent improvements to
further extend the length of PacBio RS reads. With an
exponential read length distribution, there are many reads
greater than 10 kb, and some reads at or beyond 20 kb.
These improvements include library prep methods for
generating >10 kb libraries, a new XL polymerase, magnetic
bead loading, stage start, new XL sequencing kits, and
increasing data collection time to 120 minutes per SMRT
Cell. Each of these features will be described, with data
illustrating the associated gains in performance.
With these developments, we are able to obtain greatly
improved and, in some cases, completed assemblies for
genomes that have been considered impossible to
assemble in the past, because they include repeats or low
complexity regions spanning many kilobases. Long read
lengths are valuable in other areas as well. In a single read,
we can obtain sequence covering an entire viral segment,
read through multi-kilobase amplicons with expanded
repeats, and identify splice variants in long, full-length cDNA
sequences. Examples of these applications will be shown.
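The introduction attributes the many >10 kb reads to an exponential read length distribution. Under an idealized exponential model with mean μ, the fraction of reads longer than x is e^(-x/μ), and the fraction of bases contained in those reads is (1 + x/μ)·e^(-x/μ). This model is an assumption, and real distributions deviate from it, especially in the tail. The following is a minimal Python sketch of those two formulas. The 4,500 bp mean is the 1 x 120 min average reported on this poster and is used only for illustration.

```python
import math


def frac_reads_longer(x: float, mean: float) -> float:
    """P(L > x) for an exponential read length distribution with the given mean."""
    return math.exp(-x / mean)


def frac_bases_in_reads_longer(x: float, mean: float) -> float:
    """Share of all bases contained in reads longer than x (exponential model).

    Integrating t * (1/mean) * exp(-t/mean) from x to infinity and dividing
    by the mean gives (1 + x/mean) * exp(-x/mean).
    """
    t = x / mean
    return (1.0 + t) * math.exp(-t)


if __name__ == "__main__":
    mean_bp = 4_500  # illustrative mean taken from the 120 min movie statistics
    for x in (10_000, 20_000):
        print(x,
              round(frac_reads_longer(x, mean_bp), 4),
              round(frac_bases_in_reads_longer(x, mean_bp), 4))
```

With a 4,500 bp mean, this model puts roughly 11% of reads, and about 35% of bases, above 10 kb. The long tail therefore carries a disproportionate share of the sequence.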

New XL polymerase extends read lengths while maintaining high consensus accuracy

Applications of SMRT Sequencing

Many reads span entire multi-kb transcripts

[Figure: Single-pass accuracy]
High consensus accuracy due to randomness of errors in individual reads
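Because errors in individual passes are largely random, independent passes over the same position rarely agree on the same wrong base, and a consensus quickly becomes more accurate than any single pass. The following is a minimal Python sketch of that argument. It assumes independent errors with a fixed per-pass error rate and ignores the indel-aware alignment that real consensus tools perform. For an odd number of passes, the probability that more than half the passes are wrong is an upper bound on the chance that a simple majority vote fails. The 15% error rate is illustrative, not a measured value.

```python
from math import comb


def majority_error_bound(n_passes: int, p_err: float) -> float:
    """Probability that more than half of n_passes independent passes are
    wrong at one position. For odd n_passes this upper-bounds the failure
    rate of a majority vote, since scattered wrong bases may still lose."""
    k_min = n_passes // 2 + 1  # smallest number of wrong passes forming a majority
    return sum(
        comb(n_passes, k) * p_err**k * (1 - p_err) ** (n_passes - k)
        for k in range(k_min, n_passes + 1)
    )


if __name__ == "__main__":
    p = 0.15  # illustrative single-pass error rate (assumption)
    for n in (1, 3, 5, 9, 15):
        print(n, f"{majority_error_bound(n, p):.2e}")
```

The bound falls steeply with each added pass, which is the intuition behind high consensus accuracy from individually noisy reads.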

Very Large Insert SMRTbell Library Prep


Key steps in preparing very large insert libraries

Magnetic bead loading for more efficient sample utilization and removal of small fragments with large insert libraries
[Figure panels: Diffusion Loading; MagBead Loading]

Sequencing through >2000 bases of pure CGG repeats

Collaboration with UC Davis:
Expanded CGG-Repeat Alleles of the Fragile X Gene

Start with high-quality input DNA: pulsed-field gel QC

[Figure: Pulsed-field gels of an ideal sample and a not-ideal sample (problematic sample with many small fragments <1 kb); size markers 5-30 kb]

Left, ideal sample, nearly all high molecular weight; right, the sample has a high molecular
weight band, but shorter fragments will dominate loading and sequence data.
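Shorter fragments dominate loading because loading acts on molecules, not on mass. For a fixed mass, a short fragment contributes many more molecules than a long one. The following is a minimal Python sketch that converts a mass-based size profile into molar fractions. The two-population sample is hypothetical.

```python
from typing import Dict


def molar_fractions(mass_by_length: Dict[int, float]) -> Dict[int, float]:
    """Convert mass fractions per fragment length (bp) into molar fractions.

    Moles of dsDNA are proportional to mass / length, so the constant
    mass-per-bp factor cancels out when normalizing.
    """
    moles = {length: mass / length for length, mass in mass_by_length.items()}
    total = sum(moles.values())
    return {length: m / total for length, m in moles.items()}


if __name__ == "__main__":
    # Hypothetical sample: 90% of the mass at 20 kb, 10% as 500 bp fragments.
    sample = {20_000: 0.90, 500: 0.10}
    for length, frac in sorted(molar_fractions(sample).items()):
        print(f"{length:>6} bp: {frac:.1%} of molecules")
```

In this toy example, short fragments are only 10% of the mass but about 82% of the molecules. This illustrates why a high molecular weight band alone does not guarantee long reads.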

Shearing to 10-20 kb: Covaris g-TUBE devices
[Figure: g-TUBE shearing gel, lanes 10-14; Lane 11 = 18.8 kbp]

Loomis et al. (2012) Sequencing the unsequenceable: Expanded CGG-repeat alleles of the fragile X gene.
Genome Research, accepted for publication.

Regions of long, difficult sequence context are covered in single reads

Fragments <1 kb are excluded with MagBead loading

Template input reduced, number of SMRT Cells increased with MagBead loading and XL polymerase

Polymerase              C2               C2              C2               XL
Loading                 Diffusion        MBS             MBS              MBS
Input into DNA Repair   5 µg (minimum)   5 µg            1 µg (minimum)   1 µg (minimum)
After 15% recovery      750 ng           750 ng          150 ng           150 ng
Primer Annealing        5 nM             5 nM            0.8333 nM        0.8333 nM
Polymerase Binding      3 nM             3 nM            0.5 nM           0.5 nM
Loading (on cell)       150 pM           150 pM          10 pM            5.5 pM
Total # SMRT Cells      52 (with reuse)  184 (no reuse)  36 (no reuse)    68 (no reuse)
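The library amounts in the table are consistent with about 15% recovery from the DNA repair input (5 µg × 0.15 = 750 ng; 1 µg × 0.15 = 150 ng). The annealing, binding, and loading steps, however, are specified as molar concentrations. The following is a minimal Python sketch of that arithmetic. It assumes the common approximation of about 660 g/mol per base pair of double-stranded DNA and a hypothetical 20 kb insert length, and it ignores the small extra mass of the SMRTbell adapters.

```python
BP_MASS_G_PER_MOL = 660.0  # approximate mass of one dsDNA base pair (assumption)


def expected_yield_ng(input_ng: float, recovery: float = 0.15) -> float:
    """Library mass expected after prep, given input mass and fractional recovery."""
    return input_ng * recovery


def ng_to_fmol(mass_ng: float, length_bp: int) -> float:
    """Convert a dsDNA mass (ng) to femtomoles for fragments of length_bp."""
    grams = mass_ng * 1e-9
    moles = grams / (length_bp * BP_MASS_G_PER_MOL)
    return moles * 1e15


if __name__ == "__main__":
    insert_bp = 20_000  # hypothetical library insert size
    for input_ug in (5, 1):
        lib_ng = expected_yield_ng(input_ug * 1000)
        print(f"{input_ug} µg input -> {lib_ng:.0f} ng library "
              f"= {ng_to_fmol(lib_ng, insert_bp):.1f} fmol")
```

For long inserts, even several hundred nanograms correspond to only tens of femtomoles. This is why the lower input and loading concentrations enabled by MagBead loading matter for sample utilization.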

Extended collection times maximize read length or throughput
[Figure: Read length distributions, 1 x 120 min movie vs. 2 x 55 min movies; read length axis 10-30 kb]

                     Average    95th Percentile   Max
1 x 120 min movie    4,500 bp   12,000 bp         21,000 bp
2 x 55 min movies    4,200 bp   9,500 bp          13,000 bp

g-TUBE gel annotation: Lane 14 = 30.5 kbp

PacBio variants confirmed by PCR-Sanger
[Figure: Reference, PacBio, and Sanger sequence at variant sites in BAC clones 1-4; PacBio and Sanger sequences agree]

22 indels and 4 SNPs in human BAC confirmed by PCR-Sanger

Sequence through 12-base homopolymer

High consensus accuracy of >Q50 obtainable with PacBio sequencing
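Consensus accuracy is quoted on the Phred scale, Q = -10·log10(P_err), so >Q50 means fewer than one error per 100,000 consensus bases. The following is a minimal Python sketch of the conversion in both directions.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability for a Phred-scaled quality value Q."""
    return 10 ** (-q / 10)


def error_to_phred(p_err: float) -> float:
    """Phred-scaled quality for an error probability (must be > 0)."""
    return -10 * math.log10(p_err)


if __name__ == "__main__":
    for q in (20, 30, 40, 50):
        p = phred_to_error(q)
        print(f"Q{q}: 1 error in {round(1 / p):,} bases")
```

Each additional 10 Q units is a tenfold reduction in error rate.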

Samples:
10. K12 gDNA (dil. 11/1/2012)
11. K12 shear, regular g-TUBE, 5500 rpm, 50 µL @ 100 ng/µL
12. K12 shear, regular g-TUBE, 5000 rpm, 50 µL @ 100 ng/µL
13. K12 shear, regular g-TUBE, 4500 rpm, 50 µL @ 100 ng/µL
14. K12 shear, regular g-TUBE, 4000 rpm, 50 µL @ 100 ng/µL

>10 kb read joins 17 contigs


Example from Gbase genome assembly project

Results from varying spin speed with g-TUBE fragmentation using the Eppendorf
MiniSpin plus. The lower the speed, the larger the fragment size, but also the more likely
the sample will remain in the upper reservoir and be lost or not sheared.

Converting to SMRTbell libraries: large DNA fragments are fragile
[Figure: gel, lanes 1-3; Shear = 22.1 kbp]

[Figure panels: 2 kb lambda library; 11 kb plasmidbell]
120 minute movies maximize the number of 10-20 kb reads.
2 x 55 minute movies maximize the total number of reads and Mb per sample.
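The statistics reported for the two collection modes (average, 95th percentile, maximum) can be computed from a list of read lengths. The following is a minimal Python sketch. It uses the nearest-rank percentile definition, which is an assumption because the poster does not state which percentile convention was used. The toy read set is not poster data.

```python
from typing import Dict, Sequence


def read_length_summary(lengths: Sequence[int]) -> Dict[str, float]:
    """Average, 95th percentile (nearest-rank method), and maximum read length."""
    if not lengths:
        raise ValueError("no reads")
    ordered = sorted(lengths)
    n = len(ordered)
    rank = (95 * n + 99) // 100  # ceil(0.95 * n) in exact integer arithmetic, 1-based
    return {
        "average": sum(ordered) / n,
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    toy_reads = list(range(1_000, 21_000, 1_000))  # 1 kb .. 20 kb, toy data
    print(read_length_summary(toy_reads))
```

The 95th percentile and maximum respond to the long tail that 120 minute movies extend, while the average shifts much less. This is consistent with the small difference in averages between the two movie modes.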

Stage start for longer subread lengths


Sequencing the 9,749 bp HIV genome
[Figure: Coverage across the HIV genome with cell prep station start vs. stage start]
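The HIV comparison is a per-base coverage question: which positions of the 9,749 bp genome are spanned by at least one read. The following is a minimal Python sketch that uses a difference array over 0-based, half-open alignment intervals. The input format and function names are hypothetical. The example reads are invented to mimic the observation, in this section's coverage caption, that cell prep station start leaves the first and last 1,000 bases uncovered.

```python
from typing import List, Sequence, Tuple


def coverage_depth(genome_len: int, alignments: Sequence[Tuple[int, int]]) -> List[int]:
    """Per-base read depth from 0-based, half-open (start, end) alignment intervals."""
    diff = [0] * (genome_len + 1)
    for start, end in alignments:
        start, end = max(0, start), min(genome_len, end)  # clip to the genome
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for i in range(genome_len):
        running += diff[i]
        depth.append(running)
    return depth


def fraction_covered(depth: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of positions covered by at least min_depth reads."""
    return sum(d >= min_depth for d in depth) / len(depth)


if __name__ == "__main__":
    genome_len = 9_749  # HIV genome length from the poster
    # Hypothetical reads that all miss the first and last 1,000 bases
    reads = [(1_000, 6_000), (3_000, 8_749), (2_500, 7_000)]
    depth = coverage_depth(genome_len, reads)
    print(f"{fraction_covered(depth):.3f} of the genome covered")
    # A single read spanning the whole genome covers every position
    print(fraction_covered(coverage_depth(genome_len, [(0, genome_len)])))
```

With 1,000 bases missing at each end, only about 79% of this small genome is covered. Extending reads to the genome ends, as stage start does, closes that gap.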

Very long inserts can join regions of long repeats, greatly improving problematic assemblies.
For more information on assembly methods, see poster P0998, "Towards Finished Genome
Assemblies using SMRT Sequencing."

Library = 16.1 kbp

Samples:
1. Input E. coli K12 gDNA
2. Sheared E. coli K12 gDNA
3. E. coli K12 SMRTbell Library

Fragment size decreases after shearing due to handling during library prep; gentle handling helps but does not
eliminate this issue.

Left, cell prep station start excludes the first and last 1,000 bases.
Right, stage start extends the coverage range nearly to the ends of the genome. Along with XL
polymerase and 120 minute movies, the entire genome can be covered in a single read.

Conclusion

Recent improvements in SMRT Sequencing provide a wide
range of options, including the capability to sequence over 10 kb
fragments in a single read, enabling the sequencing community to
answer biological questions at a level never before possible.
Acknowledgements
The authors would like to thank Jonathan Bingham, Kathryn Keho, Wendy Wise, Jenny Gu, and the
many contributors in the PacBio community, including CSHL, UC Davis, and U Washington.

Pacific Biosciences, PacBio, SMRT, SMRTbell and the Pacific Biosciences logo are trademarks of Pacific Biosciences of California, Inc. All other trademarks are the property of their respective owners. © 2013 Pacific Biosciences of California, Inc. All rights reserved.
