
DNA Sequencing: Present Status and Future Challenges


Elaine Mardis
Washington University Genome
Sequencing Center

Genome Sequence: Present Workflow


Genomic DNA
BAC/fosmid library
Dual end sequencing

Restriction digest
fingerprinting

Physical map generation

Plasmid library (3 kb)


Production sequencing
pipeline

WGS assembly using ARACHNE algorithm to generate contigs and supercontigs
Concordance
Finishing

BAC Fingerprinting: Gel-based Fragment Separation


96 samples, 25 marker lanes
Marker every fifth lane
29,950 bp

HindIII
Restriction
Digestion

560 bp
1% agarose; 8 hours, 140 volts at 14 °C
Marra et al., Genome Res., 7, 1072-1084 (1997)

Contig assembly: physical map


Software (Image or Bandleader) is used to identify overlapping clones with common restriction fragments, which are then assembled into a contig (FPC)
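
The overlap test described above can be sketched in a few lines of Python. This is only an illustration of the idea of "clones with common restriction fragments": the size tolerance, the minimum number of shared fragments, the clone names and the fragment sizes are all hypothetical, and a raw count of shared bands stands in for whatever scoring the actual software applies.

```python
def shared_fragments(bands_a, bands_b, tolerance=7):
    """Count fragments of clone A that match a fragment of clone B.

    Two fragments "match" when their sizes (in bp) differ by no more
    than `tolerance`. Both the tolerance and the matching rule are
    assumptions made for this sketch.
    """
    a = sorted(bands_a)
    b = sorted(bands_b)
    i = j = shared = 0
    # Walk both sorted size lists in parallel, pairing each band at most once.
    while i < len(a) and j < len(b):
        diff = a[i] - b[j]
        if abs(diff) <= tolerance:
            shared += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1
        else:
            j += 1
    return shared


def overlapping_pairs(fingerprints, min_shared=3, tolerance=7):
    """Return (clone, clone, shared_count) for pairs that look overlapping."""
    names = sorted(fingerprints)
    pairs = []
    for x in range(len(names)):
        for y in range(x + 1, len(names)):
            n = shared_fragments(
                fingerprints[names[x]], fingerprints[names[y]], tolerance
            )
            if n >= min_shared:
                pairs.append((names[x], names[y], n))
    return pairs


# Hypothetical HindIII fingerprints (fragment sizes in bp).
fingerprints = {
    "cloneA": [560, 1200, 3400, 8100, 15000],
    "cloneB": [1203, 3398, 8104, 22000],
    "cloneC": [700, 2500, 29950],
}
print(overlapping_pairs(fingerprints))
```

Candidate pairs found this way would still need to be ordered and merged to form a contig; that step is not shown here.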

Clone


Sequence data assembly: Supercontig creation and gap filling

(A) A supercontig is constructed by successively linking pairs of contigs that share at least two
forward-reverse links. Here, three contigs are joined into one supercontig. (B) ARACHNE
attempts to fill gaps by using paths of contigs. The first gap in the supercontig shown here is
filled with one contig, and the second gap is filled by a path consisting of two contigs.
Genome Research 12: 177-189 (2002)
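
The linking rule in (A) can be illustrated with a minimal Python sketch. It is not ARACHNE's implementation: it only groups contigs that share at least two forward-reverse links, and it ignores contig order, orientation and insert-size constraints. The contig names and link list are hypothetical.

```python
from collections import Counter


def build_supercontigs(contigs, links, min_links=2):
    """Group contigs joined by at least `min_links` forward-reverse links.

    `links` is a list of (contig, contig) pairs, one per read pair whose
    two ends landed in different contigs. A union-find structure merges
    contigs that pass the threshold.
    """
    parent = {c: c for c in contigs}

    def find(c):
        # Follow parent pointers to the group representative,
        # shortening the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    # Count links per unordered contig pair.
    counts = Counter(tuple(sorted(pair)) for pair in links)
    for (a, b), n in counts.items():
        if a != b and n >= min_links:
            parent[find(a)] = find(b)

    groups = {}
    for c in contigs:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical input: c1-c2 and c2-c3 each share two links, c3-c4 only one.
contigs = ["c1", "c2", "c3", "c4"]
links = [("c1", "c2"), ("c2", "c1"), ("c2", "c3"), ("c2", "c3"), ("c3", "c4")]
supercontigs = build_supercontigs(contigs, links)
print(supercontigs)
```

With this input, three contigs are joined into one supercontig and the fourth stays on its own because a single link falls below the threshold.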

Whole genome map assembly

Genome map

Edit contigs and align to map. Gaps between clones can be filled with other clones, such as fosmids, or by generating PCR products from BAC clones or genomic DNA.

Current GSC Production Workflow


picking
Qpix

prepping
PlateTrak
&DNATraks

sequencing
Biomek FX

Each process is documented by barcode entry into our Oracle database
QC checks are used to assay quality at each step in the pipeline

detection
PE 3700/3730

data transfer

Qpix picking robot

PlateTrak 1 & 2 Robots

Biomek FX robot

ABI 3700 Sequencer


Enhanced sensitivity relative to gel-based systems
Capillary-based separation of samples eliminates gel pouring, gel loading, lane tracking
Requires large volumes of buffer, polymer per run
Moving parts (robot, sheath flow) increase required maintenance and impact downtime
Sheath flow detection limits sensitivity, laser illumination scheme causes beam dispersion across sheath flow

New generation instrument

ABI 3730 xl DNA Analyzer

In-capillary detection by fixed laser eliminates L>R fade and sheath flow, improves sensitivity
Direct load from reaction plate eliminates robotic volume transfer, decreases minimal load volume
Increased plate capacity, decreased buffer/polymer demand and automated plate handling decrease operator intervention

Improved results with lower template input
0.5X bead input vs. 1X bead input

[Chart: cumulative phred 20 bases (axis 0 to 70,000) for four series: 2ML sequenced on 3700, 4ML sequenced on 3700, 2ML sequenced on 3730, 4ML sequenced on 3730]
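
For readers unfamiliar with the metric in this chart, a phred quality value Q corresponds to an estimated base-call error probability P by Q = -10 log10(P), so a "phred 20 base" has an estimated error rate of at most 1 in 100. The Python sketch below shows one way a cumulative phred 20 count could be tallied across reads; the quality values are made up and the function names are hypothetical.

```python
def error_probability(q):
    """Estimated error probability for a phred quality value q."""
    return 10 ** (-q / 10)


def count_q20_bases(quals, threshold=20):
    """Number of bases in one read with quality at or above the threshold."""
    return sum(1 for q in quals if q >= threshold)


def cumulative_q20(reads):
    """Running total of phred 20 bases over a list of reads."""
    totals = []
    running = 0
    for quals in reads:
        running += count_q20_bases(quals)
        totals.append(running)
    return totals


# Hypothetical per-base quality values for three short reads.
reads = [[10, 25, 30, 40], [20, 19, 35], [5, 8, 12]]
print(cumulative_q20(reads))
```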

Issue: Large clone end sequencing
Due to lower sensitivity, end-sequencing of BAC and fosmid
clones was not robust on the 3700.
To achieve reliable results, we have utilized the ABI 3100s
in a specialty group approach:
- requires 1/4th x BDT reactions
- requires ~100 cycles in the thermal cycler
- lower throughput capability

However, the increasing emphasis on large clone linkage for the WGS approach requires higher throughput and lower cost for these templates

High-throughput sequencing
(c. 2002)

GSC produces 2.6 M reads monthly


Plasmid template preps by robotic SPRI
Sequencing reactions in 384 well/Biomek FX
Loading 120 ABI 3700s
Combined WGS plasmid, fosmid and BAC end reads with a physical map reference is becoming the strategy of choice for de novo genome sequencing
Our recent introduction of 30 x 3730 instruments will increase read capacity to 3.2 M reads monthly, and allow us to efficiently and more cheaply end sequence large clone types such as fosmids and BACs.
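
The capacity figures quoted above can be put in proportion with a little arithmetic. The sketch below uses only the numbers on this slide; the per-instrument average assumes, purely for illustration, that all 2.6 M current reads come from the 120 ABI 3700s, which the slide does not state.

```python
reads_now = 2_600_000        # current monthly reads
reads_projected = 3_200_000  # projected monthly reads after adding the 3730s
n_3700 = 120                 # ABI 3700 instruments being loaded

# Absolute and relative increase in monthly read capacity.
increase = reads_projected - reads_now
percent_increase = 100 * increase / reads_now

# Assumption for illustration only: every current read comes from a 3700.
avg_reads_per_3700 = reads_now / n_3700

print(f"Increase: {increase:,} reads/month ({percent_increase:.1f}%)")
print(f"Average per 3700 (assumed): {avg_reads_per_3700:,.0f} reads/month")
```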

What are the future challenges to high-throughput genome sequencing?
1. Most cost decreases have been incremental, rather than monumental. Large cost decreases will require a revolutionary approach to detection, perhaps not based on light.
2. There is a fundamental disconnect between the sample size produced by current prepping and sequencing processes, and the sensitivity of current instrumentation for detection/analysis.
3. There is a need for additional fluor combinations to enable reaction multiplexing.

What are the current trends in DNA sequencing?
Re-sequencing of the human genome is becoming a key approach toward understanding certain diseases
Characterizing the genetic differences between affected vs. unaffected individuals
Characterizing the genetic differences between diseased
vs. normal cells
Developing diagnostic/prognostic assays for disease

What are the technical challenges of re-sequencing human samples?

Limited quantities of samples
Large sample numbers w/multiple analyses
Critical need to avoid sample mix-ups/QA
Ultimately: instrumentation and methods that reduce cost per reaction to well below current costs and require little/no hands-on sample manipulation
Informatics tools to assemble and analyze data intelligently and correctly (!)
Database tools/features to combine different data types in a meaningful way that aids interpretation

General approach
Design exon- and/or intron-specific PCR primers

Annotated
human sequence
from Ensembl
PCR amplification

DNA sequencing

- lowered emphasis on read length, increased emphasis on speed of fragment separation and analysis

Re-sequencing: Data pipeline


Sequence
Sequence each end
of the PCR fragment

Phred
Base-calling
Quality determination

Phrap
Sequence alignment
Final quality determination

PolyPhred
Mutation/polymorphism
detection

Consed

Sequence viewing
Mutation/polymorphism
tagging

Analysis

ORACLE Database
Mutation data
laboratory tracking data
gene feature data

Laboratory Workflow
Web interface to database

(Courtesy of D. Nickerson)

Interactive Visual Tools


Data Quality Checking

(Courtesy of D. Nickerson)

Challenges for Re-Sequencing Data Analysis
1. Need improved signal processing software for traces
- better background subtraction to eliminate false
positives in detecting sequence differences
2. Need improved software for detecting differences
between aligned sequences
- less manual review of traces and alignments
- more analytical view of results/output
3. Statistical packages that help make sense of
re-sequencing data in the context of genetics,
probability, mutation rates, prognosis/outcome, etc.
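
To make point 2 concrete, here is a deliberately naive Python sketch of detecting differences between two aligned sequences, with a base-quality cutoff used to suppress some false positives. It is not how PolyPhred or any tool named in this talk works: the function name, the quality cutoff and the example sequences are hypothetical, and real tools also have to handle heterozygous positions and indels.

```python
def find_differences(ref, sample, quals, min_quality=30):
    """List positions where an aligned sample read differs from the reference.

    `ref` and `sample` must already be aligned and of equal length;
    `quals` holds one phred quality value per sample base. Mismatches
    at low-quality positions are skipped rather than reported.
    """
    if not (len(ref) == len(sample) == len(quals)):
        raise ValueError("ref, sample and quals must have the same length")
    diffs = []
    for pos, (r, s, q) in enumerate(zip(ref, sample, quals)):
        if r != s and q >= min_quality:
            diffs.append((pos, r, s))
    return diffs


# Hypothetical aligned sequences: two mismatches, one at a low-quality base.
ref = "ACGTACGT"
sample = "ACGAACCT"
quals = [40, 40, 40, 35, 40, 40, 12, 40]
print(find_differences(ref, sample, quals))
```

Only the mismatch at the well-supported position is reported; the one at the low-quality base is left for manual review or discarded, which is the trade-off between false positives and missed differences that the list above points to.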

Trace data examination

Data Organization and Visualization

Vg software tool is used to cluster and visualize data from re-sequencing of the same genomic regions of multiple individuals

Acknowledgements
GSC
- Matt Hickenbotham
- Jim Eldred
- Darren O'Brien
- Tom Erb
- Joe Strong
- Lisa Cook
- Donald Williams
- Nathan Sander
- Josh Conyers
- Todd Carter
- Lliam Christy
- Pat Minx

- Rick Wilson
- John McPherson
- Bob Waterston

University of Washington
- Debbie Nickerson
