Dr Z Chikwambi
Biotechnology
Objectives
• What is DNA Sequencing?
• History of development
[Figure: 5′ ³²P end-labelled fragment 5′-GACGTGCAACGAA-3′]
Chemical Modification and Cleavage
• Base modification using dimethyl sulphate (DMS)
– Purines: adenine, guanine
– DMS only: G
– DMS + formic acid: G + A
[Figure: sequencing gel with lanes DMS (G), FA = DMS + formic acid (G + A), H = hydrazine (C + T) and H+S = hydrazine + NaCl (C); labelled fragment 5′ ³²P-GACGTGCAACGA-3′]
Maxam-Gilbert Sequencing
[Figure: gel with lanes G, G+A, T+C and C; shortest fragments at the bottom (5′ end), longer fragments toward the top (3′ end); labelled fragment 5′ ³²P-GACGTGCAACGA-3′]
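The logic for reading a Maxam-Gilbert gel can be sketched in Python: a band in both G and G+A means G, a band only in G+A means A, and likewise C versus T for the two pyrimidine lanes. This is an idealised simulation that indexes each band by the position of the cleaved base; the T+C (hydrazine) and C (hydrazine + NaCl) lane rules come from standard Maxam-Gilbert chemistry rather than from the slides, which describe only the purine reactions.

```python
# Bases cleaved in each Maxam-Gilbert reaction lane.
# G and G+A follow the slide (DMS; DMS + formic acid); T+C (hydrazine) and
# C (hydrazine + NaCl) follow standard Maxam-Gilbert chemistry.
REACTIONS = {
    "G": {"G"},
    "G+A": {"G", "A"},
    "T+C": {"T", "C"},
    "C": {"C"},
}


def simulate_lanes(seq):
    """Band positions (1-based, counted from the labelled 5' end) per lane.

    Idealised: every susceptible base is cleaved once somewhere in the
    sample, and a band is indexed by the position of the cleaved base.
    """
    return {
        lane: {i for i, base in enumerate(seq, start=1) if base in bases}
        for lane, bases in REACTIONS.items()
    }


def call_bases(lanes, length):
    """Read the gel from the shortest fragment (5' end) upward."""
    calls = []
    for pos in range(1, length + 1):
        if pos in lanes["G"]:
            calls.append("G")  # band in G (and G+A)
        elif pos in lanes["G+A"]:
            calls.append("A")  # band in G+A only
        elif pos in lanes["C"]:
            calls.append("C")  # band in C (and T+C)
        elif pos in lanes["T+C"]:
            calls.append("T")  # band in T+C only
        else:
            calls.append("N")  # no band: position unreadable
    return "".join(calls)


fragment = "GACGTGCAACGA"  # labelled strand from the slide, 5' -> 3'
print(call_bases(simulate_lanes(fragment), len(fragment)))  # GACGTGCAACGA
```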
Maxam-Gilbert Sequencing: Process Summarized
Chain Termination Sequencing
Products of each reaction (four dNTPs plus one ddNTP):
• ddCTP: dAdGddC, dAdGdCdTdGddC, dAdGdCdTdGdCddC, dAdGdCdTdGdCdCddC
• ddGTP: dAddG, dAdGdCdTddG, dAdGdCdTdGdCdCdCddG
• ddTTP: dAdGdCddT
• Fully extended strand: dAdGdCdTdGdCdCdCdG
[Figure: gel with lanes G, A, T and C (ddG-terminated fragments marked); shorter fragments at the bottom (5′), longer fragments at the top (3′)]
Template 3′-ATGTGCTAGCT-5′; products of each chain-extension reaction:
• ddATP: 5′-TA-3′, 5′-TACA-3′, 5′-TACACGA-3′, 5′-TACACGATCGA-3′
• ddTTP: 5′-T-3′, 5′-TACACGAT-3′
• ddGTP: 5′-TACACG-3′, 5′-TACACGATCG-3′
• ddCTP: 5′-TAC-3′, 5′-TACAC-3′, 5′-TACACGATC-3′
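These product lists follow one rule: in each reaction, synthesis stops wherever the chosen ddNTP is incorporated opposite its complementary template base. A minimal Python sketch, assuming no primer (synthesis simply starts at the first template base), regenerates the four lists for template 3′-ATGTGCTAGCT-5′; the function names are illustrative.

```python
# Watson-Crick pairing used to build the new strand.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template_3to5):
    """New strand (5'->3') copied from a template written 3'->5'.

    Simplification: synthesis starts at the first template base (no primer).
    """
    return "".join(COMPLEMENT[base] for base in template_3to5)


def terminated_products(template_3to5, dd_base):
    """Products of one reaction: four dNTPs plus dd<dd_base>TP.

    Each product ends where the dideoxy analogue of dd_base was incorporated,
    because a ddNTP has no 3'-OH for the polymerase to extend from.
    """
    strand = synthesized_strand(template_3to5)
    return [strand[: i + 1] for i, base in enumerate(strand) if base == dd_base]


template = "ATGTGCTAGCT"  # 3'->5', as on the slide
for dd in "ATGC":
    print(f"dd{dd}TP:", ", ".join(terminated_products(template, dd)))
```

Each printed line should match the corresponding reaction listed above.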
Reading Sequence
[Figure: gel with lanes ddATP, ddTTP, ddGTP and ddCTP and bands from 1 bp to 12 bp; the sequence is read from the smallest band (5′) up to the largest (3′)]
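Reading the gel amounts to sorting all bands by size and noting which lane each one came from. In the sketch below, the band sizes are taken from the fragments listed for template 3′-ATGTGCTAGCT-5′ (1 bp is the shortest fragment, at the 5′ end); a size with no band is reported as N, and two lanes with a band at the same size are treated as an error. The function name and input layout are illustrative assumptions.

```python
def read_ladder(bands):
    """bands maps each lane's base ('A', 'T', 'G', 'C') to the fragment sizes
    (bp) seen in that lane. Returns the sequence read 5'->3', smallest first."""
    by_size = {}
    for base, sizes in bands.items():
        for size in sizes:
            if size in by_size:
                raise ValueError(f"two lanes have a band at {size} bp")
            by_size[size] = base
    # Walk up the ladder one base at a time; gaps become N.
    return "".join(by_size.get(size, "N") for size in range(1, max(by_size) + 1))


bands = {"A": [2, 4, 7, 11], "T": [1, 8], "G": [6, 10], "C": [3, 5, 9]}
print(read_ladder(bands))  # TACACGATCGA
```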
Sanger Sequencing: Process Summarized
[Figure: DNA fragment cloned into the vector pBR322 (Amp and Tet resistance markers) and sequenced from a primer]
Capillary electrophoresis (CE) separation has many … expensive technologies for DNA sequencing.
[Figure: CE detector optics with beam block, collection lens and PMT]
Sequence assembly
Reads (5′ → 3′):
CGATGCGTAGCA
ATCGATGCGTAGC
TAGCAGACTACCGTT
GTTACGATGCCTT
Assembled sequence: ATCGATGCGTAGCAGACTACCGTTACGATGCCTT…
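This assembly can be reproduced with a greedy overlap-merge, one of the simplest assembly strategies. The slide does not say which method produced its result, and production assemblers rely on overlap or de Bruijn graphs instead. The sketch repeatedly merges the pair of reads with the longest suffix-prefix overlap; the minimum overlap of 3 bases is an arbitrary assumption.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Greedily merge reads by best overlap; returns the resulting contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads wholly contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: remaining contigs stay separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for n, r in enumerate(reads) if n not in (best_i, best_j)]
        reads.append(merged)
    return reads


reads = ["CGATGCGTAGCA", "ATCGATGCGTAGC", "TAGCAGACTACCGTT", "GTTACGATGCCTT"]
print(greedy_assemble(reads))  # ['ATCGATGCGTAGCAGACTACCGTTACGATGCCTT']
```

Greedy merging is easy to follow but can mis-join reads across repeats, which is one reason graph-based assemblers are preferred for real genomes.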
High Throughput DNA Sequencing: Next Generation Sequencing (NGS) Technologies
Available Next-generation Sequencing Platforms
• Solexa, Illumina
• SOLiD, Applied Biosystems (ABI)
• 454, Roche
• Polonator
• HeliScope
• …
[Figure: 7x Illumina GA-II, 2x Roche 454, 1x Illumina HiSeq 2000]
DNA Sequencing Capability Has Grown Exponentially
Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135-1145 (2008)
Applications of Next-generation Sequencing
Third Generation Technologies
(a) The Roche 454 and ABI SOLiD platforms rely on emulsion PCR to amplify clonal sequencing features. An adaptor-flanked shotgun library is PCR-amplified in a water-in-oil emulsion. One of the PCR primers is 5′-anchored to the surface of micron-scale beads, so the amplicons are held on the bead surface as clonal copies. 454 detects a light signal produced by luciferase when pyrophosphate is released on dNTP incorporation (pyrosequencing).
(b) The Solexa technology relies on bridge PCR (aka 'cluster PCR') to amplify clonal DNA. An adaptor-flanked shotgun library is PCR-amplified using forward and reverse primers that densely coat the surface of a solid substrate, attached at their 5′ ends by a flexible linker. Amplification products originating from any given member of the template library form a clonal cluster (approx. 1,000 copies).
NGS Principles: Emulsion PCR
Richard K. Wilson
Example: Illumina/Solexa Sequencing
• Polymorphism detection
» Is a variant a true mutation or a sequencing error?
» e.g. SNPs
Read quality deteriorates with cycle number, so sequences can be filtered on that basis using Phred quality scores.
NGS Data Analysis Principles
Sequence Quality: Phred Scores
• Phred score Q = -10 log10(probability of error); e.g. Q = 30 means a 1 in 1,000 chance that the base call is wrong
• GeneTool/ChromaTool/Sequencher (PC/Mac)
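As a worked version of the Phred formula, the sketch converts between error probability and quality score, decodes a FASTQ quality string and trims a read at the first low-quality base. The Phred+33 offset and the Q20 cut-off are assumptions (common defaults, not values from the slides), and trimming at the first bad base is only one simple way to handle quality decay across cycles.

```python
import math


def phred_from_error(p_error):
    """Phred quality Q = -10 * log10(P(error))."""
    return -10 * math.log10(p_error)


def error_from_phred(q):
    """Inverse mapping: P(error) = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def decode_qualities(quality_string, offset=33):
    """Decode a FASTQ quality string (assumes the Phred+33 encoding)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_at_low_quality(seq, quals, min_q=20):
    """Keep the read only up to the first base with quality below min_q."""
    for i, q in enumerate(quals):
        if q < min_q:
            return seq[:i]
    return seq


quals = decode_qualities("II5+#")
print(quals)                                # [40, 40, 20, 10, 2]
print(round(phred_from_error(0.001)))       # 30
print(trim_at_low_quality("ACGTA", quals))  # ACG
```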
Assembly: Read Length & Pairing
• Short reads are problematic because they often do not map uniquely to the genome.
– Solution #1:
» Get longer reads.
– Solution #2:
» Get paired-end reads.
• The term 'paired ends' refers to the two ends of the same DNA molecule.
– One end is sequenced, then the molecule is turned around and the other end is sequenced.
– The two resulting sequences are 'paired-end reads'.
» Sometimes they're called 'mate pairs'
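A toy check of why short reads map ambiguously: the sketch counts exact matches of a read in a made-up genome that contains two copies of a repeat. A read lying inside the repeat matches twice, while a longer read that reaches into unique flanking sequence matches once. Exact string matching stands in for a real read aligner, and the genome sequence is hypothetical.

```python
def placements(genome, read):
    """All 0-based start positions where read occurs exactly in genome."""
    return [i for i in range(len(genome) - len(read) + 1)
            if genome.startswith(read, i)]


# Made-up genome: unique flanks around two copies of the same repeat.
repeat = "ACGTTGCA"
genome = "TTAGC" + repeat + "GGCAT" + repeat + "CCTAA"

short_read = "ACGTTG"     # 6 bp, lies entirely inside the repeat
long_read = genome[3:14]  # 11 bp, spans the repeat plus unique flanks

print(placements(genome, short_read))  # [5, 18]  (ambiguous)
print(placements(genome, long_read))   # [3]      (unique)
```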
Paired End Reads are Important!
[Figure: Read 1 and Read 2 separated by a known distance; one read lies in repetitive DNA, its mate in unique DNA]
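The point of the figure can be sketched under simplifying assumptions: both reads are given on the forward strand (real paired-end reads come from opposite strands), the distance between their start positions is known exactly, and the genome is a made-up sequence with a duplicated repeat. Read 1 alone hits both repeat copies; requiring Read 2 to sit at the known distance keeps only the placement consistent with the unique DNA. The function names are illustrative.

```python
def occurrences(genome, read):
    """All 0-based start positions of an exact match of read in genome."""
    return [i for i in range(len(genome) - len(read) + 1)
            if genome.startswith(read, i)]


def place_pair(genome, read1, read2, distance):
    """Placements (i, j) where read2 starts exactly `distance` bases after read1."""
    hits2 = set(occurrences(genome, read2))
    return [(i, i + distance) for i in occurrences(genome, read1)
            if i + distance in hits2]


# Made-up genome: unique flanks around two copies of the same repeat.
repeat = "ACGTTGCA"
genome = "TTAGC" + repeat + "GGCAT" + repeat + "CCTAA"

read1 = "ACGTTG"  # repetitive: matches at 5 and 18
read2 = "CCTAA"   # unique: matches only at 26
print(occurrences(genome, read1))            # [5, 18]
print(place_pair(genome, read1, read2, 8))   # [(18, 26)]
```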
http://www.sanger.ac.uk/Software/Artemis/v10/
Genome Browsers: GBrowse
• Developed by Lincoln Stein (CSHL/Toronto)
• Very flexible
• Works with flat files, GFF databases, CHADO RDBs and, via adaptors, EMBL/GenBank files
• Users include PlasmoDB, FlyBase and WormBase.
http://gmod.org/wiki/index.php/Gbrowse
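GBrowse (and Apollo, below) read GFF files. As an illustration of that format, here is a minimal parser for a single GFF3 feature line: nine tab-separated columns, with key=value attributes separated by semicolons. It is a sketch that ignores comment and directive lines and embedded FASTA sections; the example line is made up.

```python
from urllib.parse import unquote

# The nine tab-separated GFF3 columns, in order.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict (comment lines not handled)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])  # 1-based, inclusive
    record["end"] = int(record["end"])
    raw = record["attributes"]
    attributes = {}
    for item in ([] if raw == "." else raw.split(";")):
        if not item:
            continue
        key, _, value = item.partition("=")
        # Multiple values are comma-separated; special characters are %-escaped.
        attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    record["attributes"] = attributes
    return record


example = "ctg123\texample\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN%20gene"
print(parse_gff3_line(example)["attributes"])
# {'ID': ['gene00001'], 'Name': ['EDEN gene']}
```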
Genome Browsers: Ensembl
• Bespoke browser from the Ensembl group
• Many different display pages for a variety of data types
• Users include Ensembl, VectorBase, Gramene
http://www.ensembl.org/index.html
Genome Browsers: Apollo
Annotation platform used by the FlyBase and TAIR groups.
Developed through Berkeley/EBI/GMOD.
Can be used as a visualization tool
Capable of reading CHADO-XML and GFF files
Adaptors for CHADO RDB and Ensembl databases
http://gmod.org/wiki/index.php/Apollo
DNA sequenced: so what?
• Sequence deposition
– Into appropriate biological databases.
• Annotation
– Giving meaning to the sequences.
– Assigning function to the DNA/genome sequence.
Some computational problems
• De novo assembly
Part II:
RNA Assembly
Central dogma of molecular biology: DNA → RNA (transcription) → protein (translation)
[Figure: reads such as ATC, CAT, TCG and GAT; two alternative reconstructions, transcript 1 = s1 s3 s4 with transcript 2 = s1 s3 s5, or transcript 1 = s1 s3 s5 with transcript 2 = s1 s3 s4; segment lengths marked L-1]
Ambiguity due to inter-transcript repeats
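The ambiguity can be reproduced with a toy example. The sketch assumes error-free reads of length L taken at every position and a segment s3 of length L-1 shared by two transcripts; the sequences, and the use of a separate start segment s2 for the second transcript, are hypothetical. Because no read of length L can cover s3 together with a base on each side, swapping the segments that follow s3 leaves the read multiset unchanged; reads two bases longer than s3 can tell the two explanations apart.

```python
from collections import Counter


def reads(seq, L):
    """Multiset of all error-free reads of length L (one per start position)."""
    return Counter(seq[i:i + L] for i in range(len(seq) - L + 1))


def read_multiset(transcripts, L):
    """Combined reads from a set of transcripts (equal abundance assumed)."""
    total = Counter()
    for t in transcripts:
        total += reads(t, L)
    return total


# Hypothetical segments; s3 is the shared repeat of length L - 1 = 3.
s1, s2, s3, s4, s5 = "AAAAA", "CCCCC", "GTG", "TTTTT", "CACAC"
pair_a = [s1 + s3 + s4, s2 + s3 + s5]
pair_b = [s1 + s3 + s5, s2 + s3 + s4]

print(read_multiset(pair_a, 4) == read_multiset(pair_b, 4))  # True: ambiguous
print(read_multiset(pair_a, 5) == read_multiset(pair_b, 5))  # False: resolved
```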
• Abundance diversity
• Multi-bridging to resolve intra-transcript repeats → transcript graph, abundance estimates
• Sparsest decomposition to extract transcripts → transcriptome
Applications of DNA Sequencing
• Forensics: helps identify individuals, because each individual (apart from identical twins) has a distinct genetic sequence
• Disadvantages
– The whole genome cannot be sequenced at once
– Very slow and time-consuming
The Human Genome Project
• The biggest challenge for the life sciences