"Sanger sequencing" was developed by Fred Sanger et al.
in the mid-1970s
Uses dideoxynucleotides for "chain termination",
generating fragments of different lengths ending in
ddATP, ddGTP, ddCTP or ddTTP
http://openwetware.org/wiki/BE.109:Bio-material_engineering/Sequence_analysis
History of Sequencing Cont.
• A schematic of
Sanger sequencing
http://www.scq.ubc.ca/genome-projects-uncovering-the-blueprints-of-biology/
History of Sequencing Cont.
DNA fragments are separated by size by gel
electrophoresis
From the gel, the DNA sequence can be determined
Can produce reads 700-900 bp long (good),
but it is slow (bad)
Other problems include clone library
generation and low throughput
The Human Genome Project used Sanger
sequencing; completion took over 10 years
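The chain-termination idea from the slides above can be illustrated with a toy sketch. It assumes an idealized reaction in which every position of the newly synthesized strand yields exactly one terminated fragment; the function names and the example strand are hypothetical, chosen only for illustration.

```python
import random


def chain_termination_fragments(new_strand):
    """Return one (length, terminal base) pair per position of the strand.

    Idealized assumption: synthesis stops once at every position, so each
    fragment ends in the dideoxynucleotide incorporated at that position.
    """
    return [(length, new_strand[length - 1])
            for length in range(1, len(new_strand) + 1)]


def read_gel(fragments):
    """Order fragments by size, as gel electrophoresis does, and read
    the terminal bases from the shortest fragment to the longest."""
    return "".join(base for _, base in sorted(fragments))


strand = "GATTACAGGC"  # hypothetical example strand
fragments = chain_termination_fragments(strand)
random.shuffle(fragments)  # in the reaction tube the fragments are unordered
recovered = read_gel(fragments)
print(recovered)
```

Sorting by fragment length stands in for the size separation on the gel; reading the terminal bases in that order gives back the strand.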
Next Generation Sequencers
Sequencing platform          Template amplification method
ABI3730xl Genome Analyzer    In vivo amplification via cloning
Roche (454) FLX              Emulsion PCR
Illumina Genome Analyzer     Bridge PCR
ABI SOLiD                    Emulsion PCR
HeliScope                    None (single molecule)
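[Figure: HeliScope sequencing example. One base is offered per cycle; "-" means the template did not incorporate it.]
Cycle   Base   Template 1   Template 2   Template 3
1       G      -            -            G
2       C      C            C            -
3       A      -            A            A
4       G      G            G            -
5       T      -            -            T
6       C      C            -            -
7       A      A            A            A
Provided to author courtesy of Helicos representative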
Next Gen. Sequencers Cont.
• Sequencing-by-ligation on SOLiD
http://www.umcutrecht.nl/subsite/genetics/Research/PersonalGenomics.htm
Next Gen vs Sanger
Let's think about the domesticated silkworm
genome
The reference genome is about 432 Mb
It was assembled from approximately 8.5-fold
coverage
Platform                     Sequencing speed   Time to sequence (days)
ABI3730xl Genome Analyzer    0.03-0.07 Mb/h     2185.7
Roche (454) FLX              13 Mb/h            11.8
Illumina Genome Analyzer     25 Mb/h            6.1
ABI SOLiD                    21-28 Mb/h         5.5
Helicos HeliScope            83 Mb/h            1.8
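The time estimates in the table can be reproduced with a short calculation: total bases to sequence (genome size times coverage) divided by sequencing speed. The sketch below assumes the upper end of each quoted speed range, since that appears to match the table's figures; the variable and function names are illustrative.

```python
GENOME_SIZE_MB = 432   # silkworm reference genome size, from the slide
COVERAGE = 8.5         # fold coverage, from the slide

# Sequencing speed in Mb/h; where the slide gives a range, the upper
# end is used (an assumption that appears to reproduce the table).
SPEED_MB_PER_HOUR = {
    "ABI3730xl": 0.07,
    "Roche (454) FLX": 13,
    "Illumina Genome Analyzer": 25,
    "ABI SOLiD": 28,
    "Helicos HeliScope": 83,
}


def days_to_sequence(speed_mb_per_hour,
                     genome_size_mb=GENOME_SIZE_MB,
                     coverage=COVERAGE):
    """Days needed to generate genome_size * coverage Mb of sequence."""
    total_mb = genome_size_mb * coverage
    return total_mb / speed_mb_per_hour / 24


days = {name: round(days_to_sequence(speed), 1)
        for name, speed in SPEED_MB_PER_HOUR.items()}

for name, value in days.items():
    print(f"{name}: {value} days")
```

The calculation counts raw sequencing time only; library preparation, instrument setup, and assembly are not included.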
Bioinformatics
Novel whole genome sequencing
The Sorcerer II Global Ocean Sampling Expedition:
Northwest Atlantic through Eastern Tropical Pacific
Whole genome resequencing
Complete Resequencing of 40 Genomes Reveals
Domestication Events and Genes in Silkworm (Bombyx)
RNA-Seq (transcriptomics)
A Global View of Gene Activity and Alternative Splicing by
Deep Sequencing of the Human Transcriptome
Prisca Takundwa
NEXT-GENERATION SEQUENCING:
APPLICATIONS
• Significance:
First glimpse of the complete instruction set for
a living organism
an approximation of the minimal set of
genes required for cellular life
Insight into the methods used to come up
with these cellular genomes
Cellular Genomes
• Significance
Paved the way for other cellular genomes
such as E. coli, Saccharomyces cerevisiae,
Caenorhabditis elegans, Drosophila
melanogaster
Human Genome Project
Next-generation appeal
Metagenomics
• Some examples
• Breitbart et al. showed that 2000 liters of
seawater contained >5000 different viruses; >1000
of these were found in human stool, and the majority
of these were new species.
• Craig Venter’s Global Ocean Voyage
Genomic Medicine
• Resequencing
• Plants – Sugar beet and Tropical Evergreen
Fagaceae
• Junk DNA
• Drug discovery
Transcriptomics
Angela Benton
Background
Prakriti Mudvari
Bioinformatics of Deep
Sequencing
http://www.cbcb.umd.edu/research/viewer.jpg
The Basics.
http://www.k.u-tokyo.ac.jp/pros-e/person/shinichi_morishita/shinichi_morishita.htm
Creating a Paired End Tag
http://media.wiley.com/wires/WSBM/WSBM40/nfig001.jpg
Paired End vs. Unpaired Reads
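Platform                     Read length   Sequencing throughput
ABI3730xl Genome Analyzer    700-900 bp    0.03-0.07 Mb/h
Roche (454) FLX              200-300 bp    13 Mb/h
Illumina Genome Analyzer     32-40 bp      25 Mb/h
ABI SOLiD                    35 bp         21-28 Mb/h
Helicos HeliScope            25-35 bp      83 Mb/h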
Challenges
• Quality of data
• Storage
• Cross Platform Analysis
• Data Annotation
• Assembly
• SNP/Mutation Detection
Bioinformatics Tools
• Cross_match
• ELAND
• Exonerate
• MAQ
• Mosaik
• SHRiMP
• SOAP
• Zoom!
Short Oligonucleotide Alignment Program
(SOAP)
• ABySS
• ALLPATHS
• Edena
• Euler-SR
• SHARCGS
• SHRAP
• SSAKE
• Velvet
Assembly By Short Sequences
(ABySS)
• Originally developed for de novo assembly of large genomes
using short reads.
• Uses a distributed representation of a de Bruijn graph, which allows
the algorithm to run in parallel across a network of computers.
• Assembly is done in two steps.
• First, all possible substrings of a specific length (k-mers) are
generated from the sequence reads. The k-mer data set is then processed
to remove errors, and contiguous sequences (contigs) are built without
using paired-end information.
• Second, mate-pair information is used to extend the contigs.
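The first assembly step described above can be sketched on a toy scale. This is not ABySS code: it is a single-machine illustration that assumes error-free k-mers are seen at least twice, ignores reverse complements, and omits the mate-pair extension step. All names, the reads, and the `min_count` threshold are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_contigs(kmer_set):
    """Walk unambiguous paths of an implicit de Bruijn graph.

    Two k-mers are linked when the last k-1 bases of one equal the
    first k-1 bases of the other. Purely circular paths have no start
    k-mer and are skipped in this simplified version.
    """
    def successors(kmer):
        return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in kmer_set]

    def predecessors(kmer):
        return [b + kmer[:-1] for b in "ACGT" if b + kmer[:-1] in kmer_set]

    contigs, used = [], set()
    for kmer in sorted(kmer_set):
        if kmer in used:
            continue
        preds = predecessors(kmer)
        # Skip k-mers that sit in the middle of an unambiguous path.
        if len(preds) == 1 and len(successors(preds[0])) == 1:
            continue
        contig, current = kmer, kmer
        used.add(kmer)
        while True:
            succ = successors(current)
            if len(succ) != 1:
                break
            nxt = succ[0]
            if len(predecessors(nxt)) != 1 or nxt in used:
                break
            contig += nxt[-1]  # each step adds one new base
            used.add(nxt)
            current = nxt
        contigs.append(contig)
    return contigs


# Toy reads from the sequence ATGGCGTGCA, each seen twice,
# plus one read carrying a sequencing error (ATGGAGT).
reads = ["ATGGCGT", "GGCGTGC", "CGTGCA"] * 2 + ["ATGGAGT"]
counts = count_kmers(reads, k=4)

# Error removal: drop k-mers observed fewer than min_count times.
min_count = 2
solid_kmers = {kmer for kmer, n in counts.items() if n >= min_count}

contigs = build_contigs(solid_kmers)
print(contigs)
```

Filtering low-count k-mers before building contigs removes the branch that the erroneous read would otherwise introduce into the graph.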
Assembly By Short Sequences
(ABySS)
http://stat.fsu.edu/~lilei/lilei/research/hmm/simulate.gif
Basecalling
• PyroBayes
• Alta-Cyclic
• BayesCall
Single Nucleotide Polymorphisms
(SNP) Detection
• PbShort
• ssahaSNP
Other Tools
Thank you!
Questions?