
DOCTORAL SEMINAR-I

COMPARATIVE DNA
SEQUENCING ANALYSES FOR
FEASIBILITIES
Presented by
Jajati Keshari Nayak
PhD 1st year
Central Dogma of Life

Gene → (Transcription) → mRNA → (Translation) → Protein
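The Gene → mRNA → Protein flow can be sketched in a few lines. This is a minimal illustration, assuming the input is the coding strand of a gene and that translation starts at its first base; the toy gene and the function names `transcribe` and `translate` are hypothetical, not part of the seminar.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code keyed by DNA codon; "*" marks a stop codon.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGGCCTGGTAA"  # hypothetical toy gene, not from the slides
mrna = transcribe(gene)
print(mrna, translate(mrna))
```

Real genes need promoter and start-codon detection, intron removal and strand handling, none of which this sketch attempts.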
Why?
 Study genetic variation
 Source & effect
 Comparative genomics
 Similarity, syntenic genes/allele resource, clinical studies
 Evolutionary insight
 Variation among species and course of evolution …
 Structure and function of genomes
 Understanding the ‘rules of nature’
SEQUENCING GENOME
Strategy

Libraries

Sequencing

Assembly

Annotation
Release

TIME MONEY
Strategies for sequencing
Organism:
• Genome complexity
• Dispersed repetitive sequences
• Telomeres & centromeres
• Size & GC content
• Resources & understanding
• Volume of data
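GC content, one of the organism-level factors listed above, is simple to compute. The sketch below is only an illustration: the function name `gc_content` and the toy sequence are assumptions, and ambiguous bases such as N are simply ignored.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of called bases (A/C/G/T) that are G or C."""
    called = [base for base in seq.upper() if base in "ACGT"]
    if not called:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(base in "GC" for base in called)
    return gc / len(called)


toy = "ACGTGACTGAGGACCG"  # toy sequence for illustration only
print(f"GC content: {gc_content(toy):.1%}")
```

For a whole genome the same function could be applied in sliding windows to find GC-rich or GC-poor regions that may sequence unevenly.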


Genome Sequencing

[Figure: Genome → short fragments of DNA → short DNA sequences → sequenced genome]
Example reads: ACGTGGTAA, GTAATGGCG, TGGCGTATA, CGTATACAC, CACCCTTAG, TAGGCCATA
Assembled from their overlaps: ACGTGGTAATGGCGTATACACCCTTAGGCCATA
Basics of DNA Synthesis

The absolute requirement of a free 3’-OH for the addition of a new nucleotide to the
growing chain (with release of PPi) is the basis of Sanger’s method.
How we obtain the sequence of
nucleotides of a species
 1st Generation DNA Sequencing (1977)
• Maxam & Gilbert – Chain degradation, chemical method
• Sanger – Chain termination, enzymatic method

• Both methods were manual and time-consuming.
• The Maxam & Gilbert method used chemical modification and was not suitable for automation.
• Sanger’s method used enzymes and was suitable for automation.
Dideoxy (Sanger) Method

• ddNTP: a 2’,3’-dideoxynucleotide has no 3’ hydroxyl group available to form a phosphodiester bond
• Terminates the chain when incorporated
• Add enough ddNTP that termination occurs randomly at every position of the corresponding base
SANGER SEQUENCING
Template (3’ → 5’):
A C G C G C C G G G T C A G A A C C C G A T C G C G

The primer is extended 5’ → 3’ along the template; each product ends where a ddNTP was incorporated:
5’ T G C G C G G C C C A G 12 bp
5’ T G C G C G G C C C A G T C T T 16 bp
5’ T G C G C G G C C C A G T C T T G G G C 20 bp
5’ T G C G C G G C C C A G T C T T G G G C T 21 bp
5’ T G C G C G G C C C A G T C T T G G G C T A 22 bp
5’ T G C G C G G C C C A G T C T T G G G C T A G C G C 26 bp
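The chain-termination idea can be sketched as follows: generate every ddNTP-terminated product, sort the products by length, and read the terminal base of each. This is only an illustration. The template is the one shown on the slide, but the primer length of 11 is an assumption (the slides do not state it), and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesize(template_3to5: str) -> str:
    """Return the full-length new strand (5'->3') complementary to the template."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def termination_ladder(template_3to5: str, primer_len: int) -> list[str]:
    """Return one terminated product for every position beyond the primer."""
    full = synthesize(template_3to5)
    return [full[:n] for n in range(primer_len + 1, len(full) + 1)]


def read_ladder(fragments: list[str]) -> str:
    """Read the sequence from the terminal base of each fragment, shortest first."""
    return "".join(frag[-1] for frag in sorted(fragments, key=len))


template = "ACGCGCCGGGTCAGAACCCGATCGCG"  # 3'->5', as on the slide
ladder = termination_ladder(template, primer_len=11)  # primer length is assumed
for frag in ladder:
    print(f"{len(frag):2d} bp  ends in dd{frag[-1]}")
print("Read beyond the primer:", read_ladder(ladder))
```

Sorting by length plays the role of electrophoresis here; the 12, 16, 20, 21, 22 and 26 bp products from the slides are six members of this ladder.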
Automated Sanger’s Sequencing & Data
Documentation/ Interpretation
Sanger’s Sequencing
 Advantages
 Long reads (~900 bp)
 Suitable for small projects
 No slab gel needed when automated (capillary electrophoresis)

 Disadvantages
 Low throughput
 Expensive

2nd Generation: Pyrosequencing
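 Basic idea:
 Visible light is generated and is proportional to the
number of incorporated nucleotides
 DNA polymerase I (from E. coli) incorporates the nucleotide and releases pyrophosphate (PPi)
 ATP sulfurylase converts the pyrophosphate to ATP
 Luciferase (from fireflies) uses the ATP to oxidize luciferin and generate light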
2nd Generation: Pyrosequencing
 Sequencing by synthesis
 Advantages:
 Accurate
 Parallel processing
 Easily automated
 Eliminates the need for labeled primers and nucleotides
 No need for gel electrophoresis
Pyrosequencing Results:

AGGGGTCAGGTCAGTTTCAGGGGTTCAGTCAGTTCAG
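Because the light signal is proportional to the number of nucleotides incorporated, a run of identical bases gives one taller peak. The sketch below builds an idealized pyrogram for the result sequence above and reads it back. It assumes a cyclic A, C, G, T dispensation order and noise-free signals; both are simplifications, and the function names are hypothetical.

```python
def pyrogram(read: str, dispensation_order: str = "ACGT") -> list[tuple[str, int]]:
    """Return one (nucleotide, signal) pair per dispensation.

    The signal is the number of bases incorporated in that dispensation,
    standing in for the measured light intensity.
    """
    unknown = set(read) - set(dispensation_order)
    if unknown:
        raise ValueError(f"bases never dispensed: {sorted(unknown)}")
    flows = []
    pos = 0
    while pos < len(read):
        for nt in dispensation_order:
            run = 0
            while pos < len(read) and read[pos] == nt:
                run += 1
                pos += 1
            flows.append((nt, run))
            if pos == len(read):
                break
    return flows


def call_bases(flows: list[tuple[str, int]]) -> str:
    """Rebuild the sequence by repeating each nucleotide 'signal' times."""
    return "".join(nt * signal for nt, signal in flows)


result = "AGGGGTCAGGTCAGTTTCAGGGGTTCAGTCAGTTCAG"  # sequence from the slide
flows = pyrogram(result)
print(flows[:4])
print(call_bases(flows) == result)
```

In real data the signal is not an exact integer, which is why long homopolymer runs are a known source of pyrosequencing errors.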
Illumina sequencing
(reversible terminator sequencing)

Illumina uses modified dNTPs that carry a reversible terminator combined with a fluorescent dye.
SOLiD (sequencing by ligation)
Next Generation Sequencing
Polony array/ DNA Beads (454, SOLiD)

DNA beads are placed in wells; no tubes are needed; high throughput
Conventional vs 2nd generation sequencing
Available next-generation sequencing
platforms
 Illumina/Solexa – Modified terminators
 ABI SOLiD – Ligation chemistry
 Roche 454 – Pyrosequencing
 Nanopore – Amplification-free, single molecule
Which technology to go with

Platform   Read length   Sequencing technology   Throughput (per run)   Cost (1 Mbp)*
Sanger     ~800 bp       Sanger                  400 kbp                $500
454        ~400 bp       Polony                  500 Mbp                $60
Solexa     75 bp         Polony                  20 Gbp                 $2
SOLiD      75 bp         Polony                  60 Gbp                 $2
Helicos    30-35 bp      Single molecule         25 Gbp                 $1

*Source: Shendure & Ji, Nat Biotech, 2008
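To compare platforms for a concrete project, the throughput and cost columns can be turned into a rough estimate of runs and cost. The sketch below uses the 2008 figures from the table; the 5 Mbp genome and 20x coverage are hypothetical inputs, and the function name `estimate` is an assumption.

```python
import math

# Throughput per run (bp) and cost per Mbp (USD), taken from the table above.
PLATFORMS = {
    "Sanger": {"throughput_bp": 400_000, "usd_per_mbp": 500},
    "454": {"throughput_bp": 500_000_000, "usd_per_mbp": 60},
    "Solexa": {"throughput_bp": 20_000_000_000, "usd_per_mbp": 2},
    "SOLiD": {"throughput_bp": 60_000_000_000, "usd_per_mbp": 2},
    "Helicos": {"throughput_bp": 25_000_000_000, "usd_per_mbp": 1},
}


def estimate(platform: str, genome_bp: int, coverage: int) -> tuple[int, float]:
    """Return (runs needed, cost in USD) to sequence a genome at a given coverage."""
    specs = PLATFORMS[platform]
    total_bp = genome_bp * coverage
    runs = math.ceil(total_bp / specs["throughput_bp"])
    cost = total_bp / 1_000_000 * specs["usd_per_mbp"]
    return runs, cost


# Hypothetical project: a 5 Mbp genome at 20x coverage.
for name in PLATFORMS:
    runs, cost = estimate(name, genome_bp=5_000_000, coverage=20)
    print(f"{name:8s} {runs:4d} run(s)  ${cost:,.0f}")
```

The estimate ignores read length, so it says nothing about whether the reads are long enough for de novo assembly.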


What, When and Why
 Sanger:
Small projects (less than 1Mbp)
 454:
Whole genome, De-novo sequencing, metagenomics
 Solexa, SOLiD, Heliscope:
 Gene expression,
 Resequencing

Genomic DNA
→ Shearing/Sonication
→ Sequence each clone fragment
→ Match overlapping sequences (assembly)
→ Contigs
→ Find and fill gaps
→ Compile data, quality check & editing
→ Reassemble
→ Draft sequence
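The "match overlapping sequences" step can be sketched with a greedy merge: repeatedly join the two reads with the longest suffix-to-prefix overlap until none remain. This is a toy illustration, not the assembler used in any real project; the minimum overlap of 3 and the function names are assumptions, and the nine-base reads are toy inputs chosen to overlap.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Merge the best-overlapping pair repeatedly; return the remaining contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_overlap)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ACGTGGTAA", "GTAATGGCG", "TGGCGTATA",
         "CGTATACAC", "CACCCTTAG", "TAGGCCATA"]
print(greedy_assemble(reads))
```

Greedy merging can misassemble repeats, which is one reason the workflow includes gap filling, quality checks and reassembly before a draft is released.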
First Generation Sequencing
Second Generation Sequencing
2nd Gen Sequencing Tech
 Traditional sequencing: 384 reads ~1kb / 3 hours
 454 (Roche):
 1M reads 450-1000bp / 10-24 hours
 HiSeq (Illumina):
 http://www.youtube.com/watch?v=HtuUFUnYB9Y
 100-200M reads of 50-100bp / 3-8 days * 16 samples
 SOLiD (Applied Biosystems)
 >100M reads of 50-60bp / 2-8 days * 12 samples
 Ion Torrent (Life Technologies):
 http://www.youtube.com/watch?v=yVf2295JqUg
 5-10M reads of 200-400bp / < 2 hours

Illumina HiSeq2000
 Throughput:
 $1000-2500 / lane (depends on read length, SE / PE)
 50-100 bp / read
 16 lanes (2 flow cells) / run
 150-200 million reads / lane
 Sequencing a human genome: $3000, 1 week
 Bioinformatics challenges
 Very large files
 CPU and RAM hungry
 Sequence quality filtering
 Mapping and downstream analysis
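The per-lane figures above translate into a yield per run and an approximate depth of coverage. The sketch below does that arithmetic. It assumes single-end reads, and the 3.1 Gbp human genome size is an approximation that does not come from the slides; the function names are hypothetical.

```python
HUMAN_GENOME_BP = 3_100_000_000  # assumption: approximate haploid genome size


def run_yield_bp(reads_per_lane: int, read_len_bp: int, lanes: int = 16) -> int:
    """Total bases per run for single-end reads."""
    return reads_per_lane * read_len_bp * lanes


def coverage(yield_bp: int, genome_bp: int = HUMAN_GENOME_BP) -> float:
    """Average depth of coverage for a genome of the given size."""
    return yield_bp / genome_bp


low = run_yield_bp(150_000_000, 50)    # lower bounds from the slide
high = run_yield_bp(200_000_000, 100)  # upper bounds from the slide
print(f"Yield per run: {low / 1e9:.0f}-{high / 1e9:.0f} Gbp")
print(f"Human genome coverage: {coverage(low):.0f}x-{coverage(high):.0f}x")
```

The same numbers explain the "very large files" challenge: a single run produces hundreds of billions of base calls, each with an associated quality value.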

Third Generation Sequencing
 Single molecule sequencing (no amplification needed)
 Oxford Nanopore: Read fewer but longer sequences
http://www.youtube.com/watch?v=_rRrOT9gfpo
 In 1-2 years, the cost of sequencing a human genome
will drop below $1000, storage will cost more than
sequencing
 Personal genome sequencing might become a key
component of public health in every developed
country
 Bioinformatics will be key to convert data into
knowledge

Nanopore Sequencing
(Potential) Applications
 Metagenomics and infectious disease
 Ancient DNA, recreate extinct species
 Comparative genomics (between species) and
personal genomes (within species)
 Genetic tests and forensics
 Circulating nucleic acids
 Risk, diagnosis, and prognosis prediction
 Transcriptome and transcriptional regulation
 More later in the semester…

Genome (sequence) annotation
 Structural :
 Identify genes, Pseudogenes, clusters
 Identify Mutations/ Variations
 Identify repeats
 Identify ESTs, UREs, homologous and analogous regions
 Identify SNPs
 Functional
 Protein/polypeptide encoded by genes
 Putative stage/tissue of expression of gene
 Associations with ‘probable’ phenotypes
 Relation to diagnostics
 Variation among population, species, genus - phylogeny
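As an illustration of the "Identify SNPs" step, the sketch below compares two sequences that are assumed to be already aligned and of equal length, and reports single-base differences. The function name `find_snps` and the toy sequences are hypothetical; real pipelines call variants from mapped reads with quality scores.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Both sequences must be pre-aligned and of equal length; gap columns
    ("-") are skipped because they are indels, not SNPs.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base and "-" not in (ref_base, alt_base)
    ]


reference = "ACGTGACT"  # toy sequences for illustration
sample = "ACGTCACT"
for pos, ref_base, alt_base in find_snps(reference, sample):
    print(f"position {pos}: {ref_base} -> {alt_base}")
```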
 THANK YOU
