COMPARATIVE DNA
SEQUENCING ANALYSES FOR
FEASIBILITIES
Presented by
Jajati Keshari Nayak
PhD 1st year
CABIN
Central Dogma of Life
Gene → (Transcription) → mRNA → (Translation) → Protein
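The two arrows of the central dogma can be mimicked in a few lines. This is a toy sketch under stated assumptions: the "gene" is taken to be the coding (sense) strand written 5' → 3', the standard genetic code is used, introns and RNA processing are ignored, and the example gene and all function names are hypothetical.

```python
# Toy model of the central dogma: gene (coding strand) -> mRNA -> protein.
# Assumptions: input is the coding strand 5'->3', standard genetic code,
# no introns/splicing; names here are illustrative only.
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order (first base varies slowest); '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """mRNA carries the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


gene = "ATGGCCATTGTAATGGGCCGCTGA"  # hypothetical example gene
mrna = transcribe(gene)
print(mrna)             # AUGGCCAUUGUAAUGGGCCGCUGA
print(translate(mrna))  # MAIVMGR
```

The compact 64-letter string relies on `itertools.product` enumerating codons in the same UCAG order as the textbook codon table.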
Why?
Study genetic variation
Its source and effect
Comparative genomics
Similarity, syntenic genes/alleles
Resources for clinical studies
Evolutionary insight
Variation among species and the course of evolution…
Structure and function of genomes
Understanding the ‘rules of nature’
[Figure: a stretch of raw genome sequence, ACGTGACTGAGGACCG…]
SEQUENCING A GENOME
Strategy → Libraries → Sequencing → Assembly → Annotation → Release
Constraints: time and money
Strategies for sequencing
• Genome complexity
• Size & GC content
[Figure: Strategy → Assembly. Short reads (e.g. ACGTGGTAA, CGTATACAC, TAGGCCATA) are overlapped and assembled into the sequenced genome ACGTGGTAATGGCGTATACACCCTTAGGCCATA ATACCTCT...]
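GC content, one of the factors listed above, is simply the fraction of G and C bases. A minimal sketch, assuming ambiguous bases (such as N) should be left out of the denominator; the sliding-window helper and its window/step values are hypothetical choices for illustration.

```python
def gc_content(seq: str) -> float:
    """Fraction of G + C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def gc_profile(seq: str, window: int = 10, step: int = 5) -> list[tuple[int, float]]:
    """GC content in sliding windows, to spot GC-rich or AT-rich stretches."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


genome = "ACGTGGTAATGGCGTATACACCCTTAGGCCATA"  # sequence shown on the slide
print(len(genome), round(gc_content(genome), 3))  # 33 0.485
print(gc_profile(genome))
```

A whole-genome value hides local variation, which is why the windowed profile can be more informative when choosing a strategy.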
Basics of DNA Synthesis
DNA polymerase adds each nucleotide to the free 3’-OH end of the growing strand (5’ → 3’), releasing pyrophosphate (PPi).
SANGER SEQUENCING
Template (3’ → 5’): ACGCGCCGGGTCAGAACCCGATCGCG
The primer is extended 5’ → 3’ until a chain-terminating ddNTP is incorporated, giving fragments of different lengths:
5’ TGCGCGGCCCAG 12 bp
5’ TGCGCGGCCCAGTCTT 16 bp
5’ TGCGCGGCCCAGTCTTGGGC 20 bp
5’ TGCGCGGCCCAGTCTTGGGCT 21 bp
5’ TGCGCGGCCCAGTCTTGGGCTA 22 bp
5’ TGCGCGGCCCAGTCTTGGGCTAGCGC 26 bp
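The ladder logic of the Sanger slides can be sketched as follows: sort the terminated fragments by length and read the terminal base of each. This is an idealised model assuming termination can occur at every position and ignoring the primer's own length; the function names are hypothetical.

```python
# Idealised chain-termination (Sanger) read-out.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def new_strand(template_3to5: str) -> str:
    """Strand synthesised 5'->3' against a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def terminated_fragments(template_3to5: str) -> list[tuple[int, str]]:
    """Every possible ddNTP-terminated product as (length, terminal base).

    Assumes termination at every position; the primer is not modelled.
    """
    strand = new_strand(template_3to5)
    return [(pos + 1, strand[pos]) for pos in range(len(strand))]


def read_ladder(fragments: list[tuple[int, str]]) -> str:
    """Sort by length, as electrophoresis does, and read each terminal base."""
    return "".join(base for _, base in sorted(fragments))


template = "ACGCGCCGGGTCAGAACCCGATCGCG"  # template from the slide
fragments = terminated_fragments(template)
shown = [f for f in fragments if f[0] in {12, 16, 20, 21, 22, 26}]
print(shown)  # [(12, 'G'), (16, 'T'), (20, 'C'), (21, 'T'), (22, 'A'), (26, 'C')]
print(read_ladder(fragments))  # TGCGCGGCCCAGTCTTGGGCTAGCGC
```

The six fragments drawn on the slides are a subset of the full ladder; with all lengths present, the sorted terminal bases spell out the new strand.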
Automated Sanger Sequencing & Data Documentation/Interpretation
Sanger Sequencing
Advantages
Long reads (~900 bp)
Suitable for small projects
No manual slab-gel electrophoresis (automated capillary read-out)
Disadvantages
Low throughput
Expensive
2nd Generation: Pyrosequencing
Basic idea:
Each nucleotide incorporation releases pyrophosphate (PPi), which is converted into visible light; the light signal is proportional to the number of nucleotides incorporated
Example sequence: AGGGGTCAGGTCAGTTTCAGGGGTTCAGTCAGTTCAG
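Because light is proportional to the number of bases added, a homopolymer such as GGGG gives one flow with four times the signal. A minimal sketch of an idealised flowgram, assuming a cyclic dispensation order of "TACG" (an assumption here) and noise-free signals; the function names are hypothetical.

```python
from itertools import cycle

VALID = set("ACGT")


def flowgram(read: str, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Idealised pyrosequencing: one nucleotide is flowed at a time and the
    signal equals the number of bases incorporated during that flow."""
    if not read or set(read) - VALID:
        raise ValueError("read must be a non-empty A/C/G/T string")
    if set(flow_order) != VALID:
        raise ValueError("flow order must contain all four bases")
    signals = []
    pos = 0
    for nt in cycle(flow_order):
        if pos == len(read):
            break
        count = 0
        while pos < len(read) and read[pos] == nt:  # homopolymer run
            count += 1
            pos += 1
        signals.append((nt, count))
    return signals


def call_bases(signals: list[tuple[str, int]]) -> str:
    """Turn flow signals back into a base sequence."""
    return "".join(nt * count for nt, count in signals)


read = "AGGGGTCAGGTCAGTTTCAGGGGTTCAGTCAGTTCAG"  # sequence from the slide
signals = flowgram(read)
print(signals[:5])  # [('T', 0), ('A', 1), ('C', 0), ('G', 4), ('T', 1)]
print(call_bases(signals) == read)  # True
```

In a real pyrogram the signal is a noisy intensity, so long homopolymers (4 vs 5 bases) are harder to distinguish than in this integer model.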
Illumina sequencing
(reversible terminator sequencing)
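In reversible terminator chemistry, each cycle adds exactly one labelled base, which is imaged and then unblocked, so homopolymers are read one base per cycle. A toy sketch, assuming one dye channel per base and a made-up uniform background-noise model; names such as `sequence_by_synthesis` are hypothetical.

```python
import random

CHANNELS = "ACGT"  # assumption: one fluorescent dye / image channel per base


def image_cycle(base: str, rng: random.Random, noise: float) -> list[float]:
    """One imaging cycle for one cluster: the incorporated base's channel is
    bright, the others carry only background noise (toy signal model)."""
    return [(1.0 if ch == base else 0.0) + rng.uniform(0.0, noise) for ch in CHANNELS]


def call_base(intensities: list[float]) -> str:
    """Call the base whose channel is brightest."""
    brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
    return CHANNELS[brightest]


def sequence_by_synthesis(sequence: str, noise: float = 0.3, seed: int = 0) -> str:
    """One base per cycle: the blocked 3' end prevents further extension until
    the terminator is removed, so a read of length L takes L cycles."""
    rng = random.Random(seed)
    calls = []
    for base in sequence:
        calls.append(call_base(image_cycle(base, rng, noise)))
        # (cleave dye and terminator here, then start the next cycle)
    return "".join(calls)


print(sequence_by_synthesis("AGGGGTCAGG"))  # AGGGGTCAGG
```

With noise below the signal level the calls are always correct; raising `noise` towards 1.0 shows how base-calling errors appear.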
Genomic DNA → Shearing/sonication → Match overlapping sequences → Assembly → Draft sequence
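The "match overlapping sequences → assembly" step can be illustrated with the simplest possible assembler: repeatedly merge the two reads with the longest suffix/prefix overlap. This greedy method is a swapped-in teaching technique, not how production assemblers work (they use overlap or de Bruijn graphs); it assumes error-free reads from one strand and can mis-assemble repeats. The 9-base reads are the fragments shown on the "Strategies for sequencing" slide.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` matching a prefix of `b`
    (at least min_len bases), or 0 if there is none."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until one contig remains or nothing overlaps."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # drop reads lying entirely inside another read
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # remaining contigs do not overlap
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ACGTGGTAA", "CGTATACAC", "TAGGCCATA", "GTAATGGCG", "CACCCTTAG", "TGGCGTATA"]
print(greedy_assemble(reads))  # ['ACGTGGTAATGGCGTATACACCCTTAGGCCATA']
```

The result matches the "sequenced genome" on the slide; with real data, gaps in coverage leave several contigs, which is what a draft sequence looks like.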
First Generation Sequencing
Second Generation Sequencing
2nd Gen Sequencing Tech
Traditional (Sanger) sequencing: 384 reads of ~1 kb / 3 hours
454 (Roche):
1M reads 450-1000bp / 10-24 hours
HiSeq (Illumina):
http://www.youtube.com/watch?v=HtuUFUnYB9Y
100-200M reads of 50-100bp / 3-8 days * 16 samples
SOLiD (Applied Biosystems)
>100M reads of 50-60bp / 2-8 days * 12 samples
Ion Torrent (Life Technologies):
http://www.youtube.com/watch?v=yVf2295JqUg
5-10M reads of 200-400bp / < 2 hours
Illumina HiSeq 2000
Cost: $1000-2500 / lane (depends on read length, single-end (SE) or paired-end (PE))
Throughput:
50-100 bp / read
16 lanes (2 flow cells) / run
150-200 million reads / lane
Sequencing a human genome: $3000, 1 week
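The throughput figures above translate into genome coverage with simple arithmetic: mean coverage c = (reads × read length) / genome size, and under the Lander–Waterman (Poisson) model a fraction of about e^(-c) of bases receives no reads. A minimal sketch, assuming a haploid human genome of roughly 3.1 Gb and a hypothetical target depth of 30x; the helper names are illustrative.

```python
import math

HUMAN_GENOME_BP = 3_100_000_000  # approximate haploid size (assumption)


def run_yield_bp(reads_per_lane: int, read_len: int, lanes: int) -> int:
    """Total sequenced bases for a run."""
    return reads_per_lane * read_len * lanes


def mean_coverage(total_bp: float, genome_bp: float) -> float:
    """Average number of reads covering each genome position."""
    return total_bp / genome_bp


def lanes_needed(target_cov: float, genome_bp: int, reads_per_lane: int, read_len: int) -> int:
    """Whole lanes required to reach a target mean coverage."""
    return math.ceil(target_cov * genome_bp / (reads_per_lane * read_len))


def fraction_uncovered(coverage: float) -> float:
    """Lander-Waterman / Poisson estimate of bases with zero reads."""
    return math.exp(-coverage)


best_run = run_yield_bp(200_000_000, 100, 16)  # upper figures from the slide
print(best_run)  # 320000000000
print(round(mean_coverage(best_run, HUMAN_GENOME_BP), 1))  # 103.2
print(lanes_needed(30, HUMAN_GENOME_BP, 200_000_000, 100))  # 5
```

These are best-case numbers; duplicate reads, quality filtering and uneven coverage all lower the usable depth.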
Bioinfo challenges
Very large files
CPU and RAM hungry
Sequence quality filtering
Mapping and downstream analysis
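Sequence quality filtering usually starts from the per-base Phred scores stored in FASTQ files, where Q = -10 log10(P_error) and, in current Illumina output, each score is stored as the character with code Q + 33. A minimal sketch, assuming simple 4-line records (no wrapped sequences) and a hypothetical mean-quality threshold of 20; real pipelines use dedicated trimming/filtering tools.

```python
def parse_fastq(text: str):
    """Yield (name, sequence, quality) from simple 4-line FASTQ records."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        yield header[1:], seq, qual


def phred33(qual: str) -> list[int]:
    """Decode Phred+33 quality characters into integer scores."""
    return [ord(ch) - 33 for ch in qual]


def error_probability(q: int) -> float:
    """P_error = 10^(-Q/10); e.g. Q20 -> 1 error in 100 bases."""
    return 10 ** (-q / 10)


def filter_by_mean_quality(records, min_mean_q: float = 20.0):
    """Keep reads whose mean Phred score reaches min_mean_q (simple heuristic)."""
    for name, seq, qual in records:
        scores = phred33(qual)
        if scores and sum(scores) / len(scores) >= min_mean_q:
            yield name, seq, qual


example = """@read1
ACGTACGT
+
IIIIIIII
@read2
ACGTACGT
+
########
"""
kept = [name for name, _, _ in filter_by_mean_quality(parse_fastq(example))]
print(kept)  # ['read1']
```

Here 'I' decodes to Q40 and '#' to Q2, so only the first read passes.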
Third Generation Sequencing
Single molecule sequencing (no amplification needed)
Oxford Nanopore: fewer but much longer reads
http://www.youtube.com/watch?v=_rRrOT9gfpo
In 1-2 years, the cost of sequencing a human genome
will drop below $1000, and storing the data will cost more than
sequencing it
Personal genome sequencing might become a key
component of public health in every developed
country
Bioinformatics will be key to converting data into
knowledge
Nanopore Sequencing
(Potential) Applications
Metagenomics and infectious disease
Ancient DNA, recreate extinct species
Comparative genomics (between species) and
personal genomes (within species)
Genetic tests and forensics
Circulating nucleic acids
Risk, diagnosis, and prognosis prediction
Transcriptome and transcriptional regulation
More later in the semester…
Genome (sequence) annotation
Structural:
Identify genes, pseudogenes and clusters
Identify mutations/variations
Identify repeats
Identify ESTs, UREs, homologous and analogous regions
Identify SNPs
Functional:
Protein/polypeptide encoded by each gene
Putative developmental stage/tissue in which each gene is expressed
Associations with ‘probable’ phenotypes
Relation to diagnostics
Variation among populations, species and genera (phylogeny)
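The first structural task, identifying genes, often begins with an open reading frame (ORF) scan: ATG ... stop in each of the six reading frames. A minimal sketch, assuming the standard start/stop codons, no introns, and a deliberately tiny hypothetical minimum length; real gene finders add statistical models, splice-site detection and much longer length cut-offs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def orfs_on_strand(seq: str, min_codons: int) -> list[tuple[int, int, str]]:
    """(frame, start, ORF) for ATG...stop in the three frames of one strand.

    An ORF opens at the first ATG in a frame and closes at the next in-frame
    stop; nested ATGs are not reported separately, and ORFs running off the
    end of the sequence are ignored.
    """
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:  # counts start and stop codons
                    found.append((frame, start, seq[start:i + 3]))
                start = None
    return found


def find_orfs(seq: str, min_codons: int = 3) -> dict[str, list[tuple[int, int, str]]]:
    """Scan both strands; '-' coordinates refer to the reverse complement."""
    seq = seq.upper()
    return {
        "+": orfs_on_strand(seq, min_codons),
        "-": orfs_on_strand(reverse_complement(seq), min_codons),
    }


print(find_orfs("CCATGGCCATTGTAATGGGCCGCTGAAAGGG"))
# {'+': [(2, 2, 'ATGGCCATTGTAATGGGCCGCTGA')], '-': []}
```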
THANK YOU