
WATER AND MOLECULAR BIOLOGY
LECTURER DR. KOVACS ZSOLT
INTRODUCTION TO NUCLEIC ACID
BIOCHEMISTRY AND MOLECULAR
DIAGNOSTIC TECHNOLOGIES
“This structure has novel features which are of considerable
biological interest”
Watson and Crick, 1953
MILESTONES OF THE MOLECULAR
ERA
THE HUMAN GENOME
PROJECT FEBRUARY
2001
PROMISES OF THE HUMAN
GENOME

• Diagnostic - better
• Prognostic - more powerful
• Predictive - preventive
• Therapeutic – more personalized
GENOMIC TECHNOLOGIES ARE
UNIVERSAL

• Anatomic pathology
• Chemistry/Toxicology
• Genetics
• Hematopathology/Oncology
• Infectious diseases
• Transfusion medicine/Identity
THE HUMAN GENOME
1 ctccgggctg tcccagctcg gcaagcgctg cccaggtcct ggggtggtgg cagccagcgg
61 gagcaggaaa ggaagcatgt tcccaggctg cccacgcctc tgggtcctgg tggtcttggg
121 caccagctgg gtaggctggg ggagccaagg gacagaagcg gcacagctaa ggcagttcta
181 cgtggctgct cagggcatca gttggagcta ccgacctgag cccacaaact caagtttgaa
241 tctttctgta acttccttta agaaaattgt ctacagagag tatgaaccat attttaagaa
301 agaaaaacca caatctacca tttcaggact tcttgggcct actttatatg ctgaagtcgg
361 agacatcata aaagttcact ttaaaaataa ggcagataag cccttgagca tccatcctca
421 aggaattagg tacagtaaat tatcagaagg tgcttcttac cttgaccaca cattccctgc
481 agagaagatg gacgacgctg tggctccagg ccgagaatac acctatgaat ggagtatcag
541 tgaggacagt ggacccaccc atgatgaccc tccatgcctc acacacatct attactccca
601 tgaaaatctg atcgaggatt tcaactctgg gctgattggg cccctgctta tctgtaaaaa
661 agggacccta actgagggtg ggacacagaa gacgtttgac aagcaaatcg tgctactatt
721 tgctgtgttt gatgaaagca agagctggag ccagtcatca tccctaatgt acacagtcaa
781 tggatatgtg aatgggacaa tgccagatat aacagtttgt gcccatgacc acatcagctg
841 gcatctgctg ggaatgagct cggggccaga attattctcc attcatttca acggccaggt
901 cctggagcag aaccatcata aggtctcagc catcaccctt gtcagtgcta catccactac
961 cgcaaatatg actgtgggcc cagagggaaa gtggatcata tcttctctca ccccaaaaca
CENTRAL
DOGMA

DNA

Transcription

RNA

Translation

Protein
RNA SPLICING
GENE
STRUCTURE
5´ (upstream) Promoter | Exon 1 | Intron 1 | Exon 2 | Intron 2 | Exon 3 | Intron 3 | Exon 4 | 3´ (downstream)

5´-UTR (untranslated region) at the start of the transcript
3´-UTR (untranslated region) and polyadenylation signal at the end of the transcript
THE UNIVERSAL GENETIC
CODE
First Position   Second Position          Third Position
(5´-end)         U     C     A     G      (3´-end)

U                Phe   Ser   Tyr   Cys    U
U                Phe   Ser   Tyr   Cys    C
U                Leu   Ser   Stop  Stop   A
U                Leu   Ser   Stop  Trp    G
C                Leu   Pro   His   Arg    U
C                Leu   Pro   His   Arg    C
C                Leu   Pro   Gln   Arg    A
C                Leu   Pro   Gln   Arg    G
A                Ile   Thr   Asn   Ser    U
A                Ile   Thr   Asn   Ser    C
A                Ile   Thr   Lys   Arg    A
A                Met   Thr   Lys   Arg    G
G                Val   Ala   Asp   Gly    U
G                Val   Ala   Asp   Gly    C
G                Val   Ala   Glu   Gly    A
G                Val   Ala   Glu   Gly    G
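Reading the code table by hand gets tedious, so here is a minimal Python sketch of the same lookup. The names `CODON_TABLE` and `translate` are illustrative, not taken from the lecture, and the example input is the stretch of the sequence shown earlier that begins at the first `atg`; treating that `atg` as the start codon is an assumption.

```python
# Build the standard genetic code from the table above (RNA alphabet).
BASES = "UCAG"
# One-letter amino acids in table order: first base, then second, then third
# (each running U, C, A, G); "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(sequence: str) -> str:
    """Translate a coding sequence (DNA or RNA letters) until the first stop codon."""
    mrna = sequence.upper().replace("T", "U")
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("atgttcccaggctgcccacgcctc"))  # Met-Phe-Pro-Gly-Cys-Pro-Arg-Leu in one-letter code
```

The table is generated from a 64-letter string rather than typed as 64 entries, which keeps the block short and makes it easy to compare against the slide row by row.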
Types of Genetic Variants

• Structural
– gain/loss of chromosome segments
– translocations
– rearrangements
– gene amplifications

• Molecular
– deletions/insertions
– nucleotide repeats (di-, tri-)
– point mutations (RFLPs, SNPs)
HUMAN GENETIC
VARIATION
• Around 99.9% of nucleotide bases are exactly the
same in all people
• The differences (genetic polymorphisms) are
what makes each individual unique (except
identical twins)
• Basic concepts:
– Locus
– Allele
– Polymorphism
– Mutation
TERMINOLOGY
Locus: Position or location of a gene or genetic marker on a chromosome
Allele: Alternative versions of a gene at a given locus

HOMOZYGOUS

HETEROZYGOUS
GENETIC POLYMORPHISM

• A difference in DNA sequence with at least two alternate forms
that are present in the general population at a frequency >1%
• The term polymorphism is generally used to refer to a difference in
sequence that is not associated with disease
• BUT… polymorphisms may have an outwardly visible effect (i.e. a
phenotype) or be associated with risk of disease
• OR may alter the function or expression level of a protein if they
occur in or near a gene
TYPES OF GENETIC
VARIATION

[Figure: two copies of each chromosome carry two alleles of each gene or genetic locus (e.g. allele 1 = A, allele 2 = G)]

SEQUENCE POLYMORPHISM: Single Nucleotide Polymorphism (SNP)
LENGTH POLYMORPHISM: Tandem Repeat
TWO CLASSES OF TANDEM
REPEATS
Minisatellites, or Variable Number of Tandem Repeats (VNTR): 8 – 80 bp repeats
Microsatellites, or Short Tandem Repeats (STR): 2 – 7 bp repeats
MUTATIONS: SOMATIC AND
GERMLINE
Somatic mutations
• Occur in nongermline tissues
• Are nonheritable

Germline mutations
• Present in egg or sperm
• Are heritable
• Cause cancer family syndrome

[Figure: a somatic mutation (e.g., breast) is nonheritable; a mutation in egg or sperm affects all cells in offspring]
THE DNA SEQUENCE DETERMINES PROTEIN
SEQUENCE

DNA Exon

RNA

Protein
TYPES OF MUTATIONS

• Point mutations (base pair substitution)


– Codon changes: missense, silent or nonsense
– Affects gene transcript splicing
– Gene expression
• Deletions and insertions (large or small)
• Frame-shift mutations: produced by deletions,
insertions, or splicing errors
• Dynamic mutations
– Unstable trinucleotide repeats
From: NEJM 347:1512-1520, 2002

MISSENSE MUTATION
From: NEJM 347:1512-1520, 2002

SILENT
MUTATIONS

Can still result in disease!!!


From: NEJM 347:1512-1520, 2002

NONSENSE OR STOP
MUTATIONS
DELETION MUTATION
EXAMPLES

• Cystic fibrosis
– 70% due to F508
• Duchenne muscular
dystrophy
– ~60% cases due to
large deletions
– 8% small
deletions/insertions
INSERTIONS AND FRAME-SHIFT
MUTATIONS

INSERTION
FRAMESHIFT
MUTATIONS
• Insertion or deletion of nucleotides
• Reading frame is altered
THE FAT CAT ATE HIS HAT

Insert A
THE FAA TCA TAT EHI SHA T

Delete A
THE FTC ATA TEH ISH AT
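The sentence analogy is easy to reproduce in a few lines of Python. This is only an illustrative sketch: `read_in_triplets` is a made-up helper name, and the insertion and deletion positions are chosen to match the slide.

```python
def read_in_triplets(letters: str) -> str:
    """Group letters in threes from the left, the way a ribosome reads codons."""
    letters = letters.replace(" ", "")
    return " ".join(letters[i:i + 3] for i in range(0, len(letters), 3))


sentence = "THEFATCATATEHISHAT"
inserted = sentence[:5] + "A" + sentence[5:]  # insert an extra A after "THEFA"
deleted = sentence[:4] + sentence[5:]         # delete the A of "FAT"

print(read_in_triplets(sentence))  # THE FAT CAT ATE HIS HAT
print(read_in_triplets(inserted))  # THE FAA TCA TAT EHI SHA T
print(read_in_triplets(deleted))   # THE FTC ATA TEH ISH AT
```

Every triplet downstream of the change is altered, which is why a single-base insertion or deletion is usually far more disruptive than a single-base substitution.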
EXAMPLE OF DYNAMIC
MUTATION:
TRINUCLEOTIDE REPEAT
EXPANSION

Huntington Disease: (CAG)n repeat in the Huntingtin Gene

40 to 121 CAG    Disease Alleles
36 to 39 CAG     Reduced Penetrance
27 to 35 CAG     Mutable Normal Allele
≤ 26 CAG         Normal Alleles
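The repeat ranges on this slide can be written as a small classifier. The sketch below is illustrative only: the function names are hypothetical, exactly 26 repeats is treated as normal (an assumption about the boundary), and counts above 121 are simply reported as disease alleles.

```python
import re


def longest_cag_run(sequence: str) -> int:
    """Return the number of repeats in the longest uninterrupted CAG tract."""
    runs = re.finditer(r"(?:CAG)+", sequence.upper())
    return max((len(match.group(0)) // 3 for match in runs), default=0)


def classify_htt_allele(cag_repeats: int) -> str:
    """Map a CAG repeat count to the categories listed on the slide."""
    if cag_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if cag_repeats <= 26:
        return "normal allele"
    if cag_repeats <= 35:
        return "mutable normal allele"
    if cag_repeats <= 39:
        return "reduced penetrance"
    return "disease allele"


toy_allele = "ATG" + "CAG" * 42 + "CCGCCA"  # made-up sequence for illustration
print(longest_cag_run(toy_allele), classify_htt_allele(longest_cag_run(toy_allele)))
```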
PCR AMPLIFICATION
POLYMERASE CHAIN REACTION
(PCR)
1 PCR cycle (15-30 sec per step):
dsDNA
94 °C denaturation → ssDNA
50-60 °C annealing → primer hybridization
72 °C extension → polymerase/DNA synthesis
POST-PCR
ANALYSIS
PCR TESTING
STEPS

Sample Preparation Amplification Detection/Analysis


PCR
THE POLYMERASE CHAIN
REACTION
POLYMERASE CHAIN REACTION

PCR – first described in the mid-1980s; Mullis received the Nobel Prize in 1993
An in vitro method for the enzymatic synthesis of
specific DNA sequences
Selective amplification of target DNA from a heterogeneous,
complex DNA/cDNA population
Requires
Two specific oligonucleotide primers
Thermostable DNA polymerase
dNTPs
Template DNA
Sequential cycles of (generally) three steps (temperatures)
INITIALLY PCR USED THE KLENOW FRAGMENT OF E. COLI DNA
POLYMERASE - INACTIVATED BY HIGH TEMPERATURES
Kleppe, Ohtsuka, Kleppe, Molineux, Khorana. 1971. J. Mol. Biol. 56:341.

Required a thermostable DNA polymerase – Taq DNA polymerase from
Thermus aquaticus, a thermophilic eubacterial microorganism
isolated from a hot spring in Yellowstone National Park

Kcat = 150 nucleotides/sec/enzyme (at Topt)

Taq1/2 =
PCR - BEFORE THE
THERMOCYCLER
THERMOCYCLERS
A SIMPLE THERMOCYCLING
PROTOCOL
BASIC COMPONENTS OF PCR

Buffer
Primers
dNTPs (A, C, T, G)
Taq polymerase
DNA template
Mg2+ (MgCl2)

[Figure: PCR products obtained at 1.5, 2, 3, 4, and 5 mM MgCl2]

Magnesium Chloride
(MgCl2 - usually 0.5-5.0 mM)

Magnesium ions have a variety of effects
• Mg2+ acts as cofactor for Taq polymerase: required for Taq to function
• Mg2+ binds DNA - affects primer/template interactions
• Mg2+ influences the ability of Taq pol to interact with primer/template sequences
• More magnesium leads to less stringency in binding
PCR PROBLEMS
TAQ IS ACTIVE AT LOW
TEMPERATURES
At low temperatures mis-priming is likely

Temp Extension Rate


“CURES” FOR MIS-PRIMING
• “Cheap” fixes
– Physical separation – “DNA-in-the-cap”
– Set up reactions on ice

• Hot-start PCR – withholding one or more of the PCR components
until the first heat denaturation
– Manually - delay adding polymerase
– Wax beads
– Polymerase antibodies

• Touch-down PCR – set stringency of initial annealing
temperature high, incrementally lower with continued cycling

• PCR additives
– 0.5% Tween 20
– 5% polyethylene glycol 400
– betaine
– DMSO
PRIMER
DESIGN
1. Typically 20 to 30 bases in length
2. Annealing temperature depends on
primer sequence (aim for ~50% GC content)
3. Avoid secondary structure, particularly at the 3’ end
4. Avoid primer complementarity (primer dimer)
5. The last 3 nucleotides at the 3’ end are the
substrate for DNA polymerase - prefer G or C
6. Many good freeware programs available
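Some of the rules above are simple enough to check automatically. The sketch below covers only rules 1, 2 and 5; the 40-60% GC window and the check of the final 3’ base alone are assumptions made for illustration, and secondary structure and primer-dimer screening (rules 3 and 4) are left to dedicated software.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return sum(base in "GC" for base in primer) / len(primer)


def check_primer(primer: str) -> list[str]:
    """Return a list of rule-of-thumb violations (empty list = no issue found)."""
    primer = primer.upper()
    issues = []
    if not 20 <= len(primer) <= 30:
        issues.append("length outside 20-30 bases")
    if not 0.40 <= gc_fraction(primer) <= 0.60:  # assumed window around ~50% GC
        issues.append("GC content far from 50%")
    if primer[-1] not in "GC":  # assumed simplification of the 3' G/C preference
        issues.append("3' end is not G or C")
    return issues


print(check_primer("ATGCATGCATGCATGCATGC"))  # no violations among the rules checked here
print(check_primer("ATATATATAT"))            # too short, AT-only, ends in T
```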
PRIMER DESIGN
SOFTWARE

Many free programs available online

OLIGO

PRIMER

PrimerQuest
RULES OF THUMB FOR PCR
CONDITIONS
• Add an extra 3-5 minute denaturation (longer for Hot-start Taq) to the start of your cycle
profile to ensure everything is denatured prior to starting the PCR
reaction

• Approximate melting temperature (Tm) = [(2 x (A+T)) + (4 x (G+C))] ºC
• If GC content is < 50%, start 5 ºC beneath Tm for annealing
temperature
• If GC content is ≥ 50%, start at Tm for annealing temperature

• Extension @ 72 ºC: rule of thumb is ~500 nucleotides per minute.
Use 3 minutes as an upper limit without special enzymes

• “Special” PCR cycling protocols
– Touchdown PCR
– Step-up PCR
– Gradient cycling
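The Tm, annealing and extension rules of thumb above translate directly into arithmetic. Here is a minimal sketch with hypothetical function names and made-up example primers; it only encodes the approximations on this slide, not a nearest-neighbor Tm calculation.

```python
def approximate_tm(primer: str) -> int:
    """Tm = 2 x (A+T) + 4 x (G+C), in degrees Celsius."""
    primer = primer.upper()
    at = sum(base in "AT" for base in primer)
    gc = sum(base in "GC" for base in primer)
    return 2 * at + 4 * gc


def starting_annealing_temp(primer: str) -> int:
    """Start 5 degrees below Tm if GC content is under 50%, otherwise at Tm."""
    primer = primer.upper()
    gc_fraction = sum(base in "GC" for base in primer) / len(primer)
    tm = approximate_tm(primer)
    return tm - 5 if gc_fraction < 0.5 else tm


def extension_minutes(product_length_nt: int, rate_nt_per_min: int = 500) -> float:
    """Extension time at 72 degrees using the ~500 nucleotides per minute rule."""
    return product_length_nt / rate_nt_per_min


print(approximate_tm("ATGCATGCATGCATGCATGC"))           # 10 A/T and 10 G/C bases
print(starting_annealing_temp("ATATATATATATATATATGC"))  # AT-rich primer, so Tm - 5
print(extension_minutes(1000))                          # minutes for a 1 kb product
```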
COMMON PCR
ADDITIVES
BSA (usually at 0.1 to 0.8 µg/µL final concentration)
Stabilizes Taq polymerase & overcomes PCR inhibitors

DMSO (usually at 2-5% v/v, inhibitory at ≥ 10% v/v)
Denaturant - good at keeping GC-rich template/primer strands from
forming secondary structures.

Glycerol (usually at 5-10% v/v)
Increases apparent concentration of primer/template mix, and often
increases PCR efficiency at high temperatures.

Stringency enhancers (Formamide, Betaine, TMAC)
Concentrations used vary by type
Enhance yield and reduce non-specific priming

Non-ionic detergents (Triton X, Tween 20 or Nonidet P-40) (0.1–1%), NOT
SDS (0.01% SDS cuts Taq activity to ~10% of normal)
Stabilize Taq polymerase & suppress formation of 2º structure
TYPICAL PCR TEMPS/TIMES
GEL
ELECTROPHORESIS
WORKHORSE METHOD
OF THE BIOTECH
LABORATORY
• Separation through a matrix (agarose or acrylamide)
• Analytical or preparative method
• Separates fragments with a large range of molecular weights
• Driven by simple current and the fact that nucleic acids are uniformly negatively charged
• Sample buffer (SB), tracking dye (TD), or loading buffer -
used to keep the sample in the well and visualize the run
AN ASIDE: OHM’S
LAW
V = IR

voltage (electric field) = current (milliamps) x resistance (ohms)

What does this REALLY mean to you?

For a given current, decreasing the thickness of the gel or the ionic
strength of the running buffer increases the mobility of the nucleic
acid fragments

Manipulate this when possible to speed up the “pay-off”… Did your
PCR work?!!
ELECTROPHORESIS
VARIATIONS ON A
THEME

DGGE
Denaturing gradient gel electrophoresis

TGGE
Temperature gradient gel electrophoresis

Allow separation of single-base polymorphisms
PULSED FIELD GEL ELECTROPHORESIS
(PFGE)
SCHWARTZ AND CANTOR (1984) CELL 37:67-75

• DNA (from cells or purified DNA, undigested or digested) is
embedded into agarose plugs
• Electrical field constantly changes direction (non-continuous)
• Allows for increased resolution of very high-molecular-weight gDNA
(kb to Mb ranges)
• Equipment: chiller, pump, flatbed electrophoresis box
RNA
ELECTROPHORESIS
Requires special solution treatment to protect RNA from
degradation or from folding in on itself

RNA is denatured and run on agarose gels containing formaldehyde.
Formaldehyde forms unstable Schiff bases with the single imino group of
guanine bases. This maintains RNA in a denatured state so it
electrophoreses properly according to its molecular weight.

Uses the same gel box and power supply as traditional DNA
electrophoresis
Troubleshooting PCR
Non-specific bands on your gel

Possible cause                          Remedy
Reagents, set-up                        Run negative control
Template concentration inappropriate    Review guidelines
Annealing temp too low                  Optimize by gradient PCR
Extension time too short                ↑ time for longer products
Cycle number too high                   Review guidelines
Primer design not appropriate           ↑ specificity
Primer concentration too high           Optimize by titration
Non-specific priming                    ↑ specificity, Hot Start
MgCl2 concentration too high            Optimize by titration
GC-rich template, ↑ 2° structure        PCR additives
Contaminating DNA                       Decontaminate work area: use ARTs, wear gloves; pipettor, reagents; UV treat plastics
Troubleshooting PCR
Diffuse smearing on your gel

Possible cause                          Remedy
Template concentration inappropriate    Review guidelines
Taq concentration too high              Optimize by titration
Extension time inappropriate            Review guidelines
Cycle number too high                   Reduce, review guidelines
Primer design not appropriate           ↑ specificity
Primer concentration too high           Optimize by titration
Non-specific priming                    Use Hot Start
MgCl2 concentration too high            Optimize by titration
GC-rich template, ↑ 2° structure        PCR additives
Contaminating DNA                       Decontaminate work area: use ARTs, wear gloves; pipettor, reagents; UV treat plastics
Troubleshooting PCR
Poor or no amplification of bands

Possible cause                               Remedy
Problem with thermocycler, set-up, reagents  Run positive control
Enzyme concentration low                     ↑ Concentration
Annealing temp too low                       Optimize by gradient PCR
Extension time too short                     ↑ Time for longer products
Cycle number too low                         Review guidelines
Primer design not appropriate                ↑ Specificity
Primer concentration too high                Optimize by titration
Non-specific priming                         ↑ Specificity, Hot Start
MgCl2 concentration too low                  Optimize by titration
GC-rich template, ↑ 2° structure             PCR additives
Troubleshooting PCR
Prioritizing Approaches

• “Pilot” error (set-up errors are common in the interim between
training with someone and working independently)
• Template dilution error (concentration matters!)
• Thermocycling parameter errors (temps/times)
• Bad reagents (1. dNTPs, 2. primers, 3. Taq)
• Unique template or template structure issues
• BAD KARMA (don’t believe it!)

Don’t get discouraged… validating PCRs can be tricky


Real time (Q)
PCR
DIFFERENTIAL GENE
EXPRESSION: THE NEED
FOR QUANTITATION
• PCR: excellent tool to detect and analyze
DNA (mutation analysis, infectious disease
diagnosis, etc…)

• RT-PCR: excellent tool to detect and
analyze RNA (mRNA, miRNA, etc…) as a
surrogate for gene expression studies.

• Neither of them can quantitate!


CYCLE NUMBER    AMOUNT OF DNA
0               1
1               2
2               4
3               8
4               16
5               32
6               64
7               128
8               256
9               512
10              1,024
11              2,048
12              4,096
13              8,192
14              16,384
15              32,768
16              65,536
17              131,072
18              262,144
19              524,288
20              1,048,576
21              2,097,152
22              4,194,304
23              8,388,608
24              16,777,216
25              33,554,432
26              67,108,864
27              134,217,728
28              268,435,456
29              536,870,912
30              1,073,741,824

N_i = N_0 x 2^(n_i)

[Figure: amount of DNA versus PCR cycle number (0 to 35), plotted on a linear scale and on a logarithmic scale]
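The doubling table above is one line of arithmetic. The sketch below reproduces it; the `efficiency` parameter is an addition for illustration (1.0 means perfect doubling, as in the table) and is not part of the slide.

```python
def copies_after(cycles: int, start_copies: float = 1, efficiency: float = 1.0) -> float:
    """Copies after a number of cycles: N = N0 x (1 + efficiency)^cycles."""
    return start_copies * (1 + efficiency) ** cycles


for cycle in (0, 10, 20, 30):
    print(cycle, f"{copies_after(cycle):,.0f}")

# With 90% efficiency per cycle (an assumed value), 30 cycles yield far fewer copies.
print(f"{copies_after(30, efficiency=0.9):,.0f}")
```

Because growth is exponential, small differences in per-cycle efficiency compound into large differences in final yield, which is one reason end-point yield is a poor measure of starting template.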
END POINT VS REAL-
TIME
• End-point analysis of PCR products (EtBr-stained
gels, primers labeled with fluorescent
dyes followed by capillary electrophoresis,
etc…) yields the same result regardless of
the initial amount of template, because it is read
after amplification has reached its plateau.

• Real-time PCR (qPCR) follows the kinetics of
amplicon accumulation and shows
when the exponential phase occurs for each
sample, which directly correlates with the
initial amount of template.
REAL-TIME PCR (QPCR)
CHEMISTRIES
• Fluorescence-based
– After absorbance of certain wavelengths of
light (excitation), the fluorophore emits light at
a longer wavelength (emission)

[Figure: fluorophore excitation and emission spectra (fluorescence versus wavelength); excite at the excitation peak, detect at the emission peak]

Do you recognize any of these instruments?

LightCycler
REAL-TIME PCR (QPCR)
CHEMISTRIES
• Fluorescence-based
– After absorbance of certain wavelengths of
light (excitation), the fluorophore emits light at
a longer wavelength (emission)
– Fluorescence proportional to amplified product

• Three commonly used chemistries:


– Bi-functional Molecules: Scorpion® Primers
– Hydrolysis Probes: TaqMan® Chemistry
– Hybridization Probes: LightCycler® Chemistry
REAL-TIME (QPCR)
CHEMISTRIES
• Molecular Beacons:

• Scorpion® Primers:
SCORPION® PRIMERS APPLICATION:
EGFR
REAL-TIME CHEMISTRIES: TAQMAN®

• Target-specific hybridization probe (Tm > Text)
• 5’ reporter and 3’ quencher
• Utilizes FRET quenching

[Figure: on the intact probe, light energy absorbed by the reporter (R) is transferred to the quencher (Q)]
TAQMAN CHEMISTRY
HYDROLYSIS PROBE

1. During PCR, probe hybridizes to target sequence
2. Probe is partially displaced during extension
3. Probe cleaved by 5’- 3’ nuclease activity of polymerase
4. Illuminated free reporter exhibits unquenched fluorescence
[Figure: optical path of a real-time PCR instrument]
1. halogen tungsten lamp
2a. excitation filters
2b. emission filters
3. intensifier
4. sample plate
5. ccd detector (350,000 pixels)

www.biorad.com
THRESHOLD

NTC (no-template control)

A CALIBRATION CURVE FOR
AN ABSOLUTE
QUANTITATION
HALLMARKS OF A GOOD QPCR
ASSAY
Reaction Efficiency of 100 +/- 10%

If efficiency = 100%, then E=2

slope →

Dumur et al. Anal. Biochem. 2002
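A calibration curve and the efficiency check can be worked through numerically. The sketch below fits Ct against log10 of the starting quantity and uses the standard relationship E = 10^(-1/slope), which is general qPCR practice rather than something stated on the slide; the dilution series and Ct values are invented for illustration.

```python
import math


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical 10-fold dilution series: copies per reaction and measured Ct.
standards = [(1e6, 15.00), (1e5, 18.32), (1e4, 21.64), (1e3, 24.96)]
slope, intercept = fit_line([math.log10(q) for q, _ in standards],
                            [ct for _, ct in standards])

amplification_factor = 10 ** (-1 / slope)        # E = 2 means perfect doubling
efficiency_percent = (amplification_factor - 1) * 100


def quantity_from_ct(ct: float) -> float:
    """Absolute quantitation: read an unknown sample off the calibration curve."""
    return 10 ** ((ct - intercept) / slope)


print(round(slope, 2), round(amplification_factor, 2), round(efficiency_percent))
print(round(quantity_from_ct(21.64)))
```

With these made-up values the fitted slope is about -3.32, which corresponds to an amplification factor close to 2, that is, roughly 100% efficiency.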


REAL-TIME CHEMISTRIES:
LIGHTCYCLER
HYBRIDIZATION PROBES
TAQMAN: ALLELIC
DISCRIMINATION

FAM

VIC
LIGHTCYCLER:
GENOTYPING
DNA MELTING TEMPERATURE

100% anneal

50% melt

100% melt

Tm (ºC)
DNA MELTING TEMPERATURE

Fluorescence versus Temp

for better visualization of the inflection point

-(dF/dT) versus Temp

Tm (ºC)
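The -(dF/dT) transformation is a simple finite difference. The sketch below estimates Tm as the midpoint of the temperature interval with the steepest fluorescence drop; the melt curve is synthetic (a sigmoid centered at 65.5 ºC, chosen arbitrarily), and real instruments smooth the data before differentiating.

```python
import math


def melting_temperature(temps: list[float], fluorescence: list[float]) -> float:
    """Return the temperature at which -(dF/dT) is largest."""
    best_temp, best_rate = temps[0], float("-inf")
    for i in range(1, len(temps)):
        rate = -(fluorescence[i] - fluorescence[i - 1]) / (temps[i] - temps[i - 1])
        if rate > best_rate:
            best_rate = rate
            best_temp = (temps[i] + temps[i - 1]) / 2  # midpoint of the interval
    return best_temp


# Synthetic melt curve: fluorescence falls as the duplex melts around 65.5 degrees.
temps = [float(t) for t in range(60, 71)]
signal = [1 / (1 + math.exp(t - 65.5)) for t in temps]

print(melting_temperature(temps, signal))
```

In genotyping by melting curve analysis, alleles that match the probe perfectly and alleles with a mismatch give peaks at different temperatures, so the same peak-finding step would be applied per peak.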
FACTOR V
LEIDEN
Current Applications of Real-Time: q(RT-)PCR

• qPCR: quantitation of DNA viruses (CMV, EBV,
BKV, HSV, HBV)

• qPCR: Somatic Mutation Detection (EGFR –
Scorpion® Primers).

• qRT-PCR: diagnosis and follow-up of leukemia
(CML, ALL, APL, etc…) by quantitating
translocated transcripts. Quantitation of RNA
viruses (HIV, HCV)

• Genotyping (Germline Mutations): allele
discrimination (JAK2 mutation, IL28B genotyping -
TaqMan®) and melting curve analysis (Factor II/V
- LightCycler®).
DNA
SEQUENCING
WHAT IS DNA
SEQUENCING?

• DNA sequencing is the ability to
determine nucleotide sequences of
DNA molecules.
DNA Sequencing

• Gold standard for mutation detection


• Confirmation of mutation detection by
other methods
• Gold standard for histocompatibility
typing
• Resistance testing
DNA
SEQUENCING

- In 1977 two separate methods for sequencing DNA were developed
- chemical degradation method or Maxam-Gilbert sequencing
(Maxam and Gilbert)
- chain termination method (Sanger et al.)

1980: Walter Gilbert (Biol. Labs) & Frederick Sanger (MRC Labs)
MAXAM AND GILBERT DNA
SEQUENCING

• Chemical cleavage of phosphate backbone at specific bases
– G
– A or G (purine-specific)
– C or T (pyrimidine-specific)
– C

• Fragment size analysis by gel electrophoresis
• Not commonly used
SANGER DNA SEQUENCING
METHODS
• Technology
– Chain termination (dideoxy method)
– Cycle sequencing
• Platform
– Manual (gel electrophoresis)
– Automated (capillary
electrophoresis)
SANGER DNA SEQUENCING
METHODS
- Both methods were equally popular to begin with, but, for many
reasons, the cycle sequencing method is the method more
commonly used today

- This method is based on the principle that single-stranded DNA
molecules that differ in length by just a single nucleotide can be
separated from one another using polyacrylamide gel
electrophoresis
- Introduction of dideoxy terminators
TERMINATORS
• Because dideoxynucleotides lack the 3’-OH group (which
allows nucleotides to join a growing
DNA strand), replication stops.

[Figure: the 3’ position is normally where the next nucleotide’s phosphate is attached, but with no -OH group a bond cannot form and replication stops]
COMPONENTS

• Purified PCR product (template)
• Primer (1 per sequencing reaction)
• Thermostable DNA polymerase
• Buffer, MgCl2
• Deoxynucleoside triphosphates (dNTPs)
• Dideoxynucleoside triphosphates (ddNTPs)
– Each with a different fluorescent label
– Much smaller molar concentration than dNTPs
REACTION
• Similar to a PCR reaction:
– Denature at ~96°C
– Anneal primer at ~50°C
– Extend primer at ~60°C
• Primer extension occurs normally as
long as dNTPs are incorporated
• When a ddNTP is incorporated,
extension stops
• ddNTPs lack a 3’-OH group
• Once a ddNTP is incorporated, nucleophilic attack
cannot occur, so primer extension is terminated
DNA Sequencing

The different steps in cycle sequencing


DNA Sequencing

The linear amplification of the gene in sequencing


A schematic of sequencing
DETECTION

• Primer labeled with radioactive material

[Figure: chain-termination (dideoxy) sequencing with a radioactively labeled primer
– Single-stranded DNA of unknown sequence: 5' CTGACTTCGACAA 3'
– Radioactively labeled primer: 3' TGTT 5'
– DNA polymerase I plus dATP, dGTP, dCTP, dTTP
– Four reaction mixtures, each containing one dideoxynucleotide (ddNTP): ddATP, ddCTP, ddTTP, ddGTP
– Products are separated by gel electrophoresis (larger fragments at the top, smaller fragments at the bottom) and autoradiography detects the radioactive bands
– The sequence of the original single-stranded DNA is read as the complement of the primer-generated sequence ladder]

Textbook: Figure 5.17

Vigilant et al. 1989
PNAS 86:9350-9354
INTRODUCTION OF
FLUORESCENT DYES

– Fluorescently labeled primer


• Four lanes
– Fluorescently labeled terminators
• Single lane
AUTOMATED
SEQUENCING
• Uses ddNTPs that are each
tagged with a different color
fluorescent dye.
• Instead of performing four
different reactions, DNA
synthesis occurs in one tube.
• The sequencing instrument
uses a light sensor to read the
gel, identifying the bases by
their different colors.

http://www.mun.ca/biology/scarr/fluorescent_dideoxy_sequencing.jpg
DETECTION

• Labeled primer → lose 10-20 bases
from primer, better quality
• Labeled nucleotide → more
sequence close to primer, but reads are
less even
READING THE SEQUENCE

• DNA from the sequencing reaction is
purified via ethanol precipitation
• DNA is resuspended in deionized
formamide
• Plate is loaded into the automated
sequencer
Automatic DNA sequencers

Applied Biosystems PRISM 377 (Gel, 34-96 lanes)
Applied Biosystems PRISM 3100 (Capillary, 16 capillaries)
Applied Biosystems PRISM 3700 (Capillary, 96 capillaries)
COMPONENTS OF ABI
310
• Chemistry
– fluorescent dyes, matrix samples,
capillary, buffers, polymer, formamide
• Hardware
– CCD camera, laser, electrodes, pump
block, hot plate for temperature control,
autosampler
• Software
– Data collection, color separation, peak
sizing & calling, genotyping, stutter
removal
AUTOMATED SEQUENCING

• Capillary array contains polyacrylamide gel
• DNA fragments migrate through gel by electrophoresis
• Separate by size
AUTOMATED
SEQUENCING
• Capillary passes through a laser beam
• Each dye fluoresces at a different
wavelength when excited by the
laser
• Fluorescence is detected by a
CCD
ABI PRISM 310 GENETIC
ANALYZER
[Figure labels: capillary; syringe with polymer solution; injection electrode; autosampler tray; inlet buffer; outlet buffer]
CAPILLARY ELECTROPHORESIS
INSTRUMENTATION
ABI 310: single capillary
ABI 3100: 16-capillary array
HOW IS INJECTION ACCOMPLISHED ON
A 310?

Capillary tip and electrode are rinsed several times in
water and then dipped into the sample. A voltage is then
applied.

[Figure labels: electrode, capillary, samples]
COMPONENTS OF
CE
• Narrow capillary
– Fused silica (glass); diameter of 50-100 µm; length 25-75 cm
• Two buffer vials
• Two electrodes connected to high
voltage power source
• Laser excitation source
• Fluorescence detector
• Autosampler to hold sample tubes
• Computer to control the sample injection and detection
DNA SEQUENCING

The separation of the sequencing fragments

To measure the sizes of the fragments, each of the four reactions would be loaded into a
separate well on a gel, and the fragments would be separated by gel electrophoresis
SEPARATION AND DETECTION

[Figure: steps of capillary electrophoresis analysis on the ABI Prism
– Sample injection: mixture of dye-labeled PCR products from multiplex PCR reaction
– Sample separation: size separation in the capillary
– Sample detection: color separation of the fluorescence by the spectrograph onto the CCD panel (with virtual filters)
– Sample interpretation]

Figure 13.8, J.M. Butler (2005) Forensic DNA Typing, 2nd Edition © 2005 Elsevier Academic Press
AUTOMATED
SEQUENCING
• Fluorescence signals are processed
into an electropherogram
• Base “calls” are made by sequencing
software, but can be analyzed
manually

“virtual autorad” - real-time DNA sequence output from ABI 377

1. Trace files (dye signals) are analyzed and bases
called to create chromatograms.

2. Chromatograms from opposite strands are
reconciled with software to create double-
stranded sequence data.
DNA template 3’-TAAATGATTCC-5’

Primer anneals; extension (5’ → 3’) produces a series of
ddNTP-terminated products, each one base different in length:

A
AT
ATT
ATTT
ATTTA
ATTTAC
ATTTACT
ATTTACTA
ATTTACTAA
ATTTACTAAG
ATTTACTAAGG

Each ddNTP is labeled with a different color fluorescent dye

Sequence is read by noting peak color in electropherogram
(possessing single base resolution)

Figure 10.5, J.M. Butler (2005) Forensic DNA Typing, 2nd Edition © 2005 Elsevier Academic Press
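The ladder in this figure can be generated and read back in a few lines. The sketch below is a toy model of chain termination, not a simulation of the chemistry: it assumes termination happens once at every position and ignores the primer itself; the function names are illustrative.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def chain_termination_products(template_3_to_5: str) -> list[str]:
    """All ddNTP-terminated extension products for a template written 3'->5'."""
    new_strand = template_3_to_5.upper().translate(COMPLEMENT)  # synthesized 5'->3'
    return [new_strand[:length] for length in range(1, len(new_strand) + 1)]


def read_sequence(products: list[str]) -> str:
    """Sort fragments by size (electrophoresis) and read each terminal, dye-labeled base."""
    return "".join(fragment[-1] for fragment in sorted(products, key=len))


products = chain_termination_products("TAAATGATTCC")
random.shuffle(products)        # fragments are mixed together in the reaction tube
print(read_sequence(products))  # ATTTACTAAGG
```

Sorting by length stands in for the size separation step, and taking the last base of each fragment stands in for reading the dye color of each peak.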
RAW DATA FROM THE ABI PRISM 310
(prior to separation of fluorescent dye colors)

MATRIX STANDARDS (RAW
DATA)

Filter Set C    Filter Set F
6FAM            5FAM
TET             JOE
HEX             NED
ROX             ROX
A SEQUENCE PRINT-OUT FROM A CONTROL
SAMPLE
BLAST
• Basic Local Alignment Search Tool
• Similarity Program
– Compares input sequences with all sequences
(protein or DNA) in database
– Each comparison given a score
• Degree of similarity between query (input sequence)
and sequence that it is being compared to
• Higher the score, the greater the degree of similarity
CONCLUSION

• Sequencing has become a tool that
is routinely used in clinical
laboratories
• Massively Parallel Sequencing

NGS – “Next-Generation Sequencing”
• Does not use Sanger method
• Different Platforms = Different Chemistries
• Very high throughput instruments
– >100 gigabases of DNA sequence/day

• Desktop Sized Sequencing Instruments & Beyond!
– “Next-Next Generation Sequencing”
• Scaled down
• Medium throughput
• Individual Labs vs Core Facilities

• Some food for thought:
– What will sequencing be like 5, 10, 15 years from now?
WHAT IS NEXT-GEN SEQUENCING:
SANGER SEQUENCING VS NEXT-GEN
SEQUENCING

Sanger: “Single” Read System/Run (i.e. 1 DNA Fragment)
– Radio-labeled nucleotides or fluorescently labeled nucleotides

Next-Gen: “Multi” Read System/Run (i.e. Thousands of Fragments)
– Fluorescently labeled nucleotides of many different DNA fragments being sequenced in parallel

[Figure: sequencing reads aligned to a reference genome]
NEXT-GEN SEQUENCING COST &
TECHNOLOGY
TIMELINE…

[Figure: timeline, 2003 to 2013
– Milestones: completion of Human Genome Project; launch of 1000 Genomes Project; whole genomes sequenced in a day!
– Sequencing technology: 454 (Roche); Genome Analyzer (Solexa/Illumina); SOLiD (Applied Bio); HiSeq; PacBio; Ion Torrent; MiSeq; SOLiD 5500; Ion Proton
– Cost per genome: ~$3B, $100M, $1.5M, $40K, $10K, $5K, $4K, ≤$1K?]
NEXT-GEN SEQUENCING
PLATFORMS: SYSTEM OVERVIEW

Workflow: Genome sample → Target preparation (fragment & add adapters) → Target amplification → Sequencing format → Sequencing chemistry & imaging

454/Roche
– Target amplification: on-bead emulsion PCR (clonal)
– Sequencing format: bead in defined microwell
– Chemistry & imaging: A,G,C,T cycle controlled fluidics; sequential nucleotide extension - “pyrosequencing”; 1-color bioluminescent imaging

Illumina HiSeq/MiSeq
– Target amplification: array-based “Bridge-PCR” (clonal)
– Sequencing format: random cluster on surface; flow cells
– Chemistry & imaging: A,G,C,T cycle controlled fluidics; sequential nucleotide extension; 4-color fluorescence imaging

SOLiD (Life Tech)
– Target amplification: on-bead emulsion PCR (clonal)
– Sequencing format: random bead on surface
– Chemistry & imaging: A,G,C,T cycle controlled fluidics; sequential 8mer ligation; 4-color fluorescence imaging

Ion Torrent / Ion Proton (Life Tech)
– Target amplification: on-bead emulsion PCR (clonal)
– Sequencing format: beads in defined microwell
– Chemistry & imaging: A,G,C,T cycle controlled fluidics; sequential nucleotide extension; detects H+ ions; semiconductor chip sequencing

Next-generation platforms have common elements and workflow
WHAT CAN YOU DO USING NGS
TECHNOLOGY: APPLICATIONS FOR BASIC
AND CLINICAL RESEARCH

Types of Variants Detectable using NGS
• Large amplifications
• Large deletions
• Point mutations (SNP)
• Insertions/Deletions
• Inversions
• Translocations
• Copy number (CNV)
• Fusions/splice variants
• Gene expression data
• Methylation status

Whole Genome
• WGS drives initial discovery
• Low throughput
• Medium to low coverage

Exome & Targeted DNA-seq
• Exome: Most popular targeted sequencing method
• Provide more reads in regions of most interest
• More efficient than WGS
• Medium to very high coverage

Transcriptome, Targeted RNA-seq, miRNA-seq
• Sequence cDNA instead of DNA
• Monitors aberrant gene expression and identifies fusion and/or splicing events
• Targeted RNA-seq can add further sensitivity to identify rare transcripts

Targeted Bi-Sulfite Seq
• Sequencing the epigenome can identify aberrations that promote disease pathogenesis
A BEGINNER’S NEXT-GEN
GLOSSARY: WALK THE WALK, TALK
THE TALK
1. Library Preparation (Library Prep) – The method(s) used to prepare DNA or RNA for next-generation sequencing.

2. Sequencing Library (Library) – A collection of DNA or cDNA fragments of a given size range with adapters ligated to each
end that can be run through a sequencer. Libraries can be DNA or cDNA (cDNA libraries prepared when performing RNA-seq).

3. Adapters – Oligonucleotides of a known sequence that are ligated to each end of a DNA/cDNA fragment (i.e. insert). They
provide the primer sites used for sequencing the insert.

4. Index/Barcode - Short sequences of typically 6 or more nucleotides that serve as a way to identify/label individual samples
when they are sequenced together in a single sequencing lane/chip. Barcodes are typically located within the sequencing
adapters.

5. Multiplexing – Mixing two or more different samples together such that they can be sequenced in a single sequencing lane or
chip. Samples that are to be combined, need to be barcoded/indexed prior to being mixed together.

6. Target Enrichment (Capture) – Methods to allow one to isolate and/or increase the frequency of specific genes or other
regions of interest from a DNA or cDNA library prior to being sequenced. The regions of interest are retained for sequencing
and the remaining material is washed away.

7. Baits – Common name given to the oligonucleotide sequences (i.e. probes) that are responsible for identifying and binding to
a given region of interest for performing target-enrichment.

8. In-Solution Capture – A method of performing target enrichment that requires samples to be hybridized to baits to select and
enrich the sample for the desired regions of interest.

9. Amplicon Sequencing – A method of performing target enrichment that utilizes one or more pairs of PCR primers to increase
the number of copies of the genes or other regions of interest that will ultimately be sequenced.

10. Gene Panels – Name frequently given to the selected regions of interest (these can be genes or intergenic regions) that will be
captured using some form of target-enrichment technology.

11. Pre-Capture Library – Common name given to the sequencing library that is created before that library undergoes some form
of target-enrichment.

12. Post-Capture Library – Common name given to the sequencing library after it has completed some form of target-enrichment.

13. Read – Base pair information of a given length from a DNA or cDNA fragment contained in a sequencing library. Different
sequencing platforms are capable of generating different read lengths.

14. Single End Read – The sequence of the DNA is obtained from the 5’ end of only one strand of the insert. These reads are
typically expressed as 1x “y”, where “y” is the length of the read in base pairs (ex. 1x50bp, 1x75bp).

15. Paired End Read – The sequence of the DNA is obtained from the 5’ ends of both strands of the insert. These reads are
typically expressed as 2x “y”, where “y” is the length of the read in base pairs (ex. 2x100bp, 2x150bp).

16. Mate Pair Read – The sequence of the DNA is obtained similar to paired-end reads, however the size of the DNA insert is
often much greater in size (2-10kb in length) and the paired reads originate from a single strand of the DNA insert.

17. Depth of Coverage – The number of reads that spans a given DNA sequence of interest. This is commonly expressed in
terms of “Yx” where “Y” is the number of reads and “x” is the unit reflecting the depth of coverage metric (i.e. 5x, 10x, 20x,
100x)

18. Sequencing Depth – The amount of sequencing a given sample requires to achieve a certain depth of coverage. This is
frequently expressed as the number of reads a sample requires (ex. 40 million reads, 80 million reads) or the number of bases
of sequencing a sample requires (ex. 4 gigabases, 100 megabases).

19. Library Complexity – The number of unique DNA fragments contained in a sequencing library.

20. Electropherogram – A graphical representation of the size and quantity of a DNA or RNA sample run through a BioAnalyzer,
TapeStation or other instrument used for performing quality control.

21. FFPE DNA/RNA – Formalin Fixed Paraffin Embedded DNA or RNA. When attempting to prepare sequencing libraries from
these sample types, modifications are often required to standard library preparation protocols to accommodate the level of
DNA/RNA degradation commonly found in samples stored using this technique.

22. Call - Referring to the identification of a given aberration detected in the sequenced sample when compared to the
reference/normal genome.

23. SNP/SNV – Referring to a Single Nucleotide Polymorphism or Single Nucleotide Variant detected in a sample.

24. CNVs – Referring to Copy Number Variation that is detected in a sample.

25. InDels – One or more Insertion or Deletion events that are detected in a sample.
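The glossary entries for Depth of Coverage and Sequencing Depth are linked by simple arithmetic: mean depth is total sequenced bases divided by the size of the region. The sketch below illustrates that relationship with hypothetical function names and example sizes; it assumes every read maps uniquely and uniformly to the target, which real data never quite does.

```python
import math


def mean_depth(n_reads: int, read_length_bp: int, target_size_bp: int) -> float:
    """Average depth of coverage, e.g. 30.0 means '30x'."""
    return n_reads * read_length_bp / target_size_bp


def reads_needed(depth: float, read_length_bp: int, target_size_bp: int) -> int:
    """Sequencing depth (number of reads) required to reach a mean coverage."""
    return math.ceil(depth * target_size_bp / read_length_bp)


# Illustrative numbers only: 40 million 100 bp reads over a 50 Mb target.
print(mean_depth(40_000_000, 100, 50_000_000))
# Reads of 150 bp needed for 30x over a 3 Gb genome (paired-end reads count twice).
print(reads_needed(30, 150, 3_000_000_000))
```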

CORONAVIRUS

• RNA virus
• First report 1920s
• SARS
• MERS
• Covid-19
Thank you for your attention!