MOLECULARĂ
LECTURER DR. KOVACS ZSOLT
INTRODUCTION TO NUCLEIC ACID
BIOCHEMISTRY AND MOLECULAR
DIAGNOSTIC TECHNOLOGIES
“This structure has novel features which are of considerable
biological interest”
Watson and Crick, 1953
MILESTONES OF THE MOLECULAR
ERA
THE HUMAN GENOME PROJECT
FEBRUARY 2001
PROMISES OF THE HUMAN
GENOME
• Diagnostic - better
• Prognostic - more powerful
• Predictive - preventive
• Therapeutic – more personalized
GENOMIC TECHNOLOGIES ARE
UNIVERSAL
• Anatomic pathology
• Chemistry/Toxicology
• Genetics
• Hematopathology/Oncology
• Infectious diseases
• Transfusion medicine/Identity
THE HUMAN GENOME
1 ctccgggctg tcccagctcg gcaagcgctg cccaggtcct ggggtggtgg cagccagcgg
61 gagcaggaaa ggaagcatgt tcccaggctg cccacgcctc tgggtcctgg tggtcttggg
121 caccagctgg gtaggctggg ggagccaagg gacagaagcg gcacagctaa ggcagttcta
181 cgtggctgct cagggcatca gttggagcta ccgacctgag cccacaaact caagtttgaa
241 tctttctgta acttccttta agaaaattgt ctacagagag tatgaaccat attttaagaa
301 agaaaaacca caatctacca tttcaggact tcttgggcct actttatatg ctgaagtcgg
361 agacatcata aaagttcact ttaaaaataa ggcagataag cccttgagca tccatcctca
421 aggaattagg tacagtaaat tatcagaagg tgcttcttac cttgaccaca cattccctgc
481 agagaagatg gacgacgctg tggctccagg ccgagaatac acctatgaat ggagtatcag
541 tgaggacagt ggacccaccc atgatgaccc tccatgcctc acacacatct attactccca
601 tgaaaatctg atcgaggatt tcaactctgg gctgattggg cccctgctta tctgtaaaaa
661 agggacccta actgagggtg ggacacagaa gacgtttgac aagcaaatcg tgctactatt
721 tgctgtgttt gatgaaagca agagctggag ccagtcatca tccctaatgt acacagtcaa
781 tggatatgtg aatgggacaa tgccagatat aacagtttgt gcccatgacc acatcagctg
841 gcatctgctg ggaatgagct cggggccaga attattctcc attcatttca acggccaggt
901 cctggagcag aaccatcata aggtctcagc catcaccctt gtcagtgcta catccactac
961 cgcaaatatg actgtgggcc cagagggaaa gtggatcata tcttctctca ccccaaaaca
CENTRAL
DOGMA
DNA
Transcription
RNA
Translation
Protein
RNA SPLICING
GENE
STRUCTURE
Upstream: Promoter, 5´-UTR (untranslated region)
Downstream: 3´-UTR (untranslated region), Polyadenylation Signal
THE UNIVERSAL GENETIC
CODE
[Table header: First Position (5´-end); Second Position (U, C, A, G); Third Position (3´-end)]
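The flow DNA → RNA → protein and the codon table above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the input is assumed to be the coding (sense) strand, and the table is the standard genetic code written in the same U, C, A, G order as the slide.

```python
from itertools import product

BASES = "UCAG"  # same first/second/third position order as the codon table
# Standard genetic code, one letter per codon in UUU, UUC, UUA, UUG, UCU, ... order ("*" = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_dna):
    """Transcription: the mRNA matches the coding strand with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translation: read codons from the first AUG until a stop codon or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Short fragment taken from the sequence listing shown earlier (starts at an ATG)
print(translate(transcribe("atgttcccaggctgc")))  # MFPGC
```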
• Structural
– gain/loss of chromosome segments
– translocations
– rearrangements
– gene amplifications
• Molecular
– deletions/insertions
– nucleotide repeats (di-, tri-)
– point mutations (RFLPs, SNPs)
HUMAN GENETIC
VARIATION
• Around 99.9% of nucleotide bases are exactly the
same in all people
• The differences (genetic polymorphisms) are
what makes each individual unique (except
identical twins)
• Basic concepts:
– Locus
– Allele
– Polymorphism
– Mutation
TERMINOLOGY
Locus: Position or location of a gene or genetic marker on a chromosome
Allele: Alternative versions of a gene at a given locus
HOMOZYGOUS
HETEROZYGOUS
GENETIC POLYMORPHISM
Two copies of each CHROMOSOME
Two ALLELES of each gene or genetic locus (allele 1: A; allele 2: G)
SEQUENCE POLYMORPHISM
LENGTH POLYMORPHISM
– 8 – 80 bp Repeats
– 2 – 7 bp Repeats
MUTATIONS: SOMATIC AND
GERMLINE
Somatic mutations
• Occur in nongermline tissues
• Are nonheritable
Germline mutations
• Present in egg or sperm
• Are heritable
DNA Exon
RNA
Protein
TYPES OF MUTATIONS
MISSENSE MUTATION
From: NEJM 347:1512-1520, 2002
SILENT
MUTATIONS
NONSENSE OR STOP
MUTATIONS
TYPES OF MUTATIONS
• Cystic fibrosis
– 70% due to ΔF508
• Duchenne muscular
dystrophy
– ~60% cases due to
large deletions
– 8% small
deletions/insertions
TYPES OF MUTATIONS
INSERTION
FRAMESHIFT
MUTATIONS
• Insertion or deletion of nucleotides
• Reading frame is altered
THE FAT CAT ATE HIS HAT
Insert A
THE FAA TCA TAT EHI SHA T
Delete A
THE FTC ATA TEH ISH AT
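The sentence analogy can be reproduced mechanically: remove the spaces, insert or delete one letter, and regroup into triplets. The sketch below is only an illustration of the reading-frame idea; the helper name `codons` is hypothetical.

```python
def codons(text, size=3):
    """Regroup a run of letters into triplets, the way a ribosome reads codons."""
    letters = text.replace(" ", "")
    return " ".join(letters[i:i + size] for i in range(0, len(letters), size))


sentence = "THEFATCATATEHISHAT"

# Insert an extra A after "THE FA": every downstream triplet shifts by one letter
inserted = sentence[:5] + "A" + sentence[5:]
# Delete the A of "FAT": the frame shifts in the other direction
deleted = sentence[:4] + sentence[5:]

print(codons(sentence))  # THE FAT CAT ATE HIS HAT
print(codons(inserted))  # THE FAA TCA TAT EHI SHA T
print(codons(deleted))   # THE FTC ATA TEH ISH AT
```

Inserting or deleting three letters instead of one would leave the downstream triplets intact, which is why indels in multiples of three do not shift the reading frame.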
TYPES OF MUTATIONS
ssDNA
1 PCR Cycle
50-60 °C annealing (15-30 sec each): primer hybridization
72 °C extension: polymerase/DNA synthesis
POST-PCR
ANALYSIS
PCR TESTING
STEPS
Taq1/2 =
PCR - BEFORE THE
THERMOCYCLER
THERMOCYCLERS
A SIMPLE THERMOCYCLING
PROTOCOL
BASIC COMPONENTS OF PCR
Buffer
Primers
dNTPs (A, C, T, G)
Taq polymerase
DNA template
MgCl2 (Mg2+)
[Figure: MgCl2 titration at 1.5, 2, 3, 4 and 5 mM]
Magnesium Chloride (MgCl2): usually 0.5-5.0 mM
• PCR additives
– 0.5% Tween 20
– 5% polyethylene glycol 400
– betaine
– DMSO
PRIMER
DESIGN
1. Typically 20 to 30 bases in length
2. Annealing temperature depends on the
primer sequence (~ 50% GC content)
3. Avoid secondary structure, particularly at the 3´ end
4. Avoid primer complementarity (primer dimer)
5. The last 3 nucleotides at the 3´ end are the
substrate for DNA polymerase; prefer G or C
6. Many good freeware programs available
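A few of the rules above can be checked automatically. The sketch below is a rough illustration under stated assumptions: the 40-60% window used for "about 50% GC" is an assumed tolerance, the example primer is made up, and the melting temperature uses the Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a crude estimate intended for short oligonucleotides. It does not check secondary structure or primer dimers.

```python
def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees C: 2 per A/T plus 4 per G/C (Wallace rule)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def check_primer(primer, gc_window=(0.40, 0.60)):
    """Apply three of the rules of thumb; gc_window is an assumed tolerance."""
    primer = primer.upper()
    return {
        "length_20_to_30": 20 <= len(primer) <= 30,
        "gc_near_50_percent": gc_window[0] <= gc_content(primer) <= gc_window[1],
        "ends_in_g_or_c": primer[-1] in "GC",
    }


example = "AGCTGACCTGAAGTCCATGC"  # hypothetical 20-mer, not from the slides
print(check_primer(example), wallace_tm(example))
```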
PRIMER DESIGN
SOFTWARE
OLIGO
PRIMER
PrimerQuest
RULES OF THUMB FOR PCR
CONDITIONS
• Add an extra 3-5 minute denaturation step (longer for hot-start Taq) at the start of your cycle
profile to ensure everything is denatured before cycling begins
DGGE
Denaturing gradient gel electrophoresis
TGGE
Temperature gradient gel electrophoresis
$N_i = N_0 \times 2^{n_i}$, where $N_0$ is the starting amount of DNA and $n_i$ is the PCR cycle number

PCR CYCLE NUMBER (n)    AMOUNT OF DNA (N)
0     1
1     2
2     4
3     8
4     16
5     32
6     64
7     128
8     256
9     512
10    1,024
11    2,048
12    4,096
13    8,192
14    16,384
15    32,768
16    65,536
17    131,072
18    262,144
19    524,288
20    1,048,576
21    2,097,152
22    4,194,304
23    8,388,608
24    16,777,216
25    33,554,432
26    67,108,864
27    134,217,728
28    268,435,456
29    536,870,912
30    1,073,741,824

[Figure: amount of DNA vs. PCR cycle number, linear scale]
[Figure: amount of DNA vs. PCR cycle number, logarithmic scale]
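The doubling table can be regenerated from the formula. The sketch below assumes perfect doubling by default; the `efficiency` parameter is an addition that is not on the slide (1.0 means every template is copied in every cycle), and the function names are hypothetical.

```python
import math


def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Copies after a number of cycles: N = N0 * (1 + E)^n, which is N0 * 2^n when E = 1."""
    return start_copies * (1 + efficiency) ** cycles


def cycles_to_reach(target_copies, start_copies=1):
    """Smallest whole number of perfect doublings that reaches the target."""
    return math.ceil(math.log2(target_copies / start_copies))


print(copies_after(10))            # 1024.0
print(copies_after(30))            # 1073741824.0
print(cycles_to_reach(1_000_000))  # 20
```

With an efficiency below 1.0 the same number of cycles yields markedly fewer copies, which is one reason end-point yield is a poor measure of starting template.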
END POINT VS REAL-TIME
• End point analysis of PCR products (EtBr-stained
gels, primers labeled with fluorescent dyes
followed by capillary electrophoresis, etc.)
yields the same results, regardless of
the initial amount of template.
[Figure: fluorophore excitation and emission; fluorescence vs. wavelength]
Do you recognize any of these instruments?
LightCycler
REAL-TIME PCR (QPCR)
CHEMISTRIES
• Fluorescence-based
– After absorbance of certain wavelengths of
light (excitation), the fluorophore emits light at
a longer wavelength (emission)
– Fluorescence proportional to amplified product
REAL-TIME (QPCR) CHEMISTRIES
• Molecular Beacons:
• Scorpion® Primers:
SCORPION® PRIMERS APPLICATION:
EGFR
REAL-TIME CHEMISTRIES: TAQMAN®
[Figure labels: Energy; R = Reporter; Q = Quencher]
TAQMAN CHEMISTRY
HYDROLYSIS PROBE
1. During PCR, probe hybridizes to target sequence
2. Probe is partially displaced during extension
[Figure labels: 2a. excitation filters; 4. sample plate]
www.biorad.com
THRESHOLD
NTC
A CALIBRATION CURVE FOR
AN ABSOLUTE
QUANTITATION
HALLMARKS OF A GOOD QPCR
ASSAY
Reaction Efficiency of 100 +/- 10%
slope →
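The link between the calibration curve and reaction efficiency can be sketched as follows. The standard relation E = 10^(-1/slope) - 1 is applied to the slope of Ct plotted against log10 of the starting copy number, so a slope near -3.32 corresponds to roughly 100% efficiency. The standards and Ct values below are invented for illustration, and the function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope):
    """Efficiency from the calibration slope (1.0 means 100%)."""
    return 10 ** (-1 / slope) - 1


def copies_from_ct(ct, slope, intercept):
    """Absolute quantitation: read an unknown's starting copies off the curve."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical 10-fold dilution series (copies per reaction) and measured Ct values
standard_copies = [1e2, 1e3, 1e4, 1e5, 1e6]
standard_ct = [33.36, 30.04, 26.72, 23.40, 20.08]

slope, intercept = fit_line([math.log10(c) for c in standard_copies], standard_ct)
efficiency = amplification_efficiency(slope)
within_hallmark = 0.90 <= efficiency <= 1.10  # the 100 +/- 10% criterion

print(round(slope, 2), round(efficiency, 3), within_hallmark)
print(round(copies_from_ct(28.0, slope, intercept)))
```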
FAM
VIC
LIGHTCYCLER:
GENOTYPING
DNA MELTING TEMPERATURE
100% anneal
50% melt
100% melt
Tm (ºC)
FACTOR V
LEIDEN
Current Applications of Real-Time: q(RT-)PCR
1980: Walter Gilbert (Biol. Labs) & Frederick Sanger (MRC Labs)
MAXAM AND GILBERT DNA
SEQUENCING
• Chemical cleavage of
phosphate backbone at
specific bases
– G
– A or G (purine-specific)
– C or T (pyrimidine-specific)
– C
[Figure: dideoxy sequencing with a radioactively labeled primer]
Single-stranded DNA template: 5' C T G A C T T C G A C A A 3'
Radioactively labeled primer: 3' T G T T 5'
Reaction mixtures: ddATP, ddGTP, ddCTP, ddTTP
Products in ddGTP reaction: fragments ending in ddG
Gel electrophoresis, then autoradiography to detect radioactive bands
Gel read from larger fragments to smaller fragments: C T G A C T T C G
Read sequence of original single-stranded DNA (complement of primer-generated sequence ladder)
http://www.mun.ca/biology/scarr/fluorescent_dideoxy_sequencing.jpg
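The logic of reading a dideoxy ladder can be simulated with the template and primer from the figure. This is a simplified sketch, not a model of the chemistry: it assumes the primer anneals exactly at the 3' end of the template and that one terminated product exists for every position. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sanger_ladder(template, primer):
    """List (fragment length, terminating base) for every chain-terminated product."""
    full_product = reverse_complement(template)  # strand the polymerase would synthesize, 5'->3'
    if not full_product.startswith(primer):
        raise ValueError("primer must anneal at the 3' end of the template")
    return [(n, full_product[n - 1]) for n in range(len(primer) + 1, len(full_product) + 1)]


def read_template(ladder):
    """Read the gel from largest to smallest fragment, complementing each terminating base."""
    return "".join(COMPLEMENT[base] for _, base in sorted(ladder, reverse=True))


ladder = sanger_ladder("CTGACTTCGACAA", "TTGT")
ddg_lane = [length for length, base in ladder if base == "G"]  # products of the ddGTP reaction

print(ddg_lane)               # [6, 9, 13]
print(read_template(ladder))  # CTGACTTCG
```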
DETECTION
Automatic DNA
sequencers
• Capillary array
contains
polyacrylamide gel
[Figure labels: Syringe with polymer solution; Injection electrode; Outlet buffer; Autosampler tray; Inlet buffer]
CAPILLARY ELECTROPHORESIS
INSTRUMENTATION
ABI 310: single capillary
ABI 3100: 16-capillary array
HOW IS INJECTION ACCOMPLISHED ON
A 310
Electrode Capillary
Samples
COMPONENTS OF
CE
• Narrow capillary
– Fused silica (glass); diameter of 50-100 µm; length 25-75 cm
• Two buffer vials
• Two electrodes connected to high
voltage power source
• Laser excitation source
• Fluorescence detector
• Autosampler to hold
sample tubes
• Computer to control the sample injection and detection
DNA SEQUENCING
To measure the sizes of the fragments, each of the four reactions would be loaded into a
separate well on a gel, and the fragments would be separated by gel electrophoresis
SEPARATION AND DETECTION
[Figure labels: Sample Separation; Size Separation; Color Separation; ABI Prism spectrograph; Fluorescence]
Figure 13.8, J.M. Butler (2005) Forensic DNA Typing, 2nd Edition © 2005 Elsevier Academic Press
AUTOMATED
SEQUENCING
• Fluorescence signals are processed
into an electropherogram
• Base “calls” are made by sequencing
software, but can be reviewed
manually
“virtual autorad” - real-time DNA sequence output from ABI 377
Figure 10.5, J.M. Butler (2005) Forensic DNA Typing, 2nd Edition © 2005 Elsevier Academic Press
RAW DATA FROM THE ABI PRISM 310
6FAM (5FAM)
TET (JOE)
HEX (NED)
ROX (ROX)
A SEQUENCE PRINT-OUT FROM A CONTROL
SAMPLE
BLAST
• Basic Local Alignment Search Tool
• Similarity program
– Compares input sequences with all sequences
(protein or DNA) in a database
– Each comparison is given a score
• Degree of similarity between the query (input sequence)
and the sequence it is being compared to
• The higher the score, the greater the degree of similarity
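The idea of scoring a query against each database sequence can be illustrated with a plain Smith-Waterman local alignment score. This is a stand-in, not BLAST itself: BLAST uses a faster seed-and-extend heuristic and reports statistical measures as well. The scoring values (match +2, mismatch -1, gap -2) and the toy database are assumptions chosen for illustration.

```python
def local_alignment_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences (Smith-Waterman, score only)."""
    previous_row = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        current_row = [0]
        for j, s in enumerate(subject, start=1):
            diagonal = previous_row[j - 1] + (match if q == s else mismatch)
            score = max(0, diagonal, previous_row[j] + gap, current_row[j - 1] + gap)
            current_row.append(score)
            best = max(best, score)
        previous_row = current_row
    return best


# Hypothetical mini database: rank entries by similarity to the query
database = {"seq1": "TTTGATTACATTT", "seq2": "GATCACA", "seq3": "CCCCCCC"}
query = "GATTACA"
ranked = sorted(database, key=lambda name: local_alignment_score(query, database[name]), reverse=True)
print(ranked)  # ['seq1', 'seq2', 'seq3']
```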
CONCLUSION
NGS – “Next-Generation Sequencing”
• Does not use Sanger method
• Different Platforms = Different Chemistries
• Very high-throughput instruments
– >100 gigabases of DNA sequence/day
[Figure labels: Reference Genome; Sequencing Reads]
NEXT-GEN SEQUENCING COST &
TECHNOLOGY
TIMELINE…
[Figure: sequencing cost and technology timeline]
Year: 2003, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013
Cost per genome (left to right): ~$3B, $100M, $1.5M, $40K, $10K, $5K, $4K, ≤$1K?
Milestones: Completion of Human Genome Project; Launch of 1000 Genomes Project; Whole Genomes Sequenced in a day!
Sequencing technology: 454 (Roche); Genome Analyzer (Solexa/Illumina); SOLiD (Applied Bio); HiSeq; PacBio; SOLiD 5500; Ion Torrent; MiSeq; Ion Proton
NEXT-GEN SEQUENCING
PLATFORMS: SYSTEM OVERVIEW
[Figure labels: Genome sample; Fragment & add adapters; Target Preparation; Target Amplification; Sequencing Format; Sequencing Chemistry & Imaging]
A BEGINNER’S NEXT-GEN GLOSSARY: WALK THE WALK, TALK THE TALK
1. Library Preparation (Library Prep) – The method(s) used to prepare DNA or RNA for next-generation sequencing.
2. Sequencing Library (Library) – A collection of DNA or cDNA fragments of a given size range with adapters ligated to each
end that can be run through a sequencer. Libraries can be DNA or cDNA (cDNA libraries prepared when performing RNA-seq).
3. Adapters – Oligonucleotides of a known sequence that are ligated to each end of a DNA/cDNA fragment (i.e. insert). They
provide the primer sites used for sequencing the insert.
4. Index/Barcode - Short sequences of typically 6 or more nucleotides that serve as a way to identify/label individual samples
when they are sequenced together in a single sequencing lane/chip. Barcodes are typically located within the sequencing
adapters.
5. Multiplexing – Mixing two or more different samples together such that they can be sequenced in a single sequencing lane or
chip. Samples that are to be combined need to be barcoded/indexed prior to being mixed together.
6. Target Enrichment (Capture) – Methods to allow one to isolate and/or increase the frequency of specific genes or other
regions of interest from a DNA or cDNA library prior to being sequenced. The regions of interest are retained for sequencing
and the remaining material is washed away.
7. Baits – Common name given to the oligonucleotide sequences (i.e. probes) that are responsible for identifying and binding to
a given region of interest for performing target-enrichment.
8. In-Solution Capture – A method of performing target enrichment that requires samples to be hybridized to baits to select and
enrich the sample for the desired regions of interest.
9. Amplicon Sequencing – A method of performing target enrichment that utilizes one or more pairs of PCR primers to increase
the number of copies of the genes or other regions of interest that will ultimately be sequenced.
10. Gene Panels – Name frequently given to the selected regions of interest (these can be genes or intergenic regions) that will be
captured using some form of target-enrichment technology.
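Indexing and multiplexing from the glossary above can be sketched as a demultiplexing step that sorts pooled reads back into per-sample bins. This is a simplified illustration: it assumes the barcode sits at the start of each read and must match exactly, whereas real platforms typically read the index from the adapter in a separate read. The barcodes, sample names and reads are made up.

```python
def demultiplex(reads, barcodes):
    """Assign each pooled read to a sample by its leading barcode and trim the barcode off."""
    bins = {sample: [] for sample in barcodes.values()}
    bins["undetermined"] = []
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read
            bins["undetermined"].append(read)
    return bins


barcodes = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}  # hypothetical 6-nt indexes
pooled_reads = ["ACGTACGGGTTT", "TGCATGAAACCC", "ACGTACCCCAAA", "NNNNNNAAAA"]
print(demultiplex(pooled_reads, barcodes))
```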
11. Pre-Capture Library – Common name given to the sequencing library that is created before that library undergoes some form
of target-enrichment.
12. Post-Capture Library – Common name given to the sequencing library after it has completed some form of target-enrichment.
13. Read – Base pair information of a given length from a DNA or cDNA fragment contained in a sequencing library. Different
sequencing platforms are capable of generating different read lengths.
14. Single End Read – The sequence of the DNA is obtained from the 5’ end of only one strand of the insert. These reads are
typically expressed as 1x “y”, where “y” is the length of the read in base pairs (ex. 1x50bp, 1x75bp).
15. Paired End Read – The sequence of the DNA is obtained from the 5’ ends of both strands of the insert. These reads are
typically expressed as 2x “y”, where “y” is the length of the read in base pairs (ex. 2x100bp, 2x150bp).
16. Mate Pair Read – The sequence of the DNA is obtained similar to paired-end reads, however the size of the DNA insert is
often much greater in size (2-10kb in length) and the paired reads originate from a single strand of the DNA insert.
17. Depth of Coverage – The number of reads that span a given DNA sequence of interest. This is commonly expressed in
terms of “Yx” where “Y” is the number of reads and “x” is the unit reflecting the depth of coverage metric (e.g. 5x, 10x, 20x,
100x)
18. Sequencing Depth – The amount of sequencing a given sample requires to achieve a certain depth of coverage. This is
frequently expressed as the number of reads a sample requires (ex. 40 million reads, 80 million reads) or the number of bases
of sequencing a sample requires (ex. 4 gigabases, 100 megabases).
19. Library Complexity – The number of unique DNA fragments contained in a sequencing library.
20. Electropherogram – A graphical representation of the size and quantity of a DNA or RNA sample run through a BioAnalyzer,
TapeStation or other instrument used for performing quality control.
21. FFPE DNA/RNA – Formalin-Fixed Paraffin-Embedded DNA or RNA. When attempting to prepare sequencing libraries from
these sample types, modifications are often required to standard library preparation protocols to accommodate the level of
DNA/RNA degradation commonly found from samples stored using this technique.
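The relationship between sequencing depth (entry 18) and depth of coverage (entry 17) is simple arithmetic: average coverage is the total number of sequenced bases divided by the size of the target. The sketch below assumes uniform coverage and that every read maps to the target, which real data never fully achieve; the 50-megabase target size is a hypothetical example and the function names are illustrative.

```python
import math


def mean_coverage(num_reads, read_length, target_size):
    """Average depth of coverage = total sequenced bases / target size."""
    return num_reads * read_length / target_size


def reads_needed(coverage, target_size, read_length):
    """Sequencing depth (number of reads) required for a desired average coverage."""
    return math.ceil(coverage * target_size / read_length)


# 40 million reads of 100 bp is 4 gigabases of sequence
total_bases = 40_000_000 * 100
print(total_bases)                                     # 4000000000
print(mean_coverage(40_000_000, 100, 50_000_000))      # 80.0 on a hypothetical 50 Mb target
print(reads_needed(30, 50_000_000, 100))               # 15000000
```

For a 2x100bp paired-end run, each fragment contributes two reads, so the number of fragments needed is half the number of reads.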
22. Call - Referring to the identification of a given aberration detected in the sequenced sample when compared to the
reference/normal genome.
23. SNP/SNV – Referring to a Single Nucleotide Polymorphism or Single Nucleotide Variant detected in a sample.
25. InDels – One or more insertion or deletion events detected in a sample.
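Making a call amounts to comparing the sample with the reference position by position. The toy sketch below assumes the two sequences are already aligned to equal length, with "-" marking a gap; real variant callers work from many overlapping reads and weigh base quality and depth, none of which is modeled here. Positions are alignment columns counted from 1, and the function name is hypothetical.

```python
def call_variants(reference, sample):
    """Compare two pre-aligned sequences and list SNVs, insertions and deletions."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1):
        if ref_base == sample_base:
            continue
        if ref_base == "-":
            calls.append((position, "insertion", sample_base))
        elif sample_base == "-":
            calls.append((position, "deletion", ref_base))
        else:
            calls.append((position, "SNV", f"{ref_base}>{sample_base}"))
    return calls


print(call_variants("ACGTACGT", "ACGAACGT"))  # [(4, 'SNV', 'T>A')]
print(call_variants("ACG-TA", "ACGGT-"))      # [(4, 'insertion', 'G'), (6, 'deletion', 'A')]
```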
CORONAVIRUS
• RNA virus
• First report 1920s
• SARS
• MERS
• COVID-19