12-25 September 2011
LABORATORY MANUAL
INDIAN INSTITUTE OF SPICES RESEARCH
(INDIAN COUNCIL OF AGRICULTURAL RESEARCH)
CALICUT 673 012, KERALA
Published by
Dr. M. Anandaraj
Director
Organized by
Dr. Johnson K. George (Course Director)
Dr. Santosh J. Eapen
Dr. Prasath (Course Coordinator)
Compiled & Edited by
Dr. Johnson K. George
P. R. Rahul
A. Chandrasekar
The manual is an in-house publication intended for training purposes only and is not for public
circulation.
Copyright 2011 IISR. All rights reserved. Reproduction and redistribution prohibited without approval.
CONTENTS
Sl.No Title Page No
1 RNA/DNA isolation 1
2 Reverse Transcription-PCR (RT-PCR) 4
3 Gel Elution Techniques 7
4 Cloning of PCR Amplified DNA (T/A cloning) & Bacterial Transformation 11
5 Plasmid isolation and restriction digestion 14
6 Sequence analysis 17
7 Agarose Gel Electrophoresis 19
8 Denaturing Polyacrylamide Gel Electrophoresis (PAGE) for nucleic acids 21
9 Silver staining of DNA Polyacrylamide gels 24
10 NBS Profiling 28
11 EcoTILLING 32
12 Promoter Mining 35
13 Tools for Genetic Diversity Analysis 38
14 RAPD and ISSR Analysis 48
15 Microsatellite (Simple Sequence Repeat) profiling 51
16 Multilocus Sequence Typing of bacteria 56
17 Rolling circle amplification-RACE (RCA-RACE) 64
18 Protocols in development and analysis of mutants for functional genomics 67
19 Quantitative RT-PCR 73
20 Loop mediated isothermal amplification (LAMP) 75
21 Two Dimensional Gel Electrophoresis 78
22 Bioinformatics - data mining tools, identification of microsatellite sites, EST analysis and annotation 83
23 Sequence - Based Marker Designing 87
Annexures
I General Conversion Tables and Formulae 88
II Gene tagging steps 89
III Bioanalyzer and Off Gel Fractionator 101
National training on Allele Mining, 12th-25th Sept, 2011, IISR, Calicut
Laboratory Manual 1
DNA/ RNA Isolation
Introduction
Any molecular biology work is done using the genetic material of an organism, either DNA or RNA. The
isolation of good-quality DNA or RNA therefore largely determines the success or failure of any
experiment. The main role of DNA molecules is the long-term storage of information in the form of
triplet codons containing the instructions needed to construct other components of cells, such as
proteins and RNA molecules. The DNA segments that carry this genetic information are called genes, but
other DNA sequences have structural purposes or are involved in regulating the use of this genetic
information. Ribonucleic acids (RNA) are crucial molecules in the central dogma of life and perform
vital structural and functional roles. RNA molecules form the bridge between the stable genetic
information contained within DNA and the enzymes and proteins that carry out much of the metabolism
within the cell. The ribosomes, the sites of protein synthesis within the cell, are largely composed of
ribosomal RNA, and transfer RNA (tRNA) molecules deliver the amino acid building blocks to the
ribosomes. Of all the RNA species, the nucleic acid intermediate messenger RNA (mRNA) is a particularly
desirable source of material for biologists, since it reflects much of what is ultimately translated
into enzymes and proteins. High-quality RNA is the starting material for studying qualitative and
quantitative changes in mRNA expression, in vitro translation, RNase protection assays, reverse
transcriptase-polymerase chain reaction (RT-PCR) and cloning. Gene-specific primers can be designed
based on sequence information available in the NCBI database and used to isolate genes by RT-PCR.
1.1. DNA Isolation by modified CTAB method (Ausubel et al., 1995)
The protocol used for extraction of DNA from Piper leaf tissues is as follows:
1. Grind 5 g of young leaves in liquid nitrogen with a mortar and pestle and add 25 ml of
preheated (65 °C) CTAB buffer. Add 0.2% β-mercaptoethanol to the buffer just before use.
2. Incubate at 60 °C for 30 minutes.
3. Extract with an equal volume of chloroform:isoamyl alcohol (24:1) and centrifuge at 10,000 rpm
for 10 minutes at room temperature.
4. Take the aqueous phase and add 2/3 volume of ice-cold isopropanol.
5. Incubate at -20 °C for 2 hours and centrifuge (10,000 rpm, 15 minutes, 4 °C).
6. Discard the supernatant and invert the tube on a paper towel for a few minutes.
7. Dissolve the pellet in 1.5 ml of TE buffer at room temperature overnight.
8. Add RNase A to a final concentration of 10 µg/ml and incubate at 37 °C for 30 minutes.
9. Add an equal volume of Tris-saturated phenol, mix well and centrifuge at 10,000 rpm for 10
minutes.
10. To the aqueous phase add an equal volume of phenol:chloroform:isoamyl alcohol (25:24:1),
shake and centrifuge at 10,000 rpm for 10 minutes.
11. Take the aqueous phase and add an equal volume of chloroform:isoamyl alcohol (24:1), shake
and centrifuge at 10,000 rpm for 10 minutes.
12. To the aqueous phase add one-tenth volume of 3 M sodium acetate (pH 5.2) and 2.5 volumes
of ethanol, and incubate at -20 °C for one hour or at -70 °C for 30 minutes.
13. Centrifuge at 10,000 rpm for 10 minutes and wash the pellet in 70% ethanol (10,000 rpm for
5 minutes).
14. Air-dry the pellet, dissolve it in 200 µl TE and estimate the yield.
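The volume arithmetic in steps 4 and 12 can be sketched in Python as below. This is a minimal helper under stated assumptions, not part of the original protocol: it assumes the 2.5 volumes of ethanol in step 12 are calculated on the aqueous phase plus the added sodium acetate, a common convention that the protocol does not state explicitly. The function names are hypothetical.

```python
def isopropanol_volume_ml(aqueous_ml: float) -> float:
    """Step 4: 2/3 volume of ice-cold isopropanol per volume of aqueous phase."""
    return aqueous_ml * 2 / 3


def ethanol_precipitation_ml(aqueous_ml: float) -> tuple[float, float]:
    """Step 12: 1/10 volume 3 M sodium acetate, then 2.5 volumes of ethanol.

    Assumption: the ethanol volume is based on aqueous phase + sodium acetate.
    Returns (sodium_acetate_ml, ethanol_ml).
    """
    sodium_acetate_ml = aqueous_ml / 10
    ethanol_ml = 2.5 * (aqueous_ml + sodium_acetate_ml)
    return sodium_acetate_ml, ethanol_ml


if __name__ == "__main__":
    # Hypothetical: ~24 ml aqueous phase recovered after chloroform extraction (step 4)
    print(f"Isopropanol: {isopropanol_volume_ml(24):.1f} ml")
    # Hypothetical: 1.5 ml aqueous phase after the phenol/chloroform steps (step 12)
    naoac, etoh = ethanol_precipitation_ml(1.5)
    print(f"3 M NaOAc: {naoac:.2f} ml, ethanol: {etoh:.2f} ml")
```

If your laboratory calculates the ethanol on the aqueous phase alone, replace `aqueous_ml + sodium_acetate_ml` with `aqueous_ml`.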
1.1.2. Quantification of DNA
The amount of DNA in a sample is estimated using a UV spectrophotometer, BioPhotometer, NanoDrop or a
similar instrument, all of which measure the optical density (OD) at 260 nm. DNA shows a clear
absorbance peak at 260 nm, and an OD260 of 1.0 corresponds to approximately 50 µg/ml of
double-stranded DNA. A DNA solution is considered pure if the OD260:OD280 ratio is about 1.8. Check
the quality of the DNA on a 0.8% agarose gel. Store the DNA at -20 °C until further use.
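The OD-based calculation above can be sketched as follows. It is a minimal illustration, assuming a 1 cm path length and the 50 µg/ml per OD260 unit conversion stated above; the ±0.1 tolerance around 1.8 is an assumed acceptance window, not a value from the protocol, and the function names and readings are hypothetical.

```python
def dsdna_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Convert an OD260 reading to dsDNA concentration (1.0 OD260 ~ 50 ug/ml, 1 cm path)."""
    return a260 * 50.0 * dilution_factor


def dna_yield_ug(conc_ug_per_ml: float, volume_ul: float) -> float:
    """Total DNA in the tube: concentration (ug/ml) x volume (ml)."""
    return conc_ug_per_ml * volume_ul / 1000.0


def purity_ratio(a260: float, a280: float, target: float = 1.8, tolerance: float = 0.1):
    """Return (ratio, ok); the +/-0.1 window around 1.8 is an assumed tolerance."""
    ratio = a260 / a280
    return ratio, abs(ratio - target) <= tolerance


if __name__ == "__main__":
    # Hypothetical reading: a 1:50 dilution of the stock gives OD260 = 0.25, OD280 = 0.14
    conc = dsdna_ug_per_ml(0.25, dilution_factor=50)
    ratio, ok = purity_ratio(0.25, 0.14)
    print(f"{conc:.0f} ug/ml, yield in 200 ul TE: {dna_yield_ug(conc, 200):.0f} ug")
    print(f"OD260/OD280 = {ratio:.2f} ({'pure' if ok else 'check purity'})")
```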
1.2. RNA isolation using TRI-Reagent
TRI Reagent is a mixture of guanidine thiocyanate and phenol in a mono-phase solution,
which effectively dissolves RNA, DNA and protein on homogenization or lysis of tissue sample.
After adding chloroform and centrifuging, the mixture separates into 3 phases: an aqueous phase
containing the RNA, the interphase containing DNA and an organic phase containing proteins. Each
component can be isolated after separating the phases. One ml of TRI Reagent is sufficient to isolate
RNA, DNA and protein from 50-100 mg of leaf tissue.
This is one of the most effective methods for isolating total RNA and can be completed in
one hour starting with fresh tissue. The procedure is very effective for isolating RNA molecules of all
types from 0.1 to 15 kb in length. The resulting RNA is intact with little or no contaminating DNA
and protein. This RNA can be used for northern blots, mRNA isolation, in vitro translation, RNase
protection assay, cloning and reverse transcriptase - polymerase chain reaction (RT-PCR).
Materials required
Sterile powder free nitrile gloves, refrigerated centrifuge, vortex, autoclavable polythene
covers, DEPC treated and autoclaved microfuge tubes, microtips, pestle and mortar.
Reagents required
TRI Reagent (Sigma), chloroform, isopropanol, 75% ethanol prepared using DEPC treated
and autoclaved water, DEPC treated and autoclaved water or RNA re-suspension solution (Ambion)
to dissolve the RNA pellet, RNaseZAP.
Steps in RNA isolation
1. Grind 100 mg of leaf sample to a fine powder in liquid nitrogen, transfer it to a 1.5 ml
DEPC-treated, sterile microfuge tube and add 1 ml of TRI Reagent.
2. Shake vigorously to mix the TRI Reagent homogeneously with the sample, and keep the sample at
4 °C until all the samples are homogenized.
3. Incubate the samples at room temperature for 5 min to ensure complete dissociation of
nucleoprotein complexes and release of RNA, mediated by the guanidine thiocyanate and phenol
present in the TRI Reagent.
4. Centrifuge the samples at 12,000 rpm for 10 min at 4 °C. In this step, insoluble material such
as cellular debris, extracellular membranes, high molecular weight DNA (>20 kb) and most of the
polysaccharides sediments at the bottom of the microfuge tube. The RNA, low molecular weight DNA
and protein remain in the supernatant.
5. Carefully transfer the supernatant to a fresh microfuge tube and add 200 µl of
chloroform for every 1 ml of TRI Reagent used in the sample preparation.
6. Shake vigorously for 15 s and incubate at room temperature for 5-10 min.
7. Centrifuge at 12,000 rpm for 15 min at 4 °C. The centrifugation separates the mixture into 3 phases:
a red organic phenol phase containing protein, an interphase containing DNA and a colorless
upper aqueous phase containing RNA.
8. Transfer the upper aqueous phase containing the RNA to a fresh microfuge tube and precipitate
the RNA by adding 500 µl of isopropanol. Incubate the samples at room temperature for 5 min.
9. Centrifuge at 12,000 rpm for 15 min at 4 °C to pellet the RNA.
10. Decant the supernatant and wash the pellet with 75% ethanol prepared with DEPC-treated sterile
water.
11. Centrifuge at 12,000 rpm for 10 min at 4 °C to pellet the RNA.
12. Air-dry the pellet for 10 min and dissolve the RNA in 50 µl of DEPC-treated water.
13. Check the quality of the RNA on a 1% agarose gel.
14. Quantify the RNA in a spectrophotometer (260/280 nm); a 260/280 ratio of 1.9 to 2.2 indicates
good-quality RNA.
Quantify the RNA using the following formula:
RNA (µg/µl) = (40 × dilution factor × A260) / 1000
* An A260/280 ratio of about 2.0 indicates little or no contamination with protein and
polysaccharides. However, because of variation in starting material and individual practice, the
expected ratio ranges from 1.7 to 2.2.
* If the A260/280 ratio is lower than 1.7, purify the RNA again. In most cases a low ratio is due to
protein contamination, which occurs when some of the organic phase is collected along with the
aqueous phase.
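The formula and ratio thresholds above can be applied with a small Python sketch. The function names and example readings are hypothetical; the 1.7-2.2 window comes from the notes above, and the handling of ratios above 2.2 is an assumption because the manual does not discuss that case.

```python
def rna_ug_per_ul(a260: float, dilution_factor: float) -> float:
    """RNA (ug/ul) = 40 x dilution factor x A260 / 1000, as given above."""
    return 40.0 * dilution_factor * a260 / 1000.0


def rna_purity_call(a260: float, a280: float) -> tuple[float, str]:
    """Classify the A260/280 ratio using the 1.7-2.2 window given above."""
    ratio = a260 / a280
    if ratio < 1.7:
        return ratio, "purify again (likely protein carry-over)"
    if ratio <= 2.2:
        return ratio, "acceptable"
    # The manual does not discuss ratios above 2.2; flag them for inspection.
    return ratio, "above expected range, inspect"


if __name__ == "__main__":
    # Hypothetical 1:100 dilution of the RNA stock
    print(f"{rna_ug_per_ul(0.30, 100):.2f} ug/ul")
    print(rna_purity_call(0.30, 0.15))
```

The factor 40 reflects the usual conversion of one A260 unit to about 40 µg/ml of RNA; dividing by 1000 converts µg/ml to µg/µl.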
Quality of RNA
The quality and integrity of RNA are judged by the intactness of the 25S and 18S ribosomal
RNA bands on an agarose gel (1.5%).
Notes and Precautions
1. Treat plasticware and other necessary items (microtips, microcentrifuge tubes, pestle, mortar)
overnight with 0.1% diethyl pyrocarbonate (DEPC) and autoclave for 2 hours.
2. Use separate pipettes for RNA work.
3. Plastic gloves or powder free nitrile gloves should be worn at all times during isolation and
handling of RNA to avoid contamination of samples with RNases.
4. Perform RNA isolation in dust free environment.
5. The use of RNAzap
NBS Profiling
Ben Vosman
Plant Research International, Wageningen, Netherlands
Introduction
NBS profiling is a technique for DNA fingerprinting and expression profiling of R-genes based on
conserved motifs in the nucleotide binding domain of resistance genes in plants.
The technique involves three steps:
1. Restriction enzyme digestion of (c)DNA and ligation of adapters
2. Selective amplification of fragments using a (degenerate) primer for the conserved
domains
3. Gel analysis of the amplified fragments
Depending on the motif and primer, 30-150 fragments can be amplified in a single PCR reaction, of
which up to 95% contain the targeted motif. Polymorphisms are based on variations in the region of
the conserved domain (including presence/absence of genes), mutations in the restriction sites used
and indels in the sequence between the motif-specific primer annealing site and the restriction site.
Any change made to this protocol (including the polymerase, minute changes in the primers, RL mixes,
PCR cycling conditions or the PCR machine used) may affect the NBS profile produced, and should be
made with extreme caution and with the appropriate controls.
Starting Material
1.1 Quality check and estimation of DNA Yield.
If one starts with similar amounts of plant material and uses the same procedure, the yield will also
be similar. Dissolve the DNA to an expected concentration of 200 ng/µl in TE. Load approximately 50 ng
on an agarose gel alongside a dilution series of known quantity (add RNase to the loading buffer),
estimate the DNA concentration and dilute the DNA to a final concentration of 50 ng/µl. The quality
of the DNA is one of the most important determinants of the quality of the NBS profile, so the
highest possible DNA quality should be pursued.
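Diluting to the working concentration follows C1 × V1 = C2 × V2. The sketch below is a minimal illustration; the 100 µl final volume and the function name are hypothetical choices, not part of the protocol.

```python
def dilute_to(stock_ng_per_ul: float, target_ng_per_ul: float = 50.0,
              final_volume_ul: float = 100.0) -> tuple[float, float]:
    """Return (DNA stock ul, TE ul) so that C1 x V1 = C2 x V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


if __name__ == "__main__":
    # Gel estimate of 200 ng/ul: mix 25 ul DNA with 75 ul TE for 100 ul at 50 ng/ul
    print(dilute_to(200))
```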
2.1 Restriction Digestion and Adaptor Ligation
In this step the DNA is digested with a restriction enzyme with a four-base recognition site, and
blocked adapters are ligated to the ends. The blocked adapters consist of a long oligo with a sequence
similar to the adapter primer and a short oligo that is blocked by an amino group at the 3' end. The
amino group blocks elongation by Taq polymerase. At the start of the PCR the adapter primer therefore
cannot anneal; its annealing site is generated only when a domain-specific primer anneals and is
elongated. This prevents the amplification of adapter-adapter fragments.
Adapter sequences:
5'-ACTCGATTCTCAACCCGAAAGTATAGATCCCA-3' (long arm)
5'-P-TGGGATCTATACTT-3'-NH2 (short arm)
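The two adapter strands above can be checked with a short Python sketch. The reverse complement of the short arm matches the last 14 nucleotides of the long arm, so the annealed adapter is double-stranded only at the end carrying the short arm's 5'-phosphate. The 5' portion of the long arm stays single-stranded, and the 3'-NH2 of the short arm sits at the junction, consistent with the blocking described above. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


LONG_ARM = "ACTCGATTCTCAACCCGAAAGTATAGATCCCA"  # 5'->3'
SHORT_ARM = "TGGGATCTATACTT"                    # 5'-P ... 3'-NH2


def duplex_layout(long_arm: str, short_arm: str) -> tuple[bool, str, str]:
    """Check that the short arm pairs with the 3' end of the long arm.

    Returns (pairs, single_stranded_5prime_part, paired_part_of_long_arm).
    """
    paired = reverse_complement(short_arm)
    pairs = long_arm.endswith(paired)
    overhang = long_arm[: len(long_arm) - len(paired)] if pairs else ""
    return pairs, overhang, paired


if __name__ == "__main__":
    ok, single, double = duplex_layout(LONG_ARM, SHORT_ARM)
    print(ok, single, double)
```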
Prepare a mix of reagents shown in blue (always prepare approximately 10% more than needed). It is
best to pipet reagents in the order listed and to mix the solution before the enzymes are added and
after all components are added.
Components                                          µl per reaction
5x RL+ (AFLP buffer)                                12
Adapter (matched to the restriction enzyme)         3
H2O                                                 29
ATP (10 mM)                                         6
Restriction enzyme (10 U/µl)                        1
Ligase (1 U/µl; for blunt-end enzymes use           1
  high-concentration ligase)
DNA                                                 4
Incubate for 3 hours at 37 °C (preferably in a PCR block), inactivate the enzymes for 15 minutes
at 65 °C and store at 4 °C or at -20 °C.
Add 60 µl H2O to the restriction-ligation mixture!
Note: experiments have shown that varying the amount of DNA between 50 and 500 ng does not
result in different patterns; we recommend using 200 ng of DNA. Other experiments have shown that
dilution of the restriction-ligation mixture is critical.
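The rule of preparing about 10% more master mix than needed can be sketched as below. This is a hypothetical helper: the colour coding that marked master-mix components in the original is lost, so the sketch assumes the DNA (4 µl) is pipetted into each tube separately and leaves it out of the mix. The same function applies to the other reaction tables in this protocol.

```python
# Per-reaction volumes (ul) from the restriction-ligation table above.
# Assumption: DNA (4 ul) is added to each tube separately, so it is
# left out of the master mix.
RL_PER_REACTION_UL = {
    "5x RL+ (AFLP buffer)": 12.0,
    "Adapter": 3.0,
    "H2O": 29.0,
    "ATP 10 mM": 6.0,
    "Restriction enzyme (10 U/ul)": 1.0,
    "Ligase (1 U/ul)": 1.0,
}


def master_mix(per_reaction_ul: dict[str, float], n_reactions: int,
               excess: float = 0.10) -> dict[str, float]:
    """Scale each component by n_reactions x (1 + excess), rounded to 0.1 ul."""
    factor = n_reactions * (1 + excess)
    return {name: round(vol * factor, 1) for name, vol in per_reaction_ul.items()}


if __name__ == "__main__":
    for name, vol in master_mix(RL_PER_REACTION_UL, n_reactions=24).items():
        print(f"{name:<30}{vol:>8.1f} ul")
```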
3. FIRST AMPLIFICATION ROUND
3.1. PCR methods
In this step the domain-specific primer anneals and is elongated by Taq polymerase, creating an
annealing site for the adapter primer that was not present previously. This, in combination with the
hot-start Taq, ensures high specificity. Although degenerate primers have a reputation for giving
variable results, experiments show high reproducibility of this PCR (comparable to AFLP).
Prepare a mix of reagents shown in blue (always prepare approximately 10% more than needed). It is
best to pipet reagents in the order listed and to mix the solution before the enzymes are added and
after all components are added.
Adapter primer sequence:
5'-GTTTACTCGATTCTCAACCCGAAAG-3'
Components                                          µl per reaction
PCR buffer (with 15 mM MgCl2)                       2.5
dNTP mix (5 mM)                                     1
HotStarTaq polymerase (Qiagen)                      0.08
NBS-specific primer (10 pmol/µl)                    2
Adapter primer (10 pmol/µl)                         2
H2O                                                 12.42
Template                                            5
Total                                               25
PCR conditions:
15 min 95 °C
30 sec 95 °C, 1.40 min at 55-60 °C (depending on the motif-specific primer, see below),
2 min 72 °C; 30-35 cycles
20 min 72 °C
Hold at 4 °C
Note: HotStarTaq polymerase is active only after an incubation step of 15 minutes at 95 °C and
therefore prevents non-specific amplification during pipetting.
Annealing temperatures of the common primers:
NBS2, NBS3: 60 °C
NBS1, NBS5, NBS9: 55 °C
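To avoid programming the wrong annealing temperature for a given primer, the first-round profile can be held as data. This is a minimal sketch; the data structure and function name are hypothetical, and the durations are kept exactly as written above (including "1.40 min").

```python
# Annealing temperatures (deg C) listed above for the common NBS primers.
ANNEALING_C = {"NBS1": 55, "NBS2": 60, "NBS3": 60, "NBS5": 55, "NBS9": 55}


def first_round_program(primer: str, cycles: int = 30) -> list[tuple[str, int, str, int]]:
    """Return (step, temperature C, duration, repeats) for the first NBS PCR."""
    if primer not in ANNEALING_C:
        raise KeyError(f"no annealing temperature listed for {primer}")
    if not 30 <= cycles <= 35:
        raise ValueError("the protocol specifies 30-35 cycles")
    return [
        ("HotStarTaq activation", 95, "15 min", 1),
        ("denaturation", 95, "30 sec", cycles),
        ("annealing", ANNEALING_C[primer], "1.40 min", cycles),
        ("extension", 72, "2 min", cycles),
        ("final extension", 72, "20 min", 1),
        ("hold", 4, "indefinite", 1),
    ]


if __name__ == "__main__":
    for step in first_round_program("NBS2"):
        print(step)
```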
3.2 VERIFICATION OF PCR AMPLIFICATION
Load 15 µl of the PCR product on a 1% agarose gel, which should show a smear with several distinct
fragments in the size range of 100-1000 bp. Patterns vary with the primer used and the DNA source.
3.3 DILUTION OF THE AMPLIFIED FRAGMENTS
Add 90 µl of H2O to the remainder of the sample.
4. AMPLIFICATION WITH LABELLED PRIMER
4.1 PRIMER LABELLING
Prepare a mix of reagents shown in blue (always prepare approximately 10% more than needed). It is
best to pipet reagents in the order listed and to mix the solution before the enzymes are added and
after all components are added.
Components                                          µl per reaction
T4 forward buffer (5x)                              0.1
Distilled water                                     0.19
Domain-specific primer (10 pmol/µl)                 0.1
T4 polynucleotide kinase (10 U/µl)                  0.01
[γ-33P]ATP                                          0.1
Total                                               0.5
Incubate the mixture in a 37 °C water bath for 1 to 16 hours.
Optional: inactivate the kinase by heating the reaction mixture to 70 °C for 10 minutes.
4.2 PCR METHODS
Prepare a mix of reagents shown in blue (always prepare approximately 10% more than needed). It is
best to pipet reagents in the order listed and to mix the solution before the enzymes are added and
after all components are added.
Components                                          µl per reaction
PCR buffer                                          2
dNTP mix (5 mM)                                     0.8
Taq polymerase                                      0.08
Labelled motif-specific primer                      0.5
Adaptor primer (10 pmol/µl)                         0.2
H2O                                                 11.42
Diluted mixture from first PCR                      5
Total                                               20
PCR conditions (Perkin Elmer GeneAmp™ PCR System 9600):
30 sec 95 °C, 1.40 min 55-60 °C (depending on the motif-specific primer), 2 min 72 °C; 30-35 cycles
20 min 72 °C
Hold at 4 °C
5. POLYACRYLAMIDE GEL ELECTROPHORESIS
5.1. SAMPLE PREPARATION
1. Add an equal volume of loading buffer (98% formamide, 10 mM EDTA pH 8.0, bromophenol blue and
xylene cyanol), incubate the samples for 3 min at 95 °C and cool them on ice before loading on the
polyacrylamide (PAA) gel.
2. DNA samples are analyzed on a 6% polyacrylamide gel (SequaGel-6, Ready-To-Use 6%
Sequencing Gel Solution, National Diagnostics).
3. Electrophoresis unit: Bio-Rad Sequi-Gen II (38 x 50 cm).
4. Load the gel in a fixed order, using a tracking system to prevent loading errors. Empty wells in
the microtiter plate scheme provide a check for correct loading.
5. Run the samples for at least 3 hours (depending on the size of the DNA fragments) at 110 W.
6. Fix the gel on Whatman 3MM paper and dry it.
7. Cover the dried gel with film (Kodak X-OMAT AR, 35 x 43 cm) and store the gel and film in a
light-tight cassette. The exposure time depends on the amount of radioactivity in the gel.
8. Develop the film.
STOCKS AND SOLUTIONS:
5x RL+ buffer: 50 mM Tris-HAc pH 7.5
50 mM MgAc
250 mM KAc
25 mM DTT
250 ng/µl BSA
TE buffer: 1 ml 1 M Tris-HCl (pH 8.0)
20 µl 0.5 M EDTA (pH 8.0)
Add MilliQ H2O up to 100 ml.
Adapter synthesis:
Blunt adapter:
5'-ACTCGATTCTCAACCCGAAAGTATAGATCCCA-3'
              3'-NH2-TTCATATCTAGGGT-5'-P
Add 1.25 nmol each of the top and bottom strands and adjust the volume to 75 µl.
Incubate the mixture for 3 min at 90 °C and cool down slowly to room temperature.
(From Brugmans et al., 2008)
References:
Van der Linden CG, Wouters DCAE, Mihalka V, Kochieva EZ, Smulders MJM & Vosman B (2004) Efficient
targeting of plant disease resistance loci using NBS profiling. Theor Appl Genet 109:384-393.
Van der Linden CG, Smulders MJM & Vosman B (2005) Motif-directed profiling: a glance at molecular
evolution. In: Bakker FT, Chatrou LW, Gravendeel B & Pelser PB (eds.) Plant species-level
systematics: new perspectives on pattern & process. Regnum Vegetabile volume 143. Koeltz,
Koenigstein, Germany. Pages 291-303.
Calenge F, van der Linden CG, van de Weg E, Schouten HJ, van Arkel G, Denancé C & Durel CE (2005)
Resistance gene analogues identified through the NBS-profiling method map close to major genes and
QTL for disease resistance in apple. Theor Appl Genet 110:660-668.
Brugmans B, Wouters D, van Os H, Hutten R, van der Linden G, Visser R, van Eck H & van der Vossen E
(2008) Genetic mapping and transcription analyses of resistance gene loci in potato using NBS
profiling. Theor Appl Genet 117:1379-1388.
Wang M, van den Berg RG, van der Linden CG & Vosman B (2008) The utility of NBS profiling for plant
systematics: a first study in tuber-bearing Solanum species. Plant Syst Evol 276:137-148.
Jacobs MMJ, Vosman B, Vleeshouwers VGAA, Visser RGF, Henken B & van den Berg RG (2010) A novel
approach to locate Phytophthora infestans resistance genes on the potato genetic map. Theor Appl
Genet 120:785-796.
EcoTILLING
K. Johnson George and I.P. Vijesh Kumar
Indian Institute of Spices Research, Marikunnu P.O., Calicut- 673012.
TILLING and EcoTILLING were originally designed to be used on the LI-COR DNA Analyzer, but have
since moved to numerous other platforms that do not require dye-labeled primers. The benefits are
an inexpensive platform for reverse genetics and rapid SNP discovery, which still allows pooling of
samples to increase throughput and reduce discovery bias.
EcoTILLING studies are primarily concerned with identifying informative SNPs for population
genetics, forensics, conservation and resource management work. Inclusion of too few individuals in
the discovery panel can introduce ascertainment bias. EcoTILLING allows hundreds or thousands of
individuals to be included in the discovery panel by pooling, which reduces ascertainment bias, and
allows for the discovery of the most informative SNPs for the study at hand.
During the training, we will be using Transgenomic SURVEYOR Mutation Detection Kits for
TILLING. The kit includes a mismatch-specific DNA endonuclease to scan for known and unknown
mutations and polymorphisms in heteroduplex DNA. SURVEYOR Nuclease, the key component of
the kits, is an endonuclease that cleaves DNA with high specificity at sites of base-substitution
mismatch and other distortions. The SURVEYOR Mutation Detection Kit for Standard Gel
Electrophoresis has been designed to cleave unlabeled DNA fragments at mismatched sites for
subsequent analysis by agarose gel electrophoresis or polyacrylamide gel electrophoresis (PAGE).
DNA 200 to 4,000 bp long can be analyzed using manual agarose gel electrophoresis while smaller
fragments (<1,000 bp) can be analyzed using manual polyacrylamide gel electrophoresis (PAGE).
Kit Components
1) SURVEYOR Nuclease S
2) SURVEYOR Enhancer S
3) 0.15 M MgCl2 Solution, 0.25 mL
4) Stop Solution, 0.25 mL
(Store all components at -20 °C.)
DNA samples from black pepper (Piper nigrum) accessions will be used in the experiments.
Step 1. PCR amplify your target fragment.
This step is critical to the success of the SURVEYOR Nuclease digestion.
Ensure the following:
Your PCR yield is sufficiently high (>25 ng/µL).
Your PCR product has low background (preferably a single species of the correct size).
Your PCR product is essentially free of primer-dimer artifacts.
It is imperative that a single PCR product be produced for efficient TILLING.
Pooling several individuals in each PCR
Pooling of samples is advantageous for several reasons:
(1) More potential heteroduplexes may be seen
(2) the number of individuals that can be surveyed at a time is increased
(3) pooling can give an indication of the frequency of the SNP site in various populations prior to
investing time and money in high-throughput genotyping.
How many samples can be pooled?
Up to 5 individual samples (~50ng/ul), 1ul each, can be pooled into a PCR reaction.
For a single 25 µL PCR reaction, add:
DNA                 50 ng
10X PCR Buffer      2.5 µL
50 mM MgCl2         1.5 µL
10 mM dNTPs         1.3 µL
SharkaTAQ           1.0 µL
H2O                 x µL
Total:              25.0 µL
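The water volume x in the table above is whatever remains after the fixed components and the DNA. The sketch below computes it; `primer_ul` is a hypothetical parameter, because the table lists no primers although their combined volume must also come out of the 25 µL. The function name is likewise hypothetical.

```python
# Fixed components (uL) of the 25 uL EcoTILLING PCR listed above.
FIXED_UL = {"10X PCR Buffer": 2.5, "50 mM MgCl2": 1.5, "10 mM dNTPs": 1.3, "SharkaTAQ": 1.0}


def water_to_add(dna_ng: float, dna_ng_per_ul: float, primer_ul: float = 0.0,
                 total_ul: float = 25.0) -> tuple[float, float]:
    """Return (DNA volume, H2O volume x) in uL for one reaction.

    primer_ul is hypothetical: primers are not listed in the table, so their
    combined volume is subtracted here if supplied.
    """
    dna_ul = dna_ng / dna_ng_per_ul
    water_ul = total_ul - sum(FIXED_UL.values()) - dna_ul - primer_ul
    if water_ul < 0:
        raise ValueError("components exceed the reaction volume")
    return dna_ul, water_ul


if __name__ == "__main__":
    # 50 ng from a 50 ng/uL stock, no primer volume supplied
    print(water_to_add(50, 50))
    # Pool of five samples, 1 uL each at ~50 ng/uL (250 ng in 5 uL)
    print(water_to_add(250, 50))
```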
Cycling:
95 °C for 3 minutes
33 cycles of:
95 °C for 30 seconds
57 °C for 30 seconds*
72 °C for 2 minutes**
15 °C soak
* Depends on the primer. ** Depends on the expected PCR product size.
Step 2. Create heteroduplexes of the PCR products:
1. Mix equal amounts of the PCR products in a 0.2 mL tube (pooled or individual PCR products may be
used). For efficient annealing the final volume should be at least 10 µL. The concentration of the
samples should be in the range of 25 to 80 ng/µL, ideally 50 ng/µL. About 200-400 ng of hybridized
DNA is recommended for treatment with SURVEYOR Nuclease S, so each tube should contain at least
200 ng of total DNA.
2. Place the tube in a thermocycler and run the following program:
95 °C 10 min
95 °C to 85 °C (-2.0 °C/s)
85 °C 1 min
85 °C to 75 °C (-0.3 °C/s)
75 °C 1 min
75 °C to 65 °C (-0.3 °C/s)
65 °C 1 min
65 °C to 55 °C (-0.3 °C/s)
55 °C 1 min
55 °C to 45 °C (-0.3 °C/s)
45 °C 1 min
45 °C to 35 °C (-0.3 °C/s)
35 °C 1 min
35 °C to 25 °C (-0.3 °C/s)
25 °C 1 min
4 °C Hold.
The product is now ready to be treated with SURVEYOR Nuclease for heteroduplex analysis.
Continue with Step 3 Treatment with SURVEYOR Nuclease.
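The annealing program above follows a regular pattern: one fast ramp from 95 °C, then slow 10 °C ramps each followed by a 1 min hold. A minimal sketch that regenerates it is shown below, which can help when re-entering the program on a different thermocycler; the function name and parameters are hypothetical.

```python
def heteroduplex_program(start_c: int = 95, end_c: int = 25, step_c: int = 10,
                         first_ramp: float = -2.0, ramp: float = -0.3) -> list[tuple[str, str]]:
    """Build the stepwise denature/re-anneal program listed above."""
    program = [(f"{start_c} C", "10 min")]
    temp, rate = start_c, first_ramp
    while temp > end_c:
        nxt = temp - step_c
        program.append((f"{temp} C to {nxt} C", f"{rate} C/s"))
        program.append((f"{nxt} C", "1 min"))
        temp, rate = nxt, ramp  # only the first ramp uses the fast rate
    program.append(("4 C", "hold"))
    return program


if __name__ == "__main__":
    for setting, duration in heteroduplex_program():
        print(f"{setting:<14}{duration}")
```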
Step 3. Cleave heteroduplexes:
1. For each digestion, add the following components in the order shown to a nuclease-free 0.2 mL
tube (kept on ice):
200 to 400 ng (V = 8 to 40 µL) hybridized DNA
V/10 µL 0.15 M MgCl2 Solution
1 µL SURVEYOR Enhancer S
1 µL SURVEYOR Nuclease S
2. Mix by gentle vortexing, by agitation or by pipetting up and down with a micropipette.
3. Incubate at 42 °C for 60 min.
4. Add 1/10th volume of Stop Solution and mix. Store the digestion products at -20 °C if they are
not analyzed immediately.
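The per-tube volumes for the digestion can be worked out as in the sketch below. It assumes the 1/10th volume of Stop Solution is calculated on the whole digestion (DNA + MgCl2 + Enhancer + Nuclease); the protocol does not say which volume is meant. The function name is hypothetical.

```python
def surveyor_digest(dna_ng: float, dna_ng_per_ul: float) -> dict[str, float]:
    """Volumes (uL) for one SURVEYOR Nuclease S digestion, following the steps above.

    Assumption: the Stop Solution (1/10th volume) is calculated on the whole
    digestion volume (DNA + MgCl2 + Enhancer + Nuclease).
    """
    if not 200 <= dna_ng <= 400:
        raise ValueError("use 200-400 ng of hybridized DNA")
    v = dna_ng / dna_ng_per_ul
    if not 8 <= v <= 40:
        raise ValueError("DNA volume V should be 8-40 uL")
    mgcl2 = v / 10
    reaction = v + mgcl2 + 1.0 + 1.0  # + 1 uL Enhancer S + 1 uL Nuclease S
    return {"hybridized DNA (V)": v, "0.15 M MgCl2": mgcl2,
            "Enhancer S": 1.0, "Nuclease S": 1.0,
            "Stop Solution": reaction / 10}


if __name__ == "__main__":
    # 300 ng of hybridized DNA at 25 ng/uL
    for name, vol in surveyor_digest(300, 25).items():
        print(f"{name:<20}{vol:.2f} uL")
```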
Step 4. Separate cleavage products:
Samples can be separated on various platforms. Amplified DNA fragments in the size range of 200 to
4,000 bp are most effectively resolved from potential digestion products on agarose gels (2%).
Promoter Mining
D. Prasath and P.R. Rahul
Indian Institute of Spices Research, Marikunnu P.O., Calicut-673012.
The promoter is the DNA region, usually upstream of the coding sequence of a gene or operon, that
binds RNA polymerase and directs it to the correct transcriptional start site, thus permitting the
initiation of transcription. Identification of transcriptional regulatory elements within promoter
regions is of striking interest to biologists, since these elements govern the regulation of gene
expression. Transcription factors, which are proteins, play a major role in gene regulation in
eukaryotic organisms. They bind to specific sites, termed transcription factor binding sites or
regulatory sites, in the promoter regions of particular genes and interact with RNA polymerase and
other factors to regulate the transcription of a gene. Transcription factors are said to regulate
the transcription of genes cooperatively.
PCR Based Genome Walking
This protocol was adapted from Siebert et al. (1995) and www.clontech.com. It involves a nested PCR
with a touchdown program. The method is very prone to artefacts, so the requirements for primer
design and the touchdown PCR program must be followed rigorously.
General outline
Digestion of genomic DNA
1. Cut with a 6 bp blunt-end cutter. You can either use one enzyme or set up 4 different enzyme
reactions to maximize gene coverage. Set up the reaction:
   2.5 µg of DNA
   5 µl of restriction enzyme
   10 µl 10x restriction buffer
   10 µl 1 mg ml-1 BSA (if not already in 10x buffer)
   dH2O to a final reaction volume of 100 µl.
2. Incubate for 5 h at the appropriate temperature.
3. Add 100 µl 25:24:1 phenol:chloroform:IAA. Vortex and spin for 5 min in a microfuge. Remove the
aqueous layer to a fresh tube.
4. Add 2.5 volumes ethanol and 0.1 volume 3 M NaOAc pH 4.5. Precipitate at -20 °C overnight or at
-80 °C for 1 hr.
5. Spin in a microfuge for 10 min at full speed. Wash the pellet with 70% ethanol, spin again and
drain off the supernatant.
6. Resuspend the pellet in 20 µl dH2O.
7. Run 1 µl on a 1% gel at 30 V for 5 hours. A good smear should be seen.
Adapter Ligation
1. To make the adaptor, mix the long and short primers (see below) in the right concentration to get
a 25 mM final concentration [e.g. 20 µl long primer (50 mM) + 20 µl short primer (50 mM)]. Place at
100 °C for 2 min and then let cool to room temperature.
2. Use a concentrated T4 DNA ligase (5x) (Biolabs, M0202T) and set up the following reaction:
   5 µl digested DNA
   1 µl 10X ligase buffer
   1 µl T4 ligase
   2.4 µl adaptor (25 mM)
   0.6 µl dH2O
3. Incubate overnight at 16 °C.
4. Stop the ligase activity by incubating the reactions at 70 °C for 5 min.
5. Add 90 µl TE (10 mM Tris pH 7.5, 1 mM EDTA). This gives you 100 µl of a library that can be used
for more than 200 PCR reactions.
Touch down PCR
Two nested reactions need to be carried out using the AP1 and AP2 primers illustrated on the
previous page and two gene-specific primers. The gene-specific primers need to be designed according
to the guidelines at www.clontech.com.
1. Set up the following reaction using an Expand or other enhanced polymerase:
1 µl adapter-ligated library
5 µl 10X PCR buffer
4 µl 25 mM MgCl2
1 µl 10 mM dNTPs
1 µl AP1 primer (10 µM)
1 µl gene-specific primer 1 (10 µM)
0.5 µl DNA polymerase
dH2O to 50 µl (36.5 µl)
50 µl TOTAL
2. Cycle as follows:
94 °C (25 s), 72 °C (3 min) x 7
94 °C (25 s), 67 °C (3 min) x 32
67 °C (7 min) x 1
Cool to 4 °C.
3. Analyze 8 µl of the reaction on a 1.5% agarose gel. You should observe banding patterns, although
there may be some smearing.
4. Dilute 1 µl of each primary PCR into 49 µl dH2O.
5. Set up the nested reaction mix:
1 µl diluted primary PCR reaction
5 µl 10X PCR buffer
4 µl 25 mM MgCl2
1 µl 10 mM dNTPs
1 µl AP2 primer (10 µM)
1 µl gene-specific primer 2 (10 µM)
0.5 µl DNA polymerase
dH2O to 50 µl (36.5 µl)
50 µl TOTAL
6. Cycle as follows:
94 °C (25 s), 72 °C (3 min) x 5
94 °C (25 s), 67 °C (3 min) x 20
67 °C (7 min) x 1; cool to 4 °C.
7. Analyze 5 µl of the reaction on a 1.5% agarose gel. You should observe distinct banding patterns.
The remainder of the PCR reaction can then be used to clone and sequence the band of interest.
Selected references:
Siebert et al. (1995) An improved PCR method for walking in uncloned genomic DNA. Nucleic Acids
Research 23(6):1087-1088.
www.clontech.com
TOOLS FOR GENETIC DIVERSITY ANALYSIS
Rajesh M.K.¹* and Jayasekhar S.²
¹Division of Crop Improvement, ²Division of Social Sciences
Central Plantation Crops Research Institute, Kasaragod 671124, Kerala
(*E-mail: mkraju_cpcri@yahoo.com)
Introduction
Plant genetic resources constitute the chief component of agro-biodiversity and comprise
landraces, modern cultivars, obsolete varieties, breeding lines, genetic stocks and wild
species. They provide plant breeders with the basic material for exploiting genetic variability in the
development of high yielding cultivars with a broad genetic base. The utilization of these genetic
resources, however, depends upon their efficient and adequate characterization and evaluation, which
in turn entails efficient characterization standards and appropriate strategies.
Analysis of trait data generated from characterization and evaluation of the genetic resources
is used to understand and use diversity. Currently, a large number of distance measures are available
for analyzing similarity/dissimilarity among accessions based on different traits representing different
types of variables. Selecting the most appropriate distance measure for each trait is a
prerequisite for diversity analysis. One approach is to form clusters such that accessions in
different clusters are more diverse than accessions within the same cluster. Clustering algorithms
require a distance/similarity matrix between the accessions, which is calculated according to the
nature of the traits, such as morphological and agronomic traits and/or molecular markers.
The availability of cost-efficient, large scale genotyping techniques has greatly facilitated the
assessment of genetic diversity within populations. Various computational tools have also been
developed concurrently to analyze the genetic data derived from the genotyping experiments. In this
review, the basics of population genetics, important parameters in genetic diversity analysis and the
most widely used computer programmes in population genetic studies have been described.
Basics of population genetics
Variation in alleles allows organisms to adapt to ever-changing environments. Alleles are
different forms of the same gene, which may be expressed as different phenotypes. All of the alleles shared
by all of the individuals in a population make up the population's gene pool. In diploid organisms,
every gene is represented by two alleles, one inherited from each parent. The pair of alleles may differ
from one another, in which case it is said that the individual is "heterozygous" for that gene. If the
two alleles are identical, it is said that the individual is "homozygous" for that gene.
Population genetics is the study of allele frequency distribution and change under the
influence of the four main evolutionary processes: natural selection, genetic drift, mutation and gene
flow. It also takes into account the factors of population subdivision and population structure and
attempts to explain such phenomena as adaptation and speciation.
Based on Mendelian genetics, it is possible to predict the probability of the appearance of a
particular allele in an offspring when the alleles of each parent are known. Similar predictions can be
made about the frequencies of alleles in the next generation of an entire population. By comparing the
predicted or "expected" frequencies with the actual or "observed" frequencies in a real population,
one can infer a number of possible external factors that may be influencing the genetic structure of the
population (such as inbreeding or selection).
A population is defined as a group of interbreeding individuals that exist together at the same
time. A population may either be considered as a single unit or it can be subdivided into smaller units.
Subdivisions of a population may be the result of ecological factors or behavioural factors. If a
population is subdivided, the genetic links among its parts may differ, depending on the real degree of
gene flow taking place.
A population is considered structured if:
(i) Genetic drift is occurring in some of its subpopulations,
(ii) Migration does not happen uniformly throughout the population, or
(iii) Mating is not random throughout the population.
A population's structure affects the extent of genetic variation and its patterns of distribution.
Genetic drift
Genetic drift refers to fluctuations in allele frequencies that occur by chance (particularly in
small populations) as a result of random sampling among gametes, i.e. random changes in gene
frequency which are not due to selection, gene mutation or migration. Genetic drift decreases
diversity within a population because it tends to cause the loss of rare alleles, reducing the overall
number of alleles. Because of genetic drift, small, isolated populations often have unusual frequencies
of a few alleles.
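The effect of population size on drift can be illustrated with a minimal Wright-Fisher simulation, a standard idealized model of drift that this manual does not itself prescribe. The sketch assumes a diploid population of constant size N, random union of gametes, and no selection, mutation or migration; the starting frequency, the two population sizes and the random seed are arbitrary illustrative choices.

```python
import random

def wright_fisher(p0, n_individuals, generations, seed=1):
    """Track an allele frequency under pure genetic drift (Wright-Fisher model).

    Each generation, 2N gene copies are drawn at random (with replacement)
    from the previous generation's gene pool.
    """
    rng = random.Random(seed)
    two_n = 2 * n_individuals
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Binomial sampling of 2N gametes, written as 2N Bernoulli draws.
        copies = sum(1 for _ in range(two_n) if rng.random() < p)
        p = copies / two_n
        trajectory.append(p)
        if p in (0.0, 1.0):  # allele lost or fixed: nothing left to drift
            break
    return trajectory

small = wright_fisher(0.5, n_individuals=10, generations=100)
large = wright_fisher(0.5, n_individuals=1000, generations=100)
print(f"N=10:   final p = {small[-1]:.3f} after {len(small) - 1} generations")
print(f"N=1000: final p = {large[-1]:.3f} after {len(large) - 1} generations")
```

Re-running with different seeds shows the spread of outcomes: the small population tends to lose or fix the allele far sooner, while the large one stays close to its starting frequency.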
Gene flow
Gene flow is the passage and establishment of genes typical of one population in the gene
pool of another by natural or artificial hybridization and backcrossing. Non-random mating occurs
when individuals that are more closely (inbreeding) or less closely related mate more often than
would be expected by chance for the population. Self-pollination or inbreeding is similar to mating
between relatives. It increases the homozygosity of a population and its effect is generalized for all
alleles. Inbreeding per se does not change the allelic frequencies but, over time, it leads to
homozygosity by slowly increasing the two homozygous classes.
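The statement that inbreeding raises homozygosity without changing allele frequencies can be checked numerically for the extreme case of complete self-pollination at one biallelic locus. This is a sketch under that assumption (complete selfing, no selection); the starting genotype frequencies are illustrative.

```python
def self_one_generation(f_AA, f_Aa, f_aa):
    """Genotype frequencies after one generation of complete self-pollination."""
    # Homozygotes breed true; each heterozygote segregates 1/4 AA : 1/2 Aa : 1/4 aa.
    return f_AA + f_Aa / 4, f_Aa / 2, f_aa + f_Aa / 4

def allele_freq_A(f_AA, f_Aa, f_aa):
    """Frequency of allele A implied by the genotype frequencies."""
    return f_AA + f_Aa / 2

genotypes = (0.25, 0.50, 0.25)  # Hardy-Weinberg proportions with p = 0.5
for generation in range(6):
    f_AA, f_Aa, f_aa = genotypes
    print(f"gen {generation}: AA={f_AA:.4f} Aa={f_Aa:.4f} aa={f_aa:.4f} "
          f"p(A)={allele_freq_A(*genotypes):.2f}")
    genotypes = self_one_generation(*genotypes)
```

The heterozygote class halves every generation and its loss is shared equally by the two homozygous classes, so p(A) never moves.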
Mutations could lead to occurrence of new alleles, which may be favourable or deleterious to
the individual's ability to survive. If changes are advantageous, then the new alleles will tend to
prevail by being selected in the population. The effect of selection on diversity may be:
(i) Directional, where it decreases diversity;
(ii) Balancing, where it increases diversity. Heterozygotes have the highest fitness, so selection
favours the maintenance of multiple alleles; and
(iii) Frequency dependent, where it increases diversity. Fitness is a function of allele or
genotype frequency and changes over time.
Migration
Migration implies not only the movement of individuals into new populations but that this
movement introduces new alleles into the population (gene flow). Changes in gene frequencies will
occur through migration either because more copies of an allele already present will be brought in or
because a new allele arrives. Various factors which affect migration in crop species include breeding
system, sympatry with wild and/or weedy relatives, pollinators, and seed dispersal. The immediate
effect of migration is to increase a population's genetic variability and, as such, helps increase the
possibilities of that population to withstand environmental changes. Migration also helps blend
populations and prevent their divergence.
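How migration blends populations can be sketched with the simple one-way (continent-island) model, which is an assumption here rather than a model given in the text: each generation a fraction m of the recipient population's genes comes from migrants whose allele frequency is fixed at p_m, so p' = (1 - m)p + m p_m. The values of m and p_m below are illustrative.

```python
def migrate(p_island, p_migrants, m):
    """One generation of one-way gene flow (continent-island model).

    A fraction m of the recipient gene pool is replaced by migrant genes
    that carry the allele at frequency p_migrants.
    """
    return (1 - m) * p_island + m * p_migrants

p = 0.0  # the allele is initially absent from the recipient population
for generation in range(20):
    p = migrate(p, p_migrants=0.6, m=0.05)
print(f"after 20 generations: p = {p:.3f}")  # prints: after 20 generations: p = 0.385
```

Migration introduces an allele that was absent and pulls the recipient frequency steadily toward the migrants' value, which is the homogenizing effect described above.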
Hardy-Weinberg Principle
The foundation for population genetics was laid in 1908, when Godfrey Hardy and Wilhelm
Weinberg independently published what is known as the Hardy-Weinberg Equilibrium or Hardy-
Weinberg Principle, which states: "In a large, randomly breeding (diploid) population, allelic
frequencies will remain the same from generation to generation; assuming no unbalanced mutation,
gene migration, selection or genetic drift." When a population meets all of the Hardy-Weinberg
conditions, it is said to be in Hardy-Weinberg equilibrium. The "equilibrium" is a simple prediction of
genotype frequencies in any given generation, and the observation that the genotype frequencies are
expected to remain constant from generation to generation as long as several simple assumptions are
met. This description of stasis provides a counterpoint to studies of how populations change over
time.
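For a single locus with two alleles A and a at frequencies p and q = 1 - p, the Hardy-Weinberg prediction is p², 2pq and q² for AA, Aa and aa. The sketch below computes these proportions and checks that, under random mating with no disturbing forces, the allele frequency carried into the next generation is unchanged. The value p = 0.7 is illustrative.

```python
def hardy_weinberg_expected(p):
    """Expected genotype frequencies at a biallelic locus under HWE."""
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}

def allele_freq_from_genotypes(genotypes):
    """Frequency of allele A carried by a set of genotype frequencies."""
    return genotypes["AA"] + genotypes["Aa"] / 2

g0 = hardy_weinberg_expected(0.7)
print({k: round(v, 3) for k, v in g0.items()})  # {'AA': 0.49, 'Aa': 0.42, 'aa': 0.09}
# p^2 + pq = p, so random mating reproduces the same p and genotype frequencies.
print(round(allele_freq_from_genotypes(g0), 3))  # 0.7
```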
Testing for Hardy-Weinberg Equilibrium
The deviation of a population from Hardy-Weinberg equilibrium is an indication of the intensity of
external factors and can be tested with a chi-square ($\chi^2$) goodness-of-fit test, which compares
observed with expected outcomes. The statistical test follows this formula:
$$\mathrm{HWT} = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$
Where HWT = statistical test for Hardy-Weinberg equilibrium; $O_i$ = observed number of individuals
of genotype i and $E_i$ = expected number of individuals of genotype i under Hardy-Weinberg
proportions (the test is applied to counts, not to frequencies).
If $\chi^2_{cal} \le \chi^2_{tab}$, the null hypothesis $H_0$ is accepted, i.e. the genotype frequencies at
the locus in the given population are in Hardy-Weinberg equilibrium. If $\chi^2_{cal} > \chi^2_{tab}$, $H_0$
is rejected.
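A minimal sketch of this test for one biallelic locus, assuming codominant scoring so that all three genotype classes can be counted. The allele frequency is estimated from the same sample, leaving 3 - 1 - 1 = 1 degree of freedom; for 1 df the upper-tail probability can be written with the standard-library `math.erfc`. The genotype counts are illustrative, and 3.841 is the tabulated $\chi^2$ value for 1 df at the 5% level.

```python
import math

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square goodness-of-fit test for HWE at a biallelic locus.

    Takes observed genotype COUNTS and returns (chi2, p_value) with 1 df.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A estimated from the sample
    if not 0 < p < 1:
        raise ValueError("monomorphic locus: HWE test is undefined")
    q = 1 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # df = 3 genotype classes - 1 - 1 estimated allele frequency = 1;
    # for 1 df the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = hwe_chi_square(50, 20, 30)
print(f"chi2 = {chi2:.2f}, P = {p_value:.2e}")
print("H0 rejected" if chi2 > 3.841 else "H0 accepted")  # prints: H0 rejected
```

Here p = 0.6, so 36, 48 and 16 individuals are expected; the strong shortage of heterozygotes (20 observed against 48 expected) drives the rejection.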
Important parameters in genetic diversity analysis
(A) Polymorphism or rate of polymorphism: A polymorphic gene is usually defined as one for
which the most common allele has a frequency of less than 0.95:
$$P = \frac{\text{number of loci with } q_j < 0.95}{\text{total number of loci}}$$
Where P = rate of polymorphism and $q_j$ = frequency of the most common allele at locus j
For a correct estimation of genetic distance, the loci used in genetic distance analysis
should be informative, i.e., they should display sufficient polymorphism. The limit of allele
frequency, set at 0.95, is arbitrary; its objective is to help identify those genes in which
allelic variation is common. Rare alleles are defined as those with frequencies of less than 0.005.
This index is best applied with codominant markers. It can also be used with dominant
markers, but with caution, as estimates based on dominant markers are biased downwards relative to the
real value.
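The polymorphism criterion can be applied to a set of loci as in the short sketch below, which assumes allele frequencies have already been estimated for each locus. The threshold follows the text (most common allele below 0.95); the four example loci are hypothetical.

```python
def proportion_polymorphic(loci_freqs, threshold=0.95):
    """Rate of polymorphism: share of loci whose most common allele is below threshold.

    loci_freqs: list with one list of allele frequencies per locus.
    """
    polymorphic = [max(freqs) < threshold for freqs in loci_freqs]
    return sum(polymorphic) / len(polymorphic)

loci = [[0.5, 0.5], [0.97, 0.03], [0.6, 0.3, 0.1], [1.0]]
print(proportion_polymorphic(loci))  # 2 of 4 loci qualify -> 0.5
```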
(B) Average number of alleles per locus: It is the sum of all the detected alleles in all loci,
divided by the total number of loci. This parameter, which provides information complementary
to that of polymorphism, is given by:
$$N = \frac{1}{k}\sum_{i=1}^{k} n_i$$
Where: k = number of loci and $n_i$ = number of alleles detected at locus i
This parameter is best applied in the case of codominant markers as dominant markers do not permit
the detection of all alleles.
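For codominant data such as SSR allele sizes, N can be computed directly from the scored genotypes, as in this sketch. The input layout (a dictionary of loci, each holding a list of diploid genotypes) and the locus names and allele sizes are hypothetical.

```python
def mean_alleles_per_locus(genotypes_by_locus):
    """Average number of distinct alleles per locus, N = (1/k) * sum(n_i).

    genotypes_by_locus: {locus: [(allele1, allele2), ...]} for codominant data.
    """
    counts = [len({a for g in genos for a in g}) for genos in genotypes_by_locus.values()]
    return sum(counts) / len(counts)

data = {
    "SSR1": [(150, 152), (150, 150), (154, 152)],
    "SSR2": [(210, 210), (210, 214)],
}
print(mean_alleles_per_locus(data))  # (3 + 2) / 2 = 2.5
```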
(C) Effective number of alleles: This measure estimates the number of equally frequent alleles
that would give the observed gene diversity at a locus in each population, and is given by:
$$A_e = \frac{1}{1-h} = \frac{1}{\sum_i p_i^2}$$
Where $p_i$ = frequency of the i-th allele at a locus and $h = 1 - \sum_i p_i^2$ = heterozygosity at the locus.
h ranges from 0 to 1, so $A_e$ ranges from 1 up to the number of alleles at the locus. It can be calculated
for both dominant and codominant markers. Because it takes allele frequencies into account, this
descriptor of allelic richness is less sensitive to rare alleles. This parameter plays a fundamental role in
the verification of sampling strategies. However, its calculation is affected by the sample size.
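The two expressions for $A_e$ can be computed together from one locus's allele frequencies, as sketched below. The frequencies are illustrative; with four equally frequent alleles, $A_e$ equals the actual number of alleles.

```python
def effective_number_of_alleles(freqs):
    """Return (A_e, h) for one locus from its allele frequencies p_i."""
    homozygosity = sum(p * p for p in freqs)  # sum of p_i^2
    h = 1 - homozygosity  # expected heterozygosity (gene diversity)
    return 1 / homozygosity, h

ae, h = effective_number_of_alleles([0.7, 0.2, 0.1])
print(round(ae, 3), round(h, 3))  # 1.852 0.46
```

Three alleles at uneven frequencies behave like fewer than two equally frequent alleles, which is why $A_e$ is less sensitive to rare alleles than a simple allele count.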
(D) Observed Heterozygosity: A population's heterozygosity is measured by determining, for each
locus, the number of individuals that are heterozygous and hence the proportion of heterozygous
individuals. For a single gene locus with two alleles, the Observed Heterozygosity ($H_o$) is calculated
as follows:
$$H_o = \frac{\text{Number of heterozygotes at a locus}}{\text{Total number of individuals surveyed}}$$
Extensions of the above formula are used to calculate $H_o$ when there are more than two
alleles at a locus, which is particularly common when microsatellite or simple sequence
repeat (SSR) markers are used to analyze populations.
(E) Expected Heterozygosity: The Expected Heterozygosity ($H_e$) is defined as the estimated
fraction of all individuals that would be heterozygous at any randomly chosen locus. It is the
probability that, at a single locus, two alleles chosen at random from the population are different
from each other. For a locus j with I alleles, it is calculated as:
$$h_j = 1 - \sum_{i=1}^{I} p_i^2$$
Where $h_j$ = heterozygosity at locus j and $p_i$ = frequency of the i-th allele. Averaging $h_j$ over loci
gives $H_e$.
$H_e$ differs from $H_o$ because it is a prediction based on the allele frequencies observed in a sample of
individuals. Deviation of the observed from the expected heterozygosity can be used as an indicator of
important population dynamics.
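Both quantities can be obtained from the same codominant genotype data at one locus, as in the sketch below. It uses the simple $1 - \sum p_i^2$ estimator given in the text, without Nei's small-sample correction; the five genotypes (SSR allele sizes) are hypothetical.

```python
from collections import Counter

def heterozygosities(genotypes):
    """Observed and expected heterozygosity at one locus.

    genotypes: list of (allele1, allele2) tuples, one per individual.
    """
    n = len(genotypes)
    h_obs = sum(1 for a, b in genotypes if a != b) / n  # share of heterozygous individuals
    allele_counts = Counter(a for g in genotypes for a in g)
    total = 2 * n  # number of gene copies sampled
    h_exp = 1 - sum((c / total) ** 2 for c in allele_counts.values())
    return h_obs, h_exp

locus = [(150, 152), (150, 150), (152, 152), (150, 154), (154, 154)]
ho, he = heterozygosities(locus)
print(f"Ho = {ho:.2f}, He = {he:.2f}")  # Ho = 0.40, He = 0.66
```

Here $H_o$ falls well below $H_e$, the kind of deviation that the inbreeding coefficient $F_{IS} = 1 - H_o/H_e$ summarizes.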
(F) Effective Population Size: One of the many variables of population dynamics that can
influence the rate and size of fluctuation in allele frequencies is population size. Genetic drift, the
random increase or decrease of an allele's frequency, affects small populations more severely than
large ones, since alleles are drawn from a smaller parental gene pool. The rate of change in allele
frequencies in a population is determined by the population's effective population size. The effective
population size is the number of individuals that evenly contribute to the gene pool.
The actual number of individuals in a population is rarely equal to the effective population size. This
is because some individuals reproduce at a higher rate than others (have a higher fitness), the
distribution of males and females may result in some individuals being unable to secure a mate, or
inbreeding reduces the unique contribution of an individual. The effective population size is a
theoretical measure that compares a population's genetic behavior to the behavior of an "ideal"
population. As the effective population size becomes smaller, the chance that allele frequencies will
shift due to chance (drift) alone becomes greater.
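One of the causes listed above, an uneven number of breeding males and females, has a standard textbook approximation that is not given in the text: Wright's $N_e = 4N_mN_f/(N_m + N_f)$, which assumes the population is otherwise ideal. The sketch below applies it to hypothetical numbers.

```python
def ne_sex_ratio(n_males, n_females):
    """Effective size of an otherwise ideal population with an unequal sex ratio."""
    return 4 * n_males * n_females / (n_males + n_females)

print(ne_sex_ratio(50, 50))  # 100.0: equal sexes, Ne equals the census size
print(ne_sex_ratio(10, 90))  # 36.0: same census size of 100, far smaller Ne
```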
(G) Shannon index: Shannon's Information Index is estimated as a measure of gene diversity. It is
based on information theory and is a measure of the average degree of "uncertainty" in predicting to
which species an individual chosen at random from a collection of S species and N individuals will
belong. This average uncertainty increases as the number of species increases and as the distribution
of individuals among the species becomes more even. The proportion of individuals belonging to
species i ($p_i$) is calculated and multiplied by its natural logarithm ($\ln p_i$); these products are summed
over all species and the sign is changed to obtain Shannon's Index (H):
$$H = -\sum_{i=1}^{S} p_i \ln p_i$$
It can be shown that, for any given number of species, there is a maximum possible H, $H_{max} = \ln S$,
which occurs when all species are present in equal numbers. When H approaches $H_{max}$ (i.e. the ratio
$H/H_{max}$ is near 1), it can be concluded that the population is highly diverse.
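A direct sketch of the formula, assuming the input is a list of counts per class (species here; allele or band classes work the same way). The two count vectors are illustrative: one perfectly even, one dominated by a single class.

```python
import math

def shannon_index(counts):
    """Shannon's H = -sum(p_i * ln p_i) from counts per class."""
    total = sum(counts)
    # Classes with zero count are skipped, since p * ln p -> 0 as p -> 0.
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)

for counts in ([25, 25, 25, 25], [70, 10, 10, 10]):
    h = shannon_index(counts)
    h_max = math.log(len(counts))
    print(f"{counts}: H = {h:.3f}, H_max = {h_max:.3f}, H/H_max = {h / h_max:.2f}")
```

Both collections have four classes, so they share the same $H_{max} = \ln 4$; only the even one reaches it.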
(H) Inbreeding and Relatedness: Small effective population size can result in a high occurrence
of inbreeding, or mating between close relatives. One of the effects of inbreeding is a decrease in the
heterozygosity (increase in homozygosity) of the population as a whole, which means a decrease in
the number of heterozygous genes in the individuals. This effect places individuals and the population
at a greater risk from homozygous recessive diseases that result from inheriting a copy of the same
recessive allele from both parents. The impact of accumulating deleterious homozygous traits is
called inbreeding depression - the loss in population vigor due to loss in genetic variability.
Wright (1951) developed a set of parameters called F-statistics. The inbreeding coefficient
($F_{IS}$) is defined as the probability that the two alleles at a locus in the same individual are
identical by descent. $F_{IS}$ is calculated by comparing the expected heterozygosity ($H_e$) with the observed
heterozygosity ($H_o$), and ranges from -1 (excess of heterozygotes) to +1 (complete homozygosity). If the
observed and expected heterozygosities are the same, $F_{IS}$ will be zero (no inbreeding). A positive value
indicates an increased number of homozygotes, and the population may be inbred; the larger the
value, the greater the extent of inbreeding. A negative value indicates that there are more
heterozygous individuals than would be expected; this might happen for the first few generations after
two previously isolated populations become one.
The relationships among the F-statistics can be deduced through the following:
$$(1 - F_{IT}) = (1 - F_{IS})(1 - F_{ST})$$
$$F_{IT} = 1 - \frac{H_I}{H_T}$$
$$F_{IS} = 1 - \frac{H_I}{H_S}$$
$$F_{ST} = 1 - \frac{H_S}{H_T}$$
Where $H_T$ = total gene diversity or expected heterozygosity in the total population as estimated from
the pooled allele frequencies, $H_I$ = intrapopulation gene diversity or average observed heterozygosity
in a group of populations, and $H_S$ = average expected heterozygosity estimated from each
subpopulation.
These statistical indices measure:
$F_{IS}$ = the deficiency or excess of average heterozygotes in each population
$F_{ST}$ = the degree of gene differentiation among populations in terms of allele frequencies
$F_{IT}$ = the deficiency or excess of average heterozygotes in a group of populations
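The three F-statistics follow directly from $H_I$, $H_S$ and $H_T$ as defined above. The sketch below computes them and confirms the multiplicative identity; the three heterozygosity values are hypothetical and would in practice come from pooled and per-subpopulation estimates.

```python
def f_statistics(h_i, h_s, h_t):
    """Wright's F-statistics from the heterozygosities defined in the text.

    h_i: average observed heterozygosity within subpopulations (H_I)
    h_s: average expected heterozygosity within subpopulations (H_S)
    h_t: expected heterozygosity of the pooled population (H_T)
    """
    f_is = 1 - h_i / h_s
    f_st = 1 - h_s / h_t
    f_it = 1 - h_i / h_t
    return f_is, f_st, f_it

f_is, f_st, f_it = f_statistics(h_i=0.30, h_s=0.40, h_t=0.50)
print(round(f_is, 3), round(f_st, 3), round(f_it, 3))  # 0.25 0.2 0.4
# The three indices obey (1 - F_IT) = (1 - F_IS)(1 - F_ST).
assert abs((1 - f_it) - (1 - f_is) * (1 - f_st)) < 1e-12
```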
The chi-square test can be used to analyze statistically whether the difference between the
observed and expected heterozygosity is larger than would be expected by chance. If the observed
number of heterozygotes is significantly greater than expected, inbreeding can be ruled out as a
possible population dynamic influencing the genotype frequencies.
Corrections for Sampling Error:
There are two sources of allele frequency difference among subpopulations in a sample:
(i) Real differences in the allele frequencies among the sampled subpopulations.
(ii) Differences that arise because allele frequencies in the samples differ from those in the
subpopulations from which they were taken.
Nei and Chesser (1983) described the $G_{ST}$ approach to account for sampling error. $G_{ST}$ is
an interpopulation differentiation measure used when multiple loci are analyzed. It measures the
proportion of gene diversity that is found among populations when a large number of loci are
sampled.
$$G_{ST} = \frac{D_{ST}}{H_T}$$
where $D_{ST}$ = interpopulation diversity,
$H_T$ = total diversity ($H_S + D_{ST}$),
$H_S$ = intrapopulation genic diversity, and
$D_{ST} = H_T - H_S$.
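The decomposition $H_T = H_S + D_{ST}$ can be sketched over several loci as below. This is the basic, uncorrected Nei $G_{ST}$: it weights subpopulations equally, sums $H_S$ and $H_T$ over loci before taking the ratio (one common convention, assumed here), and does not apply the Nei and Chesser (1983) sample-size corrections. The allele frequencies are hypothetical.

```python
def gene_diversity(freqs):
    """h = 1 - sum(p_i^2) for one locus in one (sub)population."""
    return 1 - sum(p * p for p in freqs)

def nei_gst(loci):
    """Multilocus G_ST = D_ST / H_T, uncorrected for sample size.

    loci: one entry per locus; each entry is a list of subpopulation
    allele-frequency lists, all using the same allele order.
    """
    h_s_total = h_t_total = 0.0
    for subpops in loci:
        n_pops = len(subpops)
        h_s_total += sum(gene_diversity(f) for f in subpops) / n_pops
        pooled = [sum(f[i] for f in subpops) / n_pops for i in range(len(subpops[0]))]
        h_t_total += gene_diversity(pooled)
    d_st = h_t_total - h_s_total  # interpopulation diversity
    return d_st / h_t_total

loci = [
    [[0.9, 0.1], [0.1, 0.9]],  # locus 1: strongly differentiated
    [[0.5, 0.5], [0.5, 0.5]],  # locus 2: identical in both subpopulations
]
print(round(nei_gst(loci), 3))  # 0.32
```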
Because of the complexity of its components, calculation of $G_{ST}$ requires specialized
computer software. It can be used with codominant markers and, with restrictions, with dominant markers,
since it is a measure of heterozygosity. Weir and Cockerham (1984) described another statistic, θ,
which incorporates an important source of sampling error ignored by $G_{ST}$.
Measurement of genetic distance
Various genetic distance measures have been proposed for analysis of molecular marker data,
depending on whether the markers are dominant or codominant. For dominant markers, the total
number of bands is conventionally taken as the number of analyzed loci. For codominant markers,
the number of alleles per locus determined for the whole collection is in general higher than two, as
opposed to the two states (1 = band present, 0 = band absent) of dominant markers. Generally,
genetic distances for codominant markers are based on allele frequencies.
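For two individuals scored for the same set of bands, let a = number of bands present in both,
b = number present only in the first, c = number present only in the second and d = number absent
from both. If we assume that a = 3, b = 1, c = 3 and d = 2, then:
(i) Dice, and Nei and Li: a/[a + (b + c)/2] = 0.6
(ii) Jaccard: a/(a + b + c) = 0.429
(iii) Sokal and Sneath: a/[a + 2(b + c)] = 0.273
(iv) Rogers and Tanimoto: (a + d)/[a + d + 2(b + c)] = 0.385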
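The four coefficients can be computed from two presence/absence band profiles, taking a as bands present in both individuals, b and c as bands present in only one of them, and d as bands absent from both (the usual convention). The two hypothetical profiles below were chosen so that a = 3, b = 1, c = 3 and d = 2, reproducing the worked values.

```python
def band_counts(x, y):
    """Return (a, b, c, d) for two 0/1 band profiles of equal length."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)  # shared presence
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)  # only in x
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)  # only in y
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)  # shared absence
    return a, b, c, d

def similarities(x, y):
    a, b, c, d = band_counts(x, y)
    return {
        "Dice / Nei and Li": a / (a + (b + c) / 2),
        "Jaccard": a / (a + b + c),
        "Sokal and Sneath": a / (a + 2 * (b + c)),
        "Rogers and Tanimoto": (a + d) / (a + d + 2 * (b + c)),
    }

x = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y = [1, 1, 1, 0, 1, 1, 1, 0, 0]
for name, value in similarities(x, y).items():
    print(f"{name}: {value:.3f}")
# Dice / Nei and Li: 0.600, Jaccard: 0.429, Sokal and Sneath: 0.273, Rogers and Tanimoto: 0.385
```

Only Rogers and Tanimoto uses d, which is why it is the one coefficient here that counts shared band absence as similarity.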
The Jaccard coefficient counts only bands present in at least one of the two individuals and treats double
absences as missing data. If false-positive or false-negative data occur, the estimate tends to be
biased. It can be applied to codominant marker data. The Nei and Li coefficient is based on the proportion
of bands shared by two individuals and gives more weight to bands present in both.
It considers that band absence has less biological significance, and so this coefficient has a clear
interpretation in terms of DNA similarity. It can be applied to codominant marker data (RFLP, SSR).
Multivariate analysis
One of the main concerns of plant breeders is to quantify the degree of dissimilarity in
genetic resources, since knowledge concerning genetic distances is necessary for optimum
organization of gene banks and for identifying parental combinations that produce progenies with
maximum genetic variability, thereby increasing the chances of obtaining superior individuals
(Mohammadi and Prasanna, 2003). Use of multivariate statistical algorithms is considered an
important strategy to quantify genetic similarity. Multivariate analysis is based on the principles of
multivariate statistics, which involve the observation and analysis of more than one statistical
variable at a time. Multivariate techniques permit standardization of multiple types of information
from a set of characteristics. The most widely used algorithms are principal component and
canonical variable analysis, as well as clustering methods.
The principle of clustering methods is to join genotypes into groups so that there is
uniformity within groups and heterogeneity among them. These methods depend on prior estimates of
dissimilarity derived from continuous and discrete (categorical) variables; categorical variables can be
binary, nominal or ordinal. Among grouping methods, hierarchical clustering has been used most
frequently, particularly the single linkage (SL) and unweighted pair group method using arithmetic
averages (UPGMA) methods. The reliability of clustering methods depends on the magnitude of the
cophenetic correlation, which is the association between the genetic distance matrix and the matrix
based on genotype grouping. SL considers that band absence corresponds to homozygous loci, so it can
be used with dominant markers (RAPD, AFLP), where band absence could correspond to homozygous
recessives. UPGMA is the most commonly used method for cluster analysis, but it should only be used
when the evolutionary rate is nearly the same for all groups included in the study. When studying the
genetic diversity of a germplasm collection, the SL method should be preferred over UPGMA, because
genetic differences among accessions in germplasm are predominantly determined by selection and
breeding rather than by evolutionary forces.
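To show how UPGMA joins accessions, here is a compact pure-Python sketch that repeatedly merges the closest pair of clusters and assigns the new cluster the size-weighted average of the old distances. It assumes a symmetric distance matrix (for example 1 - Jaccard similarity), breaks ties arbitrarily, and returns a Newick-style topology without branch lengths. The four accessions and their distances are hypothetical.

```python
def upgma(labels, matrix):
    """Cluster accessions with UPGMA and return a Newick topology string.

    labels: list of n accession names.
    matrix: n x n symmetric list of lists of pairwise distances.
    """
    clusters = {i: (labels[i], 1) for i in range(len(labels))}  # id -> (newick, size)
    n = len(labels)
    d = {(i, j): matrix[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of active clusters (i < j)
        label_i, size_i = clusters.pop(i)
        label_j, size_j = clusters.pop(j)
        new = next_id
        next_id += 1
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            # Average linkage: mean distance weighted by cluster sizes.
            d[(k, new)] = (size_i * dik + size_j * djk) / (size_i + size_j)
        del d[(i, j)]
        clusters[new] = (f"({label_i},{label_j})", size_i + size_j)
    return next(iter(clusters.values()))[0] + ";"

labels = ["A", "B", "C", "D"]
matrix = [
    [0.0, 0.2, 0.6, 0.6],
    [0.2, 0.0, 0.6, 0.6],
    [0.6, 0.6, 0.0, 0.3],
    [0.6, 0.6, 0.3, 0.0],
]
print(upgma(labels, matrix))  # ((A,B),(C,D));
```

Replacing the weighted average with `min(dik, djk)` turns the same loop into single linkage (SL), the method the text recommends for germplasm collections.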
Resampling is a statistical term covering procedures such as bootstrapping and permutation; these
procedures can be used in genetic diversity studies to assign confidence to the presence of clusters in a
dendrogram. Bootstrapping is a statistical method for estimating the sampling distribution of an
estimator by sampling with replacement from the original sample; its major purpose is to derive
robust estimates of standard errors and confidence intervals of population parameters. A
permutation test is a type of statistical significance test in which a reference distribution is obtained by
calculating all possible values of the test statistic under rearrangements of the labels on the observed
data points.
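Bootstrapping over marker loci can be sketched for a single pairwise similarity as below. Loci (band positions) are resampled with replacement, the Jaccard coefficient is recomputed for each replicate, and the spread of replicate values gives a standard error and a percentile interval. The replicate count, seed and band profiles are illustrative, and returning 0.0 when a replicate contains no bands is an assumed convention.

```python
import random
import statistics

def jaccard(x, y):
    """Jaccard similarity of two 0/1 band profiles (double absences ignored)."""
    shared = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    scored = sum(1 for i, j in zip(x, y) if i == 1 or j == 1)
    return shared / scored if scored else 0.0  # assumed convention: no bands -> 0.0

def bootstrap_similarity(x, y, replicates=1000, seed=42):
    """Bootstrap the Jaccard similarity by resampling loci with replacement."""
    rng = random.Random(seed)
    n = len(x)
    values = []
    for _ in range(replicates):
        idx = [rng.randrange(n) for _ in range(n)]  # one bootstrap sample of loci
        values.append(jaccard([x[i] for i in idx], [y[i] for i in idx]))
    values.sort()
    se = statistics.stdev(values)  # bootstrap standard error
    ci = (values[int(0.025 * replicates)], values[int(0.975 * replicates) - 1])
    return statistics.mean(values), se, ci

x = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y = [1, 1, 1, 0, 1, 1, 1, 0, 0]
print("observed:", round(jaccard(x, y), 3))
print("bootstrap mean, SE, 95% interval:", bootstrap_similarity(x, y))
```

For dendrogram support, the same resampling would be applied to the whole accession-by-locus matrix, the tree rebuilt for each replicate, and the percentage of replicates containing each cluster reported.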
Steps involved in analysis of molecular marker data
Three main steps are involved in the statistical analysis of molecular data in diversity studies:
A. Data collection: The data on molecular markers is recorded in the following two forms:
(a) Binary data: presence or absence of molecular marker bands
(b) Allelic data (based on allele size)
B. Data analysis using univariate and multivariate statistical approaches
C. Interpretation of the data.
Each step in the process should follow a standardized format if the output of one diversity
study is to be compared to other studies and inferences drawn in this manner.
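The two data forms are related: codominant allele-size data can be recoded into a binary presence/absence matrix, with one column per allele per locus, for programs or coefficients that expect binary input. This is a sketch under assumed inputs (diploid genotypes, no missing data); the locus and accession names and allele sizes are hypothetical. Note that recoding discards the distinction between a homozygote and a heterozygote carrying the same allele.

```python
def alleles_to_binary(genotypes_by_locus, accessions):
    """Recode codominant allele sizes as a 0/1 (absent/present) matrix.

    genotypes_by_locus: {locus: {accession: (allele1, allele2)}}
    Returns (columns, matrix): columns are (locus, allele) pairs and
    matrix maps each accession to its row of 0/1 scores.
    """
    columns = []
    for locus, genos in genotypes_by_locus.items():
        sizes = sorted({a for g in genos.values() for a in g})
        columns.extend((locus, size) for size in sizes)
    matrix = {
        acc: [1 if size in genotypes_by_locus[locus][acc] else 0 for locus, size in columns]
        for acc in accessions
    }
    return columns, matrix

data = {
    "SSR1": {"Acc1": (150, 152), "Acc2": (150, 150), "Acc3": (154, 154)},
    "SSR2": {"Acc1": (210, 214), "Acc2": (214, 214), "Acc3": (210, 210)},
}
columns, matrix = alleles_to_binary(data, ["Acc1", "Acc2", "Acc3"])
print(columns)
for acc, row in matrix.items():
    print(acc, row)
```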
Software programs for analyzing genetic diversity
Many software programs for molecular population genetics studies have been developed; the
important ones are given below:
(i) CONVERT (http://www.agriculture.purdue.edu/fnr/html/faculty/rhodes/students%20and%20staff/glaubitz/software.htm)
CONVERT is a user-friendly, 32-bit Windows program that aids conversion of diploid
genotypic data files into formats that can be directly read by a number of commonly used population
genetic computer programs: GDA, GENEPOP, ARLEQUIN, POPGENE, MICROSAT, PHYLIP and
STRUCTURE (Glaubitz, 2004). In addition, CONVERT can be used to produce a table of allele
frequencies in a convenient format, allowing the visual comparison of allele frequencies across
populations. The input file for CONVERT follows a 'standard' format that can be easily obtained via an
EXCEL file containing the genotypic data. CONVERT can also read in input data files in GENEPOP
format. CONVERT works on Windows 95/98/NT/2000/XP platforms.
(ii) ARLEQUIN (http://cmpg.unibe.ch/software/arlequin/)
Released first in 1997, Arlequin is a freely available integrated population genetics software
environment (Schneider et al., 1997). It is able to handle both large samples of molecular data
(RFLPs, DNA sequences, microsatellites) and also conventional genetic data (standard multi-locus
data or allele frequency data). Molecular data can be entered as DNA sequences, RFLP haplotypes,
microsatellite profiles, or multilocus haplotypes. The graphical interface is designed to allow users to
rapidly select the different analyses they want to perform on their data.
The data format is specified in an input file. The user can create a data file from scratch,
using a text editor and appropriate keywords, or use the Project Outline Wizard. Data can be
imported from files created for other programs, including MEGA, BIOSYS, GENEPOP, and
PHYLIP. Missing or ambiguous data can be included. A very detailed user manual is available, which
includes a large amount of theoretical information, formulae, and references. Large data sets
can be analysed, and a Batch Files option is also available.
(iii) POWERMARKER (http://statgen.ncsu.edu/powermarker/)
PowerMarker was designed specifically for the use of SSR/SNP data in population genetics
analyses (Liu, 2003). Data can be imported from Excel or other formats, making data set-up very
easy. Data can also be exported to NEXUS and Arlequin formats. It includes a 2D viewer for
linkage disequilibrium visualization. The user can edit graphics within PowerMarker or export them
for publication. The program has been tested extensively for accuracy and efficiency. Full
documentation is included. Several new modules for association study are included in the package.
Several demonstration datasets are available to get started. The program is free, but requires having
PHYLIP, TreeView and the Microsoft .NET Framework (all freely available) and Excel 2000
(not free). Another disadvantage is that it is available only for Windows 98 and above (not for
Macintosh or other systems).
(iv) PAUP (http://paup.csit.fsu.edu/)
PAUP is widely used for inferring and interpreting evolutionary trees (Swofford, 2002). It
originally meant Phylogenetic Analysis Using Parsimony, but now has many other options. Although
not free, it is relatively inexpensive and available from Sinauer Associates, Sunderland, MA. A new
version, 4.0 beta, has been released as a provisional version. Macintosh, PowerMac, Windows and
Unix/OpenVMS versions are available; the Mac version has some extra features. The Windows
version runs as a GUI application, however, unlike the Macintosh version, most options are
command-line-driven. The advantage to running PAUP under Windows is that a scrollback display
buffer is built into the program, an editor is provided, and commands are remembered between
sessions (they can be recalled, edited, etc.). It is closely compatible with MacClade (another program
available from Sinauer), since they use a common data format (NEXUS).
(v) MEGA (http://www.megasoftware.net/)
MEGA (Molecular Evolutionary Genetics Analysis) software has been widely used since its
creation in 1993. It uses DNA sequence, protein sequence, evolutionary distance or phylogenetic tree
data. It is an integrated tool for conducting automatic and manual sequence alignment, inferring
phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, and testing
evolutionary hypotheses (Kumar et al., 2008). Although it was designed for the Windows platform, it
runs well on Macintosh with a Windows emulator, Sun workstation (with SoftWindows95) or Linux
(with Windows by VMWare). Online, a thorough manual is available, together with a bulletin board
to interact with other users.
(vi) GENEPOP (http://genepop.curtin.edu.au/)
Genepop is a population genetics software package with options for the following
analyses: Hardy-Weinberg equilibrium, linkage disequilibrium, population differentiation, effective
number of migrants, $F_{ST}$ or other correlations (Raymond and Rousset, 1995). Genepop can be used
either as a DOS-version or a Web-version. The web-version is easy to use: after choosing an option
for the analysis, the data is typed or pasted into the text window provided and the results are obtained
either by email or by viewing the output via the Web.
(vii) POPGENE (http://www.ualberta.ca/~fyeh/popgene_download.html)
POPGENE is a user-friendly window-based computer package for the analysis of genetic
variation among and within natural populations using co-dominant and dominant markers and
quantitative traits (Yeh and Boyle, 1997). This package provides the Windows graphical user
interface that makes population genetics analysis more accessible for the casual computer user and
more convenient for the experienced computer user. The current version is designed specifically for
the analysis of co-dominant and dominant markers using haploid and diploid data. It performs most
types of data analysis encountered in population genetics and related fields. It can be used to compute
summary statistics (e.g., allele frequency, gene diversity, genetic distance, F-statistics, multilocus
structure, etc.) for (a) single-locus, single populations; (b) single-locus, multiple populations; (c)
multilocus, single populations and (d) multilocus, multiple populations. The latest version also
includes the module for quantitative traits.
(viii) GDA (http://hydrodictyon.eeb.uconn.edu/people/plewis/software.php)
GDA (Genetic Data Analysis) is a programme written by Lewis and Zaykin (1999). It
computes linkage and Hardy-Weinberg disequilibrium, some genetic distances, and provides method-
of-moments estimators for hierarchical F-statistics.
(ix) GenAlEx (http://www.anu.edu.au/BoZo/GenAlEx/)
GenAlEx ('Genetic Analysis in Excel') is a user-friendly cross-platform package for
population genetic analysis that runs within Microsoft Excel (Peakall and Smouse, 2006). GenAlEx
enables population genetic data analysis of codominant, haploid and binary genetic data providing
analysis tools applicable to plants, animals and microorganisms. It has tools for importing, editing
and manipulating raw genotype and sequence data from automated sequencing or genotyping
software. New 2D spatial autocorrelation procedures have been incorporated in addition to the
existing wide range of spatial analysis options. Pairwise relatedness among individuals can be
estimated. There are tools for genetic tagging applications, including location of matching genotypes
and calculation of probabilities of identity. Data export options to a host of other population genetic
software packages are also available.
(x) TFPGA (http://www.marksgeneticsoftware.net/tfpga.htm)
TFPGA (Tools for Population Genetic Analyses) is a Windows program for the analysis of allozyme
and molecular population genetic data (Miller, 1997). This program calculates descriptive statistics,
genetic distances, and F-statistics. It also performs tests for Hardy-Weinberg equilibrium, exact tests
for genetic differentiation, Mantel tests, and UPGMA cluster analyses. Additional features include the
ability to analyze hierarchical data sets as well as data from either codominant markers such as
allozymes or dominant markers such as AFLPs or RAPDs.
(xi) STRUCTURE (http://pritch.bsd.uchicago.edu/structure.html)
STRUCTURE is a free software package for using multi-locus genotype data to
investigate population structure. Its uses include inferring the presence of distinct populations,
assigning individuals to populations, studying hybrid zones, identifying migrants and admixed
individuals, and estimating population allele frequencies in situations where many individuals are
migrants or admixed. It can be applied to most of the commonly-used genetic markers, including
SNPs, microsatellites, RFLPs and AFLPs. The basic algorithm was described by Pritchard et al.
(2000).
Useful internet resources
The following is a list of Internet resources containing links to useful information pertaining
to genetic diversity analysis, population genetics and other software available:
(i) An alphabetical list of genetic analysis software from the North Shore LIJ Research
Institute (http://linkage.rockefeller.edu/soft/list1.html) contains a list of 520 programmes.
Computer software on genetic linkage analysis for human pedigree data, QTL analysis for
animal/plant breeding data, genetic marker ordering, genetic association analysis, haplotype
construction, pedigree drawing, and population genetics are included here.
(ii) Phylogeny Programs (http://evolution.genetics.washington.edu/phylip/software.html)
contains links to 365 phylogeny packages and 51 free web servers. Updates to these pages are
made monthly. Many of the programs in these pages are available on the web, and some of
the older ones are also available from ftp server machines. The programs listed below include
both free and non-free ones. The packages are sorted in various ways (e.g. by methods,
system used, analyzing particular kind of data, most recent etc.).
(iii) Maize Genetics site (http://www.maizegenetics.net/bioinformatics) from Cornell's Institute for
Genomic Diversity contains freely available software to evaluate linkage disequilibrium,
nucleotide diversity and trait associations.
(iv) The European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI) site
(http://www.ebi.ac.uk/) contains links to many useful programs and other sites.
(v) Mathematical Genetics and Bioinformatics Site, University of Oxford
(http://mathgen.stats.ox.ac.uk/software.html)
(vi) Statistical Genetics and Bioinformatics Site, North Carolina State University
(http://statgen.ncsu.edu/brcwebsite/software_BRC.php) contains software for genetic data
analysis developed and made available by researchers at, or affiliated with, the Bioinformatics
Research Center.
Conclusion
The analysis of genetic diversity within a species is essential for understanding how the species
evolves at the population level. Many statistical packages and computer programmes are currently
available for analysing molecular data to assess genetic diversity. Most perform similar tasks, and
many are freely downloadable from the Internet. The programmes differ, however, in the types of
marker they can handle, in how the raw data must be formatted and in how users select the
computations to be performed. Many use a specific data-file format, but several can read data from,
or write data to, other file formats. Many also have user-friendly graphical interfaces that help users
select the analyses to be performed and set the computational parameters. Researchers are currently
developing newer programmes based on more specialized methodologies.
References
Glaubitz, J.C. (2004) convert: A user-friendly program to reformat diploid genotypic data for
commonly used population genetic software packages. Molecular Ecology Notes, 4: 309-310.
Kumar, S., J. Dudley, M. Nei and K. Tamura (2008) MEGA: A biologist-centric software for
evolutionary analysis of DNA and protein sequences. Briefings in Bioinformatics, 9: 299-306.
Liu, J. (2003) PowerMarker: New Genetic Data Analysis Software, Version 3.0. Free program
distributed by the author over the Internet.
Miller, M.P. (1997) Tools for population genetic analysis (TFPGA) 1.3: A Windows program for the
analysis of allozyme and molecular population genetic data. Distributed by the author.
Mohammadi, S.A. and B.M. Prasanna (2003) Analysis of genetic diversity in crop plants: Salient
statistical tools and considerations. Crop Science, 43: 1235-1248.
Nei, M. and R.K. Chesser (1983) Estimation of fixation indices and gene diversities. Annals of
Human Genetics, 47: 253-259.
Nei, M. and W. Li (1979) Mathematical model for studying genetic variation in terms of restriction
endonucleases. Proceedings of the National Academy of Sciences (USA), 76: 5269-5273.
Peakall, R. and P.E. Smouse (2006) GENALEX 6: Genetic analysis in Excel. Population genetic
software for teaching and research. Molecular Ecology Notes, 6: 288-295.
Pritchard, J.K., M. Stephens and P. Donnelly (2000) Inference of population structure using
multilocus genotype data. Genetics, 155: 945-959.
Raymond, M. and F. Rousset (1995) GENEPOP (version 1.2): Population genetics software for exact
tests and ecumenicism. Journal of Heredity, 86: 248-249.
Schneider, S., D. Roessli and L. Excoffier (2000) ARLEQUIN, version 2.00: Software for population
genetics data analysis. Genetics and Biometry Laboratory, University of Geneva,
Switzerland.
Swofford, D.L. (2002) PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods),
Version 4. Sinauer Associates, Sunderland, MA.
Weir, B.S. and C.C. Cockerham (1984) Estimating F-statistics for the analysis of population
structure. Evolution, 38: 1358-1370.
Wright, S. (1951) The genetical structure of populations. Annals of Eugenics, 15: 323-354.
Yeh, F.C. and T.J.B. Boyle (1997) Population genetic analysis of co-dominant and dominant markers
and quantitative traits. Belgian Journal of Botany, 129: 157.
RAPD and ISSR Analysis
Ritto Paul, Sayuj K.P and K. Nirmal Babu
Indian Institute of Spices Research, Marikunnu P.O., Calicut- 673012.
Principle
Many methods and technologies are available for isolating genomic DNA. In general, all methods
involve disruption and lysis of the starting material, followed by removal of proteins and other
contaminants and, finally, recovery of the DNA. Proteins are typically removed by digestion with
proteinase K, followed by salting-out, organic extraction, or binding of the DNA to a solid-phase
support (anion-exchange or silica technology). DNA is usually recovered by precipitation with
ethanol or isopropanol. The choice of method depends on several factors: the quantity and molecular
weight of DNA required, the purity needed for downstream applications, and the time and expense.
Many methods and variations on them exist, but they often lack standardization, so yields and quality
are not always reproducible. Reproducibility is also affected when a method is used by different
researchers or with different sample types. The separation of DNA from cellular components can be
divided into four stages:
1. Disruption
2. Lysis
3. Removal of proteins and contaminants
4. Recovery of DNA
Standardized Protocol for DNA isolation for Spices.
- Lyophilize 200-300 mg of fresh leaf material.
- Grind 20 mg of the lyophilized leaf material to a fine powder with quartz sand in a pestle and
mortar.
- Transfer the powder to 700 µl of pre-warmed extraction buffer plus 700 µl of 2X CTAB buffer
and incubate for 60 min at 60 °C with occasional stirring.
- Extract with an equal volume of phenol:chloroform:isoamyl alcohol (25:24:1).
- Centrifuge at 10,000 rpm for 15 min at room temperature (20 °C).
- Separate the aqueous phase and transfer it to a fresh tube.
- Add 2 µl of RNase A (10 mg/ml) to a final concentration of 50 µg/ml and incubate for 30 min
at 37 °C.
- Extract with an equal volume of chloroform:isoamyl alcohol (24:1) and centrifuge at 10,000 rpm
for 10 min.
- To the aqueous phase, add 0.6 volumes of ice-cold isopropanol and incubate at -20 °C for
30-60 min.
- Centrifuge at 10,000 rpm for 10 min at 4 °C. Wash the DNA pellet with 70% ethanol and
10 mM ammonium acetate.
- Dry the DNA pellet and dissolve it in 100 µl of water or low-concentration TE buffer.
Quantification of DNA
i. Agarose gel electrophoresis
- Attach tape to the ends of the gel tray. Position the well-forming comb and ensure that the gel
tray is horizontal.
- Prepare a 0.8% agarose gel by adding 0.8 g agarose to 100 ml of 1X TAE and gently boiling the
solution in a microwave oven, with occasional mixing, until all agarose particles are completely
dissolved. Allow it to cool to 60 °C and add ethidium bromide to a final concentration of
0.1-0.5 µg/ml. Pour the agarose onto the gel tray and allow the gel to set.
- Remove the comb and tape. Place the gel into the electrophoresis tank and pour 1x TAE until the
gel is fully immersed.
- Load the DNA samples mixed with 6X loading dye into the wells. Load a standard DNA size marker in one well.
- Carry out the electrophoresis at 5-6 V/cm gel until the dye is 4-5 cm from the wells.
- Visualize the DNA bands on a UV transilluminator or in a gel documentation system.
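The gel arithmetic in the steps above (grams of agarose for a given percentage and volume, and how much ethidium bromide stock gives 0.1-0.5 µg/ml) can be checked with a short calculation. This is a minimal Python sketch, assuming percentages are weight/volume and a 10 mg/ml ethidium bromide stock; the function names are illustrative, not part of any software.

```python
def agarose_grams(percent_w_v: float, volume_ml: float) -> float:
    """Grams of agarose for a gel of the given % (w/v) in the given buffer volume."""
    return percent_w_v * volume_ml / 100.0


def etbr_volume_ul(final_ug_per_ml: float, volume_ml: float,
                   stock_mg_per_ml: float = 10.0) -> float:
    """Microlitres of ethidium bromide stock for the desired final concentration.

    1 mg/ml equals 1 ug/ul, so total ug divided by the stock (mg/ml) gives ul.
    The 10 mg/ml default stock is an assumption.
    """
    total_ug = final_ug_per_ml * volume_ml
    return total_ug / stock_mg_per_ml


print(agarose_grams(0.8, 100))   # 0.8 (g)
print(etbr_volume_ul(0.1, 100))  # 1.0 (ul)
print(etbr_volume_ul(0.5, 100))  # 5.0 (ul)
```

With a 10 mg/ml stock, therefore, 1-5 µl of ethidium bromide per 100 ml of gel covers the stated range.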
ii. DNA quantification by UV spectroscopy
- Take 5 µl of the DNA sample in a quartz cuvette and make up the volume to 1 ml with distilled
water.
- Measure the absorbance of the solution at 260 and 280 nm.
- Calculate the ratio A280/A260.
- A good DNA preparation gives a ratio below 0.55 (equivalent to an A260/A280 ratio above about 1.8).
- Calculate the DNA concentration using the relationship for standard double-stranded DNA: 1 OD
unit at 260 nm = 50 µg/ml, multiplied by the dilution factor (200 for 5 µl made up to 1 ml). This
estimate is influenced by contaminating substances such as RNA and very low molecular weight
DNA in the solution.
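The two calculations above can be combined as follows. This is a minimal sketch assuming the 5 µl in 1 ml dilution used in this protocol; the absorbance readings are hypothetical examples, not measured data.

```python
def dna_concentration(a260: float, sample_ul: float = 5.0, final_ul: float = 1000.0,
                      ug_per_ml_per_od: float = 50.0) -> float:
    """DNA concentration (ug/ml) in the undiluted sample.

    1 OD unit at 260 nm corresponds to 50 ug/ml double-stranded DNA; the reading
    is multiplied by the dilution factor (final volume / sample volume).
    """
    dilution_factor = final_ul / sample_ul
    return a260 * ug_per_ml_per_od * dilution_factor


def looks_pure(a260: float, a280: float, max_ratio: float = 0.55) -> bool:
    """Apply the A280/A260 < 0.55 purity criterion used in this protocol."""
    return a280 / a260 < max_ratio


# Hypothetical readings, not measured data
a260, a280 = 0.1, 0.054
print(round(dna_concentration(a260), 1))  # 1000.0 ug/ml, i.e. 1 ug/ul
print(round(a260 / a280, 2))              # 1.85 (A260/A280)
print(looks_pure(a260, a280))             # True
```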
Randomly Amplified polymorphic DNA (RAPD) analysis
Randomly amplified polymorphic DNAs (RAPDs) are well suited to the high-throughput
systems required for plant genetic analysis because of their simplicity, speed and low cost, the small
quantities of genomic DNA required, and the relative abundance of the markers in the genome. In this
PCR-based technique, a single primer ten nucleotides long anneals by chance to homologous
sequences in the genome and amplifies several regions, provided that two priming sites lie on
opposite strands in the correct orientation and within a distance that Taq DNA polymerase can
amplify. RAPDs are dominant markers: they cannot distinguish homozygotes from
heterozygotes.
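The orientation-and-distance requirement described above can be illustrated by predicting products from an in-silico template. This is a minimal sketch, assuming exact primer matches only (real RAPD annealing tolerates mismatches) and an arbitrary 3000 bp upper limit for amplification; the primer and template are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, motif: str) -> list:
    """0-based start positions of exact (possibly overlapping) matches."""
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]


def predict_rapd_products(template: str, primer: str, max_len: int = 3000) -> list:
    """Sizes (bp) of products expected from a single arbitrary primer.

    A product needs one site where the primer matches the top strand (it binds the
    bottom strand and extends rightwards) and, downstream, one site matching its
    reverse complement (it binds the top strand and extends leftwards).
    """
    k = len(primer)
    forward = find_sites(template, primer)
    reverse = find_sites(template, reverse_complement(primer))
    sizes = []
    for i in forward:
        for j in reverse:
            size = j + k - i
            if j >= i and size <= max_len:  # correct orientation and amplifiable distance
                sizes.append(size)
    return sorted(sizes)


primer = "GTGACGTAGG"  # hypothetical 10-mer
template = "A" * 20 + primer + "C" * 100 + reverse_complement(primer) + "T" * 20
print(predict_rapd_products(template, primer))               # [120]
print(predict_rapd_products(template, primer, max_len=100))  # []
```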
The primer used in the RAPD reaction has an arbitrarily chosen base sequence, so the
investigator has no prior knowledge of which gene or repeated sequence in the plant genome, if any,
the primer may be homologous to. Any band resolved after the RAPD reaction on an ethidium
bromide-stained agarose gel or a silver-stained polyacrylamide gel can be used as raw data for
comparing plant genomes.
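Band data of this kind are usually scored as present (1) or absent (0) for each genotype, and pairs of genotypes are then compared with similarity coefficients such as Jaccard's, Sorensen-Dice and simple matching; the resulting similarity matrix is the usual input for cluster analysis such as UPGMA. This is a minimal sketch of the three coefficients; the band scores are hypothetical, and treating two all-zero profiles as identical is an assumption.

```python
from itertools import combinations


def match_counts(x, y):
    """a = bands in both, b = only in x, c = only in y, d = absent in both."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    return a, b, c, d


def jaccard(x, y):
    a, b, c, _ = match_counts(x, y)
    return a / (a + b + c) if a + b + c else 1.0  # two empty profiles treated as identical


def dice(x, y):
    a, b, c, _ = match_counts(x, y)
    return 2 * a / (2 * a + b + c) if a + b + c else 1.0


def simple_matching(x, y):
    a, b, c, d = match_counts(x, y)
    return (a + d) / (a + b + c + d)  # shared absences also count as matches


# Hypothetical band scores (1 = present, 0 = absent) at five loci
bands = {
    "G1": [1, 1, 0, 1, 0],
    "G2": [1, 0, 0, 1, 1],
    "G3": [0, 1, 1, 1, 0],
}
for g1, g2 in combinations(bands, 2):
    x, y = bands[g1], bands[g2]
    print(g1, g2, round(jaccard(x, y), 3), round(dice(x, y), 3),
          round(simple_matching(x, y), 3))
```

Jaccard's and Dice coefficients ignore shared absences, whereas simple matching counts them; for dominant markers, where a shared absent band is weak evidence of similarity, the first two are often preferred.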
Inter Simple Sequence Repeat (ISSR) analysis
The inter-simple sequence repeat (ISSR, anchored microsatellite) marker uses simple
sequence repeats anchored at the 5' or 3' end by a short arbitrary sequence as PCR primers
(Zietkiewicz et al., 1994). This generates multilocus markers. It is a simple and quick method that
combines most of the advantages of microsatellites (SSRs) and amplified fragment length
polymorphism (AFLP) with the universality of random amplified polymorphic DNA (RAPD).
ISSR markers are highly polymorphic and are useful in studies of genetic diversity,
phylogeny, gene tagging, genome mapping and evolutionary biology. ISSRs are ideal markers for
genetic mapping and population studies because of their abundance and the high degree of
polymorphism between individuals within a population of closely related genotypes.
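The anchoring described above can be made concrete by building a primer from a repeat motif plus a short anchor. Anchors are often written with IUPAC degenerate codes, so this sketch also lists the concrete sequences a degenerate primer represents. It is a minimal sketch: the (AG)8YT primer is a hypothetical example, not a primer prescribed by this protocol.

```python
from itertools import product

# IUPAC nucleotide codes for degenerate positions
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def issr_primer(motif: str, repeats: int, anchor: str, at_3prime: bool = True) -> str:
    """Build an anchored ISSR primer, written 5'->3'."""
    core = motif * repeats
    return core + anchor if at_3prime else anchor + core


def expand_degenerate(primer: str) -> list:
    """All concrete sequences represented by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


p = issr_primer("AG", 8, "YT")  # hypothetical 3'-anchored (AG)8YT primer
print(p)                        # AGAGAGAGAGAGAGAGYT
print(expand_degenerate(p))     # ['AGAGAGAGAGAGAGAGCT', 'AGAGAGAGAGAGAGAGTT']
```

Because the anchor must pair with the unique sequence flanking the repeat, the primer cannot slip along the repeat and prime from any register within it.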
Reaction conditions should be optimized before the actual RAPD and ISSR analysis to obtain
repeatable results. The following parameters must be optimized:
- Template DNA concentration.
- Taq DNA polymerase concentration.
- Mg2+ ion concentration.
- Primer concentration.
- Primer annealing temperature.
- Primers suitable for detection of polymorphic loci in the taxa to be analysed.
Material and reagents:
Amplification was performed in a total volume of 25 µl including 2.5 µl of 10X Taq DNA
polymerase buffer, 50 ng of total cellular DNA, 150 µM dNTP mix, 1.5 mM MgCl2, 0.4 µM primer
and 1 U Taq DNA polymerase. PCR amplification was performed in a thermocycler as follows: an
initial step of 94 °C for 5 min, followed by 35 cycles, each including 45 s at 94 °C for
denaturation, 45 s at 37 °C (RAPD analysis) or 50 °C (ISSR analysis) for annealing, and 2 min at
72 °C for elongation. A 10 min step at 72 °C is programmed as a final extension.
Amplification products were separated by electrophoresis on a 1.5% agarose gel in 1X TAE
buffer (pH 8.0) and stained with ethidium bromide as described by Sambrook et al. (1989).
Amplified products were scored as present (1) or absent (0) to form a binary matrix. Data
were analysed based on Jaccard's, Sorensen-Dice and simple matching similarity coefficients for
binary data using the SIMQUAL program of the Numerical Taxonomy and Multivariate Analysis
System Package for PC (NTSYS-pc ver. 2.021i) (Rohlf, 1993). UPGMA dendrograms can be
constructed based on the analysis of the data.
References
Rohlf, F.J. (1999) NTSYS-pc: Numerical Taxonomy and Multivariate Analysis System. Version
2.02i. Exeter Software, Setauket, New York.
Sambrook, J., E.F. Fritsch and T. Maniatis (1989) Molecular Cloning: A Laboratory Manual, Vol.
I. 2nd edition. Cold Spring Harbor Laboratory Press.
Zietkiewicz, E., A. Rafalski and D. Labuda (1994) Genome fingerprinting by simple sequence
repeat (SSR)-anchored polymerase chain reaction amplification. Genomics, 20: 176-183.
Figure showing the ISSR profiling of black pepper
Microsatellite (simple sequence Repeats) Profiling
Anu Cyriac, Anupama K., Ritto Paul, Rahul P.R.
Indian Institute of Spices Research, Marikunnu P.O., Calicut- 673012.
Simple sequence repeats (SSRs), also called microsatellites, are stretches of DNA consisting of
tandemly repeated mono-, di-, tri-, tetra- or pentanucleotide units that are distributed throughout
all prokaryotic and eukaryotic genomes analysed to date (Powell et al., 1996; Zane et al.,
2001). SSR loci harbour considerable length variation and are extremely abundant. This
polymorphism most likely originates from slippage events during DNA replication
(Schlotterer and Tautz, 1992). Each locus is amplified individually by the polymerase chain reaction
from total genomic DNA, using a pair of oligonucleotide primers specific to the DNA flanking the
SSR; the primer pair therefore defines the microsatellite locus. Amplification products obtained from
different individuals can be resolved on gels to reveal polymorphism. The amplified products usually
show high levels of length polymorphism, which results from variation between alleles in the number
of tandem repeat units at the locus (Tautz, 1989; Weber and May, 1989). Microsatellites have
proven to be an extremely valuable tool for genome mapping in many organisms (Schuler et al., 1996;
Knapik et al., 1998), and their applications span areas ranging from ancient and forensic
DNA studies to population genetics and the conservation and management of biological resources
(Jarne and Lagoda, 1996). Microsatellites are relatively abundant with uniform genome coverage,
highly variable, codominant, robust and reproducible, and easily detected by PCR; they also
represent sequence-tagged sites and require only a small amount of starting DNA. Their high
information content, which is directly related to the effective number of alleles at each locus, and the
ease of automating the PCR assays for detecting SSR polymorphisms make SSRs ideal genetic
markers. However, developing SSR markers is considerably more difficult than developing other
markers, because cloning and sequence information are required.
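The link between information content and the effective number of alleles can be made explicit. For allele frequencies p_i at a locus, the effective number of alleles is ne = 1/Σp_i², the expected heterozygosity is He = 1 - Σp_i², and the commonly used polymorphism information content (PIC) subtracts a further Σ(i<j) 2p_i²p_j². This is a minimal sketch of those standard formulas; the frequencies are hypothetical.

```python
def allele_statistics(freqs):
    """Expected heterozygosity, effective number of alleles and PIC for one locus.

    freqs: allele frequencies (or allele counts) at the locus; they are
    normalised to sum to 1.
    """
    total = sum(freqs)
    p = [f / total for f in freqs]
    sum_sq = sum(pi ** 2 for pi in p)
    he = 1 - sum_sq  # expected heterozygosity (gene diversity)
    ne = 1 / sum_sq  # effective number of alleles
    pic = he - sum(2 * p[i] ** 2 * p[j] ** 2
                   for i in range(len(p)) for j in range(i + 1, len(p)))
    return he, ne, pic


print(allele_statistics([0.5, 0.5]))        # (0.5, 2.0, 0.375)
print(allele_statistics([10, 10, 10, 10]))  # counts work too: four equally frequent alleles
```

More alleles at more even frequencies raise ne, He and PIC together, which is why multi-allelic SSR loci are so informative.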
The traditional method for isolating SSRs involves creating a small-insert genomic
library, screening the library for the presence of microsatellites, sequencing the positive clones,
designing primers, and carrying out locus-specific analysis to identify polymorphisms (Rafalski et al.,
1996). A further class of isolation methods, based on selective hybridization, has become extremely
popular for isolating microsatellites (Zane et al., 2001). The basic protocol involves restriction
digestion of the DNA into small fragments, selective hybridization with biotinylated
oligonucleotides, capture of the microsatellite-containing fragments with magnetic beads, cloning of
the DNA fragments, sequencing of positive clones and primer design (Armour et al., 1994; Kijas et
al., 1994; Glenn and Schable, 2005).
ATTTGTATTT TACAACACCT CACATGCTCA GTTATTTGGT TCATATGCAA
GTCTCGGTTT TGGTCTCTGC TCAGAAAAAG AGAGAGAGAG AGAGAGAGAG
AGAGAGAGAG AGAGAGAGAA GAAATTTGCA GTTAATTGTC AAGTAGAAGT
Fig. 2. Soybean library-derived microsatellite (AG)20, with forward and reverse primer sites in the
flanking sequence.
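The repeat in Fig. 2 can also be located computationally, which is the first step when SSRs are mined from sequence data rather than from a library. This is a minimal sketch using Python's `re` module: it reports perfect mono- to pentanucleotide repeats, and the minimum copy numbers in `MIN_REPEATS` are assumed thresholds, not values from this manual. Units that are themselves repeats (e.g. AGAG) are discarded so that a dinucleotide repeat is not reported again as a tetranucleotide.

```python
import re

# Minimum number of tandem copies reported for each unit length (assumed thresholds)
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5}


def is_primitive(unit: str) -> bool:
    """False if the unit is itself a repeat of a shorter unit (e.g. AGAG or AA)."""
    n = len(unit)
    return all(unit != unit[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq: str):
    """Return (motif, copies, start, end) for perfect repeats; 0-based, end exclusive."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):
                hits.append((unit, len(m.group(0)) // k, m.start(), m.end()))
    return sorted(hits, key=lambda h: h[2])


fig2 = (
    "ATTTGTATTT TACAACACCT CACATGCTCA GTTATTTGGT TCATATGCAA"
    "GTCTCGGTTT TGGTCTCTGC TCAGAAAAAG AGAGAGAGAG AGAGAGAGAG"
    "AGAGAGAGAG AGAGAGAGAA GAAATTTGCA GTTAATTGTC AAGTAGAAGT"
).replace(" ", "")
print(find_ssrs(fig2))  # [('AG', 20, 78, 118)]
```

The result recovers the (AG)20 repeat; primers would then be designed in the unique sequence on either side of positions 78-118.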
Because of their sensitivity to minor genetic differences, PCR-based markers such as AFLPs and
microsatellites are likely to remain key molecular tools for some time to come.
Protocol for developing microsatellite profiles
Microsatellites can be amplified by PCR with specifically designed primers, if these are available for
the crop in question, and resolved on either polyacrylamide or high-quality agarose gels. Both
radioactive and non-radioactive detection methods can be used.
A simple method using non-radioactive PCR and polyacrylamide gel electrophoresis is given below:
PCR Amplification: Select DNA of the population to be studied. Set up a PCR for each genotype as
follows:
Component                              Per reaction    x 10
10X PCR buffer (without MgCl2)         2.5 µl          25 µl
MgCl2 (25 mM)                          2.0 µl          20 µl
dNTPs (2.5 mM each)                    1.0 µl          10 µl
Forward primer (5 µM)                  2.5 µl          25 µl
Reverse primer (5 µM)                  2.5 µl          25 µl
Sterile H2O                            13.4 µl         134 µl
Taq polymerase, 0.5 U (5 U/µl)         0.1 µl          1 µl
Mix thoroughly, distribute 24 µl into each PCR tube and add 1 µl of DNA (15-25 ng; 20 ng/µl) to
each tube. Total reaction volume: 25 µl.
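The master-mix volumes in the table above scale linearly with the number of reactions, and the final concentrations follow from C1V1 = C2V2. This is a minimal sketch using the stock concentrations and per-reaction volumes from the table; the optional `overage` (extra mix to cover pipetting losses) is an assumption, and "ul"/"uM" stand in for µl/µM in the code.

```python
# (name, stock concentration, unit, ul per 25 ul reaction), from the table above
COMPONENTS = [
    ("10X PCR buffer", 10.0, "X", 2.5),
    ("MgCl2", 25.0, "mM", 2.0),
    ("dNTPs (each)", 2.5, "mM", 1.0),
    ("Forward primer", 5.0, "uM", 2.5),
    ("Reverse primer", 5.0, "uM", 2.5),
    ("Taq polymerase", 5.0, "U/ul", 0.1),
]
WATER_UL = 13.4
REACTION_UL = 25.0


def master_mix(n_reactions: int, overage: float = 0.0) -> dict:
    """Volumes (ul) for n reactions, optionally with fractional overage (0.1 = 10%)."""
    factor = n_reactions * (1 + overage)
    volumes = {name: round(ul * factor, 2) for name, _, _, ul in COMPONENTS}
    volumes["Sterile H2O"] = round(WATER_UL * factor, 2)
    return volumes


def final_concentrations() -> dict:
    """C1V1 = C2V2: concentration of each component in the final 25 ul reaction."""
    return {name: stock * ul / REACTION_UL for name, stock, _, ul in COMPONENTS}


print(master_mix(10))
print(final_concentrations())
```

For 10 reactions this gives 1 µl of Taq polymerase (0.1 µl x 10), and the final reaction contains 1X buffer, 2 mM MgCl2, 0.1 mM of each dNTP, 0.5 µM of each primer and 0.5 U of Taq polymerase.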
Follow a standard PCR programme:
Initial denaturation: 94 °C for 2 min (1 cycle)
Denaturation, annealing and primer extension (35 cycles): 94 °C for 30 s, 50-60 °C for 30 s,
72 °C for 1 min
Final extension: 72 °C for 5 min (1 cycle)
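A starting point within the 50-60 °C annealing range can be estimated from the primer sequences. This is a minimal sketch using the Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough estimate for short oligonucleotides; starting about 5 °C below the lower primer Tm is a common rule of thumb, assumed here. The primer sequences are hypothetical, and the final annealing temperature should still be optimized empirically.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    s = primer.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def starting_annealing_temp(forward: str, reverse: str, offset: int = 5) -> int:
    """Suggested starting point: `offset` deg C below the lower primer Tm (assumed rule)."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset


fwd = "ATGCATGCATGCATGCATGC"  # hypothetical primers
rev = "GGCCATATGGCCATATGGCC"
print(wallace_tm(fwd), wallace_tm(rev))   # 60 64
print(starting_annealing_temp(fwd, rev))  # 55
```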
Electrophoresis
Resolve the amplification products on a 3% MetaPhor agarose gel in 1X TBE or on a 6-8%
polyacrylamide gel in 1X TBE, and use the resulting profiles for analysis.
SSR Advantages:
- Co-dominant (more informative when dealing with heterozygotes)
- Highly variable (important for species with narrow gene pools)
- Widely used
- Excellent for use in marker assisted selection, fingerprinting and marker assisted
backcrossing
Neutral Polyacrylamide Gel Electrophoresis
These gels are used for the separation and purification of double-stranded DNA fragments,
which migrate through non-denaturing polyacrylamide gels at rates inversely proportional to the
log10 of their size. Mobility is also affected by base composition and sequence, so duplex DNAs
of exactly the same size can differ in mobility by up to 10%. Acrylamide monomers are polymerized
into long chains in a reaction initiated by free radicals. In the presence of
N,N'-methylenebisacrylamide, these chains become cross-linked to form a gel. The porosity of the
resulting gel is determined by the length of the chains and the degree of cross-linking that occurs
during polymerization.
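In practice, this relationship is used to size unknown bands: log10(size) of the size-marker bands is fitted against migration distance, and the unknown band is interpolated on that standard curve. This is a minimal least-squares sketch; the ladder distances and sizes are hypothetical, and sizes outside the range of the marker bands should not be extrapolated.

```python
import math


def fit_standard_curve(ladder):
    """Least-squares fit of log10(size in bp) against migration distance.

    ladder: list of (distance, size_bp) pairs for the size-marker bands.
    Returns (slope, intercept) of log10(size) = slope * distance + intercept.
    """
    xs = [d for d, _ in ladder]
    ys = [math.log10(bp) for _, bp in ladder]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size(distance, slope, intercept):
    """Interpolate the size (bp) of an unknown band from its migration distance."""
    return 10 ** (slope * distance + intercept)


# Hypothetical marker bands: (distance in mm, size in bp)
ladder = [(10, 800), (20, 400), (30, 200), (40, 100)]
slope, intercept = fit_standard_curve(ladder)
print(round(estimate_size(25, slope, intercept)))  # 283
```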
Materials
1. TBE 10X (500 ml)
Trizma base                 54.0 g
Boric acid                  27.5 g
0.5 M EDTA, pH 8.0          20.0 ml
2. 40% Acrylamide/bisacrylamide (29:1) solution
Acrylamide                  38.62 g
Bisacrylamide               1.38 g
Add water to a final volume of 100 ml; store at 4 °C.
3. 10% Ammonium persulfate (APS)
Dissolve 0.1 g of APS in 1 ml of distilled water. Store at 4 °C.
4. KOH/methanol solution (5% w/v)
This solution is used for cleaning the glass plates used to cast sequencing gels. It is prepared by
dissolving 5 g of KOH pellets in 100 ml of methanol. Store the solution at room temperature in a
tightly capped glass bottle.
5. TEMED
Electrophoresis grade TEMED available from commercial suppliers.
6. Ethidium bromide, 10 mg/ml (1% stock)
Add 1 g of ethidium bromide to 100 ml of water. Stir on a magnetic stirrer for several hours to
ensure that the dye has dissolved. Wrap the container in aluminium foil or transfer the solution to a
dark bottle and store at room temperature.
7. Loading dye (6X)
Sucrose (40%) or glycerol (30%)    4 g or 3 g
Bromophenol blue (0.25%)           0.025 g
Xylene cyanol (0.25%)              0.025 g
Make up to 10 ml with distilled water.
Methods
Assembling the apparatus and preparing the Gel solution
1. If necessary, clean the glass plates and spacers with KOH/methanol.
2. Wash the glass plates and spacers in warm detergent solution and rinse them well, first in tap
water and then in deionized water. Hold the plates by the edges or wear gloves, so that oils
from the hands are not deposited on the working surfaces of the plates. Rinse the plates
with ethanol and set them aside to dry.
3. Assemble the glass plates with spacers:
a. Lay the larger (unnotched) plate flat on the bench and arrange the spacers along each side,
parallel to the two edges.
b. Lay the inner (notched) plate in position, resting on the spacer bars.
c. Clamp the plates together with binder clips or bulldog clips, and seal the entire length of
the two sides and the bottom of the plates with gel-sealing tape to make a watertight seal.
4. Taking into account the size of the glass plates and the thickness of the spacers, calculate the
volume of gel required and prepare the gel solution with the desired polyacrylamide percentage.
To prepare 60 ml of 8% polyacrylamide gel, add the following to a beaker and swirl to
mix:
40% acrylamide solution     12.0 ml
10X TBE                     6.0 ml
10% APS                     300 µl
TEMED                       125 µl
Distilled water             41.58 ml
5. Pour the gel solution into the space between the assembled plates, avoiding air bubbles and
filling almost to the top.
6. Once the plates are filled, insert the comb at the top of the gel without trapping air bubbles.
Allow 60 min for the gel to polymerize.
7. After polymerization is complete, surround the comb and the top of the gel with paper towels
soaked in 1X TBE. Then seal the entire gel in Saran Wrap and store it at 4 °C until needed
(it may be stored in this state for 1-2 days before use).
8. When ready to proceed with electrophoresis, carefully pull the comb from the polymerized
gel and remove the gel-sealing tape from the bottom of the gel.
9. Remove any excess polyacrylamide from around the comb and the top of the glass plates with
a razor blade. Clean the plates with paper towels.
10. Add 1X TBE to the bottom chamber, according to the capacity of the chamber.
11. Fit the gel assembly into the apparatus and fill the upper tank with the required volume of 1X
TBE buffer. Remove any air bubbles from the top of the gel.
12. Add tracking dye: 2 µl of 6X dye per 10 µl of PCR product, and mix well.
13. Flush the wells with 1X TBE buffer. Load the samples into the wells according to the type of comb
used. Generally 1-2 µl of sample is loaded; if the concentration of PCR product is high, load a
smaller quantity for better resolution of the DNA fragments.
14. Each gel must be loaded with DNA size markers (50 bp or 100 bp, as required).
15. Start electrophoresis at a constant voltage of 80 V for 5 h (a low voltage over a longer duration
gives finer separation of closely sized markers).
16. Run the gel until the marker dyes have migrated the desired distance. Turn off the power,
disconnect the leads, and discard the electrophoresis buffer from the reservoirs.
17. Detach the glass plates: use a spacer or plastic wedge to lift a corner of the upper glass plate,
check that the gel remains attached to the lower plate, and pull the upper plate smoothly away.
Remove the spacers.
18. Take out the gel and stain it for 5 min in a tray containing 20 µl of ethidium bromide (1.0% stock
solution) in 1 litre of distilled water.
19. Shake the tray constantly on a horizontal shaker to keep the staining solution uniform.
20. Take out the gel and destain it in double-distilled water for 20 min.
21. After destaining, analyse the gel using a Gel Doc imaging system.
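The gel-solution volumes in step 4 and the dye ratio in step 12 follow simple dilution arithmetic, so they can be recalculated for other gel percentages or volumes. This is a minimal sketch: the acrylamide and TBE volumes follow C1V1 = C2V2, while scaling APS and TEMED in proportion to gel volume (from the 300 µl and 125 µl used for 60 ml) is an assumption, not a rule stated in this protocol.

```python
def page_recipe(gel_percent: float, total_ml: float, stock_percent: float = 40.0,
                aps_ul_per_ml: float = 5.0, temed_ul_per_ml: float = 125.0 / 60) -> dict:
    """Volumes (ml) for a non-denaturing polyacrylamide gel in 1X TBE.

    APS and TEMED are scaled in proportion to gel volume from the 60 ml recipe
    (300 ul and 125 ul); this proportional scaling is an assumption.
    """
    acrylamide = gel_percent * total_ml / stock_percent  # C1V1 = C2V2
    tbe_10x = total_ml / 10                              # 10X stock diluted to 1X
    aps = aps_ul_per_ml * total_ml / 1000
    temed = temed_ul_per_ml * total_ml / 1000
    water = total_ml - acrylamide - tbe_10x - aps - temed
    return {"40% acrylamide": acrylamide, "10X TBE": tbe_10x,
            "10% APS": aps, "TEMED": temed, "water": water}


def loading_dye_ul(sample_ul: float, dye_strength: float = 6.0) -> float:
    """Volume of concentrated dye that brings sample plus dye to 1X."""
    return sample_ul / (dye_strength - 1)


for name, ml in page_recipe(8, 60).items():
    print(f"{name}: {ml:.3f} ml")
print(loading_dye_ul(10))  # 2.0
```

For 8% and 60 ml this reproduces the step 4 recipe (12.0 ml, 6.0 ml, 0.300 ml, 0.125 ml and about 41.58 ml of water), and 10 µl of PCR product needs 2 µl of 6X dye.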
References
Armour, J.A.L., Neumann, R., Gobert, S. and Jeffreys, A.J. (1994) Isolation of human simple repeat
loci by hybridization selection. Human Molecular Genetics, 3: 599-605.
Creste, S., Tulmann Neto, A. and Figueira, A. (2001) Detection of single sequence repeat
polymorphisms in denaturing polyacrylamide sequencing gels by silver staining. Plant Molecular
Biology Reporter, 19: 299-306.
Glenn, T.C. and Schable, N.A. (2005) Isolating microsatellite DNA loci. In: Zimmer, E.A. and
Roalson, E.H. (eds) Methods in Enzymology 395, Molecular Evolution: Producing the Biochemical
Data, Part B. Academic Press, San Diego.
Jarne, P. and Lagoda, P.J.L. (1996) Microsatellites, from molecules to populations and back. Trends in
Ecology and Evolution, 11: 424-429.
Jones, C.J., Edwards, K.J., Castaglione, S., Winfield, M.O., Sale, F., Van de Wiel, C., Bredemeijer, G.,
Buiatti, M., Maestri, E., Malcevshi, A., Marmiroli, N., Aert, R., Volckaert, G., Rueda, J., Linacero, R.,
Vazquez, A. and Karp, A. (1997) Reproducibility testing of RAPD, AFLP and SSR markers in
plants by a network of European laboratories. Molecular Breeding, 3: 381-390.
Karp, A., Seberg, O. and Buiatti, M. (1996) Molecular techniques in the assessment of botanical
diversity. Annals of Botany, 78: 143-149.
Kijas, J.M., Fowler, J.C., Garbett, C.A. and Thomas, M.R. (1994) Enrichment of microsatellites from
the citrus genome using biotinylated oligonucleotide sequences bound to streptavidin-coated
magnetic particles. BioTechniques, 16: 656-662.
Lu, Z.X. (1998) Construction of a genetic linkage map and identification of AFLP markers for
resistance to root-knot nematodes in peach rootstocks. Genome, 41: 199-207.
Lynch, M. and Walsh, B. (1998) Genetic Analysis of Quantitative Traits. Sinauer.
Paterson, A.H., Tanksley, S.D. and Sorrells, M.E. (1991) DNA markers in plant improvement. Advances
in Agronomy, Vol. 46. Academic Press, pp. 40-90.
Powell, W. (1996) The comparison of RFLP, RAPD, AFLP and SSR (microsatellite) markers for
germplasm analysis. Molecular Breeding, 2: 225-238.
Powell, W., Gordon, C.M. and Provan, J. (1996) Polymorphism revealed by simple sequence repeats.
Elsevier Publishers, 1(7): 215.
Rafalski, J.A. (1996) Generating and using DNA markers in plants. In: Birren, E. and Lai, E. (eds)
Analysis of Non-mammalian Genomes: A Practical Guide. Academic Press.
Rosendahl, S. and Taylor, J.W. (1997) Development of multiple genetic markers for studies of genetic
variation in arbuscular mycorrhizal fungi using AFLP. Molecular Ecology, 6: 821-829.
Sambrook, J. and Russell, D.W. (2001) Molecular Cloning: A Laboratory Manual, 3rd edition. Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
Schlotterer, C. and Tautz, D. (1992) Slippage synthesis of simple sequence DNA. Nucleic Acids
Research, 20: 211-215.
Semblat, J.P. (1998) High-resolution DNA fingerprinting of parthenogenetic root-knot nematodes
using AFLP analysis. Molecular Ecology, 7: 119-125.
Swapna, M., Sivaraju, K., Sharma, R.K., Singh, N.K. and Mohapatra, T. (2010) Single-strand
conformational polymorphism of EST-SSRs: A potential tool for diversity analysis and varietal
identification in sugarcane.
Tautz, D. (1989) Hypervariability of simple sequences as a general source for polymorphic DNA
markers. Nucleic Acids Research, 17(16): 6463-6471.
Vos, P., Hogers, R., Bleeker, M., Reijans, M., Van der Lee, T., Hornes, M., Frijters, A., Pot, J.,
Peleman, J., Kuiper, M. and Zabeau, M. (1995) AFLP: a new technique for DNA fingerprinting.
Nucleic Acids Research, 23: 4407-4414.
Multilocus Sequence Typing of Bacteria
A. Kumar
IARI, New Delhi
Background
Ralstonia solanacearum Yabuuchi (Smith), previously known as Pseudomonas solanacearum Smith,
is the causative agent of bacterial wilt of crop plants and is regarded as an important bacterial
plant pathogen owing to its aggressiveness, broad host range, wide geographical distribution and
long persistence in soil and water-associated environments. The pathogen is known to infect over 450
plant species, including many economically important crops such as vegetables, spices, herbaceous
plants, shrubs and trees (Wicker et al., 2007). The genetic diversity of R. solanacearum is
analysed by several phenotypic and genotypic methods. The phenotypic assays include biovar
characterization, which is based on the substrate utilization ability of the bacterium, and host-range-
based race determination (Hayward, 1964; Buddenhagen, 1962). Five biovars and as many races have
been found in the global collection of R. solanacearum (Table 1 & 2).
Though popular, these techniques are not sufficient for deciphering the population biology of
strains across geographical locations. Besides, the two independent systems of classification do not
agree with each other. These anomalies prompted researchers to devise finer tools to reveal the
actual diversity existing in populations of R. solanacearum. Fegan and Prior (2005) devised
phylotyping, which classifies R. solanacearum into four phylotypes that reflect the geographical
origin of the strains: phylotypes I and II comprise Asian and American strains, respectively, whereas
phylotype III members are African, and phylotype IV isolates are from Indonesia, Japan and
Australia.
Molecular tools for genetic diversity analysis: Many techniques based on the electrophoretic
mobility of genomic fragments are used worldwide to analyse the population structure of R.
solanacearum isolates. Genotypic tools based on comparing the electrophoretic patterns of fragments
generated by PCR or restriction digestion (for example ISSR, RAPD and rep-PCR) were the most
popular choice in the late 1990s. However, these techniques, which exploit randomly amplified
fragments, turned out to be non-portable tools because of their inherently poor reproducibility.
Sequence-based discrimination of strains, such as multilocus sequence typing (MLST), and
comparative genome hybridization (CGH), which uncover allelic variants in conserved housekeeping
and virulence genes, are portable across laboratories. Because sequence data can be compared
readily between laboratories, a typing method based on the sequences of gene fragments from a
number of different housekeeping loci is portable. The MLST approach uses the sequences of
internal gene fragments and assigns a distinct allele number to each unique sequence at each locus;
the resulting allelic profile of each isolate defines its sequence type (ST). Using this approach,
Castillo and Greenberg (2007) analysed the evolutionary forces operating on R. solanacearum
populations by MLST of five housekeeping and three virulence-related genes. They found R.
solanacearum to be a diverse pathogen, showing high levels of nucleotide polymorphism and a
number of unique alleles in both the chromosome and the megaplasmid. So far, about 27 to 33 STs
have been identified for the eight genes by MLST-based analysis.
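The allele-numbering logic of MLST described above can be sketched in a few lines. This is a minimal illustration, assuming allele numbers and STs are assigned in order of first appearance; real MLST schemes use curated databases of previously defined alleles and STs. The loci and sequences are toy, hypothetical data.

```python
def assign_sequence_types(isolates, loci):
    """Number alleles per locus and sequence types (STs) in order of first appearance.

    isolates: {isolate_name: {locus: sequence}}
    Returns {isolate_name: (allelic_profile, st_number)}.
    """
    allele_numbers = {locus: {} for locus in loci}  # locus -> {sequence: allele number}
    st_numbers = {}                                 # allelic profile -> ST number
    result = {}
    for name, seqs in isolates.items():
        profile = []
        for locus in loci:
            table = allele_numbers[locus]
            seq = seqs[locus].upper()
            if seq not in table:  # a new unique sequence gets the next allele number
                table[seq] = len(table) + 1
            profile.append(table[seq])
        profile = tuple(profile)
        if profile not in st_numbers:  # a new allele combination gets the next ST
            st_numbers[profile] = len(st_numbers) + 1
        result[name] = (profile, st_numbers[profile])
    return result


loci = ["locus1", "locus2", "locus3"]  # hypothetical loci with toy sequences
isolates = {
    "isolate_A": {"locus1": "ACGTTG", "locus2": "TTGACA", "locus3": "GGCATA"},
    "isolate_B": {"locus1": "ACGTTG", "locus2": "TTGACC", "locus3": "GGCATA"},
    "isolate_C": {"locus1": "ACGTTG", "locus2": "TTGACA", "locus3": "GGCATA"},
}
for name, (profile, st) in assign_sequence_types(isolates, loci).items():
    print(name, profile, "ST", st)
```

A single nucleotide difference at one locus (isolate_B, locus2) is enough to create a new allele and therefore a new ST, whereas identical profiles (isolate_A and isolate_C) share an ST.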
Methodology
Isolation of R. solanacearum isolates
Bacterial wilt-affected plant samples were collected from the field and processed for isolation of the
bacterium. Thoroughly washed stem cuttings of wilted plants were allowed to ooze into a clean glass
of water for a few minutes, and the ooze was plated onto CPG agar amended with
2,3,5-triphenyltetrazolium chloride and incubated at 28 °C for two to three days. Typical colonies
of R. solanacearum, identified by their fluidal appearance with a spiral pink centre, were purified by
repeated streaking on fresh plates.
Preparation of bacterial cells for DNA isolation
A single colony of R. solanacearum was inoculated in a broth and incubated for about 24-36 h for
isolation of total genomic DNA.
Isolation of genomic DNA from R. solanacearum
DNA isolation
1. The density of the bacterial suspension is adjusted to OD 1.0 at 600 nm.
2. The cells are spun down at 14,000 rpm at room temperature for 2 min.
3. The supernatant is discarded and the pellet is washed three times with sterile distilled water.
4. 550 µl of TE buffer + lysozyme is added to the pellet, mixed well and incubated for 30 min at 37 °C.
5. After incubation, 76 µl of 10% SDS + proteinase K is added.
6. The contents are mixed by flipping the tube and incubated for 15 min at 65 °C.
7. After incubation, 100 µl of 5 M NaCl is added and the contents are mixed by flipping the tube.
8. Then 80 µl of CTAB/NaCl is added, mixed and incubated for 10 min at 65 °C.
9. After incubation, 660 µl of chloroform + isoamyl alcohol is added.
10. The contents are mixed by flipping the tube for about 30 s.
11. The tube is centrifuged for 5 min at 14,000 rpm at room temperature.
12. After centrifugation, the aqueous fraction is carefully transferred to a new 1.5 ml tube without touching the white middle layer (interface). This step is repeated twice.
13. An equal volume of isopropanol is added and the tube is inverted to mix.
14. The tube is centrifuged for 15 min at 14,000 rpm at room temperature.
15. The supernatant is gently drained and the pellet is washed with 0.5 ml of ice-cold 70% ethanol.
16. The tube is centrifuged for 15 min at 14,000 rpm at room temperature.
17. The supernatant is then carefully removed and the remaining ethanol is evaporated in the laminar flow hood for about one hour.
18. 25 µl of 10:1 TE is added to each tube to dissolve the DNA, and the tubes are kept at 4 °C overnight.
19. The DNA from the two tubes is pooled in one tube (50 µl in total), and RNase is added at a concentration of 200 µg/ml to remove contaminating RNA.
20. The tubes are incubated for 30 min at 30 °C, and the DNA is then stored at -20 °C.
Quality analysis and quantitation of genomic DNA
1. 5 µl of stock DNA is diluted 10-fold by adding 45 µl of MQ water.
2. The quantity of DNA is measured using a Biophotometer.
3. Quality is assessed by gel electrophoresis.
4. The DNA concentration is adjusted to 200 ng/µl with water.
5. Proceed with PCR amplification of the genes.
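The dilution in step 1 and the adjustment in step 4 are simple arithmetic. The sketch below assumes that the Biophotometer reports A260 for the 1:10 dilution and uses the standard conversion of 1 A260 unit ≈ 50 ng/µl for double-stranded DNA; the function names and the example reading are illustrative only.

```python
"""Sketch: quantify genomic DNA from A260 and plan the adjustment to a working stock.

Assumptions (not stated in the protocol): the reading is A260 of the 1:10
dilution, and 1 A260 unit corresponds to ~50 ng/ul of double-stranded DNA.
"""

NG_PER_UL_PER_A260 = 50.0  # standard dsDNA conversion factor (1 cm path length)


def stock_concentration(a260: float, dilution_factor: float = 10.0) -> float:
    """Concentration of the undiluted stock in ng/ul."""
    return a260 * NG_PER_UL_PER_A260 * dilution_factor


def adjust_to(stock_ng_per_ul: float, target_ng_per_ul: float,
              final_volume_ul: float) -> tuple[float, float]:
    """Return (ul of stock, ul of water) using C1*V1 = C2*V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already more dilute than the target")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


if __name__ == "__main__":
    conc = stock_concentration(0.8)          # hypothetical reading: 0.8 * 50 * 10 = 400 ng/ul
    dna_ul, water_ul = adjust_to(conc, 200.0, 50.0)
    print(f"stock {conc:.0f} ng/ul: mix {dna_ul:.1f} ul DNA + {water_ul:.1f} ul water")
```

If the stock is already below 200 ng/µl, the sketch raises an error rather than suggesting a negative water volume; such DNA would have to be concentrated first.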
Multilocus Sequence Typing
The steps involved in sequence typing are shown in Fig. 1. Briefly, the selected genes are amplified, purified and sequenced. The sequence reads are assembled and compared with the database to assign the alleles. The combination of allele numbers is unique to each strain of the bacterium in question. The allele numbers are then compared among strains to decipher strain migration, a central question in molecular epidemiology.
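The bookkeeping behind allele and ST assignment can be sketched in a few lines. This is a toy, in-memory illustration assuming that identical sequences always receive the same number and each new sequence (or new allelic profile) receives the next free integer; in practice the numbers come from the curated database (www.pamdb.org), and `NumberRegistry`, `allelic_profile` and the example sequences are hypothetical.

```python
"""Toy sketch of MLST bookkeeping: sequence -> allele number -> sequence type.

Hypothetical: real allele and ST numbers are assigned by the curated database
(e.g. www.pamdb.org), not generated locally as done here.
"""


class NumberRegistry:
    """Gives each distinct key the next free integer, starting at 1."""

    def __init__(self) -> None:
        self._numbers: dict = {}

    def assign(self, key) -> int:
        if key not in self._numbers:
            self._numbers[key] = len(self._numbers) + 1
        return self._numbers[key]


def allelic_profile(alleles_by_locus: dict, sequences: dict, loci: list) -> tuple:
    """Allele number at each locus, always in the same locus order."""
    return tuple(alleles_by_locus[locus].assign(sequences[locus].upper())
                 for locus in loci)


if __name__ == "__main__":
    loci = ["locusA", "locusB"]                         # hypothetical loci
    alleles = {locus: NumberRegistry() for locus in loci}
    st_registry = NumberRegistry()
    isolates = {
        "isolate1": {"locusA": "ACGTAC", "locusB": "GGCCTA"},
        "isolate2": {"locusA": "ACGTAC", "locusB": "GGCCTT"},  # one SNP at locusB
        "isolate3": {"locusA": "acgtac", "locusB": "GGCCTA"},  # same as isolate1
    }
    for name, seqs in isolates.items():
        profile = allelic_profile(alleles, seqs, loci)
        print(name, profile, "ST", st_registry.assign(profile))
```

A single nucleotide difference at one locus yields a new allele number and therefore a new ST, while isolates with identical sequences at every locus share an ST.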
Choice of loci: For the diversity analysis, five housekeeping genes that reside in the chromosome (ppsA, phosphoenolpyruvate synthase; gyrB, DNA gyrase subunit B; adk, adenylate kinase; gdhA, glutamate dehydrogenase oxidoreductase; and gapA, glyceraldehyde-3-phosphate dehydrogenase oxidoreductase) and three plasmid-borne virulence-related genes (hrpB, transcriptional regulator; fliC, encoding the flagellin protein; and egl, endoglucanase precursor) are considered. The details of the genes, their proteins and the conserved lengths are furnished in Table 3.
Fig. 1. MLST workflow
PCR Amplification: For PCR amplification, the reaction mixture (50 µl) contained 50-100 ng of template genomic DNA, 1× PCR buffer, 3 mM MgCl2, 6% DMSO, 50 µM of each dNTP, 10 pmol of each primer (Table 1) and 1 U of Taq DNA polymerase. DNA was amplified using an initial denaturation at 96 °C for 9 min, followed by 35 cycles of 95 °C for 30 s, annealing at the appropriate temperature for 1 min and extension at 72 °C for 2 min. Reactions were completed with a final extension step of 10 min at 72 °C. All PCR products were electrophoresed through a 1.0% agarose gel and visualized under UV light after ethidium bromide staining.
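The reaction composition above translates into pipetting volumes with C1V1 = C2V2. In this sketch every stock concentration is an assumption (10× buffer, 25 mM MgCl2, 100% DMSO, 10 mM of each dNTP, 10 µM primers, 5 U/µl Taq, template at 200 ng/µl); substitute the values printed on your own reagents.

```python
"""Sketch: pipetting volumes for one MLST PCR with the composition above.

Every stock concentration here is an assumption (10x buffer, 25 mM MgCl2,
100% DMSO, 10 mM of each dNTP, 10 uM primers, 5 U/ul Taq, 200 ng/ul DNA).
"""


def pcr_volumes(reaction_ul: float = 50.0, template_ng: float = 100.0) -> dict:
    # Concentration-based components: (final, stock) in the same unit,
    # volume = final * reaction volume / stock  (C1*V1 = C2*V2)
    by_concentration = {
        "buffer": (1.0, 10.0),      # x
        "MgCl2": (3.0, 25.0),       # mM
        "DMSO": (6.0, 100.0),       # % (v/v)
        "dNTPs": (0.05, 10.0),      # mM of each dNTP (50 uM final)
    }
    vols = {name: final * reaction_ul / stock
            for name, (final, stock) in by_concentration.items()}
    # Amount-based components: volume = amount / stock concentration
    vols["forward primer"] = 10.0 / 10.0    # 10 pmol from 10 uM (= 10 pmol/ul)
    vols["reverse primer"] = 10.0 / 10.0
    vols["Taq"] = 1.0 / 5.0                 # 1 U from 5 U/ul
    vols["template"] = template_ng / 200.0  # ng of DNA from a 200 ng/ul stock
    used = sum(vols.values())
    if used > reaction_ul:
        raise ValueError("components exceed the reaction volume")
    vols["water"] = reaction_ul - used      # make up to the final volume
    return vols


if __name__ == "__main__":
    for name, ul in pcr_volumes().items():
        print(f"{name:15s} {ul:6.2f} ul")
```

For several reactions it is usual to multiply these volumes into a master mix (with a small excess) and add template last.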
Elution, Purification and Sequencing: The amplicon was eluted and purified using a gel elution kit according to the manufacturer's instructions. The eluted product was sequenced in both directions, and the sequences were assembled using DNA baser software. Sequencing is carried out on each DNA strand with BigDye Terminator Ready Reaction Mix under the following conditions: an initial denaturation at 96 °C for 9 min, followed by 35 cycles of 95 °C for 30 s, annealing at the appropriate temperature for 1 min and extension at 72 °C for 2 min. Reactions were completed with a final extension step of 10 min at 72 °C. Unincorporated dye terminators were removed by precipitation with 95% alcohol.
Sequence analysis: Sequences are carefully analysed, and a sequence type is assigned to each strain by comparing the data sets with www.pamdb.org. The relationship of a strain to the existing collection of strains can be determined with the eBURST programme (http://eburst.mlst.net).
Handling sequence data: The sequencing machine produces a chromatogram indicating the quality of the sequence reads (Fig. 2). The reads are carefully inspected for base-calling errors using any chromatogram viewer (e.g. DNA baser, BioEdit, Chromas). The sequence thus obtained is called the raw sequence (Fig. 3). For each gene, two such sequences are obtained, known as the forward and reverse sequences respectively. The forward and reverse
sequences are assembled using any of the programmes available in the public domain (e.g. DNA baser). The assembled sequences are called contigs (large contiguous sequences). Such long sequences (contigs) are used to determine the allelic variations (Fig. 4).
Fig. 3. Fasta format of forward and reverse sequence reads obtained from the sequencing machine
Fig. 4. Assembled sequence (>FliC-CaRs-Mep (393)-Contigs) for an allele, fliC
The assembled sequences are compared with the database (www.pamdb.org) to determine the consensus sequence, in which the allelic differences can be resolved. The consensus sequence of allele 19 of Ralstonia solanacearum is furnished below for illustration (Fig. 5).
>CaRs-Mep (318)
GCCGACTCGTACCTGGGCCAGGTTGAAAACAACCTGCAACGTATGCGCCAACTGG
CTGTGGAATCCAACAACGGCGGTCTGTCGGCAGCCGACCAGACCAACCTGGAC
AAGGAATACCAACAGCTGGCAACGGCTAACAAGAACATCGAAACCAACGCCAAC
TACAACGGCAACAAGCTGTTCGACGGCTCGGTGGCTTCGACGACCTTCCAATATG
GCCAGAACGCAGCCACGGACGTGACCACGGTCACCAACGTCAACATGTCGACCT
TCGGCACGCTGACCGGCACGAGCGTGACCAGCGCTGCCAACGCGACC
Fig. 5. Allele 19 of the fliC gene of Ralstonia solanacearum infecting members of the Zingiberaceae
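Assembling a forward read with its reverse read amounts to reverse-complementing the reverse read and joining the two at their overlap. The sketch below is a deliberately simple stand-in for tools such as DNA baser: it assumes both reads are already quality-trimmed and overlap exactly (no mismatches, no quality scores), and `merge_reads` and `min_overlap` are illustrative names. The test sequence is the first line of the fliC allele shown above.

```python
"""Toy sketch: join a forward read and a reverse read into one contig.

Simplifying assumptions: reads are already quality-trimmed and overlap
exactly (no mismatches, no quality scores), unlike real assemblers.
"""

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the order."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_reads(forward: str, reverse: str, min_overlap: int = 10) -> str:
    """Join at the longest exact suffix(forward)/prefix(revcomp(reverse)) overlap."""
    rc = reverse_complement(reverse)
    for k in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-k:] == rc[:k]:
            return forward + rc[k:]
    raise ValueError(f"no exact overlap of at least {min_overlap} bases")


if __name__ == "__main__":
    allele = "GCCGACTCGTACCTGGGCCAGGTTGAAAACAACCTGCAACGTATGCGCCAACTGG"
    fwd = allele[:40]                         # forward read
    rev = reverse_complement(allele[20:])     # reverse read, as the sequencer reports it
    print(merge_reads(fwd, rev) == allele)
```

Real reads disagree at some positions near their ends, which is why assemblers align with mismatches and use the chromatogram quality values to decide each consensus base.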
The string of allele numbers (integers) obtained for the housekeeping and virulence genes of a strain is called its sequence type, which is specific to a strain of the bacterium. For example, the allele numbers obtained for a cardamom strain of Ralstonia solanacearum are ppsA-10, fliC-19, hrpB-27, gdhA-24, adk-1, gyrB-26 and egl-25. This combination of integers (10, 19, 27, 24, 1, 26 and 25) serves as the input data for establishing strain relationships with eBURST, a dedicated programme for the analysis of microbial MLST data based on the eBURST algorithm.
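eBURST groups sequence types that share alleles at all but one locus (single-locus variants, SLVs) and predicts the founder of each group. The sketch below covers only that core idea, linking STs through SLVs and guessing the founder as the ST with the most SLVs; the full eBURST method also uses double-locus variants, tie-breaking rules and bootstrap support. Apart from the cardamom profile quoted above, the profiles are hypothetical.

```python
"""Simplified sketch of the eBURST idea: link STs that are single-locus
variants (SLVs) and guess each group's founder as the ST with most SLVs.
Not the full eBURST algorithm (no DLV tie-breaks, no bootstrapping).
"""
from itertools import combinations


def loci_differing(p: tuple, q: tuple) -> int:
    """Number of loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(p, q))


def slv_groups(profiles: dict) -> list:
    """Single-linkage grouping of STs connected by SLV relationships."""
    parent = {st: st for st in profiles}

    def find(st):
        while parent[st] != st:
            parent[st] = parent[parent[st]]   # path halving
            st = parent[st]
        return st

    for a, b in combinations(profiles, 2):
        if loci_differing(profiles[a], profiles[b]) == 1:
            parent[find(a)] = find(b)
    groups: dict = {}
    for st in profiles:
        groups.setdefault(find(st), []).append(st)
    return sorted(sorted(g) for g in groups.values())


def predicted_founder(group: list, profiles: dict) -> str:
    def slv_count(st):
        return sum(loci_differing(profiles[st], profiles[o]) == 1
                   for o in group if o != st)
    return max(group, key=slv_count)


if __name__ == "__main__":
    # Locus order: ppsA, fliC, hrpB, gdhA, adk, gyrB, egl
    profiles = {
        "CaRs-Mep": (10, 19, 27, 24, 1, 26, 25),   # profile quoted in the text
        "ST-X": (10, 19, 27, 24, 1, 26, 3),        # hypothetical SLV (egl)
        "ST-Y": (10, 19, 27, 24, 2, 26, 25),       # hypothetical SLV (adk)
        "ST-Z": (4, 5, 6, 7, 8, 9, 11),            # hypothetical unrelated ST
    }
    for group in slv_groups(profiles):
        print(group, "founder:", predicted_founder(group, profiles))
```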
Phylogenetic analysis using MLST data: The allelic sequences thus obtained from the strains are joined to construct concatenated sequences, which serve as the input data for establishing phylogeny. A concatenated sequence is simply the sequences of all the loci joined in a fixed order (ppsA + fliC + hrpB + gdhA + adk + gyrB + egl) to obtain a longer sequence. An example of a concatenated sequence constructed for a strain of Ralstonia solanacearum obtained from cardamom is furnished below (Table 3, Fig. 6). This longer sequence is used in the phylogenetic analysis of the bacterium in question.
>R_solanacearum__CaRs-Mep_ (911) [ppsA + fliC + hrpB + gdhA + adk + gyrB + egl]
GACGAAGACGTGGTCGAGCTGGCCAAGTACGCCGTCATCATCGAGAAGCACTAC
GGTCGCCCGATGGACATCGAGTGGGGTAAGGACGGCAAGGACGGCAAGATCTAC
ATCCTGCAGGCCCGCCCCGAGACGGTGAAGAGCCAGTCGGTCGGCAAGGTCGAG
CAGCGCTTCCGCCTGAAGGGCTCGGCGCCGGTGCTGACCACCGGCCGCGCGATCG
GCCAGAAGATCGGTACGGGCCCCGTGCGCGTGATCAACGATCCGGCCGAAATGG
AGCGCGTGCAGCCGGGCGACGTGCTGGTCGCCGACATGACCGACCCGAACTGGG
AGCCGGTGATGAAGCGCGCCTCGGCCATCGTCACCAACCGTGGCGGCCGCACCT
GTCACGCCGCCATCATCGCGCGTGAGCTGGGCGTGCCGGCCGTGGTCGGCTGCGG
CGACGCCACCGACCTGCTGAAGGACGGCACGCTGGTCACCGTGTCCTGCGCCGA
GGGCGACGAAGGCAAGATCTACGACGGCCTGCTCGAGACGGAAATCACCGAAGTGCGC
CGCGGCGAGATGCCGCCGATCGACGTCAAGATCATGATGAACGTCGGCAA
CCCGCAGCTGGCCTTCGAGTTCGCGCAGATCCCGAACGGCGGCGTGGGCCTGGCC
CGCCTCGAGTTCATCATCAACAACAACATCGGCGTCCACCCGAAGGCGATCCTCG
ACTACCCGCAAGCCGACTCGTACCTGGGCCAGGTTGAAAACAACCTGCAACGTAT
GCGCCAACTGGCTGTGGAATCCAACAACGGCGGTCTGTCGGCAGCCGACCAGAC
CAACCTGGACAAGGAATACCAACAGCTGGCAACGGCTAACAAGAACATCGAAAC
CAACGCCAACTACAACGGCAACAAGCTGTTCGACGGCTCGGTGGCTTCGACGAC
CTTCCAATATGGCCAGAACGCAGCCACGGACGTGACCACGGTCACCAACGTCAA
CATGTCGACCTTCGGCACGCTGACCGGCACGAGCGTGACCAGCGCTGCCAACGC
GACCGTGCTGGCGATGGCCGATGCCTCGCTGCTGCTCGAGTGCGATGAAGAAGC
GGAAGAAGGCTTCCGCCTGGCGCAGCGCCTGATCCGCCATTCGGATGACCAGCTG
CGCGTGGTGTCGTGCCGCAATACCGGCTGGCAGGCACTGCTGCGCGATCGCTACG
CCGCGGCGGCGAGCTGCTTCTCGCGCATGGCCGAAGACGATGGCGCGACCTGGA
CCCAGCAGGTCGAGGGCCTGATCGGCCTGGCGCTGGTGCATCACCAGCTCGGCCA
GCAGGATGCCTCCGACGACGCGCTGCGGGCGGCGCGCGAGGCCGCAGACGGCCG
CAGCGATCGCGGCTGGCTGGCCACCATCGATCTGATCATCTACGAATTCGCCGTG
CAGGCCGGCATCCGCTGCTCCAACCGCCTGCTCGAGCATGCGTTCTGGCAATCGG
CCGAAATGGGCGCGACCCTGCTGGCCAACCACGGCGGCCGCAACGGTTGGACGC
CGACCGTATCGCAGGGCGTACCGATGCCGGCGCTGATCCAGCGCCGCGCCGAAT
ACCTCAGCCTGCTGCGCCGCATGGCCGACGGGGACCGCGCGGCAATCGACCCGC
TGATGGCGACCCTCAACCACTCGCGCAAGCTCGGCAGCCGCCTGCTGATGCAGAC
CAAGGTGGAAGTCGTGCTGGCCGCGCTGAGCGGCGAGCAGTACGACGTCGCCGG
CCGCGTCTTCGACCAGATCTGCAACCGCGAGACCACCTACCGCGCGCGCCGCTGG
AATTTCGACTTCCTCTACTGCCGCGCCAAGATGGCCGCCCAGCGCGGCGACTCGG
TCAAGAACGCGGCCGTCAACGTGCCGTACGGCGGCGCCAAGGGCGGCGTCCGCG
TCGATCCGCGCAAGCTGTCGTCGGGCGAACTCGAGCGCCTGACCCGCCGCTACAC
CAGCGAGATCGGCATCATCATCGGCCCGAACAAGGACATCCCGGCGCCGGACGT
GAACACCAACGCGCAGATCATGGCGTGGATGATGGACACGTACTCCATGAACGA
AGGCGCCACCGCCACCGGCGTGGTGACCGGCAAGCCGATCGCGCTGGGCGGCAG
CCTGGGCCGCCGCGAGGCGACCGGCCGCGGCGTGTTCGTGGTCGGCAGCGAGGC
TGCACGCAATCTGGGCATCGACGTCAAGGGTGCGCGCATCGTGGTGCAAGGCTTC
GGCAACGTCGGCAGCGTGGCCGCCAAGCTGTTCCAGGATGCCGGCGCCAAGGTG
ATCGCGGTGCAGGACCACAAGGGCATCGTGTTCAACGGCGCGGGCCTGGACGTC
GACGCGCTGATCCAGCACGTGGACCATAACGGCAGCGTCGACGGCTTCAAGGCC
GAGACCCTGTCGGCGGACGATTTCTGGGCGCTGGAATGCGAATTCCTGATCCCGG
CCGCGCTCGAAGGCCAGATCACCGGCAAGAACGCGCCCCAAATCAAGGCAAAAA
TTGTCGTTGAAGGTGCAAACGGCCCCACGACGCCCGAAGCGGACGACATCCTGC
GCGATCGCGGCATCCTGGTCTGCCCGGACGTGATCGCCAATGCCGGCGGCGTCAC
GGTGAGCTATTTCGGCATTCCGCAGATCTCCACCGGCGACATGCTGCGCGCCGCC
GTCAAGGCCGGCACCCCGCTGGGCATCGAAGCCAAGAAGGTGATGGACGCCGGC
GGCCTGGTGTCCGACGACATCATCATCGGCCTGGTGAAGGACCGCCTGCAGCAGT
CCGACTGCAAGAACGGCTACCTGTTCGACGGCTTCCCGCGCACCATCCCCCAGGC
CGAAGCCATGAAGGATGCCGGCGTGCCGATCGACTACGTGCTGGAAATCGACGT
GCCGTTCGACGCCATCATCGAGCGCATGAGCGGCCGCCGCGTGCACGTGGCCTCG
GGCCGGACCTATCACGTCAAGTACAACCCGCCCAAGAACGAGGGCCAGGACGAC
GAAACCGGCGATCCGCTGATCCAGCGCGACGACGACAAGGAAGAAACCCCTGAC
CGGCCTGCGCGCCGCGATGACGCGCGTCATCAACAAGTACATCGCCGACAACGA
GATCGCCAAGAAGGCCAAGGTCGAAACCTCCGGCGACGACATGCGCGAAGGCCT
GACCTGCGTGCTGTCGGTGAAGGTGCCCGAGCCCAAGTTCAGCTCGCAGACCAA
GGACAAGCTCGTTTCGTCCGAAGTGCGCCTGCCGGTGGAAGAAGTCGTGGCCAA
GGCGCTGACGGACTTCCTGCTGGAGACGCCCAACGACGCCAAGATCATCTGCGG
CAAGATCGTTGAAGCCGCGCGTGCCCGCGAAGCCGCCCGCAAGGCCCGCGAGAT
GACGCGCCGCAAGGGCGTGCTCGACGGCATGGGCCTGCCCGGCAAGCTGGCCGA
CTGCCAGGAGAAAGACCCGGCACTGTCCGAACTGTTCATCGTCGAGGGTGACTCCGCAG
GCGGCTCGGCCAAGCAGGGCCGCGACCGCAAGTTCCAGGCGATCCTGCCG
CTCAAGGGCAAGATCCTGAACGTGGAGCGCGCGCGCTTCGACAAGATGCTCTCC
AGCCAGGAAGTGCTCACGCTCATCACCGCCATGGGCACCGGCATCGGCAAGGAC
GACTACAACCTCGACAAGCTGCGCTATCACCGCATCATCATCATGACCGACGCGG
ACGTGGACGGCTCGCACATCCGCACGCTGCTGCTGACGTTCTTCTACCGCCAGAT
GCCCGAGATCATCGAGCGCGGCCACGTGTACATCGCCCAGCCGCCGCTGTACAA
GATCAAGCACGGCAAGGAAGAGCGCTACATCAAGGACGACAACGAGATGGCCG
CGTACCTGATGCGCCAGGCGCTCGACACCGCCATCCTGGTGCGCGCCGACGGCAC
CACCCTCAGTACGGCGGCCGCTACCGACACCACGACCCTGAAGACGGCCGCCAC
CACCTCGATCTCGCCGTTGTGGCTCACCATCGCCAAGGACAGCGCGGCGTTCACG
GTGAGCGGCACGCGCACGGTGCGCTATGGCGCCGGCAGCGCGTGGGTGGCGAAG
AGCATGTCCGGCACAGGCCAGTGCACCGCCGCCTTCTTTGGCAAGGATCCGGCGG
CCGGTGTCGCCAAGGTATGCCAGGTGGCGCAGGGCACGGGCACCCTGCTGTGGC
GCGGCGTCAGCCTGGCCGGCGCCGAGTTCGGGGAGGGCAGCCTGCCGGGCACCT
ACGGGAGCAACTACATCTATCCGTCCGCCGACAGCGCGACCTACTACAAGAACA
AGGGCATGAACCTCGTGCGCCTGCCGTTCCGCTGGGAGCGGCTGCAGCCCACGCT
CAACCAGGCGCTCGACGCGAACGAGCTGTCGCGCCTGACCGGGTTCGTCAACGC
CGTGACGGCGGCCGGCCAGACGGTGCTGCTCGATCCGCACAACTACGCGCGCTA
CTACGGCAACGTGATCGGCTCGAGCGCGGTGCCCAACAGCGCGTACGCCGATTTC
TGGCGGCGCGTGGCCACCCAGTTCAAGGGCAATGCCCGCGTCATCTTCGGGCTGA
TGAACGAGCCCAATTCGATGCCGACCGAGCAGTGG
Fig. 6. Concatenated sequence obtained for a strain of Ralstonia solanacearum
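Building the concatenated sequence is a matter of joining the allele sequences in the same locus order for every strain, so that positions line up for the phylogenetic analysis. The sketch below follows the order given in the text; the sequences in the test are short hypothetical fragments, not real alleles, and the function names are illustrative.

```python
"""Sketch: build a concatenated MLST sequence in a fixed locus order and
write it as FASTA. Locus order follows the text; example data are hypothetical.
"""

LOCUS_ORDER = ("ppsA", "fliC", "hrpB", "gdhA", "adk", "gyrB", "egl")


def concatenate(alleles: dict, order: tuple = LOCUS_ORDER) -> str:
    """Join allele sequences in a fixed order; refuse incomplete profiles."""
    missing = [locus for locus in order if locus not in alleles]
    if missing:
        raise KeyError(f"missing loci: {missing}")
    return "".join(alleles[locus].upper() for locus in order)


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """FASTA record with the sequence wrapped at `width` characters."""
    body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([f">{name}"] + body)


if __name__ == "__main__":
    example = {"ppsA": "gacg", "fliC": "gccg", "hrpB": "gtgc", "gdhA": "cgcc",
               "adk": "ctgc", "gyrB": "gtca", "egl": "aggg"}   # hypothetical fragments
    print(to_fasta("hypothetical_strain", concatenate(example), width=12))
```

Refusing profiles with a missing locus keeps every concatenated sequence the same length and order, which the alignment used for phylogeny relies on.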
Selected reading
Almeida NF, Yan S, Cai R, Clarke CR, Morris CE, Schaad NW, Schuenzel EL, Lacy GH, Sun X, Jones JB, Castillo JA, Bull CT, Leman S, Guttman DS, Setubal JC, Vinatzer BA (2010) PAMDB, a multilocus sequence typing and analysis database and website for plant-associated microbes. Phytopathology 100(3):208-215.
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389-3402.
Buddenhagen I, Sequeira L, Kelman A (1962) Designation of races of Pseudomonas solanacearum. Phytopathology 52:726 (Abstract).
Castillo JA, Greenberg JT (2007) Evolutionary dynamics of Ralstonia solanacearum. Appl Environ Microbiol 73:1225-1238.
Fegan M, Prior P (2005) How complex is the Ralstonia solanacearum species complex? In: Allen C, Prior P, Hayward AC (eds) Bacterial wilt disease and the Ralstonia solanacearum species complex. APS Press, St. Paul, pp 449-461.
Feil EJ, Li BC, Aanensen DM, Hanage WP, Spratt BG (2004) eBURST: inferring patterns of evolutionary descent among clusters of related bacterial genotypes from multilocus sequence typing data. J Bacteriol 186(5):1518-1530.
Hayward AC (1964) Characteristics of Pseudomonas solanacearum. J Appl Bacteriol 27:265-277.
Hayward AC (1991) Biology and epidemiology of bacterial wilt caused by Pseudomonas solanacearum. Annu Rev Phytopathol 29:65-87.
Kumar A, Sarma YR, Anandaraj M (2004) Evaluation of genetic diversity of Ralstonia solanacearum causing bacterial wilt of ginger using REP-PCR and PCR-RFLP. Curr Sci 87:1555-1561.
Prior P, Fegan M (2005) Recent developments in the phylogeny and classification of Ralstonia solanacearum. Acta Hortic 695:127-136.
Spratt BG, Hanage WP, Li B, Aanensen DM, Feil EJ (2004) Displaying the relatedness among isolates of bacterial species: the eBURST approach. FEMS Microbiol Lett 241(2):129-134.
Wicker E, Grassart L, Coranson-Beaudu R, Mian D, Guilbaud C, Fegan M, Prior P (2007) Ralstonia solanacearum strains from Martinique (French West Indies) exhibiting a new pathogenic potential. Appl Environ Microbiol 73:6790-6801.
Yabuuchi E, Kosako Y, Yano I, Hotta H, Nishiuchi Y (1995) Transfer of two Burkholderia and an Alcaligenes species to Ralstonia gen. nov.: proposal of Ralstonia pickettii (Ralston, Palleroni and Doudoroff 1973) comb. nov., Ralstonia solanacearum (Smith 1896) comb. nov. and Ralstonia eutropha (Davis 1969) comb. nov. Microbiol Immunol 39:897-904.
Rolling circle amplification-RACE (RCA-RACE)
K. Johnson George and I. P. Vijesh Kumar
Indian Institute of Spices Research, Marikunnu P.O., Calicut- 673012.
Isolation of full-length gene transcripts is important to determine the protein-coding region and to study gene structure. However, isolation of novel gene sequences is often limited to expressed sequence tags (ESTs), i.e. short cDNA fragments that predominantly represent the 3′ end of the transcript. Rapid amplification of cDNA ends (RACE) is today by far the most popular approach for obtaining full-length cDNA when only part of the transcript's sequence is known. Since its original description in 1988 by Frohman et al., numerous modifications and improvements of the method have been developed; together they form a collection of PCR-based cloning procedures that extend a known cDNA fragment toward the 3′ end (3′ RACE) or the 5′ end (5′ RACE). The original method is based on attaching an anchor sequence to one end of the cDNA, which then serves as a primer-binding site in PCR together with a second, gene-specific primer from the known part of the gene.
Polidoros et al. (2006) developed an improved inverse-RACE method that uses CircLigase (Epicentre Biotechnologies, Madison, WI, USA) for cDNA circularization, followed by rolling circle amplification (RCA) of the circular cDNA with φ29 DNA polymerase (New England Biolabs, Ipswich, MA, USA). In this way, a large amount of PCR template is produced, allowing the simultaneous isolation of the unknown 3′ and 5′ ends of a virtually unlimited number of transcripts after a single reverse transcription reaction. Figure 1 illustrates this method, named RCA-RACE. The process takes advantage of the ability of CircLigase to circularize single-stranded cDNA molecules via an intramolecular link. This ATP-dependent ligase can circularize single-stranded DNA (ssDNA) templates that have a 5′-phosphate and a 3′-hydroxyl group and are longer than 30 nucleotides. According to the manufacturer, under standard reaction conditions the enzyme makes essentially no linear or circular concatemers, since it catalyzes only intramolecular ligations. In addition, although CircLigase is influenced by the ssDNA sequence, high concentrations of the enzyme can effectively circularize difficult templates (www.epibio.com/pdftechlit/222pl085.pdf).
The circularized cDNA is then amplified in an RCA reaction using φ29 DNA polymerase and random primers. This generates enough template for cloning rare transcripts, as well as for high-throughput cloning of cDNA ends for large numbers of genes from scarce tissue, which cannot be performed effectively with standard RACE methodologies. A variation of the technique is famRCA-RACE (Kalivas et al., 2010), for isolating a family of homologous cDNAs (Fig. 1).
FIGURE 1 The family rolling circle amplification rapid amplification of cDNA ends (famRCA-RACE) method (degenerate primers for isolation of members of a family of homologous genes present in the mRNA preparation). In step 1, messenger RNA is reverse transcribed into cDNA using an oligo(dT) primer harboring a 5′-phosphorylated adaptor (circle). After RNase H treatment, the resulting cDNA is circularized in step 2 using CircLigase. The circular cDNA is then amplified to multicopy concatemers by RCA using φ29 DNA polymerase (gray oval) and random hexamer primers (small squares attached to gray ovals). For each transcript family of interest, an aliquot of the RCA reaction serves as a template in an inverse PCR using degenerate primers to obtain the transcripts' 5′ and 3′ ends simultaneously (step 3). Degenerate primers are designed facing outward (arrows) on conserved regions (thicker regions on concatemers). An agarose gel with the range of cloned PCR products used to isolate the genes is presented in step 3.
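The key point of the inverse PCR in step 3 is that outward-facing primers on a circular template amplify across the ligation junction, so a single product carries the unknown 3′ end, the adaptor and the unknown 5′ end. The toy sketch below models only that geometry, assuming exact primer matches and a circle written as one sense-strand string whose two ends were joined by CircLigase; all sequences and the function name are hypothetical.

```python
"""Toy sketch: predict the inverse-PCR product from a circularized cDNA.

Assumptions: the circle is written as one sense-strand string (the junction
made by CircLigase lies between its last and first characters) and both
primers match exactly. Sequences in the example are hypothetical.
"""

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def inverse_pcr_product(circle: str, fwd: str, rev: str) -> str:
    """fwd has the sense-strand sequence and extends toward the 3' end;
    rev is written 5'->3' and binds where reverse_complement(rev) occurs."""
    start = circle.find(fwd)
    if start < 0:
        raise ValueError("forward primer site not found")
    site = reverse_complement(rev)
    doubled = circle + circle               # lets the product run across the junction
    end_site = doubled.find(site, start + len(fwd))
    if end_site < 0 or end_site + len(site) > start + len(circle):
        raise ValueError("reverse primer site not found within one circle")
    return doubled[start:end_site + len(site)]


if __name__ == "__main__":
    five_unknown, rev_site, middle = "TTGACCATGA", "GCGTACGGTA", "CCCCAAAATTTT"
    fwd_site, three_unknown, adaptor = "AGGCTTCAGC", "GATTACAGGT", "CCCGGGAAA"
    circle = five_unknown + rev_site + middle + fwd_site + three_unknown + adaptor
    product = inverse_pcr_product(circle, fwd_site, reverse_complement(rev_site))
    print(product)   # fwd site + 3' end + adaptor + 5' end + rev site
```

Because the primers face away from each other on the known region, no product forms from a linear cDNA; amplification depends on circularization, which is what makes both unknown ends appear in one fragment.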
Protocol:
Plant material: Piper colubrinum challenge-inoculated with Phytophthora capsici, for isolating a full-length gene, viz. WRKY, expected to be upregulated during the interaction. The following steps may be followed.
1. Extract total RNA using the Spectrum Plant Total RNA kit (Sigma-Aldrich).
2. Synthesize first-strand cDNA using 3 µg RNA in a reaction containing 0.5 µg oligo(dT)-adaptor primer [5′-GGCCACGCGTCGACTAGTAC(T)18-3′] phosphorylated at the 5′ end. Add 0.5 mM dNTPs, 10 mM dithiothreitol (DTT), 1× first-strand buffer and 200 U Moloney murine leukemia virus (MMLV) reverse transcriptase. The reaction (total volume 50 µl) is to be incubated at 37 °C for 1 h, followed by heat inactivation of MMLV reverse transcriptase at 70 °C for 15 min.
3. Incubate the reaction at 37 °C for 20 min after the addition of 2 U RNase H. Purify the reaction using the QIAquick PCR purification kit.
4. Circularize half of the purified cDNA (15 µl) using CircLigase (final conc. 5 U/µl; Epicentre), 1× reaction buffer (Epicentre), 50 µM ATP and 1 µl of 50 mM MnCl2 (final conc. 2.5 mM) in a total reaction volume of 50 µl, and incubate at 60 °C for 1 hour. Inactivate the enzyme by heating at 80 °C for 10 minutes. Purify the mixture using the QIAquick PCR purification kit (Qiagen).
5. Rolling circle amplification (RCA): RCA reactions may be performed in a 50 µl volume containing a 15 µl aliquot of the circularized cDNA, 1 mM dNTPs, 200 µg/ml BSA (New England Biolabs), 1× φ29 DNA polymerase reaction buffer (NEB), 10 U φ29 DNA polymerase and either 10 µM random hexamers or 1 µM InVUP
primer (5′-GTACTAGTCGACGCGTGGCC-3′), both modified by the addition of two phosphorothioate linkages at the 3′ end. Incubate at 30 °C for 21 h and heat-inactivate the enzyme at 60 °C for 10 minutes. Verify the RCA products by electrophoresis.
6. Inverse PCR: Perform the inverse PCR using 0.5 µl of neat or serially diluted (10⁻², 10⁻⁴ and 10⁻⁶) RCA reaction as template, along with 0.4 µM each of gene-specific forward and reverse primers (to be designed based on the gene of interest) and DyNAzyme II DNA polymerase (Finnzymes, Espoo, Finland). Use the following cycling parameters: 94 °C for 3 min, followed by 35 cycles of 94 °C for 30 s, 56 °C for 45 s and 72 °C for 1.5 min (depending on the expected size of the product), and a final extension step of 72 °C for 10 min.
7. Run the PCR products on an agarose gel stained with ethidium bromide.
8. Clone the large fragment obtained using the TOPO TA cloning kit (Invitrogen).
9. Screen the clones for the insert and sequence them.
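The primer sequences given in steps 2 and 5 can be cross-checked in a few lines. In this illustrative check (not part of the published protocol), the InVUP primer is compared with the reverse complement of the adaptor portion of the oligo(dT)-adaptor primer; the two match, which suggests InVUP is designed to prime on the adaptor sequence carried by every circularized cDNA.

```python
"""Sketch: check that the InVUP primer (step 5) is the reverse complement of
the adaptor part of the oligo(dT)-adaptor primer (step 2). The sequences are
copied from the protocol; the check itself is only an illustration.
"""

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


ADAPTOR = "GGCCACGCGTCGACTAGTAC"        # 5' adaptor part of the step-2 primer
OLIGO_DT_ADAPTOR = ADAPTOR + "T" * 18   # 5'-adaptor-(T)18-3'
INVUP = "GTACTAGTCGACGCGTGGCC"          # step-5 RCA primer, written 5'->3'


if __name__ == "__main__":
    print("InVUP is the reverse complement of the adaptor:",
          INVUP == reverse_complement(ADAPTOR))
```

The same kind of check is worth running on the gene-specific inverse-PCR primers of step 6, to confirm that they face outward on the known sequence.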
References:
Polidoros AN, Pasentsis K, Tsaftaris AS (2006) Rolling circle amplification-RACE: a method for simultaneous isolation of 5′ and 3′ cDNA ends from amplified cDNA templates. BioTechniques 41:35-42.
Kalivas A, Pasentsis K, Argiriou A, Darzentas N, Tsaftaris AS (2010) famRCA-RACE: a rolling circle amplification RACE for isolating a family of homologous cDNAs in one reaction and its application to obtain NAC genes transcription factors from crocus (Crocus sativus) flower. Prep Biochem Biotechnol 40:177-187.
Frohman MA, Dush MK, Martin GR (1988) Rapid production of full-length cDNAs from rare transcripts: amplification using a single gene-specific oligonucleotide primer. Proc Natl Acad Sci USA 85:8998-9002.
Protocols in development and analysis of mutants for functional genomics
Ramesh S. Bhat
Email: bhatramesh12@gmail.com
Department of Biotechnology
University of Agricultural Sciences, Dharwad 580 005
Conventional mutagenesis
Commonly used chemical mutagens are ethyl methanesulfonate, diepoxybutane, N-methyl-N-nitrosourea and sodium azide. Irradiation mutagens include fast neutrons, gamma rays, X-rays and accelerated ions (Bhat et al., 2007).
Raising mutant populations
1. After treatment with an appropriate mutagen, the M1 generation is grown. Progeny of selfed M1 plants are used to grow the M2 generation.
2. M2 plants are used to prepare pooled DNA samples for reverse genetics screening, while their seeds are inventoried.
3. Forward genetics screening (phenotypic analysis) is normally performed on M3 plants.
4. For assaying quantitative traits, it is particularly important to advance the lines to M4 or beyond, because phenotypes need to be evaluated in replicated trials.
5. For the purpose of identifying mutated genes, it is better to aim for a moderate to high mutation density in the genome, so that fewer mutants are needed to achieve genome coverage.
6. However, too high a dose presents practical problems. At high doses, lethality and sterility of M1 plants make it difficult to produce an appropriately large population in a single attempt.
7. Producing a useful mutant population is therefore often a trade-off between the need for a high mutation density and the practicality of keeping a vigorous population without too many deleterious effects and background mutations.
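The density-versus-population trade-off in points 5 to 7 can be made concrete with a simple model. The sketch assumes that induced mutations fall independently and uniformly (a Poisson model), expressed as one mutation per `kb_per_mutation` kilobases per plant; the densities and the 1 kb target region are illustrative assumptions, not values from the text.

```python
"""Sketch: how many M2 plants must be screened to find at least one mutation
in a target region, assuming independent, uniformly distributed mutations
(Poisson model). Densities and region size are illustrative assumptions.
"""
import math


def expected_mutations(plants: int, region_kb: float, kb_per_mutation: float) -> float:
    """Mean number of mutations expected in the region across all plants."""
    return plants * region_kb / kb_per_mutation


def prob_at_least_one(plants: int, region_kb: float, kb_per_mutation: float) -> float:
    """P(>= 1 mutation) = 1 - exp(-mean) under the Poisson model."""
    return 1.0 - math.exp(-expected_mutations(plants, region_kb, kb_per_mutation))


def plants_needed(region_kb: float, kb_per_mutation: float, probability: float = 0.95) -> int:
    """Smallest population giving at least `probability` of one hit."""
    return math.ceil(-math.log(1.0 - probability) * kb_per_mutation / region_kb)


if __name__ == "__main__":
    for density in (100.0, 500.0):   # one mutation per 100 kb vs per 500 kb (assumed)
        print(f"1/{density:.0f} kb: {plants_needed(1.0, density)} plants for a 1 kb target")
```

A five-fold lower mutation density needs roughly a five-fold larger population for the same chance of a hit, which is why moderate-to-high densities are preferred as long as plant vigour and fertility can be maintained.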
Forward genetics
Phenotyping
1. Mutant populations harbor a large amount of genetic variability that can be revealed when the
mutants are subjected to appropriate phenotypic screening.
2. Morphological mutants can be identified based on phenotypic categories.
3. Conditional mutants are studied with appropriate experimental conditions.
Map-based cloning
DNA sequence responsible for the trait is identified by walking down the chromosome
using genetic markers. Availability of the genome sequence will hasten gene identification
considerably. It can be used to identify any gene, given an adequate map of the region of the
chromosome in which it is located. The gene is first mapped to a specific region of a given
chromosome by genetic crosses. The gene is next localized on the physical map (in a library) of this
region of the chromosome. Candidate genes in the segment of the chromosome identified by physical
mapping are then isolated from mutant and wild type individuals and sequenced to identify mutations
that would result in a loss of gene function. An example of map-based cloning is the cloning of a
fertility restorer gene, Rf-1, in rice (Komori et al., 2004).
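In the first step of map-based cloning, the gene is positioned relative to flanking markers by counting recombinants in the mapping population. The worked sketch below assumes recombinants are scored per meiosis (gamete) and converts the recombination fraction r to centiMorgans with the standard Haldane and Kosambi map functions; the counts in the example are hypothetical.

```python
"""Sketch: genetic distance between a marker and a target gene from
recombinant counts. Haldane: d = -50 ln(1 - 2r) cM; Kosambi:
d = 25 ln((1 + 2r) / (1 - 2r)) cM. Example counts are hypothetical.
"""
import math


def recombination_fraction(recombinants: int, total: int) -> float:
    """r = recombinant gametes / total gametes scored."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    return recombinants / total


def haldane_cm(r: float) -> float:
    """Map distance assuming no crossover interference."""
    if not 0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Map distance allowing for crossover interference."""
    if not 0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


if __name__ == "__main__":
    r = recombination_fraction(12, 400)   # e.g. 12 recombinant gametes out of 400 scored
    print(f"r = {r:.3f}, Haldane {haldane_cm(r):.2f} cM, Kosambi {kosambi_cm(r):.2f} cM")
```

Fine mapping repeats this with markers progressively closer to the gene until the interval is small enough to be covered by clones on the physical map.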
Detecting genomic changes using genome-wide chips
Single-feature polymorphisms (SFPs) are detected using oligonucleotide (oligo) chips containing 24-mer oligos representing genes, in order to detect deletions (Borevitz et al., 2003; Chang et al., 2003; Wang et al., 2004). Genes/probes that generate hybridization signals significantly below those of the wild-type cultivar (based on a t-test) are considered candidate genes. Genome coverage can be increased with newer oligoarrays, such as the 44K Agilent oligoarray or the 51K Affymetrix GeneChip. Complications from large deletions or multiple mutations across the genome can be overcome by pooling DNA from segregants with a common phenotype, which also masks irrelevant mutations.
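The t-test step above can be sketched as follows: for each probe, compare replicate signals from the mutant with those from the wild type and flag probes whose signal is significantly lower. This toy version uses Welch's t statistic with a fixed threshold in place of a p-value; the threshold and data are hypothetical, and real SFP calling works on normalized log intensities with multiple-testing control.

```python
"""Toy sketch: flag probes whose mutant signal is significantly below the
wild type, using Welch's t statistic. Threshold and data are hypothetical.
"""
import math
from statistics import mean, variance


def welch_t(mutant: list, wild_type: list) -> float:
    """(mean_mut - mean_wt) / sqrt(var_mut/n_mut + var_wt/n_wt)."""
    diff = mean(mutant) - mean(wild_type)
    se = math.sqrt(variance(mutant) / len(mutant) + variance(wild_type) / len(wild_type))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def candidate_deleted_probes(signals: dict, t_threshold: float = -4.0) -> list:
    """signals: probe id -> (mutant replicates, wild-type replicates)."""
    return sorted(probe for probe, (mut, wt) in signals.items()
                  if welch_t(mut, wt) <= t_threshold)


if __name__ == "__main__":
    signals = {
        "probe1": ([50, 55, 48], [400, 420, 390]),     # strong loss in mutant
        "probe2": ([300, 310, 290], [305, 295, 300]),  # unchanged
    }
    print(candidate_deleted_probes(signals))
```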
Differential cDNA screening
This approach is useful for identifying differentially expressed cDNAs. A recent resurgence in the popularity of differential screening has come about through the development of DNA microarrays (Meldrum, 2000). Alternatively, a subtracted cDNA library can be generated by enriching for differentially expressed clones and removing sequences that are common to the two sources.
Arbitrarily primed PCR uses pairs of short arbitrary primers to amplify pools of partial cDNA sequences. If the same primer combinations are used to amplify cDNAs from two different sources, the products can be fractionated side by side on a sequencing gel, and differences in the band patterns reveal differentially expressed genes. In the differential display PCR technique (Liang and Pardee, 1992), the antisense primer is an oligo-dT primer with a specific two-base extension, which therefore binds at the 3′ end of the mRNA. In contrast, in arbitrarily primed PCR (Welsh et al., 1992), the antisense primer is arbitrary and can in principle anneal anywhere in the mRNA. In both methods, an arbitrary sense primer is used, allowing the amplification of partial cDNAs from pools of several hundred mRNA molecules. Following electrophoresis, differentially expressed cDNAs can be excised from the gel and characterized further.
In the PCR subtraction method (Lisitsyn and Wigler, 1993), sequences common to the two sources are eliminated prior to amplification. cDNAs from the two sources are prepared, digested with a restriction enzyme and amplified. The amplified products from one source (tester) are then ligated to specific linkers that provide annealing sites for a unique pair of primers. These linkers are not added to the driver cDNA. A large excess of driver cDNA is then added to the tester cDNA, and the populations are mixed and hybridized. Driver/driver fragments possess no linkers and cannot be amplified, while driver/tester fragments possess only one primer-annealing site and are amplified only linearly. In contrast, cDNAs that are present only in the tester possess linkers on both strands and are amplified exponentially, so they can be isolated and cloned.
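The enrichment in PCR subtraction comes entirely from the difference between exponential, linear and zero amplification. The toy model below assumes ideal (100%) efficiency and hypothetical starting amounts, and shows how a rare tester-only fragment overtakes a hundred-fold excess of tester/driver hybrids within 20 cycles.

```python
"""Toy model of PCR-subtraction kinetics: fragments with primer sites on both
strands grow exponentially, those with one site grow linearly, and those with
none are not amplified. Ideal efficiency and starting amounts are assumed.
"""


def copies_after(cycles: int, linker_sites: int, start: float = 1.0) -> float:
    if linker_sites == 2:
        return start * 2 ** cycles      # tester/tester: doubles every cycle
    if linker_sites == 1:
        return start * (1 + cycles)     # tester/driver: one new strand per cycle
    if linker_sites == 0:
        return start                    # driver/driver: no primer sites
    raise ValueError("linker_sites must be 0, 1 or 2")


if __name__ == "__main__":
    tester_only = copies_after(20, 2, start=1.0)   # rare, tester-specific cDNA
    hybrids = copies_after(20, 1, start=100.0)     # abundant common sequences
    print(f"tester-only: {tester_only:.0f}, hybrids: {hybrids:.0f}, "
          f"ratio {tester_only / hybrids:.0f}")
```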
Global gene expression profiling can be made with large scale sequencing of random clones
from cDNA libraries. Further improvement in expression profiling has been made with serial analysis
of gene expression (SAGE) (Velculescu et al., 1995) and massively parallel signature sequencing
(MPSS) (Brenner et al., 2000).
Reverse genetics
PCR screening
Small to medium-sized deletions in genomes are detected through PCR analysis (Jansen et
al., 1997). This method identifies smaller than expected amplicons due to the presence of a deletion
(Li et al., 2001; Li and Zhang, 2002). Primers flanking a genomic region containing a target gene are
designed so that the product from the wild-type allele is difficult to amplify because of its large size. When a deletion shortens the region flanked by the primers, the fragment carrying the deletion can often be amplified with higher efficiency. As a result, the smaller product can be detected even when DNA from the individual carrying the deletion is mixed with DNA from many wild-type individuals.
TILLING
Targeting Induced Local Lesions in Genomes (TILLING) is a high-throughput reverse-genetic technique for gene identification (Bhat et al., 2007). It is used to discover point mutations in mutant populations created by traditional chemical mutagenesis. The TILLING approach makes use of the DNA mismatches formed when mutant and wild-type DNA strands anneal. DNA from individual M2 plants is isolated, pooled and arrayed in 96-well plates. Primers are designed to bracket a region of about 1 kb that is most likely to carry a deleterious mutation in the target gene. The primers are used to amplify the gene of interest, and the products are denatured and reannealed to allow homo- and heteroduplexes to form in the DNA pool.
Originally, denaturing high-performance liquid chromatography (dHPLC) was used to detect DNA mismatches, but they are now detected by enzymatic cleavage of the PCR-amplified heteroduplexed DNA (Xin et al., 2008) followed by band visualization using fluorescent end-labeling and denaturing polyacrylamide gel electrophoresis. A modified TILLING procedure, called EcoTILLING, has been applied to identify natural allelic variants (Comai et al., 2004).
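When a mismatch-specific endonuclease cuts a heteroduplex, each end-labelled strand yields a fragment whose length depends on where the mutation lies. The sketch below assumes the two primers carry different fluorescent labels and that cleavage occurs exactly at the mismatch; real enzymes may cut a base or so away, so a small tolerance is allowed. Function names and the 1-based coordinate convention are hypothetical.

```python
from __future__ import annotations


def cleavage_fragments(amplicon_len: int, mismatch_pos: int) -> tuple[int, int]:
    """Expected sizes (forward-labelled, reverse-labelled) of the cleaved fragments.

    mismatch_pos is the 1-based position of the mismatch counted from the
    5' end of the forward primer; cleavage is idealised as occurring just
    3' of that base on the top strand.
    """
    if not 1 <= mismatch_pos < amplicon_len:
        raise ValueError("mismatch must lie inside the amplicon")
    return mismatch_pos, amplicon_len - mismatch_pos


def bands_support_mutation(amplicon_len: int, fwd_band: int, rev_band: int,
                           tolerance: int = 5) -> bool:
    """A true cleavage gives one band per label, and the two sizes sum to the amplicon."""
    return abs(fwd_band + rev_band - amplicon_len) <= tolerance


if __name__ == "__main__":
    print(cleavage_fragments(1000, 350))
    print(bands_support_mutation(1000, 352, 650))
```

For a 1,000 bp amplicon with a mutation at position 350, bands of about 350 bp and 650 bp are expected in the two label channels; a band in only one channel, or a pair that does not add up to the amplicon length, is more likely to be background.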
Gene tagging and trapping
Gene tagging is the most popular mutagenesis strategy in functional genomics: a piece of known DNA (the tag) is randomly inserted into the genome, causing loss of function or gain of function of the tagged gene (Guiderdoni et al., 2007; Johnson et al., 2007; Zhu et al., 2007; Upadhyaya et al., 2010). Tags can be converted into traps by incorporating reporter genes, which provide additional information on the expression pattern of the tagged gene.
Insertional inactivation tagging with T-DNA
1. The construct may have a gene trap feature (unidirectional or bidirectional), selection markers/reporters and a plasmid rescue cassette (Guiderdoni et al., 2007; Upadhyaya et al., 2010).
2. Carry out high-throughput transformation, then select and confirm the transgenics.
3. Determine the T-DNA copy number and select single-copy lines.
4. Screen for gene trap events based on reporter gene expression.
5. Screen for novel/mutant phenotypes.
6. Isolate the flanking sequence tag using plasmid rescue or TAIL-PCR (Liu et al., 1995).
7. Identify the tagged gene.
8. Study co-segregation of the tagged gene with the mutant phenotype.
9. Validate gene function by complementation, RNAi, etc.
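Step 3 does not say how the copy number is checked. One common first pass, assumed here and usually confirmed by Southern blotting, is to test whether the selectable marker segregates 3:1 in the T1 progeny of a primary transformant, as expected for a single T-DNA locus. A minimal chi-square sketch, using 3.841 as the 5% critical value for 1 degree of freedom:

```python
def chi_square_3_to_1(resistant: int, sensitive: int) -> float:
    """Chi-square statistic for observed resistant:sensitive counts against 3:1."""
    total = resistant + sensitive
    expected = (0.75 * total, 0.25 * total)
    observed = (resistant, sensitive)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_1DF_5PC = 3.841  # chi-square critical value, 1 df, P = 0.05


def fits_single_locus(resistant: int, sensitive: int) -> bool:
    """True if the 3:1 hypothesis (one segregating locus) is not rejected at P = 0.05."""
    return chi_square_3_to_1(resistant, sensitive) < CRITICAL_1DF_5PC


if __name__ == "__main__":
    print(fits_single_locus(78, 22))  # chi-square = 0.48, consistent with 3:1
    print(fits_single_locus(60, 40))  # chi-square = 12.0, 3:1 rejected
```

A 3:1 fit only shows a single segregating locus; tandem copies inserted at one site segregate together, which is why a molecular copy-number check is still needed.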
Activation tagging with T-DNA
1. The construct should have an enhancer or strong promoter at one or both ends of the T-DNA, along with selection markers/reporters and a plasmid rescue cassette (Johnson et al., 2007).
2. Carry out high-throughput transformation, then select and confirm the transgenics.
3. Determine the T-DNA copy number and select single-copy lines.
4. Screen for novel/mutant phenotypes.
5. Isolate the flanking sequence tag using plasmid rescue or TAIL-PCR and identify the tagged gene.
6. Study co-segregation of the tagged gene with the mutant phenotype.
7. Validate gene function by over-expression.
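Co-segregation (step 6 above, and step 8 of the previous protocol) asks whether the phenotype always travels with the insertion in a segregating population. A minimal sketch, assuming each plant has been genotyped by PCR for the insertion and scored for the phenotype; the genotype labels and the dominant/recessive modes are illustrative assumptions.

```python
from __future__ import annotations

from typing import Iterable

# Insertion genotypes: "-/-" no insertion, "T/-" hemizygous, "T/T" homozygous.
VALID = {"-/-", "T/-", "T/T"}


def cosegregates(plants: Iterable[tuple[str, bool]], mode: str) -> bool:
    """Check that every plant's phenotype matches what its genotype predicts.

    plants: (insertion genotype, shows the mutant phenotype?) pairs.
    mode "dominant": mutant iff at least one insertion allele (typical of
    activation tags); mode "recessive": mutant iff homozygous (typical of
    knock-out insertions).
    """
    for genotype, is_mutant in plants:
        if genotype not in VALID:
            raise ValueError(f"unknown genotype {genotype!r}")
        if mode == "dominant":
            expected = genotype != "-/-"
        elif mode == "recessive":
            expected = genotype == "T/T"
        else:
            raise ValueError("mode must be 'dominant' or 'recessive'")
        if expected != is_mutant:
            return False
    return True


if __name__ == "__main__":
    t2 = [("T/T", True), ("T/-", False), ("-/-", False)]
    print(cosegregates(t2, "recessive"), cosegregates(t2, "dominant"))
```

A single plant that breaks the expected pattern suggests that the phenotype may be caused by something other than the tag (for example, a second, untagged lesion); in practice a reasonably large population is needed before the linkage is convincing.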
Insertional inactivation tagging with transposable element
1. The Ds construct may have a gene trap feature (unidirectional or bidirectional), T-DNA selection markers/reporters, a Ds tracer, a Ds excision marker, a plasmid rescue cassette, a T-DNA gene trap counter-selector and a T-DNA repeat counter-selector (Upadhyaya et al., 2006; Zhu et al., 2007).
2. The Ac construct can have T-DNA selection markers/reporters, the transposase coding region and an Ac reporter.
3. Develop independent T-DNA/Ds lines and Ac lines by high-throughput transformation. Select Ds lines that carry a single-copy T-DNA/Ds and show neither T-DNA gene trap expression nor T-DNA repeats.
4. Alternatively, develop double transformants by co-transformation (with the Ds and Ac constructs) or super-transformation, and select lines with a single-copy Ds.
5. Ds-tagged mutants can be developed by one of the following routes:
a. Independent Ds and Ac lines are crossed. The F1 plants are mutagenic (they contain both Ds and Ac). Stable Ds-tagged mutants are identified by screening F2 plants with the Ds excision marker and the Ds tracer; the Ac reporter is used to make sure that the selected plants are free from Ac.
b. Double transformants (T1) are mutagenic. Mutants are identified in the T2 and T3 generations.
c. Transient expression of transposase (TET) can be achieved by co-cultivating calli derived from plants carrying a single-copy T-DNA/Ds with the Ac construct. Mutants are identified in the T1 and T2 generations.
6. In the mutants, screen for gene trap events based on reporter gene expression.
7. Screen for novel/mutant phenotypes.
8. Isolate the flanking sequence tag using plasmid rescue or TAIL-PCR and identify the tagged gene.
9. Study co-segregation of the tagged gene with the mutant phenotype.
10. Validate gene function by complementation, RNAi, Ds reversion, etc.
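The F2 screen in route (a) combines three markers. The sketch below encodes one plausible reading of that screen, which is an assumption: a plant is kept as a stable transposant when the Ds excision marker shows that Ds has left its original T-DNA position, the Ds tracer shows that Ds is still present in the genome, and the Ac reporter is absent. The class and field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class F2Plant:
    name: str
    excision_marker: bool  # Ds has excised from the donor T-DNA site
    ds_tracer: bool        # Ds is still present (re-inserted) in the genome
    ac_reporter: bool      # Ac (transposase source) is present


def stable_ds_candidates(plants: list[F2Plant]) -> list[str]:
    """Names of plants carrying a transposed Ds and no Ac, so the insertion is stable."""
    return [p.name for p in plants
            if p.excision_marker and p.ds_tracer and not p.ac_reporter]


if __name__ == "__main__":
    f2 = [
        F2Plant("P1", True, True, False),   # keep: transposed Ds, Ac segregated away
        F2Plant("P2", True, True, True),    # Ac present: Ds may move again
        F2Plant("P3", True, False, False),  # Ds excised but not re-inserted
        F2Plant("P4", False, True, False),  # Ds still at the T-DNA donor site
    ]
    print(stable_ds_candidates(f2))
```

The same filter applies to the T2/T3 screens of routes (b) and (c) if the markers are scored in the same way.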
Activation tagging with transposable element
1. The Ds construct may have an enhancer or strong promoter at one or both ends of Ds, T-DNA selection markers/reporters, a Ds tracer, a Ds excision marker, a plasmid rescue cassette, a T-DNA gene trap counter-selector and a T-DNA repeat counter-selector (Johnson et al., 2007).
2. The Ac construct can have T-DNA selection markers/reporters, the transposase coding region and an Ac reporter.
3. Develop independent T-DNA/Ds lines and Ac lines by high-throughput transformation. Select Ds lines that carry a single-copy T-DNA/Ds and show neither T-DNA gene trap expression nor T-DNA repeats.
4. Alternatively, develop double transformants by co-transformation (with the Ds and Ac constructs) or super-transformation, and select lines with a single-copy Ds.
5. Ds-tagged mutants can be developed by one of the following routes:
a. Independent Ds and Ac lines are crossed. The F1 plants are mutagenic (they contain both Ds and Ac). Stable Ds-tagged mutants are identified by screening F2 plants with the Ds excision marker and the Ds tracer; the Ac reporter is used to make sure that the selected plants are free from Ac.
b. Double transformants (T1) are mutagenic. Mutants are identified in the T2 and T3 generations.
c. Transient expression of transposase (TET) can be achieved by co-cultivating calli derived from plants carrying a single-copy T-DNA/Ds with the Ac construct. Mutants are identified in the T1 and T2 generations.
6. In the mutants, screen for gene trap events based on reporter gene expression.
7. Screen for novel/mutant phenotypes.
8. Isolate the flanking sequence tag using plasmid rescue or TAIL-PCR and identify the tagged gene.
9. Study co-segregation of the tagged gene with the mutant phenotype.
10. Validate gene function by Ds reversion, over-expression, etc.
Reverse genetics with tagged mutants
In the reverse genetics approach, one starts with a computationally predicted gene from the genome sequence and searches for an insertion mutant in that gene. PCR is carried out with one primer from the insertional element and one from the gene of interest. Because an insertion in any particular gene is a rare event, appropriately pooled DNA samples are used for high-throughput screening of such populations. Once a mutation in the gene of interest has been identified, homozygotes are isolated and the phenotype is confirmed.
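How large an insertion population must be depends on the size of the target gene relative to the genome. A commonly used estimate, assuming insertions are distributed randomly, gives the probability that at least one of n insertions falls within a target of length x in a genome of size G as P = 1 - (1 - x/G)^n. The sketch below applies it; the gene and genome sizes in the example are hypothetical, and insertion hot and cold spots in real genomes make this only a rough guide.

```python
import math


def p_hit(gene_len: float, genome_size: float, n_insertions: int) -> float:
    """Probability that at least one of n random insertions lands in the gene."""
    return 1 - (1 - gene_len / genome_size) ** n_insertions


def insertions_needed(gene_len: float, genome_size: float, prob: float = 0.95) -> int:
    """Smallest n giving at least `prob` chance of an insertion in the gene."""
    return math.ceil(math.log(1 - prob) / math.log1p(-gene_len / genome_size))


if __name__ == "__main__":
    # Hypothetical example: a 2 kb gene in a 400 Mb genome.
    n = insertions_needed(2_000, 400e6, 0.95)
    print(n, round(p_hit(2_000, 400e6, n), 3))
```

Insertions just upstream of the coding region can also disrupt a gene, so the effective target length is often taken as somewhat larger than the coding sequence.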
Trans-activation
Trans-activation is used to activate a gene in a target tissue or cell type where it is usually active. The trans-activation system combines enhancer trapping (Johnson et al., 2005; Johnson et al., 2007) with the efficiency of the yeast GAL4 transcriptional activator.
1. The GAL4 construct has a GAL4 gene driven by a minimal promoter and a reporter gene driven by the upstream activating sequence (UAS), within a T-DNA that also contains a selection marker.
2. Driver lines expressing GAL4 are produced by high-throughput transformation. The tissue- and cell-specific expression of GAL4 is determined from reporter gene expression.
3. Endogenous responder lines are produced by high-throughput transformation using a T-DNA carrying single or multiple UAS elements along with selection markers.
4. Cell- or tissue-specific activation-tagged mutants are generated by crossing the endogenous responders with selected GAL4 enhancer traps (driver lines).
5. The effect of gene activation in a mutant is studied to assign gene function.
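The logic of the driver × responder cross can be written down compactly. In this sketch, which is an illustrative model rather than a description of any particular line, each driver line is summarised by the set of tissues in which its reporter (and hence GAL4) is expressed, and the UAS-linked gene in the F1 is predicted to be activated only in those tissues. Line names and tissue labels are hypothetical.

```python
from __future__ import annotations


def tissue_specific_drivers(reporter_patterns: dict[str, set[str]], target: str) -> list[str]:
    """Driver lines whose reporter is expressed in the target tissue and nowhere else."""
    return sorted(line for line, tissues in reporter_patterns.items() if tissues == {target})


def predicted_activation(driver_pattern: set[str], responder_has_uas: bool) -> set[str]:
    """Tissues where the UAS-linked gene should be activated in the driver x responder F1."""
    return set(driver_pattern) if responder_has_uas else set()


if __name__ == "__main__":
    patterns = {"D01": {"root"}, "D02": {"root", "leaf"}, "D03": {"anther"}}
    for line in tissue_specific_drivers(patterns, "root"):
        print(line, predicted_activation(patterns[line], responder_has_uas=True))
```

Choosing strictly tissue-specific drivers, as here, keeps the activation confined; a broader driver such as D02 would activate the gene in leaf as well.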
RNA silencing
RNAi provides a new, reliable reverse genetic method to investigate gene function (Curtin et al., 2007). Several platforms have been developed for inducing gene silencing in plants (Watson et al., 2005; Curtin et al., 2007). A sense transgene expresses the same mRNA as the target gene, whereas an antisense transgene expresses RNA complementary to the target mRNA. In amplicon transgenes, the cDNA of a virus, driven by a constitutive promoter (such as CaMV 35S), is recombined with the target gene of interest. Hairpin RNA (hpRNA) is expressed from an inverted repeat construct consisting of a promoter, a targeted sense sequence, a spacer region, a complementary targeted antisense sequence and a transcription terminator. Using an intron instead of a nonspecific DNA spacer has been shown to increase the silencing efficiency of these hairpins. In direct-repeat-induced PTGS (driPTGS), RNA encoding multiple direct repeats of the target gene is expressed. The silencing efficiency of direct-repeat transgenes can be further improved by increasing the number of repeats to three or four. The Gateway system can be used to avoid multiple cloning steps. In another strategy, constructs with a 3' inverted repeat, a fragment of the target gene is fused at its 3' end to an inverted repeat of a nontarget sequence. A transgene strategy similar to SHUTR uses a transgene whose 5' end contains both an inverted repeat and a direct repeat of the 5'-UTR of an ethylene biosynthetic gene, followed by the coding sequence of the target gene.
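The arrangement of the inverted-repeat (hairpin) and direct-repeat transgenes described above can be expressed at the sequence level. This sketch only assembles the transcribed region from a target fragment and a spacer (an intron or other sequence); the promoter, terminator and cloning strategy are left out, and the example sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (characters other than ACGT are kept)."""
    return seq.translate(_COMPLEMENT)[::-1]


def hairpin_cassette(target_fragment: str, spacer: str) -> str:
    """Sense arm + spacer + antisense arm; the transcript folds back into an hpRNA."""
    return target_fragment + spacer + reverse_complement(target_fragment)


def direct_repeat_cassette(target_fragment: str, copies: int = 3) -> str:
    """Head-to-tail copies of the target fragment, as used for direct-repeat PTGS."""
    return target_fragment * copies


if __name__ == "__main__":
    # Hypothetical 9-nt fragment and a short lowercase stand-in for an intron spacer.
    print(hairpin_cassette("ATGGCTAAC", "gtaagtttcag"))
    print(direct_repeat_cassette("ATGGCTAAC", 3))
```

Because the antisense arm is the exact reverse complement of the sense arm, the two halves of the transcript pair along their full length with the spacer forming the loop; real target fragments are typically a few hundred base pairs rather than the toy length shown.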
Artificial miRNAs (amiRNAs) are designed to target a specific gene, or a group of related genes, using naturally occurring miRNA precursor sequences as a backbone. The original miRNA sequence and its complementary strand (miRNA*) are replaced with the amiRNA and amiRNA* sequences, respectively. The stem-loop structure of the natural miRNA precursor should be maintained, and the composition of the amiRNA sequences should closely imitate that of natural miRNAs. Other important parameters include a preference for uridine at the 5' terminus and adenine at the tenth base of the amiRNA, as these nucleotides are highly conserved in natural miRNA populations as well as in highly efficient siRNAs. A mismatch at the position corresponding to the 5' end of the amiRNA in the amiRNA/amiRNA* duplex is included to increase the likelihood that the amiRNA strand is preferentially incorporated into the RISC complex. To avoid transitive RNA silencing, which can be triggered when a perfectly matching amiRNA hybridizes to the target and acts as a primer for RDR6, one to three mismatches are incorporated at the 3' end of the amiRNA.
Virus-Induced Gene Silencing
Virus-induced gene silencing (VIGS) is a transient PTGS of plant genes by recombinant viruses carrying a sequence nearly identical to the target (Baulcombe, 1999; Burch-Smith et al., 2004). A 300- to 800-nt exogenous sequence is inserted at a specific location within the cDNA of PVX or TRV without loss of infectivity of the RNA transcript, and the recombinant virus is allowed to infect the plant. Viral infections can be established with naked viral RNA in the absence of coat proteins. In vitro-synthesized RNA transcripts from a plasmid containing a cDNA of the complete virus genome can also initiate virus infections. Alternatively, the viral cDNA can be cloned into the T-DNA of Agrobacterium, which is delivered to the plant by agroinfiltration and expressed from the CaMV 35S promoter to initiate infection. Microprojectile bombardment can also be used to introduce DNA, and sometimes RNA, into cells. The satellite virus-induced silencing system (SVISS) uses vectors derived from RNA and DNA satellite viruses: the target sequence is inserted into the satellite, which is co-inoculated with its helper virus, either TMV or the tomato yellow leaf curl geminivirus (TYLCV).
Gene targeting
Targeted mutagenesis is a powerful reverse genetic tool for generating specific and precise DNA sequence alterations that enable a greater understanding of gene function (Iida et al., 2007). Recently, zinc finger nucleases (ZFNs), chimeric proteins composed of a synthetic zinc finger-based DNA-binding domain and a DNA cleavage domain, have been used to create double-strand breaks at specific sites. These breaks are repaired by error-prone non-homologous end joining (NHEJ), which often introduces small insertions or deletions at the cut site; ZFNs can therefore be used successfully for gene targeting (Lloyd et al., 2005; Osakabe et al., 2010; Zhang et al., 2010).
FOX hunting
The FOX hunting system (full-length cDNA overexpressor gene hunting system) is an alternative method of producing gain-of-function mutants through the ectopic expression of plant genes (Ichikawa et al., 2006). A gene is over-expressed using its own DNA or cDNA, or that from another species, generally under a constitutive promoter, which leads to ectopic expression; the resulting mutants are called gain-of-function mutants. In the FOX hunting system, each cDNA from a normalized full-length cDNA library is introduced into plants, and the transgenic plants are observed for mutant phenotypes (Nakamura et al., 2007; Kondou et al., 2009).
References:
Baulcombe, D. C., 1999, Curr. Opin. Plant Biol., 2 (2): 109-113.
Bhat, R. S., et al., 2007. In: Upadhyaya N. M., ed. New York, USA. Springer Life Sciences, pp. 149-
180.
Borevitz, J. O., et al., 2003, Genome Res., 13 (3): 513-523.
Brenner, S., et al., 2000, Nat. Biotechnol., 18 (6): 630-634.
Burch-Smith, T. M., et al., 2004, Plant J., 39 (5): 734-746.
Comai, L., et al., 2004, Plant J., 37 (5): 778-786.
Curtin, S. J., et al., 2007. In: Upadhyaya N. M., ed. New York. Springer Life Sciences, pp. 291-332.
Guiderdoni, E., et al., 2007. In: Upadhyaya N. M., ed. New York. Springer Life Sciences, pp. 181-
222.
Ichikawa, T., et al., 2006, Plant J., 48 (6): 974-985.
Iida, S., et al., 2007. In: Upadhyaya N. M., ed. New York. Springer Life Sciences, pp. 273-289.
Jansen, G., et al., 1997, Nat. Genet., 17 (1): 119-121.
Johnson, A. A. T., et al., 2005, Plant J., 41 (5): 779-789.
Johnson, A. A. T., et al., 2007. In: Upadhyaya N. M., ed. New York. Springer Life Sciences, pp. 333-
353.
Komori, T., et al., 2004, Plant J., 37 (3): 315-325.
Kondou, Y., et al., 2009, Plant J., 57 (5): 883-894.
Li, X., et al., 2001, Plant J., 27 (3): 235-242.
Li, X., et al., 2002, Funct Integr Genomics, 2 (6): 254-258.
Liang, P., et al., 1992, Science, 257 (5072): 967.
Lisitsyn, N., et al., 1993, Science, 259 (5097): 946.
Liu, Y. G., et al., 1995, Plant J., 8 (3): 457-463.
Lloyd, A., et al., 2005, Proc. Natl. Acad. Sci. U.S.A., 102 (6): 2232-2237.
Meldrum, D., 2000, Genome Res., 10 (9): 1288.
Nakamura, H., et al., 2007, Plant Mol. Biol., 65 (4): 357-371.
Osakabe, K., et al., 2010, Proc. Natl. Acad. Sci. U.S.A., 107 (26): 12034-12039.
Upadhyaya, N. M., et al., 2010. In: Pereira A., ed. New York. Springer Life Sciences, pp. 147-177.
Upadhyaya, N. M., et al., 2006, Theor. Appl. Genet., 112 (7): 1326-1341.
Velculescu, V. E., et al., 1995, Science, 270 (5235): 484.
Wang, G. L., et al., 2004, Theor. Appl. Genet., 108 (3): 379-384.
Watson, J. M., et al., 2005, FEBS Lett., 579 (26): 5982-5987.
Welsh, J., et al., 1992, Nucleic Acids Res., 20 (19): 4965.
Xin, Z., et al., 2008, BMC Plant Biol., 8 (1): 103.
Zhang, F., et al., 2010, Proc. Natl. Acad. Sci. U.S.A., 107 (26): 12028-12033.
Zhu, Q. H., et al., 2007. In: Upadhyaya N. M., ed. New York. Springer, pp. 223-271.
Quantitative RT-PCR
Prasath D., Johnson K. George, Vijesh Kumar I.P.
Indian Institute of Spices Research, Calicut 673 012, Kerala
Introduction
The real-time polymerase chain reaction uses fluorescent reporter dyes to combine DNA amplification and detection in a single-tube format. The increase in fluorescent signal recorded during the assay is proportional to the amount of DNA synthesised during each amplification cycle. Individual reactions are characterised by the fractional cycle at which fluorescence first rises above a defined background fluorescence, a parameter known as the threshold cycle (Ct) or crossing point (Cp). Because a reaction that starts with more template reaches this threshold after fewer cycles, the lower the Ct, the more abundant the initial target. This correlation permits accurate quantification of target molecules over a wide dynamic range, while retaining the sensitivity and specificity of conventional end-point PCR assays. The homogeneous format eliminates the need for post-amplification manipulation and significantly reduces hands-on time and the risk of contamination. Real-time PCR is often abbreviated to qPCR, although that abbreviation is not universally accepted.
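The inverse relation between Ct and starting amount can be made explicit. If each cycle multiplies the product by (1 + E), where E is the amplification efficiency (E = 1 means perfect doubling), then N0(1 + E)^Ct is the same for every reaction at the threshold, so N0 is proportional to (1 + E)^(-Ct). The sketch below applies this and estimates E from the slope of a dilution-series standard curve (Ct plotted against log10 of input amount) using the standard relation E = 10^(-1/slope) - 1. The example values are hypothetical.

```python
from __future__ import annotations


def fold_difference(ct_a: float, ct_b: float, efficiency: float = 1.0) -> float:
    """N0(a) / N0(b) for two reactions read at the same threshold."""
    return (1 + efficiency) ** (ct_b - ct_a)


def standard_curve_slope(log10_amounts: list[float], cts: list[float]) -> float:
    """Least-squares slope of Ct against log10(input amount)."""
    n = len(cts)
    mx = sum(log10_amounts) / n
    my = sum(cts) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_amounts, cts))
    sxx = sum((x - mx) ** 2 for x in log10_amounts)
    return sxy / sxx


def efficiency_from_slope(slope: float) -> float:
    """E = 10^(-1/slope) - 1; a slope of about -3.32 corresponds to E = 1 (100%)."""
    return 10 ** (-1 / slope) - 1


if __name__ == "__main__":
    slope = standard_curve_slope([0, 1, 2, 3], [30.0, 26.7, 23.4, 20.1])  # tenfold dilutions
    eff = efficiency_from_slope(slope)
    print(f"slope {slope:.2f}, efficiency {eff:.1%}")
    print(f"Ct 20 vs Ct 23: {fold_difference(20, 23, eff):.1f}-fold more template")
```

At 100% efficiency a one-cycle difference in Ct corresponds to a twofold difference in starting template, and a tenfold dilution shifts Ct by about 3.3 cycles.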
Real-time chemistries allow detection of PCR amplification during the early phases of the reaction. Measuring the reaction kinetics in these early phases provides a distinct advantage over traditional PCR detection, which uses agarose gels to detect amplification at the final phase, or end point, of the reaction.
There are three main chemistries in general use:
Intercalating dyes, such as SYBR-Green, which fluoresce upon light excitation when bound to
double stranded DNA. These are cheap, easily added to legacy assays and amplification products can
be verified by the use of melt curves. They can lack specificity and fluorescence varies with amplicon
length. In general, they are one Ct or so more sensitive than probe-based assays.
Fluorophores attached to primers, e.g. Invitrogen's Lux or Promega's Plexor primers. These are
relatively inexpensive and amplification products can be verified by melt curves. Specificity depends
on the primers and specific, usually company-specific design software needs to be used for optimal
performance. This is not necessarily a bad thing (indeed the Plexor software is very useful), but it is
not always possible to change primer design parameters.
Hybridisation-probe based methods, e.g. hydrolysis (TaqMan) or Molecular Beacons. These are
the most specific, as products are only detected if the probes hybridise to the appropriate
amplification products. There are many variations on this theme, with melt curve analysis possible for
some chemistries. Their main disadvantages are cost, complexity and the occasional fragility of probe synthesis. A further potential problem is that probe-based assays do not report primer dimers, which can interfere with the efficiency of the amplification reaction.
The 5' nuclease assay
In the 5' nuclease assay, an oligonucleotide called a TaqMan probe is added to the PCR master mix. The probe is designed to anneal to a specific sequence of the template between the forward and reverse primers, so it sits in the path of the polymerase as the enzyme copies the DNA or cDNA. When the enzyme reaches the annealed probe, its 5' exonuclease activity cleaves the probe, separating the reporter dye from the quencher and increasing the reporter fluorescence.
SYBR Green dye
SYBR Green chemistry is an alternative method for performing real-time PCR analysis. SYBR Green is a dye that binds the minor groove of double-stranded DNA; when it binds, the intensity of its fluorescent emission increases. As more double-stranded amplicons are produced, the SYBR Green signal increases. SYBR Green binds to any double-stranded DNA molecule, whereas the 5' nuclease assay is specific to a predetermined target. The increase in reporter signal is captured by the sequence detection instrument and displayed by the software. The figure below shows an increase in the reporter signal over time. The increase in reporter signal is proportional to the amount of product produced in a given sample. When the reporter fluorescence rises to a detectable level, it can be captured and displayed as an amplification plot, which contains valuable information for the quantitative measurement of DNA or RNA. The threshold line is the level of detection, the point at which a reaction reaches a fluorescent intensity above background; it is set in the exponential phase of amplification for the most accurate reading. The cycle at which a sample reaches this level is called the cycle threshold (Ct). These two values are very important for data analysis using the 5' nuclease assay.
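Reading Ct off the amplification plot amounts to finding where the baseline-corrected signal first crosses the threshold line. The sketch below uses linear interpolation between the two cycles that bracket the threshold, and a baseline taken as the mean of cycles 3-15; both are simplifying assumptions, and instrument software may fit the curve differently (for example on a log scale).

```python
from __future__ import annotations

from typing import Optional


def baseline_correct(raw: list[float], start: int = 3, end: int = 15) -> list[float]:
    """Subtract the mean of cycles start..end (1-based, inclusive) from every reading."""
    window = raw[start - 1:end]
    base = sum(window) / len(window)
    return [x - base for x in raw]


def threshold_cycle(signal: list[float], threshold: float) -> Optional[float]:
    """Fractional cycle at which the signal first rises to the threshold.

    signal[i] is the reading at cycle i + 1. The crossing point is interpolated
    linearly between the two flanking cycles. Returns None if the threshold is
    never crossed from below.
    """
    for i in range(1, len(signal)):
        lo, hi = signal[i - 1], signal[i]
        if lo < threshold <= hi:
            return i + (threshold - lo) / (hi - lo)
    return None


if __name__ == "__main__":
    curve = [0, 0, 0, 0, 1, 2, 4, 8, 16]  # hypothetical baseline-corrected readings
    print(threshold_cycle(curve, 3))       # crosses halfway between cycles 6 and 7
```

Because the threshold sits in the exponential phase, the readings on either side of it differ by roughly a factor of (1 + E), so interpolating on a log scale is a common refinement of this linear version.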
Protocol
Reaction Set up
1. Gently vortex and briefly centrifuge all solutions after thawing.
2. Prepare a reaction master mix containing all of the following components except the template.
3. The total reaction volume is usually 25 µl; prepare each reaction as follows:
Reagent                                   Concentration          Volume
SYBR Green master mix, Qiagen (2X)        2X                     12.5 µl
Forward primer                            10 µM (10 pmol/µl)     1 µl
Reverse primer                            10 µM (10 pmol/µl)     1 µl
Template cDNA (diluted)                   -                      5 µl
Nuclease-free water                       -                      5.5 µl
Total                                                            25 µl
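When several reactions are run, all components except the template are pooled as a master mix (step 2). The small calculator below assumes 5.5 µl of water per reaction, the volume that brings the listed components to 25 µl with 5 µl of template, and a 10% overage to allow for pipetting loss; the overage is an assumption, not part of the protocol.

```python
PER_REACTION_UL = {
    "SYBR Green master mix (2X)": 12.5,
    "Forward primer": 1.0,
    "Reverse primer": 1.0,
    "Nuclease-free water": 5.5,
}
TEMPLATE_UL = 5.0
REACTION_UL = 25.0


def master_mix(n_reactions: int, overage: float = 0.10) -> dict:
    """Volume (µl) of each component for n reactions plus a fractional overage."""
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}


if __name__ == "__main__":
    assert sum(PER_REACTION_UL.values()) + TEMPLATE_UL == REACTION_UL
    for name, vol in master_mix(10).items():
        print(f"{name:28s} {vol:7.2f} µl")
    print(f"Dispense {REACTION_UL - TEMPLATE_UL:.1f} µl per well, then add {TEMPLATE_UL:.1f} µl template.")
```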
Reaction Conditions
95 °C for 5 min: 1 cycle
35 cycles of: 95 °C for 30 sec, 58 °C for 30 sec, 72 °C for 30 sec
Hold at 4 °C
Loop mediated isothermal amplification (LAMP)
A. I. Bhat
Indian Institute of Spices Research, Calicut 673 012, Kerala
Loop mediated isothermal amplification (LAMP) is a single tube technique for the amplification of DNA. It may be combined with a reverse transcription step to allow the detection of RNA. LAMP is a novel approach to nucleic acid amplification which uses a single incubation temperature, thereby obviating the need for expensive thermal cyclers. The amplification product can be detected by photometry, measuring the turbidity caused by the increasing quantity of magnesium pyrophosphate in solution, or, with the addition of SYBR Green, as a color change that can be seen without equipment (Notomi et al., 2000; Mori et al., 2001; Nagamine et al., 2002). In-tube detection of DNA amplification is also possible using manganese-loaded calcein, which starts fluorescing upon complexation of manganese by pyrophosphate during in vitro DNA synthesis. LAMP has the potential to be used as a simple screening assay in the field. Due to its simplicity, ruggedness and low cost, LAMP could provide major advantages over PCR without compromising on sensitivity. In LAMP, the target sequence is amplified at a constant temperature of 65 °C using either two or three sets of primers and a polymerase with high strand displacement activity. Due to the specific nature of the action of these primers, the amount of DNA produced in LAMP is considerably higher than in PCR-based amplification. Hence, LAMP can also be quantitative. LAMP has been successfully used for the detection of several fungal, bacterial and viral pathogens of plants (Notomi et al., 2000; Mori et al., 2001; Nagamine et al., 2002; Fukuta et al., 2003; 2004; Nie, 2005; Tomlinson et al., 2010).
Designing primers
FIP (Forward Inner Primer): consists of the F2 region (at the 3' end) that is complementary to the F2c region, and the same sequence as the F1c region at the 5' end.
F3 (Forward Outer Primer): consists of the F3 region that is complementary to the F3c region.
BIP (Backward Inner Primer): consists of the B2 region (at the 3' end) that is complementary to the B2c region, and the same sequence as the B1c region at the 5' end.
B3 (Backward Outer Primer): consists of the B3 region that is complementary to the B3c region.
FL (Forward Loop): sequences complementary to the single-stranded loop region between the F1 and F2 regions on the 5' end of the dumbbell-like structure.
BL (Backward Loop): sequences complementary to the single-stranded loop region between the B1 and B2 regions on the 5' end of the dumbbell-like structure.
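The primer definitions above can be turned into sequences once the six regions have been located on the target. The sketch assumes the regions are given as 0-based, half-open coordinates on the top strand, laid out 5'-F3-F2-F1 ... B1c-B2c-B3c-3', and builds the four core primers; loop primers, and the optional linker that some designs place between F1c and F2, are left out. The toy target is hypothetical and far shorter than a real LAMP target.

```python
from __future__ import annotations

_COMP = str.maketrans("ACGTacgt", "TGCAtgca")


def rc(seq: str) -> str:
    """Reverse complement."""
    return seq.translate(_COMP)[::-1]


def lamp_core_primers(target: str, regions: dict[str, tuple[int, int]]) -> dict[str, str]:
    """Build F3, B3, FIP and BIP from top-strand region coordinates."""
    seg = {name: target[start:end] for name, (start, end) in regions.items()}
    return {
        "F3": seg["F3"],                     # top-strand F3 region, anneals to F3c
        "B3": rc(seg["B3c"]),                # B3, anneals to B3c
        "FIP": rc(seg["F1"]) + seg["F2"],    # 5' F1c + F2 3'
        "BIP": seg["B1c"] + rc(seg["B2c"]),  # 5' B1c + B2 3'
    }


if __name__ == "__main__":
    toy = "AACCGGTTACGATTTTCAGCTCGGGGCA"
    regions = {"F3": (0, 4), "F2": (4, 8), "F1": (8, 12),
               "B1c": (16, 20), "B2c": (20, 24), "B3c": (24, 28)}
    print(lamp_core_primers(toy, regions))
```

Writing FIP and BIP this way makes the dumbbell logic visible: after extension from F2, the 5' F1c tail is complementary to the newly made F1 sequence downstream, so the strand can fold back on itself to form the loop.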
General considerations for primer design:
- The distance between the 5' ends of F2 and B2 should be 120-180 bp.
- The distance between F2 and F3, and between B2 and B3, should be 0-20 bp.
- The loop-forming regions (5' end of F2 to 3' end of F1, and 5' end of B2 to 3' end of B1) should be 40-60 bp.
- The GC content of the primers should be about 50-60% for GC-rich and normal targets, and about 40-50% for AT-rich targets.
- Primers should be designed so that they do not easily form secondary structures. The 3' end sequence should not be AT-rich or complementary to other primers.
- If restriction enzyme sites exist in the target sequence outside the primer regions, they can be used to confirm the amplified products.
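The spacing and GC rules above can be checked automatically. This sketch uses hypothetical 0-based, half-open top-strand coordinates for F3, F2, F1, B1c, B2c and B3c; how each distance is measured (for example, taking the 5' end of B2 as the 3' end of B2c on the top strand) is an interpretation of the rules, not a published specification.

```python
from __future__ import annotations


def gc_percent(seq: str) -> float:
    return 100.0 * sum(base in "GCgc" for base in seq) / len(seq)


def gc_ok(seq: str, at_rich_target: bool = False) -> bool:
    """50-60% GC for GC-rich/normal targets, 40-50% for AT-rich targets."""
    low, high = (40, 50) if at_rich_target else (50, 60)
    return low <= gc_percent(seq) <= high


def check_layout(r: dict[str, tuple[int, int]]) -> dict[str, bool]:
    """Apply the spacing rules to region coordinates (start, end) on the top strand."""
    f2_to_b2 = r["B2c"][1] - r["F2"][0]   # 5' end of F2 to 5' end of B2
    f3_gap = r["F2"][0] - r["F3"][1]      # between F3 and F2
    b3_gap = r["B3c"][0] - r["B2c"][1]    # between B2 and B3
    f_loop = r["F1"][1] - r["F2"][0]      # 5' of F2 to 3' of F1
    b_loop = r["B2c"][1] - r["B1c"][0]    # 5' of B2 to 3' of B1
    return {
        "F2-B2 span 120-180 bp": 120 <= f2_to_b2 <= 180,
        "F3-F2 gap 0-20 bp": 0 <= f3_gap <= 20,
        "B2-B3 gap 0-20 bp": 0 <= b3_gap <= 20,
        "F loop 40-60 bp": 40 <= f_loop <= 60,
        "B loop 40-60 bp": 40 <= b_loop <= 60,
    }


if __name__ == "__main__":
    regions = {"F3": (0, 20), "F2": (25, 45), "F1": (60, 80),
               "B1c": (120, 140), "B2c": (155, 175), "B3c": (180, 200)}
    for rule, ok in check_layout(regions).items():
        print(f"{rule:24s} {'ok' if ok else 'FAIL'}")
```

Checks on secondary structure and 3'-end complementarity need a folding or dimer model and are not attempted here.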
Performing LAMP
The general procedure for performing a LAMP reaction consists of nucleic acid isolation, amplification and detection. Amplification requires six primers (FIP, F3, BIP, B3, F-Loop and B-Loop), a DNA polymerase with strand displacement activity, substrates (deoxynucleotide triphosphates) and the reaction buffer. The procedure simply consists of incubating the template sample with these reagents at a constant temperature between 60 and 65 °C for 15 minutes to 1 hour. The amplified product can be detected within a short time, making this a simple and rapid gene amplification method. Both simple end-point detection and real-time detection of the reaction are possible.
Various detection methods include:
Visual methods:
- Magnesium pyrophosphate, a by-product of the amplification reaction, is produced in proportion to the amount of amplified product. Since LAMP can produce extremely large amounts of amplified product, the resulting white turbidity can be observed visually, and its presence indicates the presence of the target gene.
- If the tube containing the amplified products and a fluorescent intercalating dye (ethidium bromide, etc.) is illuminated with a UV lamp, the fluorescence intensity increases. The presence of fluorescence therefore indicates the presence of the target gene, again allowing visual detection.
Detection by electrophoresis
- LAMP products are run on a 2% agarose gel. The electrophoresis pattern of a LAMP product is not a single band but a ladder, because LAMP forms amplified products of various sizes consisting of alternately inverted repeats of the target sequence on the same strand.
Procedure
DNA isolation
Samples used: black pepper infected with Piper yellow mottle virus. The procedure is as follows:
1. Grind 100 mg of leaf tissue in 500 µl extraction buffer (100 mM Tris-HCl (pH 8.0), 4 mM EDTA, 1.4 M NaCl, 2% CTAB, 1% PVP, 0.5% β-mercaptoethanol) using a chilled mortar and pestle, and collect the filtrate in an Eppendorf tube.
2. Incubate in a water bath at 65 °C for 30 min.
3. Allow the homogenate to cool to room temperature, add an equal volume of phenol:chloroform:isoamyl alcohol (25:24:1) and mix well.
4. Centrifuge at 2,500 × g for 10 min at room temperature.
5. Collect the supernatant in a new tube, add 0.1 volume of 10% CTAB and an equal volume of chloroform:isoamyl alcohol (24:1), and mix well.
6. Centrifuge at 2,500 × g for 10 min at room temperature.
7. Collect the supernatant in a new tube, add 0.1 volume of 3 M sodium acetate (pH 5.2) and an equal volume of ice-cold isopropanol.
8. Mix well and incubate on ice for 30 min.
9. Centrifuge the mixture at 10,000 rpm for 15 min at 4 °C.
10. Discard the supernatant, add about 500 µl of 70% ethanol to the pellet and centrifuge at 12,000 rpm for 5 min.
11. Discard the supernatant and air-dry the pellet.
12. Dissolve the pellet in 100 µl of HPLC-grade water and store the DNA at -20 °C.
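The protocol gives some spins in × g and others in rpm. The two are related through the rotor radius by the standard formula RCF = 1.118 × 10^-5 × r × rpm², with r in cm. The radius used below is a placeholder; read the actual value from the documentation of the rotor in use.

```python
import math

K = 1.118e-5  # constant in RCF = K * r(cm) * rpm^2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return K * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a given relative centrifugal force."""
    return math.sqrt(rcf / (K * radius_cm))


if __name__ == "__main__":
    radius = 8.0  # hypothetical microcentrifuge rotor radius in cm
    print(f"2,500 x g ~ {rpm_from_rcf(2500, radius):,.0f} rpm")
    for rpm in (10_000, 12_000):
        print(f"{rpm:,} rpm ~ {rcf_from_rpm(rpm, radius):,.0f} x g")
```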
LAMP reaction mix
Component                 Stock concentration    Volume
Thermopol buffer          10X                    2.5 µl
MgSO4                     50 mM                  4.0 µl
dNTP mix                  10 mM                  3.5 µl
F3 primer                 10 µM                  0.5 µl
B3 primer                 10 µM                  0.5 µl
FIP primer                100 µM                 0.5 µl
BIP primer                100 µM                 0.5 µl
F-Loop primer             100 µM                 0.25 µl
B-Loop primer             100 µM                 0.25 µl
Betaine                   5 M                    5.0 µl
Bst polymerase            8 U/µl                 1.0 µl
Water                     -                      5.5 µl
Template                  -                      1.0 µl
Total                                            25.0 µl
Incubate the above reaction mix at 65 °C for 60 min, followed by 80 °C for 10 min to inactivate the enzyme. Run the products (10 µl) on a 1.2% agarose gel at 130 V for 45 min along with a marker. A positive reaction is identified by the presence of multiple bands of different sizes.
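A quick way to check the reaction mix is to convert each stock into its final concentration in the 25 µl reaction (C_final = C_stock × V_added / V_total). The stock values below follow the table, reading the primer stocks as µM; this unit reading is an assumption.

```python
TOTAL_UL = 25.0

# component: (stock concentration, unit, volume added in µl)
MIX = {
    "Thermopol buffer": (10.0, "X", 2.5),
    "MgSO4": (50.0, "mM", 4.0),
    "dNTP mix": (10.0, "mM", 3.5),
    "F3 primer": (10.0, "µM", 0.5),
    "B3 primer": (10.0, "µM", 0.5),
    "FIP primer": (100.0, "µM", 0.5),
    "BIP primer": (100.0, "µM", 0.5),
    "F-Loop primer": (100.0, "µM", 0.25),
    "B-Loop primer": (100.0, "µM", 0.25),
    "Betaine": (5.0, "M", 5.0),
    "Bst polymerase": (8.0, "U/µl", 1.0),
}
WATER_UL = 5.5
TEMPLATE_UL = 1.0


def final_concentrations() -> dict:
    """Final concentration of each component in the assembled reaction."""
    return {name: (stock * vol / TOTAL_UL, unit) for name, (stock, unit, vol) in MIX.items()}


if __name__ == "__main__":
    added = sum(vol for _, _, vol in MIX.values()) + WATER_UL + TEMPLATE_UL
    assert abs(added - TOTAL_UL) < 1e-9, f"volumes add up to {added} µl"
    for name, (conc, unit) in final_concentrations().items():
        print(f"{name:16s} {conc:8.2f} {unit}")
```

With these volumes the reaction contains 1X buffer, 8 mM added MgSO4, 1.4 mM dNTPs, 1 M betaine, 0.2 µM of each outer primer, 2 µM of each inner primer, 1 µM of each loop primer and 8 U of Bst polymerase.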
Selected references
Fukuta, S., Iida, T., Mizukami, Y., Ishida, A., Ueda, J., Kanbe, M., and Ishimoto, Y. 2003. Detection of Japanese yam mosaic virus by RT-LAMP. Arch. Virol. 148:1713-1720.
Fukuta, S., Ohishi, K., Yoshida, K., Mizukami, Y., Ishida, A., and Kanbe, M. 2004. Development of immunocapture reverse transcription loop-mediated isothermal amplification for the detection of Tomato spotted wilt virus from chrysanthemum. J. Virol. Methods 121:49-55.
Mori, Y., Nagamine, K., Tomita, N., and Notomi, T. 2001. Detection of loop-mediated isothermal amplification reaction by turbidity derived from magnesium pyrophosphate formation. Biochem. Biophys. Res. Commun. 289:150-154.
Mori, Y., Kitao, M., et al. 2004. Real-time turbidimetry of LAMP reaction for quantifying template DNA. J. Biochem. Biophys. Methods 59:145-157.
Nagamine, K., Hase, T., and Notomi, T. 2002. Accelerated reaction by loop-mediated isothermal amplification using loop primers. Mol. Cell. Probes 16:223-229.
Nie, X. 2005. Reverse transcription loop-mediated isothermal amplification of DNA for detection of Potato virus Y. Plant Dis. 89:605-610.
Notomi, T., Okayama, H., Masubuchi, H., Yonekawa, T., Watanabe, K., Amino, N., and Hase, T. 2000. Loop-mediated isothermal amplification of DNA. Nucleic Acids Res. 28:e63.
Tomlinson, J. A., Dickinson, M. J., and Boonham, N. 2010. Rapid detection of Phytophthora ramorum and P. kernoviae by two-minute DNA extraction followed by isothermal amplification and amplicon detection by generic lateral flow device. Phytopathology 100:143-149.
Two Dimensional Gel Electrophoresis
R. Viswanathan,
Sugarcane Breeding Institute, Coimbatore 641007
P. R. Rahul,
Division of Crop Improvement, IISR, Calicut 673012
Introduction
Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) is a method of protein separation in which proteins in a mixture are separated according to their isoelectric point (pI) in the horizontal direction (isoelectric focusing [IEF]) and according to their molecular weight in the vertical direction (sodium dodecyl sulfate polyacrylamide gel electrophoresis [SDS-PAGE]). 2D-PAGE is used for the isolation, separation and purification of proteins prior to their characterization by mass spectrometry and the identification of specific proteins. The isoforms of a protein can easily be resolved with 2D-PAGE.
2.1. Sample Preparation
Appropriate sample preparation is essential for good 2-D results. In general, it is advisable to keep sample preparation as simple as possible. A sample with low protein and high salt concentration, for example, could be diluted and analyzed directly, desalted and then concentrated by lyophilization, or precipitated with TCA and ice-cold acetone and then re-solubilized in rehydration solution. The composition of the sample solution is particularly critical for 2-D electrophoresis, because solubilization treatments for the first-dimension separation must neither affect the protein pI nor leave the sample in a highly conductive solution. The protocol described here has been used in the Plant Pathology laboratory, Sugarcane Breeding Institute, Coimbatore, and has been standardized for sugarcane leaf and cane samples. For other crops, suitable methods may be adapted based on experimental comparison of different protein extraction procedures.
Sample preparation includes the following steps:
1. Take 1 g of fresh tissue and grind it to a fine powder in liquid nitrogen.
2. Resuspend the powder in an ice-cold solution of 10% w/v trichloroacetic acid (TCA) in
acetone with 0.07% w/v dithiothreitol (DTT) and keep it for at least 1 h at -20 °C.
3. Centrifuge for 30 min at 12,000 rpm and discard the supernatant.
4. Rinse the pellet three times with acetone containing 0.07% w/v DTT, for 1 h at -20 °C.
5. Lyophilize the pellet for 2 h to remove all traces of acetone.
6. Solubilize the lyophilized powder in lysis buffer (7 M urea, 4% CHAPS, 14 mM
DTT and 0.2% ampholyte) for 1 h at 37 °C.
7. Centrifuge at 12,000 rpm for 15 min.
8. Collect the supernatant in a fresh tube.
9. Quantify the protein concentration by the Bradford method (Bradford, 1976).
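Step 9 depends on a standard curve. A minimal sketch, assuming a BSA standard series read at 595 nm and a response that is linear over the range used (the Bradford response flattens at higher protein amounts), fits a least-squares line and reads unknowns off it. The standard amounts and absorbances below are hypothetical.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def protein_amount(a595: float, slope: float, intercept: float, dilution: float = 1.0) -> float:
    """Read an unknown off the standard curve and correct for any dilution of the extract."""
    return (a595 - intercept) / slope * dilution


# HYPOTHETICAL standards: µg BSA per assay and their blank-corrected A595 readings.
bsa_ug = [0, 2, 4, 6, 8, 10]
a595 = [0.00, 0.10, 0.20, 0.30, 0.40, 0.50]
slope, intercept = fit_line(bsa_ug, a595)
print(round(protein_amount(0.25, slope, intercept), 2))  # µg protein in the assayed aliquot
```

Only readings that fall inside the range of the standards should be interpolated; more concentrated extracts should be diluted and the `dilution` factor applied.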
Note:
- If the procedure is interrupted at any step, store the samples at -80 °C.
- Prepare all reagents and buffers with ultrapure chemicals, and use ddH2O in all steps.
- Add DTT fresh wherever it is required.
2.2. Rehydration
Dissolve 100 µg of protein in rehydration buffer containing 8 M urea, 2% w/v CHAPS, 18 mM
DTT, 0.5% w/v IPG buffer pH 4-7 and a trace of bromophenol blue, and then carry out the steps below.
1. Clean the Immobiline strip tray (Ettan IPGphor) and wipe it dry with paper towels and Kimwipes.
2. Take the strips out of the -80 °C freezer and carefully remove the plastic cover.
3. Apply the sample to the strip tray and carefully place the strip over it, ensuring that the
entire length of the strip is in contact with the sample.
4. Overlay the strip with cover fluid and close the tray.
5. Leave the tray at room temperature for 18 h.
Note: Wear gloves while handling the strips, and make sure that the gel side of each strip faces down.
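Before loading, it helps to work out how much of each component the rehydration buffer needs and what volume of extract holds the 100 µg of protein. The sketch below does this arithmetic with standard molar masses for urea and DTT; the 10 ml buffer volume and the 2.5 µg/µl extract concentration are hypothetical example values. The extract volume must still fit within the rehydration volume appropriate for the strip length used.

```python
MOLAR_MASS = {"urea": 60.06, "DTT": 154.25}  # g/mol


def grams_for_molarity(molarity: float, molar_mass: float, volume_ml: float) -> float:
    """Mass (g) of solute for the given molarity (mol/l) in volume_ml."""
    return molarity * molar_mass * volume_ml / 1000.0


def grams_for_percent_wv(percent: float, volume_ml: float) -> float:
    """Mass (g) for a % w/v solution (1% w/v = 1 g per 100 ml)."""
    return percent * volume_ml / 100.0


def extract_volume_ul(target_ug: float, conc_ug_per_ul: float) -> float:
    """Volume of protein extract (µl) that contains target_ug of protein."""
    return target_ug / conc_ug_per_ul


volume = 10.0  # ml of rehydration buffer (example volume)
print("urea (8 M):", round(grams_for_molarity(8, MOLAR_MASS["urea"], volume), 3), "g")
print("CHAPS (2% w/v):", grams_for_percent_wv(2, volume), "g")
print("DTT (18 mM):", round(grams_for_molarity(0.018, MOLAR_MASS["DTT"], volume) * 1000, 1), "mg")
print("extract for 100 µg at 2.5 µg/µl:", extract_volume_ul(100, 2.5), "µl")
```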
2.3. First Dimension Isoelectric focusing (IEF)
1. After rehydration, wet the IEF strips with HPLC-grade water.
2. Blot the strips briefly between two pieces of Whatman filter paper to remove excess water.
3. Make sure that the square end of each strip is at the cathode (black, -) and the pointed end at
the anode (red, +), and that the anode and cathode ridges are correctly oriented.
4. Run the IEF for 24-36 h, or until 45,000 Vh have accumulated, using the following sequence of settings:
Voltage Amps Wattage Time
500 V 100mA 33V 1 hr
1000 V 110mA 70V 1 hr
2950 V 140mA 32V 24 hr
5. Bromophenol blue migrates towards the anode within 1 hr from the start of the electrophoresis.
Note: If the bromophenol blue has not disappeared by the next day (i.e. the strip is not yet colorless),
continue the run until the dye disappears.
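The step program and the 45,000 Vh end point can be related by simple arithmetic. The sketch below treats the Voltage column of the table as the set voltage of each step and assumes the strip actually reaches that voltage. This is an idealization: in practice the current limit often holds the voltage below its set point, particularly early in the run, which is one reason real runs can take longer than the nominal figure.

```python
from typing import List, Optional, Tuple

# (set voltage in V, duration in h) for each step of the IEF program tabulated above.
PROGRAM: List[Tuple[float, float]] = [(500, 1), (1000, 1), (2950, 24)]


def total_vh(program: List[Tuple[float, float]]) -> float:
    """Nominal volt-hours delivered if every step runs at its set voltage."""
    return sum(volts * hours for volts, hours in program)


def hours_to_reach(program: List[Tuple[float, float]], target_vh: float) -> Optional[float]:
    """Nominal run time (h) until target_vh accumulates, or None if the program ends first."""
    elapsed = accumulated = 0.0
    for volts, hours in program:
        step_vh = volts * hours
        if accumulated + step_vh >= target_vh:
            return elapsed + (target_vh - accumulated) / volts
        accumulated += step_vh
        elapsed += hours
    return None


print(total_vh(PROGRAM), round(hours_to_reach(PROGRAM, 45000), 1))
```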
2.4. Second Dimension
2.4.1. Equilibration
Equilibrate the focused strips twice, 15 min each, in 20 ml of equilibration solution as follows:
1. First equilibration: 6 M urea, 30% glycerol, 2% SDS, 2% DTT and 50 mM Tris-HCl buffer (pH 8.8).
2. Second equilibration: the same solution with 2.5% iodoacetamide in place of DTT.
2.4.2. Second-Dimension SDS-PAGE:
Perform the second-dimension electrophoresis on 1 mm thick, 12% SDS-polyacrylamide gels in a
Bio-Rad Protean xi vertical slab gel electrophoresis unit.
Casting of acrylamide gels (70 ml) and reagent preparation:
Component                   12% gel    15% gel
40% acrylamide              21 ml      26.5 ml
1.5 M Tris-HCl, pH 8.8      17.5 ml    17.5 ml
ddH2O                       31.5 ml    26.25 ml
10% (w/v) APS*              0.35 ml    0.35 ml
TEMED                       17.5 µl    17.5 µl
*Ammonium persulphate (APS): 0.1 g per ml of ddH2O. Prepare fresh each time.
Electrophoresis buffer: 25 mM Tris, 190 mM glycine and 0.1% SDS.
Agarose overlay: 1% agarose in electrophoresis buffer.
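The gel recipe scales linearly with volume, and the acrylamide volume follows from C1V1 = C2V2 with the 40% stock. A minimal sketch, assuming the same stock concentrations as the table, computes the volumes for any gel percentage and total volume. Because it adds ddH2O to make up the exact total, its water volume can differ slightly from the rounded values in the table.

```python
def resolving_gel(percent: float, total_ml: float = 70.0, stock_pct: float = 40.0) -> dict:
    """Volumes for an SDS-PAGE resolving gel, scaled from the 70 ml recipe in the table."""
    scale = total_ml / 70.0
    recipe = {
        "acrylamide_ml": percent * total_ml / stock_pct,  # C1V1 = C2V2 from the 40% stock
        "tris_ml": 17.5 * scale,                         # 1.5 M stock -> 0.375 M final
        "aps_ml": 0.35 * scale,                          # 10% APS -> 0.05% final
        "temed_ul": 17.5 * scale,
    }
    # ddH2O makes up the balance to the exact total volume.
    recipe["water_ml"] = (total_ml - recipe["acrylamide_ml"] - recipe["tris_ml"]
                          - recipe["aps_ml"] - recipe["temed_ul"] / 1000.0)
    return recipe


for name, amount in resolving_gel(12, total_ml=35).items():
    print(f"{name}: {amount:.3g}")
```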
After casting the gel, perform the following steps:
1. Add 1x running buffer powder and 2 l of ddH2O to the electrophoresis tank. Allow the buffer to mix
and cool for at least 3 h before running the gels.
2. Rinse top of gel with 1x electrophoresis buffer.
3. Align the acidic end of the strip with the left end of the gel and lower the strip carefully with a
pair of forceps. Make sure that the strip lies flat on the surface of the gel by pressing it down gently
with two rulers, and eliminate any bubbles between the gel surface and the strip.
4. Overlay 1-2 ml of molten agarose and allow it to solidify (10 min).
5. Place the 2-D gels into the electrophoresis tank by sliding gel plate sets between the rubber gaskets.
Lubricate the gasket with the running buffer prior to inserting the plates. Make sure gaskets are
not folded and that they form a smooth seal along the entire length of each set of gel plates.
6. Place the lid over the gel tank, making sure the electrodes make good contact, and plug the
leads into the power supply.
7. Run the gel at a constant current of 8 mA for 18-20 h.
8. Turn off the power supply and remove the gels from the unit.
9. Stain the gels by either Coomassie brilliant blue (CBB) staining or silver staining.
Note: Do not leave the agarose overlay to set for more than 15 min before starting the run, as the
proteins will begin to diffuse. If the agarose has not set fully within the first 10 min, place the gel
plate at 4 °C for approximately 2 to 4 min.
2.5. Gel Staining
I. Coomassie brilliant blue staining
Reagents:
Fixing solution: 7% acetic acid in 50% methanol, prepared with ddH2O
Dye solution: 0.1% Coomassie brilliant blue R-250 in fixing solution
Destaining solution: 5% acetic acid in 20% methanol, prepared with ddH2O
1. After SDS-PAGE, transfer the gel to a tray containing the fixing solution and shake for at least 1 h.
2. Pour off the fixing solution, replace it with the dye solution and incubate for 20 min.
3. Destain the gel in destaining solution, changing to fresh solution until the background is clear.
4. Wash the gel three times with ddH2O, 5 min each.
5. Acquire the gel image with a densitometer (Bio-Rad or GE Healthcare).
6. Gels can be stored in ddH2O at 4 °C for several months.
II. Silver Staining
Reagents
Fixing solution: 10% acetic acid in 40% ethanol, prepared with ddH2O.
Sensitizing solution*: 30% ethanol, 0.2% sodium thiosulphate and 6.8% sodium acetate in ddH2O.
Silver nitrate solution*: 0.25% silver nitrate in ddH2O.
Developing solution*: 2.5% sodium carbonate in ddH2O; add formaldehyde to 0.015% just before use.
Stop solution: 5% acetic acid in ddH2O.
Note: * To be prepared fresh.
Staining procedure (with gentle shaking throughout)
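The reagent amounts for a given working volume follow from two rules: grams = % × ml / 100 for % w/v solids, and C1V1 = C2V2 for dilutions from a liquid stock. The sketch below assumes that the solid percentages are w/v, that ethanol is diluted from absolute (100%) ethanol, and that the 0.015% formaldehyde is made from a 37% formaldehyde stock; the 250 ml volume is only an example.

```python
def grams_wv(percent: float, volume_ml: float) -> float:
    """Grams of solid for a % w/v solution (1% w/v = 1 g per 100 ml)."""
    return percent * volume_ml / 100.0


def ml_from_stock(final_pct: float, stock_pct: float, volume_ml: float) -> float:
    """Stock volume (ml) giving final_pct by C1V1 = C2V2 (stock_pct=100 for a neat liquid)."""
    return final_pct * volume_ml / stock_pct


volume = 250.0  # ml per staining step (example; use enough to cover the gel)
print("Sensitizer: ethanol", ml_from_stock(30, 100, volume), "ml,",
      "sodium thiosulphate", grams_wv(0.2, volume), "g,",
      "sodium acetate", grams_wv(6.8, volume), "g")
print("Silver nitrate:", grams_wv(0.25, volume), "g")
print("Developer: sodium carbonate", grams_wv(2.5, volume), "g, formaldehyde",
      round(ml_from_stock(0.015, 37, volume) * 1000, 1), "µl of a 37% stock")
```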
1. Place the gel in a tray containing fixing solution and agitate on a shaker for at least 1 hr. Ensure
that the fixer solution covers the gel completely.
2. Drain the fixer solution from the tray.
3. Add sensitizing solution and agitate for 30 min.
4. Drain the sensitizing solution from the tray.
5. Wash the gel three times in ddH2O, 5 min each.
6. Add silver nitrate solution and agitate for 20 min.
7. After 20 min, drain the silver nitrate solution into an appropriate waste beaker.
8. Wash the gel twice in ddH2O, 1 min each.
9. Add developing solution and agitate until the solution turns yellow or a brown, smoky precipitate
appears. Pour off the developer, add fresh developer as needed and continue in this way until the
spots reach the desired intensity.
10. Drain the developing solution, add stop solution and agitate for 10 min.
11. Wash the gel three times in ddH2O, 5 min each.
12. Acquire the gel image with a densitometer.
13. Store the gels in ddH2O at 4 °C.
Note: Gels can be stored at 4 °C in zip-lock bags for up to two years.
2.6. Image Analysis
Analyze the gel images using ImageMaster software (GE Healthcare) or other suitable software, and
mark the protein spots for excision.
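Spot volumes are usually compared between gels only after normalization, because total loading and staining vary from gel to gel. A common approach is to express each spot as a percentage of the total spot volume on its gel (%vol). The sketch below does that for spots already matched between a control and a treated gel, and flags fold changes beyond a threshold; the 1.5-fold cut-off and the spot IDs are hypothetical, and a real experiment would also need replicate gels and a statistical test.

```python
from typing import Dict


def percent_volume(spots: Dict[str, float]) -> Dict[str, float]:
    """Express each spot volume as a percentage of the total spot volume on its gel."""
    total = sum(spots.values())
    return {spot: 100.0 * vol / total for spot, vol in spots.items()}


def changed_spots(control: Dict[str, float], treated: Dict[str, float],
                  min_fold: float = 1.5) -> Dict[str, float]:
    """Fold change (treated/control, in %vol) for matched spots beyond the threshold."""
    c, t = percent_volume(control), percent_volume(treated)
    result = {}
    for spot in sorted(c.keys() & t.keys()):  # only spots matched on both gels
        fold = t[spot] / c[spot]
        if fold >= min_fold or fold <= 1 / min_fold:
            result[spot] = round(fold, 2)
    return result
```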
2.7. Excision of Protein Spots (for Sequencing by Mass Spectrometry)
1. Mark the spot(s) in the gel that are to be sequenced.
2. Cut out each protein spot with a pipette tip.
3. Transfer the gel piece to a microfuge tube.
4. Chop the gel piece into small pieces with the pipette tip.
5. Add a solution of 50% methanol/10% acetic acid to the gel pieces.
6. Incubate for 30 min.
7. Spin down and discard the supernatant.
8. The sample is now ready for mass spectrometric sequencing.
2.8. Useful Links and References
Ramesh Sundar, A., Nagarathinam, S., Ganesh Kumar, V., Rahul, P.R., Raveendran, M., Malathi, P.,
Ganesh Kumar, A., Rakwal, R., Viswanathan, R. (2010) Sugarcane proteomics: Establishment of a
protein extraction method for 2-DE in stalk tissues and initiation of sugarcane proteome reference
map. Electrophoresis 31: 1959-1974.
Commercial 2-D Electrophoresis and Proteomics Sites
Amersham Biosciences
http://www5.amershambiosciences.com/APTRIX/upp00919.nsf/Content/proteomics_HomePage
BioRad Proteomics Workstation
http://www.proteomeworks.bio-rad.com/
Genomic Solutions
http://www.genomicsolutions.com/
2-D Analysis Software Sites
Nonlinear Dynamics: http://www.phoretix.com/
PDQuest: http://proteomeworks.bio-rad.com/html/tech5.html
Flicker for 2D gel analysis: http://www-lecb.ncifcrf.gov/flicker/
NCI/NCRDC LMMB Image Processing Section (GELLAB software):
http://www-lecb.ncifcrf.gov/lemkin/gellab.html
Compugen (Z3 software): http://www.2dgels.com/
Expasy Index to 2D PAGE databases and services:
http://www.expasy.ch/ch2d/2d-index.html
HSC 2DE Gel Protein Databases list:
http://www.harefield.nthames.nhs.uk/nhli/protein.html
Phosphoprotein Database: http://www-lecb.ncifcrf.gov/phosphoDB/
Cambridge Proteomics Facility:
http://www.bio.cam.ac.uk/proteomics/index.html
Rice 2D Database: http://semele.anu.edu.au/2d/2d.html
COMPLUYEAST-2D PAGE Database: http://babbage.csc.ucm.es/2d/2d.html
Danish Centre for Human Genome Research 2D PAGE Databases (Aarhus):
http://proteomics.cancer.dk/
Siena-2DPAGE: http://www.bio-mol.unisi.it/2d/2d.html
PMMA-2D Page at Purkyne Military Medical Academy, Czech:
http://www.pmma.pmfhk.cz/
HP-2D PAGE (Max Delbruck Center, Berlin):
http://www.mdc-berlin.de/~emu/heart
MitoDat (Mendelian Inheritance and the Mitochondrion):
http://www-lmmb.ncifcrf.gov/mitoDat/
SWISS-2DPAGE at Geneva University Hospital:
http://www.expasy.ch/ch2d/ch2d-top.html
Proteome BioKnowledge Library: http://www.proteome.com/YPDhome.html
Yeast Proteome Map: http://www.ibgc.u-bordeaux2.fr/YPM/
Melanie: http://us.expasy.org/melanie
Sources of Information and Methods on 2-D Electrophoresis and Proteomics
Australian Proteome Analysis Facility: http://www.proteome.org.au/
The Tubingen Proteome Project: http://www.uni-tuebingen.de/uni/kxm/Proteome/
University of Aberdeen Protein Lab and Proteomics Facility:
http://www.abdn.ac.uk/~mmb023/proteome/index.htm
The ExPASy SWISS-2DPAGE: http://www.expasy.ch/
The Harefield Hospital in London provides links to worldwide databases, upcoming meetings, 2-D gel
analysis software, and more:
http://www.harefield.nthames.nhs.uk/nhli/protein/
The laboratory of Dr. James R. Jefferies, Parasitology Group, Institute of Biological Sciences,
University of Wales at Aberystwyth, Ceredigion, Wales, UK:
http://www.aber.ac.uk/~mpgwww/Proteome/Tut_2D.html#Section 1
Proteomics tools for mining sequence databases in conjunction with Mass Spectrometry experiments:
http://prospector.ucsf.edu/
Websites for theoretical and technical procedures on 2-D gel electrophoresis
+ http://www5.amershambiosciences.com/applic/upp00738.nsf/vLookupDoc/172581038-R140/$file/80-6429-60AB_Version_May_2002.pdf
+ http://www.bio-rad.com/LifeScience/pdf/Bulletin_2651.pdf
+ http://proteomics.cancer.dk/procedures/procedure.html
+ http://www.aber.ac.uk/parasitology/Proteome/Tut_2D.html
+ http://ca.expasy.org/ch2d/protocols
Bioinformatics- Data mining tools, Identification of microsatellite sites,
EST analysis and Annotation
S. J. Eapen
Indian Institute of Spices Research, Calicut 673 012, Kerala
In Silico Analysis - Annotation of ESTs, SSR and SNP identification
SSR identification
Exercise 1. Collection of ESTs.
1. Go to the NCBI site (http://www.ncbi.nlm.nih.gov/), select the EST database and type Citrus
macrophylla in the search box.
2. Examine the results and download the sequences in FASTA format for analysis.
3. Selecting FASTA as the display format shows the sequences in FASTA format.
4. Selecting the File option downloads all the ESTs as a single file.
Exercise 2. Assembling the ESTs
1. Go to the CAP3 website http://mobyle.pasteur.fr/cgi-bin/MobylePortal/portal.py?form=cap3 and
enter your e-mail ID.
2. Paste the EST sequences into the text box, or upload the file by selecting the file option.
3. Click the Run button.
4. CAP3 assembles the EST sequences into contigs and singletons.
5. Once the analysis is complete, save the contigs file and the singletons file.
6. Use these contigs for further analysis.
Exercise 3. Detecting SSRs in the EST contigs
1. Go to the WEBTROLL (Tandem Repeat Occurrence Locator) website
(http://wsmartins.net/webtroll/troll.html).
2. Upload the contigs file to WEBTROLL to search for SSRs (mono-, di-, tri-, tetra- and pentanucleotide
repeats).
3. Click the 'troll it' button.
4. Examine the results; primers can be designed by clicking the Design primer button.
5. Collect the repeats and tabulate them in Excel.
Exercise 4. Detecting SSRs using the MIcroSAtellite identification tool (MISA)
1. Go to the MISA website http://pgrc.ipk-gatersleben.de/misa/, download misa.pl and misa.ini, and
place them in a folder on a Linux system.
2. Place the FASTA file in the same folder.
3. At the command line, type misa.pl <FASTAfile>
4. Interpret the results in the output files <FASTAfile>.misa and <FASTAfile>.statistics.
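To see what an SSR finder does, the sketch below searches a sequence for perfect repeats of 1- to 6-nucleotide motifs with a regular expression. The minimum repeat numbers are an assumption modelled on commonly used MISA settings (10 for mononucleotides, 6 for dinucleotides, 5 for tri- to hexanucleotides); MISA itself reads its thresholds from misa.ini and also reports compound SSRs, which this sketch does not.

```python
import re
from typing import List, Tuple

# ASSUMED minimum repeat numbers per motif length (1 = mono- ... 6 = hexanucleotide).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_compound_motif(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit, e.g. 'ATAT' = 'AT' x 2."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str) -> List[Tuple[str, int, int]]:
    """Return (motif, repeat_count, 0-based start) for each perfect SSR in seq."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_compound_motif(motif):
                continue  # reported under the shorter unit length instead
            hits.append((motif, len(match.group(0)) // unit, match.start()))
    return sorted(hits, key=lambda hit: hit[2])
```

For example, `find_ssrs("TTC" + "AG" * 8 + "TTC")` returns `[('AG', 8, 3)]`.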
SNP identification
Exercise 5. SNP discovery using ESTs
1. This analysis runs only on Linux.
2. A customized version of the AUTOSNP program is used to detect SNPs.
3. CAP3, which is integrated into AUTOSNP, is used to build the contigs.
4. At the command line, type ./cap3snp.pl f piper.fasta; for an ACE file, type
./cap3snp.pl a piper.ace instead.
5. AUTOSNP writes its results as HTML files.
6. Count the detected SNPs and classify the DNA substitutions (indel, transition, transversion) for
interpretation.
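The classification in step 6 follows a simple rule: a transition exchanges a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/T), a transversion exchanges a purine for a pyrimidine, and a gap in one sequence marks an indel. The sketch below tallies these classes for two aligned sequences. It counts each gap column separately, so a multi-base indel is counted several times, and it skips ambiguous bases such as N; both are simplifying assumptions.

```python
from collections import Counter

PURINES, PYRIMIDINES = {"A", "G"}, {"C", "T"}


def classify(base1: str, base2: str) -> str:
    """Classify a difference between two aligned bases ('-' marks a gap)."""
    if "-" in (base1, base2):
        return "indel"
    if {base1, base2} <= PURINES or {base1, base2} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_variants(aligned1: str, aligned2: str) -> Counter:
    """Tally differences between two aligned sequences, skipping ambiguous bases."""
    if len(aligned1) != len(aligned2):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT-")
    return Counter(
        classify(a, b)
        for a, b in zip(aligned1.upper(), aligned2.upper())
        if a != b and a in valid and b in valid
    )
```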
Expressed Sequence Tags Analysis & Annotation
EST clustering and assembly:
Currently, most of the available coding-sequence data is in the form of expressed sequence tags (ESTs),
and the partial nature of these sequences hampers efforts to obtain the full-length cDNA of each gene.
There is therefore considerable value in consolidating gene sequences as they are produced, in the
absence of a completed reference sequence. ESTs offer a rapid and inexpensive route to gene discovery,
provide expression and regulation data, highlight gene sequence diversity and splicing, and may identify
more than half of the genes of an organism. However, much EST data remains unprocessed, so the valuable
consensus sequence information it contains is not exploited. The low-quality sequence data can be much
improved, but obtaining reliable information requires pre-processing, clustering and post-processing.
The steps for EST processing are given below.
Exercise 1. Collection of ESTs.
NCBI dbEST is a division of GenBank that contains sequence data and other information on "single-pass"
cDNA sequences, or "Expressed Sequence Tags", from a number of organisms. To download all the available
EST sequences of an organism of interest, type its scientific name in the search box.
1. Go to the NCBI site (http://www.ncbi.nlm.nih.gov/), select the EST database and type Phytophthora
capsici in the search box.
2. Examine the results and download the sequences in FASTA format for analysis.
(FASTA is a text-based format for representing nucleotide or peptide sequences, in which base pairs
or amino acids are represented by single-letter codes. Each record begins with a header line that
starts with '>', followed by a name and/or unique identifier for the sequence and optional comments;
the sequence itself follows on the next line(s).)
3. Selecting FASTA as the display format shows the sequences in FASTA format.
4. Selecting the File option downloads all the ESTs as a single file.
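The FASTA layout described above is simple enough to read without special software. A minimal sketch, assuming the identifier is the first word after '>' (the rest of the header line is treated as a description), collects each record's sequence lines into one string:

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record's identifier (first word after '>') to its sequence."""
    records: Dict[str, str] = {}
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = (line[1:].split() or [""])[0], []
        elif header is not None:  # sequence lines before the first header are ignored
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

A downloaded file can be passed directly as an open file handle, e.g. `read_fasta(open("capsici_ests.fasta"))`, where the file name is hypothetical; `len()` of the result then gives the number of ESTs retrieved.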
Exercise 2: Vector Screening
Downloaded ESTs may be contaminated with vector sequences and poly-A tails, which must be removed to
avoid errors during annotation. The vector screening step shows whether your EST sequences contain
vector contamination.
1. Go to http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html to screen for vector
contamination.
2. Copy and paste your FASTA sequences into the text box.
3. Click the Run VecScreen button.
4. Examine the vector matches reported; if matches are found, delete the matching sequence from
the FASTA file.
5. Use the preprocessed file for further analysis.
Exercise 3: TrimEST
After vector screening, use the TrimEST tool to trim poly-A tails from the ESTs.
1. Go to the website http://inn.weizmann.ac.il/cgi-bin/EMBOSS/emboss.pl?_action=input&_app=trimest
to remove poly-A tails.
2. Browse and choose your FASTA file, or paste your EST sequences.
3. Review and adjust the options as needed, then click Run trimEST; the output file opens immediately.
4. For large sequence files, provide your e-mail ID so that you are notified when the job is complete.
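The core idea of poly-A trimming can be sketched with two regular expressions: remove a run of A's at the 3' end, and a run of T's at the 5' end (the poly-A tail as seen on a read from the opposite strand). The minimum run length of 8 is a hypothetical choice, and unlike a production tool this sketch tolerates no mismatches within the tail.

```python
import re


def trim_poly_a(seq: str, min_len: int = 8) -> str:
    """Remove a 3' poly-A tail and a 5' poly-T head of at least min_len bases."""
    seq = re.sub(r"A{%d,}$" % min_len, "", seq.upper())
    return re.sub(r"^T{%d,}" % min_len, "", seq)
```

Short runs below the threshold, such as the final AAA in ACGTAAA, are left in place because they may be genuine sequence.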
Exercise 4: TrimSeq
TrimSeq trims ambiguous bits off the ends of sequences. Specifically, it removes all gap characters from
the ends, removes X's and N's (in nucleic acid sequences) from the ends, optionally removes *'s from the
ends, and optionally removes IUPAC ambiguity codes from the ends (B and Z in proteins; M, R, W, S, Y, K,
V, H, D and B in nucleic acid sequences). It then optionally trims poor-quality regions from the ends,
using a threshold percentage of unwanted characters in a window that is moved along the sequence from
each end. The unwanted characters are X's and N's (in nucleic acid sequences), optionally *'s, and
optionally IUPAC ambiguity codes.
1. Go to the website http://mobyle.pasteur.fr/cgi-bin/MobylePortal/portal.py?form=trimseq.
2. Select the file option, browse to and choose your file, and click the RUN button to trim the sequences.
3. Save the TrimSeq output file for further analysis.
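The window-based trimming described for TrimSeq can be illustrated as follows. This is a simplified sketch, not EMBOSS's implementation: it treats only N, X, gaps and * as unwanted (no IUPAC codes), and the window of 10 and 40% threshold are hypothetical parameters. It first strips unwanted characters from both ends, then moves inwards from each end while the end window contains too many unwanted characters.

```python
def trim_ambiguous_ends(seq: str, window: int = 10, max_bad_pct: float = 40.0,
                        unwanted: str = "NX-*") -> str:
    """Strip unwanted characters from both ends, then trim while an end window is too poor."""
    s = seq.upper().strip(unwanted)

    def bad_pct(chunk: str) -> float:
        return 100.0 * sum(ch in unwanted for ch in chunk) / len(chunk)

    # Move in from the 5' end while the leading window exceeds the threshold.
    while len(s) >= window and bad_pct(s[:window]) > max_bad_pct:
        s = s[1:].strip(unwanted)
    # Then do the same from the 3' end.
    while len(s) >= window and bad_pct(s[-window:]) > max_bad_pct:
        s = s[:-1].strip(unwanted)
    return s
```

For example, a read beginning `AC` followed by eight N's loses that whole N-rich start, even though its first two bases are unambiguous.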
Exercise 5. Repeat masking
Repeat masking is optional; it masks repetitive regions of the ESTs, which can otherwise cause problems
for the clustering algorithms used in EST assembly.
1. RepeatMasker screens DNA sequences in FASTA format against a library of repetitive
elements and returns a masked query sequence ready for database searches.
2. Go to the website http://repeatmasker.org and, under Services, click RepeatMasking.
3. Browse and choose your FASTA file, or paste your EST sequences.
4. Select wublast as the search engine and choose the DNA source.
5. The results are returned in HTML format.
6. Use the repeat-masked file for clustering.
Exercise 6. Clustering and Assembling the ESTs
A cluster is a set of fragmented EST data (DNA or protein) and, where known, gene sequence data that
has been consolidated, placed in the correct context and indexed by gene, such that all expressed data
concerning a single gene are in a single index class and each index class contains information for only
one gene. Clustering refers to assembling the sequences in the order in which they are placed in the
genome of the organism.
1. Go to the CAP3 website http://mobyle.pasteur.fr/cgi-bin/MobylePortal/portal.py?form=cap3 and
enter your e-mail ID.
2. Paste the EST sequences into the text box, or upload the file by selecting the file option.
3. Click the Run button.
4. CAP3 assembles the EST sequences into contigs and singletons.
5. Once the analysis is complete, save the contigs file and the singletons file. Use these contigs and
singletons for further analysis.
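The idea behind assembling ESTs into contigs can be illustrated with a toy greedy assembler: repeatedly merge the two sequences that share the longest exact suffix-prefix overlap, and stop when no overlap of the minimum length remains. This is only a sketch of the concept. CAP3's actual algorithm is considerably more elaborate, and this version ignores reverse-complement overlaps, base qualities and sequencing errors, all of which a real assembler must handle.

```python
from typing import List


def overlap_len(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b, or 0 if below min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: List[str], min_len: int = 20) -> List[str]:
    """Merge the pair with the longest exact overlap until no qualifying overlap is left."""
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates
    seqs = [s for s in seqs if not any(s != o and s in o for o in seqs)]  # drop contained reads
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    n = overlap_len(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return seqs  # merged sequences (contigs) plus any unmerged reads (singletons)
        merged = seqs[best_i] + seqs[best_j][best_n:]
        seqs = [s for k, s in enumerate(seqs) if k not in (best_i, best_j)] + [merged]
```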
EST Annotation:
Genome or EST annotation is the process of attaching biological information to sequences. It
consists of two main steps:
1. Identifying elements on the genome, a process called gene prediction, and
2. Attaching biological information to these elements.
At the basic level, annotation uses BLAST to find similarities to known sequences and then annotates
the query sequences on that basis. Increasingly, however, additional information is added to annotation
platforms. This additional information allows manual annotators to resolve discrepancies between genes
that have been given the same annotation. Some databases use genome context information, similarity
scores, experimental data and integrations of other resources to provide genome annotations through a
Subsystems approach. Other databases (e.g. Ensembl) rely on both curated data sources and a range of
software tools in their automated genome annotation pipelines.
Structural annotation consists of the identification of genomic elements.
- ORFs and their localization
- gene structure
- coding regions
- location of regulatory motifs
Functional annotation consists of attaching biological information to genomic elements.
- biochemical function
- biological function
- regulation and interactions
- expression
These steps may involve both biological experiments and in silico analysis. A variety of software
tools have been developed to permit scientists to view and share genome annotations.
Steps
Exercise 1. Use the clustered EST sequences obtained in the EST clustering and assembly step for
annotation.
Exercise 2. Annotation of ESTs using BLAST
1. Go to the BLASTX page (search a protein database using a translated nucleotide query),
http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Translations&PROGRAM=blastx&BLAST_PROGRAMS=blastx&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on,
and paste your contig sequences for the BLAST search.
2. Enter Phytophthora as the organism and keep the default parameters.
3. Examine the search results.
4. In the BLAST results, click the Gene Ontology (GO) link to view the function of the gene.
5. Prepare a table of the functional annotations.
6. Interpret your results.
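When many contigs are annotated, it is convenient to keep only the best hit per contig before building the annotation table. Standalone BLAST+ can write tabular output (`-outfmt 6`), whose twelve default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. A minimal sketch, assuming that format and a hypothetical e-value cut-off of 1e-5, selects the highest-scoring hit for each query:

```python
import csv
from typing import Dict, Iterable

# Column order of BLAST+ tabular output (-outfmt 6) with its default fields.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, dict]:
    """Keep the highest-scoring hit per query among hits at or below max_evalue."""
    best: Dict[str, dict] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] > max_evalue:
            continue
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best
```

An open file handle can be passed directly, e.g. `best_hits(open("contigs_blastx.tsv"))`, where the file name is hypothetical.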
Exercise 3. Annotation of ESTs using ESTEXPLORER
1. Go to the ESTExplorer web server
(http://estexplorer.els.mq.edu.au/estexplorer/main_page.php) and examine the EST analysis interface.
2. Select Phytophthora as the organism.
3. Tick the EST sequences option and upload your data.
4. Tick PHASE I, PHASE II and PHASE III.
5. Provide your name and e-mail ID.
6. Click the Process data button.
7. The server returns a request ID, which can be used to check the status of the job and view the results.
SEQUENCE-BASED MARKER DESIGNING
Rajesh M.K.¹ and Senthil Kumar R.²
¹Central Plantation Crops Research Institute, Kasaragod 671124, Kerala
²Indian Institute of Spices Research, Appangala, Karnataka
Introduction
The advent of next-generation sequencing (NGS) has revolutionized genomic and
transcriptomic approaches to biology. These new sequencing tools are also valuable for the discovery,
validation and assessment of genetic markers in populations. Restriction enzymes have been a core
tool for marker discovery and genotyping for decades, ever since the development and use of RFLPs.
The diversity of restriction enzymes available (which vary in the length, symmetry or GC versus AT
bias of their recognition sites, and also in their methylation-sensitivity) makes them an extremely
versatile assay tool. This flexibility allows researchers to customize marker discovery approaches to
individual projects.
Methodology
1. Sequencing of the genome and comparison with the reference genome.
When a reference genome sequence is available, sequence reads produced by any of the NGS
technologies can be aligned and positioned on a physical map. The higher the quality of the
reference genome assembly, the easier it is to impute missing genotypes, thus reducing the
coverage required to genotype each individual. Reference genomes can also be used to
design marker discovery experiments by simulating in silico the number of markers produced
by different enzymes. Challenges arise when a reference genome sequence is not available, or
even when one is available but is poorly assembled, comes from a distantly related taxon, or is
large and highly repetitive.
2. SHORE analysis of the two genomes
SHORE (short for Short Read) is a mapping and analysis pipeline for short DNA sequences
produced on an Illumina Genome Analyzer. It is designed for projects whose analysis strategy
involves mapping reads to a reference sequence. The reference sequence need not come from
the same species, since weighted and gapped alignments maintain accuracy even in diverged
regions. Here, the reads of the newly sequenced genome are aligned to the reference genome
to detect SNPs.
3. Retrieval of sequences around the SNP
Sequences of about 500 bp spanning each SNP are retrieved from both genome sequences.
4. Detection of restriction enzyme sites within the two sequences
The two sequences are compared to find restriction enzyme sites that are present in only one of
them, i.e. sites created or destroyed by the SNP.
5. Design of primers for amplification of the sequence of interest
Primers are designed for amplification of the sequence of interest using primer design
software.
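Steps 3 and 4 amount to extracting a window around each SNP from both genomes and looking for enzymes whose recognition site is present in one allele but not the other, which is the idea behind CAPS (cleaved amplified polymorphic sequence) markers. The sketch below assumes 0-based SNP coordinates and uses a small, illustrative enzyme list. Because all the sites listed are palindromic, searching one strand is sufficient; non-palindromic sites would also require a search of the reverse complement.

```python
# A few common enzymes and their (palindromic) recognition sites; extend as needed.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT",
           "TaqI": "TCGA", "AluI": "AGCT", "HaeIII": "GGCC", "MspI": "CCGG"}


def flank(genome: str, snp_pos: int, half_width: int = 250) -> str:
    """Sequence of about 2*half_width+1 bp centred on a 0-based SNP position."""
    return genome[max(0, snp_pos - half_width):snp_pos + half_width + 1]


def count_sites(seq: str, site: str) -> int:
    """Number of (possibly overlapping) occurrences of site in seq."""
    return sum(1 for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i))


def diagnostic_enzymes(allele1: str, allele2: str, enzymes: dict = ENZYMES) -> dict:
    """Enzymes whose site count differs between the two alleles (candidate CAPS markers)."""
    a, b = allele1.upper(), allele2.upper()
    counts = {name: (count_sites(a, site), count_sites(b, site)) for name, site in enzymes.items()}
    return {name: pair for name, pair in counts.items() if pair[0] != pair[1]}
```

For example, a C/A SNP that converts GAATTC to GAATTA destroys an EcoRI site, so `diagnostic_enzymes` reports `{'EcoRI': (1, 0)}` for the two alleles.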
Annexure I
General Conversion Tables and Formulae
Common Conversions of Nucleic Acids and Proteins
Weight conversion
1 µg = 10⁻⁶ g
1 ng = 10⁻⁹ g
1 pg = 10⁻¹² g
1 fg = 10⁻¹⁵ g
Spectrophotometric conversion
1 A₂₆₀ unit of double-stranded DNA = 50 µg/ml
1 A₂₆₀ unit of single-stranded DNA = 33 µg/ml
1 A₂₆₀ unit of single-stranded RNA = 40 µg/ml
DNA molar conversions
1 µg of 1,000 bp DNA = 1.52 pmol (3.03 pmol of ends)
1 pmol of 1,000 bp DNA = 0.66 µg
Protein molar conversion
100 pmol of 100,000 dalton protein = 10 µg
100 pmol of 50,000 dalton protein = 5 µg
100 pmol of 10,000 dalton protein = 1 µg
Protein/DNA conversion
1 kb of DNA = 330 amino acids of coding capacity = 3.7 × 10⁴ dalton protein
10,000 dalton protein = 270 bp DNA
50,000 dalton protein = 1.35 kb DNA
100,000 dalton protein = 2.7 kb DNA
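The conversions in this annexure follow from average masses: 660 g/mol per base pair of double-stranded DNA (the value implied by 1 pmol of 1,000 bp = 0.66 µg) and, as an assumed average, 110 g/mol per amino acid residue. The sketch below reproduces the table entries from those two constants, which makes it easy to convert other lengths and masses.

```python
BP_MASS = 660.0  # g/mol per base pair of dsDNA (implied by 1 pmol of 1,000 bp = 0.66 µg)
AA_MASS = 110.0  # g/mol per amino acid residue (ASSUMED average)


def dsdna_ug_per_ml(a260: float, dilution: float = 1.0) -> float:
    """dsDNA concentration from A260, using 1 A260 unit = 50 µg/ml."""
    return a260 * 50.0 * dilution


def dsdna_pmol(ug: float, length_bp: int) -> float:
    """pmol of a linear dsDNA molecule of the given length (µg -> pg, then / g/mol)."""
    return ug * 1e6 / (length_bp * BP_MASS)


def dsdna_ug(pmol: float, length_bp: int) -> float:
    """µg corresponding to pmol of a dsDNA molecule of the given length."""
    return pmol * length_bp * BP_MASS / 1e6


def protein_ug(pmol: float, mass_da: float) -> float:
    """µg corresponding to pmol of a protein of the given mass in daltons."""
    return pmol * mass_da / 1e6


def coding_bp(mass_da: float) -> float:
    """Approximate length of coding DNA (bp) needed for a protein of the given mass."""
    return 3 * mass_da / AA_MASS
```

For example, `dsdna_pmol(1, 1000)` gives about 1.52 pmol (twice that, about 3.03 pmol, of ends), matching the DNA molar conversion above.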