18 Chapter 1

with improved characteristics. Thus far, plant breeders have mainly been concerned with bringing about a continuous improvement in the productivity of that part of the plant which is of economic importance, in the stability of production through in-built resistance to pests and diseases, and in nutritive, organoleptic or other desired quality characters.

Many parameters and selection criteria should be included as breeding objectives. According to Sinha and Swaminathan (1984) and other sources, the major objectives of plant breeders can be summarized by the following list:

1. High primary productivity and efficient final production for each unit of cultivation and solar energy invested: to ensure that all the light that falls on a field is intercepted by leaves and that photosynthesis itself is as efficient as possible. Greater efficiency in photosynthesis could perhaps be achieved by reducing photorespiration.
2. High crop yield: plants must be selected which invest a large proportion of their total primary productivity into those parts which are commercially desirable, e.g. seeds, roots, leaves or stems.
3. Desirable nutritional value, organoleptic properties and processing qualities: the proportion of essential amino acids and the total protein in cereal grains, for example, should be increased to improve their nutritional quality.
4. Biofortifying crops with essential mineral elements that are frequently lacking in the human diet, such as Fe and Zn, and with vitamins and amino acids (Welch and Graham, 2004; White and Broadley, 2005; Bekaert et al., 2008; Mayer et al., 2008; Ufaz and Galili, 2008; Naqvi et al., 2009; Xu et al., 2009a).
5. Modifying crop plants to generate plant-derived pharmaceuticals to supply low-cost drugs and vaccines to the developing world (Ma et al., 2005).
6. Adaptation to cropping systems: including breeding for contrasting cropping, intercropping and sustainable cropping systems (Brummer, 2006).
7. More extensive and efficient nitrogen fixation: breeding cereals that encourage the growth of increased numbers of nitrogen-fixing microorganisms around their roots to reduce the need for nitrogen fertilizer.
8. More efficient use of water, whether there is a plentiful supply or a dearth of water.
9. Stability of crop production through resilience to weather fluctuations, resistance to the multiple alliance of weeds, pests and pathogens, and tolerance to various abiotic stresses such as heat, cold, drought, wind, and soil salinity, acidity or aluminium toxicity.
10. Insensitivity to photoperiod and temperature: selection of crop cultivars that are insensitive to photoperiod or temperature and characterized by a high per-day biomass production would allow the development of contingency cropping patterns to suit different weather probabilities.
11. Plant architecture and adaptability to mechanized farming: the number and positioning of the leaves, the branching pattern of the stem, the height of the plant and the positioning of the organs to be harvested are all important to crop production and often determine how well plants can be harvested mechanically.
12. Elimination of toxic compounds.
13. Identification and improvement of hardy plants suitable as sources of biomass and renewable energy.
14. Multiple uses of a single crop.
15. Environmentally friendly and stable across environments.

In conclusion, plant breeding has many breeding objectives and each of the objectives can be addressed in a specific breeding programme. A successful breeding programme consists of a series of activities, which Burton (1981) summarized in six words: variate, isolate, evaluate, intermate, multiply and disseminate.

1.8 Molecular Breeding

By 2025, the global population will exceed seven billion. In the interim, per-capita availability of arable land and irrigation water will decrease from year to year as biotic and abiotic stresses increase. Food security, best defined as economic, physical
and social access to a balanced diet and safe drinking water, will be threatened, and a holistic approach to nutritional and non-nutritional factors will be needed to achieve success in the eradication of hunger. Science and technology can play a very important role in stimulating and sustaining an Evergreen Revolution leading to long-term increases in productivity without any associated ecological harm (Borlaug, 2001; Swaminathan, 2007). The objectives of the plant breeder can be realized through conventional breeding integrated with various biotechnology developments (e.g. Damude and Kinney, 2008; Xu et al., 2009c).

Plant breeding can be defined as an evolving science and technology (Fig. 1.2). It has gradually been evolving from art to science over the last 10,000 years, starting as an ancient art and developing into the present molecular design-based science. With the development of molecular tools, which will be discussed further in Chapters 2 and 3, plant breeding is becoming quicker, easier, more effective and more efficient (Phillips, 2006). Plant breeders will be well equipped with innovative approaches to identify and/or create genetic variation, to define the genetic features of the genes related to the variation (position, function and relationship with other genes and environments), to understand the structure of breeding populations, to recombine novel alleles or allele combinations into specific cultivars or hybrids, and to select the best individuals with desirable genetic features which enable them to adapt to a wide range of environments.

[Fig. 1.2 (diagram): Art-based Plant Breeding: collection of wild plants for food; selection of wild plants for cultivation (starting from 10,000 years ago) → evolution through natural selection; hybridization combined with selection; large-scale breeding activities supported by commercial seed production enterprises (1700s–1800s) → Mendelian genetics; quantitative genetics; mutation; polyploidy; tissue culture (1900s) → gene cloning and direct transfer; genomics-assisted breeding (2000s and beyond): Molecular Plant Breeding]

Fig. 1.2. The steps of evolution of ‘plant breeding’. With the availability of more sophisticated tools, the art of plant breeding became science-based technology, molecular plant breeding.

Sequencing data for many plants are now readily available and the GenBank database is doubling every 15 months. Over 20 plant species, including many important crops, are in the process of being sequenced (Phillips, 2008). The next challenge is to determine the function of every gene and eventually how genes interact to form the basis of complex traits. Fortunately, DNA chips and other technologies are being developed to study the expression of multiple or even all genes simultaneously. High-throughput robotics and bioinformatics tools will play an essential role in this endeavour.

New information about our crop species is expanding our capabilities to use molecular genetics. For example, we did not previously realize how similar broadly related species are in terms of their gene content and gene order. Since these species cannot usually be crossed, there was no means of assessing their relatedness. With the advent of DNA-based molecular markers, the extensive genetic mapping of chromosomes became readily possible for a variety of species. We learned that the genomes were highly similar and that this similarity allowed the prediction of gene locations among species. For example, rice has become the model or reference species for the cereals, as many of the gene sequences on the rice chromosomes are shared with other cereals such as maize, sorghum, sugarcane, millet, oats, wheat and barley (Xu et al., 2005). Knowing the complete DNA sequence of a model or reference genome allows genes/traits from this
model to be tracked to other genomes. We have come to realize that the differences between species of plants are not due to novel genes, but to novel allelic specifications and interactions.

Since many fundamental aspects of current plant breeding procedures are not well understood, further data relating to the genetics of crop species may help to shed light on the genetic gains obtained from plant breeding. For example, in successful plant breeding programmes, the genetic base often becomes narrower rather than broader. ‘Elite by elite’ crosses may be the rule in these programmes. Molecular genetic markers have been widely employed to identify cryptic and novel genetic variation among cultivars and related species, and have been used to increase the efficiency of selection for agronomic traits and the pyramiding of genes from different genetic backgrounds. Long-term selection programmes would be expected to lead to genetic fixation; however, this has not been found to be the case so far, and variation is still observed. Several mechanisms for de novo variation have been described, including intragenic recombination, unequal crossing over among repeated elements, transposon activity, DNA methylation and paramutation. Another important feature in plant breeding whose molecular basis is not understood is heterosis, although it is used as the basis for many seed-producing industries. Genomics, and particularly transcriptomics, are now being used to identify the heterotic genes responsible for increasing crop yields. Comprehensive quantitative trait locus-based phenotyping (phenomics), combined with genome-wide expression analysis, should help to identify the loci controlling heterotic phenotypes and thus improve the understanding of the role of heterosis in evolution and the domestication of crop plants (Lippman and Zamir, 2007), and finally make it possible to predict hybrid performance.

Messenger RNA transcript profiling is an obvious candidate for functional genomic application to plant breeding. Although direct selection at the gene transcript level using microarray or real-time PCR may be a long-term goal, other genomic tools can be used to achieve shorter-term goals with more practical applications (Crosbie et al., 2006). Genetic modification of crops today involves the interfacing of molecular biology, cell and tissue culture, and genetics/breeding. The transfer of genes by cellular and molecular means will increase the available gene pool and lead to second-generation biotechnology plant products, such as those with a modified oil, protein, vitamin or micronutrient content, or those that have been engineered to produce compounds that can be used as vaccines or anti-carcinogens.

While all these innovations have been useful, practical plant breeding continues to be based on hybridization and selection, with little change in the basic procedures. A more complete understanding of the mechanisms by which genetic and environmental variation modify yield and composition is needed so that specific quantitative and qualitative targets can be identified. To achieve this aim, the expertise of plant genomics (including various omics), physiology and agronomy, as well as plant modelling techniques, must be combined (Wollenweber et al., 2005), and many logistical and genetic constraints also need to be resolved (Xu and Crouch, 2008).
2
Molecular Breeding Tools: Markers and Maps

2.1 Genetic Markers

In conventional plant breeding, genetic variation is usually identified by visual selection. However, with the development of molecular biology, it can now be identified at the molecular level based on changes in the DNA and their effect on the phenotype. Molecular changes can be identified by the many techniques that have been used to label and amplify DNA and to highlight the DNA variation among individuals. Once the DNA has been extracted from plants or their seeds, variation in samples can be identified using a polymerase chain reaction (PCR) and/or a hybridization process, followed by polyacrylamide gel electrophoresis (PAGE) or capillary electrophoresis (CE) to separate distinct molecules based on their sizes, chemical compositions and charges. Genetic markers are used to tag and track genetic variation in DNA samples.

Genetic markers are biological features that are determined by allelic forms and can be used as experimental probes or tags to keep track of an individual, a tissue, cell, nucleus, chromosome or gene. In classical genetics, genetic polymorphism represents allelic variation. In modern genetics, genetic polymorphism is the relative difference at any genetic locus across a genome. Genetic markers can be used to facilitate studies of inheritance and variation.

Desirable genetic markers should meet the following criteria: (i) a high level of genetic polymorphism; (ii) co-dominance (so that heterozygotes can be distinguished from homozygotes); (iii) clearly distinct allele features (so that different alleles can be identified easily); (iv) even distribution over the entire genome; (v) selective neutrality (without pleiotropic effects); (vi) easy detection (so that the whole process can be automated); (vii) low cost of marker development and genotyping; and (viii) high reproducibility (so that the data can be accumulated and shared between laboratories).

Most molecular markers belong to the so-called anonymous DNA marker type and generally measure apparently neutral DNA variation. Suitable DNA markers should represent genetic polymorphism at the DNA level and should be expressed consistently across tissues, organs, developmental stages and environments; their number should be almost unlimited; there should be a high level of natural polymorphism; and they should be neutral, with no effect on the expression of the target trait. Finally, most DNA markers are co-dominant or can be converted into co-dominant markers.
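The practical value of co-dominance can be illustrated with a toy genotype-scoring routine. This is a minimal Python sketch, not a procedure from the source: it assumes a hypothetical biallelic locus where each parental allele ("A" and "B") produces one distinguishable band, and the function names and the small F2 sample are made up for illustration.

```python
from collections import Counter

# Hypothetical biallelic locus: allele "A" comes from parent 1 and allele "B"
# from parent 2. Each individual is represented by the set of bands detected.

def score_codominant(bands: set[str]) -> str:
    """Co-dominant marker: both alleles are visible, so all three genotypes differ."""
    if bands == {"A"}:
        return "AA"
    if bands == {"B"}:
        return "BB"
    if bands == {"A", "B"}:
        return "AB"
    raise ValueError(f"unexpected band pattern: {bands}")

def score_dominant(bands: set[str]) -> str:
    """Dominant marker: only presence/absence of band "A" is scored, so AA and AB
    individuals look identical and are both recorded as "A_"."""
    return "A_" if "A" in bands else "BB"

# A toy F2 sample in the expected 1:2:1 genotype proportions.
f2 = [{"A"}, {"A", "B"}, {"A", "B"}, {"B"}]
print(dict(sorted(Counter(score_codominant(b) for b in f2).items())))
# {'AA': 1, 'AB': 2, 'BB': 1}
print(dict(sorted(Counter(score_dominant(b) for b in f2).items())))
# {'A_': 3, 'BB': 1}
```

With dominant scoring, the 1:2:1 genotype ratio collapses into a 3:1 ratio because heterozygotes cannot be told apart from one of the homozygotes, which is why co-dominance appears among the desirable criteria.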

©Yunbi Xu 2010. Molecular Plant Breeding (Yunbi Xu) 21


Table 2.1 lists the major molecular marker technologies that are currently available. Only a selection of widely used, representative types of markers will be discussed in this section. Figure 2.1 shows the molecular mechanisms of several major DNA markers and the genetic polymorphisms that can be generated by mutation at a restriction site or PCR priming site, by insertion or deletion, by changing the number of repeat units between two restriction or PCR priming sites, and by nucleotide mutation resulting in a single nucleotide polymorphism (SNP). There are several comprehensive reviews that cover all the important DNA markers, e.g. Reiter (2001), Avise (2004), Mohler and Schwarz (2005) and Falque and Santoni (2007). Further information regarding the application of DNA markers in genetics and breeding can be found in Lörz and Wenzel (2005). After a brief review of the classical markers, DNA markers will be discussed in more detail in this section.

2.1.1 Classical markers

Morphological markers

In the late 1800s, following his studies on the garden pea (Pisum sativum), G.J. Mendel proposed two basic rules of genetics,

Table 2.1. DNA markers and related major molecular techniques.

Southern blot-based markers
  Restriction fragment length polymorphism (RFLP)
  Single strand conformation polymorphic RFLP (SSCP-RFLP)
  Denaturing gradient gel electrophoresis RFLP (DGGE-RFLP)
PCR-based markers
  Randomly amplified polymorphic DNA (RAPD)
  Sequence tagged site (STS)
  Sequence characterized amplified region (SCAR)
  Random primer-PCR (RP-PCR)
  Arbitrary primer-PCR (AP-PCR)
  Oligo primer-PCR (OP-PCR)
  Single strand conformation polymorphism-PCR (SSCP-PCR)
  Small oligo DNA analysis (SODA)
  DNA amplification fingerprinting (DAF)
  Amplified fragment length polymorphism (AFLP)
  Sequence-related amplified polymorphism (SRAP)
  Target region amplified polymorphism (TRAP)
  Insertion/deletion polymorphism (Indel)
Repeat sequence-based markers
  Satellite DNA (repeat unit containing several hundred to several thousand base pairs (bp))
  Microsatellite DNA (repeat unit containing 2–5 bp)
  Minisatellite DNA (repeat unit containing more than 5 bp)
  Simple sequence repeat (SSR) or simple sequence length polymorphism (SSLP)
  Short repeat sequence (SRS)
  Tandem repeat sequence (TRS)
mRNA-based markers
  Differential display (DD)
  Reverse transcription PCR (RT-PCR)
  Differential display reverse transcription PCR (DDRT-PCR)
  Representational difference analysis (RDA)
  Expressed sequence tags (EST)
  Sequence target sites (STS)
  Serial analysis of gene expression (SAGE)
Single nucleotide polymorphism-based markers
  Single nucleotide polymorphism (SNP)

[Fig. 2.1 (diagram). A. Mutation at enzyme restriction or PCR priming site: RFLP, AFLP, CAPS; RAPD, AP-PCR, DAF, ISSR. B. Insertion between enzyme restriction or PCR priming sites: RFLP, AFLP, CAPS, RAPD, AP-PCR, DAF, ISSR. C. Deletion between enzyme restriction or PCR priming sites: RFLP, AFLP, CAPS, RAPD, AP-PCR, DAF, ISSR. D. Change of tandem repeat units between enzyme restriction or PCR priming sites: SSR, VNTR, ISSR. E. Single nucleotide mutation (a C/G base pair replaced by A/T): SNP. Legend: enzyme restriction site; PCR primer; tandem repeat sequence.]

Fig. 2.1. Molecular basis of major DNA markers. Parts A–E show different ways in which DNA markers
(listed below each diagram) can be generated. The cross in part A indicates that mutation has eliminated
the priming site. Abbreviations: as defined in Table 2.1; VNTR, variable number of tandem repeat; CAPS,
a DNA marker generated by specific primer PCR combined with RFLP; ISSR, inter simple sequence repeat.

which were later known as the Mendelian laws of equal segregation and independent assortment. Mendel selected individuals which differed in a particular trait and used them as the parental lines in a crossing experiment to determine the phenotype of the offspring with regard to the selected trait. The term phenotype (derived from Greek) literally means ‘the form that is shown’ and is used by both geneticists and breeders. The seven pairs of contrasting phenotypes studied by Mendel included round versus wrinkled seeds, yellow versus green seeds, purple versus white petals, inflated versus pinched pods, green versus yellow pods, axial versus terminal flowers and long versus short stems. The plants in the segregating populations of the pea, such as F2 and backcross, were classified into two distinct groups depending on their phenotypes. These contrasting morphological phenotypes are the starting point for any genetic analysis; they can be mapped to particular chromosomes using the Mendelian laws of inheritance and can thus be used as morphological markers of the genome and of the particular trait.

Morphological markers therefore generally represent genetic polymorphisms which are visible as differences in appearance, such as relative differences in plant height and colour, distinct differences in response to abiotic and biotic stresses, and the presence/absence of other specific morphological characteristics. A large number of variants showing particular morphological or physiological phenotypes have been generated by tissue culture and mutation breeding. Using selection techniques, these variants can be genetically stabilized and then used as morphological markers.

Some genetic stocks contain more than one morphological marker; for example, there are over 300 morphological markers available for genetic studies in rice (Khush, 1987) and more are being created for functional genomics. Many morphological marker stocks are also available for tomato (http://www.plantpath.wisc.edu/GeminivirusResistantTomatoes/MERC/Tomato/Tomato.html), maize (Neuffer et al., 1997) and soybean (Palmer and Shoemaker, 1998). Many of these markers have been linked with other agronomic traits.

Morphological markers are usually mapped by classical two- or three-point linkage tests. The linkage groups are established, and the order of the markers and the relative distance between any two are determined by their recombination frequencies. Relatively complete linkage maps have been constructed in many crop species using morphological markers, and these maps provide the fundamental information for the genetic mapping of many physiological and biochemical traits.

However, it is difficult to construct a relatively saturated genetic map because of the limited number of morphological markers with distinguishable polymorphisms. In addition, many morphological markers have deleterious effects on phenotypes, and some are significantly affected by other factors such as environment or maturity, which causes potential problems when these markers are used for genetics and plant breeding.

Cytological markers

By studying the morphology, number and structure of chromosomes from different species, particular cytogenetic features can be found, such as various types of aneuploidy, variants of chromosome structure and abnormal chromosomes. These can be used as genetic markers to locate other genes on chromosomes and determine their relative positions, or used for genetic mapping via chromosome manipulations such as chromosome substitution.

The structural features of chromosomes can be shown by chromosome karyotype and bands. The banding patterns are indicated by colour, width, order and position, revealing the difference in distributions of euchromatin and heterochromatin. There are Q bands (produced by quinacrine hydrochloride), G bands (produced by Giemsa stain) and R bands (reverse Giemsa). These chromosome landmarks are useful not only for characterizing normal chromosomes but also for detecting chromosome mutations.

Cytological markers have been widely used to identify linkage groups within specific chromosomes and have been widely applied in physical mapping. However, because of their limited number and resolution, they have limited applications in genetic diversity analysis, genetic mapping and marker-assisted selection (MAS).

Protein markers

Isozymes are structural variants of an enzyme; while they differ from the original enzyme in molecular weight and mobility in an electric field, they have the same catalytic activity. The difference in enzyme mobility is caused by point mutations resulting in amino acid substitutions, such that isozymes reflect the products of different alleles rather than different genes. Therefore, isozymes can be genetically mapped onto chromosomes and then used as genetic markers for mapping other genes. Isozyme markers are based on the biochemistry of the enzymes and thus are also known as biochemical or protein markers.

However, their use as markers is limited. For example, a total of 57 isozymes representing about 100 loci have been identified in plants (Vallegos and Chase, 1991), but for specific species only 10–20 isozymes are available, so they cannot be used to construct a complete genetic map. Each isozyme can only be identified with a specific stain, which also limits their use in practice.

2.1.2 DNA markers

RFLP

Botstein et al. (1980) first used DNA restriction fragment length polymorphism (RFLP) in human linkage mapping, and this pioneered the utilization of DNA polymorphisms as genetic markers. It is known that the genomes of all organisms show many sites of neutral variation at the DNA level. These neutral variant sites do not have any effect on the phenotype. In some cases a neutral site is nothing more than a single nucleotide difference within a gene or between genes; in others it represents the site of a variable number of tandem repeats of ‘junk DNA’ present between genes. The development of RFLP markers has accelerated the construction of molecular linkage maps for many organisms, improved the accuracy of gene location, and reduced the time required to establish a complete linkage map.

The digestion of purified DNA using restriction enzymes, which cut the DNA strand wherever there is a recognition site sequence (usually four to eight base pairs), leads to the formation of RFLPs, which yield a molecular fingerprint that may be unique to a particular individual. If the bases are positioned at random in the genome, an enzyme having a recognition site of six bases will cleave the DNA every $4^6 = 4096$ bases on average. A genome of $10^9$ bases could thus produce around 250,000 restriction fragments of variable length. Gel electrophoresis of such a large number of genomic DNA digestion products produces a continuous smear. Particular fragments that are homologous between several individuals, and possibly allelic, can be distinguished only by means of molecular probes using the Southern technique (Southern, 1975). RFLP analysis includes the following steps (Fig. 2.2):

1. DNA isolation: a significant amount of DNA must be isolated from multiple individuals of the target genotypes (parents and segregating populations, germplasm survey, garden blot, etc.) and purified to a fairly stringent degree, as contaminants can often interfere with the restriction enzyme and inhibit its ability to digest the DNA.
2. Restriction digestion: restriction enzyme is added to purified genomic DNA under buffered conditions. The enzyme cuts at recognition sites throughout the genome and leaves behind hundreds of thousands of fragments.
3. Gel electrophoresis: digested products (restriction fragments) are electrophoresed on agarose gel and, when visualized, appear as smears because of the large number of fragments.

[Fig. 2.2 (diagram comparing alleles A1 and A2): DNA extraction → restriction fragments → agarose gel electrophoresis → DNA denaturing → Southern blotting → hybridization → wash → autoradiograph.]

Fig. 2.2. RFLP workflow from DNA extraction to radio-autograph. Modified from Xu and Zhu (1994).

4. The agarose gel is denatured using NaOH solution and then neutralized.
5. The DNA fragments are transferred to a nitrocellulose membrane by Southern blotting.
6. Probe visualization: the membrane-bound genomic DNA is probed by hybridization, using as the probe a cloned fragment of the genome of interest or of a genome from a relatively close species.
7. The membrane is washed to remove non-specifically hybridized DNA.
8. In most cases the sizes of the fragments are determined by radioactive methods.

The probe-restriction enzyme combinations may identify two or more differently sized fragments. Polymorphism is revealed whenever the recognized fragments are of non-identical lengths.

Differences in size of restriction fragments are due to: (i) base pair changes that result in the gain or loss of restriction sites; and (ii) insertions/deletions between the restriction sites, within the restriction fragments on which the probe sequence is located.

Molecular probes are DNA fragments isolated and individualized by cloning or PCR amplification. They may originate from fragmented total genomic DNA and thus contain coding or non-coding sequences, unique or repeated, of nuclear or cytoplasmic origin. They may also be complementary DNA (cDNA). The standard procedure for developing genomic DNA probes is to digest total DNA with a methylation-sensitive enzyme (e.g. PstI), thereby enriching the library for single-copy sequences (Burr et al., 1988). Typically, the digested DNA is size-fractionated on a preparative agarose gel. DNA fragments ranging from 500 to 2000 bp are excised and eluted for cloning into a plasmid vector (e.g. pUC18). Digests of the plasmids are screened for inserts and their lengths can be estimated. Southern blots of the inserts can be probed with total sheared genomic DNA to select clones that hybridize to single- and low-copy sequences and to eliminate clones that hybridize to medium- and high-copy repeated sequences. Single- and low-copy probes are screened for RFLPs among a sample of genotypes using genomic DNAs digested with restriction endonucleases (one per assay). Typically, in species with moderate to high polymorphism rates, two to four restriction endonucleases with hexanucleotide recognition sites are tested; EcoRI, EcoRV and HindIII are widely used. In species with low polymorphism rates, additional restriction endonucleases can be tested to increase the chance of finding a polymorphism. Both the theory and the techniques for RFLP analysis in plant genome mapping have been intensively reviewed (Botstein et al., 1980; Tanksley et al., 1988).
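How a single base change produces a restriction fragment length polymorphism can be shown with a toy in silico digest. This Python sketch is an illustration under simplifying assumptions, not the procedure described above: it uses only the EcoRI recognition sequence GAATTC and its G^AATTC cut position, treats the DNA as one linear strand written 5' to 3', and the two "alleles" are made-up sequences. It also reproduces the random-sequence estimate of site spacing ($4^6$) used earlier in this section.

```python
import re

ECORI_SITE = "GAATTC"  # EcoRI recognition sequence (palindromic, same on both strands)
ECORI_CUT_OFFSET = 1   # EcoRI cuts the top strand between G and AATTC (G^AATTC)

def digest(seq: str, site: str = ECORI_SITE, offset: int = ECORI_CUT_OFFSET) -> list[int]:
    """Fragment lengths from cutting a linear sequence at every occurrence of `site`."""
    seq = seq.upper()
    # A zero-width lookahead also finds overlapping occurrences of the site.
    cuts = [m.start() + offset for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0, *cuts, len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

def expected_spacing(site_length: int) -> int:
    """Mean distance between sites if all four bases occur at random: 4 ** n."""
    return 4 ** site_length

# Made-up alleles: allele_2 carries an A->G change that destroys the second site.
allele_1 = "ATGC" + "GAATTC" + "A" * 20 + "GAATTC" + "TTGCA"
allele_2 = "ATGC" + "GAATTC" + "A" * 20 + "GAGTTC" + "TTGCA"

print(digest(allele_1))  # [5, 26, 10]
print(digest(allele_2))  # [5, 36]
print(expected_spacing(6), round(1e9 / expected_spacing(6)))  # 4096 244141
```

Losing the second site merges the 26 bp and 10 bp fragments into a single 36 bp fragment, so a probe homologous to that region would detect bands of different sizes in the two alleles. The spacing calculation gives about 244,000 fragments for a $10^9$-base genome, in line with the rough figure of 250,000 quoted for six-base cutters.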

Most RFLP markers are co-dominant and is used to amplify random sequences from
locus specific. RFLP genotyping is highly a complex DNA template that is comple-
reproducible and the methodology is sim- mentary to it (or includes a limited number
ple and requires no special instrumenta- of mismatches). This means that the ampli-
tion. High-throughput markers (e.g. cleaved fied fragments generated by PCR depend
amplified polymorphic sequence (CAPS) on the length and size of both the primer
or insertion/deletion (indel) markers) can and the target genome. Ten-base oligomers
be developed from RFLP probe sequences. of varying GC content (ranging from 40 to
The CAPS technique, also known as PCR- 100%) are usually used. If two hybridiza-
RFLP, consists of digesting a PCR-amplified tion sites are similar to one another (at least
fragment with one or several restriction 3000 bp) and in opposite directions, that is,
enzymes, and detecting the polymorphism in a configuration that will allow the PCR,
by the presence/absence restriction sites amplification will take place. The amplified
(Konieczny and Ausubel, 1993). products (of up to 3.0 kb) are usually sepa-
RFLP markers are powerful tools rated on agarose gels and visualized using
for comparative and synteny mapping. ethidium bromide staining. The use of a
However, RFLP analysis requires large single 10-mer oligonucleotide promotes the
amounts of high quality DNA and has low generation of several discrete DNA products
genotyping throughput and is very diffi- and these are considered to originate from
cult to automate. Most genotyping involves radioactive methods, so its use is limited to specific laboratories. RFLP probes must be physically maintained and it is therefore difficult to share them between laboratories. In addition, the level of RFLP is relatively low and selection for polymorphic parental lines is a limiting step in the development of a complete RFLP map.

RAPD

Williams et al. (1990) and Welsh and McClelland (1990) independently described the use of a single, random-sequence oligonucleotide primer in a low-stringency PCR (35–45°C) for the simultaneous amplification of several discrete DNA fragments, referred to as random amplified polymorphic DNA (RAPD) and arbitrarily primed PCR (AP-PCR), respectively. Another related technique is DNA amplification fingerprinting (DAF) (Caetano-Anollés et al., 1991). These methods differ from one another in primer length, the stringency of the conditions and the method of separation and detection of the fragments, but all of them can be used to identify RAPDs.

The principle of RAPD consists of a PCR on the DNA of the individual under study using a short primer, usually ten nucleotides, of arbitrary sequence. The primer binds to many different genetic loci. Polymorphisms result from mutations or rearrangements either at or between the primer binding sites and are visible in conventional agarose gel electrophoresis as the presence or absence of a particular RAPD band. RAPDs predominantly provide dominant markers, but homologous allele combinations can sometimes be identified with the help of detailed pedigree information.

RAPDs have several advantages and for this reason they are widely used (Karp and Edwards, 1997). (i) Neither DNA probes nor sequence information is required for the design of specific primers. (ii) The procedure does not involve blotting or hybridization steps, which makes the technique quick, simple and efficient. (iii) RAPDs require relatively small amounts of DNA (about 10 ng per reaction) and the procedure can be automated; they are also capable of detecting higher levels of polymorphism than RFLPs. (iv) Development of markers is not required and the technology can be applied to virtually any organism with minimal initial development. (v) The primers can be universal, so one set of primers can be used for any species. In addition, RAPD products of interest can be cloned, sequenced and converted into other types of PCR-based markers such as sequence tagged sites (STS), sequence characterized amplified regions (SCAR), etc.

Reproducibility affects the way in which RAPD bands can be standardized for comparison across laboratories, samples and trials, and whether RAPD marker information can be accumulated or shared. Because of frequently observed problems with the reproducibility of overall RAPD profiles and of specific bands, this marker class is often treated with reserve. In replication studies by Pérez et al. (1998), mispriming error amounted to 60%. Several factors have been shown to affect the number, size and intensity of bands. These include the PCR buffer, deoxynucleotide triphosphates (dNTPs), Mg²⁺ concentration, cycling parameters, source of Taq polymerase, condition and concentration of template DNA, and primer concentration. Results obtained by RAPDs are highly prone to user error, and the bands obtained can vary considerably between different runs of the same sample. To correct the problems that may be encountered when carrying out RAPD-PCR, it is important to bear in mind the following: (i) the concentration of DNA can alter the number of bands; (ii) RAPD profiles vary depending on the Mg²⁺ concentration, and the PCR buffer provided by Taq polymerase suppliers may or may not contain Mg²⁺ ions; (iii) there are different sources of Taq polymerase, and there is great variation between profiles produced using Taq polymerase obtained from different companies; (iv) there are a large number of alternative cycling times and temperatures, which are equally important and depend on the type of machine used and even the wall thickness of the PCR tubes.

Generally, if a PCR does not work, there is likely to be something wrong with the template DNA, primers, Taq polymerase or choice of conditions. Initially it is important to repeat the PCR under the same conditions to ensure that the failure was not caused by a simple error. In addition, it is recommended that both positive and negative controls are included. A positive control with a template known to amplify well will confirm that all reagents have been added and that they are all functioning. A negative control without template DNA will reveal any contamination. In most cases, if the PCR does not work and it is not clear what might be causing the problem, it is worth starting from the beginning by discarding all the reagents used and preparing fresh ones. Careful experiments have shown that reproducibility can be improved; Taberner et al. (1997) reported that 3396 out of 3422 bands (99.2%) were reproducible.

On the other hand, low reproducibility is a major limitation of RAPD markers, particularly in ongoing genetic and plant breeding programmes in which accumulated marker information and data are shared between laboratories and experiments. RAPD markers may still find applications in independent genetic diversity and phylogenetic studies that do not depend on data sharing or accumulation. As RAPD markers can be converted into other types of markers, they have a unique role in the development of target markers for crop species that have too few molecular markers available to cover the whole genome.

To overcome the problems associated with RAPD analysis, Paran and Michelmore (1993) converted RAPD fragments into simple and robust PCR markers known as SCARs. This procedure increases the reproducibility of RAPD markers and also avoids the occurrence of non-homologous markers of equal molecular weight. These specific markers are obtained by cloning polymorphic RAPD bands as single markers, which are then sequenced; specific primers are designed, usually by extending the original decamer primer sequence by 10–15 bases, so that only the band of interest is amplified. In general, DNA can be isolated from agarose gels, cloned and sequenced to produce the starting DNA template for the development of a variety of PCR-based markers. The cloned and sequenced DNA fragments can then be used for the development of CAPS, single strand conformation polymorphism (SSCP) or SNP markers.

AFLP

Amplified fragment length polymorphism (AFLP; Zabeau and Voss, 1993; Vos et al., 1995) is based on the selective PCR amplification of restriction fragments from a total
double-digest of genomic DNA under high-stringency conditions. Because it combines polymorphism at restriction sites with the hybridization of arbitrary primers, AFLP is also called selective restriction fragment amplification (SRFA). It was perfected by the company Keygene in the Netherlands, initially for use in plant improvement, and has been patented. The AFLP technique combines the power of RFLP with the flexibility of PCR-based markers and provides a universal, multi-locus marker technique that can be applied to complex genomes from any source. The method identifies AFLPs by selective PCR amplification of digested/ligated genomic or cDNA templates, which are then separated on a polyacrylamide gel; the main steps are restriction–ligation, pre-amplification and selective amplification (Fig. 2.3). The purified genomic DNA is first cleaved with one or more restriction endonucleases, e.g. a 6-cutter (EcoRI, PstI or HindIII) and a 4-cutter (MseI or TaqI). Adaptors of 18–20 bp and of known sequence, compatible with the sticky ends of the restriction sites, are then added to the ends of the DNA fragments in a ligation reaction using T4 DNA ligase. DNA amplification is carried out using primers with the sequence specificity of the adaptor to generate a subset of fragments of different sizes (up to ∼1 kb). The primers also contain one or more bases at their 3' ends that provide amplification selectivity by limiting the number of perfect sequence matches between the primer and the pool of available adaptor/DNA templates. The resulting amplification products (50–400 bp size range) are typically visualized by radio-labelling one of the primers, followed by fragment separation on acrylamide gels to identify polymorphisms (changes in restriction fragment sizes).

Fig. 2.3. AFLP flowchart (figure steps: whole genome DNA; restriction with EcoRI and MseI; ligation of EcoRI and MseI adaptors; pre-amplification with EcoRI and MseI primers; selective amplification; electrophoresis). Adaptor DNA = short double-strand DNA molecules, 18–20 bp in length, representing a mixture of two types of molecules. Each type is compatible with the DNA end generated by one of the restriction enzymes. Pre-amplification uses selective primers, which contain the adaptor DNA sequence plus one or two random bases at the 3' end for reading into the genomic fragments. Primers for re-amplification have the pre-amplification primer sequence plus one or two additional bases at the 3' end. A tag (*) is attached at the 5' end of one of the re-amplification primers for detecting amplified molecules.

An AFLP primer is composed of a synthetic adaptor sequence, the restriction endonuclease recognition sequence and an arbitrary, non-degenerate 'selective' sequence (typically one, two or three nucleotides). In the first step, 500 ng of genomic DNA is completely digested with two restriction enzymes, one frequent cutter (4-bp recognition site) and one rare cutter (6-bp recognition site). Oligonucleotide adaptors are ligated to the ends of each restriction fragment; together with the restriction site sequences, they serve as target sites for primer annealing, one end
with a complementary sequence for the rare cutter and the other with the complementary sequence for the frequent cutter. In this way, only fragments that have been cut by both the frequent cutter and the rare cutter will be amplified. Primers are designed from the known sequence of the adaptor, plus one to three selective nucleotides that extend into the fragment sequence. Sequences not matching these selective nucleotides in the primer will not be amplified, so that only those fragments matching the primers are specifically amplified. Permuting the order of the selective bases and recombining the primers with each other will theoretically lead to the gradual collection of all restriction fragments from a particular enzyme combination that are of a suitable size for DNA fragment analysis from a genotype. The multiplex ratio of an AFLP assay is a function of the number of selective nucleotides in the AFLP primer combination, the selective nucleotide motif, GC content, and physical genome size and complexity. Typically, two selective nucleotides are used for species with small genomes (1 × 10⁸–5 × 10⁸ bp), e.g. Arabidopsis thaliana L. (1 × 10⁸ bp) and rice (Oryza sativa L.) (4 × 10⁸ bp), and three selective nucleotides are used for species with large genomes (5 × 10⁸–6 × 10⁹ bp), e.g. maize, soybean, sunflower and many others. It is theoretically possible to use several tens of combinations of restriction enzymes with recognition sites of four to six bases and a large number of combinations of selective bases on the amplification primers. Thus, as indicated by Falque and Santoni (2007), the restriction–amplification combinations are nearly infinite.

AFLP products can be separated in high-resolution electrophoresis systems. The number of bands produced can be manipulated by the number of selective nucleotides and the nucleotide motifs used. A well-balanced number of amplified restriction fragments ranges from 50 to 150 per assay. A major improvement has been made by switching from radioactive to fluorescent dye-labelled primers for the detection of fragments in gel-based or capillary DNA sequencers, in which fluorescently labelled fragments pass the detector near the bottom of the gel/end of the capillary, resulting in a linear spacing of DNA fragments and therefore increasing the resolution over the whole size range (Schwarz et al., 2000).

In general, AFLP assays can be carried out using relatively small DNA samples (typically 1–100 ng per individual). AFLP has a very high multiplex ratio and genotyping throughput and is relatively reproducible across laboratories. Simple off-the-shelf technology can be applied to virtually any organism with no formal marker development required, and a set of primers can be used for different species. However, there are limitations to the AFLP assay. (i) The maximum polymorphic information content for any bi-allelic marker is 0.5. (ii) High-quality DNA is needed to ensure complete restriction enzyme digestion; rapid methods for isolating DNA may not produce sufficiently clean template DNA for AFLP analysis. (iii) Proprietary technology is needed to score heterozygotes and ++ homozygotes; otherwise AFLPs must be scored as dominant markers. (iv) AFLP markers often cluster densely in centromeric regions in species with large genomes, e.g. barley (Qi et al., 1998) and sunflower (Gedil et al., 2001). (v) Developing locus-specific markers from individual fragments can be difficult. (vi) AFLP primer screening is often necessary to identify optimal primer specificities and combinations; beyond this step, the assays can be carried out using off-the-shelf technology. (vii) There are relatively high technical demands in AFLP analysis, including radio-labelling and skilled manpower. (viii) Marker development is complicated and not cost-effective. (ix) Reproducibility is relatively low compared with RFLP and simple sequence repeat (SSR) markers, but better than that of RAPD markers; because AFLP reveals large numbers of bands, not all of them will be comparable across laboratories or trials, owing to potential false positives, false negatives and complicated gel backgrounds.

The AFLP technique can be modified so that one primer is derived from a known multi-copy sequence to detect sequence-specific amplification polymorphisms. This approach was used successfully to generate
genome-wide Bare-1 retrotransposon-like markers in barley (Waugh et al., 1997) and diploid Avena (Yu and Wise, 2000), and in lucerne by making use of consensus sequences from the long terminal repeats (LTRs) of the Tms1 retrotransposon (Porceddu et al., 2002). The cDNA-AFLP technique (Bachem et al., 1996), which applies the standard AFLP protocol to a cDNA template, has been used to display transcripts whose expression was rapidly altered during race-specific resistance reactions, to isolate differentially expressed genes from a specific chromosome region using aneuploids, and to construct genome-wide transcription maps (as reviewed by Mohler and Schwarz, 2005). In addition, there are several modified AFLP techniques based on the use of endonucleases, such as single-endonuclease (MspI) AFLP (Boumedine and Rodolakis, 1998), three-endonuclease AFLP (van der Wurff et al., 2000) and second-digestion AFLP (Knox and Ellis, 2001). Developments in the detection of AFLP include the replacement of radioactive detection with silver staining, fluorescent AFLP, or agarose gels for single-endonuclease AFLP. Recent studies have addressed specific areas of the AFLP technique, including comparison with other genotyping methods, assessment of errors, homoplasy, phylogenetic signal and appropriate analysis techniques. The study by Meudt and Clarke (2007) provides a synthesis of these areas and explores new directions for the AFLP technique in the genomic era.

SSR

Microsatellites, also known as SSRs, short tandem repeats (STRs) or sequence-tagged microsatellite sites (STMS), are tandemly repeated units of short nucleotide motifs that are 1–6 bp long. Di-, tri- and tetranucleotide repeats such as (CA)n, (AAT)n and (GATA)n are widely distributed throughout the genomes of plants and animals (Tautz and Renz, 1984). One of the most important attributes of microsatellite loci is their high level of allelic variation, which makes them valuable as genetic markers. The unique sequences bordering the SSR motifs provide templates for specific primers to amplify the SSR alleles via PCR. Referred to as simple sequence length polymorphisms (SSLPs), these length variants reflect differences in the number of repeat units that make up the microsatellite sequence. The mutation rates of SSRs are about 4 × 10⁻⁴ to 5 × 10⁻⁶ per allele per generation (Primmer et al., 1996). The predominant mutation mechanism in microsatellite tracts is 'slipped-strand mispairing' (Levinson and Gutman, 1987). When slipped-strand mispairing occurs within a microsatellite array during DNA synthesis, it can result in the gain or loss of one or more repeat units, depending on whether the newly synthesized DNA chain or the template chain loops out. The relative propensity for either chain to loop out seems to depend partly on the sequences making up the array and partly on whether the event occurs on the leading (continuous DNA synthesis) or lagging (discontinuous DNA synthesis) strand (Freudenreich et al., 1997). SSR loci are individually amplified by PCR using pairs of oligonucleotide primers specific to the unique DNA sequences flanking the SSR sequence.

Microsatellites may be obtained by screening sequences in databases or by screening libraries of clones. If no sequence is available, microsatellite markers can be developed in the following steps: construct an enriched or unenriched small-insert clone library; screen it by hybridization with a labelled oligonucleotide carrying the SSR motif of interest; sequence the positive clones; design primers in single-copy regions flanking the SSR repeats such that the amplified fragments will be > 50 bp and < 350 bp; and identify size polymorphisms on PAGE gels. For multiplexing, design primers with similar melting temperatures (Tm) and a range of expected amplicon sizes, so that groups of markers do not overlap on a gel. In rice, both an enzyme-digested library (Chen, X. et al., 1997) and a physically sheared library (Panaud et al., 1996) were constructed from cultivar IR36 using size-selected DNA in the 300–800-bp range. These libraries were screened for the presence of (GA)n microsatellites by plaque and colony hybridization. A pre-sequencing screening step was used
to eliminate clones in which the microsatellite repeat was too near one of the cloning sites to permit accurate design of primers, and to determine which end should be sequenced first. The basic steps include:

● PCR amplification of clone inserts and determination of their lengths before sequencing. Clones with short or long inserts are usually discarded.
● Selected clones are sequenced and searched for SSRs.
● Sequences within motif classes are grouped and aligned using sequence alignment software to identify redundant sequences.
● Oligonucleotide primers are designed for unique DNA sequences flanking non-redundant SSRs.
● Primers are tested and genotypes are screened for SSR length polymorphisms.

An alternative source of SSRs is expressed sequence tag (EST) and other sequence databases (e.g. Kantety et al., 2002). SSRs can be identified computationally in available genomic or EST sequences using a BLAST query (see the Simple Sequence Repeat Identification Tool available at www.gramene.org). Using this method, a total of 2414 new di-, tri- and tetranucleotide non-redundant SSR primer pairs, representing 2240 unique marker loci, were developed and experimentally validated in rice (McCouch et al., 2002). SSR-containing sequences that consisted of perfect repeat motifs (> 24 bp in length) flanked by 100 bp of unique sequence on either side of the SSR were chosen from GenBank. Primer pairs of 18–24 nucleotides, devoid of secondary structure or consecutive tracts of a single nucleotide, with a GC content of around 50% (Tm approximately 60°C) and preferably G- or C-rich at the 3' end, were designed automatically. When electronic PCR (e-PCR) was used to align these primer pairs against 3284 publicly sequenced rice BAC and PAC clones (representing about 83% of the total rice genome), 65% of the SSR markers hit a BAC or PAC clone containing at least one genetically mapped marker and could be mapped by proxy. Additional information based on genetic mapping and 'nearest marker' information provided the basis for locating a total of 1825 designed markers along the rice chromosomes.

Compared with library-derived SSRs, EST-derived SSRs are expected to display slightly fewer polymorphisms, as there is pressure for sequence conservation in coding regions (Scott, 2001). However, because they come from the expressed portion of the genome, such SSR markers might transfer across genera more readily than SSR markers retrieved from gene-poor areas, which transfer with low efficiency (Peakall et al., 1998). This approach could be used in plant species with minimal resources and research expenditure.

Once a plant species has been completely sequenced, the entire set of SSRs in the genome can be easily accessed through online databases. For example, the International Rice Genome Sequencing Project identified 18,828 di-, tri- and tetranucleotide SSRs that were over 20 bp in length and developed flanking primers for use as SSR markers (IRGSP, 2005). The locations of these SSRs on the physical map of rice in relation to other genetic markers can be found using the online Gramene Genome Browser (http://www.gramene.org/Oryza_sativa_japonica/index.html).

The usual method of SSR genotyping is to separate PCR products by denaturing or non-denaturing PAGE and to detect them by radio-labelling, silver staining, or ethidium bromide or SYBR staining, although distinguishing SSRs on agarose gels is sometimes possible (Fig. 2.4). These assays can usually distinguish alleles that differ by 2–4 bp or more.

Semi-automated SSR genotyping can be carried out by assaying fluorescently labelled PCR products for length variants on an automated DNA sequencer (e.g. Applied Biosystems and Li-Cor) (Fig. 2.4). One drawback of fluorescent SSR genotyping is the cost of end-labelling primers with the necessary fluorophores, e.g. 6-carboxyfluorescein (FAM), hexachloro-6-carboxyfluorescein (HEX) or tetrachloro-6-carboxyfluorescein (TET). SSR length
polymorphisms can also be assayed using non-denaturing high-pressure liquid chromatography (HPLC). SSR alleles differing by several repeat units can often be distinguished on agarose gels (Fig. 2.4). SSRs assayed on polyacrylamide gels typically show characteristic 'stuttering'. Stutter bands are artefacts produced by DNA polymerase slippage. Typically, the most prominent stutter bands are +1 and −1 repeats (e.g. +2 or −2 bp for a dinucleotide repeat) and, if visible, the next most prominent stutter bands are +2 and −2 repeats. Stuttering reduces the resolution between alleles such that 2- or possibly 4-bp differences between alleles cannot be sharply or unequivocally distinguished on polyacrylamide gels. Figure 2.4 shows examples of different genotyping systems used for SSR analysis, including multiplexing and stutter bands.

Fig. 2.4. Examples of genotyping systems used for SSR analysis (panels: agarose gel-based SSR genotyping; PAGE gel-based SSR genotyping; semi-automated SSR genotyping; automated SSR genotyping using fluorescent labelling; stutter bands and multiple alleles).

Another source of noise is the incomplete addition of non-templated adenine to PCR products, which produces adenylated (+A) and non-adenylated (–A) DNA fragments (Magnuson et al., 1996). Adding a 'pigtail' sequence (e.g. GTCTCTT) to the 5' end of the reverse primer promotes adenylation of the 3' end of the forward strand (Brownstein et al., 1996), thereby virtually eliminating the –A products and producing a more homogeneous set of fragments.

SSR markers are characterized by their hypervariability, reproducibility, co-dominant nature, locus specificity and random dispersion throughout most genomes. In addition, SSRs are reported to be more variable than RFLPs or RAPDs. The advantages of SSRs are that they can be readily analysed by PCR and are easily detected on polyacrylamide gels. SSLPs with large size differences can also be detected on agarose gels. SSR markers can be multiplexed, either functionally by pooling independent PCR products or by true multiplex PCR. Their genotyping throughput is high and can be automated. In addition, start-up costs are low for manual assay methods (once the markers have been developed), and SSR assays require only very small DNA samples (∼100 ng per individual).

The disadvantages of SSRs are the labour-intensive development process, particularly when this involves screening genomic DNA
libraries enriched for one or more repeat motifs (although SSR-enriched libraries can be purchased commercially), and the high start-up costs for automated methods.

SNP

A single nucleotide polymorphism or SNP (pronounced 'snip') is an individual nucleotide base difference between two DNA sequences. SNPs can be categorized according to nucleotide substitution as either transitions (C/T or G/A) or transversions (C/G, A/T, C/A or T/G). For example, the sequenced DNA fragments AAGCCTA and AAGCTTA from two different individuals contain a single nucleotide difference. In this case there are two alleles: C and T. C/T transitions constitute 67% of the SNPs observed in humans, and about the same rate has also been found in plants (Edwards et al., 2007a). In practice, single base variants in cDNA (mRNA) are considered to be SNPs, as are single base insertions and deletions (indels) in the genome. As a nucleotide base is the smallest unit of inheritance, SNPs provide the ultimate form of molecular marker.

For a variation to be considered a SNP, it must occur in at least 1% of the population. SNPs make up about 90% of all human genetic variation and occur every 100–300 bases. Two of every three SNPs involve the replacement of cytosine (C) with thymine (T). This is supported by a genome-wide analysis in rice. A polymorphism database constructed to define polymorphisms between cultivars Nipponbare (from subspecies japonica) and 93-11 (from subspecies indica) contains 1,703,176 SNPs and 479,406 indels (Shen et al., 2004), which equates to approximately 1 SNP per 268 bp in the rice genome. Using alignments of the improved whole-genome shotgun sequences for japonica and indica rice, SNP frequencies varied from 3 SNPs/kb in coding sequences to 27.6 SNPs/kb in transposable elements, with a genome-wide measure of 15 SNPs/kb, or 1 SNP/66 bp (Yu et al., 2005). Based on partial genomic sequence information, SNP frequencies have been determined in many crops, including barley, soybean, sugarbeet, maize, cassava and potato; typical SNP frequencies are also in the range of one SNP every 100–300 bp in plants (see Edwards et al., 2007a for a review).

SNPs may fall within the coding sequences of genes, within non-coding regions of genes or in the intergenic regions between genes, at different frequencies in different chromosome regions. In Arabidopsis, the distribution of SNPs was found to be even across the five chromosomes, with the exception of the centromeric regions, which contain few transcribed genes (Schmid et al., 2003). SNPs within a coding sequence will not necessarily change the amino acid sequence of the protein produced, owing to redundancy in the genetic code. A SNP for which both forms lead to the same polypeptide sequence is termed synonymous; if a different polypeptide sequence is produced, it is non-synonymous. SNPs that are not in protein-coding regions may still have consequences for gene splicing, transcription factor binding or the sequence of non-coding RNA. Of the 3–17 million SNPs found in the human genome, 5% are expected to occur within genes. Therefore, each gene may be expected to contain ∼6 SNPs.

A variety of approaches have been adopted for the discovery of novel SNPs in a wide range of organisms, including plants. These fall into three general categories (Edwards et al., 2007b): (i) in vitro discovery, where new sequence data are generated; (ii) in silico methods that rely on the analysis of available sequence data; and (iii) indirect discovery, where the base sequence of the polymorphism remains unknown. For SNP genotyping, a large number of different methods and chemistries have been developed, based on various methods of allelic discrimination and detection platforms. A convenient way to detect SNPs is by RFLP (SNP-RFLP) or by the CAPS marker technique. If one allele contains a recognition site for a restriction enzyme while the other does not, digestion of the two alleles will give rise to fragments of different length. A simple procedure is to analyse the sequence data stored in the
major databases and identify SNPs. Four alleles can be identified when the complete base sequence of a segment of DNA is considered, and these are represented by A, T, G and C at each SNP locus in that segment.

Sobrino et al. (2005) assigned the majority of SNP genotyping assays to one of four groups based on the molecular mechanism: allele-specific hybridization, primer extension, oligonucleotide ligation and invasive cleavage. These four are described below. Chagné et al. (2007) added three methods to this list (sequencing, allele-specific PCR amplification and DNA conformation methods) and generalized the enzymatic cleavage category to include the Invader assay, dCAPS and targeting induced local lesions in genomes (TILLING).

1. Allele-specific hybridization (ASH), also known as allele-specific oligonucleotide hybridization, is based on distinguishing, by hybridization, between two DNA targets differing at one nucleotide position (Wallace et al., 1979). Allelic discrimination can be achieved using two allele-specific probes, each labelled with a probe-specific fluorescent dye and a generic quencher that reduces fluorescence in the intact probe. During amplification of the sequence surrounding the SNP, probes complementary to the DNA target are cleaved by the 5' exonuclease activity of Taq polymerase. Spatial separation of the dye and quencher results in an increase in probe-specific fluorescence, which can be detected with a plate reader.

Under optimized assay conditions, the SNP can be detected by the difference in Tm of the two probe–template hybrids, as only perfectly matched probe–target hybrids are stable, whereas those with a one-base mismatch are unstable. To increase the reliability of SNP genotyping, the probes should be as short as possible. Originally, ASH used the dot blot format, in which probes are hybridized to membrane-bound genomic DNA or PCR fragments. However, the more advanced PCR-based dynamic allele-specific hybridization (DASH) method uses a microtitre plate format (Howell et al., 1999). Since one of the PCR primers is biotinylated at the 5' end, the PCR products can be bound to streptavidin-coated wells and denatured under alkaline conditions. An oligonucleotide probe complementary to one allele is added to the single-stranded target DNA molecules. The differences in melting curves are measured by slowly heating the samples and observing the changes in fluorescence of a double-strand-specific, intercalating dye. The 5' nuclease (TaqMan) assay, molecular beacons and Scorpion assays are all examples of ASH SNP genotyping technologies. Large-scale scanning of SNPs at a vast number of loci using allele-specific hybridization can be carried out on high-density oligonucleotide chips.

2. The Invader assay, also known as flap endonuclease discrimination, is based on the specific recognition and cleavage, by a flap endonuclease, of a three-dimensional flap structure that forms when two overlapping oligonucleotides hybridize perfectly to a target DNA (Lyamichev et al., 1999). The cleaved fragment may be labelled with a probe-specific fluorescent dye, which fluoresces following probe cleavage owing to spatial separation from the quencher. Alternatively, the flap may act as the invader probe in a secondary reaction to amplify the fluorescent signal (Invader squared) (Hall et al., 2000). Third Wave Technologies Inc. (http://www.twt.com) has manufactured an Invader assay for flap endonuclease discrimination, which can be carried out in solid phase using oligonucleotide-bound streptavidin-coated particles (Wilkins-Stevens et al., 2001).

3. Primer extension is a term used to describe mini-sequencing, single-base extension or the GOOD assay (Sauer et al., 2002). A popular method that was designed specifically for genotyping SNPs is the mini-sequencing technique (Syvänen, 1999; Syvänen et al., 1990), which forms the basis of a number of methods for allelic discrimination. The robust detection of known mutations employs oligonucleotides that anneal immediately upstream of the query SNP and are then extended by a single dideoxynucleotide triphosphate (ddNTP) in cycle sequencing reactions. The fidelity of thermostable proof-reading DNA polymerases ensures that only the complementary ddNTP is incorporated. Several
detection methods have been described for the discrimination of primer extension (PEX) products. The most popular is the use of ddNTP terminators labelled with different fluorescent dyes. The differentially dye-labelled PEX products can readily be detected on DNA sequencing instruments based on charge-coupled device (CCD) cameras.

In the case of single base extension (SBE), a primer is annealed adjacent to a SNP and extended to incorporate a ddNTP at the polymorphic site. SNaPshot (Applied Biosystems) uses differential fluorescent labelling of the four ddNTPs in an SBE reaction, allowing fluorescent detection of the incorporated nucleotide. SNP-IT (Orchid Biosciences) is also based on fluorescent SBE and uses solid-phase capture and detection of extension products. The GOOD assay involves extension of a primer modified near the 3' end with a charged tag to increase sensitivity to mass spectrometry detection.

Alternatives to SBE include pyrosequencing, allele-specific primer extension and the amplification refractory mutation system. In pyrosequencing, real-time monitoring of PEX relies on the bioluminometric detection of the inorganic pyrophosphate released upon incorporation of each dNTP (Ahmadian et al., 2000).

4. The oligonucleotide ligation assay (OLA) for SNP typing is based on the ability of ligase to covalently join two oligonucleotides when they hybridize next to one another on a DNA template (Landegren et al., 1988). Both primers must have perfect base pair complementarity at the ligation site, which makes it possible to discriminate the two alleles at a SNP site. The OLA has been modified to exploit a thermostable DNA ligase, interrogate PCR templates and utilize a dual-colour detection system. OLA also gave rise to another technique, Padlock probes (Nilsson et al., 1994), in which oligonucleotide probes ligate into circles upon target recognition and are then amplified by isothermal rolling-circle amplification. As reviewed by Chagné et al. (2007), several applications have been developed to detect SNP variation using OLA. These include colorimetric assays in ELISA plates, separation on an automated sequencer of ligated oligonucleotides labelled with a fluorescent dye, and rolling-circle amplification with one of the ligation probes bound to a microarray surface.

DETECTION SYSTEMS. There are several detection methods for analysing the products of each type of allelic discrimination reaction: gel electrophoresis, fluorescence resonance energy transfer (FRET), fluorescence polarization, arrays or chips, luminescence, mass spectrometry, chromatography, etc. Fig. 2.5 summarizes the enzyme chemistry, demultiplexing and detection options in SNP genotyping.

Fluorescence is currently the most widely applied detection method for high-throughput genotyping in general. It has been combined with a number of different detection systems, including plate readers, capillary electrophoresis and DNA arrays. In addition to fluorescence detection, mass spectrometry and light detection represent novel applications of established technology for high-throughput genotyping of SNPs.

PLATE READERS. There are many fluorescent plate readers capable of detecting fluorescence in a 96- or 384-well format (Jenkins and Gibson, 2002). Most models use a light source and narrow band-pass filters to select the excitation and emission wavelengths, enabling semi-quantitative, steady-state fluorescence intensity readings. This technology has been applied to genotyping with TaqMan, Invader and rolling-circle amplification. Fluorescence plate readers are also available that measure additional fluorescence parameters, including polarization, lifetime, time-resolved fluorescence and FRET.

DNA ARRAY. Oligonucleotide arrays bound to a solid support have been proposed as the future detection platform for high-throughput genotyping. Two distinct approaches have been adopted. In ASH, the oligonucleotide directly probes the target. Tag arrays capture solution-phase reaction products via hybridization to their anti-tag sequences.
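The single base extension principle described earlier in this section can be sketched in Python. This is a minimal illustration under stated assumptions, not the SNaPshot or SNP-IT software. Sequences are plain A/C/G/T strings written 5'->3', and the primer is taken to bind the template immediately 3' of the SNP. The names in `DYE_FOR_DDNTP` are hypothetical placeholders for a four-colour terminator set.

```python
from __future__ import annotations

# Minimal sketch of single base extension (SBE) allele calling.
# Assumptions: sequences are uppercase A/C/G/T strings written 5'->3',
# and the dye names are hypothetical placeholders, not a real kit's dye set.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical one-dye-per-terminator labelling, as in four-colour SBE.
DYE_FOR_DDNTP = {"A": "dye_1", "C": "dye_2", "G": "dye_3", "T": "dye_4"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbe_incorporated_base(template: str, primer: str) -> str:
    """Return the ddNTP added when `primer` is extended on `template`.

    The primer pairs with the template stretch lying immediately 3' of the
    query site, so its 3' end abuts the SNP and the polymerase adds the single
    ddNTP that is complementary to the template base at that site.
    """
    site = reverse_complement(primer)  # where the primer binds on the template
    pos = template.find(site)
    if pos <= 0:  # not found (-1), or no template base left to copy (0)
        raise ValueError("primer does not bind directly 3' of a query base")
    return COMPLEMENT[template[pos - 1]]


def sbe_genotype(templates: list[str], primer: str) -> list[tuple[str, str]]:
    """List the (ddNTP, dye) signals expected from one or two allelic templates."""
    bases = sorted({sbe_incorporated_base(t, primer) for t in templates})
    return [(base, DYE_FOR_DDNTP[base]) for base in bases]


if __name__ == "__main__":
    # Hypothetical A/G SNP flanked by two 10-base sequences.
    left, right = "TTGACCGTAC", "GATCCTGAAG"
    primer = reverse_complement(right)
    homozygote = [left + "A" + right]
    heterozygote = [left + "A" + right, left + "G" + right]
    print(sbe_genotype(homozygote, primer))    # [('T', 'dye_4')]
    print(sbe_genotype(heterozygote, primer))  # [('C', 'dye_2'), ('T', 'dye_4')]
```

In this model a heterozygote yields two differently coloured extension products. That is how four-colour SBE can report both alleles in a single reaction.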
[Fig. 2.5 schematic, not reproduced. Enzyme chemistry: allele-specific hybridization, allele-specific PCR, 5′-nuclease, single nucleotide primer extension (minisequencing), oligonucleotide ligation assay, allele-specific extend + ligate. Demultiplexing: homogeneous, semi-homogeneous, solid phase (microspheres or microarray), capillary electrophoresis. Detection method: fluorescence, fluorescence resonance energy transfer (FRET), fluorescence polarization, amplicon Tm, mass spectrometry, flow cytometry. Platform/company: Illumina BeadArray™, Luminex 100, Sequenom iPlex™, ABI SNPlex™, ABI TaqMan™, ABI SNaPshot™, 'DASH', Perkin-Elmer FP-TDI.]

Fig. 2.5. Chemistry, demultiplexing and detection options in SNP genotyping. From Syvänen (2001), reprinted by permission from Macmillan Publishers Ltd.
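Pyrosequencing is named in the text as an alternative to SBE. It reads each nucleotide dispensation as a light signal generated from the released pyrophosphate (Ahmadian et al., 2000). The Python sketch below is an idealized model. It assumes the signal is simply proportional to the number of bases incorporated per dispensation, and it ignores noise and incomplete extension. The sequences and dispensation order are hypothetical.

```python
from __future__ import annotations

# Idealized pyrogram model (assumption): each dispensed nucleotide is
# incorporated into every matching position of a homopolymer run, and the
# light signal is proportional to the number of bases incorporated, i.e. to
# the pyrophosphate released. Real traces add noise, decay and incomplete
# extension, which this sketch ignores.


def pyrogram(synthesized: str, dispensation: str) -> list[float]:
    """Return the idealized peak height for each dispensed nucleotide.

    `synthesized` is the strand the polymerase builds (5'->3'), i.e. the
    complement of the template being read.
    """
    pos = 0
    peaks = []
    for nucleotide in dispensation:
        incorporated = 0
        # Consume the whole homopolymer run that matches this dispensation.
        while pos < len(synthesized) and synthesized[pos] == nucleotide:
            incorporated += 1
            pos += 1
        peaks.append(float(incorporated))
    return peaks


def mixed_pyrogram(alleles: list[str], dispensation: str) -> list[float]:
    """Average the pyrograms of equally abundant allelic templates."""
    traces = [pyrogram(allele, dispensation) for allele in alleles]
    weight = 1.0 / len(traces)
    return [weight * sum(column) for column in zip(*traces)]


if __name__ == "__main__":
    # Hypothetical C/T SNP after "AG"; dispensation order chosen by hand.
    dispensation = "AGCTG"
    allele_c, allele_t = "AGCTTG", "AGTTTG"
    print(pyrogram(allele_c, dispensation))  # [1.0, 1.0, 1.0, 2.0, 1.0]
    print(pyrogram(allele_t, dispensation))  # [1.0, 1.0, 0.0, 3.0, 1.0]
    print(mixed_pyrogram([allele_c, allele_t], dispensation))  # [1.0, 1.0, 0.5, 2.5, 1.0]
```

In this model a heterozygote shows a half-height peak at the dispensation that only one allele incorporates. It also shows an intermediate peak where the two alleles differ in homopolymer length.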

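The oligonucleotide ligation assay (OLA) discriminates alleles by one rule: ligase joins two adjacent probes only when both are perfectly base-paired at the ligation site. That rule can be expressed as a short Python check. The model is simplified by assumption: the probes are taken to tile the target segment exactly, and only the two bases flanking the nick are compared. The probe sequences and allele names are hypothetical.

```python
from __future__ import annotations

# Minimal model of allele discrimination in the oligonucleotide ligation
# assay (OLA). Assumptions: the two probes are designed to lie side by side
# on the target segment, the downstream probe carries the 5' phosphate that
# ligase needs, and only the two bases flanking the nick are checked.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def ligates(target: str, upstream: str, downstream: str) -> bool:
    """Return True if ligase would join `upstream` and `downstream` on `target`.

    All sequences are written 5'->3'. A perfectly matched probe pair reads
    upstream + downstream == reverse_complement(target).
    """
    if len(upstream) + len(downstream) != len(target):
        raise ValueError("the probes must exactly tile the target segment")
    expected = reverse_complement(target)
    nick = len(upstream)
    # Both bases at the ligation site must be perfectly base-paired.
    return upstream[-1] == expected[nick - 1] and downstream[0] == expected[nick]


def ola_genotype(target: str, allele_probes: dict[str, str], common_probe: str) -> list[str]:
    """Return the alleles whose allele-specific probe is ligated to the common probe."""
    return sorted(
        allele
        for allele, probe in allele_probes.items()
        if ligates(target, probe, common_probe)
    )


if __name__ == "__main__":
    # Hypothetical probes: the allele-specific upstream probes differ only at their 3' base.
    probes = {"allele_1": "CCATGA", "allele_2": "CCATGG"}
    common = "TTGCA"
    target_1 = reverse_complement("CCATGA" + common)
    target_2 = reverse_complement("CCATGG" + common)
    print(ola_genotype(target_1, probes, common))  # ['allele_1']
    print(ola_genotype(target_2, probes, common))  # ['allele_2']
```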
The Affymetrix® Genome-Wide Human SNP Array 6.0 features more than 1.8 million markers for genetic variation, including more than 906,600 SNPs and more than 946,000 probes for the detection of copy number variation. According to the manufacturer, the SNP Array 6.0 enables high-performance, high-powered and low-cost genotyping (http://www.affymetrix.com). Luminex has developed a panel of 100 bead sets with unique fluorescent labels, identifiable by flow analyser. The bead sets can be derivatized with allele-specific oligonucleotides to create a bead-based array for multiplex genotyping by ASH.

Tag arrays are generic assemblies of oligonucleotides used to sort or deconvolute mixtures of oligonucleotides by hybridization to the anti-tag sequences. The current Affymetrix GeneChip® Universal Tag Arrays are available in 3K, 5K, 10K or 25K configurations. They contain novel, bioinformatically designed tag sequences that minimize the potential for cross-hybridization.

MASS SPECTROMETRY. Many genotyping techniques involve the allele-specific incorporation of one of two alternative nucleotides into an oligonucleotide probe. Because DNA bases differ inherently in molecular weight, mass spectrometry can determine which variant nucleotide has been incorporated by measuring the mass of the extended primers. This approach has been applied primarily to genotyping by primer extension using MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight) mass spectrometry. The MALDI-TOF method is particularly advantageous for detecting PEX products in multiplex.

The polyanionic nature of oligonucleotides results in low signal-to-noise ratios, particularly for longer (> 40-mer) fragments. One solution has been to cleave long probes specifically by acidolysis of P3'-N5' phosphoramidate bonds. Another is a combined approach in which the probe is digested to a very short fragment that has been derivatized to lower its charge to a single positive or negative charge.
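The mass differences that MALDI-TOF genotyping relies on can be estimated with the commonly used average-mass formula for unmodified oligonucleotides. That formula uses residue masses of about 313.21, 289.18, 329.21 and 304.20 Da for dA, dC, dG and dT, minus 61.96 Da for the 5'-hydroxyl terminus. The Python sketch below works under those assumptions. It treats an incorporated dideoxynucleotide as the deoxy residue minus one oxygen (about 16.00 Da) and reports neutral masses, ignoring protonation, salt adducts and charge states. The primer and tolerance are hypothetical.

```python
from __future__ import annotations

from itertools import combinations

# Approximate average masses (Da) of deoxynucleotide residues in a DNA chain,
# as used in the common oligonucleotide molecular-weight formula.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
TERMINAL_CORRECTION = 61.96  # for an unphosphorylated 5'-OH oligonucleotide
OXYGEN = 16.00  # a dideoxy terminator lacks the 3'-hydroxyl oxygen


def oligo_mass(seq: str) -> float:
    """Average neutral mass of an unmodified oligonucleotide (5'-OH, 3'-OH)."""
    return sum(RESIDUE_MASS[base] for base in seq) - TERMINAL_CORRECTION


def extended_mass(primer: str, ddntp: str) -> float:
    """Mass of `primer` after extension by one dideoxynucleotide."""
    return oligo_mass(primer) + RESIDUE_MASS[ddntp] - OXYGEN


def call_extension(primer: str, observed: float, tolerance: float = 3.0) -> list[str]:
    """Return the terminator(s) whose expected product mass matches `observed`."""
    return sorted(
        base for base in RESIDUE_MASS
        if abs(extended_mass(primer, base) - observed) <= tolerance
    )


def smallest_mass_gap() -> tuple[float, str]:
    """Return the closest pair of terminators by added mass."""
    return min(
        (abs(RESIDUE_MASS[a] - RESIDUE_MASS[b]), f"{a}/{b}")
        for a, b in combinations(RESIDUE_MASS, 2)
    )


if __name__ == "__main__":
    primer = "GTTCAGGCATG"  # hypothetical extension primer
    for base in "ACGT":
        print(base, round(extended_mass(primer, base), 2))
    print(smallest_mass_gap())
    print(call_extension(primer, extended_mass(primer, "T") + 0.5))  # ['T']
```

With these values the closest pair of terminators is A/T, about 9 Da apart. In this model, an A/T SNP is therefore the most demanding case for mass resolution.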

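The tag arrays described above rely on tag sequences designed for minimal cross-hybridization. That idea can be illustrated with a crude proxy. In the Python sketch below, each array feature displays the anti-tag (reverse complement) of one tag, and a product is captured where its tag pairs with an anti-tag. Cross-hybridization risk is approximated by the Hamming distance between equal-length tags. This proxy is an assumption for illustration, not the Affymetrix design criterion, and the 8-mer tags and feature names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations

# Sketch of tag/anti-tag decoding on a universal tag array. Assumption:
# cross-hybridization risk is approximated by the Hamming distance between
# equal-length tags (real designs also consider shifted alignments, GC content
# and melting temperature). Tags and feature names are hypothetical.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(tags: list[str]) -> int:
    """Smallest Hamming distance between any two tags in the set."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))


def build_array(tags: dict[str, str]) -> dict[str, str]:
    """Map each array feature to the anti-tag (reverse complement) it displays."""
    return {feature: reverse_complement(tag) for feature, tag in tags.items()}


def capturing_features(product_tag: str, array: dict[str, str], max_mismatches: int = 0) -> list[str]:
    """Features whose anti-tag pairs with `product_tag` at all but `max_mismatches` positions."""
    return [
        feature
        for feature, anti_tag in array.items()
        if hamming(reverse_complement(anti_tag), product_tag) <= max_mismatches
    ]


if __name__ == "__main__":
    good = {"feature_1": "ACCTGAAG", "feature_2": "TGAGCTTC", "feature_3": "CATACGGT"}
    bad = dict(good, feature_4="ACCTGAAC")  # differs from feature_1's tag at one base
    print(min_pairwise_distance(list(good.values())))  # 7
    print(min_pairwise_distance(list(bad.values())))   # 1
    print(capturing_features("ACCTGAAG", build_array(good), max_mismatches=1))  # ['feature_1']
    print(capturing_features("ACCTGAAG", build_array(bad), max_mismatches=1))   # ['feature_1', 'feature_4']
```

Allowing one mismatch, the well-separated tag set still sends the product to a single feature. Adding a tag that differs at one base lets the same product be captured at two features.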