Eugene R. Zabarovsky
Microbiology and Tumor Biology Center, Karolinska Institute, Stockholm, Sweden
1 Principles
2 Techniques
2.1 General Characteristics of λ-based Vectors Used for Construction of Genomic Libraries
2.2 Construction of General Genomic Libraries
2.3 Construction of Jumping and Linking Libraries. Use of Linking and Jumping Clones to
Construct a Physical Chromosome Map
4 Summary
Bibliography
Books and Reviews
Primary Literature
Encyclopedia of Molecular Cell Biology and Molecular Medicine, 2nd Edition. Edited by Robert A. Meyers.
Copyright 2004 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
ISBN: 3-527-30547-5
442 Genomic DNA Libraries, Construction and Applications
Keywords
Blue–white Selection
Not really selection but color identification. Vectors carrying the β-galactosidase (lacZ)
gene (or part of it) produce blue plaques in the presence of 5-bromo-4-chloro-3-indolyl-
β-D-galactopyranoside (X-gal). If this gene is located in a stuffer fragment, then all
recombinants will form white plaques and parental vectors will produce blue plaques
in the presence of X-gal.
Genetic Selection
Usually, in cloning, selection against parental, nonrecombinant molecules in favor of
the recombinant. For λ-based vectors used for construction of genomic libraries, the
two most commonly used types of selection are Spi and supF. Spi+ phages carrying red
and gam genes cannot grow in E. coli lysogens carrying prophage P2; since, however,
the majority of the vectors contain these genes in a stuffer fragment, only recombinant
phages can grow in such E. coli strains. Selection for supF exploits λ vectors carrying
amber mutations. These vectors cannot replicate without the supF gene, which must be
present either in the host or in the cloned insert. If the insert carries the supF gene,
only recombinant phages will be able to replicate in an E. coli host without the
suppressor gene.
Polylinker
A short DNA fragment (in the vector) that contains recognition sites for many
restriction enzymes, which can be used for cloning DNA fragments into this vector.
Restriction Enzyme
An enzyme that recognizes a specific sequence in DNA and can cut at or near this
sequence. In cloning procedures, the most commonly used enzymes produce specific
protruding (sticky) ends at the ends of the DNA molecule. Each enzyme produces
unique sticky ends. The DNA molecules possessing the same sticky ends can be
efficiently joined with the aid of DNA ligase (see ligation).
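
As a rough illustration of why fragments with matching sticky ends can be joined, the
following Python sketch derives the 5′ overhang left by an enzyme and tests whether two
overhangs can base-pair. The recognition sites and cut positions for EcoRI (G^AATTC),
BamHI (G^GATCC), and Sau3AI (^GATC) are the standard ones; the function names are
illustrative assumptions, and only symmetric cuts that leave 5′-protruding ends are modelled.

```python
# Recognition site and cut offset on the top strand (0-based).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),   # G^AATTC
    "BamHI": ("GGATCC", 1),   # G^GATCC
    "Sau3AI": ("GATC", 0),    # ^GATC
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def overhang(enzyme):
    """5' overhang (written 5'->3') left by a symmetric cut in a palindromic site."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def compatible(enzyme_a, enzyme_b):
    """True if the overhangs made by the two enzymes can anneal to each other."""
    a, b = overhang(enzyme_a), overhang(enzyme_b)
    return len(a) > 0 and a == reverse_complement(b)


def digest(seq, enzyme):
    """Cut the top strand of a linear sequence at every recognition site."""
    site, cut = ENZYMES[enzyme]
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut])
        start = pos + cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


print(overhang("EcoRI"), overhang("BamHI"), overhang("Sau3AI"))
print(compatible("BamHI", "Sau3AI"), compatible("EcoRI", "BamHI"))
```

In this model, BamHI and Sau3AI leave the same GATC overhang, so two different enzymes can
produce ends that join to each other, whereas EcoRI and BamHI ends cannot.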
fragment was derived, that fragment acquires the property of a DNA marker. Such
DNA markers are a prerequisite for physical and genetic mapping of the genome
of the organism. DNA markers are also of importance for the diagnosis of genetic
diseases. DNA markers can be divided into several different classes depending on
the way in which the markers were selected among the fragments of genomic DNA.
Examples of such classes are anonymous, micro- and minisatellites, restriction
fragment length polymorphism (RFLP) markers, and NotI linking clones.
Vectors and clone libraries of different types can be used to clone markers.
Lambda-based vectors and genomic libraries of different kinds are commonly used
for this purpose. Many different variants of λ-based vectors that combine features
of different cloning vehicles (plasmids, M13 and P1 phages) have been created
for this purpose. The use of each vector is usually limited to a specific task:
the construction of general genomic libraries (which contain all genomic DNA
fragments) or special genomic libraries (which contain only a particular subset of
genomic DNA fragments). Among these special libraries, NotI linking and jumping
libraries have particular value for physical/genetic mapping and sequencing of the
human genome. Shotgun and slalom libraries are usually used for sequencing
and comparative genomics.
[Figure residue omitted: allele diagram labels (Allele 1, Allele 2, repeats, 6 kb) and the
restriction-site maps of the vectors λGEM-11, λDASH II, λSK6, and λSK17 belonging to Fig. 2.]
Fig. 2 Examples of λ-based vectors: standard λ vectors (λGEM11, λFIXII, λDASH, λEMBL3, λSK6) and diphasmid vectors (λSK17 and λSK40). Sizes
are in kilobases. Not all restriction sites are shown. Heavy lines represent vector arms, thin lines denote the stuffer fragment; open boxes mark plasmid
and M13 sequences; lacZO is the lac operator sequence. T3, T7, SP6 – promoters for T3, T7, and SP6 RNA polymerases.
Cosmids are essentially plasmids that contain the cos region of phage λ responsible for
packaging of DNA into the phage particle. The advantages of cosmids are easy handling (as
with plasmids) and large cloning capacity. Since the plasmid body is usually small
(3–6 kb), large DNA molecules (46–49 kb) can be cloned in these vectors.

Phasmids are λ phages that have an inserted plasmid. They have the same basic features as
λ phage vectors, but the inserted foreign DNA fragment can be separated from the body of
the phage DNA and converted into the plasmid form. After the conversion, the cloned DNA
fragment will exist as a recombinant plasmid.

Hyphages represent another type of λ-based vector. They were constructed from M13 vectors
with a built-in cos site of λ. Since these vectors have the main features of M13 vectors,
they can be obtained in single-stranded form. Their distinctive feature is the capability
to be packaged with high efficiency into λ-phage-like particles. This decreases the chance
of recovering nonrecombinant vectors and opens the possibility of constructing a
representative genomic library in single-stranded form.

Diphasmids are vectors that combine the advantages of phages (λ and M13) and plasmids.
Diphasmids can be divided into two classes: those that can replicate as phage λ (an
improvement over phasmids) and those that are incapable of replicating as phage λ (i.e.
a cosmid capable of being packaged into phage M13 particles).

In some cases, it is more convenient to work with a genomic library in plasmid rather than
λ phage form. The construction of a representative genomic library directly in a plasmid
vector has several drawbacks and difficulties. However, all these problems can be easily
solved with the help of phasmid and diphasmid vectors. Often a genomic library is
constructed in a λ phage and then converted in its entirety to plasmid form.

2.2 Construction of General Genomic Libraries

Representativity is one of the most important features of a genomic library. In a
“representative” genomic library, every genomic DNA fragment will be present in at least
one of the recombinant phages of the library. In practice, however, this is difficult to
achieve. Some genomic fragments are not clonable because of the strategy used for
construction of the genomic library. For example, if the maximal cloning capacity of the
vector is 18 kb and EcoRI digestion is used to construct the library, no genomic EcoRI
fragments bigger than 18 kb will be present. In some cases, genomic DNA fragments can
suppress the growth of the vector or the host cell, with the result that their cloning is
restricted to specific vector systems.

Another important reason for decreased representativity is the different replication
potential of different recombinant phages. Most researchers work with amplified libraries.
The ligated DNA molecules are packaged into the λ phage particles and plated on a lawn of
E. coli cells (usually many petri dishes are used for such plating). Then all λ phage
particles are eluted from the plaques, and the liquid eluted from all the petri dishes is
mixed. Glycerol or dimethyl sulfoxide is added to the eluate, and the aliquots are kept at
–76 °C. This procedure is called amplification of the library.

With amplification, a library can be kept for many years and can be used for many
experiments by many researchers. On the other hand, each recombinant phage present in the
library gives a single plaque
at the first plating, and since different recombinant phages differ in their growth
potential, there may be differences of 100 times in the abundance of clones after
amplification. This means that in the amplified library, some of the phages are present
100 times more often than others. In this case, to recover all recombinant phages obtained
after packaging into λ phage particles, one needs to plate 100 times more phages than were
obtained after the original plating. Since such quantities are difficult to achieve in
practice, some recombinant phages are virtually lost from the library after amplification.

How is the representativity of a library estimated? A library is considered to be
representative if after the first plating (before amplification) it contains recombinant
clones whose inserts together amount to 7 to 10 genome equivalents. For example, human
genomic DNA contains approximately 3 × 10^9 bp. If the vector contains on average 15 kb
inserts, the representative library should contain 1.4 to 2.0 × 10^6 recombinant clones.

The way in which the genomic DNA fragments are produced for cloning is also important. The
more randomly the genomic DNA is broken, the more representative the resulting library.
Clearly, the EcoRI enzyme (6 bp recognition site) will cut genomic DNA less randomly than
Sau3AI (4 bp recognition site). Shearing of DNA molecules by physical methods (e.g.
syringe, sonication) is probably the most reliable way to obtain randomly broken DNA
molecules.

An important characteristic feature of a library is the percentage of recombinants. For
most purposes, if a library contains more than 80% of recombinant phages, it is better to
omit the genetic selection procedure because it usually decreases the representativity of
the library. To calculate the percentage of recombinants, one can use genetic (Spi)
selection as in the case of λEMBL vectors. Another approach relies on blue–white color
identification (e.g. λ-Charon series). A third class of vectors has both genetic selection
and blue–white color identification (λSK4, λSK6). There are three commonly used ways to
construct genomic libraries (Table 1, Fig. 3).

The original (“classical”) method includes generation of sheared genomic DNA fragments
using physical or enzymatic manipulations followed by the physical separation of fragments
of a particular size using, for example, ultracentrifugation or gel electrophoresis. The
vector DNA is digested with two (or even three) restriction enzymes whose recognition
sites are located in the polylinker and are separated by a few base pairs. The arms and
the stuffer piece are purified from the small oligonucleotide molecules released after the
digestion by, for example, precipitation with polyethylene glycol (PEG) 6000. The stuffer
piece and both arms will now have different sticky ends, preventing re-creation of the
original vector molecules during subsequent ligation with genomic DNA fragments.

In the second “dephosphorylation” approach, the phage arms are prepared by simultaneous
digestion with two restriction enzymes as shown earlier. Genomic DNA is partially digested
to the extent that DNA fragments with sizes in the range of 15 to 20 kb will represent the
majority of the products. These DNA fragments are dephosphorylated to prevent their
ligation to each other and are then ligated to the vector arms. If genomic DNA fragments
that are too big or too small are ligated to the phage arms, size limitations will make it
impossible for these recombinant molecules to yield viable phages. Compared to the preceding
Tab. 1 Comparison of three basic methods used in construction of representative genomic libraries.

Columns: (1) DNA [µg] needed to construct a representative genomic library; (2) DNA
fractionation; (3) self-ligation of vector DNA; (4) self-ligation of genomic DNA;
(5) effectiveness of packaging per microgram of vector DNA; (6) effectiveness of packaging
per microgram of genomic DNA; (7) number of packaging reactions to get a representative
library at maximal efficiency per microgram of genomic DNA; (8) genetic selection
necessary to remove nonrecombinants.

Method               (1)        (2)   (3)   (4)   (5)          (6)          (7)   (8)
Classical            100–1000   Yes   Yes   Yes   10^5–10^7    10^5–10^7    1     Yes
Dephosphorylation    5–10       No    Yes   No    10^4–10^5    10^5–10^6    3–5   Yes
Partial filling in   5–10       No    No    No    10^5–10^7    10^5–10^7    1     No
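
The size of a representative library, as defined in the text (7 to 10 genome equivalents
before amplification), follows from simple arithmetic. The Python sketch below reproduces
the human-genome example; the genome size and insert size are the values quoted in the
text, while the function name is an illustrative assumption.

```python
import math


def clones_for_coverage(genome_bp, insert_bp, genome_equivalents):
    """Number of recombinant clones whose inserts add up to the requested
    number of genome equivalents."""
    return math.ceil(genome_equivalents * genome_bp / insert_bp)


HUMAN_GENOME_BP = 3_000_000_000   # approximate size quoted in the text
INSERT_BP = 15_000                # average insert size quoted in the text

low = clones_for_coverage(HUMAN_GENOME_BP, INSERT_BP, 7)
high = clones_for_coverage(HUMAN_GENOME_BP, INSERT_BP, 10)
print(low, high)
```

This prints 1400000 and 2000000, matching the 1.4 to 2.0 × 10^6 recombinant clones given
in the text. The same function can be used for other genomes or insert sizes.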
[Figure residue omitted: two panels labelled “Packaging in vitro into λ phage particles”
and a cosmid vector map (SK18) with the labels ori, M13 ori, Amp^r, lacI, and lacZ.]
Fig. 4 One approach to the construction of genomic libraries in cosmid vectors (not all
restriction sites are shown): A, AccI; B, BamHI; H, HindIII; R, EcoRI; S, SmaI.
method, this procedure is quick, and representative libraries can be obtained from a small
quantity of genomic DNA.

The third “partial filling-in” method also avoids fractionation steps (Fig. 3). Phage arms
are prepared as described before (in this particular case SalI and EcoRI are shown, but
many other combinations can be used), and the sticky ends produced are partially filled in
with the Klenow fragment of DNA polymerase I (or other DNA polymerase) in the presence of
dTTP and dCTP. Genomic DNA partially digested with Sau3AI is also partially filled in, but
in the presence of dATP and dGTP. Under such conditions, self-ligation of vector arms or
genomic DNA is impossible.

Genomic libraries can be constructed in cosmids using the same approaches just described.
The absence of selection against nonrecombinant vector and the possibility of packaging
into phage particles concatemers composed solely of cosmid fragments make it even more
important to prevent self-ligation of vector fragments. Many similar approaches have been
suggested, and one of them is shown in Fig. 4, where the formation of vector concatemers
is prevented by partial filling in.

A similar effect can be achieved by dephosphorylation or by digestion at the first step
with AccI and SmaI instead of EcoRI and HindIII. SmaI produces blunt ends and AccI gives
sticky ends with only two protruding bases. The ligation of these ends will be far less
effective than that for BamHI and Sau3AI sticky ends (four protruding bases).

2.3 Construction of Jumping and Linking Libraries. Use of Linking and Jumping Clones to
Construct a Physical Chromosome Map

For long-range mapping and cloning of large stretches of genomic DNA, the
two most widely used methods are construction of overlapping DNA sequences (contigs) using
chromosome walking (e.g. BAC cloning) and chromosome jumping. The technique of BAC cloning
is now used by many laboratories. Still, this approach is not devoid of problems and
drawbacks. These problems could be diminished, however, by using jumping/linking
libraries. Moreover, jumping and linking libraries can be used independently for
construction of a long-range restriction map using pulsed field gel electrophoresis
(PFGE). Jumping clones contain DNA sequences adjacent to neighboring NotI sites, and
linking clones contain DNA sequences surrounding the same restriction site.

The two best-known kinds of jumping libraries are the NotI jumping library and the
“general” jumping (hopping) library. The basic principle of both methods is to clone only
the ends of large DNA fragments rather than continuous DNA segments, as in BAC clones.
Internal DNA is deleted by controlled biochemical techniques. The main difference is that
the first type of library uses complete digestion with a rare cutting enzyme (NotI is the
most popular), whereas the second is based on a partial digestion with a frequently
cutting enzyme, followed by isolation of DNA fragments of the desired size. Using the
first type of library, it is possible to jump over long distances (>1000 kb), but only
from certain starting points (i.e. those containing the recognition site for the rare
cutting enzyme). Using the hopping library, it is possible to start jumping from
practically any point and to cover a defined but shorter distance (<150 kb). Only the
first type of jumping library can be used in conjunction with linking libraries to create
genomic maps, as described next.

There are two main approaches for the construction of NotI jumping and linking libraries.
According to the first method (Fig. 5a), jumping libraries are constructed as follows: DNA
of high molecular mass, isolated in low-melting agarose, is completely digested with NotI.
The DNA is ligated at very low concentration, in the presence of a dephosphorylated
plasmid containing a marker (supF gene), and is then circularized, trapping the supF gene,
which acts as a marker to select clones that contain the ends of a long fragment. Another
enzyme, one that has no recognition site in the plasmid, is used to digest the large
circular molecules into small fragments, each of which is cloned in a vector phage
carrying amber mutations. Recombinant phages containing the plasmid with the two terminal
fragments are selected in an E. coli strain lacking the suppressor gene.

The linking library can be constructed in different ways. In the original protocol, the
genomic DNA is partially digested with Sau3A and size selected to obtain 10 to 20 kb
fragments. The DNA is then diluted and circularized in the presence of the supF marker
plasmid. The circular products are digested with NotI, ligated into a NotI-digested
suppressor-dependent vector (NotEMBL3A), and plated on a suppressor-negative host.

In another approach, DNA from a total genomic library in a circular form (e.g. cosmid) is
digested with NotI, and a selectable marker (e.g. resistance to an antibiotic) is
inserted into recombinants containing this site. Then these recombinants are selected by
their resistance to the antibiotic.

The most important drawback is that all these methods used to construct linking libraries
exploit strategies and vectors different from those used to construct jumping libraries.
Thus, some fragments
[Figure residue omitted (apparently Fig. 5; caption not preserved). Recoverable labels:
genomic DNA; JUMPING: NotI complete digestion, ligation, ligation into λ vector; LINKING:
Sau3A partial digestion, selection of 20-kb DNA fragments, circularization in the presence
of the supF marker; 5′-GATC ends after partial filling-in with dATP + dGTP (cannot ligate).]
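
The effect of partial filling-in on ligation can be checked with a small model. The Python
sketch below is a simplified illustration, not a description of the laboratory protocol:
it assumes the standard 5′ overhangs TCGA (SalI) and GATC (Sau3AI, BamHI), treats the
polymerase as adding only the supplied nucleotides, and ignores exonuclease activity. The
function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def partial_fill_in(overhang, dntps):
    """Return what is left of a 5' overhang (written 5'->3') after a polymerase
    extends the recessed 3' end using only the supplied dNTPs.

    Synthesis starts opposite the overhang base next to the duplex (the last
    character of the string) and stops at the first template base whose
    complementary nucleotide is not supplied.
    """
    remaining = overhang
    while remaining and COMPLEMENT[remaining[-1]] in dntps:
        remaining = remaining[:-1]
    return remaining


def can_anneal(overhang_a, overhang_b):
    """Two 5' overhangs can base-pair if one is the reverse complement of the other."""
    return bool(overhang_a) and overhang_a == reverse_complement(overhang_b)


vector_end = partial_fill_in("TCGA", {"T", "C"})   # SalI end with dTTP + dCTP
insert_end = partial_fill_in("GATC", {"A", "G"})   # Sau3AI or BamHI end with dATP + dGTP

print(vector_end, insert_end)
print(can_anneal(vector_end, insert_end))   # vector arm to genomic insert
print(can_anneal(vector_end, vector_end))   # vector arm to vector arm
print(can_anneal(insert_end, insert_end))   # insert to insert
```

Under these assumptions the vector end is reduced to TC and the genomic end to GA. The two
can pair with each other, but neither can pair with itself, which is consistent with the
statement that self-ligation of vector arms or of genomic DNA is prevented.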
present in one library (e.g. jumping) will be absent in another (e.g. linking), which
creates serious problems for the use of these libraries in mapping.

An integrated approach for construction of jumping and linking libraries is outlined in
Fig. 5(b). The most important feature here is that the same vectors and
protocol are used for construction of both libraries.

For the linking library, genomic DNA is completely digested with BamHI. Subsequently, the
DNA is self-ligated at a very low concentration (without a supF marker) to yield circular
molecules as the main product. To eliminate any linear molecules, the sticky ends are
partly filled in with the Klenow fragment in the presence of dATP and dGTP. Since the
Klenow fragment also has exonuclease activity, all the BamHI sticky ends are neutralized
and nearly all ends generated upon random DNA breakage become unavailable for ligation.
Subsequently, the DNA is cut with NotI and cloned in λSK4, λSK17, and λSK22 vectors.

The same strategy is applied in the construction of a NotI jumping library. One initial
step is added: genomic DNA is fully digested with NotI and self-ligated at a very low
concentration. The subsequent steps are the same as for the linking library. An obvious
difference between this approach and the preceding ones is that the procedure combines a
biochemical selection for NotI jumping fragments with improved ligation kinetics during
the preparation of the libraries (see Table 2 for a comparison of the two methods).

In theory, a linking library is sufficient to construct a physical chromosomal map. When
linking clones are used to probe a PFGE genomic blot (NotI-digested genomic DNA), each
clone should reveal two DNA fragments that are adjacent in the genome. Thus, in
principle, one should be able to order the rare cutting sites with just a single library
and one digest, although it will generally not be possible to distinguish between two
fragments of the same size. To resolve such ambiguities, it is important to use several
different libraries, each for a particular enzyme, and to overlap the resulting patterns
just as in ordinary restriction
Tab. 2 Comparison of two main methods for construction of Not I jumping libraries.
Fig. 6 Long-range mapping using jumping and linking libraries.
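
When clones in such a map are ordered by end sequencing rather than hybridization, the
approach relies on short sequences being nearly unique, and the text states that even a
20 bp sequence is likely to identify a single position in the human genome. The Python
sketch below gives a back-of-the-envelope check under a strong simplifying assumption:
the genome is treated as a random sequence with equal base frequencies, which real
genomes (with their repeats) are not. The function name is illustrative.

```python
def expected_random_matches(tag_length, genome_bp, both_strands=True):
    """Expected number of chance occurrences of one specific tag in a random
    sequence with equal base frequencies."""
    positions = genome_bp - tag_length + 1
    strands = 2 if both_strands else 1
    return strands * positions / 4 ** tag_length


HUMAN_GENOME_BP = 3_000_000_000  # approximate size used elsewhere in the text

for length in (12, 16, 20):
    print(length, expected_random_matches(length, HUMAN_GENOME_BP))
```

Under this model a 20 bp tag is expected to occur by chance roughly 0.005 times, whereas a
16 bp tag is expected more than once. Repeated sequences break the random-sequence
assumption, which is why the text notes that sequence data must also discriminate between
instances of the same class of repeats.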
fragment analysis. To accomplish this for the whole human genome will be very laborious.
However, for small stretches of the genome containing 5 to 10 NotI sites, this approach
can be efficient. The use of jumping and linking libraries in a complementary fashion
simplifies this approach (Fig. 6).

Moreover, by cross-screening the two libraries, it should be possible in principle to jump
from clone to clone (–jumping–linking–jumping, etc.), to generate an ordered map without
using PFGE techniques at all. One of the main problems with this approach is that the
presence of very CG-rich regions and repeats in human DNA may result in
cross-hybridization between different clones.

In the shotgun sequencing approach for long-range genome mapping, the hybridization
technique is replaced by sequencing. Jumping and linking clones are sequenced from the
ends and, subsequently, the linear order of the NotI clones on a chromosome can be
established using a computer program. Even a 20 bp sequence is likely to uniquely identify
a sequence in the human genome. The sequence data provide a means to discriminate even
between different instances of the same class of repeats.

3 Applications and Perspectives

3.1 Cloning DNA Markers Specific for a Particular Chromosome

It is important, for many purposes, to clone individual DNA markers for a specific
chromosome. The most straightforward approach for such purposes is to prepare special
libraries that contain recombinant clones from a particular chromosome.

One approach is based on the use of the fluorescence-activated cell sorter (FACS) system.
FACS operates on the principle of rapid analysis of suspended particles, single cells, or
even chromosomes. A suspension of chromosomes stained with fluorescent dyes (usually
Hoechst 33258 and chromomycin A3) passes through the focus of a laser beam that excites
the DNA-bound fluorescent dyes. Equal-sized droplets are formed by ultrasonic dispersion,
and the droplets containing the desired chromosome as indicated by the
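[Figure residue omitted (caption not preserved). Recoverable labels: human chromosomal DNA
with Alu repeats separated by intervals of 20, 1, 12, 0.5, 0.8, and 6 kb; different
Alu-specific primers (ZAlu5, ZAlu3); “No PCR amplification”.]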
DNA markers. The approach is mainly used in two variants.

In one variant, Alu-PCR is done using DNA from both cell lines, and the products of the
reactions are separated by agarose gel electrophoresis. Some bands present in the products
from HCL1 will be absent among the products from HCL2. These bands can be excised and
cloned, giving markers localized in the deletion. The disadvantage of this approach is
that such Alu-PCR usually results in a large number of products that have a very complex
pattern and look like a smear on the gel. Among the solutions to this problem that have
been suggested is the use of more specific primers (only for a subset of the Alu repeats),
or of genomic DNA digested with a restriction enzyme. Another solution is to use hybrid
cell lines that contain only small pieces of human chromosomes.

The second variant of the Alu-PCR approach is mainly used in connection with sources that
contain only a limited amount of human material: YAC clones and radiation hybrid cell
lines containing small pieces of the human chromosomes. The YACs can, for example, be used
for Alu-PCR, and the total products of the PCR reaction can be used as a probe to screen
genomic libraries (e.g. in cosmids). The hybridization pattern reveals which cosmids are
present in one YAC, which in other YACs, and which are present in one but absent in
another. Such an approach is also useful for mapping.

Another approach to obtaining region-specific libraries is to use chromosome
microdissection to physically remove the chromosomal region of interest; the minute
quantities of microdissected DNA can be subjected to a microcloning procedure. Spreads of
human chromosomes are made and stained using standard cytogenetic techniques. DNA from an
individual band is then cut from the chromosome using ultrafine glass needles or is
isolated with the help of laser equipment. In the latter case, all other chromosomes are
destroyed by the laser, and intact DNA is present only in the chromosome of interest. DNA
obtained from only a few (2–20) chromosomes is enough for constructing a region-specific
library. This DNA is amplified using PCR and cloned in plasmid or λ vectors.

3.2 CpG Islands as Powerful Markers for Genome Mapping; CpG Islands and Functional Genes

Although human DNA is highly methylated, stably unmethylated sequences (about 1% of the
genome) have been
[Figure residue omitted. Recoverable step labels of the first scheme (caption not
preserved): R-digestion; dilution and self-ligation; N-digestion; NotI linker ligation;
PCR amplification with dNTP or with dUTP; ratio 1:100; denaturation and hybridization;
UDG and mung bean nuclease treatment. Recoverable step labels of Fig. 9: DNA A and DNA B;
BamHI digestion; ligation of linkers; PCR amplification with dUTP and d5mCTP (m) on one
side and with dCTP and biotinylated primers (b) on the other; denaturation and
hybridization giving homohybrids A, heterohybrids, and homohybrids B; digestion with MvnI
(destroys homohybrids B); mung bean nuclease (destroys imperfect hybrids and
single-stranded DNA); purification of biotinylated molecules with streptavidin beads;
PCR amplification.]
Fig. 9 General scheme of the CIS (cloning of identical sequences) procedure. The same
enzymes are used as in the CODE method but in a different order, and the result is the
opposite.
valuable. This method involves end sequencing of large-insert (PAC, BAC, cosmid) and
small-insert (2 and 10 kb) clones. DNA fragments in the small-insert clones are generated
by physical shearing of whole genomic DNA. All resulting reads are joined in one sequence
with special computer programs. The WGS method requires the generation of sequences
covering the whole genome 10 to 15 times. If sequence coverage is less, then the contig
assembly process cannot be completed, and sequences and clones will just represent
islands, without connection or order. Therefore, despite impressive technological
progress, mapping and sequencing of even small bacterial genomes is expensive and
laborious.

After completion of the genomic sequence from one organism, there will be a great demand,
in many cases, for comparison with the genomes of other individuals, related species,
pathogenic and nonpathogenic strains, and so on, in the growing field of comparative
genomics. Such comparisons are highly relevant to our understanding of human and animal
health, evolution, and ecology.

An efficient strategy for simultaneous genome mapping and sequencing was recently
developed. The approach is based on slalom libraries, which combine features of general
genomic, jumping, and linking libraries. First experiments demonstrated the feasibility of
the approach, and showed that the efficiency (cost-effectiveness and speed) of existing
mapping/sequencing methods can be improved at least 5- to 10-fold. The slalom approach
allows the establishment of a physical map, with minimal sets of overlapping clones, which
will pinpoint differences in genome organization between organisms. At the same time,
considerable sequence coverage of the genome (about 50%) will be achieved. This will make
it possible to locate virtually every gene in a genome, for more detailed (comparative)
study. Furthermore, since the efficiency of contig assembly in the slalom approach is
virtually independent of sequence read length, even short sequences, as produced by rapid
high-throughput sequencing techniques, suffice to complete a physical map and sequence
scan of a small genome. A combination of these new sequencing techniques with the slalom
approach increases the power of the method a further 10- to 50-fold and makes it an
efficient tool for comparative genomics. The main principle of the slalom libraries is
shown in Fig. 10.

Two standard genomic libraries, one EcoRI-digested and one BamHI-digested, are
constructed, and they will completely cover the whole genome. The problem is how to put
EcoRI and BamHI fragments in the correct order. It can be solved using the connecting
library. The connecting library can be constructed as follows: DNA isolated “en masse”
from an EcoRI library is digested with BamHI, circularized in the presence of the Kan^r
gene, and plated on agar with kanamycin. The clones isolated in this manner will be
identical in structure to the clones from an EcoRI jumping library prepared in the
classical way. By comparing end sequences in these three slalom libraries, all clones can
be positioned relative to each other and a minimal contig of overlapping clones may be
established.

3.6 Restriction-site-tagged Microarrays to Study CpG-Island Methylation

Methylation, deletions, and amplifications of cancer genes constitute important mechanisms
in carcinogenesis, and CGI (CpG-island-containing) microarrays were
[Figure residue omitted (apparently Fig. 10, principle of the slalom libraries).
Recoverable labels: genomic DNA; slalom library; EcoRI; connecting; contig of overlapping
clones; numbered fragments 1 to 9.]
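
The geometry behind the slalom principle can be illustrated with a toy model. In the
Python sketch below, the positions of EcoRI and BamHI sites on a linear genome are simply
assumed to be known, and the code finds the BamHI fragment that bridges each pair of
neighbouring EcoRI fragments. In a real experiment these links would have to be inferred
from end sequences of the three libraries, so this is only a sketch of why two complete
digests overlap, not the published procedure. All function names and coordinates are
hypothetical.

```python
def fragments(sites, genome_length):
    """Intervals (start, end) produced by a complete digest of a linear genome."""
    cuts = [0] + sorted(sites) + [genome_length]
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if b > a]


def spanning_fragment(frags, position):
    """Return the fragment that strictly contains the position, or None."""
    for start, end in frags:
        if start < position < end:
            return (start, end)
    return None


def slalom_links(eco_sites, bam_sites, genome_length):
    """For each pair of neighbouring EcoRI fragments, report the BamHI fragment
    that covers the EcoRI site between them."""
    eco = fragments(eco_sites, genome_length)
    bam = fragments(bam_sites, genome_length)
    links = []
    for left, right in zip(eco, eco[1:]):
        junction = left[1]  # EcoRI site shared by the two neighbouring fragments
        links.append((left, spanning_fragment(bam, junction), right))
    return links


# Hypothetical site positions on a 100-unit genome.
for left, bridge, right in slalom_links([30, 70], [50], 100):
    print("EcoRI", left, "-> BamHI", bridge, "-> EcoRI", right)
```

Each link alternates between the two digests, which is the "slalom" path through the
genome. If an EcoRI site coincides with a BamHI site, no bridging fragment exists and the
sketch reports None for that junction.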
this NotI site is deleted or methylated, then the NR will not contain
the corresponding DNA sequences. The NotI microarrays can
simultaneously detect copy-number changes and methylation and,
therefore, they allow the simultaneous study of genetic and epigenetic
factors.

The technique underlying the preparation and use of NotI microarrays is
applicable to any restriction enzyme and represents a new type of
microarray, referred to as restriction-site-tagged (RST) microarrays.
Such RST microarrays can be used for different purposes, for example,
to study species composition of complex microbial systems.

3.7
Restriction-site-tagged Sequences to Study Biodiversity

There is still much to learn about the normal human microflora. The
human gut contains approximately 1 to 2 kg of bacterial cells. The
number of these cells in the intestine is 10 to 100 times larger than
the number of cells in the human body, but at best 10 to 15% of the
microbial species are known. The ability to analyze complex microbial
mixtures is of great importance for many applications. For instance,
differences between individuals in the composition of the normal flora
will be instrumental for future analysis of the effects of diet, foods,
geographical location, and medication on that composition. Conversely,
the effects of gut microflora on aging, autoimmunity, and colonic
cancer risk can be studied.

Analysis of human NotI flanking sequences (see Sect. 2.3) has
demonstrated that even short sequences surrounding NotI sites can yield
information sufficient to isolate new genes or uniquely describe
eukaryotic or prokaryotic genomes. These results led to the realization
that it would be possible to use short sequences surrounding NotI sites
or, in general, restriction-site-tagged sequences (RSTS) for the
analysis of complex microbial mixtures. The collection of NotI tags
represents the NotI sequence passport or, in short, the NotI passport.
NotI passporting means the process of creating NotI tags/passports.

The general design of the experiment is as follows. Genomic DNA is
digested with NotI and ligated to a linker with NotI sticky ends. This
linker contains BpmI recognition sites; BpmI cuts 16/14 bp outside of
its recognition site. The ligation mixture is PCR-amplified with
special primers and, finally, 19 bp tags flanking NotI sites are
generated. DNA from, for instance, fecal samples and surgical specimens
is digested with NotI, and a NotI passport for the particular specimen
is generated. A comparison of such passports from different individuals
or from the same individual before and after drug treatment will reveal
the differences between them.

Analysis of tags for NotI, PmeI, and SbfI for 70 completely sequenced
bacteria revealed that more than 95% of tags are species-specific and
even different strains of the same species can be distinguished. None
of these tags matched human or rodent sequences. Therefore, the
approach allows analysis of complex microbial mixtures such as those in
the human gut and identification with high sensitivity of a particular
bacterial strain on a quantitative and qualitative basis.

A similar approach can be used for eukaryotic cells, for example, for
analysis of cancer cells.