

Genomic DNA Libraries, Construction and Applications

Eugene R. Zabarovsky
Microbiology and Tumor Biology Center, Karolinska Institute, Stockholm, Sweden

1 Principles

2 Techniques
2.1 General Characteristics of λ-based Vectors Used for Construction of
Genomic Libraries
2.2 Construction of General Genomic Libraries
2.3 Construction of Jumping and Linking Libraries. Use of Linking
and Jumping Clones to Construct a Physical Chromosome Map

3 Applications and Perspectives
3.1 Cloning DNA Markers Specific for a Particular Chromosome
3.2 CpG Islands as Powerful Markers for Genome Mapping;
CpG Islands and Functional Genes
3.3 Alu-PCR and Subtractive Procedures to Clone CpG Islands
from Defined Regions of Chromosomes
3.4 IBD (Identical-by-descent) Fragments for Identification
of Disease Genes
3.5 Strategies to Map and Sequence Genomes; Hierarchical, Whole-genome,
and Slalom Sequencing Approaches
3.6 Restriction-site-tagged Microarrays to Study CpG-Island Methylation
3.7 Restriction-site-tagged Sequences to Study Biodiversity

4 Summary

Bibliography
Books and Reviews
Primary Literature

Encyclopedia of Molecular Cell Biology and Molecular Medicine, 2nd Edition. Edited by Robert A. Meyers.
Copyright © 2004 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
ISBN: 3-527-30547-5

Keywords
Blue–white Selection
Not really selection but color identification. Vectors carrying the β-galactosidase (lacZ)
gene (or part of it) produce blue plaques in the presence of 5-bromo-4-chloro-3-indolyl-
β-D-galactopyranoside (X-gal). If this gene is located in a stuffer fragment, then all
recombinants will form white plaques and parental vectors will produce blue plaques
in the presence of X-gal.

Genetic Selection
Usually, in cloning, selection against parental, nonrecombinant molecules in favor of
the recombinant. For λ-based vectors used for construction of genomic libraries, the
two most commonly used types of selection are Spi and supF. Spi+ phages carrying red
and gam genes cannot grow in E. coli lysogens carrying prophage P2; since, however,
the majority of the vectors contain these genes in a stuffer fragment, only recombinant
phages can grow in such E. coli strains. Selection for supF exploits λ vectors carrying
amber mutations. These vectors cannot replicate without the supF gene, which must be
present either in the host or in the cloned insert. If the insert carries the supF gene,
only recombinant phages will be able to replicate in an E. coli host without the
suppressor gene.

Polylinker
A short DNA fragment (in the vector) that contains recognition sites for many
restriction enzymes, which can be used for cloning DNA fragments into this vector.

Restriction Enzyme
An enzyme that recognizes a specific sequence in DNA and can cut at or near this
sequence. In cloning procedures, the most commonly used enzymes produce specific
protruding (sticky) ends at the ends of the DNA molecule. Each enzyme produces
unique sticky ends. The DNA molecules possessing the same sticky ends can be
efficiently joined with the aid of DNA ligase (see ligation).
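
As an illustration of this definition, the following Python sketch locates recognition sites in a sequence and cuts it. The enzyme table holds standard recognition sequences; the function names and the test sequence are illustrative assumptions, not part of any published protocol. The last function gives the mean site spacing expected for random DNA with equal base frequencies, which real genomes only approximate.

```python
# Recognition sequence (5'->3', top strand) and offset of the top-strand cut
# within the site. "^" in the comments marks the cut position.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "Sau3AI": ("GATC", 0),     # ^GATC
    "NotI": ("GCGGCCGC", 2),   # GC^GGCCGC
}


def cut_positions(seq, enzyme):
    """Return the top-strand cut positions (0-based) for one enzyme."""
    site, offset = ENZYMES[enzyme]
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, enzyme):
    """Cut a linear sequence at every site and return the nonempty fragments."""
    bounds = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def expected_spacing(enzyme):
    """Mean distance (bp) between sites in random DNA with equal base frequencies."""
    site, _ = ENZYMES[enzyme]
    return 4 ** len(site)


example = "AAGAATTCTTGGATCCAA"           # hypothetical test sequence
eco_fragments = digest(example, "EcoRI")  # ['AAG', 'AATTCTTGGATCCAA']
```

In this model BamHI and Sau3AI cut the example at the same position, which reflects the fact that both leave the same GATC protruding end.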

Sequence-tagged Site (STS)
A short (200–500 bp) sequenced fragment of genomic DNA that can be specifically
amplified using PCR. An STS represents or is linked to some kind of marker (i.e.
it is mapped to a specific locus on a chromosome).

By virtue of the powerful technology developed in molecular biology, it is
possible to isolate any DNA fragment in the genome of an organism and, after
reverse transcription, any transcribed gene in the form of a complementary DNA.
The isolation (cloning) procedure involves the insertion of the DNA fragment
into a vector, capable of replication in a microorganism, which allows production
of large quantities of the DNA fragment for physical or biological analysis. Upon
determination of the location in the genome from which the particular DNA
fragment was derived, that fragment acquires the property of a DNA marker. Such
DNA markers are a prerequisite for physical and genetic mapping of the genome
of the organism. DNA markers are also of importance for the diagnosis of genetic
diseases. DNA markers can be divided into several different classes depending on
the way in which the markers were selected among the fragments of genomic DNA.
Examples of such classes are anonymous, micro- and minisatellites, restriction
fragment length polymorphism (RFLP) markers, and NotI linking clones.
Vectors and clone libraries of different types can be used to clone markers.
Lambda-based vectors and genomic libraries of different kinds are commonly used
for this purpose. Many different variants of λ-based vectors that combine features
of different cloning vehicles (plasmids, M13 and P1 phages) have been created
for this purpose. The use of each vector is usually limited to a specific task:
the construction of general genomic libraries (which contain all genomic DNA
fragments) or special genomic libraries (which contain only a particular subset of
genomic DNA fragments). Among these special libraries, NotI linking and jumping
libraries have particular value for physical/genetic mapping and sequencing of the
human genome. Shotgun and slalom libraries are usually used for sequencing
and comparative genomics.

1 Principles

In molecular biology, “cloning” is the insertion of DNA with interesting
information into a specific vector that allows replication and transfer of the
cloned DNA from one host to another. The vector containing the inserted DNA is
called a recombinant vector to distinguish it from its parental vector, which
does not contain any foreign DNA. Usually, “interesting information” is a piece
of DNA obtained from any target organism; it can be a gene (or part of a gene)
or simply anonymous DNA sequences for which no function is yet known. It can
originate directly from DNA or can be obtained from reverse transcription of
RNA molecules. The main idea of cloning is to obtain the interesting piece of
DNA in a quantity large enough for analysis and further experiments. Now, the
vectors and strategies used for cloning come in many different types. This
chapter concentrates on the widely used λ-based vectors and the construction of
genomic libraries, which played an important role in the Human Genome Project.

A genomic library is a collection of recombinant vectors; it contains DNA
fragments representing the genome of a particular organism. Genomic libraries
can be either general, containing DNA fragments covering the whole genome, or
special, containing only specific genomic fragments that differ in certain
parameters. Some are CG rich whereas others contain only particular size
fragments of DNA obtained after digestion with a particular restriction enzyme,
contain specific repeats, and so on. Important special genomic libraries are
the jumping and linking types (see Sect. 2.3).

Cloned DNA fragments can be located to a specific site of a chromosome, after
which they can serve as markers for physical and genetic mapping. Different
types of markers are used. The so-called
anonymous markers represent randomly cloned DNA fragments whose functions or
specific features are not known. Other DNA markers can possess specific
features. They can contain a known gene or expressed sequences with unknown
function, CpG islands (also associated with genes, see Sect. 3.2), or
recognition sites for rare cutting restriction enzymes convenient for
long-range mapping. Such markers can be polymorphic, that is, they have
different structures in different individuals (they are usually distinguished
on the basis of different mobility in gel electrophoresis). Such markers are
extremely important in mapping and cloning human disease genes and for
construction of genetic maps. Three types of polymorphic markers are commonly
used (Fig. 1).

Single-nucleotide polymorphism (SNP) markers are DNA fragments that have a
point mutation in some individuals of a population. The advantages of SNPs are
their abundance (>10⁶) and the fact that they can be detected by
nonelectrophoretic methods, for example, using microarrays. However, an SNP
usually has only two alleles. A subtype of SNPs are RFLP markers, which
recognize genomic fragments containing polymorphic recognition sites for a
particular restriction endonuclease (e.g. TaqI, MspI). The same chromosomal
regions in different individuals contain or lack this recognition site.

A second form of DNA polymorphism results from variation in the number of
tandemly repeated DNA sequences (VNTR) at a particular locus. Usually, these
markers are divided into two types: mini- and microsatellites. Minisatellites
are DNA fragments 0.1 to 20 kb long that contain many copies (from 3 to more
than 40) of
Fig. 1 The difference between RFLP and VNTR polymorphism.
[Diagram not reproduced. RFLP markers: allele 1 gives BamHI fragments of 2 kb
and 4 kb, allele 2 a single 6-kb fragment. VNTR markers: alleles 1, 2, and 3
carry different numbers of repeats between two BamHI sites and give fragments
of 0.6, 1.4, and 1.6 kb.]
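
The fragment patterns of Fig. 1 can be reproduced with a short Python sketch. The site coordinates, the 0.2-kb flank, and the 40-bp repeat unit are hypothetical values chosen only so that the output matches the sizes given in the figure.

```python
def fragment_sizes(site_positions_kb):
    """Sizes (kb) of the fragments between consecutive restriction sites."""
    sites = sorted(site_positions_kb)
    return [round(b - a, 3) for a, b in zip(sites, sites[1:])]


def vntr_fragment_kb(flank_kb, repeat_unit_bp, copies):
    """Size (kb) of a fragment made of unique flanks plus a tandem repeat array."""
    return round(flank_kb + repeat_unit_bp * copies / 1000, 3)


# RFLP: allele 2 lacks the internal BamHI site present in allele 1.
rflp_allele_1 = fragment_sizes([0, 2, 6])   # [2, 4]
rflp_allele_2 = fragment_sizes([0, 6])      # [6]

# VNTR: identical flanking BamHI sites, different copy numbers (assumed values).
vntr_alleles = [vntr_fragment_kb(0.2, 40, n) for n in (10, 30, 35)]  # [0.6, 1.4, 1.6]
```

An RFLP locus therefore yields at most two patterns per site, whereas a VNTR locus yields one fragment size per copy number, which is why many alleles can be distinguished.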

6 to 60 bp repeats. All these repeats share a 10 to 15 bp core sequence similar
to the generalized recombination signal (chi) of E. coli. When DNA from
different individuals is digested with a restriction enzyme that does not cut
inside these repeats, the length of the fragments produced will depend on the
number of repeats at the locus. Since the minisatellites and repeats constitute
a relatively large fragment, it is possible to discriminate between different
alleles using ordinary nondenaturing gel electrophoresis and Southern blot
analysis. Many different alleles (usually more than five) can be distinguished
at a locus containing minisatellites. Minisatellites cluster around the distal
ends of human chromosomes and, sometimes, are located near the genes.

Microsatellites are relatively short DNA fragments (usually <100 bp) with
repeat units from 1 to 5 bp (such as A, AC, AG, AAC, AAAG, etc.). They are
numerous (5 × 10⁵) and uniformly distributed throughout the human genome with
an estimated average spacing of about 6 kb. Microsatellites can be very
polymorphic (more than 10 alleles at the same locus), and polymorphism usually
increases with the number of repeats. Microsatellites with fewer than 10 copies
are usually not polymorphic. Since microsatellites are short, they can be
analyzed quickly by using the polymerase chain reaction (PCR) with primers
flanking each locus. Different alleles can be resolved using denaturing gel
electrophoresis. Very frequently, microsatellites are associated with Alu
repeats (see Sect. 3.1) and this creates problems for their use with PCR, since
Alu repeats are conserved in the human genome. Thus, flanking primers for a
microsatellite located in an Alu repeat are unlikely to be useful because they
prime from many Alu repeats and, instead of discrete bands, give a smear after
electrophoretic separation of the PCR products.

An optimal marker should have the features of all the different types of
markers just discussed. The ideal marker should contain (1) a gene, (2) a CpG
island that has been shown to be very conserved in the genome and can be used
for comparative gene mapping in different species, (3) a rare cutting
restriction site useful for physical mapping, and (4) polymorphic sequences
(e.g. microsatellites). One of the best candidates for having all these
features together is NotI linking clones, that is, recombinant clones
containing the NotI restriction site.

2 Techniques

2.1 General Characteristics of λ-based Vectors Used for Construction of
Genomic Libraries

Among other more modern vectors used for construction of genomic libraries
(yeast, bacterial and P1 artificial chromosomes; YAC, BAC, and PAC
respectively), phage λ-based vectors are still very popular. The reason is that
the genetics and features of both λ phage and E. coli (the host for λ phage)
are well known. The size of the phage DNA that can be packaged into viable
phage particles is limited to the range 37.7 to 52.9 kb. This means that it is
possible to biologically regulate the size range of the cloned DNA fragment.
There exist extremely efficient in vitro systems for packaging such DNA into λ
phage particles (10⁹ plaque-forming units per µg of DNA) to produce viable
phages. To combine the features of different vector systems, extensive
modifications of λ phage vectors were developed (Fig. 2).
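
The packaging limits translate directly into a window of clonable insert sizes. The Python sketch below assumes that the 20.3-kb and 9.2-kb segments drawn for λEMBL3 in Fig. 2 are the left and right arms; the function names are illustrative.

```python
PACKAGE_MIN_KB = 37.7   # smallest λ DNA that yields viable particles (from the text)
PACKAGE_MAX_KB = 52.9   # largest λ DNA that yields viable particles (from the text)


def insert_size_window(left_arm_kb, right_arm_kb):
    """Smallest and largest insert (kb) that keep arms + insert packageable."""
    arms = left_arm_kb + right_arm_kb
    smallest = max(0.0, PACKAGE_MIN_KB - arms)
    largest = PACKAGE_MAX_KB - arms
    return round(smallest, 1), round(largest, 1)


def is_packageable(left_arm_kb, right_arm_kb, insert_kb):
    """True if the recombinant molecule falls inside the packaging window."""
    total = left_arm_kb + right_arm_kb + insert_kb
    return PACKAGE_MIN_KB <= total <= PACKAGE_MAX_KB


# Assumed arm sizes for λEMBL3, read from Fig. 2.
window = insert_size_window(20.3, 9.2)   # (8.2, 23.4)
```

Under these assumptions, arms joined without any insert (29.5 kb) fall below the lower limit, so the size window itself acts against molecules that have lost the stuffer and gained nothing.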
Fig. 2 Examples of λ-based vectors: standard λ vectors (λGEM11, λFIXII, λDASH,
λEMBL3, λSK6) and diphasmid vectors (λSK17 and λSK40). Sizes are in kilobases.
Not all restriction sites are shown. Heavy lines represent vector arms, thin
lines denote the stuffer fragment; open boxes mark plasmid and M13 sequences;
lacZO is the lac operator sequence. T3, T7, and SP6 denote the promoters for
T3, T7, and SP6 RNA polymerases.
[Vector maps not reproduced. The λEMBL3 map is labelled 20.3, 13.0, and 9.2 kb.]

Cosmids are essentially plasmids that contain the cos region of phage λ
responsible for packaging of DNA into the phage particle. The advantages of
cosmids are easy handling (as with plasmids) and large cloning capacity. Since
the plasmid body is usually small (3–6 kb), large DNA molecules (46–49 kb) can
be cloned in these vectors.

Phasmids are λ phages that have an inserted plasmid. They have the same basic
features as λ phage vectors, but the inserted foreign DNA fragment can be
separated from the body of the phage DNA and converted into the plasmid form.
After the conversion, the cloned DNA fragment will exist as a recombinant
plasmid.

Hyphages represent another type of λ-based vector. They were constructed from
M13 vectors with a built-in cos site of λ. Since these vectors have the main
features of M13 vectors, they can be obtained in single-stranded form. Their
distinctive feature is the capability to be packaged with high efficiency into
λ-phage-like particles. This decreases the chance of recovering nonrecombinant
vectors and opens the possibility of constructing a representative genomic
library in single-stranded form.

Diphasmids are vectors that offer the opportunity to combine the advantages of
phages (λ and M13) and plasmids. Diphasmids can be divided into two classes:
those that can replicate as phage λ (an improvement over phasmids) and those
that are incapable of replicating as phage λ (i.e. a cosmid capable of being
packaged into phage M13 particles).

In some cases, it is more convenient to work with a genomic library in plasmid
than in λ phage form. The construction of a representative genomic library
directly in a plasmid vector has several drawbacks and difficulties. However,
all these problems can be easily solved with the help of phasmid and diphasmid
vectors. Often a genomic library is constructed in a λ phage and then converted
in its entirety to plasmid form.

2.2 Construction of General Genomic Libraries

Representativity is one of the most important features of a genomic library. In
a “representative” genomic library, every genomic DNA fragment will be present
in at least one of the recombinant phages of the library. In practice, however,
this is difficult to achieve. Some genomic fragments are not clonable because
of the strategy used for construction of the genomic library. For example, if
the maximal cloning capacity of the vector is 18 kb and EcoRI digestion is used
to construct the library, no genomic EcoRI fragments bigger than 18 kb will be
present. In some cases, genomic DNA fragments can suppress the growth of the
vector or the host cell, with the result that their cloning can be restricted
to specific vector systems.

Another important reason for decreased representativity is the different
replication potential of different recombinant phages. Most researchers work
with amplified libraries. The ligated DNA molecules are packaged into λ phage
particles and plated on a lawn of E. coli cells (usually many petri dishes are
used for such plating). Then all λ phage particles are eluted from the plaques,
and the liquid eluted from all the petri dishes is mixed. Glycerol or dimethyl
sulfoxide is added to the eluate, and the aliquots are kept at −76 °C. This
procedure is called amplification of the library.

With amplification, a library can be kept for many years and can be used for
many experiments by many researchers. On the other hand, each recombinant phage
present in the library gives a single plaque
at the first plating, and since different recombinant phages differ in their
growth potential, there may be differences of 100 times in the abundance of
clones after amplification. This means that in the amplified library, some of
the phages are present 100 times more often than others. In this case, to
recover all recombinant phages obtained after packaging into λ phage particles,
one needs to plate 100 times more phages than were obtained after the original
plating. Since such quantities are difficult to achieve in practice, some
recombinant phages are virtually lost from the library after amplification.

How is the representativity of a library estimated? A library is considered to
be representative if, after the first plating (before amplification), the
recombinant clones it contains together carry genomic DNA fragments equal to 7
to 10 genome equivalents. For example, human genomic DNA contains approximately
3 × 10⁹ bp. If the vector contains on average 15 kb inserts, the representative
library should contain 1.4 to 2.0 × 10⁶ recombinant clones.

The way in which the genomic DNA fragments are produced for cloning is also
important. The more randomly the genomic DNA is broken, the more representative
the resulting library. Clearly, the EcoRI enzyme (6 bp recognition site) will
cut genomic DNA less randomly than Sau3AI (4 bp recognition site). Probably,
the shearing of DNA molecules using physical methods (e.g. syringe, sonication)
is the most reliable way to obtain randomly broken DNA molecules.

An important characteristic feature of a library is the percentage of
recombinants. For most purposes, if a library contains more than 80% of
recombinant phages, it is better to omit the genetic selection procedure
because it usually decreases the representativity of the library. To calculate
the percentage of recombinants, one can use genetic (Spi) selection as in the
case of λEMBL vectors. Another approach relies on blue–white color
identification (e.g. λ-Charon series). A third class of vectors has both
genetic selection and blue–white color identification (λSK4, λSK6). There are
three commonly used ways to construct genomic libraries (Table 1, Fig. 3).

The original (“classical”) method includes generation of sheared genomic DNA
fragments using physical or enzymatic manipulations followed by the physical
separation of fragments of a particular size using, for example,
ultracentrifugation or gel electrophoresis. The vector DNA is digested with two
(or even three) restriction enzymes whose recognition sites are located in the
polylinker and are separated by a few base pairs. The arms and the stuffer
piece are purified from the small oligonucleotide molecules released after the
digestion by, for example, precipitation with polyethylene glycol (PEG) 6000.
The stuffer piece and both arms will now have different sticky ends, preventing
re-creation of the original vector molecules during subsequent ligation with
genomic DNA fragments.

In the second, “dephosphorylation” approach, the phage arms are prepared by
simultaneous digestion with two restriction enzymes as described above. Genomic
DNA is partially digested to the extent that DNA fragments with sizes in the
range of 15 to 20 kb will represent the majority of the products. These DNA
fragments are dephosphorylated to prevent their ligation to each other and are
then ligated to the vector arms. If genomic DNA fragments that are too large or
too small are ligated to the phage arms, size limitations will make it
impossible for these recombinant molecules to yield viable phages. Compared to
the preceding
Tab. 1 Comparison of three basic methods used in construction of representative
genomic libraries.

Columns:
(1) DNA [µg] needed to construct a representative genomic library
(2) DNA fractionation
(3) Self-ligation of vector DNA
(4) Self-ligation of genomic DNA
(5) Effectiveness of packaging per microgram of vector DNA
(6) Effectiveness of packaging per microgram of genomic DNA
(7) Number of packaging reactions to get a representative library at maximal
    efficiency per microgram of genomic DNA
(8) Genetic selection necessary to remove nonrecombinants

Method              (1)       (2)  (3)  (4)  (5)      (6)      (7)  (8)
Classical           100–1000  Yes  Yes  Yes  10⁵–10⁷  10⁵–10⁷  1    Yes
Dephosphorylation   5–10      No   Yes  No   10⁴–10⁵  10⁵–10⁶  3–5  Yes
Partial filling in  5–10      No   No   No   10⁵–10⁷  10⁵–10⁷  1    No
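
The clone numbers quoted in Sect. 2.2 for a representative human library follow from simple arithmetic, sketched here in Python. The second function is the widely used Clarke–Carbon estimate of the probability that a given sequence is present in the library; it is not part of the text and assumes that inserts are chosen randomly and independently. Function names are illustrative.

```python
import math


def clones_for_coverage(genome_bp, insert_bp, genome_equivalents):
    """Number of recombinant clones whose inserts add up to the given coverage."""
    return math.ceil(genome_bp * genome_equivalents / insert_bp)


def probability_fragment_present(genome_bp, insert_bp, n_clones):
    """Chance that a given sequence is in at least one of n_clones random inserts."""
    fraction = insert_bp / genome_bp          # share of the genome in one insert
    return 1.0 - (1.0 - fraction) ** n_clones


HUMAN_GENOME_BP = 3_000_000_000   # approximate size used in the text
INSERT_BP = 15_000                # average insert size used in the text

low = clones_for_coverage(HUMAN_GENOME_BP, INSERT_BP, 7)     # 1400000
high = clones_for_coverage(HUMAN_GENOME_BP, INSERT_BP, 10)   # 2000000
p_low = probability_fragment_present(HUMAN_GENOME_BP, INSERT_BP, low)
```

With these inputs, seven genome equivalents already give a probability slightly above 0.999 for any single sequence, before any loss caused by amplification.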

Fig. 3 Two approaches for constructing genomic libraries: (a) classical method
and (b) partial filling-in method. Lcos and Rcos: left and right parts of the
cos site, respectively; B, BamHI; R, EcoRI; S, SalI; X, XmaIII.
[Diagram not reproduced. (a) λEMBL3 arms and eukaryotic genomic DNA partially
digested with Sau3AI (15–20 kb fragments isolated) are ligated with T4 DNA
ligase and packaged in vitro into λ phage particles. (b) λSK5 arms filled in
with dCTP and dTTP, and Sau3AI-digested genomic DNA filled in with dATP and
dGTP using the Klenow fragment of E. coli DNA polymerase, retain 5′-TC and
5′-GA protruding ends; they are ligated with T4 DNA ligase and packaged in
vitro.]
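
The logic of the partial filling-in method in Fig. 3(b) can be checked with a small Python model. It treats each 5′ protruding end as a string written 5′ to 3′ and ignores blunt-end ligation and the exonuclease activity of the polymerase; the function names are illustrative assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def partial_fill_in(overhang, dntps):
    """Return the 5' overhang left after fill-in with a limited set of dNTPs.

    The polymerase extends the recessed 3' end, so it copies the overhang
    starting from its duplex-proximal (3') base and stops at the first
    template base whose complement is not among the supplied dNTPs.
    """
    remaining = overhang
    while remaining and COMPLEMENT[remaining[-1]] in dntps:
        remaining = remaining[:-1]
    return remaining


def can_ligate(overhang_a, overhang_b):
    """Two 5' overhangs anneal when one is the reverse complement of the other."""
    return bool(overhang_a) and overhang_a == reverse_complement(overhang_b)


vector_end = partial_fill_in("TCGA", {"T", "C"})   # SalI end with dTTP + dCTP -> 'TC'
insert_end = partial_fill_in("GATC", {"A", "G"})   # Sau3AI end with dATP + dGTP -> 'GA'
```

In this model the filled-in vector and insert ends pair with each other but not with themselves, whereas the untreated SalI and Sau3AI ends are palindromic and self-compatible.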

Fig. 4 One approach to the construction of genomic libraries in cosmid vectors
(not all restriction sites are shown): A, AccI; B, BamHI; H, HindIII; R, EcoRI;
S, SmaI.
[Diagram not reproduced. The vector SK18 is digested with HindIII in one
reaction (A) and with EcoRI in another (B); the ends are partially filled in
with dATP, both preparations are digested with BamHI, and the products are
ligated with 30–45 kb Sau3AI genomic DNA fragments and packaged in vitro.]

method, this procedure is quick, and representative libraries can be obtained
from a small quantity of genomic DNA.

The third, “partial filling-in” method also avoids fractionation steps
(Fig. 3). Phage arms are prepared as described before (in this particular case
SalI and EcoRI are shown, but many other combinations can be used), and the
sticky ends produced are partially filled in with the Klenow fragment of DNA
polymerase I (or other DNA polymerase) in the presence of dTTP and dCTP.
Genomic DNA partially digested with Sau3AI is also partially filled in, but in
the presence of dATP and dGTP. Under such conditions, self-ligation of vector
arms or genomic DNA is impossible.

Genomic libraries can be constructed in cosmids using the same approaches just
described. The absence of selection against nonrecombinant vector and the
possibility of packaging into phage particles concatemers composed solely of
cosmid fragments make it even more important to prevent self-ligation of vector
fragments. Many similar approaches have been suggested, and one of them is
shown in Fig. 4, where the formation of vector concatemers is prevented by
partial filling in.

A similar effect can be achieved by dephosphorylation or by digestion at the
first step with AccI and SmaI instead of EcoRI and HindIII. SmaI produces blunt
ends and AccI gives sticky ends with only two protruding bases. The ligation of
these ends will be far less effective than that for BamHI and Sau3AI sticky
ends (four protruding bases).

2.3 Construction of Jumping and Linking Libraries. Use of Linking and Jumping
Clones to Construct a Physical Chromosome Map

For long-range mapping and cloning of large stretches of genomic DNA, the
two most widely used methods are construction of overlapping DNA sequences
(contigs) using chromosome walking (e.g. BAC cloning) and chromosome jumping.
The technique of BAC cloning is now used by many laboratories. Still, this
approach is not devoid of problems and drawbacks. These problems could be
diminished, however, by using jumping/linking libraries. Moreover, jumping and
linking libraries can be used independently for construction of a long-range
restriction map using pulsed field gel electrophoresis (PFGE). Jumping clones
contain DNA sequences adjacent to neighboring NotI sites, and linking clones
contain DNA sequences surrounding the same restriction site.

The two best-known kinds of jumping libraries are the NotI jumping library and
the “general” jumping (hopping) library. The basic principle of both methods is
to clone only the ends of large DNA fragments rather than continuous DNA
segments, as in BAC clones. Internal DNA is deleted by controlled biochemical
techniques. The main difference is that in the first type of library, complete
digestion with a rare cutting enzyme (NotI is the most popular) is used, and
the second is based on a partial digestion with a frequently cutting enzyme,
followed by isolation of DNA fragments of desired size. Using the first type of
library, it is possible to jump over long distances (>1000 kb), but only from
certain starting points (i.e. those containing the recognition site for the
rare cutting enzyme). Using the hopping library, it is possible to start
jumping from practically any point and to cover a defined but shorter distance
(<150 kb). Only the first type of jumping libraries can be used in conjunction
with linking libraries to create genomic maps, as described next.

There are two main approaches for the construction of NotI jumping and linking
libraries. According to the first method (Fig. 5a), jumping libraries are
constructed as follows: DNA of high molecular mass, isolated in low-melting
agarose, is completely digested with NotI. The DNA is ligated at very low
concentration, in the presence of a dephosphorylated plasmid containing a
marker (supF gene), and is then circularized, trapping the supF gene, which
acts like a marker to select clones that contain the ends of a long fragment.
Another enzyme, one that has no recognition site in the plasmid, is used to
digest the large circular molecules into small fragments, each of which is
cloned in a vector phage carrying amber mutations. Recombinant phages
containing the plasmid with the two terminal fragments are selected in an
E. coli strain lacking the suppressor gene.

The linking library can be constructed in different ways. In the original
protocol, the genomic DNA is partially digested with Sau3A and size selected to
obtain 10 to 20 kb fragments. The DNA is then diluted and circularized in the
presence of supF marker plasmid. The circular products are digested with NotI,
ligated into a NotI-digested suppressor-dependent vector (NotEMBL3A), and
plated on a suppressor-negative host.

In another approach, DNA from a total genomic library in a circular form (e.g.
cosmid) is digested with NotI, and a selectable marker (e.g. resistance to the
antibiotic) is inserted into recombinants containing this site. Then these
recombinants are selected by their resistance to the antibiotic.

The most important drawback is that all these methods used to construct linking
libraries exploit strategies and vectors different from those used to construct
jumping libraries. Thus, some fragments
Fig. 5 Two approaches for the construction of jumping and linking libraries:
(a) using supF marker and (b) using partial filling in. (a) Black boxes, supF
marker; B, BamHI. (b) Black and white bars denote NotI sites and vertical
slashes represent BamHI sites; (b I), (b II), construction of the jumping
library; (b II), construction of the linking library. In this case, digestion
of the genomic DNA with BamHI is the first step.
[Diagram not reproduced. (a) Jumping: complete NotI digestion of genomic DNA,
circularization in the presence of the supF marker, digestion with BamHI, and
ligation into the λ vector λNM1151ABS. Linking: partial Sau3A digestion,
selection of 20-kb DNA fragments, circularization in the presence of the supF
marker, digestion with NotI, and ligation into λNotEMBL3A. (b) I: genomic DNA
is digested with NotI and ligated. II: digestion with BamHI and ligation give
circles and linear molecules; partial filling-in with dATP + dGTP leaves the
linear BamHI ends unable to ligate; digestion with NotI is followed by ligation
with λSK4, λSK17, and λSK22 arms.]
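
Which sequences end up in each library of Fig. 5(b) can be sketched in Python from the positions of NotI and BamHI sites. All coordinates (in kb) are hypothetical, and the model assumes complete digestion and perfect circularization; the function names are illustrative.

```python
from bisect import bisect_left, bisect_right


def linking_clones(not_sites, bam_sites):
    """For each NotI site, return (BamHI site to the left, NotI site, BamHI site to the right)."""
    bam = sorted(bam_sites)
    clones = []
    for n in sorted(not_sites):
        i = bisect_left(bam, n)
        if 0 < i < len(bam):                 # a BamHI site exists on both sides
            clones.append((bam[i - 1], n, bam[i]))
    return clones


def jumping_clones(not_sites, bam_sites):
    """For each pair of neighboring NotI sites, return the two retained end segments."""
    bam = sorted(bam_sites)
    nots = sorted(not_sites)
    clones = []
    for left, right in zip(nots, nots[1:]):
        i = bisect_right(bam, left)          # first BamHI site after `left`
        j = bisect_left(bam, right) - 1      # last BamHI site before `right`
        if i < j:                            # internal DNA between them is deleted
            clones.append(((left, bam[i]), (bam[j], right)))
    return clones


NOT_SITES = [100, 600, 1500]                                  # hypothetical map, kb
BAM_SITES = [40, 95, 108, 300, 590, 612, 900, 1490, 1520]     # hypothetical map, kb

linking = linking_clones(NOT_SITES, BAM_SITES)
jumping = jumping_clones(NOT_SITES, BAM_SITES)
```

In this toy map the jumping clone for the 100–600 kb NotI fragment carries the segments 100–108 and 590–600, each of which is also present in a linking clone, which is what allows the two libraries to be used together.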

present in one library (e.g. jumping) will be absent in another (e.g. linking),
which creates serious problems for the use of these libraries in mapping.

An integrated approach for construction of jumping and linking libraries is
outlined in Fig. 5(b). The most important feature here is that the same vectors
and
protocol are used for construction of both libraries.

For the linking library, genomic DNA is completely digested with BamHI.
Subsequently, the DNA is self-ligated at a very low concentration (without a
supF marker) to yield circular molecules as the main product. To eliminate any
linear molecules, the sticky ends are partly filled in with the Klenow fragment
in the presence of dATP and dGTP. Since the Klenow fragment also has
exonuclease activity, all the BamHI sticky ends are neutralized and nearly all
ends generated upon random DNA breakage become unavailable for ligation.
Subsequently, the DNA is cut with NotI and cloned in λSK4, λSK17, and λSK22
vectors.

The same strategy is applied in the construction of a NotI jumping library. One
initial step is added: genomic DNA is fully digested with NotI and self-ligated
at a very low concentration. The subsequent steps are the same as for the
linking library. An obvious difference between this approach and the preceding
ones is that the procedure combines a biochemical selection for NotI jumping
fragments with improved ligation kinetics during the preparation of the
libraries (see Table 2 for a comparison of two methods).

In theory, a linking library is sufficient to construct a physical chromosomal
map. When linking clones are used to probe a PFGE genomic blot (NotI-digested
genomic DNA), each clone should reveal two DNA fragments, which are adjacent in
the genome. Thus, in principle, one should be able to order the rare cutting
sites with just a single library and one digest, although it will generally not
be possible to distinguish between two fragments of the same size. To resolve
such ambiguities, it is important to use several different libraries, each for
a particular enzyme, and to overlap the resulting patterns just as in ordinary
restriction
Tab. 2 Comparison of two main methods for construction of NotI jumping libraries.

Parameter                                             Method I (with supF marker)   Method II (with partial filling in)

Materials required for construction of jumping library
  Genomic DNA (ligation volume)                       2 µg (10 mL)                  1 µg (5 mL)
  Vector arms                                         40 µg                         1 µg
  Sonicated extract (SE) for in vitro packaging       500 µL                        15 µL
  Freeze-thaw extract (FTL) for in vitro packaging    2000 µL                       10 µL
Cloning capacity                                      0–12 kb                       0.2–23 kb
Expected yield (per µg genomic DNA)                   1–5 × 10^4                    1–5 × 10^5
Percentage of recombinants before genetic selection   <1%                           45–60%
Maximum sizes of jumps                                450 kb                        >1000 kb
[Figure 6 schematic: chromosomal DNA with four NotI sites and fragment ends numbered 1 to 8; jumping clones are drawn above the chromosome and linking clones below it.]
Fig. 6 Long-range mapping using jumping and linking libraries.
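
The complementary use of the two libraries shown in Fig. 6 can be sketched as a chain-building problem. The code below is a minimal illustration under stated assumptions: each clone is reduced to a pair of end tags (the labels `t1` to `t8` are hypothetical stand-ins for end sequences or hybridization probes), a linking clone joins the two tags on either side of one NotI site, and a jumping clone joins the two ends of one NotI fragment.

```python
from collections import defaultdict


def order_end_tags(linking_clones, jumping_clones):
    """Chain end tags into one linear order using linking and jumping clones.

    Each clone is a pair of end tags. Tags that occur in only one clone lie
    at the two ends of the mapped region.
    """
    neighbours = defaultdict(set)
    for a, b in list(linking_clones) + list(jumping_clones):
        neighbours[a].add(b)
        neighbours[b].add(a)

    ends = sorted(tag for tag, linked in neighbours.items() if len(linked) == 1)
    if len(ends) != 2:
        raise ValueError("clones do not form a single unbranched chain")

    path, previous, current = [ends[0]], None, ends[0]
    while True:
        candidates = [tag for tag in neighbours[current] if tag != previous]
        if not candidates:
            break
        if len(candidates) > 1:
            raise ValueError("ambiguous join at tag " + current)
        previous, current = current, candidates[0]
        path.append(current)

    if len(path) != len(neighbours):
        raise ValueError("some clones are not connected to the chain")
    return path


# Hypothetical input modeled on Fig. 6 (clone order is deliberately shuffled).
linking = [("t5", "t6"), ("t1", "t2"), ("t7", "t8"), ("t3", "t4")]
jumping = [("t6", "t7"), ("t2", "t3"), ("t4", "t5")]
print(order_end_tags(linking, jumping))
```

The sketch ignores the cross-hybridization caused by CG-rich regions and repeats; in practice such ambiguous joins would surface as branches, which the function reports as errors rather than resolving.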

fragment analysis. To accomplish this for the whole human genome will be very laborious. However, for small stretches of the genome containing 5 to 10 NotI sites, this approach can be efficient. The use of jumping and linking libraries in a complementary fashion simplifies this approach (Fig. 6).

Moreover, by cross-screening the two libraries, it should be possible in principle to jump from clone to clone (jumping, linking, jumping, etc.), to generate an ordered map without using PFGE techniques at all. One of the main problems with this approach is that the presence of very CG-rich regions and repeats in the human DNA may result in cross-hybridization between different clones.

In the shotgun sequencing approach for long-range genome mapping, the hybridization technique is replaced by sequencing. Jumping and linking clones are sequenced from the ends and, subsequently, the linear order of the NotI clones on a chromosome can be established using a computer program. Even a 20 bp sequence is likely to uniquely identify a sequence in the human genome. The sequence data provide a means to discriminate even between different instances of the same class of repeats.

3
Applications and Perspectives

3.1
Cloning DNA Markers Specific for a Particular Chromosome

It is important, for many purposes, to clone individual DNA markers for a specific chromosome. The most straightforward approach for such purposes is to prepare special libraries that contain recombinant clones from a particular chromosome.

One approach is based on the use of the fluorescence-activated cell sorter (FACS) system. FACS operates on the principle of rapid analysis of suspended particles, single cells, or even chromosomes. A suspension of chromosomes stained with fluorescent dyes (usually Hoechst 33258 and chromomycin A3) passes through the focus of a laser beam that excites the DNA-bound fluorescent dyes. Equal-sized droplets are formed by ultrasonic dispersion, and the droplets containing the desired chromosome as indicated by the fluorescence measurement are deflected from the main stream by an electric field and collected in a tube. DNA isolated from sorted chromosomes can be used for construction of genomic libraries.

Another approach is based on the use of hybrid cell lines. To obtain such somatic cell hybrids, human cells are fused with rodent cells (e.g. mouse or hamster). When the resulting hybrid cells are grown in culture, there is a progressive loss of human chromosomes until only one or a few of them are left. At this step, some of the segregant hybrid cells can be quite stable. In a modification of this technique, a human cell line is transfected with a plasmid vector or infected with a retroviral vector that contains a selectable marker (e.g. gpt or neo). Transfected clones are screened for those that contain only a single integrated plasmid per cell (and thus containing a single marked chromosome). These transformants are micronucleated by prolonged colcemid treatment and the microcells, containing only one chromosome, are produced using special techniques. The microcells are fused to mouse cells, and the resulting human–mouse microcell hybrids, containing the single marked chromosome, are isolated by growth in selective medium. Hybrid cells containing only fragments of human chromosomes can be produced using human chromosomes with translocations and deletions, or the fragments can be produced experimentally by X-irradiation (radiation hybrids). Such hybrid cell lines are very useful not only for constructing genomic libraries but also for physical mapping of already isolated DNA markers and genes. Another advantage of somatic hybrid cells is that DNA is available in large amounts and can be used for different purposes. Human-specific clones can be isolated from the library by hybridization to total human DNA.

At least 40% of the human genome consists of repetitive sequences of different kinds. Among them, about 50% are repeats randomly distributed in the human genome – different kinds of short (SINE) and long (LINE) interspersed (repeating) elements. The most abundant among them is the Alu repeat family. Alu repeats are present at >1.0 × 10^6 copies per genome with an estimated average spacing of about 3 kb. Alu repeats have a length of about 300 bp and consist of 2 homologous units. Related repeats also exist in other mammals. Different members of this family from the same species usually have homology of 80 to 90% but are only about 50% identical in different species. These repeats have conserved and variable regions. It is possible to find conserved sequences that are species-specific. These conserved sequences can be used as primers for PCR, to specifically amplify human sequences in the presence of nonhuman DNA. These features are the basis for using Alu repeats for isolation of human chromosome–specific sequences from hybrid cell lines containing human and nonhuman DNA sequences. Moreover, if a hybrid cell line contains only a short piece of human chromosome, the Alu-PCR approach can be used for isolation of markers specific for a defined region of the chromosome. The principles of the approach are shown in Fig. 7.

In the case of two hybrid cell lines, one containing a complete human chromosome (HCL1) and the other carrying the same chromosome but with a deletion (HCL2), this method offers the possibility of obtaining markers specific for the deletion. This variant can be called the differential Alu-PCR approach for obtaining
[Figure 7 schematic: human chromosomal DNA carrying Alu repeats spaced 20 kb, 1 kb, 12 kb, 0.5 kb, 0.8 kb, and 6 kb apart; different Alu-specific primers (ZAlu5, ZAlu3); one label reads "No PCR amplification".]

Fig. 7 General scheme for Alu-PCR.
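
The selection effect in Fig. 7, and the differential variant that compares two hybrid cell lines, can be sketched as follows. This is a simplified illustration, not a model of the real reaction: the 2 kb amplification cutoff is an assumed value, the orientation of neighboring Alu repeats is ignored, and the HCL2 spacings are hypothetical.

```python
def amplified_products(spacings_bp, max_product_bp=2000):
    """Return inter-Alu segment sizes short enough to amplify.

    max_product_bp is an assumed practical limit, not a value from the text.
    """
    return sorted(size for size in spacings_bp if size <= max_product_bp)


def deletion_specific_bands(bands_hcl1, bands_hcl2):
    """Bands present in the complete chromosome (HCL1) but absent from HCL2."""
    return sorted(set(bands_hcl1) - set(bands_hcl2))


# Inter-Alu spacings taken from the labels in Fig. 7, in base pairs.
fig7_spacings = [20000, 1000, 12000, 500, 800, 6000]
hcl1_bands = amplified_products(fig7_spacings)

# Hypothetical deleted chromosome that has lost the region with the
# 500 bp and 800 bp inter-Alu segments.
hcl2_bands = amplified_products([20000, 1000])

print(hcl1_bands)
print(deletion_specific_bands(hcl1_bands, hcl2_bands))
```

Comparing band sizes alone is the idealized case; with real genomic DNA the products often form a smear, which is why more specific primers or smaller human inserts are suggested in the text.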

DNA markers. The approach is mainly used in two variants.

In one variant, Alu-PCR is done using DNA from both cell lines, and the products of the reactions are separated by agarose gel electrophoresis. Some bands present in the products from HCL1 will be absent among the products from HCL2. These bands can be excised and cloned, giving markers localized in the deletion. The disadvantage of this approach is that such Alu-PCR usually yields a large number of products that form a very complex pattern and look like a smear on the gel. Among the solutions to this problem that have been suggested is the use of more specific primers (only for a subset of the Alu repeats), or genomic DNA digested with a restriction enzyme. Another solution is to use hybrid cell lines that contain only small pieces of human chromosomes.

The second variant of the Alu-PCR approach is mainly used in connection with sources that contain only a limited amount of human material: YAC clones and radiation hybrid cell lines containing small pieces of the human chromosomes. The YACs can, for example, be used for Alu-PCR and the total products of the PCR reaction can be used as a probe to screen genomic libraries (e.g. in cosmids). The hybridization pattern reveals which cosmids are present in one YAC, which in other YACs, and which are present in one but absent in another. Such an approach is also useful for mapping.

Another approach to obtaining region-specific libraries is to use chromosome microdissection to physically remove the chromosomal region of interest; the minute quantities of microdissected DNA can be subjected to a microcloning procedure. Spreads of human chromosomes are made and stained using standard cytogenetic techniques. DNA from an individual band is then cut from the chromosome using ultrafine glass needles or is isolated with the help of laser equipment. In the latter case, all other chromosomes are destroyed by the laser, and intact DNA is present only in the chromosome of interest. DNA obtained from only a few (2–20) chromosomes is enough for constructing a region-specific library. This DNA is amplified using PCR and cloned in plasmid or λ vectors.

3.2
CpG Islands as Powerful Markers for Genome Mapping; CpG Islands and Functional Genes

Although human DNA is highly methylated, stably unmethylated sequences (about 1% of the genome) have been observed in human chromosomal DNA. Such sequences occur as discrete "islands," usually 1 to 2 kb long, that are dispersed in the genome. They are usually called CpG (rich) islands because their G+C content exceeds 50% (the human genome contains, on average, about 40% G+C). Their distinctive feature is the presence of CpG pairs at the predicted frequency, whereas elsewhere in the genome CpG occurs at less than 25% of the predicted frequency. Altogether, there are about 30 000 islands in the haploid genome (the average spacing is about 1 per 100 kb). It is now clear that the majority (if not all) of CpG islands are associated with genes.

It has been shown that recognition sites for many of the rare cutting enzymes are closely associated with CpG islands. For example, at least 82% of all NotI and 76% of all XmaIII sites are located in the CpG islands. More than 20% of CpG-island-containing genes have at least one NotI site in their sequence, while about 65% of these genes have XmaIII site(s).

3.3
Alu-PCR and Subtractive Procedures to Clone CpG Islands from Defined Regions of Chromosomes

The Alu-PCR approach is used successfully for cloning DNA markers, but it does result in cloning small DNA fragments (500 bp) between Alu sequences. Alu sequences are distributed in a random fashion and are not linked with genes or other markers. An obvious suggestion for making Alu-PCR more useful for mapping is to use not simply genomic DNA from different sources but linking libraries constructed from these sources. This modification has certain advantages: using isolated probes, it is easy to clone a parental linking clone (e.g. NotI), which is a natural marker on the chromosome, convenient for linkage with other markers. Furthermore, linking clones are located in CpG-rich islands that are associated with genes. According to this scheme, linking libraries are constructed from different hybrid cell lines containing either whole or deleted human chromosomes. Then, total DNA isolated from these libraries can be used for Alu-PCR in the manner described in Sect. 3.1. However, in this case every PCR product (either discrete bands or total product) is used as a probe to isolate linking clones from the defined region of the chromosomes.

Genomic subtractive methods represent potentially powerful tools for identification of deleted sequences and cloning region-specific markers. This approach has given rewarding results in less-complex systems such as yeast or cDNA libraries, but the great complexity of the human genome has generated serious problems. These problems can be overcome by reducing the complexity of the human genomic sequences. Two approaches have been suggested to achieve this aim. In one (representational difference analysis), only a subset of genomic sequences (e.g. BamHI fragments less than 1 kb) is used for subtractive procedures; this approach will result in cloning of random sequences. In the other, NotI linking libraries are used instead of whole genomic DNA. Intermediate products, that is, circles after the first ligation step, can also be successfully utilized for subtraction (Fig. 8).

The NotI linking library is at least 100 times lower in complexity than the whole human genome. It is approximately equal in complexity to the yeast genome. Since this approach is not linked with Alu repeats, it offers the possibility of isolating NotI linking clones that are unavailable for cloning using Alu-PCR.
[Figure 8 flowchart, normal DNA versus tumor DNA: R-digestion; dilution and self-ligation; N-digestion; NotI linker ligation; PCR amplification (with dNTP or with dUTP); mixing at 1:100; denaturation and hybridization, giving homohybrids A, heterohybrids, and homohybrids B; treatment with UDG and mung bean nuclease; PCR amplification, labeling, and cloning.]


Fig. 8 General scheme for the NotI-CODE (Cloning Of DEleted sequences)
procedure. N, NotI sites; R, restriction enzymes recognizing a 4 to 6 bp sequence.
Methylated NotI site is indicated by an asterisk. UDG, uracil-DNA glycosylase
destroys all DNA molecules containing dUTP. Mung bean nuclease digests
double-stranded DNA with mismatches and all single-stranded molecules. Circles
digested with NotI and PCR-amplified are called NR, NotI representation, because
only sequences surrounding NotI sites are present in this amplification product. The
N2 site is deleted and the N*3 site is methylated in tumor DNA.
3.4
IBD (Identical-by-descent) Fragments for Identification of Disease Genes

Development of the methods permitting cloning of identical sequences (CIS) between two sources of DNA can be very useful for many purposes, including isolation of disease genes. Identical-by-descent (IBD) sequences refer to segments of the human genome shared by two individuals because they are inherited from a common ancestor. Regions that are IBD between individuals affected with a disease conceivably can contain the disease gene(s). Two approaches were suggested to clone such IBD sequences.

In GMS (genomic mismatch scanning), each DNA preparation is digested with PstI to yield fragments with protruding 3′ ends. The 3′ protruding ends are protected from digestion by exonuclease III (ExoIII) in later steps. One of the DNA preparations is fully methylated at all GATC sites with E. coli Dam methylase (DAM+). The other DNA preparation remains unmethylated. The two DNA pools are then mixed in equal ratios, denatured, and allowed to reanneal. Digestion of the reannealed DNA with both DpnI and MboI, which cut at fully methylated and unmethylated GATC sites respectively, results in digestion of the homohybrids. The heterohybrids are resistant to both DpnI and MboI digestion and survive this treatment. Discrimination between perfect, mismatch-free heterohybrids and those with mismatches is done by the MutHLS enzyme. Only perfect duplexes will escape nicking during this step. All DNA molecules, except mismatch-free ones, are degraded further with ExoIII. Thus, the full-length, unaltered heterohybrids are purified from the other DNA fragments.

Another method was called CIS (cloning of identical sequences). The scheme of the CIS procedure is shown in Fig. 9.

DNA A and DNA B are digested with BamHI and ligated to special linkers containing two recognition sites for MvnI. DNA A is PCR-amplified in the presence of dUTP and m5dCTP; thus, all cytosines will be methylated. DNA B is PCR-amplified in the presence of normal dCTP and biotinylated primers. The two DNA preparations are mixed in equal ratios, denatured, and hybridized. Subsequently, the DNA is digested with MvnI. This enzyme can digest only dsDNA molecules without methylcytosine and will digest all homohybrids B (they contain at least four sites for MvnI). The DNA mixture is next treated with mung bean nuclease. This nuclease destroys all imperfect hybrids and ssDNAs. Thus, after this treatment only perfect homohybrids A and perfect (mismatch-free) heterohybrids remain. The DNA mixture is then treated with UDG (uracil-DNA glycosylase). This enzyme removes the uracil base from the DNA and thus destroys all DNA from individual A. As a result, there will be only ssDNA from individual B, which is identical to the DNA in individual A.

3.5
Strategies to Map and Sequence Genomes; Hierarchical, Whole-genome, and Slalom Sequencing Approaches

During the last few years, impressive progress has been made in mapping and sequencing whole genomes of various organisms. Two basic strategies have so far been employed for genome sequencing. According to one scheme (hierarchical approach), the whole genome is mapped using different types of markers, and a minimal set of large-insert clones, such
[Figure 9 flowchart, DNA A and DNA B: BamHI digestion; linker ligation; PCR amplification with dUTP and d5mCTP (m) for DNA A and with dCTP and biotinylated primers (b) for DNA B; denaturation and hybridization, giving homohybrids A, heterohybrids, and homohybrids B; digestion with MvnI (destroys homohybrids B); mung bean nuclease (destroys imperfect hybrids and single-stranded DNA); UDG (destroys dUTP-containing DNA); purification of biotinylated molecules with streptavidin beads; PCR amplification.]

Fig. 9 General scheme of the CIS, cloning of identical sequences, procedure. The same
enzymes are used as in the CODE method but in a different order, and the result
is opposite.
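
The selection logic of the CIS procedure in Fig. 9 can be followed with a small bookkeeping model. This is a minimal sketch under stated assumptions: each annealed molecule is reduced to the source of its two strands and a flag for perfect pairing, the `Duplex` class and locus names are hypothetical, and every enzymatic step is treated as complete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Duplex:
    locus: str        # hypothetical label for the fragment
    strands: tuple    # source of each strand, e.g. ("A", "B")
    perfect: bool     # True if the two strands pair without mismatches


def cis_recovered_loci(duplexes):
    """Return loci recovered as single-stranded DNA B after the CIS steps."""
    recovered = []
    for duplex in duplexes:
        # MvnI cuts only duplexes without methylcytosine: homohybrids B.
        if set(duplex.strands) == {"B"}:
            continue
        # Mung bean nuclease destroys imperfect hybrids.
        if not duplex.perfect:
            continue
        # UDG destroys the dUTP-containing strands from DNA A.
        surviving = [source for source in duplex.strands if source != "A"]
        # Streptavidin beads capture the biotinylated strands from DNA B.
        if "B" in surviving:
            recovered.append(duplex.locus)
    return recovered


mixture = [
    Duplex("locus1", ("A", "A"), True),   # homohybrid A
    Duplex("locus2", ("B", "B"), True),   # homohybrid B
    Duplex("locus3", ("A", "B"), True),   # perfect heterohybrid (identical sequence)
    Duplex("locus4", ("A", "B"), False),  # heterohybrid with mismatches
]
print(cis_recovered_loci(mixture))
```

Only the perfect heterohybrid contributes a surviving strand, which corresponds to the statement in the text that the final product is DNA from individual B that is identical to DNA in individual A.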

as cosmids, PAC, or BAC clones, is established. Subsequently, these large-insert clones are sequenced using a shotgun sequencing strategy: small-insert libraries, containing randomly sheared fragments of the large-insert clones, are constructed and sequenced.

A variant of this strategy, the whole-genome shotgun (WGS) sequencing strategy, was developed later and has proved valuable. This method involves end sequencing of large- (PAC, BAC, cosmids) and small-insert (2 and 10 kb) clones. DNA fragments in the small-insert clones are generated by physical shearing of whole genomic DNA. All resulting reads are joined in one sequence with special computer programs. The WGS method requires the generation of sequences covering the whole genome 10 to 15 times. If sequence coverage is less, then the contig assembly process cannot be completed, and sequences and clones will just represent islands, without connection or order. Therefore, despite impressive technological progress, mapping and sequencing of even small bacterial genomes is expensive and laborious.

After completion of the genomic sequence from one organism, there will be a great demand, in many cases, for comparison with the genomes of other individuals, related species, pathogenic and nonpathogenic strains, and so on, in the growing field of comparative genomics. Such comparisons are highly relevant to our understanding of human and animal health, evolution, and ecology.

An efficient strategy for simultaneous genome mapping and sequencing was recently developed. The approach is based on slalom libraries, which combine features of general genomic, jumping, and linking libraries. First experiments demonstrated the feasibility of the approach, and showed that the efficiency (cost-effectiveness and speed) of existing mapping/sequencing methods can be improved at least 5- to 10-fold. The slalom approach allows the establishment of a physical map, with minimal sets of overlapping clones, which will pinpoint differences in genome organization between organisms. At the same time, considerable sequence coverage of the genome (about 50%) will be achieved. This will make it possible to locate virtually every gene in a genome, for more detailed (comparative) study. Furthermore, since the efficiency of contig assembly in the slalom approach is virtually independent of sequence read length, even short sequences, as produced by rapid high-throughput sequencing techniques, suffice to complete a physical map and sequence scan of a small genome. A combination of these new sequencing techniques with the slalom approach increases the power of the method a further 10- to 50-fold and makes it an efficient tool for comparative genomics. The main principle of the slalom libraries is shown in Fig. 10.

Two standard genomic libraries, one EcoRI-digested and one BamHI-digested, are constructed; together they completely cover the whole genome. The problem is how to put EcoRI and BamHI fragments in the correct order. It can be solved using the connecting library. The connecting library can be constructed as follows: DNA isolated "en masse" from an EcoRI library is digested with BamHI, circularized in the presence of the Kan^r gene, and plated on agar with kanamycin. The clones isolated in this manner will be identical in structure to the clones from an EcoRI jumping library prepared in the classical way. By comparing end sequences in these three slalom libraries, all clones can be positioned relative to each other and a minimal contig of overlapping clones may be established.

3.6
Restriction-site-tagged Microarrays to Study CpG-Island Methylation

Methylation, deletions, and amplifications of cancer genes constitute important mechanisms in carcinogenesis, and CGI (CpG-island-containing) microarrays were
[Figure 10 schematic: genomic DNA with EcoRI and BamHI restriction sites; clones from the EcoRI, BamHI, and connecting libraries carrying numbered end sequences (1 to 9); contig of overlapping clones.]

Fig. 10 Simplified slalom library approach scheme. Numbers designate identical end sequences in different libraries that can be joined by a computer program in a contig of overlapping clones. Dashed lines show genomic DNA sequences deleted in the connecting library.

suggested to study hypermethylation in cancer cells. These microarrays can detect methylation changes in tumor DNA. However, it is unclear whether these microarrays can be used to detect hemizygous methylation or copy-number changes. As the whole human genome DNA is used for labeling and the clones are small (0.2–2 kb), this creates a serious problem. Oligonucleotide-based microarrays also can be used to study methylation changes in cancer cells; however, only a limited number of genes can be tested in such experiments.

A rough estimation is that the human genome contains 15 000 to 20 000 NotI sites. Therefore, thousands of genes could be tested with NotI clone microarrays. The fundamental problems for genome-wide screening using NotI clones are (1) the size and complexity of the human genome, (2) the number of repeat sequences, and (3) the comparatively small sizes of the inserts in NotI clones (on average 6–8 kb). A special procedure was developed to amplify only regions surrounding NotI sites, the so-called NotI representations (NRs, see Fig. 8). Other DNA fragments were not amplified. Therefore, only 0.1 to 0.5% of the total DNA is labeled. Interestingly, sequences surrounding NotI sites contain 10-fold fewer repetitive sequences than the human genome on average, and therefore, these microarrays are not as sensitive as other methods to the background hybridization caused by repeats. The main idea of this application is clear from Fig. 8 (tumor DNA). If a particular NotI site is present in the DNA, then the circle will be opened with NotI and labeled. However, if this NotI site is deleted or methylated, then the NR will not contain the corresponding DNA sequences. The NotI microarrays can simultaneously detect copy-number changes and methylation and, therefore, they allow the simultaneous study of genetic and epigenetic factors.

The technique underlying the preparation and use of NotI microarrays is applicable to any restriction enzyme and represents a new type of microarray, referred to as restriction-site-tagged (RST) microarrays. Such RST microarrays can be used for different purposes, for example, to study species composition of complex microbial systems.

3.7
Restriction-site-tagged Sequences to Study Biodiversity

There is still much to learn about the human normal microflora. The human gut contains approximately 1 to 2 kg of bacterial cells. The number of these cells in the intestine is 10 to 100 times larger than the number of cells in the human body, but at best 10 to 15% of the microbial species are known. To be able to analyze complex microbial mixtures is of great importance for many applications. For instance, differences between individual compositions of the normal flora will be instrumental for future analysis of the effects on the normal flora composition of diet, foods, geographical location, and medication. Conversely, the effects of gut microflora on aging, autoimmunity, and colonic cancer risk can be studied.

Analysis of human NotI flanking sequences (see Sect. 2.3) has demonstrated that even short sequences surrounding NotI sites can yield information sufficient to isolate new genes or uniquely describe eukaryotic or prokaryotic genomes.

These results led to the realization that it would be possible to use short sequences surrounding NotI sites or, in general, restriction-site-tagged sequences (RSTS) for the analysis of complex microbial mixtures. The collection of NotI tags represents the NotI sequence passport or, in short, the NotI passport. NotI passporting means the process of creating NotI tags/passports.

The general design of the experiment is as follows. Genomic DNA is digested with NotI and ligated to a linker with NotI sticky ends. This linker contains the BpmI recognition sites. This restriction nuclease cuts 16/14 bp outside of the recognition site. The ligation mixture is PCR-amplified with special primers and, finally, 19 bp tags flanking NotI sites are generated. DNA from, for instance, fecal samples and surgical specimens is digested with NotI, and a NotI passport for the particular specimen is generated. A comparison of such passports from different individuals or from the same individual before and after drug treatment will reveal the differences between them.

Analysis of tags for NotI, PmeI, and SbfI for 70 completely sequenced bacteria revealed that more than 95% of tags are species-specific and even different strains of the same species can be distinguished. None of these tags matched human or rodent sequences. Therefore, the approach allows analysis of complex microbial mixtures such as those in the human gut and identification with high sensitivity of a particular bacterial strain on a quantitative and qualitative basis.

A similar approach can be used for eukaryotic cells, for example, for analysis of cancer cells.
RSTS-passporting and RST-microarray approaches are mutually complementary. These two approaches are based on completely different biochemical techniques but aim to solve the same problems.

4
Summary

While several different strategies are available to obtain and use DNA markers for identifying and mapping DNA sequences in complex organisms, no single system is likely to suffice for obtaining a complete and accurate map and sequence of the human genome. Rather, a combination of different approaches and vector systems is needed to corroborate data from different sources.

See also Body Expression Map of Human Genome; Genetics, Molecular Basis of.

Bibliography

Books and Reviews

Ausubel, F.M., Kingston, R.E., Brent, R., Moore, D.D., Seidman, J.G., Struhl, K., Smith, J.A. (1987–2003) Current Protocols in Molecular Biology, Wiley, New York.
Bird, A. (2002) DNA methylation patterns and epigenetic memory, Genes Dev. 16, 6–21.
Brown, T.A. (1999) Genomes, Wiley, New York co-published with BIOS Scientific Publishers, Oxford.
Collins, F.S. (1988) Chromosome Jumping, in: Davis, K.E. (Ed.) Genome Analysis: A Practical Approach, IRL Press, Oxford, pp. 73–94.
Mueller, R.F., Young, I.D. (2001) Emery's Elements of Medical Genetics, Churchill Livingstone, Edinburgh.
Poustka, A., Lehrach, H. (1988) Chromosome Jumping: A Long Range Cloning Technique, in: Setlow, J.K. (Ed.) Genetic Engineering Principles and Methods, Vol. 10, Brookhaven National Laboratory, Upton, New York and Plenum Press, New York, London, pp. 169–193.
Sambrook, J., Fritsch, E.F., Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
Strachan, T., Read, A. (1999) Human Molecular Genetics, Wiley, New York co-published with BIOS Scientific Publishers, Oxford.
Zabarovsky, E.R., Kashuba, V.I., Gizatullin, R.Z., Winberg, G., Zabarovska, V.I., Erlandsson, R., Domninsky, D.A., Bannikov, V.M., Pokrovskaya, E., Kholodnyuk, I., Petrov, N., Zakharyev, V.M., Kisselev, L.L., Klein, G. (1996) NotI jumping and linking clones as a tool for genome mapping and analysis of chromosome rearrangements in different tumors, Cancer Detect. Prev. 20, 1–10.
Zabarovsky, E.R., Winberg, G., Klein, G. (1993) The SK-diphasmids – vectors for genomic, jumping and cDNA libraries, Gene 127, 1–14.

Primary Literature

Adorjan, P., Distler, J., Lipscher, E., Model, F., Muller, J., Pelet, C., Braun, A., Florl, A.R., Gutig, D., Grabs, G., Howe, A., Kursar, M., Lesche, R., Leu, E., Lewin, A., Maier, S., Muller, V., Otto, T., Scholz, C., Schulz, W.A., Seifert, H.H., Schwope, I., Ziebarth, H., Berlin, K., Piepenbrock, C., Olek, A. (2002) Tumour class prediction and discovery by microarray-based DNA methylation analysis, Nucleic Acids Res. 30, e21, 1–9.
Allikmets, R.L., Kashuba, V.I., Bannikov, V.M., et al. (1994) NotI linking clones as tools to join physical and genetic mapping of the human genome, Genomics 19, 303–309.
Bicknell, D.C., Markie, D., Spurr, N.K., Bodmer, W.F. (1991) The human chromosome content in human x rodent somatic cell hybrids analyzed by a screening technique using Alu PCR, Genomics 10, 186–192.
Bird, A., Taggart, M., Frommer, M., Miller, O.J., Macleod, D. (1985) A fraction of the mouse genome that is derived from islands of nonmethylated, CpG-rich DNA, Cell 40, 91–99.
Brenner, S., Johnson, M., Bridgham, J., et al. (2000) Gene expression analysis by massively

parallel signature sequencing (MPSS) on scanning method and its various applications,
microbead arrays, Nat. Biotechnol. 18, 630–634. Electrophoresis 14, 251–258.
Brookes, A.J., Porteous, D.J. (1991) Coincident Kunkel, L.M., Monaco, A.P., Middlesworth, W.,
sequence cloning, Nucleic Acids Res. 19, Ochs, H.D., Latt, S.A. (1985) Specific cloning
2609–2613. of DNA fragments absent from the DNA
Brown, P.O., Botstein, D. (1999) Exploring the of a male patient with an X chromosome
new world of the genome with DNA deletion, Proc. Natl. Acad. Sci. U.S.A. 82,
microarrays, Nat. Genet. 21(Suppl. 1), 33–37. 4778–4782.
Broder, S., Venter, J.C. (2000) Sequencing the Kutsenko, A., Gizatullin, R., Al-Amin, A.N., et al.
entire genomes of free-living organisms: the (2002) NotI flanking sequences: a tool for
foundation of pharmacology in the new gene discovery and verification of the human
millennium, Annu. Rev. Pharmacol. Toxicol. genome, Nucleic Acids Res. 30, 3163–3170.
40, 97–132. Lamar, E.E., Palmer, E. (1984) Y-encoded,
Carninci, P., Shibata, Y., Hayatsu, N., et al. species-specific DNA in mice: evidence
(2001) Balanced-size and long-size cloning of that the Y chromosome exists in two
full-length, cap-trapped cDNAs into vectors of polymorphic forms in inbred strains, Cell 37,
the novel lambda-FLC family allows enhanced 171–177.
gene discovery rate and functional analysis, Lander, E.S., Linton, L.M., Birren, B., et al.,
Genomics 77, 79–90. International Human Genome Sequencing
Cheung, V.G., Gregg, J.P., Gogolin-Ewens, K.J., Consortium (2001) Initial sequencing and
et al. (1998) Linkage-disequilibrium mapping analysis of the human genome, Nature 409,
without genotyping, Nat. Genet. 18, 225–230. 860–921.
Collins, F.S., Weissman, S.M. (1984) Directional Larsen, F., Gundersen, G., Prydz, H. (1992)
cloning of DNA fragments at a large distance Choice of enzymes for mapping based on
from an initial probe: a circularization method, CpG islands in the human genome, Genet.
Proc. Natl. Acad. Sci. U.S.A. 81, 6812–6816. Anal. Tech. Appl. 9, 80–85.
Collins, F.S., Drumm, M.L., Cole, J.L., et al. Li, J., Protopopov, A., Wang, F., et al. (2002) NotI
(1987) Construction of a general human subtraction and NotI-specific microarrays to
chromosome jumping library, with application detect copy number and methylation changes
to cystic fibrosis, Science 235, 1046–1049. in whole genomes, Proc. Natl. Acad. Sci. U.S.A.
Costello, J.F., Fruhwald, M.C., Smiraglia, D.J., 99, 10724–10729.
et al. (2000) Aberrant CpG-island methylation Li, J., Wang, F., Kashuba, V., et al. (2001)
has non-random and tumour-type-specific Cloning of deleted sequences (CODE): a
patterns, Nat. Genet. 24, 132–138. genomic subtraction method for enriching and
Cross, S.H., Charlton, J.A., Nan, X., Bird, A.P. cloning deleted sequences, Biotechniques 31,
(1994) Purification of CpG islands using a 788–793.
methylated DNA binding column, Nat. Genet. Li, J., Wang, F., Zabarovska, V.I., et al. (2000)
6, 236–244. COP – a new procedure for cloning single
Eads, C.A., Danenberg, K.D., Kawakami, K., et al. nucleotide polymorphisms, Nucleic Acids Res.
(2000) MethyLight: a high-throughput assay to 28, e1,1–5.
measure DNA methylation, Nucleic Acids Res. Lindblad-Toh, K., Tanenbaum, D.M., Daly, M.J.,
28, e32, 1–8. et al. (2000) Loss-of-heterozygosity analysis
Galm, O., Rountree, M.R., Bachman, K.E., et al. of small-cell lung carcinomas using
(2002) Enzymatic regional methylation assay: single-nucleotide polymorphism arrays, Nat.
a novel method to quantify regional Biotechnol. 18, 1001–1005.
CpG methylation density, Genome Res. 12, Lisitsyn, N., Lisitsyn, N., Wigler, M. (1993)
153–157. Cloning the differences between two complex
Gonzalgo, M.L., Liang, G., Spruck, C.H., et al. genomes, Science 259, 946–951.
(1997) Identification and characterization of Lucito, R., West, J., Reiner, A. (2000) Detecting
differentially methylated regions of genomic gene copy number fluctuations in tumor
DNA by methylation-sensitive arbitrarily cells by microarray analysis of genomic
primed PCR, Cancer Res. 57, 594–599. representations, Genome Res. 10, 1726–1736.
Hayashizaki, Y., Hirotsune, S., Okazaki, Y. Mirzayans, F., Mears, A.J., Guo, S.W., Pearce,
(1993) Restriction landmark genomic W.G., Walter, M.A. (1998) Identification of
the human chromosomal region containing the iridogoniodysgenesis anomaly locus by genomic-mismatch scanning, Am. J. Hum. Genet. 61, 111–119.
Myers, E.W., Sutton, G.G., Delcher, A.L., et al. (2000) A whole-genome assembly of Drosophila, Science 287, 2196–2204.
Nelson, S.F. (1995) Genomic mismatch scanning: current progress and potential applications, Electrophoresis 16, 279–285.
Nelson, S.F., McCusker, J.H., Sander, M.A. (1993) Genomic mismatch scanning: a new approach to genetic linkage mapping, Nat. Genet. 4, 11–18.
Nussbaum, R.L., Lesko, J.G., Lewis, R.A., Ledbetter, S.A., Ledbetter, D.H. (1987) Isolation of anonymous DNA sequences from within a submicroscopic X chromosomal deletion in a patient with choroideremia, deafness, and mental retardation, Proc. Natl. Acad. Sci. U.S.A. 84, 6521–6525.
Palmisano, W.A., Divine, K.K., Saccomanno, G., et al. (2000) Predicting lung cancer by detecting aberrant promoter methylation in sputum, Cancer Res. 60, 5954–5958.
Pinkel, D., Segraves, R., Sudar, D., et al. (1998) High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays, Nat. Genet. 20, 207–211.
Poustka, A., Pohl, T.M., Barlow, D.P., Frischauf, A.M., Lehrach, H. (1987) Construction and use of human chromosome jumping libraries from NotI-digested DNA, Nature 325, 353–355.
Protopopov, A., Kashuba, V., Zabarovska, V.I., et al. (2003) An integrated physical and gene map of the 3.5-Mb chromosome 3p21.3 (AP20) region implicated in major human epithelial malignancies, Cancer Res. 63, 404–412.
Ronaghi, M., Pettersson, B., Uhlen, M., Nyren, P. (1998) A sequencing method based on real-time pyrophosphate, Science 281, 363–365.
Rosenberg, M., Przybylska, M., Straus, D. (1994) ‘RFLP subtraction’: a method for making libraries of polymorphic markers, Proc. Natl. Acad. Sci. U.S.A. 91, 6113–6117.
Shi, H., Maier, S., Nimmrich, I., et al. (2003) Oligonucleotide-based microarray for DNA methylation analysis: principles and applications, J. Cell Biochem. 88, 138–143.
Smith, C.L., Lawrance, S.K., Gillespie, G.A., et al. (1987) Strategies for mapping and cloning macroregions of mammalian genomes, Methods Enzymol. 151, 461–489.
Snijders, A.M., Nowak, N., Segraves, R., et al. (2001) Assembly of microarrays for genome-wide measurement of DNA copy number, Nat. Genet. 29, 263–264.
Sugimura, T., Ushijima, T. (2000) Genetic and epigenetic alterations in carcinogenesis, Mutat. Res. 462, 235–246.
Ushijima, T., Morimura, K., Hosoya, Y., et al. (1997) Establishment of methylation-sensitive-representational difference analysis and isolation of hypo- and hypermethylated genomic fragments in mouse liver tumors, Proc. Natl. Acad. Sci. U.S.A. 94, 2284–2289.
Velculescu, V.E., Zhang, L., Vogelstein, B., Kinzler, K.W. (1995) Serial analysis of gene expression, Science 270, 484–487.
Venter, J.C., Adams, M.D., Myers, E.W., et al. (2001) The Sequence of the Human Genome, Science 291, 1304–1351.
Waterston, R.H., Lindblad-Toh, K., Birney, E., et al. (2002) Initial sequencing and comparative analysis of the mouse genome, Nature 420, 520–562.
Worm, J., Aggerholm, A., Guldberg, P. (2001) In-tube DNA methylation profiling by fluorescence melting curve analysis, Clin. Chem. 47, 1183–1189.
Yan, P.S., Chen, C.M., Shi, H., et al. (2001) Dissecting complex epigenetic alterations in breast cancer using CpG island microarrays, Cancer Res. 61, 8375–8380.
Zabarovska, V.I., Gizatullin, R.G., Al-Amin, A.N., et al. (2002) Slalom libraries: a new approach to genome mapping and sequencing, Nucleic Acids Res. 30, e6, 1–8.
Zabarovska, V., Kutsenko, A., Petrenko, L., et al. (2003) NotI passporting to identify species composition of complex microbial systems, Nucleic Acids Res. 31, e5, 1–10.
Zabarovska, V., Li, J., Fedorova, L., et al. (2000) CIS – cloning of identical sequences between two complex genomes, Chromosome Res. 8, 77–84.
Zabarovsky, E.R., Allikmets, R.L. (1986) An improved technique for the efficient construction of gene library by partial filling-in of cohesive ends, Gene 42, 119–123.
Zabarovsky, E.R., Boldog, F., Thompson, T., et al. (1990) Construction of a human chromosome
3 specific NotI linking library using a novel cloning procedure, Nucleic Acids Res. 18, 6319–6324.
Zabarovsky, E.R., Boldog, F., Erlandsson, R., et al. (1991) A new strategy for mapping the human genome based on a novel procedure for constructing jumping libraries, Genomics 11, 1030–1039.
Zabarovsky, E.R., Kashuba, V.I., Zakharyev, V.M., et al. (1994) Shot-gun sequencing strategy for long range genome mapping: first results, Genomics 21, 495–500.
Zabarovsky, E.R., Winberg, G., Klein, G. (1993) The SK-diphasmids – vectors for genomic, jumping and cDNA libraries, Gene 127, 1–14.
Zardo, G., Tiirikainen, M.I., Hong, C., et al. (2002) Integrated genomic and epigenomic analyses pinpoint biallelic gene inactivation in tumors, Nat. Genet. 32, 453–458.
Zoubak, S., Clay, O., Bernardi, G. (1996) The gene distribution of the human genome, Gene 174, 95–102.
