Techniques | Genome Sequence Databases: Genomic, Construction of Libraries
Vectors Used in Genomic Library Construction

Vector Selection

The choice of backbone vector used for constructing a genomic library is highly dependent upon what studies will be performed. When choosing a backbone vector, decisions must be made about the desired copy number, selective marker, size of genomic DNA insert, host range, and, if the cloned DNA is to be expressed, the type of promoter and ribosomal binding site that should be upstream of the multiple cloning site. Also, the vector used for library construction should remain stable after genomic DNA fragments are inserted.

For constructing a genomic library, it is important that the vector has the ability to replicate within the desired host organism or organisms. The origin of replication, or the ori region, on a vector controls the host range and, to a large extent, the copy number of the vector. Vectors with narrow host ranges, such as those containing the ColE1 origin, have only been found to replicate in Escherichia coli and closely related bacteria. Broad host range vectors have origins of replication that are recognized in a wide range of bacterial species. The origins of replication from the broad host range plasmids RK2 and pBBR1 are functional across multiple Gram-negative species, while plasmid RSF1010 has been found to replicate within a number of both Gram-positive and Gram-negative species.

The ori region (replicon) of a vector also affects the copy number of the vector within the host. High copy number vectors offer the advantage of increased yield of purified vector from a volume of culture to be used for sequencing or molecular cloning purposes. Low copy number vectors are typically desired for studies involving expression of genomic DNA segments, especially those involving toxic DNA segments. When a higher yield of a low copy number vector is needed, low copy number vectors containing the ColE1 or pMB1 origin of replication, which allows for plasmid number amplification prior to purification in the presence of 170 mg l⁻¹ of chloramphenicol, may be used. Chloramphenicol inhibits protein synthesis and thus prevents chromosomal replication. The enzymes necessary for replication of plasmids with the ColE1 or pMB1 origin are long lived, so these plasmids continue to replicate in the presence of chloramphenicol, reaching several thousand copies per cell.

The ori region and the mechanism of vector replication have also been implicated in the structural stability of cloned vectors. Vectors with replicons using rolling circle mechanisms have frequently been found to be unstable for cloning purposes. This instability may be due to the secondary structure formed by the lagging strand of DNA during replication, which may be cleaved by nucleases or experience mutations or deletions during replication.

Features of and around the insert site also have an impact on stability and representation of the library. For the creation of a representative genomic library, all DNA sections of the genome must be cloned, regardless of DNA secondary structure or the encoding of toxic gene products or strong promoters. Vectors that contain transcription terminators flanking the cloning site, which prevent transcription into and out of the cloned DNA, have been shown to mitigate cloning bias against difficult DNA and to increase vector stability. For studies involving expression of the insert DNA, a variety of vectors exist that offer options of using constitutive, inducible, titratable, or native promoters, as well as options including ribosomal binding sites and start codons upstream of the cloned insert.

Vector Preparation

Once an appropriate vector has been selected, it must be prepared for cloning. Purified vector should be free of endonuclease contamination and chemicals such as phenol or EDTA that may interfere with downstream enzymatic reactions. There exist a number of standard protocols and commercially available kits that are used for purifying plasmid DNA. Plasmid purification protocols take advantage of the smaller size of the plasmid DNA compared to the significantly larger chromosomal DNA. The method chosen is influenced in part by the size of the plasmid and the host strain. Many commonly used cloning strains, such as E. coli DH5α, have mutations within their endA regions, which eliminate the nonspecific endonuclease activity of endonuclease I, thus allowing higher quality plasmid preparations from these strains. Other strains such as E. coli HB101 produce large amounts of carbohydrates that can interfere with DNA extractions. Large plasmids (>15 kb) are more fragile and thus have to be treated with more gentle extraction methods than small plasmids, which are less susceptible to damage.

Once purified, a vector needs to be linearized prior to ligation. Using the sequence of the vector and the vector map, an enzyme or a set of enzymes must be found that cut only within the cloning site of the vector. Enzymes may leave blunt or sticky ends after digestion. When selecting enzymes to digest the vector, it must be kept in mind that the ends of the vector and the ends of the fragmented genomic DNA that will be cloned must be compatible. The sticky ends of a digested piece of DNA may be converted to blunt ends by using particular DNA modifying enzymes that may fill in overhangs of sticky ends or exhibit exonuclease activity against single-stranded DNA and thus degrade overhangs. T4 DNA polymerase or the Klenow fragment of E. coli DNA polymerase I will remove 3′ overhangs and fill in 5′ overhangs when provided with deoxynucleoside triphosphates. Additionally, DNA end repair kits are available from
several commercial vendors that use proprietary enzymes to generate blunt-ended DNA segments.

In an effort to prevent self-ligation and minimize the number of clones without genomic DNA insert, the 5′-terminal phosphate groups required by ligase for ligation reactions are removed from the linearized vector using an alkaline phosphatase. The most commonly used alkaline phosphatases within molecular biology are shrimp alkaline phosphatase from the Arctic shrimp Pandalus borealis, calf intestinal alkaline phosphatase from calf intestines, bacterial alkaline phosphatase from E. coli C4, and Antarctic Phosphatase from the psychrophilic bacterium strain TAB5. All of these phosphatases are effective at removing 5′ phosphates from DNA, but vary in their activity, buffer compatibility, and ability to be inactivated. Alkaline phosphatases bind tightly to DNA and thus may require aggressive methods to denature. To inactivate bacterial alkaline phosphatase and calf intestinal alkaline phosphatase reactions, proteinase K is used to digest the phosphatase, followed by a phenol–chloroform extraction and an ethanol precipitation. Alternatively, a commercial enzymatic reaction clean-up kit can be used to purify the dephosphorylated DNA. Shrimp alkaline phosphatase and Antarctic Phosphatase from TAB5 are thermolabile and can be completely heat inactivated. It is desirable to use an alkaline phosphatase that is compatible with the restriction enzyme buffer or buffers used in upstream preparation of the vector and that is heat-labile, to minimize the number of purification steps, which may decrease the yield of vector DNA.

The dephosphorylation step, which removes 5′ phosphate groups from the linearized vector, may not go to completion, and thus the amount of background from vector that can self-ligate may be high. An optional step to help mitigate this problem is to perform a ligation reaction following the dephosphorylation step. Vector DNA that retained its 5′ phosphate groups will self-ligate, while linearized vector DNA that has had its 5′ phosphates successfully removed will be incapable of self-ligating and thus remain linear. The circular self-ligated vector DNA and the linear vector DNA can then be separated by agarose gel electrophoresis. The linearized vector can be excised with a scalpel and purified using traditional DNA extraction methods or commercially available kits for DNA extraction from agarose gels.

If the ligation step to remove vectors that had not been successfully dephosphorylated is omitted, the linearized vector should still be purified from the restriction enzyme digestion and the alkaline phosphatase reaction buffers and enzymes. Phenol–chloroform purification followed by an ethanol precipitation or a commercially available spin column for cleanup of enzymatic reactions may be used for this purpose. Purifying the vector DNA will remove proteins and buffer components that may lessen the efficiency of the ligation of the vector to genomic DNA insert.

PCR amplification may also be used to generate large amounts of linear dephosphorylated vector. Culture-purified and linearized vector is PCR amplified with unphosphorylated primers designed to extend outward from the cloning site into the vector backbone. Proofreading polymerases such as Pfu or Pfx DNA polymerase will generate high-fidelity blunt-ended PCR product. The choice of polymerase influences the types of overhangs that will be generated as well as the fidelity of the PCR product generated. The PCR product should be purified away from the components in the PCR buffer prior to ligation so as not to have extraneous nucleotides, primers, or salts that may interfere with enzymatic reactions. To do so, the sample may be separated via agarose gel electrophoresis followed by excision of the appropriate band containing the PCR product of linearized vector. DNA can then be extracted via a commercial gel extraction kit.

Genomic DNA Preparation

Genomic DNA Purification

Genomic DNA must be isolated from proteins and other cellular debris prior to any enzymatic or mechanical manipulation. Bacterial cells are lysed, typically through exposure to surfactants, such as sodium dodecyl sulfate or Tween-20, or treatment with lysozyme to digest the polysaccharide component of the cell wall and with proteinase K to digest proteins. DNase-free RNase may be added to the lysis step to minimize RNA contamination. Genomic DNA can be purified from cell lysate using a phenol–chloroform extraction followed by an ethanol precipitation or commercially available silica columns. Commercially available kits for genomic DNA isolation are often desirable in that they avoid the use of phenol and chloroform, which are toxic and may interfere with downstream enzymatic reactions. Many commercially available kits avoid phenol by using buffers containing the chaotropic agent guanidine hydrochloride to aid in cell lysis and to effectively denature proteins. Purified genomic DNA should be maintained in a nuclease-free Tris buffer or in nuclease-free water.

Nuclease contamination is a frequent concern associated with genomic DNA isolation. Nuclease activity will degrade DNA and can be easily mistaken for a restriction enzyme digestion or the result of mechanical shearing. Nuclease contamination may be detected by incubating an aliquot of purified DNA at 37 °C for 18 h and then visualizing the DNA on an agarose gel. A control aliquot of DNA that had been stored frozen should be used for comparison. Following electrophoresis, if the incubated aliquot appears to have degraded, nucleases may be contaminating the genomic DNA sample. Additionally, the DNaseAlert kit available from Ambion (Austin, TX) can be used to detect DNase contamination
Fragmentation
Once purified and established as free of contaminating nucleases, genomic DNA must be fragmented to a desired size and made compatible for ligation into a prepared vector. Genomic DNA may be fragmented using enzymatic or mechanical means. Enzymatic fragmentation is accomplished using either restriction endonucleases or DNase I in the presence of manganese ions. Digestion with DNase I offers the ability to generate a more random pool of DNA segments compared to digestion with restriction endonucleases, which are biased based on the sequence of the genomic DNA and the recognition sites of the enzymes. Despite this, DNA digestion with restriction endonucleases is often preferred due to simplicity in reaction set-up and controllability. Appropriate restriction endonucleases for genomic DNA digestions are chosen based on five factors:
1. Frequency of cutting.
2. Buffer compatibility.
3. Ability to be denatured.
4. Methylation sensitivity.
5. The type of overhang that is produced.

Figure 1 Restriction enzyme digestion of genomic DNA for 5 min (middle) and 16 h (right) next to DNA standard ladder (left).
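
As a rough illustration of the first factor, frequency of cutting, the following Python sketch estimates how often a recognition site is expected to occur and counts sites in a known sequence. It is a minimal sketch under stated assumptions, not a method from the article: the function names are hypothetical, the expected spacing assumes all four bases are equally frequent and independent (real genomes deviate from this), and the example sites GATC (four bases) and GAATTC (six bases, the EcoRI site) are chosen only for illustration.

```python
def expected_site_spacing(recognition_site: str) -> int:
    """Average distance in bp between sites, ASSUMING equal and
    independent base frequencies (an idealized model)."""
    return 4 ** len(recognition_site)


def expected_site_count(genome_length_bp: int, recognition_site: str) -> float:
    """Expected number of sites in a genome of the given length
    under the same idealized model."""
    return genome_length_bp / expected_site_spacing(recognition_site)


def count_sites(sequence: str, recognition_site: str) -> int:
    """Count occurrences (overlaps included) of a site on one strand.
    For palindromic sites, one strand is sufficient."""
    sequence = sequence.upper()
    site = recognition_site.upper()
    count = 0
    start = sequence.find(site)
    while start != -1:
        count += 1
        start = sequence.find(site, start + 1)
    return count


if __name__ == "__main__":
    for site in ("GATC", "GAATTC"):
        print(site, "expected once per", expected_site_spacing(site), "bp")
    print("GATC sites in toy sequence:", count_sites("ttGATCaaGATC", "GATC"))
```

Under this model a four-base site is expected far more often than a six-base site, which is why frequent cutters are paired with partial digestion when near-random fragments are wanted. When the genome sequence is available, counting actual sites with `count_sites` is preferable to the idealized estimate.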
Restriction endonucleases have known recognition sites ranging from 4 to more than 30 bp. Their frequency of cutting within a genome may be predicted if information is known about the genome sequence and the recognition sequence of the enzyme. Enzymes that cut with a high frequency in the genome, typically those with shorter recognition sequences, can be used to generate suitably random fragments by using partial digestions (Figure 1), that is, digestions that have not gone to completion. More than one restriction enzyme may be used for the partial digestion of DNA to ameliorate bias based on recognition sequence. Ideally, all of the restriction enzymes used in a digestion should have similar activity within a common reaction buffer and a similar ability to be denatured, to minimize DNA purification steps. Additionally, it may be desirable to select restriction enzymes that are insensitive to dam methylation (methylation of the N6 position of the adenine in the sequence GATC) or dcm methylation (methylation of the C5 position of cytosine in the sequences CCTGG and CCAGG), which may render the DNA resistant to cleavage. Lastly, restriction enzymes may be chosen based on the type of overhang that is left after cleavage. Enzymes can be selected that leave blunt ends or overhangs (called cohesive ends) compatible with the prepared vector. This eliminates a step to modify the ends of the DNA fragments prior to ligation.

Mechanical methods for DNA fragmenting offer the advantage of being unbiased toward DNA sequence and thus are useful for creating a more random pool of fragments. The main disadvantages associated with mechanically fragmenting DNA are the limitation on the size of fragments that can be generated and the extensive treatment required to repair the ends of the DNA before they can be cloned into backbone vector. A French press, sonicator, clinical nebulizer, small gauge syringe, and HydroShear (Genomic Solutions Inc., Ann Arbor, MI) are common tools used to fragment DNA. Of these tools, the HydroShear was deliberately designed to shear DNA using hydrodynamic force and can be used to reproducibly create random fragments of DNA within a limited size range, independent of DNA
short wavelength UV light has been shown to damage DNA and decrease its cloning efficiency. In addition, EtBr is a potent mutagen and moderately toxic, so safer alternatives or dyes that can be seen without the aid of UV light are often desired. Alternative dyes include crystal violet, methylene blue, SYBR Safe (Invitrogen, Carlsbad, CA), or Nile blue.

Quantifying DNA and Determining Quality

It is important to be able to determine the amount and purity of DNA within a sample. DNA should be quantified following purification steps to be sure of a sufficient yield prior to any further manipulations. The quality of DNA should also be monitored before initiating cloning steps to ensure that there are minimal contaminants that would interfere with the efficiency of cloning reactions.

The purity of a DNA sample can be assessed by calculating the ratio of absorbance at 260 nm to the absorbance at 280 nm measured by a spectrophotometer. Nucleic acids have a higher absorbance at 260 nm than at 280 nm. The reverse is true for proteins, which display higher absorbance at 280 nm than at 260 nm. The absorbance at each individual wavelength is thus influenced by the presence of both proteins and nucleic acids. Based on the extinction coefficients for both of these macromolecules, pure samples of DNA would have an A260/A280 ratio of close to 2.0 and pure protein samples would have an A260/A280 ratio of close to 0.6. Typically, a DNA sample with an A260/A280 ratio of greater than 1.7 is acceptable for molecular cloning reactions.

The quantity of DNA can be measured via UV spectroscopy, fluorometry, or by comparison to a standard mass ladder on an agarose gel. Because of its simplicity, the concentration of DNA within a sample is frequently approximated based on the absorbance reading at 260 nm. The concentration is found through application of the Beer–Lambert law, which relates absorbance to concentration through the relationship

$$A = \varepsilon b C$$

where $A$ = absorbance, $b$ = path length of the sample cuvette, in units of length, $\varepsilon$ = absorption coefficient in units of volume/mass/length, and $C$ = concentration in units of mass per volume.

Using a standard spectrophotometer with a path length of 1 cm, an absorbance reading at 260 nm (A260) of 1 equates to a concentration of approximately 50 µg ml⁻¹ of double-stranded DNA. As mentioned above, other molecules besides DNA that absorb at this wavelength, including proteins, RNA, and salts within the sample, can influence absorbance at 260 nm. Due to this phenomenon, the amount of DNA within a sample is usually confirmed or measured with a different method. Fluorometric measurements of DNA are more accurate than those obtained from UV spectroscopy and can detect smaller quantities of DNA. To quantify DNA, DNA-specific fluorescent stains, such as PicoGreen or SYBR Green I, are added to a DNA sample and the fluorescence of the sample is compared to the fluorescence of standards of known concentrations. Accurate DNA quantification can also be achieved by running an aliquot of sample along with a standard DNA mass ladder on an agarose gel and comparing DNA band intensities. This method is effective when quantifying distinctly sized pieces of DNA.

Ligation Reactions

A ligation reaction is required to join fragmented genomic DNA to linearized vector. The most commonly used ligase, T4 ligase, is derived from the T4 bacteriophage and requires ATP as a cofactor and an available DNA 5′ phosphate group on at least one of the two ligating DNA fragments. When setting up a ligation reaction, the ratio of moles of insert to moles of vector may be varied to find optimal conditions. Lower ratios may result in inefficient ligation reactions, while higher ratios increase the risk of ligating more than one insert per vector. Typically, insert to vector ratios are varied from 1:1 to 5:1. Blunt-ended ligations may perform best with higher ratios. A control ligation containing vector without insert should also be conducted to give an estimate of background clones that contain self-ligated vector. Ligases may or may not need to be inactivated or removed from a reaction prior to transformation. Following the recommendations of the supplier of the ligase generally will give the best ligation and transformation results.

Transformation of Library DNA into Bacterial Host Strains

Naked DNA in solution can be transferred into a bacterial host strain via transformation of competent cells. Bacterial transformation with plasmid DNA is accomplished through heat shock of chemically competent cells or electroporation of electrocompetent cells. Transformation of chemically competent cells usually achieves 10⁵–10⁹ colony forming units (CFU) per µg of supercoiled DNA, while electroporation of electrocompetent cells can yield up to 10¹⁰ CFU µg⁻¹ of DNA.

Generally, the preparation of chemically competent cultures of E. coli involves treating exponentially growing cells with a salt solution, such as 0.1 M CaCl₂. Plasmid DNA is mixed with the cells, and the plasmid DNA and cell suspension are heat shocked at 42 °C for a brief period, during which the cells can take up the DNA.
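
The absorbance-based checks described in the section on quantifying DNA can be sketched in a few lines of Python. This is a minimal illustration, not part of the original article: the function names are hypothetical, and the conversion factor is an assumption, namely the commonly used value of roughly 50 µg ml⁻¹ of double-stranded DNA per A260 unit at a 1 cm path length. The 1.7 purity threshold is the one quoted in the text.

```python
# ASSUMPTION: widely used conversion for double-stranded DNA,
# about 50 ug/ml per A260 unit at a 1 cm path length.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_concentration_ug_per_ml(a260: float,
                                  dilution_factor: float = 1.0,
                                  path_length_cm: float = 1.0) -> float:
    """Approximate dsDNA concentration from an A260 reading
    (Beer-Lambert law rearranged for concentration)."""
    if path_length_cm <= 0:
        raise ValueError("path length must be positive")
    return (a260 / path_length_cm) * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio; close to 2.0 for pure DNA, close to 0.6 for pure protein."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def acceptable_for_cloning(a260: float, a280: float,
                           threshold: float = 1.7) -> bool:
    """Apply the 'greater than 1.7' rule of thumb from the text."""
    return purity_ratio(a260, a280) > threshold


if __name__ == "__main__":
    # Hypothetical readings from a sample diluted 1:20 before measurement.
    print(dsdna_concentration_ug_per_ml(0.25, dilution_factor=20))
    print(acceptable_for_cloning(0.90, 0.50))
```

Because proteins, RNA, and salts also absorb at 260 nm, a value computed this way is an approximation and is usually confirmed by fluorometry or by comparison with a mass ladder, as the text notes.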
While the exact mechanism of DNA uptake by this method is not fully known, it is believed that the swelling of the cells following treatment with the salt solution and the activation of heat shock genes are important in cells taking up DNA from their environment. Factors that influence the frequency of transformation include the purity of the reagents and water used, the viable cell density of the culture, and the trace contaminants that are found on glassware. Additionally, the number of times a culture has been passaged influences transformation efficiencies. Best results are obtained from cultures started directly from cryogenic freezer stock as opposed to cells that have been continuously passaged.

Electrocompetent cells are prepared by repeated washing of cells in low conductivity solutions such as 10% glycerol or 300 mmol l⁻¹ sucrose to reduce the ionic strength of the cell suspension. Electroporation works by using a transmembrane electric field pulse to create small holes, referred to as electropores, within the bacterial membrane through which DNA can pass. Electroporation conditions, such as pulse amplitude and duration, must be sufficient to generate electropores but not increased to the point at which the number and size of electropores detrimentally affect transformation efficiency by causing cell damage or death. The number of pulses, along with the pulse duration and amplitude, can be varied to empirically optimize conditions for various cell lines.

While many bacterial strains can be made competent, the protocols for preparing and manipulating competent E. coli are the most thoroughly worked out. Furthermore, competent E. coli can be obtained from commercial sources. Commercially available competent cells tend to yield transformation efficiencies several orders of magnitude greater than those typically achieved by standard laboratory preparations. Additionally, ligated DNA tends to have lower transformation efficiencies than supercoiled DNA, most likely due to DNA topology. For a combination of these reasons, the initial transformation step of transferring cloned library DNA to a host strain is usually conducted with E. coli, provided that the cloning vector used has an origin of replication that can be recognized by E. coli DNA polymerases. If desired, after this initial transformation step, supercoiled plasmid DNA can be prepared from transformed E. coli and then transformed into a different desired host cell line (Figure 4).

[Figure: workflow diagram; recoverable step labels: Select vector; Dephosphorylate vector; Remove residual enzymes and buffers; Run DNA on gel and obtain desired fragment size; Determine quantity and purity of DNA.]

Increasing the Number of Transformants

When a large number of recombinant clones is required, the ligation reaction can be precipitated in the presence of yeast tRNA. Precipitation of ligation reactions prior to electroporation has been shown to give up to a 400-fold increase in the number of transformants. It is believed that the yeast tRNA alters or stabilizes the topology of the ligated DNA, increasing its efficiency of transformation. In this method, a 5 µl ligation reaction is mixed with 1 µg of yeast tRNA from a 1 mg ml⁻¹ solution, brought up to a total volume of 20 µl with ultrapure water, and then precipitated with twice the volume of cold absolute ethanol. The DNA is pelleted by centrifugation, washed twice with cold 70% ethanol, and allowed to air dry prior to resuspension in 1 µl of ultrapure water. This sample can then be transformed into competent cells.

Determining the Number of Transformants Needed for Coverage of an Entire Genome

The extent to which a library represents all sections of the genome can be statistically determined. The number of necessary transformants to have sufficient coverage or
high probability of containing any given section of the genome is dependent upon the genome size and the size of genomic DNA inserts contained within a library. The Clarke–Carbon equation, based on the assumption that recombinant clones are distributed according to a Poisson distribution across the genome, can be used to determine the number of transformants needed for a high probability that any given unique DNA sequence is present in a genomic library. The Clarke–Carbon equation can be written as

$$N = \frac{\ln(1 - P)}{\ln(1 - f)}$$

where $N$ = number of recombinant clones required, $P$ = probability of finding a given unique DNA section, and $f$ = fraction of the total genome size that is contained within a single insert of the genomic library, equal to the size of the insert in bp divided by the size of the genome in bp.

For the E. coli K-12 genome, with a 4 639 221 bp sequence, a genomic library containing 12 000 bp inserts would need 2667 transformants to have a probability of 99.9% of the library containing any given DNA sequence.

While the Clarke–Carbon equation is the most commonly used formula for determining the number of transformants, other equations, such as the Poisson distribution-based Lander–Waterman equation, may also be used. The number of recombinant clones required may also be influenced by the application of the genomic library. Some applications requiring high amounts of overlap between DNA segments require higher numbers of recombinant clones, while applications that require only sections of genes to be present, such as some hybridization studies, may require fewer transformants.

[...] within the genomic library, including difficult-to-clone DNA that contains secondary structures, AT- or GC-rich regions, or DNA encoding strong promoters or toxic gene products. Additionally, it is desirable for all insert DNA to be stable within cloning vectors so that the vector can be amplified or be available for other molecular biology experiments. To this end, a number of advances have been developed, particularly in the area of cloning vectors, to improve cloning of genomic DNA fragments.

Linear vectors based on the coliphage N15, such as the pJAZZ vectors available commercially from Lucigen (Middleton, WI), have been shown to be stable for large DNA segments (up to 30 kb) or DNA with difficult-to-clone secondary structure. The stability of these vectors is believed to be attributable to their lack of supercoiling and differences in replication compared to standard cloning vectors. Low copy number vectors, such as the pSMART or broad host range pRANGER-BTB series of vectors (available from Lucigen), have features that block transcription into and out of the multiple cloning site by the presence of transcriptional terminators and the lack of constitutive promoters. These vectors have been shown to be several times more stable for cloning random DNA fragments than pUC vectors, thus minimizing cloning gaps caused by difficult-to-clone DNA. Another recently developed vector intended to facilitate sequencing, pLEXX-AK (also available from Lucigen), is designed to clone two inserts per vector, thus reducing the downstream labor involved in processing clones prior to sequencing.

Additional advances in constructing genomic libraries come from a reduction in the amount of work required. Many molecular biology suppliers now offer kits to aid in genomic library construction. These kits typically contain pre-processed vector that has already been linearized and dephosphorylated, along with prepared competent cells, reducing the amount of user time required to create a genomic library. Commercially prepared vectors typically promise much lower background empty vector than is usually obtained when cloning vectors are prepared locally.

• Milliliter conical tubes.
• Yeast tRNA (1 mg ml⁻¹ in ultrapure water) (Sigma-Aldrich, St. Louis, MO).
• Ultrapure water (Invitrogen).
• 100% ethanol.